
Posts

Showing posts from June, 2021

Row Space, Column Space & Null Space

Consider an m×n matrix A. Its row vectors are r_1 = (a_11, ..., a_1n), ..., r_m = (a_m1, ..., a_mn), and its column vectors c_1, ..., c_n are defined analogously. The subspace span{r_1, ..., r_m} of R^n, where r_1, ..., r_m are the rows of A, is called the row space of A. The subspace span{c_1, ..., c_n} of R^m, where c_1, ..., c_n are the columns of A, is called the column space of A. The solution space of the system of homogeneous linear equations Ax = 0 is called the null space of A. To find a basis for the row space of A, reduce A to row echelon form and take the non-zero rows of the reduced matrix; these rows are linearly independent. To find a basis for the column space of A, reduce A to row echelon form and take the columns of the original matrix A corresponding to the pivot columns of the reduced matrix; these pivot columns are linearly independent, and any non-piv...
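The row-reduction procedure above can be sketched in Python. This is a minimal hand-rolled reduced-row-echelon routine using exact fractions (the matrix A below is an arbitrary illustrative example, not one from the post):

```python
from fractions import Fraction

def rref(rows):
    """Reduce a matrix (a list of rows) to reduced row echelon form.
    Returns the reduced rows and the list of pivot column indices."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c.
        pr = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        m[r] = [x / m[r][c] for x in m[r]]          # scale pivot row
        for i in range(len(m)):
            if i != r and m[i][c] != 0:             # eliminate column c
                m[i] = [a - m[i][c] * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
R, piv = rref(A)
# Row space basis: the nonzero rows of the reduced matrix R.
row_basis = [row for row in R if any(x != 0 for x in row)]
# Column space basis: the columns of the ORIGINAL A at the pivot positions.
col_basis = [[row[c] for row in A] for c in piv]
print(row_basis)   # two independent rows, so rank(A) = 2
print(col_basis)
```

Note that the column space basis is taken from the original matrix, not from the reduced one; row reduction preserves the row space but generally changes the column space.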

Rule based Classification

A rule-based classifier uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form IF condition THEN conclusion. The IF part is known as the rule antecedent or precondition, and the THEN part is known as the rule consequent. The rule antecedent contains attribute test conditions, and the rule consequent contains a class prediction. If the rule antecedent is satisfied by a tuple, the rule is said to cover that tuple. The coverage and accuracy of a rule are assessed by

coverage = n_covers / |D|
accuracy = n_correct / n_covers

where n_covers is the number of tuples covered by the rule, n_correct is the number of covered tuples classified correctly by the rule, and |D| is the number of tuples in the dataset. If a rule's antecedent is satisfied by a tuple in the dataset, the rule is said to be triggered. If only one rule is triggered, that rule classifies the tuple, which is referred to as rule firing. If more than one rule is triggered, a conflict resolution strategy must be applied to classify the tuple. The size ordering and ru...
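The coverage and accuracy formulas can be sketched as follows. The rule/tuple layout here (dicts with an "if" predicate and a "then" class) is purely illustrative:

```python
def coverage_and_accuracy(rule, dataset):
    """coverage = n_covers / |D|, accuracy = n_correct / n_covers,
    where a tuple is covered when it satisfies the rule antecedent."""
    covered = [t for t in dataset if rule["if"](t)]
    if not covered:
        return 0.0, 0.0
    correct = [t for t in covered if t["class"] == rule["then"]]
    return len(covered) / len(dataset), len(correct) / len(covered)

# Toy rule: IF age < 30 THEN class = "yes"
rule = {"if": lambda t: t["age"] < 30, "then": "yes"}
data = [{"age": 25, "class": "yes"},
        {"age": 28, "class": "no"},
        {"age": 40, "class": "no"},
        {"age": 22, "class": "yes"}]
cov, acc = coverage_and_accuracy(rule, data)
print(cov, acc)   # covers 3 of 4 tuples; 2 of the 3 covered are correct
```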

Decision Tree Scalability Methods

The scalable decision tree induction methods are RainForest and BOAT. RainForest maintains an AVC-set (Attribute-Value, Classlabel) for each attribute at each node: for each attribute, the counts of tuples having each attribute value for each class label. BOAT, which stands for Bootstrapped Optimistic Algorithm for Tree construction, uses a statistical technique known as bootstrapping, by which several smaller subsets of the data are created. A tree is built from each subset, and the full tree is then generated using the trees built from the smaller subsets. BOAT was found to be two to three times faster than RainForest.
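The AVC-set idea can be sketched in a few lines: for each attribute at a node, tally how many tuples carry each (attribute value, class label) pair. The tuple layout below is an illustrative assumption, not RainForest's actual data structures:

```python
from collections import Counter

def avc_sets(tuples, attributes, label="class"):
    """Build one AVC-set per attribute at a node: a Counter of
    (attribute value, class label) pairs, as RainForest maintains."""
    sets = {a: Counter() for a in attributes}
    for t in tuples:
        for a in attributes:
            sets[a][(t[a], t[label])] += 1
    return sets

node_data = [{"outlook": "sunny", "windy": False, "class": "no"},
             {"outlook": "sunny", "windy": True,  "class": "no"},
             {"outlook": "rainy", "windy": False, "class": "yes"}]
avc = avc_sets(node_data, ["outlook", "windy"])
print(avc["outlook"][("sunny", "no")])   # 2 sunny/no tuples at this node
```

The point of the AVC-set is that its size depends on the number of distinct attribute values and classes, not on the number of tuples, so split criteria can be evaluated without keeping the full partition in memory.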

Decision Tree Pruning

Pruning is a technique in decision tree learning that reduces the size of a tree by removing sections that are non-critical or redundant for classifying instances. Pruned trees tend to be smaller and less complex, and they are usually faster and better at classifying independent test data correctly. There are two approaches to tree pruning: prepruning and postpruning. The prepruning approach is applied during the construction of the decision tree: further splitting is halted early at a given node, which then becomes a leaf node of the tree. This approach helps overcome the overfitting issue. The postpruning approach is the one in which subtrees are removed from the fully grown tree. The cost complexity algorithm of CART and the pessimistic pruning of C4.5 are examples of the postpruning approach.
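Postpruning can be sketched with a simple subtree-replacement scheme: walk the fully grown tree bottom-up and replace a subtree with its majority-class leaf whenever that is no worse on held-out validation data. This is an illustrative simplification, not CART's exact cost-complexity algorithm or C4.5's pessimistic pruning; the nested-dict tree layout is a hypothetical representation:

```python
def classify(node, t):
    """Follow branches until a leaf (a plain class label) is reached."""
    while isinstance(node, dict):
        node = node["branches"].get(t[node["attr"]], node["majority"])
    return node

def prune(tree, val):
    """Bottom-up subtree replacement.  A node is either a class label
    (leaf) or {"attr": name, "branches": {value: subtree}, "majority": label}."""
    if not isinstance(tree, dict):
        return tree
    # Prune children first (postpruning works on the fully grown tree).
    for v, sub in tree["branches"].items():
        subset = [t for t in val if t[tree["attr"]] == v]
        tree["branches"][v] = prune(sub, subset)
    def errors(node, data):
        return sum(1 for t in data if classify(node, t) != t["class"])
    leaf = tree["majority"]
    # Replace the subtree by its majority leaf if that is no worse.
    return leaf if errors(leaf, val) <= errors(tree, val) else tree

tree = {"attr": "windy", "majority": "yes",
        "branches": {True: "no", False: "yes"}}
val = [{"windy": True, "class": "yes"},
       {"windy": False, "class": "yes"}]
print(prune(tree, val))   # the split on "windy" only hurts, so it is pruned
```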

Decision Tree - Gini Index

The Gini index is used in CART. The Gini index measures the impurity of a data set D:

Gini(D) = 1 - Σ p_i²

where p_i is the probability that a tuple in D belongs to class C_i, estimated as p_i = |C_{i,D}|/|D|. For an attribute with v distinct values, there are 2^v - 2 possible ways to form two partitions of the data set D based on a binary split on that attribute. Each of the possible binary splits of the attribute is considered, and the subset that gives the minimum Gini index is selected as the splitting subset for a discrete-valued attribute. The Gini index varies between 0 and 1. The value 0 indicates that all elements belong to one class (or that only one class exists), while values near 1 indicate that the elements are distributed across many classes. A Gini index of 0.5 indicates elements equally distributed between two classes. The Gini index is biased toward multivalued attributes and has difficulty when the number of classes is large.
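The formula Gini(D) = 1 - Σ p_i² translates directly into code, together with the weighted Gini of a binary split used to compare candidate partitions:

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2), where p_i is the fraction of tuples
    in D belonging to class C_i."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(part1, part2):
    """Weighted Gini of a binary split: |D1|/|D|*Gini(D1) + |D2|/|D|*Gini(D2)."""
    n = len(part1) + len(part2)
    return len(part1) / n * gini(part1) + len(part2) / n * gini(part2)

print(gini(["yes"] * 4))                 # 0.0 - a pure node
print(gini(["yes", "yes", "no", "no"]))  # 0.5 - two equally sized classes
```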

Orthogonal matrix

A square n×n matrix A is said to be an orthogonal matrix if its transpose is equal to its inverse; equivalently, a square matrix is orthogonal when the product of the matrix and its transpose is the identity matrix:

A^T = A^{-1}, or A·A^T = I

All orthogonal matrices are square matrices, but not all square matrices are orthogonal. An orthogonal matrix need not be symmetric; a rotation matrix, for example, is orthogonal but not symmetric. Every identity matrix is an orthogonal matrix. The product of two orthogonal matrices is also an orthogonal matrix, and the inverse of an orthogonal matrix is also an orthogonal matrix. The determinant of an orthogonal matrix is equal to 1 or -1. The eigenvalues of an orthogonal matrix have absolute value 1 (its real eigenvalues are ±1), and eigenvectors corresponding to distinct eigenvalues are orthogonal.
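The defining property A·A^T = I is easy to check numerically. The sketch below verifies it for a 2×2 rotation matrix, which also illustrates that orthogonal does not imply symmetric:

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def is_orthogonal(M, tol=1e-9):
    """Check A . A^T = I, the defining property of an orthogonal matrix."""
    P = matmul(M, transpose(M))
    n = len(M)
    return all(abs(P[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

t = math.pi / 6
R = [[math.cos(t), -math.sin(t)],   # rotation by 30 degrees:
     [math.sin(t),  math.cos(t)]]   # orthogonal but not symmetric
print(is_orthogonal(R))             # True
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
print(round(det, 9))                # +1 for a rotation; a reflection gives -1
```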

Amortized analysis

An amortized analysis is a method for analyzing a given algorithm's complexity, that is, how much of a resource (especially time or memory) it takes to execute. It guarantees the average performance of each operation in the worst case. Asymptotic analysis gives a worst-case bound for each operation in isolation, without considering the effect of one operation on another, whereas amortized analysis focuses on a sequence of operations and the interplay between them, yielding an analysis that is more precise for the sequence as a whole. The three most common techniques are aggregate analysis, the accounting method, and the potential method. In the aggregate method, the total running time of a sequence of operations is analyzed. In the accounting (or banker's) method, an extra charge is imposed on inexpensive operations and used to pay for expensive operations later on. In the potential (or physicist's) method, a potential function is derived characterizing...
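The aggregate method can be illustrated with the classic dynamic-array example: appends are usually cheap, but when the array is full every element must be copied. Counting the total cost over a whole sequence shows it stays below 3n, so the amortized cost per append is O(1):

```python
def append_cost_sequence(n):
    """Aggregate method: count elementary operations for n appends
    to a dynamic array that doubles its capacity when full."""
    capacity, size, total = 1, 0, 0
    for _ in range(n):
        if size == capacity:    # expensive append: copy all elements
            total += size
            capacity *= 2
        total += 1              # write the new element
        size += 1
    return total

n = 1024
total = append_cost_sequence(n)
print(total, total / n)   # total < 3n, so the amortized cost per append is O(1)
```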