
k-means Partition Clustering

The simplest method of cluster analysis is partitioning, which organizes the objects of a set into several exclusive groups. A partitioning algorithm organizes the objects into k partitions, where each partition represents a cluster.

The most well-known and commonly used partitioning methods are k-means and k-medoids. An objective function is used to assess the partitioning quality, so that objects within a cluster are similar to one another but dissimilar to objects in other clusters. A centroid-based partitioning technique uses the centroid ci of a cluster Ci to represent that cluster. Conceptually, the centroid of a cluster is its center point.

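The quality of cluster Ci can be measured by the within-cluster variation, which is the sum of squared errors between all objects in Ci and the centroid ci. Summed over all k clusters, with dist(p, ci) denoting the Euclidean distance between p and ci, it is defined as

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} \operatorname{dist}(p, c_i)^2$$
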
where 

E is the sum of the squared error for all objects in the data set 

p is the point in space representing a given object

ci is the centroid of cluster Ci
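
As a small illustration of this definition, the sketch below computes E for a hand-made partition of four 2-D points. The function name `sse` and the sample data are made up for this example, and Euclidean distance is assumed.

```python
def sse(clusters, centroids):
    """Within-cluster variation E: the sum of squared Euclidean distances
    from every point p to the centroid c_i of the cluster it belongs to."""
    total = 0.0
    for points, c in zip(clusters, centroids):
        for p in points:
            total += sum((pj - cj) ** 2 for pj, cj in zip(p, c))
    return total


# Two hypothetical clusters and their centroids (illustrative data only).
clusters = [
    [(1.0, 1.0), (3.0, 1.0)],
    [(10.0, 10.0), (10.0, 12.0)],
]
centroids = [(2.0, 1.0), (10.0, 11.0)]

print(sse(clusters, centroids))  # 4.0 (each point is 1 unit from its centroid)
```

A lower E means the clusters are more compact around their centroids.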

Optimizing the within-cluster variation is computationally challenging. The problem is NP-hard (non-deterministic polynomial-time hardness) for a general number of clusters k even in the 2-D Euclidean space.

If the number of clusters k and the dimensionality of the space d are fixed, the problem can be solved in time O(n^(dk+1) log n), where n is the number of objects.

The k-means algorithm defines the centroid of a cluster as the mean value of the points within the cluster. The process of iteratively reassigning objects to clusters to improve the partitioning is referred to as iterative relocation.
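
The iterative relocation idea can be sketched in a few lines of plain Python. This is a minimal sketch, not a reference implementation: the function names, the choice of k randomly sampled points as initial centroids, the fixed seed, and the stopping rule (stop when no object changes cluster) are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Return (centroids, assignment) after iterative relocation."""
    rng = random.Random(seed)
    # Assumption: start from k points sampled at random from the data.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no object moved, so the partition is stable
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its cluster.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean_point(members)
    return centroids, assignment


# Illustrative data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

Each pass touches every object once per centroid, which is where the per-iteration cost proportional to n times k comes from. The result depends on the initial centroids, so in practice the procedure is often run several times with different starting points.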

The time complexity of the k-means algorithm is O(nkt), where n is the total number of objects, k is the number of clusters, and t is the number of iterations.

The k-means method can be applied only when the mean of a set of objects is defined.

The k-modes method is a variant of k-means, which extends the k-means paradigm to cluster nominal data by replacing the means of clusters with modes.
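
To make the replacement of means by modes concrete, the sketch below computes the mode of a small cluster of nominal records and compares a record to it. The helper names and the sample records are hypothetical, and counting mismatching attributes as the dissimilarity measure is an assumption of this example rather than something stated above.

```python
from collections import Counter


def mismatch_count(a, b):
    """Number of attributes on which two nominal records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Attribute-wise most frequent value; plays the role of the centroid."""
    return tuple(
        Counter(column).most_common(1)[0][0] for column in zip(*records)
    )


# Hypothetical cluster of (colour, size) records.
records = [("red", "small"), ("red", "large"), ("blue", "small")]
mode = cluster_mode(records)
print(mode)                                   # ('red', 'small')
print(mismatch_count(("blue", "large"), mode))  # 2
```

Plugging `cluster_mode` in for the mean and `mismatch_count` in for the squared distance gives the same assign-and-update loop as k-means, but on data where a mean is not defined.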
