Principal components analysis (PCA), also known as the Karhunen-Loève (K-L) method, searches for k n-dimensional orthogonal vectors that can best be used to represent the data, where k is less than or equal to n. The original data are thus projected onto a much smaller space, resulting in dimensionality reduction.
Basic procedure of PCA
The input data are normalized so that each attribute falls within the same range. This step ensures that attributes with large domains do not dominate attributes with smaller domains.
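As an illustration of this step, the following sketch applies z-score (zero-mean, unit-variance) normalization to a small toy data matrix in NumPy; the data values and variable names are hypothetical, and other normalization schemes (e.g., min-max scaling) could be used instead.

```python
import numpy as np

# Toy input: 5 records (rows) x 3 attributes (columns) with very different scales.
X = np.array([[170.0, 65000.0, 0.30],
              [160.0, 70000.0, 0.35],
              [180.0, 80000.0, 0.25],
              [175.0, 60000.0, 0.40],
              [165.0, 75000.0, 0.28]])

# Z-score normalization: each attribute (column) gets zero mean and unit variance,
# so attributes with large domains cannot dominate those with smaller domains.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(Z.mean(axis=0).round(6), Z.std(axis=0).round(6))   # ~[0 0 0] and [1 1 1]
```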
PCA computes k orthonormal vectors that provide a basis for the normalized input data. These are unit vectors, each pointing in a direction perpendicular to the others; they are referred to as the principal components. The input data are a linear combination of the principal components.
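One common way to obtain these orthonormal vectors, assumed here purely for illustration, is an eigendecomposition of the covariance matrix of the normalized data (a singular value decomposition of the data matrix is an equivalent alternative). The sketch below uses randomly generated data as a stand-in for an already-normalized input matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 4))    # stand-in for normalized data: 100 records, 4 attributes

# Covariance matrix of the normalized attributes.
cov = np.cov(Z, rowvar=False)        # shape: (4, 4)

# The eigenvectors of this symmetric matrix are mutually perpendicular unit vectors;
# they serve as the principal components. Each eigenvalue measures the variance of
# the data along its component.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Orthonormality check: V^T V is (numerically) the identity matrix.
assert np.allclose(eigenvectors.T @ eigenvectors, np.eye(Z.shape[1]))
```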
The principal components are sorted in order of decreasing significance or strength, that is, by how much of the variance in the data each one captures. Because the components are sorted this way, the data size can be reduced by eliminating the weaker components (those with low variance), while the strongest components still allow a good approximation of the original data to be reconstructed.
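Continuing the same illustrative setup, the following sketch sorts the components by decreasing eigenvalue, eliminates the weaker ones by keeping only the k strongest, and projects the data onto them; the value of k and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 4))                  # stand-in for normalized data, as before
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z, rowvar=False))

# Sort the components by decreasing eigenvalue, i.e. decreasing significance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Eliminate the weaker components: keep only the k strongest and project onto them.
k = 2
Z_reduced = Z @ eigenvectors[:, :k]                # shape: (100, k)

# Fraction of the total variance retained by the k kept components.
retained = eigenvalues[:k].sum() / eigenvalues.sum()
print(f"k={k} components retain {retained:.1%} of the total variance")
```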
PCA can be applied to ordered and unordered attributes, and can handle sparse and skewed data. Multidimensional data of more than two dimensions can be handled by reducing the problem to two dimensions. In comparison with wavelet transforms, PCA tends to be better at handling sparse data, whereas wavelet transforms are more suitable for data of high dimensionality.