Data preprocessing is a data mining technique that transforms raw data into a readable, understandable format. Real-world data is often incomplete, inconsistent, or lacking in certain behaviors or trends, and is likely to contain many errors.
Data preprocessing includes cleaning, instance selection, normalization, transformation, and feature extraction and selection, among other steps. After preprocessing, the data may be more valuable and more informative. Preprocessing may also reduce the computational load.
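As one illustration of the normalization step, the sketch below applies min-max scaling, which is only one of several common normalization schemes. The function name `min_max_normalize` and the sample `ages` list are hypothetical, chosen for this example.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        # Mapping everything to 0.0 is an assumption made for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample data: ages on very different scales from other features.
ages = [18, 30, 42, 66]
scaled = min_max_normalize(ages)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

After scaling, every value lies between 0 and 1, so features measured in different units can be compared on the same footing.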
Data cleaning is the process of preparing data for analysis by removing or modifying data that is incorrect, incomplete, irrelevant, duplicated, or improperly formatted.
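A minimal sketch of data cleaning on a small list of records might look like the following. The record fields (`name`, `age`), the function name `clean_records`, and the plausible age range of 0 to 120 are all assumptions made for illustration. Real cleaning rules depend on the dataset.

```python
def clean_records(records):
    """Return records with formatting fixed and bad rows removed."""
    cleaned = []
    seen = set()
    for rec in records:
        # Improperly formatted: trim whitespace and normalise capitalisation.
        name = (rec.get("name") or "").strip().title()
        age = rec.get("age")

        # Incomplete: skip records with a missing name or age.
        if not name or age is None:
            continue

        # Incorrect: age must convert to an integer in an assumed valid range.
        try:
            age = int(age)
        except (TypeError, ValueError):
            continue
        if not 0 <= age <= 120:
            continue

        # Duplicated: keep only the first occurrence of each (name, age) pair.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


# Hypothetical raw data showing each kind of problem.
raw = [
    {"name": "  alice ", "age": "34"},  # improperly formatted
    {"name": "Alice", "age": 34},       # duplicate once formatting is fixed
    {"name": "Bob", "age": None},       # incomplete
    {"name": "Carol", "age": -5},       # incorrect
    {"name": "dave", "age": 51},
]
cleaned = clean_records(raw)
print(cleaned)
```

Here the sketch removes problem records rather than modifying them. Imputing a missing value instead of dropping the row is an equally valid choice in many settings.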
Other data preprocessing techniques include:
- Aggregation
- Sampling
- Dimensionality reduction
- Feature subset selection
- Feature extraction
- Discretization
- Attribute transformation
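
To make one of the techniques in the list above concrete, here is a minimal sketch of discretization using equal-width binning. Equal-width binning is only one possible discretization strategy, and the function name `equal_width_bins` and the sample `temps` values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in the range 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    # min(...) keeps the maximum value inside the last bin
    # instead of letting it spill into a bin numbered n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical continuous attribute: temperatures split into 4 bins of width 5.
temps = [10, 12, 19, 25, 30]
bins = equal_width_bins(temps, 4)
print(bins)  # [0, 0, 1, 3, 3]
```

The continuous attribute is replaced by a small set of ordered categories, which some algorithms handle more easily than raw numeric values.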