Data mining blends traditional data analysis methods with sophisticated algorithms for processing large volumes of data. It is the process of automatically discovering useful information in large data repositories, and it is an integral part of knowledge discovery in databases (KDD), the overall process of converting raw data into useful information.
Data mining techniques can be used to support a wide range of business intelligence applications such as customer profiling, targeted marketing, workflow management, store layout, and fraud detection.
Looking up individual records using a database management system, or finding particular Web pages via a query to an Internet search engine, is not data mining; these are information retrieval tasks. Data mining techniques have, however, been used to enhance information retrieval systems.
The data mining process consists of a series of transformation steps, from data preprocessing to postprocessing of data mining results.
Data Preprocessing
The purpose of preprocessing is to transform the raw input data into an appropriate format for subsequent analysis.
- Feature selection
- Dimensionality reduction
- Normalization
- Data subsetting
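Three of the preprocessing steps listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the toy records, the column meanings, the variance threshold, and the helper names (`subset_rows`, `select_features`, `min_max_normalize`) are hypothetical and do not come from the source. Dimensionality reduction is left out because it usually relies on a numerical library.

```python
from statistics import pvariance

# Hypothetical toy data: each row is [age, income, constant_flag].
rows = [
    [25.0, 30000.0, 1.0],
    [40.0, 52000.0, 1.0],
    [31.0, 41000.0, 1.0],
    [58.0, 75000.0, 1.0],
]


def subset_rows(data, predicate):
    """Data subsetting: keep only the records that satisfy a condition."""
    return [row for row in data if predicate(row)]


def select_features(data, min_variance=0.0):
    """Feature selection: keep columns whose variance exceeds a threshold.

    Returns the reduced rows and the indices of the columns that were kept.
    """
    columns = list(zip(*data))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > min_variance]
    return [[row[i] for i in keep] for row in data], keep


def min_max_normalize(data):
    """Normalization: rescale every column to the range [0, 1]."""
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in data
    ]


# Assumed order of steps: subset first, then select features, then normalize.
under_50 = subset_rows(rows, lambda r: r[0] < 50)
selected, kept_columns = select_features(under_50)
normalized = min_max_normalize(selected)
```

Here the constant third column carries no information, so the variance-based selection drops it, and the remaining columns are rescaled so that age and income become comparable in magnitude.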
Data Postprocessing
- Filtering patterns
- Visualization
- Pattern interpretation
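Pattern filtering, the first postprocessing step above, can be illustrated with a small sketch. It assumes, purely for illustration, that the mining step produced rule-like patterns scored by support and confidence; the source does not name these measures, and the patterns, thresholds, and the `filter_patterns` helper are hypothetical.

```python
# Hypothetical mined patterns: (rule description, support, confidence).
patterns = [
    ("bread -> butter", 0.30, 0.80),
    ("milk -> eggs", 0.05, 0.90),
    ("tea -> sugar", 0.20, 0.40),
    ("pasta -> sauce", 0.25, 0.70),
]


def filter_patterns(found, min_support, min_confidence):
    """Keep patterns meeting both thresholds, highest confidence first."""
    kept = [p for p in found if p[1] >= min_support and p[2] >= min_confidence]
    return sorted(kept, key=lambda p: p[2], reverse=True)


# The thresholds are assumed values chosen for this example only.
interesting = filter_patterns(patterns, min_support=0.10, min_confidence=0.60)

# A plain-text report stands in for visualization and interpretation.
for rule, support, confidence in interesting:
    print(f"{rule}: support={support:.2f}, confidence={confidence:.2f}")
```

Sorting the surviving patterns puts the strongest ones first, which makes the later interpretation step easier for an analyst.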
Traditional data analysis techniques encounter the following practical difficulties:
- Scalability
- High dimensionality
- Heterogeneous and complex data
- Data ownership and distribution
- Non-traditional analysis