Data quality is a measure of the condition of data based on the following factors:
- Accuracy
- Uniqueness
- Consistency
- Reliability
- Availability
- Usability
- Sufficiency
Data is generally considered high quality if it is fit for its intended uses in decision making and strategic planning.
Several types of quality problems can arise during data collection.
The most common problems are:
Noise and Outliers
Noise is distortion of the actual data, such as measurement error. Outliers are data points that differ considerably from most of the other data in the data set.
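One common way to flag candidate outliers is the interquartile range (IQR) rule. The sketch below is illustrative only: the function name `find_outliers_iqr`, the multiplier `k=1.5`, and the sample readings are assumptions, not part of the original text, and other detection methods may suit a given data set better.

```python
import statistics

def find_outliers_iqr(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3.

    Hypothetical helper for illustration; k=1.5 is a conventional default.
    """
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up sample: most readings cluster near 11, one is far away.
readings = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers_iqr(readings))  # [95]
```

A flagged value is only a candidate: whether it is an error to discard or a genuine extreme observation depends on the data's intended use.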
Missing Values
Missing values are entries for which no data was recorded, leaving the data set incomplete.
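As a minimal sketch of how missing values might be counted and then filled, the example below represents records as dictionaries and treats `None` (or an absent key) as missing. The helper names `count_missing` and `fill_with_mean`, the sample records, and the choice of mean imputation are all assumptions made for illustration; dropping incomplete records or using another fill strategy can be equally valid.

```python
from statistics import mean

def count_missing(records, fields):
    """Count, per field, how many records have no value (None or absent key)."""
    return {f: sum(1 for r in records if r.get(f) is None) for f in fields}

def fill_with_mean(records, field):
    """Return new records with missing `field` values replaced by the mean
    of the known values. Raises statistics.StatisticsError if none are known."""
    known = [r[field] for r in records if r.get(field) is not None]
    fill = mean(known)
    return [
        {**r, field: fill if r.get(field) is None else r[field]}
        for r in records
    ]

# Made-up sample: the second record has no age.
records = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 40},
]
print(count_missing(records, ["id", "age"]))
filled = fill_with_mean(records, "age")
print([r["age"] for r in filled])
```

The original records are left unchanged, so the filled copy can be compared against the raw data.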
Duplicates
Duplicate data are records that appear more than once in the data set.
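A simple way to remove duplicates is to keep the first record seen for each key and skip the rest. The following is a sketch under assumptions: the function name `drop_duplicates`, the `key_fields` parameter, and the sample records are hypothetical, and it assumes the key values are hashable and that duplicates match exactly (it does not handle near-duplicates such as differing capitalization).

```python
def drop_duplicates(records, key_fields):
    """Keep the first record for each combination of key_fields, in order."""
    seen = set()
    unique = []
    for r in records:
        # The tuple of key values identifies a record for comparison.
        key = tuple(r.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

# Made-up sample: the first and third records repeat the same email.
customers = [
    {"email": "a@example.com", "name": "Ann"},
    {"email": "b@example.com", "name": "Bob"},
    {"email": "a@example.com", "name": "Ann"},
]
print(drop_duplicates(customers, ["email"]))
```

Choosing which fields define a duplicate is the key design decision: comparing on every field removes only exact copies, while comparing on an identifier such as `email` also removes records that differ in other fields.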