Data Set: a collection of data objects
The attributes that describe data objects can be of different types: quantitative or qualitative
Other names for a data object are record, point, vector, pattern, event, case, sample, observation, or entity.
Data objects are described by a number of attributes that capture the basic characteristics of an object.
Other names for an attribute are variable, characteristic, field, feature, or dimension.
An attribute is a property or characteristic of an object that may vary, either from one object to another or from one time to another.
A measurement scale is a rule (function) that associates a numerical or symbolic value with an attribute of an object.
The properties of an attribute need not be the same as the properties of the values used to measure it.
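A small sketch of the last point, using made-up values: employee IDs and ages are both stored as integers, so the same arithmetic can be applied to both, but only the result for age says anything about the underlying attribute. The variable names and numbers are illustrative assumptions.

```python
from statistics import mean

# Both attributes are measured with integers (made-up example values).
employee_ids = [1001, 1002, 1005]
ages = [31, 45, 28]

# Integers support averaging, so both calls run without error.
mean_age = mean(ages)          # meaningful: age really has an average
mean_id = mean(employee_ids)   # computable, but says nothing about employees

# The only property of an ID that matters is whether two IDs are equal.
ids_are_distinct = len(set(employee_ids)) == len(employee_ids)
print(mean_age, mean_id, ids_are_distinct)
```

The integers used for the IDs have order and differences, but the ID attribute itself does not, which is why the attribute type has to be known before choosing an analysis.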
Four types of attributes:
Categorical (Qualitative)
Nominal
values are just different names; they only distinguish one object from another
E.g., zip codes, employee ID numbers
Ordinal
values provide enough information to order objects
E.g., grades, ratings
Numeric (Quantitative)
Interval
differences between values are meaningful
E.g., calendar dates, temperature in Celsius or Fahrenheit
Ratio
both differences and ratios are meaningful
E.g., age, length
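The four types above form a hierarchy: each type supports the operations of the types before it plus some of its own. The sketch below encodes that idea; the names `AttributeType`, `NEW_OPERATIONS`, and `meaningful_operations` are hypothetical and chosen only for illustration.

```python
from enum import IntEnum

class AttributeType(IntEnum):
    # Ordered so that a higher level inherits every operation of the lower ones.
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Operations that first become meaningful at each level.
NEW_OPERATIONS = {
    AttributeType.NOMINAL: {"==", "!="},   # distinctness
    AttributeType.ORDINAL: {"<", ">"},     # order
    AttributeType.INTERVAL: {"+", "-"},    # differences
    AttributeType.RATIO: {"*", "/"},       # ratios
}

def meaningful_operations(attr_type):
    """Collect the operations of the given level and of every level below it."""
    ops = set()
    for level in AttributeType:
        if level <= attr_type:
            ops |= NEW_OPERATIONS[level]
    return ops

def is_meaningful(attr_type, op):
    """Return True if `op` makes sense for an attribute of this type."""
    return op in meaningful_operations(attr_type)

print(is_meaningful(AttributeType.INTERVAL, "-"))  # differences of dates
print(is_meaningful(AttributeType.INTERVAL, "/"))  # ratios of dates
```

With this encoding, asking whether a ratio is meaningful for an interval attribute such as a calendar date returns False, while the same question for a ratio attribute such as length returns True.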
Attribute Types by Number of Values
Discrete
A discrete attribute has a finite or countably infinite set of values. Discrete attributes are often represented using integer variables.
Binary attributes are a special case of discrete attributes and assume only two values (e.g., yes/no, 0/1).
Continuous
A continuous attribute has real-number values. Continuous attributes are typically represented as floating-point variables.
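As a rough illustration, the distinction can be guessed from the observed values of a column. The function `value_kind` below is a hypothetical heuristic, not a standard routine: it treats any non-integer float as evidence of a continuous attribute, and a column with exactly two distinct values as binary.

```python
def value_kind(values):
    """Heuristically label a column as 'binary', 'discrete', or 'continuous'."""
    distinct = set(values)
    # A fractional float suggests real-valued (continuous) measurements.
    if any(isinstance(v, float) and not v.is_integer() for v in distinct):
        return "continuous"
    # Exactly two distinct values: the binary special case of discrete.
    if len(distinct) == 2:
        return "binary"
    return "discrete"

print(value_kind(["yes", "no", "yes"]))   # two labels
print(value_kind([3, 1, 4, 1, 5]))        # integer counts
print(value_kind([36.6, 37.2, 38.05]))    # measurements
```

A real dataset would need domain knowledge as well: a continuous quantity recorded only to whole numbers would be labeled discrete by this check.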
Types of Datasets
- Record data
- Graphical data
- Ordered data
Characteristics of Datasets
Dimensionality
Number of attributes that the objects in the dataset possess.
The difficulties associated with analyzing high-dimensional data are sometimes referred to as the curse of dimensionality.
Sparsity
In a sparse dataset most attribute values are zero; in many cases fewer than 1% of the entries are nonzero.
Resolution
Data can often be obtained at different levels of resolution, and the properties of the data often differ from one resolution to another.
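Dimensionality and sparsity can both be read directly off a data matrix. The sketch below assumes the dataset is a list of equal-length rows; the helper names and the tiny matrix are made up for illustration.

```python
def dimensionality(matrix):
    """Number of attributes (columns) per object, assuming equal-length rows."""
    return len(matrix[0]) if matrix else 0

def nonzero_fraction(matrix):
    """Fraction of entries that are nonzero; small values indicate sparse data."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        return 0.0
    nonzero = sum(1 for row in matrix for x in row if x != 0)
    return nonzero / total

# Three objects described by four attributes (illustrative values).
data = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

print(dimensionality(data))
print(nonzero_fraction(data))
```

Here 2 of the 12 entries are nonzero, so the matrix is mostly zeros but far from the "fewer than 1%" level seen in very sparse real datasets.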
Record Data
collection of records (data objects), each of which consists of a fixed set of data fields (attributes).
Types of Record data
- Transaction or Market Basket data
- Data Matrix
- Sparse Data Matrix
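The three record-data variants are closely related. As a minimal sketch with made-up baskets, transaction data can be turned into a binary data matrix (one column per item), and that matrix can then be stored sparsely by keeping only the nonzero entries.

```python
# Transaction (market basket) data: each record is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers"},
]

# Data matrix: one row per transaction, one column per distinct item.
items = sorted(set().union(*transactions))
matrix = [[1 if item in t else 0 for item in items] for t in transactions]

# Sparse data matrix: per row, store only {column index: value} for nonzeros.
sparse_rows = [{j: v for j, v in enumerate(row) if v != 0} for row in matrix]

print(items)
print(matrix)
print(sparse_rows)
```

The sparse form pays off when the number of distinct items is large and each transaction contains only a few of them.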
Graphical data
The graph can capture relationships among data objects.
If the objects themselves contain sub-objects that have relationships, such objects are frequently represented as graphs.
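One common way to hold relationships among data objects is an adjacency list. The sketch below assumes undirected links between hypothetical web pages; `build_graph` and the page names are illustrative, not taken from any library.

```python
from collections import defaultdict

def build_graph(links):
    """Build an undirected adjacency list from (object, object) pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

# Data objects are pages; each pair records a relationship between two pages.
links = [("page_a", "page_b"), ("page_b", "page_c")]
graph = build_graph(links)

print(sorted(graph["page_b"]))
```

For directed relationships, such as one page linking to another, the second `add` call would be dropped so that each edge is stored in one direction only.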
Ordered data
the attributes have relationships that involve order in time or space
Types of Ordered data
Sequential data (Temporal data)
record data in which each record has an associated time
Sequence data
data set that is a sequence of individual entities, such as a sequence of words or letters.
there are no time stamps; instead, there are positions in an ordered sequence.
Time Series data
a special type of sequential data in which each record is a time series, i.e., a series of measurements taken over time
Spatial data
Some objects have spatial attributes, such as positions or areas, as well as other types of attributes.
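The ordered-data variants above differ mainly in where the order comes from. The sketch below contrasts three of them using made-up values: timestamped records, a sequence ordered only by position, and a single time series.

```python
from datetime import datetime

# Sequential (temporal) data: ordinary records, each with an associated time.
purchases = [
    (datetime(2024, 1, 3, 10, 0), "c1", {"milk"}),
    (datetime(2024, 1, 1, 9, 30), "c1", {"bread"}),
    (datetime(2024, 1, 2, 14, 15), "c2", {"beer"}),
]
ordered = sorted(purchases, key=lambda record: record[0])

# Sequence data: no time stamps, only positions in an ordered sequence.
sequence = "GATTACA"
positions = list(enumerate(sequence))

# Time series: one object measured repeatedly over time.
series = [20.5, 21.0, 19.5, 22.0]
changes = [later - earlier for earlier, later in zip(series, series[1:])]

print(ordered[0][2], positions[0], changes)
```

Spatial data would add position attributes, such as coordinates, to each record in the same way that the timestamp is added here.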