A decision tree is a flowchart-like tree structure. The topmost node is the root node. Each internal (non-leaf) node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf (terminal) node holds a class label.
Decision trees can handle multidimensional data.
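For example, a tree induced over a four-dimensional dataset classifies a tuple by routing it from the root, through a series of attribute tests, to a leaf holding its class label. The sketch below assumes scikit-learn (a library the text does not name) purely for illustration:

```python
# A minimal sketch, assuming scikit-learn: induce a decision tree
# over the four-dimensional Iris dataset and classify two tuples.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)            # 150 tuples, 4 attributes each
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)                                # top-down tree induction
print(clf.predict(X[:2]))                    # predicted class labels
```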
Well-known decision tree algorithms include Iterative Dichotomiser 3 (ID3), C4.5 (a successor of ID3), and Classification and Regression Trees (CART). Most algorithms for decision tree induction follow a top-down approach.
Tree construction starts with a training set of tuples and their associated class labels. The algorithm is called with three parameters: a data partition, an attribute list, and an attribute selection method. Initially, the data partition is the complete set of training tuples and their associated class labels.
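The skeleton below is a minimal Python sketch of that top-down recursion, not any particular published algorithm: it handles discrete-valued attributes with multiway splits (as in ID3), and the names Leaf, Node, majority_class, and partition_by are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter

class Leaf:
    def __init__(self, label):
        self.label = label

class Node:
    def __init__(self, attr):
        self.attr = attr
        self.children = {}  # maps test outcome -> subtree

def majority_class(labels):
    """Majority voting: the most common class label in the partition."""
    return Counter(labels).most_common(1)[0][0]

def partition_by(D, attr):
    """Split D into sub-partitions, one per outcome of the test on attr."""
    parts = {}
    for tup, label in D:
        parts.setdefault(tup[attr], []).append((tup, label))
    return parts

def generate_decision_tree(D, attribute_list, attribute_selection_method):
    """Top-down induction over data partition D, a list of
    (tuple, class_label) pairs; helper names are placeholders."""
    labels = [label for _, label in D]
    # Terminating condition: all tuples in D belong to the same class.
    if len(set(labels)) == 1:
        return Leaf(labels[0])
    # Terminating condition: no attributes left -> majority voting.
    if not attribute_list:
        return Leaf(majority_class(labels))
    # The attribute selection method picks the splitting attribute.
    splitting_attr = attribute_selection_method(D, attribute_list)
    node = Node(splitting_attr)
    remaining = [a for a in attribute_list if a != splitting_attr]
    # Recurse on each partition, one branch per test outcome.
    for outcome, D_j in partition_by(D, splitting_attr).items():
        node.children[outcome] = generate_decision_tree(
            D_j, remaining, attribute_selection_method)
    return node
```

The attribute_selection_method parameter is deliberately left abstract; any of the measures discussed below can be plugged in.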
The attribute selection method determines the splitting criterion, which indicates the splitting attribute and, depending on the attribute's type, a split point or a splitting subset.
Attribute selection measures are also known as splitting rules as they determine how the data at a given node are to be split.
The attribute selection measure provides a score for each attribute. The attribute having the best score for the measure is chosen as the splitting attribute for the given tuples.
Three popular attribute selection measures are information gain, gain ratio, and the Gini index.
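As a sketch of what these measures compute (using their standard definitions; the function names and the toy data below are hypothetical), the following evaluates a candidate split of a partition into sub-partitions:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Info(D) = -sum_i p_i * log2(p_i), over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(D) = 1 - sum_i p_i^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent_labels, partitions):
    """Gain(A) = Info(D) - sum_j (|D_j|/|D|) * Info(D_j), as in ID3."""
    n = len(parent_labels)
    weighted = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(parent_labels) - weighted

def gain_ratio(parent_labels, partitions):
    """GainRatio(A) = Gain(A) / SplitInfo(A), as in C4.5.
    Assumes the split yields more than one non-empty partition."""
    n = len(parent_labels)
    split_info = -sum((len(p) / n) * log2(len(p) / n) for p in partitions)
    return information_gain(parent_labels, partitions) / split_info

# Toy example: ten labels split by a candidate attribute into two groups.
parent = ["yes"] * 6 + ["no"] * 4
parts = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3]
print(information_gain(parent, parts))  # information gain of this split
print(gain_ratio(parent, parts))        # gain normalized by split info
print(gini(parent))                     # impurity before splitting
```

Whichever measure is used, the candidate attribute with the best score (highest gain or gain ratio, or greatest Gini impurity reduction) becomes the splitting attribute at that node.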