A set of items considered together is called an itemset. An itemset consists of one or more items; an itemset that occurs frequently in the data is called a frequent itemset.
E.g., {Bread, Butter}, {Milk, Diaper}
An itemset is called frequent if it satisfies minimum threshold values for support and confidence. Support measures how often the items are purchased together in a single transaction. Confidence measures how often one item (or set of items) is purchased in transactions that already contain another.
Only those itemsets that meet the minimum support and confidence thresholds are considered by the frequent itemset mining method.
Frequent itemset mining algorithms are designed to discover these hidden patterns efficiently, in a short time and with low memory consumption.
Association rule analysis is a technique to uncover how items are associated with each other. There are three common ways to measure association: Support, Confidence, and Lift.
Market Basket Analysis is a popular application of Association Rules.
For example, Bread => Butter [Support = 5%, Confidence = 75%]
The above statement means that bread and butter were bought together in 5% of all transactions, and that 75% of the transactions that contained bread also contained butter.
Support and Confidence for itemsets A and B are given by the following formulas:
Support (A) = Number of transactions in which A appears / Total number of transactions
Confidence (A -> B) = Support (A ∪ B) / Support (A)
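These definitions translate directly into code. Below is a minimal Python sketch, assuming transactions are represented as Python sets; the toy transaction list and item names are illustrative, not taken from the original example.

```python
# Toy transaction database (illustrative data).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "diaper"},
    {"bread", "milk", "diaper"},
    {"bread", "butter", "diaper"},
]

def support(itemset):
    # Support(A) = number of transactions containing A / total number of transactions
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    # Confidence(A -> B) = Support(A ∪ B) / Support(A)
    return support(antecedent | consequent) / support(antecedent)

bread, butter = {"bread"}, {"butter"}
print(support(bread | butter))    # 0.6  -> 60% of transactions contain both items
print(confidence(bread, butter))  # 0.75 -> 75% of bread transactions also contain butter
# Lift(A -> B) = Confidence(A -> B) / Support(B); values above 1 suggest a positive association.
print(confidence(bread, butter) / support(butter))  # 1.25
```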
The Apriori algorithm was the first algorithm proposed for frequent itemset mining. It is an iterative, level-wise approach that uses two steps, "join" and "prune", to reduce the search space while discovering the frequent itemsets.
If an itemset has support below the minimum support threshold, then all of its supersets will also fall below the minimum support and can therefore be ignored. This property is called the antimonotone property.
The 'Join' step generates candidate (k+1)-itemsets by joining the frequent k-itemsets with each other. The 'Prune' step scans the database to count the support of each candidate; if a candidate does not meet the minimum support, it is regarded as infrequent and removed. A simplified sketch of both steps is shown below.
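The following is a simplified Python sketch of the two steps, assuming an absolute minimum support count and the same set-of-sets transaction representation as above; it illustrates the level-wise idea rather than reproducing any particular Apriori implementation.

```python
from itertools import combinations

# Toy transaction database and an assumed absolute support-count threshold.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "diaper"},
    {"bread", "milk", "diaper"},
    {"bread", "butter", "diaper"},
]
min_support = 2  # minimum number of transactions an itemset must appear in

def support_count(itemset):
    # Count the transactions that contain every item of the candidate itemset.
    return sum(1 for t in transactions if itemset <= t)

# Level 1: frequent 1-itemsets.
items = {item for t in transactions for item in t}
frequent = [{frozenset({i}) for i in items if support_count({i}) >= min_support}]

k = 1
while frequent[-1]:
    prev = frequent[-1]
    # Join step: combine frequent k-itemsets to build candidate (k+1)-itemsets.
    candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
    # Antimonotone prune: drop candidates that have an infrequent k-subset ...
    candidates = {c for c in candidates
                  if all(frozenset(s) in prev for s in combinations(c, k))}
    # ... then scan the database and keep only candidates meeting minimum support.
    frequent.append({c for c in candidates if support_count(c) >= min_support})
    k += 1

for size, level in enumerate(frequent, start=1):
    for itemset in sorted(map(sorted, level)):
        print(size, itemset)
```

On this toy data the loop stops after the 2-itemset level: the only 3-item candidate that survives the subset check, {bread, milk, diaper}, appears in just one transaction and is pruned.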
To discover a frequent pattern of size 100, one needs to generate on the order of 2^100 candidates.
i.e., the number of candidates to be generated for a pattern of size k is on the order of 2^k.
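As a rough back-of-the-envelope illustration of that exponential growth (not part of the original text):

```python
# Worst-case number of candidate itemsets for a frequent pattern of size k grows as 2^k.
for k in (10, 20, 100):
    print(f"size {k}: about {2 ** k:.3e} candidate itemsets")
```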