Apriori Algorithm in Data Mining: Examples and Implementation

We use data science to uncover various patterns in data. These patterns are then utilized for purposes such as product recommendations for online shoppers and complementary product placement for offline shoppers in supermarkets. These techniques aim to boost sales and increase revenue.

But how do algorithms determine which items to recommend and which to place together for optimal sales? This is where the Apriori algorithm comes into play. We use this algorithm to find product associations and determine the most effective product recommendations and placement strategies.

Key takeaways from this blog

After going through this blog, you will learn the following things:

  • What is association rule mining?
  • What is Apriori Algorithm?
  • What are the various steps involved in Apriori Algorithm?
  • An example of the Apriori Algorithm.
  • What are the limitations of the Apriori algorithm?

Supermarkets place associated products together so that customers buy multiple items in one trip. These product associations are discovered through association rule mining, and Apriori is one such algorithm.

What is association rule mining?

Association rule mining is a method for discovering relationships among items. In a supermarket, for instance, there is often a pattern of items being purchased together. For example, bachelors might buy beer and chips together, and mosquito-repellent liquid is placed next to its corresponding machine so that customers buy both.

  • Supermarkets use association rule mining to identify these patterns and increase profits. This can be a complex task when there are thousands of items, as there may be hundreds of thousands of relationships to analyze.
  • The mining method is used to simplify the process of identifying the most commonly purchased item combinations. These combinations are called "Frequent Itemsets", and the goal of association rule mining is to find them. Once these itemsets are identified, supermarkets may offer a discount on the combined purchase to lure customers.

Several techniques exist today for finding these association rules, and Apriori, which we will discuss in detail, is one of them.

What is Apriori Algorithm?

Supermarkets contain thousands of products, and they record data for every transaction that happens. The goal is to find the product association rule by looking into the transaction data. Most association rule mining algorithms form all possible combinations/associations among products and then loop through each transaction to find the best association rules. 

For example, if a supermarket has four products, namely A, B, C, and D, then the possible combinations would be: {A, B}, {A, C}, {A, D}, {B, C}, {B, D}, {C, D}, {A, B, C}, {A, B, D}, {A, C, D}, {B, C, D}, {A, B, C, D}.
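To see this blow-up concretely, here is a minimal Python sketch (our own illustration, not part of the original article) that enumerates every itemset of size two or more for a given product list. For n products there are 2^n - n - 1 such combinations, which is why brute-force enumeration fails when n is in the thousands.

from itertools import combinations

products = ["A", "B", "C", "D"]

# Enumerate every itemset of size >= 2; for n products there are
# 2**n - n - 1 of them, so the count grows exponentially with n.
itemsets = [
    set(combo)
    for size in range(2, len(products) + 1)
    for combo in combinations(products, size)
]
print(len(itemsets))  # 11 for four products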


General algorithms consider all these combinations and then try to find their occurrence in every transaction. Or, they loop over all the transactions and then again loop over all possible combinations to count the number of occurrences in every transaction. This becomes highly inefficient when the number of products is in the thousands.

The Apriori algorithm provides an efficient approach for this task. It loops over the same data multiple times and extracts the best-associated products. During its first pass, it simply counts the occurrences of individual products to find the large 1-itemsets. The 1-itemsets for the four products (A, B, C, D) would be: {A}, {B}, {C}, {D}.

If a product occurs in fewer transactions than some threshold, the Apriori algorithm discards it from all further combinations. This pruning is justified by the Apriori property: if an itemset is infrequent, every larger itemset containing it must also be infrequent. Discarding such products early saves a lot of time and makes the algorithm faster. Let's see the step-wise working of this algorithm in detail.

Step-wise working of Apriori Algorithm

Let's create hypothetical data where each transaction from the supermarket is maintained separately.

T1: Bread, Butter
T2: Bread, Cheese
T3: Bread, Cheese, Butter
T4: Milk, Butter
T5: Milk, Biscuits
T6: Potato Chips, Beer
T7: Egg, Flour
T8: Beer, Soft Drinks
T9: Chocolates, Toffees
T10: Chocolates, Waffles
T11: Bread, Butter, Potato Chips
T12: Bread, Butter, Milk

Defining Key Metrics For Apriori Algorithm

There can be thousands of possible associations, so we need metrics to judge which associations are better. The Apriori algorithm calculates three metrics: support, confidence, and lift. If these values for an association exceed pre-defined thresholds, that association can be selected. Let's learn these metrics in detail.

Support

It refers to the popularity of an item, measured by counting the number of transactions containing that item. For example, the support of A tells how often item A was purchased across all transactions. To keep this number easy to interpret, we divide the count by the total number of transactions. For example, suppose bread, butter, and bread + butter are bought in 200, 150, and 100 transactions, respectively, on a day when 1000 transactions happened. Then the support values will be:

Support(A) = Transactions containing item A / Total transactions

Support(Bread) = Transactions containing bread / Total transactions
               = 200 / 1000 = 0.2

Support(Butter) = 150/1000 = 0.15

Support (Bread + Butter ) = 100/1000 = 0.1

Confidence

Confidence refers to the probability that item B will be bought given that the user has purchased item A. It is the conditional probability P(B|A) = P(A and B) / P(A), denoted by confidence(A → B). For example, if A and B are bought together in 100 transactions and A is bought in 200 transactions, then confidence(A → B) would be:

Confidence(A --> B) = Transactions containing both A and B / Transactions containing A

Confidence(Bread --> Butter) = Transactions having both bread and butter / Transactions containing bread
                             = 100/200 = 0.5

Lift

Even after applying thresholds for support and confidence, we can still be left with thousands of association rules, which is a huge number. That's why we need lift(A → B), which measures the strength of an association rule. It is calculated by dividing confidence(A → B) by the support of B, or equivalently, support(A, B) divided by support(A) × support(B). The denominator support(A) × support(B) represents how often A and B would co-occur if they were purchased independently. If the observed co-occurrence is no higher than that, the items appear together merely by chance, and there is no real association rule.

Lift(A --> B) = Confidence(A --> B) / Support(B)
Lift(Bread --> Butter) = Confidence(Bread --> Butter) / Support(Butter)
                       = 0.5/0.15 = 3.33

Lift gives us an idea about which association rules we should consider further. A lift value of 1 means the products are independent of each other, and the rule can be discarded; a lift well above 1 indicates a genuine association.
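As a quick sanity check, here is a minimal Python sketch (our own, not from the original article) that computes all three metrics from the hypothetical bread-and-butter counts used above:

total = 1000    # total transactions in the example
bread = 200     # transactions containing bread
butter = 150    # transactions containing butter
both = 100      # transactions containing both bread and butter

support_bread = bread / total            # 0.2
support_butter = butter / total          # 0.15
support_both = both / total              # 0.1

confidence = support_both / support_bread    # 0.5
lift = confidence / support_butter           # ~3.33

print(support_bread, support_butter, support_both, confidence, round(lift, 2))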

Step 1: Computing Support for individual items

Here, we will calculate the support of every item in our dataset.

        ITEMS                 |           Frequency Count
-------------------------------------------------------------------
       Bread                  |                 5
       Butter                 |                 5
       Milk                   |                 3
       Cheese                 |                 2
       Potato Chips           |                 2
       Beer                   |                 2
       Chocolates             |                 2
       Biscuits               |                 1
       Egg                    |                 1
       Flour                  |                 1
       Soft Drinks            |                 1
       Toffees                |                 1
       Waffles                |                 1
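This counting step is easy to reproduce in code. Below is a minimal sketch (our own, not from the original article) that builds the same frequency table with Python's collections.Counter, using the 12 transactions above:

from collections import Counter

transactions = [
    ("Bread", "Butter"),
    ("Bread", "Cheese"),
    ("Bread", "Cheese", "Butter"),
    ("Milk", "Butter"),
    ("Milk", "Biscuits"),
    ("Potato Chips", "Beer"),
    ("Egg", "Flour"),
    ("Beer", "Soft Drinks"),
    ("Chocolates", "Toffees"),
    ("Chocolates", "Waffles"),
    ("Bread", "Butter", "Potato Chips"),
    ("Bread", "Butter", "Milk"),
]

# Count the number of transactions containing each item
item_counts = Counter(item for t in transactions for item in t)
print(item_counts.most_common())   # Bread: 5, Butter: 5, Milk: 3, ...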

Step 2: Deciding Support Threshold and selecting frequent items

The next step is to decide a threshold value; items with a frequency count below it will be discarded when forming association rules. Let's take the threshold value as 3. The items that are left will be:

Bread 5
Butter 5
Milk 3

All other items, like cheese and eggs, have support values below the threshold and are ignored.

Step 3: Forming itemsets with a combination of products

Only the items not discarded in Step 2 will be considered for forming combinations: Bread, Butter, and Milk. The possible combinations would be:

     
     Item 1        |     Item 2     |    Frequency Count
--------------------------------------------------------------
     Bread         |     Butter     |           4
     Bread         |     Milk       |           1
     Butter        |     Milk       |           2      

Please note that a combination like bread and cheese is not considered, as cheese does not have support above the decided threshold. Also, the only combination here meeting the threshold is (Bread, Butter), which has a frequency count of 4.
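This candidate-pair counting is easy to sketch in code. Continuing the Counter sketch from Step 1 (the transactions list and item_counts defined there are assumed), the following lines generate candidate pairs from the frequent single items and count them:

from itertools import combinations

MIN_COUNT = 3   # our chosen support threshold

# Frequent 1-itemsets: items whose count meets the threshold
frequent_items = sorted(i for i, c in item_counts.items() if c >= MIN_COUNT)

# Candidate 2-itemsets are built from frequent items only
pair_counts = {
    pair: sum(set(pair) <= set(t) for t in transactions)
    for pair in combinations(frequent_items, 2)
}
print(pair_counts)
# {('Bread', 'Butter'): 4, ('Bread', 'Milk'): 1, ('Butter', 'Milk'): 2}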

Step 4: Finding Frequent Combinations of Items for larger sets

Now we will build larger combinations with three items, including Bread and Butter (the only two-item combination we had above the threshold). There are two possible combinations:

  • Bread, Butter, Potato Chips: We will eliminate this candidate, as Potato Chips was eliminated in Step 2.
  • Bread, Butter, Milk: We will eliminate this candidate, as the (Butter, Milk) combination has a support count of 2, below the threshold, and was eliminated in the previous step.

Hence, all the candidate three-item combinations are eliminated here, and the algorithm stops growing itemsets.
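This subset-based pruning is the heart of Apriori. Here is a minimal sketch of the rule (again assuming transactions and pair_counts from the earlier snippets): a candidate k-itemset survives only if all of its (k-1)-item subsets are frequent.

from itertools import combinations

MIN_COUNT = 3

# 2-itemsets that met the threshold
frequent_pairs = {p for p, c in pair_counts.items() if c >= MIN_COUNT}

candidates = [("Bread", "Butter", "Potato Chips"), ("Bread", "Butter", "Milk")]
for cand in candidates:
    subsets = set(combinations(sorted(cand), 2))
    if subsets <= frequent_pairs:
        count = sum(set(cand) <= set(t) for t in transactions)
        print(cand, "kept with count", count)
    else:
        print(cand, "pruned: it contains an infrequent subset")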

Step 5: Finding Other Metrics

We also need to find the other metrics mentioned before: lift and confidence. These additional metrics will allow us to choose suitable association rules. We must rely on more than one metric when finalizing the rules. For example, suppose a particular rule has a higher lift value but lower confidence than another; we will then have to decide which rule to choose based on additional criteria. Rules passing the most metrics will be finalized.

That's enough theory for now; let's see the complete implementation in Python.

Apriori Algorithm Using Python 

Installing the efficient-apriori package using pip

There is a suitable package for using the Apriori algorithm in Python, which can be installed using this command:

pip install efficient-apriori

Data Preparation

The data needs to be prepared before applying the Apriori algorithm. We pass a list containing each transaction as a tuple of items. Let's take the same dataset:

transactions = [
    ("Bread", "Butter"),
    ("Bread", "Cheese"),
    ("Bread", "Cheese", "Butter"),
    ("Milk", "Butter"),
    ("Milk", "Biscuits"),
    ("Potato Chips", "Beer"),
    ("Egg", "Flour"),
    ("Beer", "Soft Drinks"),
    ("Chocolates", "Toffees"),
    ("Chocolates", "Waffles"),
    ("Bread", "Butter", "Potato Chips"),
    ("Bread", "Butter", "Milk"),
]

Setting up threshold values

The next step would be to set up parameters in the algorithm. The two main parameters required are:

  • min_support: This is the support threshold below which an itemset will be discarded.
  • min_confidence: This helps us filter out rules that do not meet a particular confidence. Let's keep it at 0 here to see all the rules.

The Python code for the algorithm is mentioned below:

from efficient_apriori import apriori

# our min support is 3 transactions, expressed as a fraction of all transactions for efficient-apriori
min_support = 3 / len(transactions)   # 3/12 = 0.25
# min_confidence lets us drop rules with low confidence; we set it to 0 for now
min_confidence = 0
itemsets, rules = apriori(transactions, min_support=min_support, min_confidence=min_confidence)

Outputs

The output of the algorithm can be inspected using the following commands:

print(itemsets)

## Output
{1: {('Bread',): 5, ('Butter',): 5, ('Milk',): 3}, 2: {('Bread', 'Butter'): 4}}

Please note that the above output has two keys, 1 and 2, representing the number of items considered at a time while computing support. The results match our expectations: key 1 has only three itemsets (Bread, Butter, and Milk), the same items we selected after keeping a minimum threshold of 3. For key 2, there is only one itemset, (Bread, Butter), which, as we saw before, passes our successive criteria.

Let us inspect the rules and their metrics.

for rule in rules:
  print(rule)
  
## Output
{Butter} -> {Bread} (conf: 0.800, supp: 0.333, lift: 1.920, conv: 2.917)
{Bread} -> {Butter} (conf: 0.800, supp: 0.333, lift: 1.920, conv: 2.917)

Let's understand one rule in plain English.

{Butter} -> {Bread} (conf: 0.800, supp: 0.333, lift: 1.920, conv: 2.917)

It means that the rule (Butter → Bread) has a confidence of 0.8 and a lift of 1.92. Since the lift is well above 1, customers are more likely to buy these items together, so they should be kept together in the supermarket. One can observe an additional conv parameter in the output, representing the conviction of the rule:

Conviction(A --> B) = (1 - Support(B)) / (1 - Confidence(A --> B))

The greater the conviction, the higher the interest in the association rule.
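If you want to work with these rules programmatically, each rule object returned by efficient-apriori exposes the metrics as attributes (lhs, rhs, confidence, support, lift, conviction). A small sketch, assuming the rules produced above and a hypothetical confidence cut-off of 0.7:

# Keep only reasonably confident rules and sort them by lift, strongest first
strong_rules = sorted(
    (r for r in rules if r.confidence >= 0.7),   # 0.7 is an arbitrary cut-off
    key=lambda r: r.lift,
    reverse=True,
)
for r in strong_rules:
    print(r.lhs, "->", r.rhs, "lift:", round(r.lift, 2))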

Advantages and Limitations of the Apriori algorithm

Advantages of the Apriori Algorithm

  • It is the simplest and easiest-to-understand algorithm among association rule mining algorithms.
  • The resulting rules are intuitive and easy to communicate to an end user.
  • As it is unsupervised, it does not require labeled data, making it easy to use in many cases.

Limitations of the Apriori algorithm

  • The main limitation is its time cost: with a low minimum support, many frequent itemsets, or large itemsets, the algorithm generates and holds a huge number of candidate itemsets.
  • A large amount of data needs to be stored in memory for processing, so large transaction databases require far more resources.

Possible Interview Questions on Apriori Algorithm

  • What is the basic principle of the Apriori algorithm?
  • What are the three metrics used in the Apriori algorithm?
  • What is the application of association rule mining?
  • What are the limitations of the Apriori algorithm?

Conclusion

We have discussed a foundational association rule mining algorithm and how to apply it to transaction data in Python. Apriori is the most basic such algorithm, with the advantages and disadvantages discussed above. For more advanced results, we can use other algorithms available under association rule mining, including AIS, SETM, and variations of basic Apriori.

References

  1. Official paper of the Apriori algorithm: "Fast Algorithms for Mining Association Rules" by Rakesh Agrawal and Ramakrishnan Srikant.

Credits: Thanks to Harshit Agarwal for creating the first version of this content

Enjoy Learning! Enjoy Algorithms!
