Market Basket Explained

Problem

Long ago (well, not so long ago), I remember sitting on the couch with my head in my hands, staring at the pile of items my wife and I had bought from the DollarStore.

The items we bought were so neatly stacked and arranged that the pile resembled a model DollarStore squeezed into a 10 ft. x 10 ft. space!

It was just like a sample store, where one item of every product is showcased for display purposes only.

I wondered how we had ended up buying such a long list without a care in the world.

Consider this: a retail outlet has multiple product lines, and each product line has multiple products.

The marketing manager has an interesting problem: deciding which products should be stacked near each other.

If he/she arranges the products intelligently, sales are going to go through the roof.

In a retail outlet, customers typically come in and buy multiple items in a cart: 1, 2, 5 or even 20 items. Some are planned purchases and some are impulsive.

Marketing typically calls this an “Impulsive Purchase” by the customer, but if you think about it more deeply, you can see it as a “forced impulsive purchase”.

We will see how this works.

Data

Let us come to the data used for this marketing gimmick. The store manager has a list of transactions, and each transaction has a list of items.

Imagine two tables.

  1. The transaction table has the transaction id and the number of unique items bought.
  2. The items table has the transaction id and each individual item bought in that transaction.

For example, if a customer purchases 5 different items in a cart, the transaction table will have 1 record with that transaction id and an item count of 5, and the items table will have 5 records, each with that transaction id and one of the 5 different item ids.
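To make the two-table layout concrete, here is a minimal Python sketch, assuming the items table arrives as (transaction_id, item_id) rows. The transaction ids, item names and the helpers `build_baskets` and `build_transaction_table` are hypothetical, chosen only for illustration; the transaction table's item counts are derived by grouping the items table.

```python
from collections import defaultdict

# Hypothetical items table: one row per (transaction_id, item_id) pair.
items_table = [
    ("T1", "napkins"), ("T1", "condoms"), ("T1", "soap"),
    ("T1", "candles"), ("T1", "tape"),
    ("T2", "soap"), ("T2", "candles"),
    ("T3", "napkins"),
]

def build_baskets(rows):
    """Group items-table rows into one set of items per transaction."""
    baskets = defaultdict(set)
    for transaction_id, item_id in rows:
        baskets[transaction_id].add(item_id)
    return dict(baskets)

def build_transaction_table(baskets):
    """Derive the transaction table: transaction id -> no. of unique items."""
    return {tid: len(items) for tid, items in baskets.items()}

baskets = build_baskets(items_table)
transaction_table = build_transaction_table(baskets)
print(transaction_table)  # {'T1': 5, 'T2': 2, 'T3': 1}
```

Keeping each basket as a set makes the later "does this basket contain both items?" question a simple subset test.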

The marketing manager thinks that if he/she could look at the market baskets of all the different customers, he/she could find which items are frequently bought together.

That is quite a wish: multiple pairs of eyes, instant computing power and endless memory to process every customer's basket.

Behold, his/her wish comes true with big data processing and the analytics technique known as association rules, aka Market Basket Analysis (MBA).

The marketing manager is not sure which items should be placed together, but has a strange hypothesis: would placing sanitary napkins and condoms next to each other prove stupid?

Formulation

Association rules find which items are bought together within a transaction.

To provide analytical evidence, each association rule has 3 metrics: support, confidence and lift.

For simplicity, we look at only 2 items: Napkins and Condoms. We need to know whether there is any evidence that buying Napkins leads to buying Condoms.

Napkins => Condoms

Antecedent => Consequent

If so, then he/she can place them next to each other. A weird sight, but it works if there is an association!

For example, the outlet has a monthly count of 1,000 transactions, of which Napkins were purchased in 120 transactions, Condoms in 140 transactions (fewer Napkins than Condoms, which is strange, but the world has moved on) and both together in 40 transactions.

In set-theory terms: Napkins only – 80 (120 − 40), Condoms only – 100 (140 − 40), Napkins and Condoms – 40.

Step 1 – Check whether this combination is really being bought.

That is, out of the universe of transactions, is the number of transactions with Napkins and Condoms together significant or negligible?

Let us say the marketing manager decides that if more than 2% of transactions contain this combination, there is an association.

The probability of finding transactions with this item combination in the universe of transactions is called support.

Support (antecedent (Napkins) and consequent (Condoms)) = # of transactions having both of them / # of total transactions = 40 / 1,000 = 4%

So 4% of all transactions have this combination bought together. Since the support is greater than the 2% threshold, he/she has the necessary support to back the weird hypothesis!
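A minimal Python sketch of the support check, assuming we rebuild 1,000 hypothetical baskets that match the counts above (the placeholder item `other` stands in for baskets with neither Napkins nor Condoms). Support is the share of baskets containing every item of the itemset, so the same illustrative `support` function also recovers the 12% and 14% single-item rates.

```python
def make_example_baskets():
    """Rebuild 1,000 baskets matching the example's monthly counts.

    'other' is a hypothetical placeholder item for baskets with neither
    Napkins nor Condoms.
    """
    baskets = []
    baskets += [{"napkins", "condoms"} for _ in range(40)]  # both together
    baskets += [{"napkins"} for _ in range(80)]             # Napkins only
    baskets += [{"condoms"} for _ in range(100)]            # Condoms only
    baskets += [{"other"} for _ in range(780)]              # neither
    return baskets

def support(itemset, baskets):
    """Share of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

SUPPORT_THRESHOLD = 0.02  # the manager's 2% cut-off

baskets = make_example_baskets()
s = support({"napkins", "condoms"}, baskets)
print(f"support = {s:.0%}, passes threshold: {s > SUPPORT_THRESHOLD}")
# support = 4%, passes threshold: True
print(support({"napkins"}, baskets), support({"condoms"}, baskets))
# 0.12 0.14
```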

Step 2 is to deep-dive into the transactions containing Napkins. Are customers buying the consequent when the antecedent is bought, i.e. are they buying Condoms when Napkins are bought?

The probability of finding this item combination whenever the antecedent is bought is called confidence. Let us say he/she wants this item combination to occur at least 25% of the time when Napkins are bought.

Confidence (antecedent, i.e. Napkins, and consequent, i.e. Condoms) = P(Consequent (Condoms) is bought GIVEN antecedent (Napkins) is bought)

In probability terms, this is written P(consequent | antecedent) = P(antecedent ∩ consequent) / P(antecedent), which here is 40 / 120 = 33%.

So the confidence of 33% is greater than the 25% threshold, and the manager is more confident that Condoms are bought whenever Napkins are bought!
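The confidence step is a single division. The sketch below assumes only the counts from the example; the function name `confidence` is illustrative, not taken from any particular library. It also evaluates the reverse rule to show that confidence depends on which item is the antecedent.

```python
NAPKINS = 120  # transactions containing Napkins (antecedent)
CONDOMS = 140  # transactions containing Condoms (consequent)
BOTH = 40      # transactions containing both

CONFIDENCE_THRESHOLD = 0.25  # the manager's 25% cut-off

def confidence(count_both, count_antecedent):
    """P(consequent | antecedent): share of antecedent baskets that also
    contain the consequent."""
    return count_both / count_antecedent

c = confidence(BOTH, NAPKINS)
print(f"Napkins => Condoms: confidence = {c:.1%}, "
      f"passes threshold: {c > CONFIDENCE_THRESHOLD}")
# Napkins => Condoms: confidence = 33.3%, passes threshold: True

# Confidence is directional: the reverse rule divides by Condoms' count.
print(f"Condoms => Napkins: confidence = {confidence(BOTH, CONDOMS):.1%}")
# Condoms => Napkins: confidence = 28.6%
```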

Step 3 is to find out whether the purchase of Condoms is independent of the Napkins purchase, or whether the Condoms purchase is happening because of the Napkins purchase.

The metric of how much the purchase of the antecedent influences the purchase of the consequent is called lift.

Lift = P(antecedent ∩ consequent) / (P(antecedent) × P(consequent))

Let us find out why lift exists. We want to know which is higher: P(Condoms) or P(Condoms | Napkins).

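If the Condoms purchase is influenced by the Napkins purchase, then P(Condoms | Napkins) > P(Condoms), i.e. P(Condoms | Napkins) / P(Condoms) > 1. Since P(Condoms | Napkins) = P(Napkins ∩ Condoms) / P(Napkins), this ratio is exactly the lift.

So the lift should be greater than 1 to say that the Condoms purchase is indeed influenced by the Napkins purchase rather than being independent of it.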

Here, lift = 0.04 / (0.12 × 0.14) = 2.38, which is greater than 1, so the manager is happy that the Napkins purchase lifts the Condoms purchase by 2.38 times.
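A small Python sketch of the lift calculation, using only the example's counts (the `lift` helper is illustrative). It computes lift both from the formula and as confidence divided by the consequent's baseline rate, P(Condoms | Napkins) / P(Condoms), to show that the two views agree.

```python
TOTAL = 1_000  # monthly transactions
NAPKINS = 120  # transactions containing Napkins
CONDOMS = 140  # transactions containing Condoms
BOTH = 40      # transactions containing both

def lift(count_both, count_antecedent, count_consequent, total):
    """P(A ∩ C) / (P(A) * P(C)); values above 1 mean the items co-occur
    more often than they would if bought independently."""
    p_both = count_both / total
    p_a = count_antecedent / total
    p_c = count_consequent / total
    return p_both / (p_a * p_c)

l = lift(BOTH, NAPKINS, CONDOMS, TOTAL)

# Equivalent view: confidence divided by the consequent's baseline rate,
# i.e. P(Condoms | Napkins) / P(Condoms).
l_alt = (BOTH / NAPKINS) / (CONDOMS / TOTAL)

print(f"lift = {l:.2f}, via confidence = {l_alt:.2f}")
# lift = 2.38, via confidence = 2.38
```

Unlike confidence, lift is symmetric: swapping antecedent and consequent in `lift(...)` gives the same 2.38.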

The marketing manager can justify the weird hypothesis because he/she

  a) Has the support of 4% of transactions for Napkins and Condoms in the same basket
  b) Has 33% confidence that Condoms are bought whenever Napkins are purchased
  c) Knows that Condoms are 2.38 times more likely to be bought when Napkins are in the basket than in an average basket

Now Napkins and Condoms appearing next to each other may look strange, but the manager knows that customers are going to “impulsively” purchase them together.

Only they know that they have tricked the customer into a “forced impulsive” purchase.

This technique can be extended to sets of many items, say an antecedent set {a1, a2, a3…} and a consequent set {c1, c2, c3…}. Instead of starting from a fixed hypothesis such as Napkins and Condoms, we can use machine learning to find the antecedent and consequent pairs by fixing the number of items in the antecedent and consequent sets, say, by trying all combinations of 3 antecedents and 1 consequent.
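The idea of fixing the antecedent and consequent sizes and trying every combination can be sketched as a brute-force search. This is a simplified illustration, not Apriori, the classic association-rule algorithm, which prunes candidate itemsets that fail the support threshold instead of enumerating all of them. The baskets, thresholds and the `mine_rules` helper are hypothetical, and consequents are limited to 1 item.

```python
from itertools import combinations

def mine_rules(baskets, antecedent_size, min_support, min_confidence):
    """Brute-force search for rules {a1, ..., ak} => {c} (hypothetical helper).

    Returns (antecedent, consequent, support, confidence, lift) tuples for
    every rule that clears both thresholds.
    """
    n = len(baskets)

    def count(itemset):
        # Number of baskets containing every item in `itemset`.
        return sum(1 for basket in baskets if itemset <= basket)

    items = sorted(set().union(*baskets))
    rules = []
    for antecedent in combinations(items, antecedent_size):
        a = frozenset(antecedent)
        count_a = count(a)
        if count_a == 0:
            continue
        for consequent in items:
            if consequent in a:
                continue
            count_both = count(a | {consequent})
            sup = count_both / n
            conf = count_both / count_a
            if sup < min_support or conf < min_confidence:
                continue
            # Lift = confidence / P(consequent)
            lift = conf / (count({consequent}) / n)
            rules.append((antecedent, consequent, sup, conf, lift))
    return rules

# Hypothetical baskets echoing the DollarStore story.
baskets = [
    {"soap", "freshener", "candles"},
    {"soap", "candles"},
    {"candles", "wallpaper", "tape"},
    {"wallpaper", "tape", "scissors"},
    {"soap", "freshener"},
    {"tape", "scissors"},
]

rules = mine_rules(baskets, antecedent_size=1, min_support=0.3, min_confidence=0.6)
for antecedent, consequent, sup, conf, lift in rules:
    print(f"{antecedent} => {consequent}: support={sup:.2f}, "
          f"confidence={conf:.2f}, lift={lift:.2f}")
```

Brute force is fine for a handful of items, but the number of candidate antecedents grows combinatorially with `antecedent_size`, which is why real implementations prune by support first.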

Market Basket Analysis can be applied to:

a) Cross-sell products – targeting the customer to sell the next best product

b) Recommendation engine – showing the related product as “customers also bought”

c) Detecting fraud – which related actions occur whenever a fraudulent transaction is committed

d) Articles read together – which articles readers follow up with after reading an article

e) Arrangement of items – which associated items should be placed closer together, which teams can sit next to each other

When I think about the list of items my wife and I bought, I can only rationalize that the store has smartly arranged all the associated items together.

In fact, they would have kept soaps near fresheners, candles near soaps, wallpapers near candles, tapes near wallpapers, scissors near tapes, china bowls near scissors and on and on.

No surprise, then, that an MBA alumnus got fooled by Market Basket Analysis into keeping the soap in a china bowl. Good thing we at least built a DollarStore model in a 10 ft. x 10 ft. square!



Author: Magesh Rajaram
Magesh is a data science professional with close to a decade of experience in the Analytics and Retail domain. He has a master's in management from IIM Calcutta. He has been a self-starter throughout his career, solving problems in ambiguous situations.