A graphical analysis for market basket data.

Author: Xu, Yaquan

    Market basket analysis is an important data mining technique in retail for discovering interesting relationships within massive volumes of sales transactions. The technique rests on the idea that customers who buy a certain group of items are more likely to buy another group of items as well. The set of items a consumer purchases from a number of product categories on a single shopping trip is referred to as a market basket [1]. Retailers analyze market baskets to understand customers' purchasing behavior, develop discount and promotion plans, design store layouts, and maximize profit. In this paper, we construct a market graph from market basket data and study the structural properties of the market graph over the transactions. The paper is organized as follows. Section 2 gives a brief literature review. Section 3 discusses the features of our method and studies the structural properties of the market graph. Section 4 summarizes the discussion and findings of the experimental study.


    A common way to analyze market basket data is to search for meaningful association rules based on the support and confidence of itemsets. An itemset is a set of items, such as the items a customer purchases in one basket. The support of an itemset is the fraction of transactions that contain all of its items, and an itemset is called frequent if its support exceeds a given threshold. The confidence of a rule X → Y, where X and Y are itemsets, is the fraction of the transactions containing X that also contain Y [2]. We use Guidici's [3] example to illustrate the concepts of support and confidence. Guidici [3] developed the following two-way contingency table for ice cream and coke in the transactions:

    Table 1. Two-way contingency table for ice cream and coke

                         Coke: Yes    Coke: No       Total
        Ice cream: Yes         170         599         769
        Ice cream: No        4,779      41,179      45,958
        Total                4,949      41,778      46,727

    Support is one of the four joint probabilities in this table (a cell count divided by the total). The support for the rule "If ice cream, then coke" is calculated as

    $$\text{Support}(\text{ice cream} \rightarrow \text{coke}) = \frac{|\text{ice cream} \cap \text{coke}|}{|\text{dataset}|} = \frac{170}{46{,}727} \approx 0.0036$$

    meaning that only 0.36% of the transactions have both ice cream and coke in the same basket. The support of an association rule is symmetric; the support for the rule "if coke, then ice cream" is the same.
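    The support calculation can be checked directly from the cell counts in Table 1. This minimal Python sketch stores the four cells in a dictionary keyed by (ice cream?, coke?); that layout and the variable names are illustrative choices, not taken from the paper. Dividing every cell by the total yields the four joint probabilities, and the support of the rule is the (yes, yes) entry.

```python
# Cell counts from Table 1, keyed by (bought ice cream?, bought coke?).
cells = {
    (True, True): 170,
    (True, False): 599,
    (False, True): 4_779,
    (False, False): 41_179,
}
total = sum(cells.values())  # 46,727 transactions

# Dividing each cell by the total gives the four joint probabilities.
joint = {key: count / total for key, count in cells.items()}

# Support of "if ice cream, then coke" is the (yes, yes) joint probability.
# "If coke, then ice cream" reads the same cell, which is why support is symmetric.
support_ice_to_coke = joint[(True, True)]

print(total)                          # 46727
print(round(support_ice_to_coke, 4))  # 0.0036
```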

    Confidence for a rule X → Y is a conditional probability: the probability of Y given X. The confidences for the rules "If ice cream, then coke" and "If coke, then ice cream" are calculated as follows.

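    $$\text{Confidence}(\text{ice cream} \rightarrow \text{coke}) = \frac{|\text{ice cream} \cap \text{coke}|}{|\text{ice cream}|} = \frac{170}{769} \approx 0.22$$

    $$\text{Confidence}(\text{coke} \rightarrow \text{ice cream}) = \frac{|\text{ice cream} \cap \text{coke}|}{|\text{coke}|} = \frac{170}{4{,}949} \approx 0.034$$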
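    The two confidences can be reproduced from the same counts in Table 1. The sketch below makes the asymmetry explicit: both rules share the numerator 170 (baskets with both items), but each divides by a different marginal total, 769 baskets with ice cream versus 4,949 with coke. Variable names are illustrative.

```python
# Counts from Table 1.
both = 170         # baskets containing ice cream and coke
ice_cream = 769    # row total: baskets containing ice cream
coke = 4_949       # column total: baskets containing coke

# Confidence(X -> Y) = P(Y | X) = count(X and Y) / count(X)
conf_ice_to_coke = both / ice_cream
conf_coke_to_ice = both / coke

print(round(conf_ice_to_coke, 2))  # 0.22
print(round(conf_coke_to_ice, 3))  # 0.034
```

    Unlike support, confidence is not symmetric, so a rule and its reverse generally have to be evaluated separately.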

    The purpose of association rule mining is to find all association rules whose support and confidence are greater than or equal to user-specified support and confidence thresholds, respectively. Strong relationships between the items in transactions are described by rules of the form X → Y, where X and Y are itemsets with no items in common.
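    The definitions of support and confidence, together with the thresholding test for a rule X → Y, can be sketched in Python over raw transactions rather than a contingency table. The function names (support, confidence, is_strong), the toy baskets, and the thresholds 0.3 and 0.6 are hypothetical choices for illustration; they are not data or parameters from the paper.

```python
from typing import FrozenSet, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x: FrozenSet[str], y: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    sx = support(x, transactions)
    return support(x | y, transactions) / sx if sx > 0 else 0.0


def is_strong(x: FrozenSet[str], y: FrozenSet[str], transactions: List[Set[str]],
              min_support: float, min_confidence: float) -> bool:
    """True if rule X -> Y meets both thresholds; X and Y must be disjoint."""
    if x & y:
        raise ValueError("X and Y must have no items in common")
    return (support(x | y, transactions) >= min_support
            and confidence(x, y, transactions) >= min_confidence)


# Hypothetical toy baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]
candidates = [
    (frozenset({"bread"}), frozenset({"milk"})),
    (frozenset({"milk"}), frozenset({"bread"})),
    (frozenset({"butter"}), frozenset({"milk"})),
    (frozenset({"chips"}), frozenset({"beer"})),
]
strong = [(x, y) for x, y in candidates if is_strong(x, y, baskets, 0.3, 0.6)]
for x, y in strong:
    print(", ".join(sorted(x)), "->", ", ".join(sorted(y)))
# bread -> milk
# milk -> bread
```

    Here butter → milk and chips → beer are rejected because each appears in only one of four baskets (support 0.25), below the 0.3 threshold.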

    The problem of mining meaningful association rules can be decomposed into two important steps [4, 5]:

    * Find frequent itemsets for a given support threshold.

    * Construct rules that exceed the confidence threshold from the frequent itemsets.
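    The two steps can be sketched in Python. For step 1 this sketch assumes an Apriori-style level-wise search, which is one common choice; the text above does not say which algorithm [4, 5] use. It relies on the fact that every subset of a frequent itemset is also frequent, so candidates with an infrequent subset can be pruned. The function names, toy baskets, and thresholds are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Step 1: level-wise (Apriori-style) search for itemsets with support >= min_support."""
    n = len(transactions)

    def supp(itemset: Itemset) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    level: Set[Itemset] = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while level:
        # Keep the size-k candidates that meet the support threshold.
        current: Dict[Itemset, float] = {}
        for candidate in level:
            s = supp(candidate)
            if s >= min_support:
                current[candidate] = s
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-candidates; prune any candidate
        # that has an infrequent k-subset.
        next_level: Set[Itemset] = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                next_level.add(union)
        level = next_level
        k += 1
    return frequent


def association_rules(frequent: Dict[Itemset, float],
                      min_confidence: float) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Step 2: split each frequent itemset into X -> Y; keep rules with confidence >= min_confidence."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                x = frozenset(antecedent)
                y = itemset - x
                # X is a subset of a frequent itemset, so its support is already known.
                conf = s / frequent[x]
                if conf >= min_confidence:
                    rules.append((x, y, s, conf))
    return rules


# Hypothetical toy baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]
freq = frequent_itemsets(baskets, min_support=0.5)
rules = association_rules(freq, min_confidence=0.7)
for x, y, s, conf in sorted(rules, key=lambda r: (sorted(r[0]), sorted(r[1]))):
    print(", ".join(sorted(x)), "->", ", ".join(sorted(y)),
          f"support={s:.2f}", f"confidence={conf:.2f}")
# butter -> bread support=0.50 confidence=1.00
# milk -> bread support=0.50 confidence=1.00
```

    The rule bread → milk is found in step 1's output (its itemset is frequent) but is dropped in step 2, since its confidence of about 0.67 is below the 0.7 threshold.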

