DATA MINING - IlmuKomputer.Com


DATA MINING

Association Rule Discovery

AR Definition


aka Affinity Grouping


Common example:


Discovery of which items are frequently sold
together at a supermarket. If this is known,
decisions can be made about:


Arranging items on shelves


Which items should be promoted together


Which items should not simultaneously be discounted

AR Definition -2-

AR Definition -3-


Confidence Factor: the degree to which the rule is true across
individual records


Confidence Factor = the number of transactions supporting the rule
divided by the number of transactions supporting the rule body only


The Confidence Factor in the above example is 70%


Support Factor: the relative occurrence of the detected rules
within the overall data set of transactions


Support Factor = the number of transactions supporting the rule
divided by the total number of transactions


The Support Factor in the above example is thus 13.5%


The minimum thresholds for both factors can be set by users
or domain experts
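As a concrete illustration of the two factors, the sketch below computes them directly from a list of transactions. This is a minimal Python example; the transaction data, item names and helper function are illustrative and not taken from the slides.

```python
# Minimal sketch: support and confidence of the rule "if body then head"
# over a list of transactions (each transaction is a set of items).

def support_and_confidence(transactions, body, head):
    """Return (support, confidence) for the rule body -> head."""
    n_total = len(transactions)
    n_body = sum(1 for t in transactions if body <= t)           # rule body only
    n_rule = sum(1 for t in transactions if (body | head) <= t)  # body and head

    support = n_rule / n_total                 # relative occurrence of the whole rule
    confidence = n_rule / n_body if n_body else 0.0
    return support, confidence

# Hypothetical data, for illustration only
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]
print(support_and_confidence(transactions, {"bread"}, {"butter"}))
# -> (0.5, 0.666...): support 50%, confidence ~67%
```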


AR Usefulness


Some rules are useful:


unknown, unexpected and indicative of some
action to take.


Some rules are trivial:


known by anyone familiar with the business.


Some rules are inexplicable:


seem to have no explanation and do not suggest a
course of action.


AR Example: Co-occurrence Table

Customer   Items
1          orange juice (OJ), cola
2          milk, orange juice, window cleaner
3          orange juice, detergent
4          orange juice, detergent, cola
5          window cleaner, cola
            OJ   Cleaner   Milk   Cola   Detergent
OJ           4      1        1      2        2
Cleaner      1      2        1      1        0
Milk         1      1        1      0        0
Cola         2      1        0      3        1
Detergent    2      0        0      1        2

AR Discovery Process


A co-occurrence cube would show associations in 3D


it is hard to visualise more dimensions than that


Worse, the number of cells in a co-occurrence hypercube grows exponentially with the number of items:


It rapidly becomes impossible to store the required number of cells (see the rough calculation after this list)


Smart algorithms are thus needed for finding frequent large itemsets


We would like to:


Choose the right set of items


Generate rules by deciphering the counts in the co-occurrence matrix (for two-item rules)


Overcome the practical limits imposed by many items in large
numbers of transactions
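As flagged above, a rough back-of-the-envelope calculation shows how quickly exhaustive counting becomes infeasible. The 100,000-item figure is the one used later in these slides for a typical store:

```python
from math import comb

n_items = 100_000                 # distinct items in a large store
print(comb(n_items, 2))           # ~5.0e9 possible item pairs
print(comb(n_items, 3))           # ~1.7e14 possible item triples
# Even at 4 bytes per counter, the triples alone would need several
# hundred terabytes, so exhaustive counting is not practical.
```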


Choosing the Right Item Set


Choosing the right level of detail (the creation of
classes and a taxonomy)


For example, we might look for associations between product categories, rather than at the finest-grain level of product detail, e.g.


“Corn Chips” and “Salsa”, rather than


“Doritos Nacho Cheese Corn Chips (250g)” and “Masterfoods Mild Salsa (300g)”


Important associations can be missed if we look at the
wrong level of detail


Virtual items may be added to take advantage of
information that goes beyond the taxonomy



AR: Rules


Note:




if (nappies and Thursday) then beer


is usually better than (in the sense that it is more
actionable)




if Thursday then nappies and beer


because it has just one item in the result. If a 3-way combination is the most common, then perhaps consider rules with just 1 item in the consequent, e.g.





if (A and B) then C





if (A and C) then B

General rule form: if condition then result
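As a small illustration of restricting the consequent to a single item, the sketch below enumerates the one-item-consequent rules of a frequent itemset. The function name and example itemset are illustrative:

```python
def single_consequent_rules(itemset):
    """All 'if condition then result' rules from one frequent itemset
    in which the result contains exactly one item."""
    s = set(itemset)
    return [(s - {result}, result) for result in s]

print(single_consequent_rules({"A", "B", "C"}))
# -> [({'B', 'C'}, 'A'), ({'A', 'C'}, 'B'), ({'A', 'B'}, 'C')]  (order may vary)
```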

Discovering Large Itemsets


The term “frequent item set S” means “a set S that
appears in at least fraction s of the baskets,” where s
is some chosen constant, typically 0.01 (i.e. 1%).


DM datasets are usually too large to fit in main
memory. When evaluating the running time of AR
discovery algorithms we:


count the number of passes through the data


Since the principal cost is often the time it takes to read data
from disk, the number of times we need to read each datum is
often the best measure of running time of the algorithm.
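The cheapest case, finding the frequent individual items, needs only one sequential read of the data. A minimal sketch, assuming the transactions can be streamed one basket at a time (the 1% default mirrors the typical value of s above):

```python
from collections import Counter

def frequent_items(transactions, s=0.01):
    """One sequential pass: count every item, then keep those appearing
    in at least fraction s of the baskets (the frequent 1-itemsets)."""
    counts, n = Counter(), 0
    for basket in transactions:     # single read of the data
        n += 1
        counts.update(set(basket))
    return {item for item, c in counts.items() if c / n >= s}
```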


Discovering Large Itemsets -2-


There is a key principle, called monotonicity, that underlies the a-priori algorithm and helps us find frequent itemsets [AgS1994]:


If a set of items S is frequent (i.e., appears in at least
fraction s of the baskets), then every subset of S is also
frequent.


To find frequent itemsets, we can:


Proceed level-wise, finding first the frequent items (sets of size 1), then the frequent pairs, the frequent triples, etc.


Level-wise algorithms use one pass per level.


Find all maximal frequent itemsets (i.e., sets S such that no
proper superset of S is frequent) in one (or few) passes
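For the second approach, "maximal" is easy to check once the frequent itemsets are known. A small sketch of that check only (illustrative; it is not the one-or-few-pass discovery algorithm itself):

```python
def maximal_itemsets(frequent_itemsets):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    fs = [frozenset(s) for s in frequent_itemsets]
    return [s for s in fs if not any(s < t for t in fs)]

print(maximal_itemsets([{"A"}, {"B"}, {"A", "B"}]))   # -> [frozenset({'A', 'B'})]
```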


The Apriori Algorithm


The A-priori algorithm proceeds level-wise.


Given support threshold s, in the first pass we find the items that appear in at least fraction s of the baskets. This set is called L1, the frequent 1-itemsets.


(Presumably there is enough main memory to count
occurrences of each item, since a typical store sells no
more than 100,000 different items.)


Pairs of items in L1 become the candidate pairs C2 for the second pass. The pairs in C2 whose support reaches s become L2, the frequent 2-itemsets.


(We hope that the number of pairs in C2 is not so large that there is not enough memory for an integer count per candidate pair.)


The Apriori Algorithm -2-


The candidate triples, C3, are those sets {X, Y, Z} such that all of {X, Y}, {X, Z} and {Y, Z} are in L2. On the third pass, count the occurrences of triples in C3; those whose support reaches s are the frequent triples, L3.


Proceed as far as you like (or until the sets become empty).
Li is the frequent sets of size i; C(i+1) is the set of sets of
size i + 1 such that each subset of size i is in Li.


Pruning uses the Apriori property:


All nonempty subsets of a frequent itemset must also be
frequent.


This helps because it means that the number of sets which must
be considered at each level is much smaller than it otherwise
would be.
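The whole level-wise procedure can be written compactly. A minimal Python sketch (not an optimised implementation), using the five supermarket transactions from the earlier co-occurrence slide as test data; itemsets are represented as frozensets:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining; a minimal sketch.
    transactions: iterable of item sets; min_support: fraction s of baskets."""
    baskets = [frozenset(t) for t in transactions]
    min_count = min_support * len(baskets)

    def keep_frequent(candidates):
        # count each candidate against the baskets (one level corresponds to
        # one pass over the data in a disk-based implementation)
        return {c for c in candidates
                if sum(1 for b in baskets if c <= b) >= min_count}

    # L1: frequent individual items
    L = keep_frequent({frozenset([i]) for b in baskets for i in b})
    all_frequent, k = set(L), 1
    while L:
        # C(k+1): unions of frequent k-itemsets that have size k+1, pruned so
        # that every k-subset is itself frequent (the Apriori property)
        C = {a | b for a in L for b in L if len(a | b) == k + 1}
        C = {c for c in C if all(frozenset(s) in L for s in combinations(c, k))}
        L = keep_frequent(C)
        all_frequent |= L
        k += 1
    return all_frequent

# Example with the five supermarket transactions from the earlier slide:
baskets = [{"OJ", "cola"}, {"milk", "OJ", "cleaner"}, {"OJ", "detergent"},
           {"OJ", "detergent", "cola"}, {"cleaner", "cola"}]
print(apriori(baskets, min_support=0.4))
# frequent at 40% support: OJ, cola, cleaner, detergent, {OJ, cola}, {OJ, detergent}
```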


Generating Association Rules from
Frequent Itemsets


Once the frequent itemsets from transactions in a
database D have been found, it is straightforward
to generate strong association rules from them


Where strong association rules satisfy both minimum
support and minimum confidence


Step 1: For each frequent itemset L, generate all
nonempty subsets of L


Step 2: For each nonempty subset U of L, output the rule


U ⇒ (L - U)


if support_count(L) / support_count(U) ≥ minimum confidence
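A minimal sketch of these two steps in Python. Here support_count is assumed to be a dictionary giving the count of the itemset L and of every nonempty subset of it; the names are illustrative:

```python
from itertools import combinations

def rules_from_itemset(L, support_count, min_conf):
    """Generate strong rules U => (L - U) from one frequent itemset L."""
    L = frozenset(L)
    rules = []
    for r in range(1, len(L)):                          # nonempty proper subsets
        for U in map(frozenset, combinations(L, r)):
            confidence = support_count[L] / support_count[U]
            if confidence >= min_conf:
                rules.append((set(U), set(L - U), confidence))
    return rules
```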

Generating Association Rules from Frequent Itemsets: Example 1

Suppose we have transactional data from a store, and suppose that the data contain the frequent itemset L = {I1, I2, I5}. What are the association rules that can be generated from L?


Generating Association Rules from Frequent Itemsets: Example 2

The nonempty subsets of L are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.


The resulting association rules are thus:

   {I1, I2} ⇒ I5
   {I1, I5} ⇒ I2
   {I2, I5} ⇒ I1
   I1 ⇒ {I2, I5}
   I2 ⇒ {I1, I5}
   I5 ⇒ {I1, I2}

each with confidence = support_count(L) / support_count(antecedent), evaluated against the transaction data

Suppose the minimum confidence threshold is 70%. Hence, only
the second, third and last rules above are output


Since these are the only ones generated that are strong


Limitation of Minimum Support


Discontinuity in ‘interestingness’ function


Feast or famine


minimum support is a crude control mechanism


often results in too few or too many associations


Cannot handle dense data


Cannot prune search space using constraints on the relationship between antecedent and consequent


e.g. confidence


Minimum support may not be relevant


cannot be sufficiently low to capture all valid rules


cannot be sufficiently high to exclude all spurious rules


Roles of Constraint


Select most relevant patterns


patterns that are likely to be interesting


Control the number of patterns that the user
must consider


Make computation feasible


AR: Is the Rule a Useful Predictor?


Confidence Factor is the ratio of the number
of transactions with all the items in the rule to
the number of transactions with just the items
in the condition (rule body). Consider:





if B and C then A


If this rule has a confidence of 0.33, it means
that when B and C occur in a transaction,
there is a 33% chance that A also occurs.


AR: Is the Rule a Useful Predictor? -2-


Consider the following table of probabilities of
items and their combinations:


AR: Is the Rule a Useful Predictor? -3-


Now consider the following rules:

   if A and B then C   (confidence 0.20)
   if A and C then B   (confidence 0.25)
   if B and C then A   (confidence 0.33)


It is tempting to choose “If B and C then A”, because it is the most confident (33%)


but there is a problem


AR: Is the Rule a Useful Predictor? -4-


A measure called lift indicates whether the
rule predicts the result better than just
assuming the result in the first place


AR: Is the Rule a Useful Predictor? -5-


When lift > 1, the rule is better at predicting the
result than random chance


The lift measure is based on whether or not the
probability P(condition& result) is higher than it
would be if condition and result were statistically
independent


If there is no statistical dependence between condition and result, lift = 1


Because in this case:


P(condition & result) = P(condition) × P(result)
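Putting this together, lift can be computed either from the joint probability or directly from the rule's confidence. A minimal sketch; the example uses the probabilities implied by the numbers on the following slides (P(B & C) = 0.15 and P(A) = 0.45 are derived, not stated directly), so treat them as illustrative:

```python
def lift(p_condition_and_result, p_condition, p_result):
    """lift = P(condition & result) / (P(condition) * P(result))
            = confidence / P(result)"""
    confidence = p_condition_and_result / p_condition
    return confidence / p_result

# Rule "if B and C then A": P(A & B & C) = 0.05, P(B & C) = 0.15, P(A) = 0.45
print(round(lift(0.05, 0.15, 0.45), 2))   # 0.74
```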


AR: Is the Rule a Useful Predictor? -6-


Consider the lift for our rules:


Rule                 support   confidence   lift
if A and B then C      0.05       0.20      0.50
if A and C then B      0.05       0.25      0.59
if B and C then A      0.05       0.33      0.74
if A then B            0.25       0.59      1.31


None of the rules with three items shows any lift; the best rule in the data actually has only two items: “if A then B”. A predicts the occurrence of B 1.31 times better than chance.


AR: Is the Rule a Useful Predictor? -7-


When lift < 1, negating the result produces a better rule. For example


if B and C then not A


has a confidence of 0.67 and thus a lift of 0.67 / 0.55 = 1.22


Negated rules may not be as useful as the
original association rules when it comes to
acting on the results
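A quick check of that arithmetic, using the probabilities implied by the earlier numbers (P(B & C) = 0.15, P(A & B & C) = 0.05, P(A) = 0.45); the values are illustrative:

```python
p_bc, p_abc, p_a = 0.15, 0.05, 0.45

conf_negated = (p_bc - p_abc) / p_bc     # P(not A | B and C) = 2/3, i.e. ~0.67
lift_negated = conf_negated / (1 - p_a)  # ~1.21 (1.22 when 0.67 is rounded first)
print(round(conf_negated, 2), round(lift_negated, 2))
```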