# DATA MINING - IlmuKomputer.Com

Data Management

20 Nov 2013

DATA MINING

Association Rule Discovery

AR Definition

aka Affinity Grouping

Common example:

Discovery of which items are frequently sold
together at a supermarket. If this is known,
decisions can be made about:

Arranging items on shelves

Which items should be promoted together

Which items should not simultaneously be discounted

AR Definition - 2 -

AR Definition - 3 -

Confidence Factor: the degree to which the rule is true across
individual records

Confidence Factor = the number of transactions supporting the rule
divided by the number of transactions supporting the rule body only

The Confidence Factor in the above example is 70%

Support Factor: the relative occurrence of the detected rules
within the overall data set of transactions

Support Factor = the number of transactions supporting the rule
divided by the total number of transactions

The Support Factor in the above example is thus 13.5%

The minimum thresholds for both factors can be set by users
or domain experts
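These two definitions translate directly into code. A minimal sketch in Python; the basket data below is hypothetical, purely for illustration:

```python
def support_factor(transactions, items):
    # number of transactions supporting the rule / total number of transactions
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence_factor(transactions, body, head):
    # transactions supporting the whole rule / transactions supporting the body only
    body_count = sum(1 for t in transactions if body <= t)
    rule_count = sum(1 for t in transactions if (body | head) <= t)
    return rule_count / body_count

# Hypothetical baskets (not from the slides)
baskets = [{"bread", "butter"}, {"bread", "milk"},
           {"bread", "butter", "milk"}, {"milk"}]

# rule: if bread then butter
print(support_factor(baskets, {"bread", "butter"}))       # 0.5
print(confidence_factor(baskets, {"bread"}, {"butter"}))  # ~0.667
```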


AR Usefulness

Some rules are useful:

unknown, unexpected and indicative of some
action to take.

Some rules are trivial:

known by anyone familiar with the business.

Some rules are inexplicable:

seem to have no explanation and do not suggest a
course of action.


AR Example: Co-occurrence Table

| Customer | Items |
|---|---|
| 1 | orange juice (OJ), cola |
| 2 | milk, orange juice, window cleaner |
| 3 | orange juice, detergent |
| 4 | orange juice, detergent, cola |
| 5 | window cleaner, cola |

|  | OJ | Cleaner | Milk | Cola | Detergent |
|---|---|---|---|---|---|
| OJ | 4 | 1 | 1 | 2 | 2 |
| Cleaner | 1 | 2 | 1 | 1 | 0 |
| Milk | 1 | 1 | 1 | 0 | 0 |
| Cola | 2 | 1 | 0 | 3 | 1 |
| Detergent | 2 | 0 | 0 | 1 | 2 |
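The co-occurrence matrix can be reproduced mechanically from the five transactions; a small Python sketch, with item names abbreviated as in the table:

```python
transactions = [
    {"OJ", "cola"},
    {"milk", "OJ", "cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "cola"},
    {"cleaner", "cola"},
]
items = ["OJ", "cleaner", "milk", "cola", "detergent"]

# cooc[a][b] = number of transactions containing both a and b;
# the diagonal is each item's own transaction count
cooc = {a: {b: sum(1 for t in transactions if a in t and b in t)
            for b in items}
        for a in items}

print(cooc["OJ"]["cola"])            # 2, as in the table
print(cooc["cleaner"]["detergent"])  # 0
```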


AR Discovery Process

A co-occurrence cube would show associations in 3D

it is hard to visualise more dimensions than that

Worse, the number of cells in a co-occurrence hypercube grows exponentially with the number of items:

It rapidly becomes impossible to store the required number of cells

Smart algorithms are thus needed for finding frequent large itemsets

We would like to:

Choose the right set of items

Generate rules by deciphering the counts in the co-occurrence matrix (for two-item rules)

Overcome the practical limits imposed by many items in large numbers of transactions
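To see why storage blows up, count the cells needed per level for the 100,000-item store figure used later in these notes; a quick back-of-the-envelope in Python:

```python
from math import comb

n = 100_000            # distinct items in a typical large store
pairs = comb(n, 2)     # cells for 2-item co-occurrence counts
triples = comb(n, 3)   # cells for 3-item co-occurrence counts

print(f"{pairs:,}")    # 4,999,950,000
print(f"{triples:,}")
# Even with 4-byte counters the pair matrix alone needs roughly 20 GB,
# and the triple level is far beyond any main memory
```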

Choosing the Right Item Set

Choosing the right level of detail (the creation of
classes and a taxonomy)

For example, we might look for associations between product categories, rather than at the finest-grain level of product detail, e.g.

“Corn Chips” and “Salsa”, rather than

“Doritos Nacho Cheese Corn Chips (250g)” and “Masterfoods Mild Salsa (300g)”

Important associations can be missed if we look at the
wrong level of detail

Virtual items may be added to take advantage of
information that goes beyond the taxonomy


AR: Rules

Note:

if (nappies and Thursday) then beer

is usually better than (in the sense that it is more actionable)

if Thursday then (nappies and beer)

because it has just one item in the result. If a 3-way combination is the most common, then perhaps consider rules with just one item in the consequent, e.g.

if (A and B) then C

if (A and C) then B

Rules have the general form: if condition then result

Discovering Large Itemsets

The term “frequent item set S” means “a set S that
appears in at least fraction s of the baskets,” where s
is some chosen constant, typically 0.01 (i.e. 1%).

DM datasets are usually too large to fit in main
memory. When evaluating the running time of AR
discovery algorithms we:

count the number of passes through the data

Since the principal cost is often the time it takes to read data
from disk, the number of times we need to read each datum is
often the best measure of running time of the algorithm.


Discovering Large Itemsets - 2 -

There is a key principle, called monotonicity or the a-priori property, that helps us find frequent itemsets [AgS1994]:

If a set of items S is frequent (i.e., appears in at least fraction s of the baskets), then every subset of S is also frequent.

To find frequent itemsets, we can:

Proceed level-wise, finding first the frequent items (sets of size 1), then the frequent pairs, the frequent triples, etc.

Level-wise algorithms use one pass per level.

Find all maximal frequent itemsets (i.e., sets S such that no proper superset of S is frequent) in one (or few) passes


The A-priori algorithm proceeds level-wise.

Given support threshold s, in the first pass we find the items that appear in at least fraction s of the baskets. This set is called L1, the frequent 1-itemsets

(Presumably there is enough main memory to count occurrences of each item, since a typical store sells no more than 100,000 different items.)

Pairs of items in L1 become the candidate pairs C2 for the second pass. The pairs in C2 whose count reaches s become L2, the frequent 2-itemsets.

(We hope that the number of C2 is not so large that there is not enough memory for an integer count per candidate pair)


The Apriori Algorithm - 2 -

The candidate triples, C3 are those sets {X, Y, Z} such that all
of {X, Y}, {X, Z} and {Y, Z} are in L2. On the third pass, count
the occurrences of triples in C3; those with a count of at
least s are the frequent triples, L3.

Proceed as far as you like (or until the sets become empty).
Li is the frequent sets of size i; C(i+1) is the set of sets of
size i + 1 such that each subset of size i is in Li.

The pruning using the Apriori property:

All nonempty subsets of a frequent itemset must also be
frequent.

This helps because it means that the number of sets which must
be considered at each level is much smaller than it otherwise
would be.
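The level-wise procedure of the last two slides can be sketched as follows. This is a simplified in-memory version (real implementations stream the data once per level), run on a hypothetical mini data set:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining with Apriori pruning:
    a k-candidate survives only if all its (k-1)-subsets are frequent."""
    n = len(transactions)

    def freq(c):
        return sum(1 for t in transactions if c <= t) / n >= min_support

    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items if freq(frozenset([i]))]  # L1
    frequent = list(current)
    k = 2
    while current:
        prev = set(current)
        # candidates: unions of frequent (k-1)-sets that have size k ...
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # ... pruned by the Apriori property
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = [c for c in candidates if freq(c)]  # Lk
        frequent += current
        k += 1
    return frequent

baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(sorted(map(sorted, apriori(baskets, 0.6))))
# [['a'], ['a', 'b'], ['a', 'c'], ['b'], ['b', 'c'], ['c']]
```

Here {a, b, c} appears in only 2 of 5 baskets (0.4 < 0.6), so the level-3 pass finds nothing and the algorithm stops.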


Generating Association Rules from
Frequent Itemsets

Once the frequent itemsets from transactions in a database D have been found, it is straightforward to generate strong association rules from them

Where strong association rules satisfy both minimum
support and minimum confidence

Step 1: For each frequent itemset L, generate all
nonempty subsets of L

Step 2: For each nonempty subset U of L, output the rule U ⇒ (L - U) if support_count(L) / support_count(U) ≥ min_conf

Generating Association Rules from
Frequent Itemsets

Example 1

Suppose we have the following transactional data from a store:

Suppose that the data contain the frequent
itemset L = {I1, I2, I5}. What are the association
rules that can be generated from L?


Generating Association Rules from
Frequent Itemsets

Example 2

The nonempty subsets of L are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.

The resulting association rules are thus:

Suppose the minimum confidence threshold is 70%. Hence, only the second, third and last rules above are output, since these are the only ones generated that are strong


Limitation of Minimum Support

Discontinuity in ‘interestingness’ function

Feast or famine

minimum support is a crude control mechanism

often results in too few or too many associations

Cannot handle dense data

Cannot prune search space using constraints on the relationship between antecedent and consequent, e.g. confidence

Minimum support may not be relevant

cannot be sufficiently low to capture all valid rules

cannot be sufficiently high to exclude all spurious rules


Roles of Constraint

Select most relevant patterns

patterns that are likely to be interesting

Control the number of patterns that the user
must consider

Make computation feasible


AR: Is the Rule a Useful Predictor?

Confidence Factor is the ratio of the number
of transactions with all the items in the rule to
the number of transactions with just the items
in the condition (rule body). Consider:

if B and C then A

If this rule has a confidence of 0.33, it means
that when B and C occur in a transaction,
there is a 33% chance that A also occurs.


AR: Is the Rule a Useful Predictor? - 2 -

Consider the following table of probabilities of
items and their combinations:


AR: Is the Rule a Useful Predictor? - 3 -

Now consider the following rules:

It is tempting to choose “If B and C then A”, because it is the most confident (33%), but there is a problem


AR: Is the Rule a Useful Predictor? - 4 -


A measure called lift indicates whether the
rule predicts the result better than just
assuming the result in the first place

AR: Is the Rule a Useful Predictor? - 5 -

When lift > 1, the rule is better at predicting the
result than random chance

The lift measure is based on whether or not the probability P(condition & result) is higher than it would be if condition and result were statistically independent:

lift = P(condition & result) / (P(condition) × P(result))

If there is no statistical dependence between condition and result, lift = 1.

Because in this case:

P(condition & result) = P(condition) × P(result)
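Since confidence = P(condition & result) / P(condition), the P(condition) terms cancel and lift reduces to confidence divided by P(result). A one-line sketch, assuming P(A) = 0.45 (consistent with the P(not A) = 0.55 used at the end of this section):

```python
def lift(confidence, p_result):
    # lift = P(condition & result) / (P(condition) * P(result))
    #      = confidence / P(result)
    return confidence / p_result

# "if B and C then A": confidence 1/3, P(A) assumed to be 0.45
print(round(lift(1 / 3, 0.45), 2))  # 0.74
```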


AR: Is the Rule a Useful Predictor? - 6 -

Consider the lift for our rules:

| Rule | support | confidence | lift |
|---|---|---|---|
| if A and B then C | 0.05 | 0.20 | 0.50 |
| if A and C then B | 0.05 | 0.25 | 0.59 |
| if B and C then A | 0.05 | 0.33 | 0.74 |
| if A then B | 0.25 | 0.59 | 1.31 |

None of the rules with three items shows any lift; the best rule in the data actually has only two items: “if A then B”. A predicts the occurrence of B 1.31 times better than chance.

AR: Is the Rule a Useful Predictor? - 7 -

When lift < 1, negating the result produces a better rule. For example

if (B and C) then not A

has a confidence of 0.67 and thus a lift of 0.67/0.55 = 1.22

Negated rules may not be as useful as the
original association rules when it comes to
acting on the results