Professor Anita Wasilewska Lecture Notes

The Apriori Algorithm: Basics

The Apriori Algorithm is an influential algorithm for mining frequent itemsets for Boolean association rules.

Key Concepts:

• Frequent Itemsets: the sets of items that have minimum support (denoted by Li for the frequent i-itemsets).
• Apriori Property: any subset of a frequent itemset must be frequent.
• Join Operation: to find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself.
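The Apriori Property holds because every transaction that contains an itemset also contains each of its subsets, so a subset's support count can never be lower than the itemset's. The minimal Python sketch below checks this on a hypothetical toy dataset (not from the lecture). It assumes transactions are Python sets, and the names `support_count` and `apriori_property_holds` are illustrative only.

```python
from itertools import combinations

# Hypothetical toy transactions (not from the lecture), one set per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def apriori_property_holds(transactions):
    """True if no k-itemset is more frequent than any of its (k-1)-subsets."""
    items = sorted(set().union(*transactions))
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            sup = support_count(itemset, transactions)
            # Each (k-1)-subset must occur in at least as many transactions.
            for subset in combinations(itemset, k - 1):
                if support_count(subset, transactions) < sup:
                    return False
    return True


print(support_count({"bread", "milk"}, transactions))  # 2
print(support_count({"bread"}, transactions))          # 3
print(apriori_property_holds(transactions))            # True
```

Checking only the (k-1)-subsets is enough: applied at every size k, it covers all smaller subsets by induction.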

The Apriori Algorithm in a Nutshell

• Find the frequent itemsets: the sets of items that have minimum support
  – A subset of a frequent itemset must also be a frequent itemset
    • i.e., if {A, B} is a frequent itemset, both {A} and {B} must also be frequent itemsets
  – Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
• Use the frequent itemsets to generate association rules.

The Apriori Algorithm: Pseudo-code

• Join Step: Ck is generated by joining Lk-1 with itself.
• Prune Step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, so any candidate in Ck that has an infrequent (k-1)-subset is removed.

• Pseudo-code:

    Ck: candidate itemset of size k
    Lk: frequent itemset of size k

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1 that are contained in t
        Lk+1 = candidates in Ck+1 with min_support
    end
    return ∪k Lk;
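The pseudo-code above can be sketched in Python roughly as follows. This is a minimal illustration, not the lecture's own implementation. It assumes itemsets are sorted tuples, that `min_support_count` is an absolute count rather than a percentage, and that items are sortable. `apriori_gen` is a hypothetical helper name for the join and prune steps.

```python
from collections import Counter
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Candidate k-itemsets from frequent (k-1)-itemsets (sorted tuples)."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            # Join step: merge two (k-1)-itemsets that share their first k-2 items.
            if a[:k - 2] != b[:k - 2]:
                continue
            candidate = a + (b[-1],)  # stays sorted because a < b
            # Prune step: drop the candidate if any (k-1)-subset is not frequent.
            if all(sub in prev_set for sub in combinations(candidate, k - 1)):
                candidates.add(candidate)
    return candidates


def apriori(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [set(t) for t in transactions]
    # L1: frequent single items, stored as 1-tuples.
    item_counts = Counter(item for t in transactions for item in t)
    current = {(item,): n for item, n in item_counts.items()
               if n >= min_support_count}
    frequent = dict(current)
    k = 2
    while current:  # stop once L(k-1) is empty
        candidates = apriori_gen(current, k)
        # One scan of the database: count the candidates contained in each t.
        counts = Counter()
        for t in transactions:
            for cand in candidates:
                if t.issuperset(cand):
                    counts[cand] += 1
        # Lk: candidates that reach the minimum support count.
        current = {c: n for c, n in counts.items() if n >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent


# Usage on the nine-transaction database D from the example below.
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
for itemset, count in sorted(apriori(D, 2).items()):
    print(itemset, count)
```

Keeping each itemset as a sorted tuple makes the join test a simple prefix comparison and lets the (k-1)-subsets produced by `combinations` be looked up directly in the previous level.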

The Apriori Algorithm: Example

TID     List of Items
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3


• Consider a database, D, consisting of 9 transactions.
• Suppose the minimum support count required is 2 (i.e., min_sup = 2/9 ≈ 22%).
• Let the minimum confidence required be 70%.
• We first have to find the frequent itemsets using the Apriori algorithm.
• Then, association rules will be generated using min. support and min. confidence.
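As a quick check of the thresholds, here is one way to encode D in Python, assuming one set per transaction, and compute the relative minimum support. The `confidence` helper uses the standard definition confidence(A ⇒ B) = support_count(A ∪ B) / support_count(A). All names are illustrative.

```python
# The nine transactions of D, one set per transaction (T100 ... T900 in order).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUP_COUNT = 2  # minimum support count
MIN_CONF = 0.70    # minimum confidence


def support_count(itemset):
    """Number of transactions in D that contain every item of `itemset`."""
    return sum(1 for t in D if set(itemset) <= t)


def confidence(antecedent, consequent):
    """confidence(A => B) = support_count(A ∪ B) / support_count(A)."""
    both = set(antecedent) | set(consequent)
    return support_count(both) / support_count(antecedent)


min_sup = MIN_SUP_COUNT / len(D)
print(f"{min_sup:.1%}")                              # 22.2%
print(confidence({"I1", "I5"}, {"I2"}) >= MIN_CONF)  # True  (2/2 = 1.0)
print(confidence({"I1", "I2"}, {"I5"}) >= MIN_CONF)  # False (2/4 = 0.5)
```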

Step 1: Generating 1-itemset Frequent Pattern

Scan D for count of each candidate:

C1
Itemset   Sup. Count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

Compare candidate support count with minimum support count:

L1
Itemset   Sup. Count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

• The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets satisfying minimum support.
• In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1.
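Step 1 amounts to a single scan of D with a counter. A minimal sketch, assuming the nine transactions are stored as Python sets (variable names are illustrative):

```python
from collections import Counter

D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUP_COUNT = 2

# C1: every item that occurs in D is a candidate 1-itemset; one scan counts them.
C1 = Counter(item for t in D for item in t)
# L1: candidates whose support count meets the minimum support count.
L1 = {item: n for item, n in sorted(C1.items()) if n >= MIN_SUP_COUNT}
print(L1)  # {'I1': 6, 'I2': 7, 'I3': 6, 'I4': 2, 'I5': 2}
```

In this example every candidate reaches the minimum support count of 2, so L1 equals C1.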

Step 2: Generating 2-itemset Frequent Pattern

Generate C2 candidates from L1:

C2 = {I1, I2}, {I1, I3}, {I1, I4}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}, {I3, I4}, {I3, I5}, {I4, I5}

Scan D for count of each candidate:

C2
Itemset    Sup. Count
{I1, I2}   4
{I1, I3}   4
{I1, I4}   1
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2
{I3, I4}   0
{I3, I5}   1
{I4, I5}   0

Compare candidate support count with minimum support count:

L2
Itemset    Sup. Count
{I1, I2}   4
{I1, I3}   4
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2


• To discover the set of frequent 2-itemsets, L2, the algorithm uses L1 Join L1 to generate a candidate set of 2-itemsets, C2.
• Next, the transactions in D are scanned and the support count for each candidate itemset in C2 is accumulated (as shown in the middle table).
• The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support.
• Note: We haven’t used the Apriori Property yet.
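A sketch of Step 2, assuming transactions are Python sets and itemsets are sorted tuples (names are illustrative). Every 1-subset of a pair drawn from L1 is already frequent, so L1 Join L1 is simply every pair of frequent items and the prune step removes nothing at this stage.

```python
from itertools import combinations

D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUP_COUNT = 2
L1 = ["I1", "I2", "I3", "I4", "I5"]  # frequent 1-itemsets from Step 1

# C2 = L1 Join L1: every pair of frequent items (10 candidates).
C2 = list(combinations(L1, 2))
# Support count of each candidate: the transactions of D that contain it.
counts = {c: sum(1 for t in D if set(c) <= t) for c in C2}
# L2: candidates whose support count meets the minimum support count.
L2 = {c: n for c, n in counts.items() if n >= MIN_SUP_COUNT}
print(len(C2))  # 10
print(L2)       # six frequent 2-itemsets with counts 4, 4, 2, 4, 2, 2
```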

Step 3: Generating 3-itemset Frequent Pattern

C3 = {I1, I2, I3}, {I1, I2, I5}

Scan D for count of each candidate:

C3
Itemset        Sup. Count
{I1, I2, I3}   2
{I1, I2, I5}   2

Compare candidate support count with min support count:

L3
Itemset        Sup. Count
{I1, I2, I3}   2
{I1, I2, I5}   2
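The Step 3 candidate generation can be reproduced with a short sketch, again assuming itemsets are sorted tuples and transactions are Python sets (names are illustrative). Joining L2 with itself yields six 3-itemsets. Pruning removes the four that contain an infrequent 2-subset ({I3, I4}, {I3, I5} or {I4, I5}), leaving only {I1, I2, I3} and {I1, I2, I5} to be counted in D.

```python
from itertools import combinations

D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUP_COUNT = 2
# Frequent 2-itemsets from Step 2, as sorted tuples.
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]

# Join: merge two 2-itemsets that share their first item.
joined = sorted({a + (b[1],) for a in L2 for b in L2
                 if a[0] == b[0] and a[1] < b[1]})
# Prune: keep a candidate only if every 2-subset is in L2 (Apriori Property).
L2_set = set(L2)
C3 = [c for c in joined if all(s in L2_set for s in combinations(c, 2))]
# Scan D to count each remaining candidate, then keep the frequent ones.
counts = {c: sum(1 for t in D if set(c) <= t) for c in C3}
L3 = {c: n for c, n in counts.items() if n >= MIN_SUP_COUNT}
print(joined)  # six joined 3-itemsets before pruning
print(C3)      # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
print(L3)      # {('I1', 'I2', 'I3'): 2, ('I1', 'I2', 'I5'): 2}
```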

• The generation of the set of candidate 3-itemsets, C3, involves use of the Apriori Property.
• In order to find C3, we compute L2 Join L2.
• C3 = L2 Join L2 = {{I1, I2, I3}, {I1, I2, I5},...