# Data Mining Chapter 11 Homework

Database and Data Mining, COS 514

Dr. Chi Shen

Homework No. 8, Chapter 13, Aklilu Shiketa

Q13.3 Cosmetic Purchases

Consider the following Data on Cosmetics Purchases in Binary Matrix Form

a) Select several values in the matrix and explain their meaning.

| Value | Cell | Meaning |
| --- | --- | --- |
| 0 | For example, Row 1, Column 2 | At transaction #1, Bag was not purchased (shows absence of Bag in the transaction). |
| 1 | Row 10, Columns 2 and 3 | “If a Bag is purchased, a Blush is also purchased at that same transaction.” (“If Bag, then Blush.”) While Bag is the antecedent, Blush represents the consequent. |
| 1 | Row 5, Columns 3, 6, 8 | “If Blush and Concealer, then Bronzer.” Item set {Blush, Concealer} = antecedent; {Bronzer} = consequent. |
| 1 | Row 3, Column 1 | If Blush and Concealer, then Eyebrow Pencils. While Eyebrow Pencils is associated with Blush and Concealer, it is unassociated with the rest of the items. |
| 12 | Row 1, Column 1 | Number of transactions. |
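As a rough illustration of how cells in a binary matrix are read, the sketch below uses a small made-up matrix, not the homework data. The column order is an assumption inferred from the cell references above (column 1 = transaction number, column 2 = Bag, column 3 = Blush, column 6 = Concealer, column 8 = Bronzer), and the helper names `cell` and `items_in` are hypothetical.

```python
# Item columns in an assumed order; the transaction number is used as the row key.
ITEMS = ["Bag", "Blush", "Nail Polish", "Brushes",
         "Concealer", "Eyebrow Pencils", "Bronzer"]

# Made-up rows for illustration only: transaction number -> one 0/1 flag per item.
matrix = {
    1: [0, 1, 1, 1, 1, 0, 1],
    2: [0, 0, 1, 0, 1, 0, 1],
    3: [1, 1, 0, 0, 1, 1, 0],
}

def cell(transaction: int, item: str) -> int:
    """Return 1 if the item was purchased in the transaction, else 0."""
    return matrix[transaction][ITEMS.index(item)]

def items_in(transaction: int) -> list[str]:
    """List the items whose flag is 1 in the given transaction."""
    return [item for item, flag in zip(ITEMS, matrix[transaction]) if flag == 1]

print(cell(1, "Bag"))   # 0 means Bag is absent from transaction 1
print(items_in(3))      # items present in transaction 3
```

A single 0 or 1 only records absence or presence of one item in one transaction; rules such as “If Bag, then Blush” come from counting patterns across many rows.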

13.3 b)

Consider the results of the association rules analysis shown below.

i) For the first row, explain the “conf. %” output and how it is calculated.

The confidence of rule #2 is 60.19% (marginally over 60%). Confidence is the rate at which the consequent occurs among the transactions that contain the antecedent. Here the rule reads “If Bronzer and Nail Polish, then Brushes and Concealer,” so the consequent is {Brushes, Concealer}, and the confidence tells us how many times Brushes and Concealer appear in transactions that contain Bronzer and Nail Polish. It is calculated as follows.

Confidence = (number of transactions with antecedent and consequent items) / (number of transactions with antecedent items)

According to the values in the matrix, the number of transactions containing Brushes, Concealer, Bronzer, and Nail Polish, support (a U c), is 62, and the number of transactions containing the antecedent (Bronzer and Nail Polish), support (a), is 103. Therefore:

Confidence = 62/103 = 0.6019417 = 60.19%
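The arithmetic above can be checked with a minimal sketch. The function name `confidence` is hypothetical; the counts 62 and 103 are the ones quoted in the answer.

```python
def confidence(support_a_and_c: int, support_a: int) -> float:
    """Share of antecedent transactions that also contain the consequent."""
    if support_a == 0:
        raise ValueError("the antecedent never occurs, so confidence is undefined")
    return support_a_and_c / support_a

conf = confidence(62, 103)
print(f"{conf:.2%}")  # 60.19%
```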

ii) For the first row, explain the “support (a)”, “support (c)” and “support (a U c)” output and how it is calculated.

The support of an item set is the number (or percentage) of transactions that contain every item in that set. The support of a rule is the support of the antecedent and consequent item sets taken together.

In the case of the matrix:

Support (a) is the number of transactions that contain the antecedent item set {Bronzer, Nail Polish}. Here Support (a) = 103.

Support (c) is the number of transactions that contain the consequent item set. Here {Brushes, Concealer} appears in 77 transactions, so Support (c) = 77.

Support (a U c) is the support of the combined item set, that is, the number of transactions that contain every item in both the antecedent and the consequent: {Bronzer, Nail Polish, Brushes, Concealer}. It is counted directly from the transactions rather than derived from Support (a) = 103 and Support (c) = 77. Here Support (a U c) = 62.
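To make the three support counts concrete, here is a minimal sketch on five made-up transactions, not the homework data. The helper name `support` is hypothetical. It shows that the support of the combined item set is counted on its own and can never exceed either individual support.

```python
def support(transactions: list[set[str]], item_set: set[str]) -> int:
    """Count the transactions that contain every item in item_set."""
    return sum(1 for t in transactions if item_set <= t)

# Made-up transactions for illustration only.
transactions = [
    {"Bronzer", "Nail Polish", "Brushes", "Concealer"},
    {"Bronzer", "Nail Polish"},
    {"Brushes", "Concealer", "Blush"},
    {"Bag", "Blush"},
    {"Bronzer", "Nail Polish", "Concealer", "Brushes", "Blush"},
]

antecedent = {"Bronzer", "Nail Polish"}
consequent = {"Brushes", "Concealer"}

support_a = support(transactions, antecedent)                # 3
support_c = support(transactions, consequent)                # 3
support_ac = support(transactions, antecedent | consequent)  # 2

print(support_a, support_c, support_ac)
```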

iii) For the first row, explain the “Lift Ratio” and how it is calculated.

The lift ratio is another way of judging the strength of an association rule. It shows how effective the rule is at finding the consequent. It is obtained by comparing the confidence of the rule with a benchmark confidence value, which is the confidence we would expect if the consequent were independent of the antecedent:

Benchmark confidence = (number of transactions with consequent item set) / (number of transactions in database)

Lift Ratio = confidence / benchmark confidence

A lift ratio greater than 1 indicates that the antecedent and consequent occur together more often than independence would predict.

NB: If need be, the benchmark confidence can be recovered from the values already available at this stage, since benchmark confidence = confidence / Lift Ratio.
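The lift calculation and the recovery of the benchmark confidence can be sketched as follows. The excerpt does not state the total number of transactions in the database, so `n_transactions = 1000` is only a placeholder assumption and the resulting lift is illustrative, not the value from the output table. The function names are hypothetical; 62, 103, and 77 are the counts quoted in the answer.

```python
def benchmark_confidence(support_c: int, n_transactions: int) -> float:
    """Confidence expected if the consequent were independent of the antecedent."""
    return support_c / n_transactions

def lift_ratio(confidence: float, benchmark: float) -> float:
    """Rule confidence divided by the benchmark confidence."""
    return confidence / benchmark

confidence = 62 / 103    # from the counts in part i)
n_transactions = 1000    # ASSUMED placeholder; not given in the excerpt

benchmark = benchmark_confidence(77, n_transactions)
lift = lift_ratio(confidence, benchmark)

# Going the other way: benchmark confidence recovered from confidence and lift.
recovered_benchmark = confidence / lift

print(f"benchmark={benchmark:.4f}, lift={lift:.4f}, recovered={recovered_benchmark:.4f}")
```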

iv) For the first row, explain the rule that is represented there in words.

Rule #2: If items Bronzer and Nail Polish are purchased, this implies items...
