CSE5334 Data Mining

CSE4334/5334 Data Mining, Fall 2014
Lecture 15: Association Rule Mining (2)
Department of Computer Science and Engineering, University of Texas at Arlington
Chengkai Li
(Slides courtesy of Vipin Kumar and Jiawei Han)

Pattern Evaluation

Association rule algorithms tend to produce too many rules.
  Many of them are uninteresting or redundant.
  A rule is redundant if, for example, {A,B,C} → {D} and {A,B} → {D} have the same support and confidence.
Interestingness measures can be used to prune or rank the derived patterns.
In the original formulation of association rules, support and confidence are the only measures used.

There are many interestingness measures proposed in the literature.
Some measures are good for certain applications, but not for others.
What criteria should we use to determine whether a measure is good or bad?
What about Apriori-style support-based pruning? How does it affect these measures?

Computing Interestingness Measure

Given a rule X → Y, the information needed to compute rule interestingness can be obtained from a contingency table.

Contingency table for X → Y:

          Y     ¬Y
   X     f11   f10   f1+
  ¬X     f01   f00   f0+
         f+1   f+0   |T|

f11: count of X and Y
f10: count of X and ¬Y
f01: count of ¬X and Y
f00: count of ¬X and ¬Y

The table is used to define various measures: support, confidence, lift, Gini, J-measure, etc.
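As a minimal sketch of how support and confidence follow from the four cell counts, the Python below wraps a 2x2 contingency table in a small class. The class name `Contingency` and its methods are hypothetical names chosen for illustration, not part of the lecture; the counts are the tea/coffee counts from this lecture's running example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """2x2 contingency counts for a rule X -> Y (hypothetical helper)."""
    f11: int  # transactions containing X and Y
    f10: int  # transactions containing X but not Y
    f01: int  # transactions containing Y but not X
    f00: int  # transactions containing neither X nor Y

    @property
    def total(self) -> int:
        # |T|: total number of transactions
        return self.f11 + self.f10 + self.f01 + self.f00

    def support(self) -> float:
        # sup(X -> Y) = P(X, Y) = f11 / |T|
        return self.f11 / self.total

    def confidence(self) -> float:
        # conf(X -> Y) = P(Y | X) = f11 / f1+
        return self.f11 / (self.f11 + self.f10)


# Tea -> Coffee counts from the lecture's running example
table = Contingency(f11=15, f10=5, f01=75, f00=5)
print(table.total, table.support(), table.confidence())
```

Keeping the raw counts rather than probabilities means every other measure in the list can be derived from the same four numbers.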

Drawback of Confidence

           Coffee   ¬Coffee
   Tea       15        5       20
  ¬Tea       75        5       80
             90       10      100

Association rule: Tea → Coffee
Confidence = P(Coffee|Tea) = 15/20 = 0.75, but P(Coffee) = 0.9.
Although the confidence is high, the rule is misleading: P(Coffee|¬Tea) = 75/80 = 0.9375.

Statistical Independence

Population of 1000 students:
  600 students know how to swim (S)
  700 students know how to bike (B)
  420 students know how to swim and bike (S,B)

P(S∧B) = 420/1000 = 0.42
P(S) × P(B) = 0.6 × 0.7 = 0.42

P(S∧B) = P(S) × P(B) => statistical independence
P(S∧B) > P(S) × P(B) => positively correlated
P(S∧B) < P(S) × P(B) => negatively correlated

Statistical-based Measures

Measures that take into account statistical dependence:

Lift and interest factor:

  $\mathrm{Lift} = \frac{\mathrm{conf}(X \rightarrow Y)}{\mathrm{sup}(Y)} = \frac{P(Y \mid X)}{P(Y)}$

  $\mathrm{Interest\_factor} = \frac{P(X,Y)}{P(X)\,P(Y)}$

  = 1: independent
  > 1: positively correlated
  < 1: negatively correlated

φ-coefficient:

  $\phi = \frac{P(X,Y) - P(X)\,P(Y)}{\sqrt{P(X)\,[1 - P(X)]\,P(Y)\,[1 - P(Y)]}}$

  = 0: independent
  > 0: positively correlated
  < 0: negatively correlated

Example: Lift/Interest

           Coffee   ¬Coffee
   Tea       15        5       20
  ¬Tea       75        5       80
             90       10      100

Association rule: Tea → Coffee
Confidence = P(Coffee|Tea) = 0.75, but P(Coffee) = 0.9.
Lift = 0.75/0.9 = 0.8333 (< 1, therefore Tea and Coffee are negatively associated)

Drawback of Lift & Interest

          Y    ¬Y
   X     10     0    10
  ¬X      0    90    90
         10    90   100

  $\mathrm{Lift} = \frac{0.1}{(0.1)(0.1)} = 10$

          Y    ¬Y
   X     90     0    90
  ¬X      0    10    10
         90    10   100

  $\mathrm{Lift} = \frac{0.9}{(0.9)(0.9)} = 1.11$

Statistical independence: if P(X,Y) = P(X)P(Y), then Lift = 1.
In both tables X and Y always occur together, yet the lift of the far more frequent pair is close to 1, the value expected under independence.

Example: φ-Coefficient

Association rule: Tea → Coffee (same contingency table as in the lift example)

  $\phi = \frac{0.15 - 0.9 \times 0.2}{\sqrt{0.9 \times 0.1 \times 0.2 \times 0.8}} = -0.25$

(< 0, therefore Tea and Coffee are negatively correlated)

Drawback of φ-Coefficient

          Y    ¬Y
   X     60    10    70
  ¬X     10    20    30
         70    30   100

  $\phi = \frac{0.6 - 0.7 \times 0.7}{\sqrt{0.7 \times 0.3 \times 0.7 \times 0.3}} = 0.5238$

          Y    ¬Y
   X     20    10    30
  ¬X     10    60    70
         30    70   100

  $\phi = \frac{0.2 - 0.3 \times 0.3}{\sqrt{0.7 \times 0.3 \times 0.7 \times 0.3}} = 0.5238$

The φ-coefficient is the same for both tables.
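The lift and φ-coefficient values in the examples above can be checked with a short Python sketch. The function names `lift` and `phi` and their argument order (f11, f10, f01, f00) are assumptions made for illustration; the counts are the ones from the lecture's tables.

```python
import math


def _probabilities(f11, f10, f01, f00):
    """Return P(X,Y), P(X), P(Y) from 2x2 contingency counts."""
    n = f11 + f10 + f01 + f00
    return f11 / n, (f11 + f10) / n, (f11 + f01) / n


def lift(f11, f10, f01, f00):
    # Lift = P(X,Y) / (P(X) P(Y)); equals 1 under independence
    p_xy, p_x, p_y = _probabilities(f11, f10, f01, f00)
    return p_xy / (p_x * p_y)


def phi(f11, f10, f01, f00):
    # phi = (P(X,Y) - P(X)P(Y)) / sqrt(P(X)[1-P(X)] P(Y)[1-P(Y)])
    p_xy, p_x, p_y = _probabilities(f11, f10, f01, f00)
    return (p_xy - p_x * p_y) / math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))


# Tea -> Coffee example
print(round(lift(15, 5, 75, 5), 4))   # 0.8333
print(round(phi(15, 5, 75, 5), 4))    # -0.25

# Drawback of lift: perfect co-occurrence in both tables, very different lift
print(round(lift(10, 0, 0, 90), 4))   # 10.0
print(round(lift(90, 0, 0, 10), 4))   # 1.1111

# Drawback of phi: the two tables give the same coefficient
print(round(phi(60, 10, 10, 20), 4))  # 0.5238
print(round(phi(20, 10, 10, 60), 4))  # 0.5238
```

Both functions assume that neither X nor Y occurs in all or none of the transactions; otherwise a denominator becomes zero.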
