Today

Summarizing categorical variables

Exploring the relationship between categorical variables

- contingency table, proportions, conditional proportions, marginal proportions

Ch 2, Sec 1-2, pages 15-29

Summarizing Categorical Variables: Blood Pressure (Exercise 2.37*)

A company held a blood pressure screening clinic for its employees. The data below are a partial dataset for company employees. Create an appropriate display for the blood pressure data among the employees.

Blood pressure   Low        Low     Normal   High       High      Low
Age              Under 30   30-49   30-49    Under 30   Over 50   Under 30


Blood Pressure Among Company Employees

Blood pressure among employees:

Blood pressure   Frequency   Relative Frequency
High                   147                 0.31
Low                     95                 0.20
Normal                 232                 0.49
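Each relative frequency in the table is a category count divided by the total number of employees. A minimal Python sketch of that arithmetic, using the counts from the table (the variable names are illustrative and not from the text):

```python
# Counts of employees in each blood pressure category (from the table above).
counts = {"High": 147, "Low": 95, "Normal": 232}

# Total number of employees screened.
total = sum(counts.values())

# Relative frequency = category count / total count.
relative_freq = {level: n / total for level, n in counts.items()}

# Print a small frequency table with relative frequencies to two decimals.
for level, n in counts.items():
    print(f"{level:<8}{n:>5}{relative_freq[level]:>8.2f}")
```

The relative frequencies sum to 1 apart from rounding, which is a quick check that no category was left out.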

Distribution of a variable

• Graph or frequency table describes a distribution

• A distribution tells us the possible values of a variable as well as the occurrence of those values (frequency or relative frequency).

• Distributions are important when exploring the relationship between variables.


Exploring the Relationship Between Two Categorical Variables: Blood Pressure (Exercise 2.37)


A company held a blood pressure screening clinic for its employees. The data below are a partial dataset for company employees. Summarize the results in a table by age group and blood pressure level.

Blood pressure   Low        Low     Normal   High       High      Low
Age              Under 30   30-49   30-49    Under 30   Over 50   Under 30


Summarizing Two Categorical Variables

Blood Pressure and Age among Company Employees

                            Age
Blood Pressure   Under 30   30-49   Over 50   Total
Low                    27      37        31      95
Normal                 48      91        93     232
High                   23      51        73     147
Total                  98     179       197     474

Does there appear to be an association/relationship between age and blood pressure?
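One way to approach this question is to compare the distribution of blood pressure within each age group, that is, the conditional proportions given age. The following is a sketch only, using the counts from the table above; the variable names are assumptions made for illustration.

```python
# Two-way table of counts: rows are blood pressure levels,
# columns are age groups in the order listed in `ages`.
ages = ["Under 30", "30-49", "Over 50"]
table = {
    "Low":    [27, 37, 31],
    "Normal": [48, 91, 93],
    "High":   [23, 51, 73],
}

# Column totals: number of employees in each age group.
age_totals = [sum(row[j] for row in table.values()) for j in range(len(ages))]

# Conditional proportion of each blood pressure level within an age group:
# cell count divided by that age group's total.
conditional = {
    level: [row[j] / age_totals[j] for j in range(len(ages))]
    for level, row in table.items()
}

for level, props in conditional.items():
    print(level, [round(p, 2) for p in props])
```

If age and blood pressure were unrelated, each row of proportions would be roughly constant across the age groups. Here the proportion with high blood pressure is about 0.23 for employees under 30 and about 0.37 for employees over 50.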


Relationship Between Age and Blood Pressure


Analyzing Association Between Categorical Variables


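Is there an association between caffeine consumption and miscarriages in pregnant women?

• 2008 U.S. study of 1063 pregnant women.

• Women were asked to track caffeine consumption during pregnancy.

• Pregnancy outcome was recorded.

Contingency Table

rows: categories of caffeine consumption (mg per day)

columns: miscarriage outcome (Yes, No)

cells: number of women in each combination of caffeine category and outcome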

                              Miscarriage
Caffeine (mg per day)     Yes      No    Total
0                          33     231      264
0 to 200                   97     538      635
200 or more                42     122      164
Total                     172     891     1063

Marginal Proportions and Conditional Proportions


What proportion of women studied had a miscarriage?

Among women who consumed 0 mg, what proportion had a miscarriage?

Among women who had a miscarriage, what proportion consumed 0 mg?

What proportion of women consumed 0 mg and had a miscarriage?

What proportion of women consumed 0 mg?

                              Miscarriage
Caffeine (mg per day)     Yes      No    Total
0                          33     231      264
0 to 200                   97     538      635
200 or more                42     122      164
Total                     172     891     1063
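The five questions differ only in which count goes in the numerator and which total goes in the denominator. A minimal sketch in Python, using the counts from the table (the variable names are hypothetical and chosen for readability):

```python
# Counts from the caffeine/miscarriage contingency table.
# Each entry is (miscarriage Yes, miscarriage No).
table = {
    "0":           (33, 231),
    "0 to 200":    (97, 538),
    "200 or more": (42, 122),
}

n = sum(yes + no for yes, no in table.values())    # all women studied
total_yes = sum(yes for yes, _ in table.values())  # all miscarriages
yes_0, no_0 = table["0"]
total_0 = yes_0 + no_0                             # women who consumed 0 mg

# Marginal proportion: miscarriage among all women.
p_miscarriage = total_yes / n
# Conditional proportion: miscarriage among women who consumed 0 mg.
p_misc_given_0 = yes_0 / total_0
# Conditional proportion: 0 mg among women who had a miscarriage.
p_0_given_misc = yes_0 / total_yes
# Proportion of all women who consumed 0 mg AND had a miscarriage.
p_0_and_misc = yes_0 / n
# Marginal proportion: 0 mg among all women.
p_0 = total_0 / n

for name, value in [
    ("miscarriage", p_miscarriage),
    ("miscarriage given 0 mg", p_misc_given_0),
    ("0 mg given miscarriage", p_0_given_misc),
    ("0 mg and miscarriage", p_0_and_misc),
    ("0 mg", p_0),
]:
    print(f"{name:<24}{value:.3f}")
```

The second and third proportions share the numerator 33 but use different denominators (264 versus 172), which is why "among women who consumed 0 mg" and "among women who had a miscarriage" give different answers.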


Conditional Distributions (Proportions)

Is there an association between caffeine consumption and miscarriages in pregnant women? Usually helpful to focus on conditional proportions (row percentages or column percentages)

What would you expect if no association between caffeine and miscarriage?

                                      Miscarriage
Caffeine (mg per day)   Yes            No             Total
0                       33 (12.5%)     231 (87.5%)    264 (100%)
0 to 200                97 (15.3%)     538 (84.7%)    635 (100%)
200 or more             42 (25.6%)     122 (74.4%)    164 (100%)
Total                   172 (16.2%)    891 (83.8%)    1063 (100%)
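The row percentages can be reproduced and compared with the overall miscarriage percentage. With no association, every caffeine group would be expected to show roughly the overall percentage. The sketch below assumes that reading of "no association"; the names `row_pct` and `expected_yes` are illustrative, not from the text.

```python
# Counts from the caffeine/miscarriage contingency table.
# Each entry is (miscarriage Yes, miscarriage No).
table = {
    "0":           (33, 231),
    "0 to 200":    (97, 538),
    "200 or more": (42, 122),
}

n = sum(yes + no for yes, no in table.values())
total_yes = sum(yes for yes, _ in table.values())

# Overall (marginal) percentage of miscarriages.
overall_pct = 100 * total_yes / n

row_pct = {}       # observed percent of miscarriages within each caffeine group
expected_yes = {}  # miscarriages expected if every group had the overall rate
for group, (yes, no) in table.items():
    row_total = yes + no
    row_pct[group] = 100 * yes / row_total
    expected_yes[group] = row_total * total_yes / n
    print(f"{group:<12} observed {row_pct[group]:5.1f}%   "
          f"overall {overall_pct:5.1f}%   expected Yes {expected_yes[group]:6.1f}")
```

Groups whose observed percentage sits well away from the overall percentage are the ones that point toward an association.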

Comparing Miscarriages Among Caffeine Groups


[Figure: Chart of Miscarriage within each Caffeine Category. Bars show the percent (vertical axis, 0 to 90) of Yes and No miscarriage outcomes within each caffeine level: 0, 0-200, 200 or more. Percent within levels of Caffeine.]


Next Time


Exploring the relationship between categorical variables

- contingency table, proportions, conditional proportions, marginal proportions, Simpson’s paradox

Ch 2, Sec 2, pages 18-29
