Unit V - Homework Activities

Topic 23

Activity 23-4: Students' Credit (cont.)

(a)-(c) Answers will vary.

(d) Should agree

(e) Observational study

 

Activity 23-5: Age and Political Ideology (cont.)

(a) Let q <30 = proportion of all people under 30 who consider themselves liberal, and let q >50 = proportion of all people over 50 who consider themselves liberal.

Sample proportions: p-hat <30 = 83/296 = .2804, p-hat >50 = 88/586 = .1502

H0: q <30 = q >50 (no differences in the two age groups)

Ha: q <30 ≠ q >50 (proportion of liberals differs in the two age groups)

(The sample sizes are large enough for the following procedure to be valid. We're assuming the samples were chosen independently.) c = (83+88)/(296+586) = .1939

z = (.2804 - .1502)/sqrt(.1939(1-.1939)(1/296+1/586)) = 4.62

p-value = 2 Pr(Z>|4.62|) = essentially zero.

This small p-value provides strong evidence that the proportion of liberals differs in the two age groups.

(b) 95% c.i. for q <30 - q >50: .2804 - .1502 ± 1.96 sqrt(.2804(1-.2804)/296 + .1502(1-.1502)/586) = .1302 ± .0588 = (.0714, .1890)

(c) We have evidence of a significant difference in the proportion who consider themselves liberal in the two age groups. In fact, we are 95% confident that q <30 - q >50 is between .07 and .19, meaning the percentage who consider themselves liberal is between 7 and 19 percentage points higher in the under-30 group.
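The calculations in (a) and (b) can be checked with a short script. This is only a sketch using Python's standard library; the function names `two_prop_z` and `diff_ci` are illustrative choices, not something taken from the text. It uses the combined (pooled) proportion for the test statistic and the separate sample proportions for the interval, as the hand calculations do.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # combined sample proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def diff_ci(x1, n1, x2, n2, level=0.95):
    """Confidence interval for the difference in two proportions (unpooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - margin, p1 - p2 + margin

# Counts from part (a): 83 of 296 under 30, 88 of 586 over 50.
z, p = two_prop_z(83, 296, 88, 586)
low, high = diff_ci(83, 296, 88, 586)
print(f"z = {z:.2f}, two-sided p-value = {p:.2g}")
print(f"95% c.i.: ({low:.4f}, {high:.4f})")
```

Small differences in the last digit are possible because the script does not round the intermediate proportions.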

 

Activity 23-6: BAP Study

(a) This is an observational study since the experimenter did not assign the patients to the BAP group and the control group. It is a case-control study as we examine those with and without the disease and then look back into their histories.

(b) (The sample sizes are large enough for the following to be valid, and we are assuming independently selected samples, e.g. male and non-male)
 
Characteristic   p-hat 1   p-hat 2   c        z         p-value (2-sided)
Male             .875      .8936     .8873    -.332     .74
White            .8958     .9468     .9296    -1.123    .2614
Non-Hispanic     .7919     .7979     .7958    -.087     .9309
AIDS             .5000     .4681     .4789    .360      .7188
Own Cat          .6667     .3936     .4859    3.080     .0021
Cat Scratch      .6250     .3085     .4155    3.620     .0003
Cat Bite         .4375     .1489     .2465    3.774     .0002

At the 5% level, we find a significant difference between the case patients and the control group for owning a cat, being scratched by a cat, and being bitten by a cat.

(c) No, since this was not an experiment. Also need to be cautious when running multiple significance tests on the same data set...

 

Activity 23-7: Baldness and Heart Disease (cont.)

(a) Heart Disease: Some or more = 247/663 = .3726

Control Group: 220/772 = .2850

(b) Let q heart = proportion of all heart attack patients with at least some baldness. Let q control = proportion of non-heart attack people with at least some baldness.

H0: q heart = q control (both groups have the same proportion of subjects with some baldness)

Ha: q heart ≠ q control (these proportions differ)

(The sample sizes are large enough for the following to be valid, and we are assuming independently selected samples.) C = (247+220)/(663+772) = .3254

z = (.3726 - .2850)/sqrt(.3254(1-.3254)(1/663+1/772)) = 3.53

p-value = 2Pr(Z>|3.53|) < 2(.0002) = .0004

There is strong evidence of a difference in the proportion of patients with some baldness between the heart attack and non heart attack patients.

(c) 97.5% c.i. for q heart - q control: .3726 - .2850 ± 2.24 sqrt(.3726(1-.3726)/663 + .2850(1-.2850)/772) = .0876 ± .0556 = (.032,.1432). We are 97.5% confident that a higher percentage of men (3% to 14% more) who have had heart attacks consider themselves as having at least some baldness.

(d) There appears to be an association, but we can not say the heart attacks caused the baldness since this was an observational study and not an experiment.
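The multiplier 2.24 used for the 97.5% interval in (c) is the standard normal critical value that leaves 1.25% in each tail. A minimal sketch of how such a value can be obtained for any confidence level, assuming Python's standard library (the helper name `critical_value` is illustrative):

```python
from statistics import NormalDist

def critical_value(level):
    """Return z* such that the central `level` area of N(0,1) lies within +/- z*."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.95, 0.975, 0.99):
    print(f"{level:.1%} confidence: z* = {critical_value(level):.3f}")
```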

 

Activity 23-8: Sex on Television

(a) 1981: 6/47 = .1277; 1991: 44/615=.0715

(b) Let q 1981 = proportion of all 1981 sexual references which describe married sex. Let q 1991 = proportion of all 1991 sexual references which described married sex.

H0: q 1981 = q 1991 (same proportion both years)

Ha: q 1981 > q 1991 (Proportion decreased in 1991)

(The sample sizes are large enough for the following to be valid, and we are assuming independently selected samples.) C = (6+44)/(47+615) = .0755

z = (.1277 - .0715)/sqrt(.0755(1-.0755)(1/47+1/615)) = 1.41

p-value = Pr(Z>1.41) = .0793

At the 1% level, we do not have sufficient evidence to conclude that the proportion of references to married sex decreased between 1981 and 1991.

(c) 99% c.i. for q 1981 - q 1991: .1277 - .0715 ± 2.576 sqrt(.1277(1-.1277)/47 + .0715(1-.0715)/615) = .0562 ± .1282 = (-.072 , .1844).

(d) 95% c.i. for q 1991: .0715 ± 1.96 sqrt(.0715(1-.0715)/615) = .0715 ± .0203 = (.0511, .0919)
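The one-sample interval in (d) uses only the 44 of 615 references from the second sample. A brief sketch of that calculation, assuming Python's standard library (the function name `one_prop_ci` is illustrative):

```python
from math import sqrt
from statistics import NormalDist

def one_prop_ci(x, n, level=0.95):
    """Large-sample confidence interval for a single proportion."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = one_prop_ci(44, 615)
print(f"95% c.i.: ({low:.4f}, {high:.4f})")
```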

 

Activity 23-9: Heart By-Pass Surgery

(a) Central: 125/3676 = .034; Southeast: 288/6313 = .0456; Western: 167/4906=.034.

The Southeast region had the highest death rate. The central and western regions had lower death rates.

(b) (The sample sizes are large enough for the following to be valid, and we are assuming independently selected samples.)
 
Central vs. SE:  z=-2.812 p-value=.0049
Central vs. Western:  z=-.009 p-value=.9928
SE vs. Western:  z=3.084 p-value=.002
There are significant differences at the .10 level between the Central and Southeast regions, and between the Western and Southeast regions.

(c) Condition of patients prior to operation.
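The three pairwise comparisons in (b) all apply the same pooled test to a different pair of regions, so they can be generated in a loop. The sketch below assumes Python's standard library and uses the death and operation counts from (a); the names `regions`, `two_prop_z`, and `results` are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

# (deaths, operations) for each region, from part (a)
regions = {
    "Central": (125, 3676),
    "Southeast": (288, 6313),
    "Western": (167, 4906),
}

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic and two-sided p-value."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

results = {}
for (name1, (x1, n1)), (name2, (x2, n2)) in combinations(regions.items(), 2):
    z, p = two_prop_z(x1, n1, x2, n2)
    results[(name1, name2)] = (z, p)
    print(f"{name1} vs. {name2}: z = {z:.3f}, p-value = {p:.4f}")
```

As with Activity 23-6, running several tests on the same data set calls for some caution when interpreting the individual p-values.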

 

Activity 23-10: Employment Discrimination

(a) Blacks: 26/48 = .5417; Whites: 206/256=.8047

(b) Let q Black = proportion of all black applicants who would pass the test. Let q White = proportion of all white applicants who would pass the test.

H0: q Black = q White (Blacks and Whites pass at the same rate)

HA: q Black < q White (The proportion of Blacks who pass the test is smaller than the proportion of whites)

(The sample sizes are large enough for the following to be valid, and we are assuming independently selected samples.)

C = (26+206)/(48+256) = .7632

z = (.5417-.8047)/sqrt(.7632(1-.7632)(1/48+1/256)) = -3.93

p-value = Pr(Z<-3.93) < .0002

We have strong evidence that the proportion of black applicants passing the test is significantly lower than the proportion of white applicants passing the test (at the .05 and .01 levels). That is, if they were passing at the same rate, we would see a difference this big by chance alone in less than .02% of samples from this population.
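Because the alternative here is one-sided (q Black < q White), the p-value is the area in the lower tail only, with no doubling. A minimal sketch of that calculation, assuming Python's standard library (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

x_black, n_black = 26, 48     # passed, took the test
x_white, n_white = 206, 256

pooled = (x_black + x_white) / (n_black + n_white)
se = sqrt(pooled * (1 - pooled) * (1 / n_black + 1 / n_white))
z = (x_black / n_black - x_white / n_white) / se

# One-sided alternative: only the lower tail counts toward the p-value.
p_lower = NormalDist().cdf(z)
print(f"z = {z:.2f}, one-sided p-value = {p_lower:.6f}")
```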

 

Activity 23-11: Campus Alcohol Habits (cont.)

1982: Fights after drinking: 502/4324 = .1161; Law due to drinking: 190/4324 = .0439

1991: Fights after drinking: 657/3820 = .1720; Law due to drinking: 290/3820 = .0759

These sample sizes are large enough to apply the test of two proportions, and we're assuming the 1982 and 1991 samples were independently selected.

Let q 82F = proportion of all 1982 college students who got into a fight after drinking. Let q 91F = proportion of all 1991 college students who got into a fight after drinking.

H0: q 82F = q 91F (no difference in likelihood of drunks getting into a fight between 1982 and 1991).

HA: q 82F ≠ q 91F (there is a difference in the proportion getting into a fight in the two years)

C = (502+657)/(4324+3820) = .1423

z = (.1161 - .1720) /sqrt(.1423(1-.1423)(1/4324+1/3820)) = -7.21

p-value = 2 Pr(Z>|-7.21|) = essentially zero.

95% c.i. for q 82F - q 91F: (-.071, -.041)

Let q 82L = proportion of all 1982 college students who got into trouble with the law after drinking. Let q 91L = proportion of all 1991 college students who got into trouble with the law after drinking.

H0: q 82L = q 91L (no difference in likelihood of drunks getting into trouble with law between 1982 and 1991).

HA: q 82L ≠ q 91L (there is a difference in the proportion getting into law trouble in the two years)

C = (190+290)/(4324+3820) = .0589

z = (.0439 - .0759) / sqrt(.0589(1-.0589)(1/4324+1/3820)) = -6.12

p-value = 2 Pr(Z>|-6.12|) = essentially zero.

95% c.i. for q 82L - q 91L: (-.042, -.022)

There is very strong evidence of a difference between 1982 and 1991 in the likelihood of getting into a fight and the likelihood of getting into trouble with the law. Approximately 4% to 7% more students got into fights after drinking in 1991 vs. 1982, and about 2% to 4% more students got into trouble with the law. These problems have increased even though the proportion drinking has decreased.

 

Activity 23-12: Kids' Smoking

(a) Sample size

(b) n1=50, n2 =50 (4% of 50 = 2 daughters, 26% of 50 = 13 daughters)

C = (2+13)/100 = .15

z = (.04 - .26)/sqrt(.15(.85)(2/50)) = -3.08

p-value = 2Pr(Z>|-3.08|) = 2(.001) = .002

(c) n1=200, n2=50 (8 daughters of non-smokers, 13 daughters of smokers) C = (8+13)/250 = .084

z = (.04-.26)/sqrt(.084(1-.084)(1/200+1/50)) = -5.02

p-value essentially zero.

(d) n1=200, n2=200 (8 daughters of non-smokers and 52 daughters of smokers) C =(8+52)/400 = .15

z = (.04-.26)/sqrt(.15(1-.15)(1/200+1/200)) = -6.16

p-value essentially zero.

(e) Observational study

(f) No. Since this was not an experiment, we can't conclude causation; there are many other potential confounding factors.
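Parts (b) through (d) keep the sample proportions fixed at 4% and 26% and change only the group sizes, which is why the answer to (a) is sample size. The sketch below recomputes the pooled z statistic for each design so the effect can be compared side by side. It assumes Python's standard library; `pooled_z`, `designs`, and `z_by_design` are illustrative names.

```python
from math import sqrt

def pooled_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for a difference in proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

# (smoking daughters of non-smokers, n1, smoking daughters of smokers, n2)
designs = {
    "(b) n1=50,  n2=50": (2, 50, 13, 50),
    "(c) n1=200, n2=50": (8, 200, 13, 50),
    "(d) n1=200, n2=200": (8, 200, 52, 200),
}

z_by_design = {label: pooled_z(*counts) for label, counts in designs.items()}
for label, z in z_by_design.items():
    print(f"{label}: z = {z:.2f}")
```

The same difference in sample proportions moves further from zero, in standard-error units, as either group grows.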

 

Activity 23-13: Kids' Smoking (cont.)

(a) n1=60, n2=60 (9 sons of non smokers, 12 sons of smokers)

C = (9+12)/120 = .175

z = (.15-.20)/sqrt(.175(1-.175)(2/60)) = -.72

p-value = 2Pr(Z>|-.72|) = 2(.2358) = .4716

(b) n1=200, n2=200 (30 sons of non-smokers, 40 sons of smokers) C = (30+40)/400 = .175

z = (.15-.20)/sqrt(.175(1-.175)(2/200)) = -1.32

p-value = 2Pr(Z>|-1.32|) = 2(.0934) = .1868

(c) n1=500, n2=500 (75 sons of non-smokers, 100 sons of smokers) C = (75+100)/1000 = .175

z = (.15-.20)/sqrt(.175(1-.175)(2/500)) = -2.08

p-value = 2Pr(Z>|-2.08|) = 2(.0188) = .0376

(d) To be significant at the .05 level, we need |z| > 1.96. Since .15 - .20 is negative, set

-1.96 = (.15-.20)/sqrt(.175(1-.175)(2/n))

(-1.96)sqrt(.175(1-.175)2)/(-.05) = sqrt(n)

n = 443.7 so need 444 in each group (concurs with (b) and (c), needed to be between 200 and 500)
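The algebra in (d) can be carried out numerically by solving the z formula for n. This sketch assumes equal group sizes and the same sample proportions (.15 and .20) as in (a) through (c), so the combined proportion stays at .175. It uses Python's standard library, and the variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

p1, p2 = 0.15, 0.20
pooled = (p1 + p2) / 2                      # equal group sizes, so a simple average
z_star = NormalDist().inv_cdf(0.975)        # about 1.96 for a two-sided .05 test

# Solve |p1 - p2| / sqrt(pooled * (1 - pooled) * 2 / n) = z_star for n.
n_exact = 2 * pooled * (1 - pooled) * (z_star / (p1 - p2)) ** 2
n_needed = ceil(n_exact)                    # round up to a whole person per group

print(f"n = {n_exact:.1f}, so {n_needed} are needed in each group")
```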

 

Activity 23-14: Hypothetical Medical Recovery Rates (cont.)

(a) Let q new = proportion of all patients with the new treatment who recover. Let q Old = proportion of all patients with the old treatment who recover.

H0: q new = q Old (same recovery rate)

HA: q new > q Old (a higher proportion of patients with the new treatment recover)

C = (.867(50000) + .873(50000))/(50000+50000) = .87

z = (.867 - .873)/sqrt(.87(.13)(2/50000)) = -2.82

p-value = .0024

This result is significant at the .01 level.

(b) 99% c.i. for q new - q Old: .867-.873 ± 2.576 sqrt(.867(1-.867)/50000 + .873(1-.873)/50000) = -.006 ± .0055 = (-.0115, -.0005)

(c) While we have a statistically significant result (at the .01 level) indicating that a higher proportion of patients recover with the new treatment, the confidence interval tells us that the improvement is only a difference of .05% to 1.15% of patients, which is not a very significant result in a practical sense.

 

Activity 23-15: Comparing Proportions of Personal Interest

Answers will vary.