# DSC2008 Tutorial 1 answered

**Topics:** Statistical hypothesis testing, Arithmetic mean, Normal distribution

**Pages:** 6 (2384 words)

**Published:** March 8, 2015

This first tutorial is longer than usual because it covers two weeks of lectures.

Since there are frequently no definitive answers to some parts of tutorial questions, please only take these files as containing suggested solutions. Some of you might well have different and better insights. In particular, your tutor may have different approaches to some questions.

Just as not all decisions in real life are correct, no analysis has the last say. Indeed, please take all analyses as subjective, and don’t shy away from disagreeing with the instructor or the text. There will also be occasions when the instructor disagrees with the text. In past exams, opposite arguments have at times earned equal marks, so long as each was cogently argued.

Please regard the above proviso as applicable to all future solution sketches. This is the university after all.

(1) The data shown below contain family incomes (in millions of won) for a set of 50 families, sampled in 1980 and 1990. Assume that these families are representative of Korea as a whole.

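| 1980 | 1990 | 1980 | 1990 | 1980 | 1990 |
|------|------|------|------|------|------|
| 58 | 54 | 33 | 29 | 73 | 69 |
| 6 | 2 | 14 | 10 | 26 | 22 |
| 59 | 55 | 48 | 44 | 64 | 70 |
| 71 | 57 | 20 | 16 | 59 | 55 |
| 30 | 26 | 24 | 20 | 11 | 7 |
| 38 | 34 | 82 | 78 | 70 | 66 |
| 36 | 32 | 95 | 97 | 31 | 27 |
| 33 | 29 | 12 | 8 | 92 | 88 |
| 72 | 68 | 93 | 89 | 115 | 111 |
| 100 | 96 | 100 | 102 | 62 | 58 |
| 1 | 0 | 51 | 47 | 23 | 19 |
| 27 | 23 | 22 | 18 | 34 | 30 |
| 22 | 47 | 50 | 75 | 36 | 61 |
| 141 | 166 | 124 | 149 | 125 | 150 |
| 72 | 97 | 113 | 138 | 121 | 146 |
| 165 | 190 | 118 | 143 | 88 | 113 |
| 79 | 104 | 96 | 121 | | |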

(a) Find the mean, median, standard deviation, first and third quartiles, and the 95th percentile for family incomes in both years.

For 1980: mean 62.7, median 59, sd 39.7, 1st quartile 29.25, 3rd quartile 93.5, 95th percentile 132.2.

For 1990: mean 67.12, median 57.5, sd 48.09, 1st quartile 26.75, 3rd quartile 98.25, 95th percentile 157.2.

These quartiles and percentiles are consistent with the exclusive interpolation convention, which places the p-th quantile at position p(n + 1) in the sorted data.

(b) It seems that the country was better off in 1990 than in 1980, because the average income increased. Do you agree?

This is not necessarily true. The average income has risen, but this may simply mean that the higher earners are earning more than before; it gives no clear indication that families in the lower income brackets are earning more as well. The median and the first quartile both fell (59 to 57.5, and 29.25 to 26.75), which supports this reading. On the other hand, the third quartile and the 95th percentile both rose (93.5 to 98.25, and 132.2 to 157.2), so incomes in the upper part of the distribution did rise, and by quite a bit.
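For anyone who would like to cross-check the Excel figures, here is a minimal Python sketch of parts (a) and (b). It rests on two assumptions that are not stated in the question: that each adjacent (1980, 1990) pair in the listing belongs to the same family, and that quantiles use the exclusive p(n + 1) convention (`method="exclusive"`), which is the one that reproduces the 1980 quartiles quoted in (a). The names `PAIRS` and `summarize` are made up for this illustration.

```python
from statistics import mean, median, quantiles, stdev

# (1980, 1990) income pairs in listing order (assumed: one pair per family).
PAIRS = [
    (58, 54), (33, 29), (73, 69), (6, 2), (14, 10),
    (26, 22), (59, 55), (48, 44), (64, 70), (71, 57),
    (20, 16), (59, 55), (30, 26), (24, 20), (11, 7),
    (38, 34), (82, 78), (70, 66), (36, 32), (95, 97),
    (31, 27), (33, 29), (12, 8), (92, 88), (72, 68),
    (93, 89), (115, 111), (100, 96), (100, 102), (62, 58),
    (1, 0), (51, 47), (23, 19), (27, 23), (22, 18),
    (34, 30), (22, 47), (50, 75), (36, 61), (141, 166),
    (124, 149), (125, 150), (72, 97), (113, 138), (121, 146),
    (165, 190), (118, 143), (88, 113), (79, 104), (96, 121),
]


def summarize(values):
    """Return the summary statistics asked for in part (a)."""
    quartiles = quantiles(values, n=4, method="exclusive")
    percentiles = quantiles(values, n=100, method="exclusive")
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),      # sample standard deviation (n - 1 divisor)
        "q1": quartiles[0],
        "q3": quartiles[2],
        "p95": percentiles[94],   # 95th of the 99 percentile cut points
    }


incomes_1980 = [a for a, _ in PAIRS]
incomes_1990 = [b for _, b in PAIRS]
summary_1980 = summarize(incomes_1980)
summary_1990 = summarize(incomes_1990)

# Part (b): how many families gained or lost between the two years?
changes = [b - a for a, b in PAIRS]
gained = sum(1 for c in changes if c > 0)
lost = sum(1 for c in changes if c < 0)

for year, s in (("1980", summary_1980), ("1990", summary_1990)):
    print(year, {k: round(v, 2) for k, v in s.items()})
print("families gaining:", gained, "families losing:", lost)
```

Run as is, the sketch reports a 1990 first quartile of 26.75, and finds 33 families with a lower income in 1990 against 17 with a higher one. Under the pairing assumption, that is one more way to argue that a higher mean need not mean most families were better off.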

(c) Generate a boxplot to summarize the data (use the template BoxPlot.xls or, for the really bold and long-suffering, see http://office.microsoft.com/en-us/excel-help/creating-a-box-plot-HA010278212.aspx if you wish to do it from scratch!). What does the boxplot indicate?

(d) Generate histograms (Histogram.xls, or use Excel’s Data Analysis add-in) to summarize the data. What do the histograms indicate?

(2) Problem 14 (4e: p 382, 5e: p 331) of the text, on sampling distributions, using Tut1-Q2-P07_14.xlsx. Find some nice way to do this using Excel.

This question asks us to verify the Law of Large Numbers, for this very simple sampling situation.

See Tut1-Q2-P07_14(answers).xlsx (look at the histogram to see how quickly the CLT takes effect). Using this template, it is also quite straightforward to simulate sampling 4 with replacement from a population of 6: Tut1-Q2-P07_14(answers2).xlsx.

This question is about theoretical sampling with replacement, where the number of possible ordered samples is found by exponentiation (e.g. 5^3). For sampling without replacement, the more practical procedure, the number of ordered samples is a permutation count (e.g. 5*4*3).

Drawing 3 from a population of 5 gives 5*5*5 = 125 ordered samples with replacement, whereas without replacement there are only 5P3 = PERMUT(5,3) = 5*4*3 = 60.
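The two counts can be confirmed by brute-force enumeration. The short Python sketch below lists every ordered sample of 3 from a population of 5; the unit labels `A` to `E` are hypothetical placeholders, not data from the text.

```python
from itertools import permutations, product
from math import perm

population = ["A", "B", "C", "D", "E"]  # hypothetical labels for 5 units
sample_size = 3

# Ordered samples with replacement: a unit may be drawn more than once.
with_replacement = list(product(population, repeat=sample_size))

# Ordered samples without replacement: every unit appears at most once.
without_replacement = list(permutations(population, sample_size))

print(len(with_replacement))     # 125, i.e. 5**3
print(len(without_replacement))  # 60, i.e. 5*4*3
print(perm(5, 3))                # 60, same count as Excel's PERMUT(5,3)
```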

We are in fact able to verify LLN for this special case.
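One way such a check might look outside Excel is sketched below. It enumerates every ordered sample drawn with replacement and confirms that the sample mean is centred on the population mean, with variance equal to the population variance divided by n, which is the shrinking spread that the LLN relies on. The population values used here are hypothetical, since the data for Problem 14 are not reproduced in this sheet, and `sampling_distribution_check` is a name invented for the illustration.

```python
from fractions import Fraction
from itertools import product


def sampling_distribution_check(population, n):
    """Enumerate all ordered samples of size n drawn with replacement and
    return (population mean, population variance,
            mean of sample means, variance of sample means)."""
    size = len(population)
    mu = Fraction(sum(population), size)
    sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / size

    # Every ordered sample is equally likely under sampling with replacement.
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    mean_of_means = sum(means) / len(means)
    var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
    return mu, sigma2, mean_of_means, var_of_means


# Hypothetical population of 5 values, samples of size 3 (125 samples).
mu, sigma2, mean_of_means, var_of_means = sampling_distribution_check(
    [2, 4, 6, 8, 10], 3
)
print(mean_of_means == mu)          # sample mean is unbiased
print(var_of_means == sigma2 / 3)   # variance shrinks as sigma^2 / n
```

Exact fractions are used instead of floats so that the two equalities hold exactly and not just up to rounding. The same function handles the "4 from a population of 6" case mentioned above, for instance with the hypothetical population `[1, 2, 3, 4, 5, 6]` and `n = 4`.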

(3) Tut1-Q3-PivotTableExercise.xls. The text (Albright et al.) has a large section on Excel Pivot Tables (4e, longer: p 114 to p 144; 5e, shorter: p 108 to p 128).

Q4 & Q5 of the...
