Tutorial Questions for Exploratory Data Analysis – Summary Statistics & Graphs
Customers of a particular bank rated the service provided by the bank on a scale of one to ten, correct to one decimal place. The bank categorised their customers as either (1) Private Account holders or (2) Business Account holders. The information below summarises customer attitudes towards the quality of service provided by the bank. Use the output to answer the questions below.
a) Briefly describe and compare the distributions of results for the two groups of customers. You should mention appropriate measures of centre and spread, and any other points you feel may be of interest.
b) The survey results were described as being symmetrically distributed. What does this mean, and what evidence is there below to support this claim?
c) The standard deviation of the ratings for the Private Account holders is 1.336. What does this value mean?
d) Verify the value of the standard error of the mean (SE Mean) for the Business Account holders.
e) Determine a 90% confidence interval for the true mean of the Business Account holders and interpret the result in the context of the situation.
Descriptive Statistics: Quality
Variable  Use   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3
Quality   1    45   0  5.971    0.199  1.336    3.700  4.800   6.000  7.050
          2    30   0  8.323    0.172  0.941    6.200  7.675   8.400  9.025

Variable  Use  Maximum
Quality   1      8.400

(Use: 1 = Private Account holders, 2 = Business Account holders)
Stem-and-Leaf Displays: Quality
a) The mean rating for the Business group (8.323) is higher than that for the Private group (5.971), and so is the median (8.400 vs 6.000).
The StDev for the Private group (1.336) is larger than for the Business group (0.941), and so is the IQR (2.25 vs 1.35), i.e. there is more variation, or less consistency, in the Private group's ratings.
No outliers are evident in either distribution, and both appear fairly symmetrical, as evidenced by the similarity of each group's mean and median.
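The "no outliers" claim relies on the boxplots and stem-and-leaf displays, which are not reproduced here. As a rough cross-check, the sketch below applies the common 1.5 × IQR fence rule to the quartiles in the Minitab output. The helper name `iqr_fences` is hypothetical, and using 10 (the top of the rating scale) in place of the missing Business maximum is an assumption.

```python
# Quartiles and extremes copied from the Minitab output (Use 1 = Private, Use 2 = Business).
# Assumption: the Business maximum is not shown, so the top of the 1-10 scale is used instead.
groups = {
    "Private": {"min": 3.700, "q1": 4.800, "q3": 7.050, "max": 8.400},
    "Business": {"min": 6.200, "q1": 7.675, "q3": 9.025, "max": None},
}

def iqr_fences(q1, q3, k=1.5):
    """Return (IQR, lower fence, upper fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr

results = {}
for name, s in groups.items():
    iqr, lower, upper = iqr_fences(s["q1"], s["q3"])
    # Largest value to check: the reported maximum, or 10 if it is unknown.
    top = s["max"] if s["max"] is not None else 10.0
    no_outliers = s["min"] >= lower and top <= upper
    results[name] = (iqr, lower, upper, no_outliers)
    print(f"{name}: IQR = {iqr:.3f}, fences = ({lower:.3f}, {upper:.3f}), no outliers: {no_outliers}")
```

Both minimums sit above their lower fences and no rating on a 1-10 scale can exceed either upper fence, which is consistent with the answer above.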
b) Symmetry means the data are spread out in a similar pattern on either side of the median: the top half of the distribution is a rough mirror image of the bottom half.
This is supported by the closeness of the mean and median in each group, and by the appearance of the boxplots and, to a lesser extent, the stem-and-leaf plots.
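A small numerical companion to the symmetry argument: for a roughly symmetric distribution the mean is close to the median, and the median sits roughly midway between Q1 and Q3. This sketch (the helper `symmetry_indicators` is a hypothetical name) computes those quantities from the Minitab summary values.

```python
# Summary values copied from the Minitab output.
summary = {
    "Private": {"mean": 5.971, "q1": 4.800, "median": 6.000, "q3": 7.050},
    "Business": {"mean": 8.323, "q1": 7.675, "median": 8.400, "q3": 9.025},
}

def symmetry_indicators(mean, q1, median, q3):
    """Compare the two middle quarters of the data and the mean-median gap."""
    return {
        "mean_minus_median": mean - median,
        "lower_spread": median - q1,  # width of the quarter just below the median
        "upper_spread": q3 - median,  # width of the quarter just above the median
    }

indicators = {name: symmetry_indicators(**s) for name, s in summary.items()}
for name, ind in indicators.items():
    print(name, {k: round(v, 3) for k, v in ind.items()})
```

For the Private group the median is 1.20 above Q1 and 1.05 below Q3; for the Business group the figures are 0.725 and 0.625. In both groups the mean is slightly below the median and the lower quarter is slightly wider, hinting at a mild left tail, but these differences are small relative to the standard deviations, which fits the description "approximately symmetric".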
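c) The standard deviation measures how spread out the individual ratings are around the mean. A value of 1.336 means that the Private Account holders' ratings typically lie about 1.3 points away from their mean rating of 5.971.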
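To make the meaning of the standard deviation concrete, here is a sketch of the sample standard deviation formula (dividing by n − 1, as Minitab's StDev does). The ratings list is hypothetical, since the raw survey data are not reproduced here, and the hand-written result is checked against Python's `statistics.stdev`.

```python
import math
import statistics

# Hypothetical ratings, for illustration only (not the bank's survey data).
ratings = [4.2, 5.5, 6.0, 6.3, 7.1, 5.0, 6.8]

def sample_sd(xs):
    """Square root of (sum of squared deviations from the mean) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

sd = sample_sd(ratings)
print(round(sd, 3), round(statistics.stdev(ratings), 3))
```

Each rating's distance from the mean is squared, averaged (with n − 1), and square-rooted, so the result is on the same scale as the ratings themselves.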
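d) $SE_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{0.941}{\sqrt{30}} = 0.1718$, which rounds to the SE Mean of 0.172 shown in the output.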
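The same check can be scripted. The sketch below applies SE = s/√n to both groups, using the StDev and N values from the output; the helper name `se_mean` is hypothetical.

```python
import math

def se_mean(sd, n):
    """Standard error of the mean: sample standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)

business_se = se_mean(0.941, 30)
private_se = se_mean(1.336, 45)
print(f"Business SE Mean = {business_se:.4f}")
print(f"Private SE Mean = {private_se:.4f}")
```

Both values round to the SE Mean figures in the Minitab output (0.172 and 0.199).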
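e) 90% CI = $\bar{x} \pm t_{29} \times SE_{\bar{x}} = 8.323 \pm 1.6991(0.1718) = 8.323 \pm 0.292$, which gives 8.031 to 8.615. (Using z = 1.645 instead gives 8.323 ± 0.283, i.e. 8.040 to 8.606.)
Accept t29 = 1.6991 and/or z = 1.64 (or 1.645) for full marks; deduct 1 mark for any other values.
This means we can be 90% confident that the true mean rating of all Business Account holders lies between 8.031 and 8.615.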
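A sketch of the interval calculation for part e), computing both the t-based interval (t29 = 1.6991, the table value accepted in the marking note) and the large-sample z version. `NormalDist.inv_cdf` is from Python's standard library (3.8+); the helper `mean_ci` is a hypothetical name.

```python
from statistics import NormalDist

def mean_ci(mean, se, critical):
    """Return (lower, upper) for mean +/- critical * se."""
    margin = critical * se
    return mean - margin, mean + margin

mean = 8.323                        # Business group mean from the output
se = 0.941 / 30 ** 0.5              # SE Mean = s / sqrt(n)
t_29 = 1.6991                       # t critical value, 90% confidence, 29 df (from t tables)
z_90 = NormalDist().inv_cdf(0.95)   # about 1.645, large-sample alternative

ci_t = mean_ci(mean, se, t_29)
ci_z = mean_ci(mean, se, z_90)
print("t-based: (%.3f, %.3f)" % ci_t)
print("z-based: (%.3f, %.3f)" % ci_z)
```

This prints a t-based interval of about 8.031 to 8.615 and a z-based interval of about 8.040 to 8.606; the t version is slightly wider because t29 > z.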
Students enrolled in a university statistics unit were given the choice of three tutorial methods to support the lectures in the unit. These were: on-line tutorials, self-paced tutorials and tutorials conducted by a tutor. At the end of the semester a random sample of students was selected from each of the three tutorial groups and their final results (%) compared. Use the output provided below to answer the following questions:
a) Which of the three methods gave the best results overall? Explain your reasoning.
b) The results for the "tutor" group were described as being skewed. What does this mean, and what evidence is there below to support this claim?
c) Which group has an inter-quartile range of 7.25? What does this value mean?
d) Briefly comment on the differences between the three groups.
e) Determine a 95% confidence interval for the true mean of the "self-paced" group, and interpret the result in the context of the situation.
Descriptive Statistics: Result
Variable  Method   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3
Result    On-Line 15   0  68.40     4.50  17.45    32.00  61.00   65.00  83.00
          Self    25   0  55.76     1.01...
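Only the On-Line row of this output is complete in the source; the remaining rows are truncated. As a sketch, the calculation below checks that row's IQR (relevant to question c) and its SE Mean, using the same formulas as in the bank example.

```python
import math

# On-Line row from the Minitab output (the other rows are cut off in the source).
online = {"n": 15, "mean": 68.40, "se_mean": 4.50, "sd": 17.45, "q1": 61.00, "q3": 83.00}

online_iqr = online["q3"] - online["q1"]
online_se = online["sd"] / math.sqrt(online["n"])
print(f"On-Line IQR = {online_iqr:.2f}")       # compare with 7.25 in question c)
print(f"On-Line SE Mean = {online_se:.2f}")    # compare with 4.50 in the output
```

The On-Line IQR is 22.00, so it is not the group with an IQR of 7.25. The recomputed SE Mean (about 4.51) agrees with the reported 4.50 to within the rounding of the published StDev.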