# Statistical Analysis

Homework 1

Reminders:

1. Due date: Jan-14-2012 (Saturday) in class.

2. Please submit only the hardcopy.

3. Please show the names and ID numbers of all your group members on the cover page. Please also indicate your session (DSME5110W).

1.

Problem 2.1 (p. 33)

The file P02_01.xlsx indicates the gender and nationality of the MBA incoming class in two successive years at the Kelley School of Business at Indiana University.

a. For each year, create tables of counts of gender and of nationality. Then create column charts of these counts. Do they indicate any noticeable change in the composition of the two classes?

b. Repeat part a for nationality, but recode this variable so that all nationalities that have counts of 1 or 2 are classified as Other.
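The recoding in part b can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `recode_rare` and the sample list are hypothetical and are not taken from P02_01.xlsx, and the assignment itself expects the work to be done in Excel.

```python
from collections import Counter


def recode_rare(values, max_rare_count=2, other_label="Other"):
    """Replace every category whose count is <= max_rare_count with other_label."""
    counts = Counter(values)
    return [v if counts[v] > max_rare_count else other_label for v in values]


# Made-up sample (assumption), NOT the contents of P02_01.xlsx
nationalities = ["US"] * 5 + ["India"] * 3 + ["Peru", "Chile", "Chile"]

recoded = recode_rare(nationalities)
print(Counter(nationalities))  # table of counts before recoding
print(Counter(recoded))        # table of counts after recoding
```

Counting first and recoding second matters here: the threshold is applied to the original counts, so the merged Other category is never recoded again.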

2.

Problem 2.5 (p. 33)

The file DJIA Monthly Close.xlsx contains monthly values of the Dow Jones Industrial Average from 1950 through 2009. It also contains the percentage changes from month to month. (This file will be used for an example later in this chapter.) Create a new column for recoding the percentage changes into six categories: Large negative (< -3%), Medium negative (< -1%, ≥ -3%), Small negative (< 0%, ≥ -1%), Small positive (< 1%, ≥ 0%), Medium positive (< 3%, ≥ 1%), and Large positive (≥ 3%). Then create a column chart of the counts of this categorical variable. Comment on its shape.
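One way to express the six-category recoding is with sorted cut points, sketched below in Python. The names `CUTS`, `LABELS`, and `categorize` are hypothetical, the sample values are made up, and the sketch assumes the changes are stored in percent units (3 means 3%); if the file stores fractions such as 0.03, the cut points would need to be divided by 100.

```python
from bisect import bisect_right
from collections import Counter

# Cut points in percent (assumption: 3 means 3%, not 0.03).
# Each category covers lower <= x < upper, matching the problem statement.
CUTS = [-3.0, -1.0, 0.0, 1.0, 3.0]
LABELS = [
    "Large negative",   # x < -3
    "Medium negative",  # -3 <= x < -1
    "Small negative",   # -1 <= x < 0
    "Small positive",   # 0 <= x < 1
    "Medium positive",  # 1 <= x < 3
    "Large positive",   # x >= 3
]


def categorize(pct_change):
    """Return the category label for one monthly percentage change."""
    return LABELS[bisect_right(CUTS, pct_change)]


# Made-up sample (assumption), NOT the DJIA data
sample_changes = [-4.2, -3.0, -0.5, 0.0, 0.8, 1.0, 2.9, 3.0, 5.1]
print(Counter(categorize(x) for x in sample_changes))
```

`bisect_right` places a value that equals a cut point into the category above it, which is what the "≥" boundaries in the problem require.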

3.

Problem 2.6 (p. 55)

The file P02_06.xlsx lists the average time (in minutes) it takes citizens of 379 metropolitan areas to travel to work and back home each day.

a. Create a histogram of the daily commute times.

b. Find the most representative average daily commute time across this distribution.

c. Find a useful measure of the variability of these average commute times around the mean.

d. The empirical rule for standard deviation indicates that approximately 95% of these average travel times will fall between which two values? For this particular data set, is this empirical rule at least approximately correct?
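The check in part d amounts to computing the interval of the mean plus or minus two standard deviations and counting how many observations fall inside it. A minimal Python sketch follows; the function name `empirical_rule_check` and the sample values are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed.

```python
import statistics


def empirical_rule_check(data, k=2):
    """Return (low, high, share) where low/high = mean -/+ k sample std devs
    and share is the fraction of observations inside [low, high]."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (assumption)
    low, high = mean - k * sd, mean + k * sd
    share = sum(low <= x <= high for x in data) / len(data)
    return low, high, share


# Made-up commute times in minutes (assumption), NOT the P02_06.xlsx data
commute_times = [38.5, 41.0, 44.2, 45.1, 46.3, 47.8, 48.0, 50.4, 52.9, 61.7]

low, high, share = empirical_rule_check(commute_times)
print(f"About 95% expected between {low:.1f} and {high:.1f}")
print(f"Observed share inside the interval: {share:.0%}")
```

Comparing the observed share with 95% is what answers whether the rule is approximately correct for a given data set.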

4.

Problem 2.9 (p. 55)

The file P02_09.xlsx lists the times required to service 200 consecutive customers at a (fictional) fast-food restaurant.

a. Create a histogram of the customer service times. How would you characterize the distribution of service times?

b. Calculate the mean, median, and first and third quartiles of this distribution.

c. Which measure of central tendency, the mean or the median, is more appropriate in describing this distribution? Explain your reasoning.

d. Find and interpret the variance and standard deviation of these service times.

e. Are the empirical rules for standard deviations applicable for these service times? If not, explain why. Can you tell whether they apply, or at least make an educated guess, by looking at the shape of the histogram? Why?
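The summary measures in parts b and d can be sketched with Python's standard `statistics` module. The function name `summarize` and the sample values are hypothetical. The sketch assumes sample (n - 1) variance and the "inclusive" quartile method, which interpolates the same way as Excel's QUARTILE.INC; other tools may report slightly different quartiles.

```python
import statistics


def summarize(data):
    """Return mean, median, quartiles, sample variance, and sample std dev."""
    # method="inclusive" is an assumption chosen to mirror Excel's QUARTILE.INC
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": median,
        "q1": q1,
        "q3": q3,
        "variance": statistics.variance(data),
        "std_dev": statistics.stdev(data),
    }


# Made-up, right-skewed service times in seconds (assumption), NOT P02_09.xlsx
service_times = [35, 40, 42, 45, 48, 50, 55, 60, 90, 180]

for name, value in summarize(service_times).items():
    print(f"{name}: {value:.2f}")
```

In a right-skewed sample like this made-up one, a few long service times pull the mean above the median, which is the comparison part c asks about.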

5.

Problem 2.14 (p. 56)

Recall that the file Supermarket Transactions.xlsx contains over 14,000 transactions made by supermarket customers over a period of approximately two years. Using these data, create a box plot to characterize the distribution of revenues earned from the given transactions. Is this distribution essentially symmetric or skewed? What if you restrict the box plot to transactions in the food product family?

(Hint: StatTools will not let you define a second data set that is a subset of an existing data set. But you can copy data for the second question to a second worksheet.)

6.

Problem 2.17 (p. 56)

The file P02_17.xlsx contains salaries of 200 recent graduates from a (fictional) MBA program.

a. What salary level is most indicative of those earned by students graduating from this MBA program this year?

b. Do the empirical rules for standard deviations apply to these data? Can you tell, or at least make an educated guess, by looking at the shape of the histogram? Why?

c. If the empirical rules apply here, between which two numbers can you be about 68% sure that the salary of any one of these 200 students will...
