1. If the correlation coefficient between the variables is 0, it means that the two variables aren’t related. – TRUE (strictly, a correlation of 0 rules out only a linear relationship)
2. In a simple regression analysis the error terms are assumed to be independent and normally distributed with zero mean and constant variance. – TRUE
3. The difference between the actual Y-value and the predicted Y-value found using a regression equation is called the residual (ε). – TRUE
4. In a multiple regression analysis with N observations and k independent variables, the degrees of freedom for the residual error is given by (N-k). – FALSE (correct answer: N-k-1)
5. From the following scatter plot, we can say that between y & x there is _______. – Negative correlation

6. According to the graph, X & Y have ________. – Virtually no correlation

7. A cost accountant is developing a regression model to predict the total cost of producing a batch of printed circuit boards as a function of batch size (the number of boards produced in one lot or batch). The explanatory variable is called the _______. – independent variable (here, batch size)
8. In the regression equation, y = 75.65 + 0.50x, the intercept is ______. – 75.65
9. The assumptions underlying simple regression analysis include ______. – The error terms are independent
10. The proportion of variability of the dependent variable accounted for or explained by the independent variable is called the _______. – coefficient of determination
11. A manager wishes to predict the annual cost (y) of an automobile based on the number of miles (x) driven. The following model was developed: y = 1550 + 0.36x. If a car is driven 30,000 miles, the predicted cost is _______. – 12,350
12. A cost accountant is developing a regression model to predict the total cost as a linear function of batch size (the number of boards produced in one lot or batch) and production shift (day and evening). The dependent variable in this model is ________. – total cost
13. The...
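The arithmetic behind items 4, 8 and 11 can be checked with a few lines of Python. This is only a minimal sketch: the helper names `predict` and `residual_df` are made up for illustration, and the sample size and predictor count passed to `residual_df` are hypothetical numbers, not taken from the quiz.

```python
def predict(intercept, slope, x):
    """Evaluate a fitted simple regression line y = intercept + slope * x."""
    return intercept + slope * x


def residual_df(n_obs, k_predictors):
    """Residual degrees of freedom in multiple regression: N - k - 1."""
    return n_obs - k_predictors - 1


# Item 11: y = 1550 + 0.36x evaluated at x = 30,000 miles.
annual_cost = predict(1550, 0.36, 30_000)
print(f"Predicted annual cost: {annual_cost:,.0f}")

# Item 8: the intercept is the predicted value when x = 0.
intercept_check = predict(75.65, 0.50, 0)
print(f"Intercept: {intercept_check}")

# Item 4: hypothetical example with N = 20 observations and k = 3 predictors.
print(f"Residual degrees of freedom: {residual_df(20, 3)}")
```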

...Regression Analysis (Tom’s Used Mustangs)
Irving Campus
GM 533: Applied Managerial Statistics
04/19/2012
Memo
To:
From:
Date: April 19th, 2012
Re: Statistical Analysis of Price Setting
Several hypothesis tests and multiple regressions were compared in order to identify the factors that influence the selling price of Ford Mustangs. The data set contains observations on 35 used Mustangs and 10 different characteristics.
Of the hypothesis tests conducted, the test of whether price depends on the car being a convertible gave the strongest result: it had the smallest p-value of all the tests.
The data used in this regression analysis to find a suitable model for the relationship between price, age and mileage come from Bryant/Smith Case 7, Tom’s Used Mustangs. As described in the case, Tom sets his asking prices for used cars largely by gut feeling.
The hypothesis test that shows the clearest relationship with mean price is the one on whether the car is a convertible. The regression analysis is conducted to see whether there is any relationship between price and mileage, color, owner, age and GT. After running several models with different...

...Regression Analysis: A Complete Example
This section works out an example that includes all the topics we have discussed so far in this chapter.
A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums.
Driving Experience (years)    Monthly Auto Insurance Premium ($)
 5                            64
 2                            87
12                            50
 9                            71
15                            44
 6                            56
25                            42
16                            60
a. Does the insurance premium depend on the driving experience or does the driving experience depend on the insurance premium? Do you expect a positive or a negative relationship between these two variables?
b. Compute SSxx, SSyy, and SSxy.
c. Find the least squares regression line by choosing appropriate dependent and independent variables based on your answer in part a.
d. Interpret the meaning of the values of a and b calculated in part c.
e. Plot the scatter diagram and the regression line.
f. Calculate r and r² and explain what they mean.
g. Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
h. Compute the standard deviation of errors.
i. Construct a 90% confidence interval for B.
j. Test at the 5% significance level whether B is negative.
k. Using α = .05, test whether ρ is different from zero.
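The numerical parts b, c, f, g and h can be sketched in plain Python using the eight data pairs listed above. The sketch assumes that the premium is the dependent variable and driving experience the independent variable (the natural reading of part a, though the worked solution is cut off in this excerpt), and that the standard deviation of errors uses n − 2 degrees of freedom. The variable names are chosen here for illustration.

```python
import math

# Data from the exercise: driving experience (years) and monthly premium ($).
experience = [5, 2, 12, 9, 15, 6, 25, 16]
premium = [64, 87, 50, 71, 44, 56, 42, 60]
n = len(experience)

sum_x = sum(experience)
sum_y = sum(premium)

# Part b: sums of squares and cross-products.
ss_xx = sum(x * x for x in experience) - sum_x ** 2 / n
ss_yy = sum(y * y for y in premium) - sum_y ** 2 / n
ss_xy = sum(x * y for x, y in zip(experience, premium)) - sum_x * sum_y / n

# Part c: least squares line y-hat = a + b*x
# (assumption: premium is regressed on experience).
b = ss_xy / ss_xx
a = sum_y / n - b * sum_x / n

# Part f: correlation coefficient and coefficient of determination.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
r_squared = r ** 2

# Part g: predicted premium for 10 years of driving experience.
predicted_at_10 = a + b * 10

# Part h: standard deviation of errors, with n - 2 degrees of freedom.
s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))

print(f"SSxx = {ss_xx:.4f}, SSyy = {ss_yy:.4f}, SSxy = {ss_xy:.4f}")
print(f"y-hat = {a:.4f} + ({b:.4f})x")
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
print(f"Predicted premium at x = 10: {predicted_at_10:.2f}")
print(f"Standard deviation of errors: {s_e:.4f}")
```

A negative slope here would agree with the expectation that more experienced drivers pay lower premiums.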
Solution a. Based...

...Quick Stab Collection Agency: A Regression Analysis
Gerald P. Ifurung
04/11/2011
Keller School of Management
Executive Summary
Every portfolio has a set of delinquent customers who do not make their payments on time.
The financial institution has to undertake collection activities on these customers to recover the
amounts due. A lot of collection resources are wasted on customers who are difficult or
impossible to recover. Predictive analytics can help optimize the allocation of collection
resources by identifying the most effective collection agencies, contact strategies, legal actions
and other strategies for each customer, thus significantly increasing recovery while at the same time
reducing collection costs. A random sample of accounts closed out during the months of January through June will be used to determine whether the size of the bill has an effect on the number of days the bill is late. The statistical analysis of the data involves the application of regression analysis. Based on the calculated value of the correlation coefficient, there is no linear relationship between the size of the bill and the number of days to collect.
Introduction
The author was hired by the Quick Stab Collection Agency (QSCA) on a contractual basis to assist the company in auditing potential business in buying the rights to collect debts from their original owners. QSCA is a collection...

...referring to the recent boom in house prices in many developed countries following a sharp bust in 2008. Researchers and policymakers alike have realized that housing has a significant influence on the business cycle. This paper tries to identify the determinants of the selling price of houses in Oregon. The data set used in this paper comes from the case study titled “Housing Price” (Case #27, Practical Data Analysis: Case Studies in Business Statistics, Marlene A. Smith & Peter G. Bryant).
The most important step in determining the selling prices of houses is to identify the features that drive them. When choosing a house, people tend to be more interested in houses with multiple bedrooms and bathrooms, a fireplace, a garage for multiple cars, and location. A house that meets these requirements therefore tends to be priced higher, and a house that lacks these features tends to be priced lower. For their case study “Housing Price” (Case #27, Practical Data Analysis: Case Studies in Business Statistics), Marlene A. Smith & Peter G. Bryant selected 10 variables in order to assess their impact on the housing price. A sample of 108 houses was selected from East Ville, Oregon, along with their characteristics on the 10 selected variables. The variable set for the study is:
Selling Price of House
Area
No of Bed rooms
No of Bath rooms...

...Multiple regression: OLS method
(Mostly from Maddala)
The Ordinary Least Squares method of estimation can easily be extended to models involving two or more explanatory variables, though the algebra becomes progressively more complex. In fact, when dealing with the general regression problem with a large number of variables, we use matrix algebra, but that is beyond the scope of this course.
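We illustrate the case of two explanatory variables, X1 and X2, with Y the dependent variable. We therefore have a model
$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$
where $u_i \sim N(0, \sigma^2)$.
We look for estimators $\hat\alpha$, $\hat\beta_1$ and $\hat\beta_2$ so as to minimise the sum of squared errors,
$$S = \sum_i \left(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}\right)^2$$
Differentiating, and setting the partial derivatives to zero, we get
$$\frac{\partial S}{\partial \hat\alpha} = \sum_i -2\left(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}\right) = 0 \qquad (1)$$
$$\frac{\partial S}{\partial \hat\beta_1} = \sum_i -2X_{1i}\left(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}\right) = 0 \qquad (2)$$
$$\frac{\partial S}{\partial \hat\beta_2} = \sum_i -2X_{2i}\left(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}\right) = 0 \qquad (3)$$
These three equations are called the “normal equations”. They can be simplified as follows: Equation (1) can be written as
$$\sum_i Y_i = n\hat\alpha + \hat\beta_1 \sum_i X_{1i} + \hat\beta_2 \sum_i X_{2i}$$
or
$$\bar Y = \hat\alpha + \hat\beta_1 \bar X_1 + \hat\beta_2 \bar X_2 \qquad (4)$$
where the bar over Y, X1 and X2 indicates the sample mean. Equation (2) can be written as
$$\sum_i X_{1i} Y_i = \hat\alpha \sum_i X_{1i} + \hat\beta_1 \sum_i X_{1i}^2 + \hat\beta_2 \sum_i X_{1i} X_{2i}$$
Substituting in the value of $\hat\alpha$ from (4), we get
$$\sum_i X_{1i} Y_i = n\bar X_1\left(\bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2\right) + \hat\beta_1 \sum_i X_{1i}^2 + \hat\beta_2 \sum_i X_{1i} X_{2i} \qquad (5)$$
A similar equation results from (3) and (4). We can simplify this equation using the following notation. Let us define:
$$S_{11} = \sum_i X_{1i}^2 - n\bar X_1^2, \qquad S_{12} = \sum_i X_{1i} X_{2i} - n\bar X_1 \bar X_2, \qquad S_{22} = \sum_i X_{2i}^2 - n\bar X_2^2$$
$$S_{1Y} = \sum_i X_{1i} Y_i - n\bar X_1 \bar Y, \qquad S_{2Y} = \sum_i X_{2i} Y_i - n\bar X_2 \bar Y$$
Equation (5) can then be written
$$S_{1Y} = \hat\beta_1 S_{11} + \hat\beta_2 S_{12} \qquad (6)$$
Similarly, equation (3) becomes
$$S_{2Y} = \hat\beta_1 S_{12} + \hat\beta_2 S_{22} \qquad (7)$$
We can solve these two equations to get:
$$\hat\beta_1 = \frac{S_{22} S_{1Y} - S_{12} S_{2Y}}{\Delta}$$
and
$$\hat\beta_2 = \frac{S_{11} S_{2Y} - S_{12} S_{1Y}}{\Delta}$$
where $\Delta = S_{11} S_{22} - S_{12}^2$. We may therefore obtain $\hat\alpha$ from equation (4).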
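The closed-form solution can be sketched in plain Python using the S11, S12, S22, S1Y, S2Y notation, where each S term is a sum of cross-products taken about the sample means. This is only an illustration under stated assumptions: the function name `ols_two_regressors` is invented here, and the data are hypothetical, constructed so that Y = 1 + 2·X1 + 3·X2 holds exactly and the estimates should recover those three values.

```python
def ols_two_regressors(y, x1, x2):
    """OLS estimates (alpha, beta1, beta2) for Y = alpha + beta1*X1 + beta2*X2 + u."""
    n = len(y)
    y_bar = sum(y) / n
    x1_bar = sum(x1) / n
    x2_bar = sum(x2) / n

    # Sums of squares and cross-products about the sample means.
    s11 = sum(a * a for a in x1) - n * x1_bar ** 2
    s22 = sum(b * b for b in x2) - n * x2_bar ** 2
    s12 = sum(a * b for a, b in zip(x1, x2)) - n * x1_bar * x2_bar
    s1y = sum(a * c for a, c in zip(x1, y)) - n * x1_bar * y_bar
    s2y = sum(b * c for b, c in zip(x2, y)) - n * x2_bar * y_bar

    # Delta = S11*S22 - S12^2; zero means X1 and X2 are perfectly collinear.
    delta = s11 * s22 - s12 ** 2
    if delta == 0:
        raise ValueError("X1 and X2 are perfectly collinear; no unique solution")

    beta1 = (s22 * s1y - s12 * s2y) / delta
    beta2 = (s11 * s2y - s12 * s1y) / delta
    # The intercept follows from the first normal equation (means form).
    alpha = y_bar - beta1 * x1_bar - beta2 * x2_bar
    return alpha, beta1, beta2


# Hypothetical data built so that Y = 1 + 2*X1 + 3*X2 with no error term.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

alpha, beta1, beta2 = ols_two_regressors(y, x1, x2)
print(f"alpha = {alpha:.4f}, beta1 = {beta1:.4f}, beta2 = {beta2:.4f}")
```

With real data the error term is not zero, so the estimates will not reproduce any "true" coefficients exactly; the collinearity check matters because Δ appears in both denominators.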
We can...

13 Multiple regression
In this chapter I will briefly outline how to use SPSS for Windows to run multiple regression analyses. This is a very simplified outline. It is important that you do
more reading on multiple regression before using it in your own research. A good
reference is Chapter 5 in Tabachnick and Fidell (2001), which covers the
underlying theory, the different types of multiple regression analyses and the
assumptions that you need to check.
Multiple regression is not just one technique but a family of techniques that
can be used to explore the relationship between one continuous dependent variable
and a number of independent variables or predictors (usually continuous). Multiple regression is based on correlation (covered in Chapter 11), but allows a more
sophisticated exploration of the interrelationship among a set of variables. This
makes it ideal for the investigation of more complex real-life, rather than
laboratory-based, research questions. However, you cannot just throw variables
into a multiple regression and hope that, magically, answers will appear. You
should have a sound theoretical or conceptual reason for the analysis and, in
particular, the...

...
Mortality Rates
Regression Analysis of Multiple Variables
Neil Bhatt
993569302
Sta 108 P. Burman
11 total pages
The question posed in this study is whether pollution has an impact on the mortality rate. The data come from 60 cities (n = 60). The response variable is Y = mortality rate per 100,000 population, and the candidate explanatory variables are education, the percentage of the population that is nonwhite, the percentage of the population deemed poor, precipitation, the amount of sulfur dioxide, and the amount of nitrogen dioxide.
Data:
60 Standard Metropolitan Statistical Areas (SMSAs) in the United States, obtained for the years 1959-1961. [Source: GC McDonald and JS Ayers, “Some applications of the ‘Chernoff Faces’: a technique for graphically representing multivariate data”, in Graphical Representation of Multivariate Data, Academic Press, 1978.]
From the data we can construct a scatter plot matrix to check visually, before any calculations, whether a correlation seems to exist.
Data Distribution:
Scatter Plot Matrix
As one can observe, the data cluster in a way that suggests a relationship between Y = mortality rate and the candidate X variables that may influence Y.
From this we construct a correlation matrix in order to see a relationship in matrix...