LINEAR REGRESSION MODELS W4315
HOMEWORK 2 ANSWERS February 15, 2010

Instructor: Frank Wood

1. (20 points) In the file "problem1.txt" (accessible on the professor's website), there are 500 pairs of data, where the first column is X and the second column is Y. The regression model is Y = β0 + β1 X + ε.

a. Draw 20 pairs of data randomly from this population of size 500. Use MATLAB to fit the regression model specified above and record the estimates of both β0 and β1. Do this 200 times, so that you have 200 estimates of β0 and β1. For each parameter, plot a histogram of the estimates.

b. The above 500 data points were actually generated by the model Y = 3 + 1.5X + ε, where ε ∼ N(0, 2^2). What is the exact distribution of the estimates of β0 and β1?

c. Superimpose the curves of the estimates' density functions from part b onto the two histograms respectively. Is each histogram a close approximation of the curve?

Answer:

First, read the data into MATLAB.

    pr1 = textread('problem1.txt');
    X = pr1(:,1);    % first column is X, as stated in the problem
    Y = pr1(:,2);    % second column is Y

Randomly draw 20 pairs of (X,Y) from the original data set, calculate the coefficients b0 and b1, and repeat the process 200 times.

    b0 = zeros(200,1);
    b1 = zeros(200,1);
    for i = 1:200
        indx = randsample(500,20);    % 20 distinct row indices
        x = X(indx);
        y = Y(indx);
        avg_x = mean(x);
        avg_y = mean(y);
        sxx = sum((x - avg_x).^2);
        sxy = sum((x - avg_x).*(y - avg_y));
        b1(i) = sxy/sxx;
        b0(i) = avg_y - b1(i)*avg_x;
    end

Draw histograms of the coefficients b0 and b1.

    figure; hist(b0)
    figure; hist(b1)
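The resampling experiment above can also be sketched in Python using only the standard library. This is an illustration, not the course's solution: because problem1.txt is not reproduced here, the sketch builds a stand-in population from the model stated in part b, and the uniform range chosen for X, the seeds, and the function names ols_fit and resample_estimates are all assumptions.

```python
import random


def ols_fit(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    avg_x = sum(x) / n
    avg_y = sum(y) / n
    sxx = sum((xi - avg_x) ** 2 for xi in x)
    sxy = sum((xi - avg_x) * (yi - avg_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return avg_y - b1 * avg_x, b1


def resample_estimates(X, Y, n_draws=200, sample_size=20, seed=0):
    """Repeatedly draw pairs without replacement and refit the line."""
    rng = random.Random(seed)
    b0s, b1s = [], []
    for _ in range(n_draws):
        idx = rng.sample(range(len(X)), sample_size)
        b0, b1 = ols_fit([X[i] for i in idx], [Y[i] for i in idx])
        b0s.append(b0)
        b1s.append(b1)
    return b0s, b1s


# Stand-in population (assumption): X uniform on [0, 10],
# Y = 3 + 1.5 X + eps with eps ~ N(0, 2^2), as in part b.
pop_rng = random.Random(1)
X = [pop_rng.uniform(0, 10) for _ in range(500)]
Y = [3 + 1.5 * x + pop_rng.gauss(0, 2) for x in X]

b0s, b1s = resample_estimates(X, Y)
mean_b0 = sum(b0s) / len(b0s)
mean_b1 = sum(b1s) / len(b1s)
print("mean of b0 estimates:", round(mean_b0, 3))
print("mean of b1 estimates:", round(mean_b1, 3))
```

The two lists b0s and b1s play the role of the MATLAB vectors b0 and b1 and could be passed to any histogram routine.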

Figure 1: Histogram of b0

Figure 2: Histogram of b1


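b. As we have seen,

\[ b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{\sum_i (X_i - \bar{X}) Y_i}{\sum_i (X_i - \bar{X})^2} = \sum_i K_i Y_i, \quad \text{where } K_i = \frac{X_i - \bar{X}}{\sum_i (X_i - \bar{X})^2}. \]

So $b_1$ is a linear combination of the $Y_i$. Since each $Y_i$ has a normal distribution, $b_1$ also follows a normal distribution.

\[ E(b_1) = \sum_i K_i E(Y_i) = \sum_i K_i (\beta_0 + \beta_1 X_i) = \sum_i K_i \beta_0 + \Big( \sum_i K_i X_i \Big) \beta_1 \]

\[ \sum_i K_i = \frac{\sum_i (X_i - \bar{X})}{\sum_i (X_i - \bar{X})^2} = 0 \]

\[ \sum_i K_i X_i = \frac{\sum_i (X_i - \bar{X}) X_i}{\sum_i (X_i - \bar{X})^2} = \frac{\sum_i (X_i - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} = 1 \]

E(b1) = 0 + 1 ∗...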
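The two identities used in this derivation, that the weights Ki sum to zero and that the products Ki Xi sum to one, can be checked numerically. The short Python sketch below does so on a small made-up data set; the X and Y values are hypothetical and serve only to exercise the algebra.

```python
import math

# Hypothetical data, chosen only to exercise the identities.
X = [1.0, 2.0, 4.0, 7.0, 11.0]
Y = [2.0, 3.5, 8.0, 13.0, 20.5]

x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)
sxx = sum((x - x_bar) ** 2 for x in X)

# K_i = (X_i - X_bar) / sum_i (X_i - X_bar)^2
K = [(x - x_bar) / sxx for x in X]

sum_K = sum(K)                                # should be 0
sum_KX = sum(k * x for k, x in zip(K, X))     # should be 1

# b1 written as a linear combination of the Y_i ...
b1_weights = sum(k * y for k, y in zip(K, Y))
# ... and b1 from the usual Sxy / Sxx formula.
b1_direct = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx

print("sum K_i     =", sum_K)
print("sum K_i X_i =", sum_KX)
print("b1 agrees   =", math.isclose(b1_weights, b1_direct))
```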

...
Simple Linear Regression Model
1. The following data represent the number of flash drives sold per day at a local computer shop and their prices.
| Price (x), in $ | Units Sold (y) |
| 34 | 3 |
| 36 | 4 |
| 32 | 6 |
| 35 | 5 |
| 30 | 9 |
| 38 | 2 |
| 40 | 1 |
a. Develop a scatter diagram for these data.
b. What does the scatter diagram indicate about the relationship between the two variables?
c. Develop the estimated regression equation and explain what the slope of the line indicates.
d. Compute the coefficient of determination and comment on the strength of the relationship between x and y.
e. Compute the sample correlation coefficient between the price and the number of flash drives sold.
f. Perform a t test and determine if the price and the number of flash drives sold are related. Let α = 0.01.
g. Perform an F test and determine if the price and the number of flash drives sold are related. Let α = 0.01.
ANS:
b. Negative linear relationship.
c. ŷ = 29.7857 - 0.7286x. The slope indicates that as the price goes up by $1, the number of units sold goes down by 0.7286 units.
d. r² = .8556; 85.56% of the variability in y is explained by the linear relationship between x and y.
e. rxy = -0.92; strong negative relationship.
f. t = -5.44 < -4.032 (df = 5); reject Ho, and conclude x and y are related.
g. F = 29.642...
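The answers to parts c through f can be reproduced from the seven data pairs with the usual sums of squares. The Python sketch below is one way to do that; the variable names are illustrative, and the critical value 4.032 is taken from the answer to part f rather than computed.

```python
import math

price = [34, 36, 32, 35, 30, 38, 40]   # x
units = [3, 4, 6, 5, 9, 2, 1]          # y
n = len(price)

x_bar = sum(price) / n
y_bar = sum(units) / n
sxx = sum((x - x_bar) ** 2 for x in price)
syy = sum((y - y_bar) ** 2 for y in units)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(price, units))

# Part c: estimated regression equation y_hat = b0 + b1 * x
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Part d: coefficient of determination = SSR / SST
ssr = b1 * sxy
sse = syy - ssr
r2 = ssr / syy

# Part e: sample correlation carries the sign of the slope
r = math.copysign(math.sqrt(r2), b1)

# Part f: t statistic for H0: beta1 = 0, with n - 2 degrees of freedom
mse = sse / (n - 2)
t = b1 / math.sqrt(mse / sxx)
t_crit = 4.032                          # two-tailed, alpha = 0.01, df = 5
reject_h0 = abs(t) > t_crit

print(f"y_hat = {b0:.4f} + ({b1:.4f}) x")
print(f"r^2 = {r2:.4f}, r = {r:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```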

...Linear regression deals with numerical measures that express the relationship between two variables. Relationships between variables can be strong or weak, and direct or inverse. One example is the amount McDonald's spends on advertising per month and its total sales in that month. Similarly, the amount of time a student spends studying statistics can be compared with the grades the student receives using the regression method. Formally, a regression equation is an equation that allows one to estimate the value of one variable based on the value of another.
Key objectives in performing a regression analysis include estimating the dependent variable Y based on a selected value of the independent variable X. For example, Nike could measure how much it spends on celebrity endorsements and the effect that spending has on sales in a month. Here the amount spent on celebrity endorsements would be the independent variable X. Without the X variable, Y would be impossible to estimate. The general form of the regression equation is Y hat = a + bX, where Y hat is the estimated value of the Y variable for a selected X value. a represents the Y-intercept; it is therefore the estimated value of Y when X = 0. Furthermore, b is the slope of the line, or the average change in Y hat for each change of one unit in the...

... and briefly explain your
reasoning.)
1. Assume we have a simple linear regression model:
. Given a random sample from the population, which of
the following statements is true?
a. OLS estimators are biased when BMI does not vary much in the sample.
b. OLS estimators are biased when the sample size is small (say 20 observations).
c. OLS estimators are biased when the error u captures perseverance and self-control, and you believe that people who are perseverant and have more self-control are less likely to be overweight.
d. None of the above.
2. Suppose you are interested in the effect of class attendance on college
performance, and plan to estimate the following model:
, where colGPA is current GPA and skipped is the
average number of classes skipped per week. Due to lack of information, students’
motivation is omitted from the regression. Assume that more motivated students are
less likely to skip classes. The OLS estimator of the coefficient for skipped will most likely
a) be biased away from zero, so that the impact of skipped on colGPA is
overestimated.
b) be biased toward zero, so that the impact of skipped on colGPA is underestimated.
c) be unbiased.
d) be biased, but not enough information to determine if the impact is
overestimated or underestimated.
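The direction of the omitted-variable effect described in question 2 can be explored with a small simulation. The Python sketch below is a toy illustration under invented assumptions: every coefficient, the noise levels, the sample size, and the seed are hypothetical, and the simulated skipped variable is not constrained to be non-negative. Motivation raises colGPA and lowers skipped, and the regression of colGPA on skipped leaves motivation out.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


rng = random.Random(42)
n = 20000
true_effect = -0.10   # hypothetical causal effect of skipped on colGPA

# Hypothetical data-generating process: motivated students skip less
# and, separately, earn higher grades.
motivation = [rng.gauss(0, 1) for _ in range(n)]
skipped = [2.0 - 0.5 * m + rng.gauss(0, 0.5) for m in motivation]
colGPA = [3.0 + true_effect * s + 0.3 * m + rng.gauss(0, 0.3)
          for s, m in zip(skipped, motivation)]

# Regression that omits motivation.
slope_omitting_motivation = ols_slope(skipped, colGPA)

# Large-sample value under these assumptions:
# true_effect + 0.3 * cov(skipped, motivation) / var(skipped)
#   = -0.10 + 0.3 * (-0.5) / 0.5 = -0.40
print("true effect:           ", true_effect)
print("slope without control: ", round(slope_omitting_motivation, 3))
```

Under these particular assumptions the estimated slope is more negative than the true effect, because the omitted variable is negatively correlated with skipped and positively related to colGPA.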
Questions 3‐5 are based on the following information:
...

EXERCISE 27 SIMPLE LINEAR REGRESSION
STATISTICAL TECHNIQUE IN REVIEW
Linear regression provides a means to estimate or predict the value of a dependent variable based on the value of one or more independent variables. The regression equation is a mathematical expression of a causal proposition emerging from a theoretical framework. The linkage between the theoretical statement and the equation is made prior to data collection and analysis. Linear regression is a statistical method of estimating the expected value of one variable, y, given the value of another variable, x. The term simple linear regression refers to the use of one independent variable, x, to predict one dependent variable, y.
The regression line is usually plotted on a graph, with the horizontal axis representing x (the independent or predictor variable) and the vertical axis representing y (the dependent or predicted variable) (see Figure 27-1). The value represented by the letter a is referred to as the y intercept, or the point where the regression line crosses or intercepts the y-axis. At this point on the regression line, x = 0. The value represented by the letter b is referred to as the slope, or the coefficient of x. The slope determines the...

...This article considers the relationship between two variables in two ways: (1) by using regression analysis and (2) by computing the correlation coefficient. By using the regression model, we can evaluate the magnitude of change in one variable due to a certain change in another variable. For example, an economist can estimate the amount of change in food expenditure due to a certain change in the income of a household by using the regression model. A sociologist may want to estimate the increase in the crime rate due to a particular increase in the unemployment rate. Besides answering these questions, a regression model also helps predict the value of one variable for a given value of another variable. For example, by using the regression line, we can predict the (approximate) food expenditure of a household with a given income. The correlation coefficient, on the other hand, simply tells us how strongly two variables are related. It does not provide any information about the size of the change in one variable as a result of a certain change in the other variable.
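The distinction drawn above, that the slope measures the size of the change while the correlation coefficient measures only the strength of the relationship, can be illustrated with a short Python sketch. The income and food expenditure figures are hypothetical, and the helper slope_and_r is an illustrative name. Doubling every food expenditure value doubles the slope but leaves the correlation unchanged.

```python
import math


def slope_and_r(x, y):
    """Return (least-squares slope of y on x, sample correlation)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical household data (same monetary units for both variables).
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]
food_doubled = [2 * f for f in food]

slope_1, r_1 = slope_and_r(income, food)
slope_2, r_2 = slope_and_r(income, food_doubled)

print("slopes:      ", round(slope_1, 4), round(slope_2, 4))
print("correlations:", round(r_1, 4), round(r_2, 4))
```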
Let us return to the example of an economist investigating the relationship between food expenditure and income. What factors or variables does a household consider when deciding how much money it should spend on food every week or every month? Certainly, income of the household is one factor....

...Linear-Regression Analysis
Introduction
Whitner Autoplex, located in Raytown, Missouri, is one of the AutoUSA dealerships. Whitner Autoplex includes Pontiac, GMC, and Buick franchises as well as a BMW store. Using data found on the AutoUSA website, Team D will use linear regression analysis to determine whether the purchase price of a vehicle purchased from Whitner Autoplex increases as the age of the consumer purchasing the vehicle increases. The data set provides the purchase price of 80 domestic and imported automobiles at Whitner Autoplex as well as the age of the consumers purchasing the vehicles. Team D selected the first 30 of the sampled domestic vehicles to use for this test. The business research question Team D will answer is: Does the purchase price of a vehicle increase as the age of the consumer increases? Team D will use a linear-regression analysis to test the relationship between the age of the consumers and the prices of the vehicles.
Five Step Hypothesis Testing
Team D will conduct the hypothesis test using the following five steps:
1. Formulate the hypothesis
2. State the decision rule
3. Calculate the Test Statistic
4. Make the decision
5. Interpret the results
Step 1- Formulate the Hypothesis
Using the research question: Does the purchase price of an automobile purchased at Whitner Autoplex increase as the age of the...

...1. What nursing action is required before you measure fundal height = have the patient empty her bladder; a full bladder makes the fundal height higher.
2. What should a nurse do to prevent heat loss from evaporation = dry the baby and remove the wet linen.
3. Child with cephalohematoma. What condition is associated with cephalohematoma = jaundice
4. Why do we assess gestational age in a baby = to identify developmental level
5. What kind of exam do we perform to assess gestational age = Ballard score
6. A baby has been circumcised. The mother calls the unit and complains that she saw a yellow crust on the penile area. What do you tell the mother = normal
7. You are teaching a mom how to use a bulb syringe. What will you tell her to do = tilt the baby's head to the side and suction the cheek
8. You are providing umbilical cord care. What will you do to provide this care = dye, open, dry, to prevent infection.
9. You have a patient who is breastfeeding and you want to prevent nipple trauma. What will you teach = latching on; make sure the areola is in the baby's mouth and the baby is sucking on it, not on the nipple alone.
10. When babies have jaundice and are placed on phototherapy, why should we make sure that they have fluids and are fed = to prevent dehydration and hypoglycemia and to promote growth
11. A neonate is 4 hours post-delivery, the mother is diabetic, and the baby is jittery = hypoglycemia; check blood sugar and feed the baby
12. A woman who...