Linear Regression Models Study Guide

  • Published: November 14, 2011
LINEAR REGRESSION MODELS W4315
HOMEWORK 2 ANSWERS February 15, 2010

Instructor: Frank Wood

1. (20 points) In the file "problem1.txt" (accessible on the professor's website), there are 500 pairs of data, where the first column is X and the second column is Y. The regression model is Y = β0 + β1 X + ε.

a. Draw 20 pairs of data randomly from this population of size 500. Use MATLAB to fit the regression model specified above and record the estimates of both β0 and β1. Do this 200 times. Thus you will have 200 estimates of β0 and β1. For each parameter, plot a histogram of the estimates.

b. The above 500 data are actually generated by the model Y = 3 + 1.5X + ε, where ε ∼ N(0, 2^2). What is the exact distribution of the estimates of β0 and β1?

c. Superimpose the curves of the estimates' density functions from part b onto the two histograms respectively. Is the histogram a close approximation of the curve?

Answer: First, read the data into MATLAB.

    pr1 = textread('problem1.txt');
    V1 = pr1(1:250,1);
    V2 = pr1(1:250,2);
    T1 = pr1(251:500,1);
    T2 = pr1(251:500,2);
    % Rows 1-250 (both columns) are stacked into X, rows 251-500 into Y
    X = [V1;V2];
    Y = [T1;T2];

Randomly draw 20 pairs of (X,Y) from the original data set, calculate the coefficients b0 and b1, and repeat the process 200 times.

    b0 = zeros(200,1);
    b1 = zeros(200,1);
    for i = 1:200
        indx = randsample(500,20);   % 20 indices drawn without replacement
        x = X(indx);
        y = Y(indx);
        avg_x = mean(x);
        avg_y = mean(y);
        sxx = sum((x - avg_x).^2);
        sxy = sum((x - avg_x) .* (y - avg_y));
        b1(i) = sxy/sxx;               % slope estimate
        b0(i) = avg_y - b1(i)*avg_x;   % intercept estimate
    end

Draw histograms of the coefficients b0 and b1.

    figure; hist(b0)
    figure; hist(b1)

Figure 1: Histogram of b0

Figure 2: Histogram of b1
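As a sanity check on the closed-form estimates used in the loop, the following minimal sketch compares them with MATLAB's built-in degree-1 polynomial fit. The data here are synthetic and only stand in for one 20-point draw (an assumption, since problem1.txt is not reproduced here); the names b1_closed, b0_closed and gap are illustrative.

```matlab
% Hypothetical synthetic data standing in for one 20-point draw
x = 10 * rand(20, 1);
y = 3 + 1.5 * x + 2 * randn(20, 1);

% Closed-form least-squares estimates, same formulas as in the loop
sxx = sum((x - mean(x)).^2);
sxy = sum((x - mean(x)) .* (y - mean(y)));
b1_closed = sxy / sxx;
b0_closed = mean(y) - b1_closed * mean(x);

% Same fit via polyfit: p(1) is the slope, p(2) the intercept
p = polyfit(x, y, 1);

% Largest absolute disagreement between the two approaches
gap = max(abs([b1_closed - p(1), b0_closed - p(2)]));
fprintf('largest difference: %g\n', gap);
```

The two approaches should agree up to floating-point rounding, whichever sample is drawn.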


b. As we know,

$b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{\sum_i (X_i - \bar{X}) Y_i}{\sum_i (X_i - \bar{X})^2} = \sum_i K_i Y_i$, where $K_i = \frac{X_i - \bar{X}}{\sum_i (X_i - \bar{X})^2}$.

So b1 is a linear combination of the Yi. Since each Yi has a normal distribution, b1 also follows a normal distribution.

$E(b_1) = \sum_i K_i E(Y_i) = \sum_i K_i (\beta_0 + \beta_1 X_i) = \left(\sum_i K_i\right)\beta_0 + \left(\sum_i K_i X_i\right)\beta_1$

$\sum_i K_i = \frac{\sum_i (X_i - \bar{X})}{\sum_i (X_i - \bar{X})^2} = 0$

$\sum_i K_i X_i = \frac{\sum_i (X_i - \bar{X}) X_i}{\sum_i (X_i - \bar{X})^2} = \frac{\sum_i (X_i - \bar{X})(X_i - \bar{X})}{\sum_i (X_i - \bar{X})^2} = 1$

$E(b_1) = 0 + 1 \cdot \beta_1 = \beta_1$

$Var(b_1) = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}$ (see the proof in the homework 1 solution)

Therefore, $b_1 \sim N\left(\beta_1, \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}\right)$.

$b_0 = \bar{Y} - b_1 \bar{X}$

$E(b_0) = \beta_0$

$Var(b_0) = \left(\frac{1}{n} + \frac{\bar{X}^2}{\sum_i (X_i - \bar{X})^2}\right)\sigma^2$

Therefore, $b_0 \sim N\left(\beta_0, \left(\frac{1}{n} + \frac{\bar{X}^2}{\sum_i (X_i - \bar{X})^2}\right)\sigma^2\right)$.

Since the data are generated by the model Y = 3 + 1.5X + ε, where ε ∼ N(0, 2^2), we have β0 = 3, β1 = 1.5 and σ^2 = 4. The mean and variance of b0 and b1 can thus be determined.

Calculate the variance of b0 and b1 in MATLAB.

    avg_X = mean(X);
    avg_Y = mean(Y);
    SXX = sum((X - avg_X).^2);
    SXY = sum((X - avg_X) .* (Y - avg_Y));
    B1 = SXY/SXX;
    B0 = avg_Y - B1*avg_X;
    var_B1 = 4/SXX                           % sigma^2 / SXX
    var_B0 = 4*(1/500 + ((avg_X).^2)/SXX)    % sigma^2 * (1/n + Xbar^2/SXX)
    sd_B0 = sqrt(var_B0)
    sd_B1 = sqrt(var_B1)

The results showed that Var(b0) = 0.0334 and Var(b1) = 9.457E-004.

The exact distribution of the estimates of β0 and β1 is b0 ∼ N(3, 0.0334); b1 ∼ N(1.5, 9.457E-004).
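The variance formulas for b0 and b1 treat the X values as fixed and only the errors as random. The following Monte Carlo sketch checks them under that assumption. It uses a hypothetical fixed design of 20 synthetic X values rather than the course data, and the names Ysim, b1_sim and b0_sim are illustrative.

```matlab
% Assumed setup: a fixed design of n X values and the model from part b
n = 20;  N = 20000;                  % sample size, number of replications
beta0 = 3;  beta1 = 1.5;  sigma = 2;
x = 10 * rand(n, 1);                 % hypothetical X values, held fixed
xbar = mean(x);
sxx = sum((x - xbar).^2);

% N simulated response vectors, one per column
Ysim = repmat(beta0 + beta1 * x, 1, N) + sigma * randn(n, N);

% b1 = sum_i K_i Y_i with K_i = (X_i - Xbar)/Sxx;  b0 = Ybar - b1*Xbar
K = (x - xbar) / sxx;
b1_sim = K' * Ysim;                  % 1-by-N slope estimates
b0_sim = mean(Ysim, 1) - b1_sim * xbar;

% Theoretical variances from the formulas in part b
var_b1_theory = sigma^2 / sxx;
var_b0_theory = sigma^2 * (1/n + xbar^2 / sxx);
fprintf('Var(b1): simulated %.5f, theoretical %.5f\n', var(b1_sim), var_b1_theory);
fprintf('Var(b0): simulated %.5f, theoretical %.5f\n', var(b0_sim), var_b0_theory);
```

With this many replications the simulated and theoretical variances should agree to within a few percent. Both variances depend on the sample size and spread of the X values that actually enter each fit.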

c. Having obtained the exact distributions of the estimates in part b, we can now plot their pdf curves and compare them with the histograms.

    a = 0:0.1:6;
    mu = 3;
    sigma = sd_B0;
    pdfNormal = normpdf(a, mu, sigma);
    [n, xout] = hist(b0);
    n = 6*n/200;          % rescale the bin counts
    figure;
    bar(xout, n)
    hold on;
    plot(a, pdfNormal)
    hold off
    xlabel('b0')
    ylabel('6*Frequency')

Figure 3: Histogram and the pdf curve of b0 on the same plot

    b = 1:0.1:2;
    mu = 1.5;
    sigma = sd_B1;
    pdfNormal = normpdf(b, mu, sigma);
    [n, xout] = hist(b1);
    n = 40*n/200;         % rescale the bin counts
    figure;
    bar(xout, n)
    hold on;
    plot(b, pdfNormal)
    hold off
    xlabel('b1')
    ylabel('40*Frequency')


Figure 4: Histogram and pdf curve of b1 on the same plot

As we can see from Figure 3 and Figure 4, the shape of each histogram of the coefficients obtained from the 200 simulations is similar to that of the density curve of the corresponding coefficient's distribution.

2. (20 points) Using the same data set as in the last problem, we will estimate β0 and β1 using the Newton-Raphson method.

a. Draw a 3d plot using MATLAB (check "surf"...