Regression Analysis of Multiple Variables

Neil Bhatt

993569302

Sta 108 P. Burman

11 total pages

The question posed in this study is whether pollution has an impact on the mortality rate. The data come from 60 cities (n = 60). The response variable Y is the mortality rate per 100,000 population. The explanatory variables are education (EDUC), the percentage of the population that is nonwhite (NONWHITE), the percentage of the population classified as poor (POOR), precipitation (PRECIP), the amount of sulfur dioxide, and the amount of nitrogen dioxide (NOX).
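The report regresses mortality on several variables. Here is a hedged sketch, assuming an ordinary multiple linear regression in which all six explanatory variables enter linearly. The label SO2 for sulfur dioxide is an assumption, since the correlation matrix shown below has no sulfur dioxide column. Under these assumptions the model for city i can be written as:

```latex
Y_i = \beta_0 + \beta_1\,\mathrm{EDUC}_i + \beta_2\,\mathrm{NONWHITE}_i + \beta_3\,\mathrm{POOR}_i
      + \beta_4\,\mathrm{PRECIP}_i + \beta_5\,\mathrm{SO2}_i + \beta_6\,\mathrm{NOX}_i + \varepsilon_i,
\qquad i = 1, \dots, 60,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
```

Here β_0 is the intercept. Each β_j is the change in expected mortality per unit change in its variable, holding the others fixed. The independent normal errors are the usual assumption of the normal-error regression model, not something stated in the report.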

Data:

60 Standard Metropolitan Statistical Areas (SMSAs) in the United States, obtained for the years 1959-1961. [Source: GC McDonald and JS Ayers, “Some applications of the ‘Chernoff Faces’: a technique for graphically representing multivariate data”, in Graphical Representation of Multivariate Data, Academic Press, 1978.]

Using the data, we first construct a scatter plot matrix. This lets us check visually whether any correlations appear to exist before doing any calculations.

Data Distribution:

Scatter Plot Matrix

As one can observe, the points appear to cluster along roughly linear trends between Y = mortality rate and several of the potential explanatory variables X. This suggests that some of these variables are correlated with mortality.

Next, we construct a correlation matrix to quantify these pairwise relationships.
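The report does not say how the matrix was computed. The following is a minimal sketch under two assumptions. First, the entries are sample Pearson correlations, which is the default for R's cor(), and the output format resembles R's. Second, each variable is stored as a list of its 60 observations keyed by name. The helper names pearson and correlation_matrix are hypothetical. Each entry is r = Σ(x_i − x̄)(y_i − ȳ) / sqrt(Σ(x_i − x̄)² · Σ(y_i − ȳ)²).

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation between two equal-length lists of observations.

    Raises ZeroDivisionError if either list has zero variance (all values equal).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-product and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pearson correlation for every ordered pair of named variables."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}
```

For example, correlation_matrix({"X": [1, 2, 3, 4], "Y": [2, 4, 6, 8], "Z": [4, 3, 2, 1]}) gives 1 for X and Y and −1 for X and Z. These toy values are illustrative only, not the SMSA data. Pearson's r is symmetric in its two arguments, so the result is a symmetric matrix with ones on the diagonal, matching the layout of the table below.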

Correlation Matrix

                  EDUC   MORTALITY    NONWHITE         NOX        POOR      PRECIP
EDUC         1.0000000 -0.51098130  -0.2087739  0.22440191 -0.40333845  -0.4904252
MORTALITY   -0.5109813  1.00000000   0.6437364 -0.07738105  0.41045399   0.5094924
NONWHITE    -0.2087739  0.64373637   1.0000000  0.01838530  0.70491501   0.4132045
NOX          0.2244019 -0.07738105   0.0183853  1.00000000 -0.10254386  -0.4873207
POOR        -0.4033385  0.41045399   0.7049150 -0.10254386  1.00000000