Multiple Regression Problem for Wine Quality
The purpose of this regression analysis was to model wine quality from the wine's chemical properties. An evaluation like this could help with quality assurance in the wine market. We collected our data from the Machine Learning Repository, a data mining website. The data we obtained contain variables such as fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol, which we used to help predict the quality of the wine.
The first step in our regression analysis was to use SAS to run stepwise selection and backward elimination in order to remove unneeded variables. The summary of these two procedures indicated that pH, total sulfur dioxide, volatile acidity, density, alcohol, and sulphates could all be removed from the models we were comparing. Once these variables were eliminated, three candidate models were built and compared against one another to determine which was best. Model one used color ($x_1$), fixed acidity ($x_2$), citric acid ($x_3$), residual sugar ($x_4$), and free sulfur dioxide ($x_5$): $\mu = 5.8255 + 0.2117x_1 - 0.1104x_2 + 1.4832x_3 - 0.0597x_4 + 0.0183x_5$. Model two used color ($x_1$), citric acid ($x_2$), residual sugar ($x_3$), and free sulfur dioxide ($x_4$): $\mu = 5.0404 + 0.3279x_1 + 1.1687x_2 - 0.0607x_3 + 0.0183x_4$. Model three used citric acid ($x_1$), residual sugar ($x_2$), and free sulfur dioxide ($x_3$): $\mu = 4.9968 + 1.6035x_1 - 0.0577x_2 + 0.02188x_3$. Once the models were set up, we compared their t-statistics and p-values and found that model three had the smallest p-values and also the lowest variance inflation factors, so model three was chosen as the best model.
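The selection step above was done in SAS. For readers who want to see the mechanics, here is a minimal Python sketch of backward elimination, assuming NumPy is available. It is not the SAS implementation: the function names `ols_pvalues` and `backward_eliminate` are hypothetical, the threshold `alpha=0.10` is chosen only for illustration, and the p-values use a normal approximation to the t distribution, which is reasonable only for large samples. The data in the usage lines are synthetic, not the wine data.

```python
import math

import numpy as np


def ols_pvalues(X, y):
    """Fit OLS with an intercept; return coefficients and two-sided p-values.

    Assumption: p-values use a normal approximation to the t distribution,
    which is only adequate when the sample size is large.
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])          # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]
    sigma2 = resid @ resid / dof                   # residual mean square (MSE)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    t = beta / se
    # Two-sided p-value: 2 * (1 - Phi(|t|)) == erfc(|t| / sqrt(2))
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    return beta, p


def backward_eliminate(X, y, names, alpha=0.10):
    """Repeatedly drop the predictor with the largest p-value above alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        _, p = ols_pvalues(X[:, keep], y)
        p_pred = p[1:]                             # skip the intercept
        worst = int(np.argmax(p_pred))
        if p_pred[worst] <= alpha:
            break                                  # all remaining predictors pass
        del keep[worst]
    return [names[j] for j in keep]


# Synthetic (hypothetical) data: y depends on x0 and x2; x1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 5.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=500)
# x0 and x2 are kept; x1 is dropped unless it looks significant by chance.
print(backward_eliminate(X, y, ["x0", "x1", "x2"]))
```

Stepwise selection differs in that it adds variables one at a time and may also remove a previously added variable at each step; this sketch covers only the backward direction.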
After we ran model three, whose variables are citric acid, residual sugar, and free sulfur dioxide, in SAS, the variance inflation factors showed no signs of multicollinearity. The next step was to run a complete regression analysis of model three. The residual by...
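A variance inflation factor measures how much the variance of a coefficient estimate is inflated by correlation with the other predictors: $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the $R^2$ from regressing predictor $x_j$ on the remaining predictors. The following is a minimal sketch of that calculation, assuming NumPy; the helper names `r_squared` and `vif` are hypothetical, and the data are synthetic rather than the wine data.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on X (an intercept column is added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sst = np.sum((y - y.mean()) ** 2)              # total sum of squares
    return 1.0 - resid @ resid / sst


def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)           # every predictor except x_j
        r2 = r_squared(others, X[:, j])
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
print(vif(X))                                      # independent columns: all close to 1
X_dup = np.column_stack([X, X[:, 0] + 0.01 * rng.normal(size=1000)])
print(vif(X_dup))                                  # columns 0 and 3 become very large
```

A commonly cited rule of thumb treats VIFs above 10 as a sign of serious multicollinearity, while values near 1 indicate essentially none.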