EXERCISE 27 SIMPLE LINEAR REGRESSION
STATISTICAL TECHNIQUE IN REVIEW
Linear regression provides a means to estimate or predict the value of a dependent variable based on the value of one or more independent variables. The regression equation is a mathematical expression of a causal proposition emerging from a theoretical framework. The linkage between the theoretical statement and the equation is made prior to data collection and analysis. Linear regression is a statistical method of estimating the expected value of one variable, y, given the value of another variable, x. The term simple linear regression refers to the use of one independent variable, x, to predict one dependent variable, y. The regression line is usually plotted on a graph, with the horizontal axis representing x (the independent or predictor variable) and the vertical axis representing y (the dependent or predicted variable) (see Figure 27-1). In the regression equation Ŷ = a + bx, the value represented by the letter a is referred to as the y intercept, the point where the regression line crosses or intercepts the y-axis. At this point on the regression line, x = 0. The value represented by the letter b is referred to as the slope, or the coefficient of x. The slope determines the direction and angle of the regression line within the graph and expresses the extent to which y changes for every 1-unit change in x. The score on variable y (dependent variable) is predicted from the subject's known score on variable x (independent variable). The predicted score or estimate is referred to as Ŷ (expressed as y-hat) (Burns & Grove, 2005).
FIGURE 27-1 Graph of a Simple Linear Regression Line
Simple linear regression is an effort to explain the dynamics within a scatter plot by drawing a straight line through the plotted scores. No single regression line can be used to predict with complete accuracy every y value from every x value. However, the purpose of the regression equation is to develop the line that allows the highest degree of prediction possible, the line of best fit. The procedure for developing the line of best fit is the method of least squares. If the data were perfectly correlated, all data points would fall along the straight line or line of best fit. In actual studies, not all data points fall on the line of best fit, but the line provides the best equation for predicting y: for any given value of x, the predicted value of y is found at the corresponding point on the line. The algebraic equation for the regression line of best fit is:
Ŷ = a + bx
where:
Ŷ is the predicted value of y,
a is the y intercept and represents the value of y when x = 0 (see Figure 27-1); a is also called the regression constant,
b is the slope of the line, that is, the amount of change in y for each one-unit change in x; b is also called the regression coefficient.
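The method of least squares named above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: it uses the standard least-squares formulas (slope b equals the sum of cross-products of deviations divided by the sum of squared deviations of x, and intercept a equals the mean of y minus b times the mean of x). The function name fit_line and the sample data are hypothetical and were made up for this sketch.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired (x, y) observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b = sxy / sxx            # slope (regression coefficient)
    a = y_bar - b * x_bar    # y intercept (regression constant)
    return a, b


# Made-up observations for illustration only (not from any study).
weeks = [22, 24, 26, 28, 30]
grams = [950, 970, 1030, 1050, 1110]
a, b = fit_line(weeks, grams)
print(f"y-hat = {a:.1f} + {b:.1f}x")
```

Because these sample points do not lie exactly on one straight line, some fall above and some below the fitted line, which is the situation the line of best fit is meant to handle.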
In Figure 27-2, the x-axis represents Gestational Age and the y-axis represents Birth Weight. As gestational age increases from 20 weeks to 34 weeks, birth weight also increases. In other words, the slope of the line is positive. This line of best fit can be used to predict the birth weight (dependent variable) for an infant based on his or her gestational age in weeks (independent variable). Figure 27-2 is an example of a line of best fit and was not developed from research. In addition, the x-axis starts at 22 weeks rather than at 0, which is the usual starting point in a regression figure. Using the formula Ŷ = a + bx, the birth weight of a baby born at 28 weeks of gestation is calculated below.
FIGURE 27-2 Example Line of Best Fit for Gestational Age and Birth Weight
Formula: Ŷ = a + bx
In this example, a = 500, b = 20, and x = 28 weeks
Ŷ = 500 + 20(28) = 500 + 560 = 1,060 grams
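The same arithmetic can be checked with a short Python sketch. The values a = 500, b = 20, and x = 28 come from the example above; the function name predict is a hypothetical helper introduced only for illustration.

```python
def predict(a, b, x):
    """Return the predicted value y-hat = a + b*x."""
    return a + b * x


# Values from the worked example: a = 500, b = 20, x = 28 weeks.
birth_weight = predict(a=500, b=20, x=28)
print(f"Predicted birth weight: {birth_weight:,} grams")
# prints "Predicted birth weight: 1,060 grams"
```

Substituting a different gestational age for x gives the predicted birth weight at that point on the same line.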
The regression line represents Ŷ for any given value of x. As you can see, some data points fall above the line and some fall below the line. If we substitute any x value in...