Jamie DeCoster

Department of Psychology

University of Alabama

348 Gordon Palmer Hall

Box 870348

Tuscaloosa, AL 35487-0348

Phone: (205) 348-4431

Fax: (205) 348-8648

September 26, 2006

Textbook references refer to Cohen, Cohen, West, & Aiken’s (2003) Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. I would like to thank Angie Maitner and Anne-Marie Leistico for comments made on earlier versions of these notes. If you wish to cite the contents of this document, the APA reference for them would be:

DeCoster, J. (2006). Applied Linear Regression Notes set 1. Retrieved (month, day, and year you downloaded this file, without the parentheses) from http://www.stat-help.com/notes.html

For future versions of these notes or help with data analysis visit http://www.stat-help.com

ALL RIGHTS TO THIS DOCUMENT ARE RESERVED

Contents

1 Introduction and Review

2 Bivariate Correlation and Regression

3 Multiple Correlation and Regression

4 Regression Assumptions and Basic Diagnostics

5 Sequential Regression, Stepwise Regression, and Analysis of IV Sets

6 Dealing with Nonlinear Relationships

7 Interactions Among Continuous IVs

8 Regression with Categorical IVs

9 Interactions involving Categorical IVs

10 Outlier and Multicollinearity Diagnostics


Chapter 1

Introduction and Review

1.1 Data, Data Sources, and Data Sets

• Most generally, data can be defined as a list of numbers with meaningful relations. We are interested in data because understanding the relations among the numbers can help us understand the relations among the things that the numbers measure.

• The numbers that you collect from an experiment, survey, or archival source are known as a data source. Before you can learn anything from a data source, however, you must first translate it into a data set. A data set is a representation of a data source, defining a set of “variables” that are measured on a set of “cases.”

◦ A variable is simply a feature of an object that can be categorized or measured by a number. A variable takes on different values to reflect the particular nature of the object being observed. The values that a variable takes will change when measurements are made on different objects at different times. A data set will typically contain measurements on several different variables.

◦ Each time that we record information about an object we create a case in the data set. Cases are also sometimes referred to as observations. Like variables, a data set will typically contain multiple cases. The cases should all be derived from observations of the same type of object, with each case representing a different example of that type. The object “type” that defines your cases is called your unit of analysis. Sometimes the unit of analysis in a data set will be very small and specific, such as the individual responses on a questionnaire. Sometimes it will be very large, such as companies or nations. The most common unit of analysis in social science research is the participant or subject.

◦ Data sets are traditionally organized in a table where each column represents a different variable and each row represents a different case. The value in a particular cell of the table represents the value of the variable corresponding to the column for the case corresponding to the row. For example, if you give a survey to a bunch of different people, you might choose to organize your data set so that each column represents an item on the survey and each row represents a different person who answered your survey. A cell inside the data set would hold the value that the person represented by the row gave to the item represented by the column.

• The distinction between a data source and a data set is important because you can create a number of different data sets from a single data source by choosing different...