Binary Dependent Variables and the Linear Probability Model


Many of the decisions made by people are binary. What factors drive a person's decision? This question leads to regression with a binary dependent variable.

The binary choice problem is an example of models with limited dependent variables (see Appendix 9.3 for details). Note that the multiple regression model discussed earlier does not preclude a dependent variable from being binary.

Boston HMDA data set


Under the Home Mortgage Disclosure Act (HMDA), researchers at the Federal Reserve Bank of Boston collected information about mortgage applicants and lending institutions (banks and others) in the greater Boston metropolitan area in 1990.


The full data set contains 2,925 observations, consisting of all mortgage applications by blacks and Hispanics plus a random sample of mortgage applications by whites. Here we use only the subsample of applications for single-family residences, which contains 2,380 observations. In this data set, 28% of black applicants were denied mortgages, while only 9% of white applicants were denied. Does this indicate some discriminatory treatment of applications?

Key variables in our example:

deny: A binary variable that is one if and only if the loan application is denied.

P/I ratio: The applicant's anticipated monthly loan payment divided by his/her monthly income.

Q: What relationship would you expect between deny and the P/I ratio?

A:


Note: This graph was created using only 127 of the 2,380 observations.

Q: What can we learn from the OLS regression result?

A:

Provided that the linear regression of deny on the P/I ratio is the correct specification, we have that

Pr[deny = 1 | P/I ratio] = E[deny | P/I ratio] = β0 + β1 × P/I ratio.

The OLS regression of deny on the P/I ratio estimates β0 and β1 in this equality, so that we can learn the probability of application denial conditional on the P/I ratio.
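The link between the regression line and the conditional probability can be checked on simulated data. The sketch below is illustrative only: the population values `BETA0` and `BETA1`, the sample size, and the grid of regressor values are hypothetical choices, not quantities from the HMDA data. It generates a binary outcome whose probability of being one is linear in the regressor, runs OLS, and compares the fitted line with the share of ones at each regressor value.

```python
import numpy as np

# Hypothetical population values chosen for illustration only
# (they are NOT estimates from the HMDA data).
BETA0, BETA1 = 0.05, 0.60

rng = np.random.default_rng(0)

# Regressor taking a few discrete values, so that the sample share
# of Y = 1 can be computed at each value.
x_values = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
x = rng.choice(x_values, size=200_000)

# Binary outcome with Pr[Y = 1 | X] = BETA0 + BETA1 * X.
y = (rng.random(x.size) < BETA0 + BETA1 * x).astype(float)

# OLS of y on a constant and x.
design = np.column_stack([np.ones_like(x), x])
b0_hat, b1_hat = np.linalg.lstsq(design, y, rcond=None)[0]

# Compare the fitted line with the share of ones at each x value.
shares = np.array([y[x == v].mean() for v in x_values])
fits = b0_hat + b1_hat * x_values
for v, share, fit in zip(x_values, shares, fits):
    print(f"x = {v:.1f}: share of Y=1 = {share:.3f}, OLS fit = {fit:.3f}")
```

With a large sample, the two columns should be close at every regressor value, which is the sense in which OLS lets us learn the conditional probability when the linear specification is correct.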

Linear probability model

Definition:

Suppose that a binary random variable Y and random variables X1, X2, ..., Xk are related through

Y = β0 + β1 X1 + · · · + βk Xk + u

in a population, where u is the error term satisfying E[u | X1, ..., Xk] = 0, and β0, β1, ..., βk are real constants.

Q: What does βj capture (j = 1, 2, ..., k)?

A:
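One way to approach this question is to take conditional expectations of both sides of the model. The derivation below is a sketch that uses only the definition above, together with the fact that the expectation of a binary variable equals the probability that it is one.

```latex
\begin{align*}
\Pr[Y = 1 \mid X_1, \dots, X_k]
  &= E[Y \mid X_1, \dots, X_k] \\
  &= \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + E[u \mid X_1, \dots, X_k] \\
  &= \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k .
\end{align*}
```

Read this way, βj is the change in Pr[Y = 1 | X1, ..., Xk] associated with a one-unit change in Xj, holding the other regressors fixed.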


Apart from the interpretation of the coefficients of the regression model, having a binary dependent variable changes nothing in the regression analysis.

• You can test hypotheses on regression parameters using the t and F tests.

• You can form confidence intervals for the regression parameters using the same technique you have been using.

• You can assess the effects of changes in the regressors on Pr[Y = 1 | X1, ..., Xk] in the same way as you did in analyzing causal effects before.

• R² works as well. But it may not be as attractive as before, because when you predict a binary variable, your prediction is naturally binary, not the value given by the regression function. We will discuss some alternative measures for goodness-of-fit later.
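The point about R² can be illustrated with a small simulation. This is a minimal sketch on made-up data: the coefficients 0.1 and 0.6 and the sample size are hypothetical. It computes R² in the usual way and shows that the outcome takes only the values 0 and 1, whereas the fitted values from the regression function lie strictly in between.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data; the coefficients 0.1 and 0.6 are hypothetical.
x = rng.uniform(0.0, 1.0, size=5_000)
y = (rng.random(x.size) < 0.1 + 0.6 * x).astype(float)

# OLS fit of the linear probability model.
design = np.column_stack([np.ones_like(x), x])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
fitted = design @ coef

# R^2 = 1 - SSR / TSS, computed exactly as for a continuous outcome.
ssr = np.sum((y - fitted) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ssr / tss

print(f"R^2 = {r_squared:.3f}")
print("distinct outcome values:", np.unique(y))
print(f"fitted values range from {fitted.min():.3f} to {fitted.max():.3f}")
```

Because the fitted values can never coincide with a binary outcome unless the probability is exactly 0 or 1, R² stays well below one here even though the linear specification is correct by construction.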

Application to the Boston HMDA data

The OLS regression of deny on the P/I ratio using the 2,380 observations yields:

deny = −0.080 (0.032) + 0.604 (0.098) × P/I ratio.

This indicates:

• The higher the P/I ratio, the higher the probability of denial.
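Whether this positive relationship is statistically distinguishable from zero can be checked with the usual t statistic and confidence interval. The sketch below assumes that the numbers in parentheses in the fitted equation are standard errors (the usual reporting convention, though not stated explicitly here) and uses the large-sample normal critical value 1.96.

```python
# Slope estimate reported above; the value in parentheses is taken to be
# its standard error (an assumption based on the usual convention).
slope, slope_se = 0.604, 0.098

# t-statistic for the null hypothesis that the slope is zero.
t_stat = slope / slope_se

# 95% confidence interval using the large-sample normal critical value.
CRIT_95 = 1.96
ci_low = slope - CRIT_95 * slope_se
ci_high = slope + CRIT_95 * slope_se

print(f"t = {t_stat:.2f}")
print(f"95% CI for the slope: [{ci_low:.3f}, {ci_high:.3f}]")
```

Under these assumptions the t statistic is about 6.2 and the interval runs from roughly 0.41 to 0.80, so the slope is positive at conventional significance levels.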

Q: How much would the probability of denial increase when the P/I ratio increases by 10 percentage points (0.1)?

A:

Q: What is the probability of denial when the P/I ratio is 30% (0.3)?

A:

We now add the black dummy to account for the possible effect of race. Then the OLS regression shows:

deny = −0.091 (0.029) + 0.559 (0.089) × P/I ratio + 0.177 (0.025) × black.
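The two fitted equations reported above can be evaluated directly to see what they imply for particular applicants. The following is a sketch of that arithmetic; the helper name `predicted_denial` is hypothetical, and the P/I ratio is entered as a fraction (0.3 for 30%).

```python
def predicted_denial(pi_ratio, black=None):
    """Fitted denial probability from the two reported regressions.

    With black=None the regression without the race dummy is used;
    otherwise black should be 0 or 1. (Hypothetical helper.)
    """
    if black is None:
        return -0.080 + 0.604 * pi_ratio
    return -0.091 + 0.559 * pi_ratio + 0.177 * black


# First regression: effect of raising the P/I ratio from 0.3 to 0.4.
effect_of_0_1 = predicted_denial(0.4) - predicted_denial(0.3)

# First regression: fitted probability at a P/I ratio of 0.3.
p_at_0_3 = predicted_denial(0.3)

# Second regression: same P/I ratio, differing only in the black dummy.
p_white = predicted_denial(0.3, black=0)
p_black = predicted_denial(0.3, black=1)
gap = p_black - p_white

print(f"effect of a 0.1 increase in P/I: {effect_of_0_1:.4f}")
print(f"fitted probability at P/I = 0.3: {p_at_0_3:.4f}")
print(f"white: {p_white:.4f}, black: {p_black:.4f}, gap: {gap:.4f}")
```

Because the model is linear, the effect of a 0.1 increase in the P/I ratio is 0.1 times the slope regardless of the starting point, and the gap between the two groups equals the coefficient on black at every P/I ratio.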

• The slope estimate for the P/I ratio is virtually the same in this regression (0.559) as in the previous one without black (0.604).

• The slope for black...