Linear Probability Model

The linear probability model, ctd.
When Y is binary, the linear regression model
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
is called the linear probability model.
• The predicted value is a probability:
  • E(Y|X=x) = Pr(Y=1|X=x) = the probability that Y = 1 given X = x (because Y takes only the values 0 and 1, E(Y|X=x) = 1×Pr(Y=1|X=x) + 0×Pr(Y=0|X=x))
  • $\hat{Y}_i$ = the predicted probability that $Y_i = 1$, given X
• $\beta_1$ = change in the probability that Y = 1 per unit change in X, for a given ∆x:
$$\beta_1 = \frac{\Pr(Y = 1 \mid X = x + \Delta x) - \Pr(Y = 1 \mid X = x)}{\Delta x}$$
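
A minimal Python sketch of the slope interpretation. The coefficient values `BETA0` and `BETA1` are hypothetical, chosen only for illustration. Because the model is linear in X, the difference quotient above equals $\beta_1$ for every starting value x and every step ∆x.

```python
# Hypothetical LPM coefficients, for illustration only (not estimated from data).
BETA0, BETA1 = 0.1, 0.5


def lpm_prob(x, beta0=BETA0, beta1=BETA1):
    """Pr(Y = 1 | X = x) under the linear probability model."""
    return beta0 + beta1 * x


def difference_quotient(x, dx, beta0=BETA0, beta1=BETA1):
    """[Pr(Y = 1 | X = x + dx) - Pr(Y = 1 | X = x)] / dx."""
    return (lpm_prob(x + dx, beta0, beta1) - lpm_prob(x, beta0, beta1)) / dx


# The quotient does not depend on x or dx: it is always beta1.
for x, dx in [(0.2, 0.1), (0.5, 0.25), (1.0, -0.3)]:
    print(x, dx, round(difference_quotient(x, dx), 10))
```

In a nonlinear model such as probit or logit this quotient would vary with x and ∆x; in the LPM it is constant.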

Example: linear probability model, HMDA data
Mortgage denial v. ratio of debt payments to income (P/I ratio) in the HMDA data set (subset)

Linear probability model: HMDA data, ctd.
$$\widehat{\text{deny}} = \underset{(.032)}{-.080} + \underset{(.098)}{.604}\,\text{P/I ratio} \qquad (n = 2380)$$
(standard errors in parentheses)

• What is the predicted value for P/I ratio = .3? $\widehat{\Pr}(\text{deny} = 1 \mid \text{P/I ratio} = .3) = -.080 + .604 \times .3 = .101$
• Calculating "effects": increase the P/I ratio from .3 to .4: $\widehat{\Pr}(\text{deny} = 1 \mid \text{P/I ratio} = .4) = -.080 + .604 \times .4 = .162$
The effect on the probability of denial of an increase in the P/I ratio from .3 to .4 is to increase the probability by .162 − .101 ≈ .06 (exactly .604 × .1 = .0604), that is, by about 6 percentage points.
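
The predicted probabilities and the effect can be checked with a short Python sketch that plugs the reported coefficients (−.080 and .604) into the fitted line. The function name `predicted_deny_prob` is illustrative, not taken from the source.

```python
# Coefficients of the estimated LPM for the HMDA subset (n = 2380).
B0, B1 = -0.080, 0.604


def predicted_deny_prob(pi_ratio):
    """Predicted Pr(deny = 1 | P/I ratio) from the fitted linear probability model."""
    return B0 + B1 * pi_ratio


p3 = predicted_deny_prob(0.3)
p4 = predicted_deny_prob(0.4)
effect = p4 - p3  # equals B1 * 0.1 because the model is linear

print(f"Pr(deny = 1 | P/I ratio = .3) = {p3:.3f}")  # 0.101
print(f"Pr(deny = 1 | P/I ratio = .4) = {p4:.3f}")  # 0.162
print(f"effect = {effect:.3f} ({100 * effect:.1f} percentage points)")  # 0.060 (6.0 ...)
```

Because the LPM is linear, the effect of a .1 increase in the P/I ratio is the same .0604 whatever the starting ratio.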


Linear probability model: HMDA data, ctd.
Next include black as a regressor:
$$\widehat{\text{deny}} = \underset{(.032)}{-.091} + \underset{(.098)}{.559}\,\text{P/I ratio} + \underset{(.025)}{.177}\,\text{black}$$
Predicted probability of denial:
• for a black applicant with P/I ratio = .3: $\widehat{\Pr}(\text{deny} = 1) = -.091 + .559 \times .3 + .177 \times 1 = .254$
• for a white applicant with P/I ratio = .3: $\widehat{\Pr}(\text{deny} = 1) = -.091 + .559 \times .3 + .177 \times 0 = .077$
• difference = .177 = 17.7 percentage points
• The coefficient on black is significant at the 5% level (t = .177/.025 ≈ 7.1)
• Still plenty of room for omitted variable bias…
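
A Python sketch reproducing these predictions from the reported coefficients. It treats .025 as the (heteroskedasticity-robust) standard error of the coefficient on black and uses the large-sample 5% two-sided critical value 1.96; the function and variable names are hypothetical.

```python
# Coefficients from the LPM with P/I ratio and black as regressors.
B0, B_PI, B_BLACK = -0.091, 0.559, 0.177
SE_BLACK = 0.025  # standard error reported for the coefficient on black


def predicted_deny_prob(pi_ratio, black):
    """Predicted Pr(deny = 1); black is 1 for a black applicant, 0 for a white applicant."""
    return B0 + B_PI * pi_ratio + B_BLACK * black


p_black = predicted_deny_prob(0.3, 1)
p_white = predicted_deny_prob(0.3, 0)
diff = p_black - p_white  # equals B_BLACK: the dummy shifts the intercept

t_black = B_BLACK / SE_BLACK
significant_5pct = abs(t_black) > 1.96  # large-sample normal critical value

print(f"black applicant: {p_black:.3f}")  # 0.254
print(f"white applicant: {p_white:.3f}")  # 0.077
print(f"difference: {diff:.3f} ({100 * diff:.1f} percentage points)")  # 0.177 (17.7 ...)
print(f"t-statistic on black: {t_black:.2f}, significant at 5%: {significant_5pct}")
```

The difference is the same .177 at any P/I ratio, since black enters the model additively.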

The linear probability model: Summary
• Models Pr(Y=1|X) as a linear function of X
• Advantages:
  • simple to estimate and to interpret
  • inference is the same as for multiple regression (need heteroskedasticity-robust standard errors, because Var(u|X) = Pr(Y=1|X)[1 − Pr(Y=1|X)] varies with X)
• Disadvantages:
  • Does it make sense that the probability should be linear in X?
  • Predicted probabilities can be <0 or >1!
• These disadvantages can be addressed by using a nonlinear probability model: probit and logit
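
A minimal pure-Python sketch of the two practical points in this summary, using a hypothetical four-observation dataset (not HMDA data). It fits the LPM by OLS, computes a heteroskedasticity-robust standard error for the slope with the simple-regression formula n/(n−2) · Σ(Xi − X̄)²ûi² / [Σ(Xi − X̄)²]² (an HC1-type correction), and flags fitted "probabilities" outside [0, 1]. In practice a statistics package would be used; the function name `lpm_ols_robust` is illustrative.

```python
import math


def lpm_ols_robust(x, y):
    """Fit y = b0 + b1*x by OLS; return (b0, b1, robust SE of b1, fitted values).

    The SE uses the heteroskedasticity-robust formula with an n/(n-2)
    degrees-of-freedom correction, so it does not assume constant Var(u | X).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Weight each squared residual by its own squared deviation of x.
    meat = sum((xi - x_bar) ** 2 * ui ** 2 for xi, ui in zip(x, resid))
    var_b1 = n / (n - 2) * meat / sxx ** 2
    return b0, b1, math.sqrt(var_b1), fitted


# Hypothetical toy data: binary outcome y, one regressor x.
x = [0.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 1]

b0, b1, se_b1, fitted = lpm_ols_robust(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, robust SE(b1) = {se_b1:.4f}")

# Fitted values of an LPM are not constrained to lie in [0, 1].
out_of_range = [(xi, round(fi, 3)) for xi, fi in zip(x, fitted) if not 0 <= fi <= 1]
print("fitted 'probabilities' outside [0, 1]:", out_of_range)
```

With these toy numbers the fitted line is −0.1 + 0.4x, so the fitted value is −0.1 at x = 0 and 1.1 at x = 3, both impossible as probabilities. Probit and logit avoid this by passing the linear index through a function bounded between 0 and 1.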
