
Predictive Analytics and Regression

Data Mining
95-791 Spring 2013

Lecture #8 Predictive analytics: Regression
Artur Dubrawski awd@cs.cmu.edu

This unit
• Good-old correlation scores revisited
• Locally weighted regression
  – As an approximator of non-linear functions
  – As a framework for active/purposive acquisition of data

95-791 Data Mining

Lecture #8 Slide 2

Copyright © 2000-2013 Artur Dubrawski

Correlational scores of association between attributes of data
• Linear
• Rank
• Quadratic
• …
Wouldn't it be great to have a universal formula for computing correlations of all types, no matter how complex the underlying models are (linear, quadratic, …, any kind)... hmmmm… life would be so much more fulfilling then…

Correlation coefficient generalized
• Idea: take your data and apply some function approximator to it (e.g. fit some regression model to it), and compute the following:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (\mu - y_i)^2}$$

$y$, $\mu$: from data; $\hat{y}$: predicted

Using linear regression to predict $\hat{y}_i$? → linear correlation

Using quadratic regression? → quadratic correlation

Basically, to predict we can use multiple regression, any kind of non-linear regression, or any other function approximator we like, and we should still be able to compute the corresponding correlation coefficient. Life is perfect!
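As a minimal sketch of the idea above, the same score R^2 = 1 - Σ(ŷ_i - y_i)^2 / Σ(μ - y_i)^2 can be computed for two different approximators fitted to the same data. The synthetic noisy-quadratic data, the helper name `generalized_r2`, and the use of NumPy polynomial fits are assumptions made for illustration; they are not taken from the lecture.

```python
import numpy as np

def generalized_r2(y, y_hat):
    """R^2 = 1 - sum((y_hat - y)^2) / sum((mu - y)^2), with mu the mean of y."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    mu = y.mean()
    unexplained = np.sum((y_hat - y) ** 2)  # mismatch between model and data
    total = np.sum((mu - y) ** 2)           # spread of the data around its mean
    return 1.0 - unexplained / total

# Synthetic data (assumed for illustration): a noisy quadratic relationship.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 200)
y = x ** 2 + rng.normal(scale=0.5, size=x.size)

# Linear approximator -> "linear correlation" score.
linear_fit = np.polyval(np.polyfit(x, y, deg=1), x)
r2_linear = generalized_r2(y, linear_fit)

# Quadratic approximator -> "quadratic correlation" score.
quadratic_fit = np.polyval(np.polyfit(x, y, deg=2), x)
r2_quadratic = generalized_r2(y, quadratic_fit)

print(f"linear R^2:    {r2_linear:.3f}")
print(f"quadratic R^2: {r2_quadratic:.3f}")
```

On data like this the linear score should come out close to zero while the quadratic score should be close to one, which is the point of generalizing the coefficient: the score reflects how well the chosen family of models captures the relationship.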

Generalized correlation

total variation = explained variation + unexplained variation

$$\sum_{i=1}^{N} (\mu - y_i)^2 = \sum_{i=1}^{N} (\mu - \hat{y}_i)^2 + \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$

total variation: ~variance observed in the training data

explained variation: part of the total variation accounted for (“explained”) by the trained model

unexplained variation: mismatch between the data and the model-based predictions (part of the total variance that is left “unexplained” by the model)
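The decomposition above can be checked numerically. The sketch below assumes synthetic linear data and an ordinary least-squares line with an intercept, for which the identity holds exactly on the training data; for other approximators the cross term need not vanish, so the two sides may differ. All variable names are illustrative.

```python
import numpy as np

# Synthetic data (assumed for illustration): a noisy linear relationship.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Ordinary least-squares line with an intercept.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept
mu = y.mean()

total = np.sum((mu - y) ** 2)          # total variation
explained = np.sum((mu - y_hat) ** 2)  # explained variation
unexplained = np.sum((y_hat - y) ** 2) # unexplained variation

print(f"total:                   {total:.4f}")
print(f"explained + unexplained: {explained + unexplained:.4f}")

# Under this decomposition the two ways of writing R^2 agree.
r2_from_explained = explained / total
r2_from_unexplained = 1.0 - unexplained / total
print(f"R^2: {r2_from_explained:.4f} vs {r2_from_unexplained:.4f}")
```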

R1.0
