Predictive Analytics and Regression

Data Mining
95-791 Spring 2013

Lecture #8 Predictive analytics: Regression
Artur Dubrawski awd@cs.cmu.edu

This unit
• Good-old correlation scores revisited
• Locally weighted regression
– As an approximator of non-linear functions
– As a framework for active/purposive acquisition of data

95-791 Data Mining

Lecture #8 Slide 2

Copyright © 2000-2013 Artur Dubrawski

Correlational scores of association between attributes of data
• Linear
• Rank
• Quadratic
• ….
Wouldn't it be great to have a universal formula for computing correlations of all types, no matter how complex the underlying models were (linear, quadratic, …, any kind)... hmmmm… life would be so much more fulfilling then…

Correlation coefficient generalized
• Idea: take your data and apply some function approximator to it (e.g. fit some regression model to it), and compute the following:
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \mu)^2}$$

$y$, $\mu$: from data ($\mu$ is the mean of the observed $y$), $\hat{y}$: predicted

Using linear regression to predict $\hat{y}_i$? → linear correlation

Using quadratic regression? → quadratic correlation

Basically, to predict $\hat{y}_i$ we can use multiple regression, any kind of non-linear regression, or any other function approximator we like, and we should still be able to compute the corresponding correlation coefficient. Life is perfect!
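As a rough illustration of the idea on this slide, the sketch below computes the same score for two different approximators fitted to the same data. Everything in it is an assumption made for illustration and not part of the lecture: the toy data are made up, the helper names `generalized_r2` and `fit_line` are hypothetical, and the "quadratic" model is a restricted one (y = c + d·x², with no linear term), fitted as a straight line in the transformed input z = x².

```python
from statistics import fmean


def generalized_r2(y, y_hat):
    """R^2 = 1 - sum((y_i - y_hat_i)^2) / sum((y_i - mu)^2), mu = mean of y."""
    mu = fmean(y)
    unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    total = sum((yi - mu) ** 2 for yi in y)
    return 1.0 - unexplained / total


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Toy data (made up for illustration): y depends on x roughly quadratically.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [4.1, 0.9, 0.0, 1.1, 3.9]

# Approximator 1: linear model in x.
a, b = fit_line(x, y)
r2_linear = generalized_r2(y, [a + b * xi for xi in x])

# Approximator 2: restricted quadratic model y = c + d * x^2,
# fitted as a line in the transformed input z = x^2.
z = [xi ** 2 for xi in x]
c, d = fit_line(z, y)
r2_quadratic = generalized_r2(y, [c + d * zi for zi in z])

print(f"linear R^2 = {r2_linear:.3f}, quadratic R^2 = {r2_quadratic:.3f}")
```

On data like these the linear score should come out close to zero and the quadratic score close to one: the same formula reports a strong association once the approximator matches the shape of the underlying relationship.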

Generalized correlation
total variation = explained variation + unexplained variation

$$\sum_{i=1}^{N} (y_i - \mu)^2 = \sum_{i=1}^{N} (\hat{y}_i - \mu)^2 + \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

total variation: ~variance observed in the training data

explained variation: part of the total variation accounted for (“explained”) by the trained model

unexplained variation: mismatch between the data and the model-based predictions (part of the total variance that is left “unexplained” by the model)
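The decomposition can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are made up, the variable names are hypothetical, and the model is an ordinary least-squares line with an intercept. For that kind of model the three sums add up exactly (up to rounding); for an arbitrary function approximator the identity need not hold exactly.

```python
from statistics import fmean

# Toy training data (made up for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Ordinary least-squares line y_hat = a + b * x (with intercept).
mx, mu = fmean(x), fmean(y)
b = sum((xi - mx) * (yi - mu) for xi, yi in zip(x, y)) / sum(
    (xi - mx) ** 2 for xi in x
)
a = mu - b * mx
y_hat = [a + b * xi for xi in x]

# The three sums from the decomposition.
total = sum((yi - mu) ** 2 for yi in y)
explained = sum((pi - mu) ** 2 for pi in y_hat)
unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))

print(f"total       = {total:.4f}")
print(f"explained   = {explained:.4f}")
print(f"unexplained = {unexplained:.4f}")
print(f"explained + unexplained = {explained + unexplained:.4f}")

# With the decomposition in hand, R^2 can be read either way.
r2_from_explained = explained / total
r2_from_unexplained = 1.0 - unexplained / total
```

Here `r2_from_explained` and `r2_from_unexplained` agree because the decomposition holds for this model: the fraction of the total variation that is explained is one minus the fraction left unexplained.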

R1.0
