# Decision Tree and Neural Network Gradient

Topics: Regression analysis, Decision tree, Standard deviation. Published: September 22, 2015
Group 2: Forest Cover Type Prediction

Atul Jena, Rajat Jain, Sahil Lalwani, Sagnik Mazumder

Objective: to predict the forest cover type (the predominant kind of tree cover) from strictly cartographic variables (as opposed to remotely sensed data).

7 cover types:
• Spruce/Fir
• Lodgepole Pine
• Ponderosa Pine
• Cottonwood/Willow
• Aspen
• Douglas-Fir
• Krummholz

Getting familiar with data

The source: Forest Cover data set
• Training set: 15120 observations
• Test set: 565892 observations

Getting familiar with data (contd.)

Description of attributes:
• 40 soil types (0 = absence, 1 = presence)
• Elevation, Aspect
• Slope
• 4 wilderness areas (0 = absence, 1 = presence)
• Horizontal distance to hydrology and vertical distance to hydrology
• Hillshade (9am / noon / 3pm)
• Horizontal distance to fire points

Pre-Processing

Filter:
• Excludes certain observations such as extreme outliers and errant data
• Default filtering method: standard deviations from the mean
• Cut-off was set to 3 standard deviations (1637 observations filtered)
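As an illustration of the cut-off rule (a minimal NumPy sketch, not the tool's own filter node; the function name and toy data are hypothetical):

```python
import numpy as np

def filter_outliers(x, k=3.0):
    """Keep observations within k standard deviations of the mean."""
    mu, sigma = x.mean(), x.std()
    mask = np.abs(x - mu) <= k * sigma
    return x[mask], int((~mask).sum())

# toy data: 10000 well-behaved values plus 5 injected extreme outliers
rng = np.random.default_rng(0)
data = rng.normal(0, 1, 10_000)
data[:5] = 100.0
kept, n_filtered = filter_outliers(data)  # the 5 extremes fall outside 3 sigma
```

In practice the mean and standard deviation would be computed per variable, and an observation is dropped if any of its values falls outside the cut-off.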

Data partition

Partition allocation: Training 70%, Validation 30%, Test 0%

No. of observations: Training 9433, Validation 4050

Pre-Processing (contd.)

Transformation:
• Used to stabilise variance, remove non-linearity, improve additivity and counter non-normality
• Default method: Maximum Normal
• Reduces skewness
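To see how a normalising transform reduces skewness, here is a hypothetical example using a log transform on right-skewed data (the Maximum Normal method chooses among several candidate transforms; the log is just one such candidate used for illustration):

```python
import numpy as np

def skewness(x):
    """Sample skewness: mean of the cubed standardized values."""
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())

rng = np.random.default_rng(1)
raw = rng.lognormal(0, 1, 5_000)   # strongly right-skewed variable
transformed = np.log(raw)          # log transform pulls in the long tail

# skewness(raw) is large and positive; skewness(transformed) is near 0
```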

Variable selection:
• Helps reduce the number of inputs to the models by rejecting input variables that do not contribute significantly to the target
• As the output variable is nominal, we select R-square as the selection criterion
• Minimum R-square is set to 0.001
• Variables with an R-square value less than 0.001 are rejected (14 variables remain)
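The screening rule above can be sketched as follows (hypothetical code: here R-square is approximated as the squared correlation of each input with a label-encoded target, whereas variable-selection nodes compute it against the nominal target directly; names and toy data are invented):

```python
import numpy as np

def r_square(x, y):
    """Squared Pearson correlation between one input and the target."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

def select_variables(X, y, threshold=0.001):
    """Keep columns whose R-square with the target meets the cut-off."""
    return [j for j in range(X.shape[1]) if r_square(X[:, j], y) >= threshold]

rng = np.random.default_rng(2)
y = rng.integers(0, 7, 2_000).astype(float)  # 7 cover types, label-encoded
X = rng.normal(size=(2_000, 3))
X[:, 0] += y                                 # column 0 carries real signal
kept = select_variables(X, y)                # column 0 survives the cut-off
```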

Classification techniques
• Regression
• Decision Tree
• Neural Network

Classification techniques: Regression

Variable Selection → Stepwise / Forward / Backward Regression

Best model: Forward Regression, validation misclassification rate = 0.3323
Convergence criteria met at each step.

Classification techniques: Regression

Data Partition → Forward Regression

Validation misclassification rate = 0.298765
Convergence criteria met at each step.
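The misclassification rate used to compare models throughout is the fraction of validation observations assigned the wrong cover type; a minimal sketch with made-up labels:

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    """Fraction of observations whose predicted class differs from the actual one."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true != y_pred).mean())

# toy example: 2 wrong out of 8 validation cases -> 0.25
rate = misclassification_rate([1, 2, 3, 4, 5, 6, 7, 1],
                              [1, 2, 3, 4, 5, 6, 1, 2])
```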

Classification techniques: Decision Tree

Initial validation misclassification rate with default settings = 0.3371

Classification techniques: Decision Tree

Forward Regression → Decision Tree

Validation misclassification rate = 0.2736
Important variables: 20. Most important variable: Elevation

Classification techniques: Decision Tree

Variable Selection → Forward Regression → Decision Tree

Validation misclassification rate = 0.2123
Important variables: 13. Most important variable: transformed Elevation

Classification techniques: Decision Tree

Data Partition → Decision Tree

Validation misclassification rate = 0.1991
Important variables: 27. Most important variable: Elevation
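A decision tree ranks variables by how much their splits reduce class impurity, which is why Elevation keeps surfacing as most important. A hypothetical sketch of a single Gini-impurity split search on one variable (toy data invented for illustration):

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float((p ** 2).sum())

def best_split(x, y):
    """Find the threshold on one variable that most reduces Gini impurity."""
    best_t, best_gain = None, 0.0
    parent = gini(y)
    for t in np.unique(x)[:-1]:          # candidate thresholds between values
        left, right = y[x <= t], y[x > t]
        w = len(left) / len(y)
        gain = parent - (w * gini(left) + (1 - w) * gini(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# toy data: low elevations -> class 0, high elevations -> class 1
elev = np.array([1900, 2000, 2100, 3000, 3100, 3200])
cover = np.array([0, 0, 0, 1, 1, 1])
t, gain = best_split(elev, cover)        # splitting at 2100 separates perfectly
```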

Classification techniques: Neural Network

Classification techniques: Neural Network with Boosting
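Boosting fits a sequence of weak models, re-weighting the training observations each round so that later models concentrate on the cases earlier ones got wrong. A hypothetical sketch of one AdaBoost-style re-weighting step (binary ±1 labels for simplicity; this is an illustration of the idea, not the boosting configuration used in the project):

```python
import numpy as np

def adaboost_reweight(weights, y_true, y_pred):
    """One AdaBoost round: compute the learner's vote and up-weight its mistakes."""
    wrong = y_true != y_pred
    err = weights[wrong].sum() / weights.sum()      # weighted error rate
    alpha = 0.5 * np.log((1 - err) / err)           # learner's vote weight
    weights = weights * np.exp(-alpha * y_true * y_pred)
    return weights / weights.sum(), alpha

w = np.ones(4) / 4
y = np.array([1, 1, -1, -1])
pred = np.array([1, 1, 1, -1])                      # one mistake (index 2)
w_new, alpha = adaboost_reweight(w, y, pred)
# after re-weighting, the misclassified case carries half the total weight
```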

Which model worked best

BEST FIT MODEL: DECISION TREE

Analysis Summary

...