# Decision Tree and Neural Network Gradient

**Topics:** Regression analysis, Decision tree, Standard deviation

**Pages:** 10 (520 words)

**Published:** September 22, 2015

Forest Cover Type Prediction

ATUL JENA
RAJAT JAIN
SAHIL LALWANI
SAGNIK MAZUMDER
SHRADHA SANTUKA

Business Problem

To predict the forest cover type (the predominant kind of tree cover) from strictly cartographic variables (as opposed to remotely sensed data)

7 Cover types

Spruce/Fir
Lodgepole Pine
Ponderosa Pine
Cottonwood/Willow
Aspen
Douglas-fir
Krummholz

Getting familiar with data

The source: Forest Cover data set

Training set: 15120 observations
Test set: 565892 observations

Getting familiar with data

Description of attributes

Elevation, Aspect, Slope
40 soil types (0 = absence, 1 = presence)
4 wilderness areas (0 = absence, 1 = presence)
Horizontal distance to hydrology, vertical distance to hydrology
Hillshade (9am/noon/3pm)
Horizontal distance to roadways
Horizontal distance to fire points

Pre-Processing

Filter:

Excludes certain observations such as extreme outliers and errant data

Default filtering method: standard deviation from the mean

Cut-off was set to 3 standard deviations (1637 observations filtered)
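A minimal sketch of this filter in Python, assuming the data sits in a pandas DataFrame; the function name and column choice are illustrative, not the tool's actual interface:

```python
import pandas as pd

def filter_outliers(df: pd.DataFrame, cols, k: float = 3.0) -> pd.DataFrame:
    """Drop rows in which any listed column lies more than k standard
    deviations from that column's mean (k=3 matches the cut-off above)."""
    mask = pd.Series(True, index=df.index)
    for col in cols:
        z = (df[col] - df[col].mean()) / df[col].std()
        mask &= z.abs() <= k
    return df[mask]
```

Applied to the raw training set, a rule of this shape is what removed the 1637 flagged observations.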

Data partition

Partition allocation: Training 70%, Validation 30%, Test 0%

No. of observations: Training 9433, Validation 4050
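The 70/30/0 allocation can be sketched as a random index split. This is only a sketch: the slide does not specify the sampling scheme, so the resulting counts need not match 9433/4050 exactly.

```python
import numpy as np

def partition(n_rows: int, train_frac: float = 0.7, seed: int = 0):
    """Randomly split row indices into training and validation sets
    (the test share is 0%, as in the allocation above)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    n_train = int(round(train_frac * n_rows))
    return idx[:n_train], idx[n_train:]
```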

Pre-Processing (contd.)

Transformation

Used to stabilise variance, remove non-linearity, improve additivity and counter non-normality

Default method: Maximum Normal, which reduces skewness
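One way to read "Maximum Normal" is: try a family of power transforms and keep the one whose output is closest to normal. The candidate list below is an assumption for illustration; the actual tool may search a different family.

```python
import numpy as np
import pandas as pd

def max_normal(x: pd.Series) -> pd.Series:
    """Return the candidate transform of x with the smallest absolute
    skewness. The candidate set here is illustrative only."""
    shifted = x - x.min() + 1.0  # strictly positive for log/inverse
    candidates = [x, np.log(shifted), np.sqrt(shifted), x ** 2, 1.0 / shifted]
    return min(candidates, key=lambda s: abs(s.skew()))
```

Because the identity transform is among the candidates, the result is never more skewed than the input.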

Variable selection

Helps reduce the number of inputs to the models by rejecting input variables that do not contribute significantly to the target

As the output variable is nominal, R-square is selected as the criterion

Minimum R-square is set to 0.001; variables with an R-square below 0.001 are rejected (14 variables remain)
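A hedged sketch of this screening step: one-hot encode the nominal target and keep inputs whose best simple R-square against any target level clears the threshold. The one-hot treatment is an illustrative assumption; the tool's exact R-square formula may differ.

```python
import pandas as pd

def select_by_rsquare(X: pd.DataFrame, y: pd.Series, min_r2: float = 0.001):
    """Keep inputs whose best squared correlation with any level of the
    nominal target exceeds min_r2 (illustrative approximation)."""
    dummies = pd.get_dummies(y)
    keep = []
    for col in X.columns:
        r2 = max(X[col].corr(dummies[c].astype(float)) ** 2 for c in dummies)
        if r2 > min_r2:
            keep.append(col)
    return keep
```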

Classification techniques

• Regression
• Decision Tree
• Neural Network
• Gradient Boosting

Classification techniques

↗ Stepwise Regression

Variable Selection → Forward Regression

↘ Backward Regression


Best model: Forward Regression, Validation Misclassification Rate = 0.3323; convergence criteria met at each step.
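Forward selection can be sketched as a greedy loop: repeatedly add the feature that most lowers validation misclassification and stop when no addition helps. Logistic regression stands in here for the original tool's regression node; this is an assumption, not the slides' exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def forward_select(X_tr, y_tr, X_va, y_va):
    """Greedy forward feature selection scored by validation
    misclassification rate (illustrative sketch)."""
    remaining = list(range(X_tr.shape[1]))
    chosen, best_err = [], 1.0
    while remaining:
        scores = []
        for f in remaining:
            cols = chosen + [f]
            model = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
            scores.append((np.mean(model.predict(X_va[:, cols]) != y_va), f))
        err, f = min(scores)
        if err >= best_err:
            break  # no candidate improves the validation error
        best_err, chosen = err, chosen + [f]
        remaining.remove(f)
    return chosen, best_err
```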

Classification techniques

Data Partition → Forward Regression


Validation Misclassification Rate = 0.298765; convergence criteria met at each step.

Classification techniques


Initial Validation Misclassification Rate with default settings = 0.3371

Classification techniques

Forward Regression → Decision Tree


Validation Misclassification Rate = 0.2736
Important variables: 20; most important variable: Elevation

Classification techniques

Variable Selection → Forward Regression → Decision Tree


Validation Misclassification Rate = 0.2123
Important variables: 13; most important variable: Transformed Elevation

Classification techniques

Data Partition → Decision Tree


Validation Misclassification Rate = 0.1991
Important variables: 27; most important variable: Elevation
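The two figures quoted on these decision-tree slides, validation misclassification rate and the most important variable, can be reproduced in a sketch with sklearn standing in for the original tool's tree node:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_metrics(X_tr, y_tr, X_va, y_va):
    """Fit a decision tree, then report the validation misclassification
    rate and the index of the most important input."""
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    err = float(np.mean(tree.predict(X_va) != y_va))
    return err, int(np.argmax(tree.feature_importances_))
```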

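Gradient boosting, the last technique in the list of classification methods, can be sketched the same way; sklearn's implementation stands in for whichever boosting node the slides actually used:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def boosting_error(X_tr, y_tr, X_va, y_va):
    """Fit a gradient-boosted tree ensemble and return its validation
    misclassification rate (illustrative stand-in)."""
    gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
    return float(np.mean(gb.predict(X_va) != y_va))
```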

Which model worked best


BEST FIT MODEL: DECISION TREE

Analysis Summary

Business Problem

...
