Assgn

Assignment 1: Using the WEKA Workbench

A. Become familiar with the use of the WEKA workbench to invoke several different machine learning schemes.
Use the latest stable version, and use both the graphical interface (Explorer) and the command-line interface (CLI).
See the Weka home page for documentation.
B. Use the following learning schemes, with their default settings, to analyze the weather data (in weather.arff). For the test options, first choose "Use training set", then choose "Percentage split" with the default 66% split. For each scheme, report the model's percentage error rate.
ZeroR (majority class)
OneR
Naive Bayes Simple
J4.8 (J48 in Weka)
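To make the two simplest schemes concrete, here is a minimal sketch of ZeroR and OneR for nominal attributes, plus the percentage error rate the assignment asks for. The function names and the seven-row toy dataset are hypothetical; the data is NOT the contents of weather.arff. Weka's OneR also discretizes numeric attributes, which this sketch omits.

```python
from collections import Counter, defaultdict

def zero_r(rows, target):
    """ZeroR: ignore every attribute and always predict the majority class."""
    majority = Counter(r[target] for r in rows).most_common(1)[0][0]
    return lambda row: majority

def one_r(rows, attributes, target):
    """OneR for nominal attributes: build one rule per attribute (each value -> its
    most frequent class) and keep the rule with the fewest training errors."""
    best = None  # (errors, attribute, {value: predicted class})
    for attr in attributes:
        counts = defaultdict(Counter)
        for r in rows:
            counts[r[attr]][r[target]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    _, attr, rule = best
    fallback = Counter(r[target] for r in rows).most_common(1)[0][0]  # for unseen values
    return lambda row: rule.get(row[attr], fallback)

def error_rate(model, rows, target):
    """Percentage of rows whose predicted class differs from the actual class."""
    wrong = sum(model(r) != r[target] for r in rows)
    return 100.0 * wrong / len(rows)

# Hypothetical toy data in the spirit of the weather data (NOT weather.arff).
rows = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "sunny",    "windy": "no",  "play": "yes"},
]
zeror = zero_r(rows, "play")
oner = one_r(rows, ["outlook", "windy"], "play")
print(f"ZeroR training-set error: {error_rate(zeror, rows, 'play'):.1f}%")  # 28.6%
print(f"OneR  training-set error: {error_rate(oner, rows, 'play'):.1f}%")   # 14.3%
```

ZeroR is the baseline any other scheme should beat; OneR here picks "outlook" because its rule makes one training error versus two for "windy".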
C. Which of these classifiers would you be most likely to trust when deciding whether to play? Why?
D. What can you say about accuracy when the model is evaluated on the training set, compared with when it is evaluated on a separate held-out percentage of the data?
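One way to reason about question D is to take the extreme case: a hypothetical "rote learner" that simply memorizes its training rows. The sketch below, assuming a shuffled 66% split with a fixed seed (it will not reproduce Weka's exact split), shows that training-set error can be zero even when the labels are pure noise, while the held-out error reflects how the model does on unseen data.

```python
import random
from collections import Counter

def percentage_split(rows, percent=66.0, seed=1):
    """Shuffle a copy of rows; the first `percent`% train, the rest test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * percent / 100)
    return shuffled[:n_train], shuffled[n_train:]

def train_memorizer(train, features, target):
    """Hypothetical rote learner: look up the class of any row seen in training,
    otherwise fall back to the training majority class."""
    table = {tuple(r[f] for f in features): r[target] for r in train}
    majority = Counter(r[target] for r in train).most_common(1)[0][0]
    return lambda row: table.get(tuple(row[f] for f in features), majority)

def error_rate(model, rows, target):
    """Percentage of misclassified rows."""
    return 100.0 * sum(model(r) != r[target] for r in rows) / len(rows)

# Hypothetical data: 20 rows whose label is random noise, so nothing real can be learned.
rng = random.Random(0)
rows = [{"id": i, "x": rng.choice("abc"), "label": rng.choice(["yes", "no"])}
        for i in range(20)]
train, test = percentage_split(rows)
model = train_memorizer(train, ["id", "x"], "label")
print("training-set error:", error_rate(model, train, "label"))  # 0.0: every row memorized
print("held-out error:    ", error_rate(model, test, "label"))
```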

Assignment 2: Preparing the data and mining it

A. Take the file genes-leukemia.csv (see the description of the data) and convert it to the Weka file genes-a.arff.
You can convert the file either by brute force, editing it in a text editor such as emacs, or the smart way, by finding a Weka command that converts a .csv file to .arff.
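The brute-force conversion amounts to writing an ARFF header and copying the rows. A minimal sketch of that edit, automated, is below; `csv_to_arff` and the sample columns are hypothetical (not the real genes-leukemia columns), and it assumes a simple rule: a column whose non-missing values all parse as numbers becomes `numeric`, anything else becomes a nominal attribute listing its distinct values.

```python
import csv
import io

def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

def _quote(text):
    # ARFF names/values containing spaces, commas, braces, % or quotes must be quoted.
    if any(ch in text for ch in " ,{}%'\"\t"):
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return text

def csv_to_arff(csv_text, relation):
    """Hypothetical converter: first CSV row is the header; '' or '?' means missing."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]
    lines = [f"@relation {_quote(relation)}", ""]
    for i, name in enumerate(header):
        values = [row[i] for row in rows if row[i] not in ("", "?")]
        if values and all(_is_number(v) for v in values):
            lines.append(f"@attribute {_quote(name)} numeric")
        else:
            nominal = ",".join(_quote(v) for v in sorted(set(values)))
            lines.append(f"@attribute {_quote(name)} {{{nominal}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v in ("", "?") else _quote(v) for v in row))
    return "\n".join(lines) + "\n"

# Hypothetical sample, not the real file.
sample = "ID,Gene1,Gene2,CLASS\ns1,-214,-153,ALL\ns2,-139,-73,AML\n"
arff = csv_to_arff(sample, "genes")
print(arff)
```

Weka's own converter is the more robust route; this sketch only illustrates what the converted file should look like.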
B. The target field is CLASS. Run J48 on genes-leukemia with the "Use training set" test option.
C. Use genes-leukemia.arff to create two subsets: genes-leukemia-train.arff, with the first 38 samples (s1 ... s38), and genes-leukemia-test.arff, with the remaining 34 samples (s39 ... s72).
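Because both subsets must share the same header, the split is: keep everything up to `@data`, then divide the data rows by position. A minimal sketch, assuming a dense ARFF file whose rows appear in s1 ... s72 order; `split_arff` is a hypothetical helper, and the toy file stands in for genes-leukemia.arff.

```python
def split_arff(arff_text, n_first):
    """Return (header + first n_first data rows, header + remaining rows)."""
    lines = arff_text.splitlines()
    data_at = next(i for i, line in enumerate(lines) if line.strip().lower() == "@data")
    header = lines[: data_at + 1]
    # Keep only real data rows: skip blank lines and % comments.
    data = [l for l in lines[data_at + 1:] if l.strip() and not l.lstrip().startswith("%")]

    def build(part):
        return "\n".join(header + part) + "\n"

    return build(data[:n_first]), build(data[n_first:])

# Toy stand-in; for the assignment this would be split_arff(text, 38).
toy = ("@relation toy\n@attribute ID {s1,s2,s3,s4,s5}\n@attribute CLASS {ALL,AML}\n"
       "@data\ns1,ALL\ns2,AML\ns3,ALL\ns4,AML\ns5,ALL\n")
first, rest = split_arff(toy, 3)
print(first)
print(rest)
```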
D. Train J48 on genes-leukemia-train.arff and specify "Use training set" as the test option.
What decision tree do you get? What is its accuracy?
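When reading the tree J48 produces on numeric gene values, each test is a threshold on one attribute. The sketch below shows a simplified version of how a C4.5-style learner scores candidate thresholds on one numeric attribute by information gain. It is an illustration under stated assumptions, not J48's actual code: J48 also uses gain ratio, corrections for numeric splits, and pruning, and reports thresholds taken from training values rather than the midpoints used here. The toy values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Class entropy in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try a cut between each pair of adjacent distinct values and return
    (threshold, information gain) for the cut with the highest gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    n = len(pairs)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot cut between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = base - remainder
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best

# Hypothetical expression values for one gene and the class of each sample.
thr, gain = best_threshold([1, 2, 3, 10, 11, 12], ["ALL"] * 3 + ["AML"] * 3)
print(thr, gain)  # 6.5 1.0
```

With a small training set, a single cleanly separating gene like this can yield a very small tree, which is worth keeping in mind when comparing training-set and test-set accuracy.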
E. Now specify genes-leukemia-test.arff as the test set (the "Supplied test set" option).
What decision tree do you get, and how does its accuracy compare to the one in the previous question?
F. Now remove the field "Source" from the data used by the classifier (uncheck the checkbox next to Source and click Apply Filter in the top menu), then repeat steps D and E.
What do you observe? Does the accuracy on the test set improve, and if so, why do you think it does?
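Removing an attribute means dropping both its `@attribute` declaration and the matching column in every data row. A minimal sketch of that edit outside the Explorer, assuming a dense ARFF file with unquoted attribute names and no quoted values containing commas; `remove_attribute` and the toy file are hypothetical.

```python
def remove_attribute(arff_text, name):
    """Return a copy of a dense ARFF file with attribute `name` removed from the
    header and its column removed from every data row."""
    out = []
    position = 0   # index of the next @attribute declaration
    drop = None    # column index of the attribute being removed
    in_data = False
    for line in arff_text.splitlines():
        stripped = line.strip()
        lower = stripped.lower()
        if not in_data and lower.startswith("@attribute"):
            if stripped.split(None, 2)[1] == name:
                drop = position
                position += 1
                continue  # skip this declaration
            position += 1
        elif not in_data and lower == "@data":
            if drop is None:
                raise ValueError(f"attribute {name!r} not found")
            in_data = True
        elif in_data and stripped and not stripped.startswith("%"):
            fields = stripped.split(",")  # assumes no quoted commas
            del fields[drop]
            line = ",".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical toy file with a Source column.
toy = """@relation toy
@attribute ID {s1,s2}
@attribute Source {B,T}
@attribute CLASS {ALL,AML}
@data
s1,B,ALL
s2,T,AML
"""
cleaned = remove_attribute(toy, "Source")
print(cleaned)
```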
