Lab 1: Decision Trees and Decision Rules

Evgueni N. Smirnov
smirnov@cs.unimaas.nl
August 21, 2010

1. Introduction

Given a data-mining problem, you need data that represent the problem, models suited to the data, and a data-mining environment containing algorithms capable of learning these models. In this lab you will study two well-known classification problems and try to find classification models for them using decision trees and decision rules. The algorithms that learn these models are provided in Weka, the data-mining environment that accompanies our course. You will study the Explorer part of Weka to learn how to call decision-tree and decision-rule algorithms, how to evaluate the accuracy of the learned models, and how to use reduced-error pruning.
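To make "reduced-error pruning" concrete before you use it in Weka, here is a minimal plain-Python sketch of the textbook version: hold out a pruning set, walk the tree bottom-up, and replace a subtree by a leaf (labelled with its majority class) whenever that does not increase the number of errors on the pruning instances reaching it. The `Node` class, the attribute names `A`/`B` and the toy pruning set are hypothetical illustrations; Weka's J48 exposes reduced-error pruning as an option, but its internals differ from this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# An instance is a (attribute-values dict, class label) pair.
Instance = Tuple[Dict[str, str], str]


@dataclass
class Node:
    """Hypothetical tree node: a leaf if `attribute` is None, else a test on `attribute`."""
    label: str                      # majority class of the training instances at this node
    attribute: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)

    def classify(self, x: Dict[str, str]) -> str:
        if self.attribute is None:
            return self.label
        child = self.children.get(x.get(self.attribute))
        # Unseen attribute value: fall back to this node's majority class.
        return child.classify(x) if child is not None else self.label


def accuracy(tree: Node, data: List[Instance]) -> float:
    """Fraction of instances in `data` that `tree` classifies correctly."""
    if not data:
        return 1.0  # no instances means no errors
    return sum(tree.classify(x) == y for x, y in data) / len(data)


def reduced_error_prune(node: Node, prune_set: List[Instance]) -> Node:
    """Bottom-up reduced-error pruning against a held-out pruning set.

    Children are replaced in place; the (possibly pruned) node is returned.
    """
    if node.attribute is None:
        return node
    # Prune each child first, using only the pruning instances routed to it.
    for value, child in node.children.items():
        subset = [(x, y) for x, y in prune_set if x.get(node.attribute) == value]
        node.children[value] = reduced_error_prune(child, subset)
    subtree_errors = sum(node.classify(x) != y for x, y in prune_set)
    leaf_errors = sum(node.label != y for _, y in prune_set)
    # Prefer the simpler leaf whenever it is at least as accurate.
    if leaf_errors <= subtree_errors:
        return Node(label=node.label)
    return node
```

For example, a two-level tree whose lower test on `B` only adds errors on the pruning set collapses that subtree to a leaf, while the root test on `A` survives because removing it would misclassify an instance:

```python
tree = Node("yes", "A", {"0": Node("no"),
                         "1": Node("yes", "B", {"0": Node("no"), "1": Node("yes")})})
prune_set = [({"A": "0", "B": "0"}, "no"),
             ({"A": "1", "B": "0"}, "yes"),
             ({"A": "1", "B": "1"}, "yes")]
pruned = reduced_error_prune(tree, prune_set)
```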

2. Concept-Learning Problems

In this lab you are expected to build classification models for two classification problems:
• Labor-negotiation problem;
• Soybean classification problem.

The data files for both problems are provided in the archive:

http://www.unimaas.nl/datamining/UCI/datasets-UCI.zip
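Weka's native data format is ARFF. Assuming the archive holds ARFF files (the file names inside it are not listed here), the sketch below shows what such a file contains by parsing the simple case: `@attribute` lines with either a nominal value set `{...}` or a numeric type, followed by comma-separated rows after `@data`, with `?` marking a missing value. The function `read_arff` is a hypothetical helper for inspection only; it does not handle quoted values containing commas or spaces, or sparse data, and in the lab you should load the files through Weka itself.

```python
from typing import List, Optional, Tuple

Attribute = Tuple[str, Optional[List[str]]]  # (name, nominal values or None)


def read_arff(text: str) -> Tuple[List[Attribute], List[List[Optional[str]]]]:
    """Parse a simple ARFF document into (attributes, rows).

    Nominal attributes carry their list of values; numeric (or other) types
    map to None. Each row is a list of raw strings, with None for '?'.
    """
    attributes: List[Attribute] = []
    rows: List[List[Optional[str]]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()  # ARFF keywords are case-insensitive
        if in_data:
            rows.append([None if v.strip() == "?" else v.strip() for v in line.split(",")])
        elif lower.startswith("@attribute"):
            _, rest = line.split(None, 1)
            name, spec = rest.split(None, 1)
            name = name.strip("'\"")  # drop simple quotes around the name
            spec = spec.strip()
            if spec.startswith("{"):
                attributes.append((name, [v.strip() for v in spec.strip("{}").split(",")]))
            else:
                attributes.append((name, None))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows
```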

3. Environment

As stated above, to build the desired classification models you will use Weka. Weka is a data-mining environment that contains a collection of machine-learning algorithms for solving real-world data-mining problems. The algorithms can either be applied directly or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine-learning schemes. Weka is open-source software issued under the GNU General Public License.

4. Algorithms

To build the classifiers, you will use four learning algorithms provided in Weka:
1. zeroR is a majority/average predictor. It assigns to each instance the classification of the
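A majority/average predictor ignores every attribute and always outputs one value learned from the training labels: the most frequent class when the class is nominal, the mean when it is numeric. Its accuracy is therefore the baseline any decision tree or rule set should beat. The sketch below is a plain-Python illustration of that idea, not Weka's Java implementation; the class name and the `fit`/`predict` methods are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence, Union

Label = Union[str, float]


class ZeroR:
    """Hypothetical baseline: majority class (nominal) or mean (numeric)."""

    def fit(self, labels: Sequence[Label]) -> "ZeroR":
        if not labels:
            raise ValueError("ZeroR needs at least one training instance")
        if all(isinstance(y, (int, float)) for y in labels):
            self.prediction = mean(labels)  # numeric class: average value
        else:
            # Nominal class: most frequent label; ties go to the label seen first.
            self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, instances: Sequence[object]) -> List[Label]:
        # Every instance gets the same prediction, whatever its attribute values.
        return [self.prediction for _ in instances]
```

For instance, `ZeroR().fit(["good", "bad", "good"])` predicts `"good"` for every instance, so on that training data it is right two times out of three.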
