"Iris" Data Set Contains 3 Classes, Iris Setosa, Iris Versicolour, and Iris Virginica, Each with 50 Instances
Paul Perez
ID: 2247878
November 9th, 2010
Project Two

Analysis

The "Iris" data set contains 3 classes, Iris Setosa, Iris Versicolour, and Iris Virginica, each with 50 instances. Each instance records 4 attributes: the iris's sepal length and width and its petal length and width, in centimeters.

The "Adult" data set contains 48,842 records, each with 15 variables to analyze. Those fields include age, work class, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, final weight, hours-per-week, native-country, and income.

Finally, the "Zoo" data set is a simple one with 7 classes, which are animal groups, and a total of 101 instances. Each animal instance contains 18 attributes: the animal's name, 2 numeric attributes for its number of legs and its type, and 15 Boolean-valued attributes, i.e., simple yes-or-no answers.

The following is an analysis of 4 classification algorithms that can be usefully applied to these data sets.

Naive Bayes

The Naive Bayes classifier is a good fit for many user-modeling situations, such as the "Iris" data set, given its advantages of fast learning and low structural cost. It works the following way: suppose your data consisted of vegetables, described by their color and shape. The classifier would reason, "If you see a vegetable that is green and spherical, what type of vegetable is it most likely to be, based on the data? In the future, classify green and spherical vegetables as that type of vegetable." The advantages are that it works well on both text and numerical data and is easy to implement and cheap to compute compared to other classification algorithms. The disadvantages are that it performs poorly when features are highly dependent, and, in its simplest form, it does not account for multiples or repeats of the same word or value.
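The vegetable analogy above can be sketched as a tiny categorical Naive Bayes classifier. This is a minimal illustration, not a production implementation: the training data, vegetable labels, and the helper functions `train` and `classify` are all hypothetical examples invented for this sketch, not part of the data sets discussed.

```python
from collections import Counter, defaultdict

# Hypothetical training data for the vegetable analogy:
# each sample is (color, shape) with a known vegetable label.
training = [
    (("green", "spherical"), "pea"),
    (("green", "spherical"), "pea"),
    (("green", "long"), "cucumber"),
    (("orange", "long"), "carrot"),
    (("orange", "long"), "carrot"),
]

def train(samples):
    """Estimate class priors and per-feature likelihoods by counting."""
    priors = Counter(label for _, label in samples)
    likelihoods = defaultdict(Counter)  # (label, feature index) -> value counts
    for features, label in samples:
        for i, value in enumerate(features):
            likelihoods[(label, i)][value] += 1
    return priors, likelihoods

def classify(features, priors, likelihoods):
    """Pick the label maximizing P(label) * prod_i P(feature_i | label)."""
    total = sum(priors.values())
    best_label, best_score = None, -1.0
    for label, count in priors.items():
        score = count / total  # class prior
        for i, value in enumerate(features):
            counts = likelihoods[(label, i)]
            # Laplace smoothing so unseen feature values don't zero the score.
            score *= (counts[value] + 1) / (count + len(counts) + 1)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

priors, likelihoods = train(training)
print(classify(("green", "spherical"), priors, likelihoods))  # pea
```

Note the "naive" independence assumption in `classify`: each feature's likelihood is multiplied in separately, which is exactly why the method degrades when features are highly dependent, as noted above.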
