RapidMiner

Use K-Means for Clustering
1. Dataset

For this tutorial, we will work with unlabeled data from the US Census Bureau. The following introduction will help you understand the attributes and interpret the results.

The attributes of the raw data have been discretized to reduce the number of distinct values, and the discretized version is the data we are working with. A description of the raw attributes is available at: http://archive.ics.uci.edu/ml/databases/census1990/USCensus1990raw.attributes.txt

Some attributes are carried over unchanged from the raw dataset; their names in the current dataset have an “i” added in front. Discretized attributes have a “d” added in front of their original names. For example, in the current dataset, “dAge” is discretized from the raw dataset, and its description is listed under “AAGE” (Age) in the raw data description; “iAvail” keeps its raw values, and its corresponding attribute is “AVAIL” (Available for work) in the raw data description. The mapping functions from raw attributes to current attributes can be found here: http://archive.ics.uci.edu/ml/databases/census1990/USCensus1990.mapping.sql

The file used in this tutorial is an abbreviated version of the dataset containing the first 10,000 instances out of 2,458,285.

[Note: If your computer has limited memory, the clustering process below may run very slowly. In that case, use the file UScensus_3000.xlsx for this lab. It has only 3,000 instances; the results may be less interesting than those from the larger file, but it needs much less memory than the 10,000-instance set.]

Start RapidMiner, use Read Excel to load UScensus_10000.xlsx, set the role of “case ID” to id, and then store the dataset in your repository (recall tutorial 2 on importing and storing data). Please note the dataset is a little bigger than those we have worked on,
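
The attribute-naming convention described above (an “i” in front of unchanged attributes, a “d” in front of discretized ones) can be sketched in a few lines of Python. This is only an illustration for reading column names outside RapidMiner: the function name describe_attribute is hypothetical, and the base name it returns is just the name with the prefix removed. It is not guaranteed to match the raw attribute name, so still look it up in the raw data description.

```python
def describe_attribute(name):
    """Classify a census attribute name by its one-letter prefix.

    Assumption: the prefix is a lowercase 'i' or 'd' followed by a
    capitalized base name, as in the tutorial's examples.
    """
    kinds = {"i": "unchanged", "d": "discretized"}
    prefix, base = name[:1], name[1:]
    if prefix in kinds and base[:1].isupper():
        return kinds[prefix], base
    # Names that do not follow the convention are returned untouched.
    return "unknown", name


for attr in ["dAge", "iAvail"]:
    kind, base = describe_attribute(attr)
    print(f"{attr}: {kind}, base name {base!r}")
```

Here “dAge” is reported as discretized and “iAvail” as unchanged, which matches the two examples given in the text.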
