# Data mining

Topics: Data mining, Data analysis, Data set Pages: 5 (1174 words) Published: September 28, 2014

Department of Computer Science
Western Connecticut State University
CS 450 Data Mining, Fall 2014
Take-Home Test N#1

Date: September 22nd, 2014
Final deadline for submission: September 29th, 2014
Weighting: 5%
Total number of points: 100

Instructions:
1. Attempt all questions.
2. This is an individual test. No collaboration is permitted for assessment items. All submitted materials must be the result of your own work.

Part I

Question 1 [20 points]
Discuss whether or not each of the following activities is a data mining task.

• Dividing the customers of the company according to their gender.
No. Gender is a recorded attribute, so this is a simple database query rather than the discovery of a pattern. However, predicting the profitability of a new customer would be data mining.

• Dividing the customers of a company according to their profitability.
Yes, this is a data mining task because it requires data analysis to determine which customers bring the most business to the company.

• Computing the total sales of the company.
No, this is not a data mining task because no analysis is involved; this information can be pulled from any bookkeeping program.

• Sorting a student database based on student ID numbers.
No, this is not a data mining activity because sorting by ID number involves no data mining. It is a simple database query.

• Predicting the future stock price of a company using historical records.
Yes. We would attempt to create a model that can predict the continuous value of the stock price. This is an example of the area of data mining known as predictive modelling. We could use regression for this modelling, although researchers in many fields have developed a wide variety of techniques for predicting time series.

• Monitoring the heart rate of a patient for abnormalities.
Yes. We would build a model of the normal behavior of the heart rate and raise an alarm when unusual heart behavior occurs. This would involve the area of data mining known as anomaly detection. It could also be treated as a classification problem if we had examples of both normal and abnormal heart behavior.
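The last two answers name regression and anomaly detection. As a rough illustration only, the sketch below fits a least-squares straight line to a short price history and flags heart-rate readings that lie far from a baseline. The function names, the 3-standard-deviation threshold, and all numbers are assumptions made up for this example; they are not part of the test, and a real stock or heart-rate model would be considerably more involved.

```python
from statistics import mean, stdev


def fit_linear_trend(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept). Needs at least two values.
    """
    xs = list(range(len(values)))
    x_bar = mean(xs)
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_next(values):
    """Predictive modelling: extrapolate the fitted line one step ahead."""
    slope, intercept = fit_linear_trend(values)
    return slope * len(values) + intercept


def flag_anomalies(baseline, readings, z_threshold=3.0):
    """Anomaly detection: model 'normal' as the baseline mean and spread,
    then return readings more than z_threshold standard deviations away.
    The threshold of 3.0 is an arbitrary illustrative choice.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [r for r in readings if abs(r - mu) > z_threshold * sigma]


if __name__ == "__main__":
    # Made-up closing prices that rise by exactly 2 per day.
    print(predict_next([10.0, 12.0, 14.0, 16.0]))  # 18.0

    # Made-up resting heart rates (beats per minute) used as the baseline.
    normal = [70, 72, 71, 69, 73, 70, 71]
    print(flag_anomalies(normal, [72, 71, 120, 68, 40]))  # [120, 40]
```

If labelled examples of abnormal readings were available, the same data could instead be used to train a classifier, as the answer notes.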

Question 2 [25 points]
For each of the following, identify the relevant data mining task(s):

• The Boston Celtics would like to approximate how many points their next opponent will score against them.
• A military intelligence officer is interested in learning about the respective proportions of Sunnis and Shias in a particular strategic region.
• A NORAD defense computer must decide immediately whether a blip on the radar is a flock of geese or an incoming nuclear missile.
• A political strategist is seeking the best groups to canvass for donations in a particular county.
• A homeland security official would like to determine whether a certain sequence of financial and residence moves implies a tendency to terrorist acts.
• A Wall Street analyst has been asked to find out the expected change in stock price for a set of companies with similar price/earnings ratios.

Question 3 [20 points]

For each of the following meetings, explain which phase in the CRISP-DM process is represented:

• Managers want to know by next week whether deployment will take place. Therefore, analysts meet to discuss how useful and accurate their model is.
This is the Evaluation phase in the CRISP-DM process. In the Evaluation phase, the data mining analysts determine whether the model and technique used meet the business objectives established in the first phase.

• The data mining project manager meets with the data warehousing manager to discuss how the data will be collected.
This is the Data Understanding phase in the CRISP-DM process. The data warehouse is identified as a resource during the Business Understanding phase; however, the actual data collection takes place during the Data Understanding phase. In this phase data is collected and accessed from the resources listed and identified in the Business Understanding...