SPSS Modeler Tutorial 1 – The Drug Project
Data Warehousing and Data Mining, March 2013
SPSS Modeler (formerly Clementine) is SPSS's enterprise-strength data mining workbench. It helps organizations improve customer and citizen relationships through an in-depth understanding of their data. Organizations use the insight gained from SPSS Modeler to retain profitable customers, identify cross-selling opportunities, attract new customers, detect fraud, reduce risk, and improve government service delivery. At the time of writing (March 2013), the current version is “SPSS Modeler 15”.
The Drug Project Exercise
Briefing: Imagine that you are a medical researcher compiling data for a study. You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications. Part of your job is to use data mining to find out which drug might be appropriate for a future patient with the same illness.
Launch the SPSS Modeler:
Open SPSS Modeler from the Start menu: All Programs > IBM SPSS Modeler 15.0 > IBM SPSS Modeler 15.0. Select “Open an existing project” and double-click “More files…”. In the Open dialog window, go to the folder “N:\DWDM\SPSSModeler\Demos” and double-click the “drug.cpj” file to open it. SPSS Modeler should open and display the project as shown in Figure 1.
Figure 1: The Drug Project (labelled areas: Control Panel, Current Working Space)
Displaying the Properties of the Data
To open a data source, SPSS Modeler provides many node types in the “Sources” tab of the “Module Panel”. Here, we will use the “Var. File” node.
1. Select the “Sources” tab in the “Module Panel”.
2. Double-click the “Var. File” node and it will appear in the “Main Panel”. You can also add a node by left-clicking it once in the “Module Panel” and then left-clicking once at the place in the “Main Panel” where you want it.
3. Double-click the “Var. File” node in the “Main Panel” to open its property window (Figure 2), and click the “…” button next to the “File” field. In the “Open” dialog window, select and open the “DRUG1n” file, which contains the drug records. The “Var. File” node should now have the properties shown in Figure 2, and it is labelled “DRUG1n” in the “Main Panel”. Each record in DRUG1n has 7 fields: “Age”, “Sex”, “BP”, “Cholesterol”, “Na”, “K”, and “Drug”.
4. Click “OK” to close the “Var. File” property window.
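For readers who want to see what the Var. File node does with DRUG1n, reading delimited text and giving each field a type, here is a minimal Python sketch. It assumes DRUG1n is a comma-separated file whose first row holds the seven field names, and that Age is an integer while Na and K are real numbers. `load_drug_records` is a hypothetical helper written for this tutorial, not part of SPSS Modeler.

```python
import csv
import io
from typing import TextIO

# Field names as listed for DRUG1n in this tutorial.
FIELDS = ["Age", "Sex", "BP", "Cholesterol", "Na", "K", "Drug"]
# Assumption: these fields are numeric; all other fields stay as strings.
NUMERIC = {"Age": int, "Na": float, "K": float}


def load_drug_records(stream: TextIO) -> list[dict]:
    """Read comma-separated records (header row assumed) into typed dicts."""
    reader = csv.DictReader(stream)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    records = []
    for row in reader:
        # Keep only the expected fields and trim stray whitespace.
        record = {name: row[name].strip() for name in FIELDS}
        # Convert the fields assumed to be numeric.
        for name, convert in NUMERIC.items():
            record[name] = convert(record[name])
        records.append(record)
    return records


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only; not taken from DRUG1n.
    sample = io.StringIO(
        "Age,Sex,BP,Cholesterol,Na,K,Drug\n"
        "30,F,HIGH,NORMAL,0.70,0.05,drugX\n"
        "52,M,LOW,HIGH,0.65,0.04,drugC\n"
    )
    for rec in load_drug_records(sample):
        print(rec)
```

With a real file, open it as `open(path, newline="")` (where `path` is wherever your copy of DRUG1n lives) and pass the file object to the function.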
Figure 2: Var.File Property
To display the properties of the data, we use a “Distribution” node.
1. Select the “Distribution” node in the “Graphs” tab of the “Module Panel”, and add it to the “Main Panel”.
2. Link the “DRUG1n” node to the “Distribution” node: right-click the “DRUG1n” node, select the “Connect…” option, then left-click the “Distribution” node (Figure 3).
Figure 3: Link between two nodes
3. Double-click the “Distribution” node to open its property window.
4. Select “Drug” for the “Field” option (Figure 4) to display the distribution of drugs, then click “Run”.
Figure 4: Distribution Node Property
5. You should see a distribution window for the Drug field of the DRUG1n file (Figure 5). It shows how many records have each drug and what percentage of all records each count represents.
Figure 5: Distribution of Drugs
6. Click OK to close the window.
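The table in the Distribution window boils down to a count and a percentage per value. The Python sketch below reproduces that arithmetic for a list of values; it is an illustration of the calculation, not Modeler's implementation. The `distribution` function is hypothetical, the sample drug values are made up, and this sketch lists the most frequent value first, which may differ from the order Modeler uses.

```python
from collections import Counter


def distribution(values: list[str]) -> list[tuple[str, int, float]]:
    """Return (value, count, percent) rows, most frequent value first."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return []
    # Ties keep the order in which values were first seen.
    return [
        (value, count, 100.0 * count / total)
        for value, count in counts.most_common()
    ]


if __name__ == "__main__":
    # Hypothetical Drug values for illustration only; not the DRUG1n counts.
    drugs = ["drugY", "drugX", "drugY", "drugA", "drugY", "drugX", "drugC", "drugB"]
    for value, count, pct in distribution(drugs):
        print(f"{value:<6} {count:>3} {pct:6.2f}%")
```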
Finding a Relationship in Numeric Data
To investigate the relationship between sodium (Na) and potassium (K) levels, the most natural approach is a point (scatter) plot. To produce one, we create a “Plot” node and connect it to the “DRUG1n” (Var. File) node.
1. Select the “Plot” node in the “Graphs” tab of the “Module Panel”, and add it to the “Main Panel”.
2. Link the “DRUG1n” node to the “Plot” node: right-click the “DRUG1n” node, select the “Connect…” option, then left-click the “Plot” node (Figure 6).
Figure 6: Link...
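A Plot node draws one point per record, with Na on one axis and K on the other. The Python sketch below collects those (Na, K) pairs and, as a numeric complement that the Plot node itself does not compute, the Pearson correlation coefficient, which measures how close the points lie to a straight line (+1 or -1 for a perfect line, near 0 for no linear trend). Both functions are hypothetical helpers; records are assumed to be dicts with numeric "Na" and "K" entries, and the sample values are made up.

```python
import math


def na_k_points(records: list[dict]) -> list[tuple[float, float]]:
    """Collect the (Na, K) pair from each record: the points a scatter plot draws."""
    return [(float(r["Na"]), float(r["K"])) for r in records]


def pearson(points: list[tuple[float, float]]) -> float:
    """Pearson correlation coefficient between the x and y coordinates."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Co-variation and spread of each coordinate around its mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a coordinate is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical records for illustration only; not taken from DRUG1n.
    records = [
        {"Na": 0.70, "K": 0.05},
        {"Na": 0.65, "K": 0.04},
        {"Na": 0.80, "K": 0.03},
    ]
    points = na_k_points(records)
    print(points)
    print(f"r = {pearson(points):.3f}")
```

To draw the points themselves outside Modeler, the pairs can be split into two lists and passed to matplotlib's `pyplot.scatter`.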