International Journal of Management & Information Systems – Third Quarter 2010
Volume 14, Number 3
Decision Tree Induction & Clustering Techniques In SAS Enterprise Miner, SPSS Clementine, And IBM Intelligent Miner – A Comparative Analysis
Abdullah M. Al Ghoson, Virginia Commonwealth University, USA
ABSTRACT
Decision tree induction and clustering are two of the most prevalent data mining techniques, used separately or together in many business applications. Most commercial data mining software tools provide these two techniques, but few of them satisfy business needs. There are many criteria and factors for choosing the most appropriate software for a particular organization. This paper provides a comparative analysis of three popular data mining software tools, SAS® Enterprise Miner, SPSS Clementine, and IBM DB2® Intelligent Miner, based on four main criteria: performance, functionality, usability, and auxiliary task support.
Keywords: Data mining, classification, decision tree, clustering, software evaluation, SAS Enterprise Miner, SPSS Clementine, IBM Intelligent Miner, comparative analysis, evaluation criteria.
Businesses face challenges such as growth, regulation, globalization, mergers and acquisitions, competition, and economic change, all of which require fast, sound decisions rather than guesswork. Making good decisions requires accurate and clear analysis, such as prediction, estimation, classification, or segmentation, using data mining techniques. Decision tree induction and clustering are two of the most important data mining techniques for finding interesting patterns. There are many commercial data mining software products on the market, and most of them provide decision tree induction and clustering. Commercial data mining software is undoubtedly expensive, so choosing a product is a crucial and difficult decision. Therefore, the objective of this paper is to help organizations choose among three well-known, major commercial data mining tools, preselected for this study, by providing a comparative analysis based on selected criteria. These tools are SAS® Enterprise Miner, SPSS Clementine, and IBM DB2® Intelligent Miner. The analysis is based on four criteria: performance, functionality, usability, and auxiliary task support. The performance criterion focuses on hosting variety, architecture, and connectivity. The functionality criterion focuses on algorithm variety and prescribed methodology. The usability criterion focuses on the user interface and visualization. The auxiliary task support criterion focuses on data cleansing and binning. However, many other commercial data mining tools are available, and our choice of SAS® Enterprise Miner, SPSS Clementine, and IBM DB2® Intelligent Miner does not imply that they are the best. In addition, the chosen criteria are not sufficient to decide which of these tools is the best, because other criteria, such as security, price, flexibility, and reusability, are not covered.
Also, this paper covers only two data mining techniques, decision tree induction and clustering, whereas many other important techniques, such as neural networks, association rules, and logistic regression, are not covered. Of course, a tool that offers more techniques is generally preferable. In short, the choice of commercial data mining software, and of the criteria used to evaluate it, depends largely on the organization's business objectives and goals.
2. DECISION TREE INDUCTION OVERVIEW
Decision trees are a class of data mining techniques that break up a collection of heterogeneous records into smaller groups of homogeneous records using directed knowledge discovery. Directed knowledge discovery is "goal-oriented" where it...
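The idea of breaking heterogeneous records into more homogeneous groups with respect to a goal (target) variable can be illustrated with a minimal Python sketch of a single split step. It assumes categorical attributes and uses entropy-based information gain (the criterion used by ID3-style trees) to choose the split; it is not a description of how SAS Enterprise Miner, SPSS Clementine, or IBM Intelligent Miner implement their tree algorithms. The attribute names `region` and `plan`, the target `churn`, and the records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels; 0 means a pure (homogeneous) group."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def split_records(records, attribute):
    """Partition records into groups by the value of one categorical attribute."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[attribute], []).append(rec)
    return groups


def information_gain(records, attribute, target):
    """Reduction in target entropy achieved by splitting on the attribute."""
    parent = entropy([r[target] for r in records])
    weighted = sum(
        len(group) / len(records) * entropy([r[target] for r in group])
        for group in split_records(records, attribute).values()
    )
    return parent - weighted


def best_split(records, attributes, target):
    """Choose the attribute whose split yields the most homogeneous groups."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical customer records; "churn" is the goal (target) variable.
records = [
    {"region": "east", "plan": "basic", "churn": "yes"},
    {"region": "east", "plan": "premium", "churn": "no"},
    {"region": "west", "plan": "basic", "churn": "yes"},
    {"region": "west", "plan": "premium", "churn": "no"},
]

print(best_split(records, ["region", "plan"], "churn"))  # prints "plan"
```

Splitting on `plan` produces two pure groups (entropy 0), so it wins; splitting on `region` leaves each group evenly mixed and gains nothing. A full tree-induction procedure would apply the same step recursively to each resulting group until a stopping rule is met.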