Volume 14, Number 3
Decision Tree Induction & Clustering Techniques In SAS Enterprise Miner, SPSS Clementine, And IBM Intelligent Miner – A Comparative Analysis
Abdullah M. Al Ghoson, Virginia Commonwealth University, USA
ABSTRACT
Decision tree induction and clustering are two of the most prevalent data mining techniques, used separately or together in many business applications. Most commercial data mining software tools provide these two techniques, but few of them satisfy business needs. Many criteria and factors bear on choosing the most appropriate software for a particular organization. This paper provides a comparative analysis of three popular data mining software tools, SAS® Enterprise Miner, SPSS Clementine, and IBM DB2® Intelligent Miner, based on four main criteria: performance, functionality, usability, and auxiliary task support.
Keywords: data mining, classification, decision tree, clustering, software evaluation, SAS Enterprise Miner, SPSS Clementine, IBM Intelligent Miner, comparative analysis, evaluation criteria.
Businesses face challenges such as growth, regulations, globalization, mergers and acquisitions, competition, and economic changes, which require fast, sound decisions rather than guesswork. Making good decisions requires accurate and clear analysis, such as prediction, estimation, classification, or segmentation, using data mining techniques. Decision tree induction and clustering are two of the most important data mining techniques for finding interesting patterns. Many commercial data mining software tools are on the market, and most of them provide decision tree induction and clustering. Commercial data mining software is expensive, so choosing a tool is a crucial and difficult decision. The objective of this paper is therefore to help organizations choose among three preselected, well-known commercial data mining software tools by providing a comparative analysis of them based on selected criteria. These tools are SAS® Enterprise Miner, SPSS Clementine, and IBM DB2® Intelligent Miner.
The analysis is based on four criteria: performance, functionality, usability, and auxiliary task support. The performance criterion focuses on hosting variety, architecture, and connectivity. The functionality criterion focuses on algorithm variety and prescribed methodology. The usability criterion focuses on user interface and visualization. The auxiliary task support criterion focuses on data cleansing and binning.
However, many other commercial data mining tools are on the market, and our choice of SAS® Enterprise Miner, SPSS Clementine, and IBM DB2® Intelligent Miner does not mean that they are the best. In addition, the chosen criteria are not sufficient to decide which of these tools is the best, because other criteria, such as security, price, flexibility, and reusability, are not covered.
Also, this paper covers only two data mining techniques, decision tree induction and clustering, whereas many other important techniques are not covered, such as neural networks, association rules, and logistic regression. Of course, the more techniques a tool offers, the better. In short, the choice of a particular commercial data mining tool and of particular evaluation criteria depends primarily on business objectives and goals.
International Journal of Management & Information Systems – Third Quarter 2010
2. DECISION TREE INDUCTION OVERVIEW
Decision trees are a class of data mining techniques that break up a collection of heterogeneous records into smaller groups of homogeneous records using directed knowledge discovery. Directed knowledge discovery is "goal-oriented" where it...