Top Ten Algorithms

  • Topic: Data mining, Likelihood function, Maximum likelihood
  • Published : May 15, 2012
Knowl Inf Syst (2008) 14:1–37 DOI 10.1007/s10115-007-0114-2 SURVEY PAPER

Top 10 algorithms in data mining
Xindong Wu · Vipin Kumar · J. Ross Quinlan · Joydeep Ghosh · Qiang Yang · Hiroshi Motoda · Geoffrey J. McLachlan · Angus Ng · Bing Liu · Philip S. Yu · Zhi-Hua Zhou · Michael Steinbach · David J. Hand · Dan Steinberg

Received: 9 July 2007 / Revised: 28 September 2007 / Accepted: 8 October 2007 Published online: 4 December 2007 © Springer-Verlag London Limited 2007

Abstract This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.

X. Wu (✉) Department of Computer Science, University of Vermont, Burlington, VT, USA e-mail: xwu@cs.uvm.edu
V. Kumar Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA e-mail: kumar@cs.umn.edu
J. Ross Quinlan Rulequest Research Pty Ltd, St Ives, NSW, Australia e-mail: quinlan@rulequest.com
J. Ghosh Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX 78712, USA e-mail: ghosh@ece.utexas.edu
Q. Yang Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, China e-mail: qyang@cs.ust.hk
H. Motoda AFOSR/AOARD and Osaka University, 7-23-17 Roppongi, Minato-ku, Tokyo 106-0032, Japan e-mail: motoda@ar.sanken.osaka-u.ac.jp
G. J. McLachlan Department of Mathematics, The University of Queensland, Brisbane, Australia e-mail: gjm@maths.uq.edu.au
A. Ng School of Medicine, Griffith University, Brisbane, Australia
B. Liu Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA
P. S. Yu IBM T. J. Watson Research Center, Hawthorne, NY 10532, USA e-mail: psyu@us.ibm.com
Z.-H. Zhou National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China e-mail: zhouzh@nju.edu.cn
M. Steinbach Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA e-mail: steinbac@cs.umn.edu
D. J. Hand Department of Mathematics, Imperial College, London, UK e-mail: d.j.hand@imperial.ac.uk
D. Steinberg Salford Systems, San Diego, CA 92123, USA e-mail: dsx@salford-systems.com

0 Introduction

In an effort to identify some of the most influential algorithms that have been widely used in the data mining community, the IEEE International Conference on Data Mining (ICDM, http://www.cs.uvm.edu/~icdm/) identified the top 10 algorithms in data mining for presentation at ICDM ’06 in Hong Kong. As the first step in the identification process, in September 2006 we invited the ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners to each nominate up to 10 best-known algorithms in data mining. All except one in this distinguished set of award winners responded to our invitation. We asked each nomination to provide the following information: (a) the algorithm name, (b) a brief justification, and (c) a representative publication reference. We also advised that each nominated algorithm should have been widely cited and used by other researchers in the field, and the nominations from each nominator as a group should have a reasonable representation of the different areas in data mining.


After the nominations in Step 1, we verified each nomination for its citations on Google Scholar in late October 2006, and...