Naïve Bayes

David J. Hand

Contents
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
9.2 Algorithm Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9.3 Power Despite Independence . . . . . . . . . . . . . . . . . . . . . . . . . . 167
9.4 Extensions of the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
9.5 Software Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
    9.6.1 Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
    9.6.2 Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9.7 Advanced Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
9.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

9.1 Introduction

Given a set of objects, each of which belongs to a known class and each of which has a known vector of variables, our aim is to construct a rule that will allow us to assign future objects to a class, given only the vectors of variables describing those objects. Problems of this kind, called problems of supervised classification, are ubiquitous, and many methods for constructing such rules have been developed. One very important method is the naïve Bayes method, also called idiot's Bayes, simple Bayes, and independence Bayes. This method is important for several reasons. It is very easy to construct, needing no complicated iterative parameter estimation schemes, which means it may be readily applied to huge data sets. It is easy to interpret, so users unskilled in classifier technology can understand why it makes the classifications it makes. And, particularly important, it often does surprisingly well: it may not be the best possible classifier in any given application, but it can usually be relied on to be robust and to do quite well. For example, in an early classic study comparing supervised classification methods, Titterington et al. (1981) found that the independence model yielded the best overall result, while Mani et al. (1997) found that the model was most effective in predicting


© 2009 by Taylor & Francis Group, LLC


breast cancer recurrence. Many further examples showing the surprising effectiveness of the naïve Bayes method are listed in Hand and Yu (2001), and further empirical comparisons, with the same result, are given in Domingos and Pazzani (1997). Of course, there are also some other studies which show poorer relative performance from this method: for a comparative assessment of such studies, see Jamain and Hand (2008).

For convenience, most of this chapter will describe the case in which there are just two classes. This is, in fact, the most important special case, as many situations naturally form two classes (right/wrong, yes/no, good/bad, present/absent, and so on). However, the simplicity of the naïve Bayes method is such that it permits ready generalization to more than two classes. Labeling the classes by i = 0, 1, our aim is to use the initial set of objects which have known class memberships (known as the training set) to construct a score such that larger scores are associated with class 1 objects (say) and smaller scores...
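The setup just described, a training set of labeled objects used to build a class score by simple counting, can be sketched minimally as follows. This is an illustrative implementation under stated assumptions, not the chapter's own code: the function name, the Laplace smoothing parameter alpha, and the toy data are all invented for the sketch, and the features are assumed categorical.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(X, y, alpha=1.0):
    """Fit a naive Bayes classifier for categorical features by counting.

    X: list of feature tuples; y: list of class labels (e.g., 0 and 1).
    alpha is a Laplace smoothing constant (an assumption of this sketch).
    Returns a predict function mapping a feature tuple to a class label.
    """
    n = len(y)
    classes = sorted(set(y))
    class_counts = Counter(y)
    n_features = len(X[0])
    # counts[c][j][v] = number of class-c training objects with value v on feature j
    counts = {c: defaultdict(Counter) for c in classes}
    values = [set() for _ in range(n_features)]  # distinct values seen per feature
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            counts[c][j][v] += 1
            values[j].add(v)

    def predict(x):
        # The independence assumption lets the joint probability factor, so the
        # score for each class is the log prior plus a sum of per-feature
        # log conditional probabilities; the largest score wins.
        best_c, best_score = None, -math.inf
        for c in classes:
            score = math.log(class_counts[c] / n)
            for j, v in enumerate(x):
                num = counts[c][j][v] + alpha
                den = class_counts[c] + alpha * len(values[j])
                score += math.log(num / den)
            if score > best_score:
                best_c, best_score = c, score
        return best_c

    return predict

# Tiny invented training set: two binary features, two classes.
X = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 1), (0, 0)]
y = [1, 1, 0, 0, 1, 0]
predict = train_naive_bayes(X, y)
```

With these six training objects, `predict((1, 1))` scores class 1 higher because class-1 objects dominate both feature values, illustrating how the counting estimates drive the classification.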