COMPUTER VISION
PATTERN RECOGNITION 2001
Rapid Object Detection using a Boosted Cascade of Simple Features

Paul Viola
email@example.com
Mitsubishi Electric Research Labs
201 Broadway, 8th FL
Cambridge, MA 02139

Michael Jones
firstname.lastname@example.org
Compaq CRL
One Cambridge Center
Cambridge, MA 02142

... detected at 15 frames per second on a conventional 700 MHz Intel Pentium III. In other face detection systems, auxiliary information, such as image differences in video sequences or pixel color in color images, has been used to achieve high frame rates. Our system achieves high frame rates working only with the information present in a single grey scale image. These alternative sources of information can also be integrated with our system to achieve even higher frame rates.

There are three main contributions of our object detection framework. We will introduce each of these ideas briefly below and then describe them in detail in subsequent sections.

The first contribution of this paper is a new image representation called an integral image that allows for very fast feature evaluation. Motivated in part by the work of Papageorgiou et al., our detection system does not work directly with image intensities. Like these authors, we use a set of features which are reminiscent of Haar basis functions (though we will also use related filters which are more complex than Haar filters). In order to compute these features very rapidly at many scales we introduce the integral image representation for images. The integral image can be computed from an image using a few operations per pixel. Once computed, any one of these Haar-like features can be computed at any scale or location in constant time.

The second contribution of this paper is a method for constructing a classifier by selecting a small number of important features using AdaBoost. Within any image subwindow the total number of Haar-like features is very large, far larger than the number of pixels.
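To make the integral image idea described above concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the image is a list of rows of grey-scale values. The function names `integral_image`, `rect_sum`, and `two_rect_feature`, the zero-padded table layout, and the sign convention of the feature are illustrative assumptions. The sketch shows the two properties claimed in the text: the table is built with a few operations per pixel, and any rectangle sum then costs four lookups regardless of the rectangle's size.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x].

    The extra row and column of zeros (an assumption of this sketch)
    removes the need for boundary checks in rect_sum.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h.

    Uses exactly four table lookups, independent of w and h.
    """
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Hypothetical horizontal two-rectangle feature: right half minus left half.

    w must be even; the sign convention is an assumption of this sketch.
    """
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return right - left


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(image)
total = rect_sum(ii, 0, 0, 4, 3)          # whole image
inner = rect_sum(ii, 1, 1, 2, 2)          # 2x2 block starting at (1, 1)
feature = two_rect_feature(ii, 0, 0, 4, 3)
```

Because `rect_sum` never loops over pixels, evaluating the same feature at a larger scale only changes the four lookup coordinates, which is why the cost stays constant across scales and locations.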
In order to ensure fast classification, the learning process must exclude a large majority of the available features, and focus on a small set of critical features. Motivated by the work of Tieu and Viola, feature selection is achieved through a simple modification of the AdaBoost procedure: the weak learner is constrained so that each weak classifier returned can depend on only a
This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly complex classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object-specific focus-of-attention mechanism which, unlike previous approaches, provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
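The control flow of the cascade mentioned above can be illustrated with a small sketch. This is only a toy under stated assumptions: in the paper each stage is a boosted classifier, whereas here the stages `mean_intensity` and `contrast`, their thresholds, the flat-list window format, and the function `cascade_classify` are all hypothetical stand-ins chosen to keep the example self-contained. The point it shows is early rejection: a window is discarded by the first stage it fails, so most background windows cost only the cheapest test.

```python
def mean_intensity(window):
    """Toy stage score (hypothetical): average pixel value of the window."""
    return sum(window) / len(window)


def contrast(window):
    """Toy stage score (hypothetical): spread between brightest and darkest pixel."""
    return max(window) - min(window)


def cascade_classify(window, stages):
    """Run the window through the stages in order, rejecting at the first failure.

    stages is a list of (score_function, threshold) pairs, ordered from
    cheapest to most expensive. Returns (accepted, stages_evaluated).
    """
    evaluated = 0
    for score_fn, threshold in stages:
        evaluated += 1
        if score_fn(window) < threshold:
            return False, evaluated  # discarded early, later stages never run
    return True, evaluated


# Thresholds are arbitrary values for illustration only.
stages = [(mean_intensity, 50), (contrast, 100)]

dark_background = [0, 0, 0, 0]
flat_region = [60, 60, 60, 60]
candidate = [40, 60, 200, 20]

result_dark = cascade_classify(dark_background, stages)
result_flat = cascade_classify(flat_region, stages)
result_candidate = cascade_classify(candidate, stages)
```

In this toy run the dark window is rejected after one stage, the flat window after two, and only the candidate window passes every stage, mirroring the idea of spending more computation on promising regions.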
This paper brings together new algorithms and insights to construct a framework for robust and extremely rapid object detection. This framework is demonstrated on, and in part motivated by, the task of face detection. Toward this end we have constructed a frontal face detection system which achieves detection...