Visual Categorization with Bags of Keypoints
Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray
Xerox Research Centre Europe
6, chemin de Maupertuis
38240 Meylan, France
{gcsurka,cdance}@xrce.xerox.com

Abstract. We present a novel method for generic visual categorization: the problem of identifying the object content of natural images while generalizing across variations inherent to the object class. This bag of keypoints method is based on vector quantization of affine invariant descriptors of image patches.
We propose and compare two alternative implementations using different classifiers: Naïve Bayes and SVM. The main advantages of the method are that it is simple, computationally efficient and intrinsically invariant. We present results for simultaneously classifying seven semantic visual categories. These results clearly demonstrate that the method is robust to background clutter and produces good categorization accuracy even without exploiting geometric information.
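As a minimal sketch of the vector quantization step described above, the following Python fragment assigns each descriptor to its nearest codebook centre and counts the assignments. The function names (`nearest_codeword`, `bag_of_keypoints`), the toy 2-D descriptors and the three-entry codebook are illustrative assumptions, not taken from the paper; the codebook is assumed to be given already (for example, from clustering descriptors of training images), and plain Euclidean distance is assumed. The resulting histogram is the kind of feature vector that a classifier such as Naïve Bayes or an SVM could then consume.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def nearest_codeword(descriptor, codebook):
    """Return the index of the codebook centre closest to the descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def bag_of_keypoints(descriptors, codebook):
    """Count descriptors per codeword; spatial layout is deliberately ignored."""
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(codebook))]


# Hypothetical toy data: real descriptors would be high-dimensional
# vectors computed on image patches, not 2-D points.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_descriptors = [(0.1, -0.2), (0.9, 1.2), (1.1, 0.8), (4.7, 5.3)]

histogram = bag_of_keypoints(image_descriptors, codebook)
print(histogram)  # [1, 2, 1]
```

Because only counts are kept, the representation contains no geometric information about where the patches occurred, which matches the claim that good accuracy is obtained without exploiting geometry.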

1. Introduction
The proliferation of digital imaging sensors in mobile phones and consumer-level cameras is producing a growing number of large digital image collections. To manage such collections it is useful to have access to high-level information about objects contained in the image. Given an appropriate categorization of image contents, one may efficiently search, recommend, react to or reason with new image instances.
We are thus confronted with the problem of generic visual categorization. We should like to identify processes that are sufficiently generic to cope with many object types simultaneously and which are readily extended to new object types. At the same time, these processes should handle the variations in view, imaging, lighting and occlusion, typical of the real world, as well as the intra-class variations typical of semantic classes of everyday objects.
The task-dependent and evolving nature of visual categories

