Gesture Recognition: A Survey
Sushmita Mitra, Senior Member, IEEE, and Tinku Acharya, Senior Member, IEEE

Abstract—Gesture recognition pertains to recognizing meaningful expressions of motion by a human, involving the hands, arms, face, head, and/or body. It is of utmost importance in designing an intelligent and efficient human–computer interface. The applications of gesture recognition are manifold, ranging from sign language through medical rehabilitation to virtual reality. In this paper, we provide a survey on gesture recognition with particular emphasis on hand gestures and facial expressions. Applications involving hidden Markov models, particle filtering and condensation, finite-state machines, optical flow, skin color, and connectionist models are discussed in detail. Existing challenges and future research possibilities are also highlighted.

Index Terms—Face recognition, facial expressions, hand gestures, hidden Markov models (HMMs), soft computing, optical flow.
IN THE PRESENT-day framework of interactive, intelligent computing, efficient human–computer interaction is assuming utmost importance in our daily lives. Gesture recognition can be termed an approach in this direction. It is the process by which the gestures made by the user are recognized by the receiver. Gestures are expressive, meaningful body motions involving physical movements of the fingers, hands, arms, head, face, or body with the intent of: 1) conveying meaningful information or 2) interacting with the environment. They constitute one interesting small subspace of possible human motion. A gesture may also be perceived by the environment as a compression technique for the information to be transmitted elsewhere and subsequently reconstructed by the receiver. Gesture recognition has wide-ranging applications such as the following:
- developing aids for the hearing impaired;
- enabling very young children to interact with computers;
- designing techniques for forensic identification;
- recognizing sign language;
- medically monitoring patients' emotional states or stress levels;
- lie detection;
- navigating and/or manipulating in virtual environments;
- communicating in video conferencing;
- distance learning/tele-teaching assistance;
- monitoring automobile drivers' alertness/drowsiness levels, etc.
Manuscript received June 22, 2004; revised February 28, 2005. This paper was recommended by Editor E. Trucco. S. Mitra is with the Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700 108, India (e-mail: email@example.com). T. Acharya is with Avisere Inc., Tucson, AZ 85287-5706 USA, and also with the Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287-5706 USA (e-mail: firstname.lastname@example.org). Digital Object Identiﬁer 10.1109/TSMCC.2007.893280
Generally, there exist many-to-one mappings from concepts to gestures and vice versa. Hence, gestures are ambiguous and incompletely specified. For example, to indicate the concept "stop," one can use gestures such as a raised hand with palm facing forward, or an exaggerated waving of both hands over the head. Similar to speech and handwriting, gestures vary between individuals, and even for the same individual between different instances. There have been varied approaches to handle gesture recognition, ranging from mathematical models based on hidden Markov chains to tools or approaches based on soft computing. In addition to the theoretical aspects, any practical implementation of gesture recognition typically requires the use of different imaging and tracking devices or gadgets. These include instrumented gloves, body suits, and marker-based optical tracking. Traditional 2-D keyboard-, pen-, and mouse-oriented graphical user interfaces are often not suitable for working in virtual environments....
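To make the HMM-based approach mentioned above concrete, the following is a minimal sketch (not the authors' implementation) of how a quantized gesture observation sequence can be classified by scoring it under one discrete HMM per gesture class via the forward algorithm. The gesture names, state counts, and all parameter values are hypothetical toy choices for illustration only.

```python
import math

def forward_likelihood(obs, pi, A, B):
    """Likelihood P(obs | model) of a discrete observation sequence
    under an HMM, computed with the forward algorithm.
    pi: initial state probabilities, A: state transition matrix,
    B: emission matrix (B[state][symbol])."""
    n = len(pi)
    # Initialization: alpha[i] = pi[i] * B[i][obs[0]]
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: propagate forward over the remaining frames
    for t in range(1, len(obs)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                 for j in range(n)]
    # Termination: sum over final states
    return sum(alpha)

# Two hypothetical two-state gesture models ("wave", "stop"), each emitting
# one of two quantized feature symbols per video frame. All numbers are toy.
models = {
    "wave": ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.8, 0.2], [0.3, 0.7]]),
    "stop": ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.2, 0.8], [0.6, 0.4]]),
}
seq = [0, 0, 1, 0]  # a short quantized observation sequence
best = max(models, key=lambda g: forward_likelihood(seq, *models[g]))
# The class whose HMM assigns the sequence the highest likelihood wins.
```

In practice each class model would be trained on example sequences (e.g., with the Baum–Welch algorithm), and long sequences would require log-space or scaled arithmetic to avoid numerical underflow; the plain-probability form above is kept only for brevity.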