Voice Recognition

  • Topic: Speech, Speech recognition, Vocabulary
  • Published : May 22, 2011
● Introduction

● Initial problem

● How to Compare Recordings

● Dependence of system’s accuracy

● Algorithm instruction

● Source Code

● Software Requirements

● Hardware Requirements

● References


Introduction

The project “Attendance through Voice Recognition” is a tool that helps an organization or academic institute record the attendance of its employees, students, and faculty members. It also records the date and time at which each member is marked present. The project allows an organization or academic institute to reduce the problem of proxy attendance to a great extent. Many organizations face this problem: an employee may have someone else mark their attendance, and the organization may not detect it because there is no verification process and it is difficult to recognize the face or voice of every person. The same situation arises in academic institutes.

Faculty members may have a colleague mark their attendance even though they are late or absent, which is a common scenario in government institutes. Faculty members can also use this software to detect proxy attendance among students.

The project describes the process of implementing a voice recognition algorithm in MATLAB. The algorithm uses the Discrete Fourier Transform to compare the frequency spectra of two voices. Chebyshev’s Inequality is then used to determine (with reasonable certainty) whether the two voices came from the same person. If the two voices match, the member is marked present in the attendance register, i.e. in a database, and the date and time of attendance are stored along with the entry.
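
As a rough illustration of the decision step: Chebyshev's Inequality states that, for any distribution with mean μ and standard deviation σ, at most 1/k² of values lie k or more standard deviations away from μ. The sketch below is written in Python rather than MATLAB and is not the project's own code. It assumes each recording has been reduced to a single number (here a hypothetical dominant frequency in hertz) and that several enrolment recordings are available; the function name, the choice of feature, and k = 3 are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def chebyshev_match(enrolled, candidate, k=3.0):
    """Return True if candidate lies within k standard deviations of the enrolled mean.

    By Chebyshev's inequality, at most 1/k**2 of values drawn from the same
    distribution fall outside this band, whatever the distribution's shape.
    """
    mu = mean(enrolled)
    sigma = pstdev(enrolled)
    return abs(candidate - mu) < k * sigma

# Hypothetical dominant frequencies (Hz) from five enrolment recordings.
enrolled = [118.0, 121.5, 119.2, 122.1, 120.4]
print(chebyshev_match(enrolled, 120.9))  # close to the enrolled mean
print(chebyshev_match(enrolled, 210.0))  # far outside the band
```

A larger k makes the test more tolerant of natural variation in the enrolled speaker's voice, but also more likely to accept a different speaker.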

Initial Problem

Speech is a natural mode of communication for people. We learn all the relevant skills during early childhood, without instruction, and we continue to rely on speech communication throughout our lives. It comes so naturally to us that we don't realize how complex a phenomenon speech is. The human vocal tract and articulators are biological organs with nonlinear properties, whose operation is not just under conscious control but also affected by factors ranging from gender to upbringing to emotional state. As a result, vocalizations can vary widely in terms of their accent, pronunciation, articulation, roughness, nasality, pitch, volume, and speed; moreover, during transmission, our irregular speech patterns can be further distorted by background noise and echoes, as well as electrical characteristics (if telephones or other electronic equipment are used). All these sources of variability make speech recognition, even more than speech generation, a very complex problem.

A human can easily recognize a familiar voice; however, getting a computer to distinguish a particular voice from others is a more difficult task. Several problems arise immediately when trying to write a voice recognition algorithm. Most of these difficulties stem from the fact that it is almost impossible to say a word exactly the same way on two different occasions. Factors that continuously change in human speech include how fast the word is spoken, which parts of the word are emphasized, and so on. Furthermore, even if a word could be said the same way on different occasions, we would still be left with another major dilemma: in order to compare two sound files in the time domain, the recordings would have to be aligned so that both begin at precisely the same moment.
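
The alignment problem can be seen with a toy example. The Python sketch below is an illustration only, not part of the project: it uses a synthetic sine wave as a stand-in for a recording and a hypothetical helper, `mean_abs_difference`, that compares two signals sample by sample. Delaying the same sound by a few samples is enough to make the two versions look completely different in the time domain.

```python
import math

def mean_abs_difference(a, b):
    """Average sample-by-sample absolute difference of two equal-length signals."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

n = 64
# A toy "recording": 4 cycles of a sine wave over 64 samples.
signal = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
# The same recording started 8 samples later (wrapped around for simplicity).
delayed = signal[-8:] + signal[:-8]

print(mean_abs_difference(signal, signal))   # identical and aligned
print(mean_abs_difference(signal, delayed))  # large, although the sound is the same
```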

How to Compare Recordings

Frequency Domain

Given the difficulties described above, one thing becomes evident: any attempt to analyze sounds in the time domain will be extremely impractical. This led us instead to analyze the frequency spectrum of a voice, which remains predominantly unchanged as speech is slightly varied. We then effectively utilized the Discrete Fourier...
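
To illustrate why the frequency domain is more forgiving, the following Python sketch computes the magnitude of the Discrete Fourier Transform directly from its definition and compares two signals by the distance between their magnitude spectra. It is a minimal sketch under stated assumptions, not the project's MATLAB code: the function names, the synthetic sine-wave "recordings", and the use of Euclidean distance are all chosen here for illustration, and in MATLAB the built-in `fft` function would normally replace the slow direct sum.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Magnitudes of the discrete Fourier transform, computed directly (O(n^2))."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n)
    ]

def spectral_distance(a, b):
    """Euclidean distance between the magnitude spectra of two equal-length signals."""
    sa, sb = magnitude_spectrum(a), magnitude_spectrum(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)))

n = 64
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
delayed = tone[-8:] + tone[:-8]  # same tone, started 8 samples later (wrapped)
other = [math.sin(2 * math.pi * 9 * t / n) for t in range(n)]  # a different pitch

print(spectral_distance(tone, delayed))  # close to zero
print(spectral_distance(tone, other))    # clearly larger
```

A delay changes only the phase of each frequency component, not its magnitude, so the delayed tone is indistinguishable from the original here while the tone at a different pitch is not. Real speech is less tidy than a wrapped sine wave, so this shows the principle rather than the accuracy of a complete system.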