The study began with a survey of web sites and blogs on Text-To-Speech methodology, with the aim of understanding the purpose of voice synthesis. What we found on the Internet is described below.
Text-To-Speech synthesizer:
A Text-To-Speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud, whether the text was typed directly into the computer by an operator or scanned and submitted to an Optical Character Recognition (OCR) system. Let us be clear: the fundamental difference between the system discussed here and any other talking machine (a cassette player, for example) is that we are interested in the automatic production of new sentences. This definition still needs some refinement. Systems that simply concatenate isolated words or parts of sentences, known as Voice Response Systems, are only applicable when a limited vocabulary is required (typically a few hundred words) and when the sentences to be pronounced follow a very restricted structure, as is the case for arrival announcements in train stations. In the context of TTS synthesis, it is impossible (and fortunately unnecessary) to record and store all the words of the language. It is therefore more suitable to define Text-To-Speech as the automatic production of speech through a grapheme-to-phoneme transcription of the sentences to utter.
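The contrast between a Voice Response System and grapheme-to-phoneme transcription can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the recorded vocabulary, the small pronunciation lexicon, the ARPAbet-like phoneme symbols and the one-symbol-per-letter fallback rules are all invented for this example, and a real synthesizer would need far richer rules.

```python
import re

# Hypothetical set of words for which a recording exists (Voice Response System).
RECORDED_WORDS = {"the", "train", "is", "late"}

# Hypothetical pronunciation lexicon using ARPAbet-like symbols.
LEXICON = {
    "the": ["DH", "AH"],
    "train": ["T", "R", "EY", "N"],
    "is": ["IH", "Z"],
    "late": ["L", "EY", "T"],
}

# Crude, invented fallback: one symbol per letter, used for unknown words.
LETTER_RULES = dict(zip(
    "abcdefghijklmnopqrstuvwxyz",
    "AE B K D EH F G HH IH JH K L M N AA P K R S T AH V W KS Y Z".split(),
))


def split_words(text):
    """Return the lowercase alphabetic words of the text."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text)]


def voice_response_can_say(text):
    """A Voice Response System can only utter sentences made of recorded words."""
    return all(word in RECORDED_WORDS for word in split_words(text))


def word_to_phonemes(word):
    """Look the word up in the lexicon, else fall back to the letter rules."""
    if word in LEXICON:
        return list(LEXICON[word])
    return [LETTER_RULES[ch] for ch in word]


def text_to_phonemes(text):
    """Transcribe every word of the text, including words never seen before."""
    return [word_to_phonemes(word) for word in split_words(text)]


if __name__ == "__main__":
    for sentence in ("The train is late", "The bus is late"):
        print(sentence)
        print("  voice response can say it:", voice_response_can_say(sentence))
        print("  phonemes:", text_to_phonemes(sentence))
```

In this sketch the word "bus" has no recording, so the Voice Response System cannot produce the second sentence, while the transcription step still yields a (rough) phoneme sequence for it. That is the sense in which TTS produces new sentences automatically.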
How do we make computers speak: techniques for speech synthesis:
In speech generation, there are three basic techniques (in order of increasing complexity): 1) "waveform encoding", 2) "analog formant frequency synthesis" and 3) "digital vocal tract modeling" of speech. Each of these techniques is described briefly below.
In waveform encoding, the computer simply acts like a tape recorder: it records phrases or words into digital memory, and then plays these phrases in...