AUTOMATIC SENTENCE GENERATOR SYSTEM FOR SPEECH RECOGNITION APPLICATIONS
José Luciano Maldonado
Universidad de Los Andes, FACES, Núcleo La Liria, Building G, Floor 1, Instituto de Estadística Aplicada y Computación, IEAC, Mérida, Venezuela.
email@example.com firstname.lastname@example.org

Abstract. We describe an experimental computer program that generates sentences automatically. To do so, it first builds models of the contexts of interest. These models incorporate the word histories detected in a context-dependent training set of sentences. The models let us not only generate sentences automatically on the theme being modeled, but also help recognize phrases and sentences. The program is therefore a module that could form part of an automatic speech recognition system, where it would validate proposed word sequences against the acceptable contexts. The system is adaptive and incremental: a model can be updated with additional training sentences, which extends its previously established capacity.
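The adaptive, incremental behavior described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a model records for each word the words observed immediately after it (two-word histories); the class `ContextModel` and its methods are hypothetical names, not taken from the paper.

```python
from collections import defaultdict


class ContextModel:
    """Hypothetical incremental model: records, for each word, the words seen right after it."""

    def __init__(self) -> None:
        self.successors = defaultdict(set)

    def train(self, sentences: list[str]) -> None:
        # Each call only adds two-word histories; nothing learned earlier is removed,
        # so the model's capacity grows with every batch of training sentences.
        for sentence in sentences:
            words = sentence.lower().split()
            for prev, nxt in zip(words, words[1:]):
                self.successors[prev].add(nxt)

    def knows(self, sequence: str) -> bool:
        # True if every two-word history in the sequence has been seen in training.
        words = sequence.lower().split()
        return all(nxt in self.successors.get(prev, set()) for prev, nxt in zip(words, words[1:]))


model = ContextModel()
model.train(["the system generates sentences"])
print(model.knows("the system recognizes sentences"))  # False
model.train(["the system recognizes words", "it recognizes sentences"])
print(model.knows("the system recognizes sentences"))  # True
```

Retraining never discards earlier histories, so a sequence rejected by the first model can be accepted once later sentences supply the missing histories.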
Key words: corpus, vocabulary, training, recognition, recognizer, generator, histories, context, decoder.
1.- Introduction. The rapid development of very high-speed computers with large main memories leads us to believe that it will become possible to design and build automatic speech recognition systems able to detect and encode all the grammatical components of a training corpus. As a contribution to the fascinating field of Automatic Speech Recognition, we have developed a system composed of a set of computer programs. We have observed that, from a model of a small corpus of sentences in a particular context, the system can automatically generate a large number of grammatically correct sentences in that context. The system can also perform linguistic discrimination: it rejects as out of context or grammatically incorrect any word sequence containing words or word histories not registered in its memory. We believe that a system that processes information in the way described in this paper can work successfully in recognition tasks for a variety of contexts with vocabularies of thousands of words.
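The rejection rule described above can be sketched as follows, under an assumption the paper does not state: histories are fixed-length runs of n contiguous words, and a sequence is accepted only if every word is in the vocabulary and every n-word window is a registered history. `build_model` and `accept` are hypothetical names, and the two-sentence corpus is invented for the example.

```python
def build_model(corpus: list[str], n: int = 2) -> tuple[set[str], set[tuple[str, ...]]]:
    """Collect the vocabulary and every n-word history of a training corpus."""
    vocabulary: set[str] = set()
    histories: set[tuple[str, ...]] = set()
    for sentence in corpus:
        words = sentence.lower().split()
        vocabulary.update(words)
        for i in range(len(words) - n + 1):
            histories.add(tuple(words[i:i + n]))
    return vocabulary, histories


def accept(sequence: str, vocabulary: set[str], histories: set[tuple[str, ...]], n: int = 2) -> bool:
    """Accept a word sequence only if all its words and n-word histories were seen in training."""
    words = sequence.lower().split()
    if any(word not in vocabulary for word in words):
        return False  # out-of-vocabulary word
    return all(tuple(words[i:i + n]) in histories for i in range(len(words) - n + 1))


corpus = ["the system generates sentences", "the model generates words"]
vocabulary, histories = build_model(corpus)
print(accept("the system generates sentences", vocabulary, histories))  # True
print(accept("the system generates words", vocabulary, histories))      # True
print(accept("the words generates system", vocabulary, histories))      # False
print(accept("the robot generates words", vocabulary, histories))       # False
```

The sequence "the system generates words" is accepted although it never occurs in the corpus, because each of its two-word histories does; this recombination is one plausible way a small corpus can license many new sentences.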
2.- Terminology.
Training corpus. The set of sentences and paragraphs used to build the context model for generating and recognizing phrases.
Vocabulary. The set of distinct words found in the training corpus.
Training. The process that creates the context models.
Histories. Sequences of words that appear contiguously in the training corpus. For example, if the sentence “there are three reasons which seem to be the origin of this fact” is part of the training corpus, then “the origin” is a two-word history and “the origin of” is a three-word history.
Context. The knowledge area to which the sentences and paragraphs of the training corpus belong.
Recognition. The process of analyzing a word sequence and deciding whether it is a valid sentence with regard to the grammatical rules established in the context model.
Sentence generation. The process that creates a sentence from the context model.
Grammatically valid sentence. A sentence whose structure follows the grammatical rules detected in the training corpus.
Sentence hypothesis. The set of possible sentences corresponding to a word sequence that is to be recognized or generated.
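The following minimal sketch extracts the histories of a given length from a sentence, using the example sentence from the definition of histories. It assumes whitespace tokenization and lowercase comparison; `extract_histories` is a hypothetical helper, not taken from the paper.

```python
def extract_histories(sentence: str, n: int) -> list[tuple[str, ...]]:
    """Return every run of n contiguous words in the sentence, in order."""
    if n < 1:
        raise ValueError("history length must be at least 1")
    # Assumption: words are separated by whitespace and compared case-insensitively.
    words = sentence.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "there are three reasons which seem to be the origin of this fact"
two_word = extract_histories(sentence, 2)
three_word = extract_histories(sentence, 3)
print(("the", "origin") in two_word)          # True
print(("the", "origin", "of") in three_word)  # True
print(len(two_word), len(three_word))         # 12 11
```

A 13-word sentence yields 12 two-word histories and 11 three-word histories, since each history starts at a different word position.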
3.- Context model generator. Figure 1 shows the principal elements of the system, its inputs and outputs, and how the elements interact.
[Figure 1. System structure. Labels: “A word sequence”, “Generator of sentences”, “Generated sentence”.]
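As an illustration of the generator in Figure 1, the sketch below extends an input word sequence by repeatedly choosing, at random, a word that followed the current last word somewhere in the training corpus. This bigram random walk, the end-of-sentence marker `END`, and the functions `train` and `generate` are assumptions made for the example; the paper's own generation procedure may differ.

```python
import random
from collections import defaultdict

END = "</s>"  # hypothetical end-of-sentence marker appended to each training sentence


def train(corpus: list[str]) -> dict[str, list[str]]:
    # Map each word to every word that followed it (repeats kept, so frequent
    # continuations are chosen more often).
    successors = defaultdict(list)
    for sentence in corpus:
        words = sentence.lower().split() + [END]
        for prev, nxt in zip(words, words[1:]):
            successors[prev].append(nxt)
    return successors


def generate(successors, seed_words: str, rng: random.Random, max_len: int = 20) -> str:
    words = seed_words.lower().split()
    if not words:
        raise ValueError("seed_words must contain at least one word")
    while len(words) < max_len:
        options = successors.get(words[-1])
        if not options:  # last word never seen with a continuation
            break
        nxt = rng.choice(options)
        if nxt == END:  # the training corpus ended a sentence here
            break
        words.append(nxt)
    return " ".join(words)


corpus = [
    "the system generates new sentences",
    "the model generates valid sentences",
    "the model recognizes spoken words",
]
successors = train(corpus)
print(generate(successors, "the", random.Random(0)))
```

Because each step follows a two-word history seen in training, every adjacent pair of words in the output also occurs in the corpus, although the sentence as a whole may be new (for example, a walk can produce "the system generates valid sentences").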
To create a model, we begin with a grammatically...