Can Money Bring Happiness

  • Published : March 16, 2013
In this paper we address the issue of pronunciation modeling for conversational speech synthesis. We experiment with two different HMM topologies (a fully connected state model and a forward connected state model) for sub-phonetic modeling to capture the deletion and insertion of sub-phonetic states during the speech production process. We show that both topologies achieve a higher log likelihood than the traditional 5-state sequential model. We also study the first and second mentions of content words and their influence on pronunciation variation. Finally, we report phone recognition experiments using the modified HMM topologies.
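The three topologies can be compared by which state-to-state transitions each one permits. The sketch below is illustrative only and is not taken from the paper: the function names `topology_mask` and `uniform_transitions` are hypothetical, and it assumes that "forward connected" means a state may stay in place or jump to any later state (so states can be skipped), while "fully connected" allows every transition, including backward ones.

```python
def topology_mask(n_states, kind):
    """Return an n x n 0/1 matrix; mask[i][j] == 1 means i -> j is allowed.

    The definitions of "forward" and "full" are assumptions made for
    illustration, not the paper's exact specification.
    """
    if kind == "sequential":
        # Traditional left-to-right model: self-loop or move to the next state.
        allowed = lambda i, j: j in (i, i + 1)
    elif kind == "forward":
        # Self-loop or jump to any later state, so a state can be skipped.
        allowed = lambda i, j: j >= i
    elif kind == "full":
        # Every transition is allowed, so a state can also be revisited.
        allowed = lambda i, j: True
    else:
        raise ValueError(f"unknown topology: {kind}")
    return [[1 if allowed(i, j) else 0 for j in range(n_states)]
            for i in range(n_states)]


def uniform_transitions(mask):
    """Turn a mask into initial transition probabilities, uniform per row."""
    return [[cell / sum(row) for cell in row] for row in mask]


if __name__ == "__main__":
    for kind in ("sequential", "forward", "full"):
        mask = topology_mask(5, kind)
        arcs = sum(sum(row) for row in mask)
        print(kind, "allowed transitions:", arcs)
```

Under these assumptions, the sequential mask cannot represent a skipped state at all, whereas the forward and full masks can, which is the property the abstract relies on for modeling deletions and insertions. A mask like this would typically be used to initialise the transition matrix before training, since transitions that start at zero stay at zero under standard re-estimation.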

Modeling of pronunciation variations in conversational speech is essential for speech recognition as well as speech synthesis. State-of-the-art speech synthesis systems are built using unit selection databases of carefully read speech recorded in a controlled environment. While these systems produce high-quality natural speech, they convey little sense of a conversation and lack the genre and style of conversational speech.
the pronunciation variations [2]. Jande used a phonological rule system to adapt pronunciation to a faster speech rate [3]. Bennett et al. used acoustic models trained on a single-speaker database to label the alternate pronunciations of the words "to", "for", "a", and "the", and used a CART to predict the probable pronunciation in the given context [4].
There has been considerable research in the speech recognition field on capturing pronunciation variants. Bates et al. showed that prosodic features derived from energy, F0, and duration could be cues for modeling pronunciation variability [5]. Nedel et al. used a phone splitting technique to model the pronunciation variants of two phones, AA and IY
Most of the work in speech recognition and speech synthesis uses multiple entries in the dictionary, generated either manually or by automatic means...