In this paper we address the issue of pronunciation modeling for conversational speech synthesis. We experiment with two different HMM topologies (a fully connected state model and a forward connected state model) for sub-phonetic modeling, to capture the deletion and insertion of sub-phonetic states during the speech production process. We show that both topologies achieve a higher log likelihood than the traditional 5-state sequential model. We also study the first and second mentions of content words and their influence on pronunciation variation. Finally, we report phone recognition experiments using the modified HMM topologies.
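
The difference between the three topologies named above can be pictured as a mask over the state transition matrix. The following is a minimal sketch, not the implementation used in the experiments. It assumes that "sequential" allows only a self-loop and a step to the next state, that "forward connected" allows a self-loop and a jump to any later state (so states can be skipped), and that "fully connected" allows every transition. Entry and exit transitions are ignored, and the function names `topology_mask` and `uniform_transitions` are hypothetical.

```python
def topology_mask(n_states, kind):
    """Return an n_states x n_states 0/1 matrix of allowed transitions.

    The three definitions below are assumptions made for illustration:
      "sequential": self-loop or move to the immediately following state
      "forward":    self-loop or move to any later state (skips allowed)
      "full":       any state may follow any state
    """
    mask = []
    for i in range(n_states):
        row = []
        for j in range(n_states):
            if kind == "sequential":
                allowed = j in (i, i + 1)
            elif kind == "forward":
                allowed = j >= i
            elif kind == "full":
                allowed = True
            else:
                raise ValueError("unknown topology: " + kind)
            row.append(1 if allowed else 0)
        mask.append(row)
    return mask


def uniform_transitions(mask):
    """Turn a 0/1 mask into a row-stochastic matrix with equal
    probability on every allowed transition (a flat starting point
    that training would then re-estimate)."""
    return [[value / sum(row) for value in row] for row in mask]


if __name__ == "__main__":
    for kind in ("sequential", "forward", "full"):
        mask = topology_mask(5, kind)
        n_arcs = sum(sum(row) for row in mask)
        print(kind, "allowed transitions:", n_arcs)
```

Under these assumptions a 5-state model has 9 allowed transitions in the sequential case, 15 in the forward connected case and 25 in the fully connected case. The extra arcs are what let a trained model skip or revisit sub-phonetic states.
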
Modeling of pronunciation variations in conversational speech is essential for speech recognition as well as speech synthesis. State-of-the-art speech synthesis systems are built using unit selection databases of carefully read speech recorded in a controlled environment. While these systems produce high-quality natural speech, they convey little sense of a conversation and lack the genre and style of conversational speech.
the pronunciation variations. Jande used a phonological rule system to adapt pronunciation to a faster speech rate. Bennett et al. used acoustic models trained on a single-speaker database to label the alternate pronunciations of the words "to, for, a, the", and used a CART to predict the probable pronunciation given the context.
There has been considerable research in the speech recognition field on capturing pronunciation variants. Bates et al. showed that prosodic features derived from energy, F0 and duration could serve as cues for modeling pronunciation variability. Nedel et al. used a phone-splitting technique to model the pronunciation variants of two phones, AA and IY.
Most of the work in speech recognition and speech synthesis uses multiple entries in the dictionary, generated either manually or by automatic means....