Characteristics of Indian Languages

  • Published: January 27, 2013
MADHAVI VARALWAR and NIXON PATEL
Bhrigus Inc., Hyderabad, India

A text-to-speech (TTS) system often requires simple information such as the language of the input text, the voice gender (male/female) to be used, or whether a telephone number should be pronounced as isolated digits. Such information can be embedded in raw input text using XML-like tags, often referred to as Speech Synthesis Markup Language (SSML), which aims to help a TTS system render content better in various contexts. In this position paper, we discuss some possible SSML extensions with Indian language scripts and the corresponding TTS systems in view.

1. INTRODUCTION

Bhrigus Inc. is actively involved in developing TTS and ASR for Indian languages, and is currently developing a unit selection voice for Telugu. The goal is to build high-quality voices and speech recognition for many of the Indian languages and interface them with computer-telephony applications. These applications span verticals such as entertainment, health care, and finance in the Indian context. In this paper, we describe the nature of the Indian languages and discuss our proposal for additional SSML elements that we feel are required to improve the rendering of Indian languages.

FEATURES OF INDIAN LANGUAGES AND SCRIPTS

Some of the features of Indian languages and the scripts used to express them are:

PHONEME SET: Indian languages have a more sophisticated notion of a character unit, the akshara, which forms the fundamental linguistic unit. An akshara consists of 0, 1, 2, or 3 consonants and a vowel. Words are made up of one or more aksharas. Each akshara can be pronounced independently, as the languages are completely phonetic. Aksharas with more than one consonant are called samyuktaksharas or combo-characters. The last of the consonants is the main one in a samyuktakshara. All Indian languages have essentially the same alphabet...
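The akshara structure described above (zero to three consonants followed by a vowel) can be illustrated with a small sketch. The Python below is not the authors' method; it is a rough approximation that assumes Telugu text in Unicode, where the consonants of a samyuktakshara are joined by the virama sign and a consonant without a vowel sign carries an inherent vowel. The names `split_aksharas` and `consonant_count` are hypothetical, and the character ranges cover only the common Telugu letters.

```python
import re

# Assumed Telugu Unicode ranges (U+0C00 block). Other Indic scripts
# would need their own ranges; rare letters are not covered here.
CONSONANT = "[\u0C15-\u0C39]"          # ka .. ha
VIRAMA = "\u0C4D"                      # joins consonants into a cluster
VOWEL_SIGN = "[\u0C3E-\u0C4C]"         # dependent vowel signs
INDEPENDENT_VOWEL = "[\u0C05-\u0C14]"  # vowels standing on their own
MODIFIER = "[\u0C01-\u0C03]"           # candrabindu, anusvara, visarga

# One akshara: either a consonant cluster with an optional vowel sign
# (or a trailing virama), or a bare independent vowel (zero consonants).
AKSHARA = re.compile(
    f"(?:(?:{CONSONANT}{VIRAMA})*{CONSONANT}(?:{VOWEL_SIGN}|{VIRAMA})?"
    f"|{INDEPENDENT_VOWEL}){MODIFIER}?"
)


def split_aksharas(text):
    """Return the aksharas found in text, in order (hypothetical helper)."""
    return AKSHARA.findall(text)


def consonant_count(akshara):
    """Count the consonant letters inside a single akshara."""
    return len(re.findall(CONSONANT, akshara))


if __name__ == "__main__":
    telugu = "\u0C24\u0C46\u0C32\u0C41\u0C17\u0C41"  # the word "Telugu"
    stri = "\u0C38\u0C4D\u0C24\u0C4D\u0C30\u0C40"    # "stri": one samyuktakshara
    for word in (telugu, stri):
        parts = split_aksharas(word)
        print(parts, [consonant_count(p) for p in parts])
```

Under these assumptions, the word "Telugu" splits into three aksharas of one consonant each, while "stri" is a single samyuktakshara with three consonants, the last of which (ra) carries the vowel sign.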