Speech Compression

Information Sciences 173 (2005) 115–139 www.elsevier.com/locate/ins

Investigating spoken Arabic digits in speech recognition setting
Yousef Ajami Alotaibi
Computer Engineering Department, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11574, Saudi Arabia
Received 3 October 2003; received in revised form 18 May 2004; accepted 14 July 2004

Abstract

Arabic is a Semitic language that differs in many ways from European languages such as English. One of these differences is the pronunciation of the 10 digits, zero through nine: except for zero, all Arabic digits are polysyllabic words. In this paper, Arabic digits were investigated from the speech recognition point of view. An artificial neural network based speech recognition system was designed and tested on automatic Arabic digit recognition. The system is an isolated whole-word speech recognizer, and it was implemented in both multi-speaker and speaker-independent modes. During the recognition process, noise was removed from the digitized speech by means of band-pass filters, and the signal was pre-emphasized, then blocked into frames and windowed with a Hamming window. A time alignment algorithm was used to compensate for differences in utterance lengths and misalignments between phonemes. Frame features were extracted as mel-frequency cepstral coefficients (MFCCs) to reduce the amount of information in the input signal. Finally, the neural network classified the unknown digit. The recognition system achieved 99.5% correct digit recognition in multi-speaker mode and 94.5% in speaker-independent mode. This paper also investigated Arabic digits as "patterns on paper" by using spectrogram and waveform information to cross-check the digit recognition results and to try to locate the causes of misrecognized digits. All Arabic digits were described by showing their

E-mail address: yalotaibi@ccis.ksu.edu.sa
0020-0255/$ - see front matter © 2004 Elsevier
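The abstract names pre-emphasis and Hamming-window framing as front-end steps but gives no parameters in this excerpt. The following is a minimal sketch of those two steps only, not the paper's implementation. The function names, the pre-emphasis coefficient of 0.97, and the frame length and hop are illustrative assumptions.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a commonly used value, assumed here; the excerpt
    does not state the coefficient the paper used.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming_window(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_signal(signal, frame_len, hop):
    """Block the signal into overlapping frames and apply a Hamming window.

    Trailing samples that do not fill a whole frame are dropped.
    frame_len and hop are in samples and are hypothetical parameters.
    """
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(
            [signal[start + i] * window[i] for i in range(frame_len)]
        )
    return frames
```

In a pipeline like the one the abstract describes, each windowed frame would then go to feature extraction (MFCCs in the paper) before time alignment and classification.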



