Information Sciences 173 (2005) 115–139 www.elsevier.com/locate/ins

Investigating spoken Arabic digits in speech recognition setting

Yousef Ajami Alotaibi

Computer Engineering Department, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11574, Saudi Arabia

Received 3 October 2003; received in revised form 18 May 2004; accepted 14 July 2004

E-mail address: yalotaibi@ccis.ksu.edu.sa
0020-0255/$ - see front matter © 2004 Elsevier Inc. All rights reserved.
doi:10.1016/j.ins.2004.07.008

Abstract

Arabic is a Semitic language that differs in many ways from European languages such as English. One of these differences is the pronunciation of the ten digits, zero through nine: except for zero, all Arabic digits are polysyllabic words. In this paper, Arabic digits were investigated from the point of view of speech recognition. An artificial neural network based speech recognition system was designed and tested on automatic Arabic digit recognition. The system is an isolated whole-word speech recognizer, and it was implemented in both multi-speaker and speaker-independent modes. During recognition, noise was removed from the digitized speech by band-pass filtering, and the signal was pre-emphasized, blocked into frames, and windowed with a Hamming window. A time alignment algorithm was used to compensate for differences in utterance length and for misalignments between phonemes. Frame features were extracted as Mel-frequency cepstral coefficients (MFCCs) to reduce the amount of information in the input signal. Finally, the neural network classified the unknown digit. The system achieved 99.5% correct digit recognition in the multi-speaker mode and 94.5% in the speaker-independent mode. This paper also investigated Arabic digits as “patterns on paper”, using spectrogram and waveform information to cross-check the recognition results and to try to locate the causes of misrecognized digits. All Arabic digits were described by showing their constituent phonemes and syllables. Comparisons of all possible pairs of digits were also made, with comments linked to the output of the digit recognition system. An understanding of the causes of automatic digit recognition errors may help in building digit recognition systems that are simple, cheap, and fast.

Keywords: Neural network; Arabic digits; Speech; Recognition; Spectrogram
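The abstract names three front-end steps: band-pass filtering, pre-emphasis, and blocking into Hamming-windowed frames. They can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions and is not the author's implementation. The excerpt does not give the filter type, the cutoffs, the pre-emphasis coefficient, or the frame sizes. The helper names are therefore hypothetical, and so are the idealized FFT-mask band-pass (a swapped-in technique, not the paper's filter), the 200-3400 Hz band, alpha = 0.95, and 256-sample frames with a 128-sample hop.

```python
import numpy as np


def bandpass_fft(signal, sample_rate, low_hz, high_hz):
    """Idealized band-pass: zero every FFT bin outside [low_hz, high_hz] (assumed filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def pre_emphasize(signal, alpha=0.95):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1]; alpha is an assumed value."""
    return np.concatenate(([signal[0]], signal[1:] - alpha * signal[:-1]))


def frame_and_window(signal, frame_len=256, hop=128):
    """Block the signal into overlapping frames and multiply each by a Hamming window."""
    if len(signal) < frame_len:  # pad very short utterances to one full frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)


if __name__ == "__main__":
    sr = 10_000  # hypothetical sampling rate
    t = np.arange(sr) / sr  # one second of toy signal
    x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 1000 * t)  # hum + tone
    y = bandpass_fft(x, sr, low_hz=200, high_hz=3400)
    frames = frame_and_window(pre_emphasize(y))
    print(frames.shape)  # (77, 256)
```

An FFT mask is used here only because it is short and exact for the test tone. A recognizer working on streaming audio would more likely use a causal FIR or IIR band-pass filter.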
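The MFCC feature step in the abstract follows the usual chain: take the power spectrum of each windowed frame, pass it through triangular filters spaced evenly on the mel scale, take the logarithm, and apply a DCT. The sketch below follows that chain under assumptions. The filter count (20), the number of kept coefficients (12), the mel formula 2595·log10(1 + f/700), and the small floor added before the log are illustrative choices; the excerpt does not state the paper's settings. The function names are hypothetical.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced in mel, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(frames, sample_rate, n_filters=20, n_coeffs=12):
    """MFCCs for already-windowed frames of shape (n_frames, frame_len)."""
    n_fft = frames.shape[1]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # floor avoids log(0) for empty filters
    return dct_ii(log_energies, n_coeffs)


if __name__ == "__main__":
    toy_frames = np.random.default_rng(0).standard_normal((5, 256)) * np.hamming(256)
    print(mfcc(toy_frames, 10_000).shape)  # (5, 12)
```

Keeping only the first few cepstral coefficients is how this step reduces the amount of information per frame, as the abstract describes.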
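The excerpt does not describe the time alignment algorithm itself; reference [18] is listed as unpublished. The sketch below substitutes a simpler technique, linear time normalization. Each utterance's feature sequence is resampled to a fixed, hypothetical number of frames, so every digit yields a network input vector of the same length. According to the abstract, the paper's algorithm also compensates for misalignments between phonemes. This stand-in corrects only overall length differences, and phoneme-level misalignment would need something like dynamic time warping. The function name and `n_target` are assumptions.

```python
import numpy as np


def normalize_length(features, n_target=32):
    """Linearly resample a (n_frames, n_coeffs) sequence to (n_target, n_coeffs)."""
    n_frames = features.shape[0]
    if n_frames == 1:  # nothing to interpolate between; repeat the single frame
        return np.repeat(features, n_target, axis=0)
    src = np.linspace(0.0, 1.0, n_frames)  # original frame positions on [0, 1]
    dst = np.linspace(0.0, 1.0, n_target)  # target frame positions on [0, 1]
    # Interpolate each coefficient track independently along the time axis.
    return np.stack(
        [np.interp(dst, src, features[:, j]) for j in range(features.shape[1])],
        axis=1,
    )


if __name__ == "__main__":
    utterance = np.random.default_rng(0).normal(size=(47, 12))  # toy: 47 frames x 12 MFCCs
    fixed = normalize_length(utterance, n_target=32)
    print(fixed.shape, fixed.reshape(-1).shape)  # (32, 12) (384,)
```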
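A whole-word neural classifier of the kind described in the abstract can be sketched as a one-hidden-layer network with a softmax over the ten digits, trained by gradient descent on cross-entropy loss. The class name `DigitMLP`, the 50 tanh hidden units, the learning rate, and the toy clustered data are all hypothetical. The excerpt does not specify the architecture or training procedure the author used.

```python
import numpy as np


class DigitMLP:
    """One-hidden-layer network mapping a fixed-length feature vector to 10 digits."""

    def __init__(self, n_inputs, n_hidden=50, n_classes=10, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(n_inputs), (n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_classes))
        self.b2 = np.zeros(n_classes)

    def _forward(self, x):
        hidden = np.tanh(x @ self.w1 + self.b1)
        logits = hidden @ self.w2 + self.b2
        logits = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow in exp
        exp = np.exp(logits)
        return hidden, exp / exp.sum(axis=-1, keepdims=True)

    def predict(self, x):
        """Index of the most probable digit for each row of x."""
        return np.argmax(self._forward(x)[1], axis=-1)

    def train_step(self, x, labels, lr=0.5):
        """One full-batch gradient-descent step; returns the loss before the update."""
        hidden, probs = self._forward(x)
        n = x.shape[0]
        grad_logits = probs.copy()
        grad_logits[np.arange(n), labels] -= 1.0  # softmax + cross-entropy gradient
        grad_logits /= n
        grad_hidden = (grad_logits @ self.w2.T) * (1.0 - hidden ** 2)  # tanh derivative
        self.w2 -= lr * hidden.T @ grad_logits
        self.b2 -= lr * grad_logits.sum(axis=0)
        self.w1 -= lr * x.T @ grad_hidden
        self.b1 -= lr * grad_hidden.sum(axis=0)
        return float(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))


if __name__ == "__main__":
    # Hypothetical toy data: 10 separated clusters standing in for digit feature vectors.
    rng = np.random.default_rng(1)
    centers = rng.normal(size=(10, 20))
    labels = rng.integers(0, 10, size=300)
    x = centers[labels] + 0.2 * rng.normal(size=(300, 20))
    net = DigitMLP(n_inputs=20)
    for _ in range(500):
        net.train_step(x, labels)
    print("training accuracy:", np.mean(net.predict(x) == labels))
```

In the multi-speaker versus speaker-independent distinction from the abstract, the difference lies in which speakers appear in the training and test data, not in the network code.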
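The abstract describes relating recognition errors back to particular digits and comparing all possible pairs of digits. A confusion matrix supports that kind of analysis, together with a ranking of the 45 unordered digit pairs (10 choose 2). The helper names below are hypothetical, and the example labels are invented for illustration; they are not results from the paper.

```python
import itertools

import numpy as np


def confusion_matrix(true_digits, predicted_digits, n_classes=10):
    """Count matrix: rows are spoken digits, columns are recognizer outputs."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_digits, predicted_digits):
        cm[t, p] += 1
    return cm


def accuracy(cm):
    """Fraction of utterances on the diagonal (recognized correctly)."""
    return cm.trace() / cm.sum()


def most_confused_pairs(cm, top=3):
    """Rank unordered digit pairs {a, b} by confusions in both directions."""
    pairs = itertools.combinations(range(cm.shape[0]), 2)  # 45 pairs for 10 digits
    scored = [((a, b), int(cm[a, b] + cm[b, a])) for a, b in pairs]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]
```

For example, with true digits [0, 1, 1, 2] and outputs [0, 1, 2, 1], `most_confused_pairs(cm, top=1)` returns `[((1, 2), 2)]` and `accuracy(cm)` is 0.5. The top-ranked pairs are natural candidates for the kind of spectrogram and waveform comparison the paper describes.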

1. Introduction

1.1. Arabic language

Arabic is a Semitic language and one of the oldest languages in the world today. It is the fifth most widely used language nowadays. Arabic is the first language in the world today, and the Arabic alphabet is used to write several other languages, such as Persian and Urdu [1]. Standard Arabic has 34 basic phonemes, of which six are vowels and 28 are consonants [2]. A phoneme is the smallest element of speech that indicates a difference in meaning, word, or sentence. Arabic has fewer vowels than English: it has three long and three short vowels, while American English has at least 12 vowels [3]. Arabic phonemes include two distinctive classes, pharyngeal and emphatic phonemes, which are found only in Semitic languages such as Hebrew [2,4].

The syllables allowed in Arabic are CV, CVC, and CVCC, where V denotes a (long or short) vowel and C denotes a consonant. Arabic utterances can only start with a consonant [2]. Every Arabic syllable must contain at least one vowel. Arabic vowels cannot occur word-initially; they occur either between two consonants or at the end of a word. Arabic syllables can be classified as short or long: the CV type is short, while all others are long. Syllables can also be classified as open or closed: an open syllable ends with a vowel, while a closed syllable ends with a consonant. For Arabic, a vowel always forms a...
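The syllable rules above are mechanical enough to check in code. Every allowed pattern (CV, CVC, CVCC) begins with exactly one consonant followed by the vowel. A C/V skeleton of a word can therefore split into allowed syllables in only one way: each syllable starts at the consonant just before its vowel, and any extra consonants close the preceding syllable. The sketch below assumes each word has already been reduced to such a C/V skeleton, with long and short vowels both written V as in the text; that input representation and the function names are hypothetical. It then labels each syllable short/long and open/closed using the definitions given.

```python
ALLOWED_SYLLABLES = {"CV", "CVC", "CVCC"}


def syllabify(skeleton):
    """Split a C/V skeleton such as 'CVCCVC' into CV, CVC, or CVCC syllables."""
    if not skeleton or set(skeleton) - {"C", "V"}:
        raise ValueError("skeleton must be a non-empty string of 'C' and 'V'")
    vowels = [i for i, ch in enumerate(skeleton) if ch == "V"]
    if not vowels:
        raise ValueError("a syllable needs a vowel")
    # Each allowed syllable is one onset consonant, a vowel, then an optional coda,
    # so every syllable starts at the consonant immediately before its vowel.
    starts = [v - 1 for v in vowels]
    if starts[0] != 0 or any(skeleton[s] != "C" for s in starts):
        raise ValueError("each vowel needs exactly one onset consonant before it")
    ends = starts[1:] + [len(skeleton)]
    syllables = [skeleton[s:e] for s, e in zip(starts, ends)]
    for syllable in syllables:
        if syllable not in ALLOWED_SYLLABLES:
            raise ValueError(f"syllable {syllable!r} is not CV, CVC, or CVCC")
    return syllables


def classify(syllable):
    """(length, openness): CV is short, all others long; open syllables end in V."""
    length = "short" if syllable == "CV" else "long"
    openness = "open" if syllable.endswith("V") else "closed"
    return length, openness
```

For example, `syllabify("CVCVCC")` gives `['CV', 'CVCC']`, classified as `('short', 'open')` and `('long', 'closed')`. A skeleton such as `"CVCCC"` is rejected because CVCCC is not an allowed syllable.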

References

[1] M. Al-Zabibi, An Acoustic–Phonetic Approach in Automatic Arabic Speech Recognition, The British Library in Association with UMI, 1990.
[2] A. Muhammad, Alaswaat Alaghawaiyah, Daar Alfalah, Jordan, 1990 (in Arabic).
[3] J. Deller, J. Proakis, J.H. Hansen, Discrete-Time Processing of Speech Signals, Macmillan, NY, 1993.
[4] M. Elshafei, Toward an Arabic text-to-speech system, The Arabian Journal for Science and Engineering 16 (4B) (1991) 565–583.
[5] Y.A. El-Imam, An unrestricted vocabulary Arabic speech synthesis system, IEEE Transactions on Acoustics, Speech, and Signal Processing 37 (12) (1989) 1829–1845.
[6] E. Hagos, Implementation of an Isolated Word Recognition System, Master's thesis, University of Petroleum and Minerals, Dhahran, Saudi Arabia, 1985.
[7] W. Abdulah, M. Abdul-Karim, Real-time spoken Arabic recognizer, International Journal of Electronics 59 (5) (1984) 645–648.
[8] A. Al-Otaibi, Speech Processing, The British Library in Association with UMI, 1988.
[9] G. Pullum, W. Ladusaw, Phonetic Symbol Guide, The University of Chicago Press, 1996.
[10] R. Lippmann, Review of Neural Networks for Speech Recognition, Neural Computation, MIT Press, 1989, pp. 1–38.
[11] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice-Hall, Englewood Cliffs, NJ, 1999.
[12] T.H. Nong, J. Yunus, S.H. Salleh, Classification of Malay speech sounds based on place of articulation and voicing using neural networks, in: Proceedings of IEEE Electrical and Electronic Technology, TENCON, 2001, pp. 170–173.
[13] S.A. Selouani, D. O'Shaughnessy, Hybrid architectures for complex phonetic features classification: a unified approach, in: International Symposium on Signal Processing and its Applications (ISSPA), Kuala Lumpur, Malaysia, August 2001, pp. 719–722.
[14] M. Salam, D. Mohamad, S. Salleh, Neural network speaker dependent isolated Malay speech recognition system: handcrafted vs. genetic algorithm, in: International Symposium on Signal Processing and its Applications (ISSPA), Kuala Lumpur, Malaysia, August 2001, pp. 731–734.
[15] L. Rabiner, M. Sambur, An algorithm for determining the endpoints of isolated utterances, The Bell System Technical Journal 54 (2) (1975) 297–315.
[16] N. Nocerino, F. Soong, L. Rabiner, D. Klatt, Comparative study of several distortion measures for speech recognition, Speech Communication 4 (1985) 317–331.
[17] S. Davis, P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-28 (4) (1980) 357–366.
[18] Y.A. Alotaibi, A simple and effective time-alignment algorithm for spoken Arabic digits, unpublished.
[19] P.C. Loizou, A.S. Spanias, High-performance alphabet recognition, IEEE Transactions on Speech and Audio Processing 4 (6) (1996) 430–445.