Speech Compression

Information Sciences 173 (2005) 115–139 www.elsevier.com/locate/ins

Investigating spoken Arabic digits in speech recognition setting
Yousef Ajami Alotaibi
Computer Engineering Department, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11574, Saudi Arabia
Received 3 October 2003; received in revised form 18 May 2004; accepted 14 July 2004

Abstract

Arabic is a Semitic language that differs in many respects from European languages such as English. One of these differences is the pronunciation of the ten digits, zero through nine: except for zero, all Arabic digits are polysyllabic words. In this paper, Arabic digits were investigated from the point of view of speech recognition. An artificial neural network (ANN) based speech recognition system was designed and tested on automatic Arabic digit recognition. The system is an isolated whole-word speech recognizer and was implemented in both multi-speaker and speaker-independent modes. During the recognition process, noise was removed from the digitized speech by means of band-pass filters. The signal was then pre-emphasized, blocked into frames and windowed with a Hamming window. A time-alignment algorithm was used to compensate for differences in utterance length and for misalignments between phonemes. Frame features were extracted as MFCCs (mel-frequency cepstral coefficients) to reduce the amount of information in the input signal. Finally, the neural network classified the unknown digit. The recognition system achieved 99.5% correct digit recognition in multi-speaker mode and 94.5% in speaker-independent mode. This paper also investigated Arabic digits as “patterns on paper”, using spectrogram and waveform information to cross-check the digit recognition results and to locate the causes of misrecognized digits. All Arabic digits were described by showing their
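The front-end steps listed in the abstract (pre-emphasis, blocking into frames, Hamming windowing) and a length-normalization step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The pre-emphasis coefficient `alpha=0.95`, the 256-sample frame with a 128-sample hop, the 8 kHz test tone and the 20-frame target are hypothetical values chosen for the example. `linear_time_normalize` is a simple nearest-frame linear warping, swapped in for the paper's own time-alignment algorithm, which the abstract does not describe. Band-pass filtering, MFCC extraction and the neural network classifier are left out.

```python
import math
from typing import List

Frame = List[float]


def pre_emphasize(signal: List[float], alpha: float = 0.95) -> List[float]:
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1].

    alpha = 0.95 is an assumed, commonly used value; the paper's
    coefficient is not given in the abstract.
    """
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def hamming(length: int) -> List[float]:
    """Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_and_window(signal: List[float], frame_len: int, hop: int) -> List[Frame]:
    """Block the signal into overlapping frames and multiply each by a Hamming window.

    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hamming(frame_len)
    return [
        [s * w for s, w in zip(signal[start:start + frame_len], window)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def linear_time_normalize(frames: List[Frame], target: int) -> List[Frame]:
    """Map a variable number of frames onto exactly `target` frames.

    Hypothetical stand-in for the paper's time-alignment step: each output
    index is mapped linearly onto the input and the nearest frame is reused.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("no frames to normalize")
    if target == 1:
        return [frames[0]]
    return [frames[int(i * (n - 1) / (target - 1) + 0.5)] for i in range(target)]


if __name__ == "__main__":
    # Hypothetical input: 0.5 s of a 440 Hz tone sampled at 8 kHz.
    fs = 8000
    x = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 2)]
    frames = frame_and_window(pre_emphasize(x), frame_len=256, hop=128)
    fixed = linear_time_normalize(frames, target=20)
    print(len(frames), len(fixed))  # prints "30 20"
```

Normalizing every utterance to the same number of frames gives a feed-forward network a fixed-size input. Plain linear warping does not, however, correct misalignments between individual phonemes. That is the additional job the abstract assigns to the paper's time-alignment algorithm.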

E-mail address: yalotaibi@ccis.ksu.edu.sa
0020-0255/$ - see front matter © 2004 Elsevier

