Investigating spoken Arabic digits in speech recognition setting
Yousef Ajami Alotaibi
Computer Engineering Department, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11574, Saudi Arabia
Received 3 October 2003; received in revised form 18 May 2004; accepted 14 July 2004
Abstract
Arabic is a Semitic language that differs in many ways from European languages such as English. One of these differences is how the 10 digits, zero through nine, are pronounced. Except for zero, all Arabic digits are polysyllabic words. In this paper, Arabic digits were investigated from the point of view of the speech recognition problem. An artificial neural network based speech recognition system was designed and tested on automatic Arabic digit recognition. The system is an isolated whole-word speech recognizer, and it was implemented in both multi-speaker and speaker-independent modes. During the recognition process, noise was removed from the digitized speech by means of band-pass filters; the signal was then pre-emphasized, blocked into frames, and windowed with a Hamming window. A time alignment algorithm was used to compensate for differences in utterance lengths and misalignments between phonemes. Frame features were extracted as mel-frequency cepstral coefficients (MFCCs) to reduce the amount of information in the input signal. Finally, the neural network classified the unknown digit. The recognition system achieved 99.5% correct digit recognition in the multi-speaker mode and 94.5% in the speaker-independent mode. This paper also investigated Arabic digits as "patterns on paper", using spectrogram and waveform information to cross-check the results of the digit recognition system and to try to locate the causes of misrecognized digits. All Arabic digits were described by showing their constituent phonemes and syllables. All possible pairs of digits were also compared, and the observations were linked to the output of the digit recognition system. An understanding of the causes of automatic digit recognition system errors may help in building digit recognition systems that are simple, cheap, and fast.
Keywords: Neural network; Arabic digits; Speech; Recognition; Spectrogram
E-mail address: firstname.lastname@example.org
0020-0255/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2004.07.008
Y.A. Alotaibi / Information Sciences 173 (2005) 115–139
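As a rough illustration of two of the front-end steps named in the abstract (pre-emphasis, then blocking into Hamming-windowed frames), the following Python sketch may be helpful. It is not the implementation used in this paper: the function names are hypothetical, and the pre-emphasis coefficient (0.97), the frame length, and the hop size are assumed values that the text above does not specify.

```python
import math


def pre_emphasize(samples, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a commonly used value, assumed here for illustration.
    """
    if not samples:
        return []
    out = [samples[0]]  # the first sample has no predecessor, so keep it
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def hamming(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_signal(samples, frame_len, hop):
    """Block a signal into overlapping frames and window each frame.

    frame_len and hop are in samples and are assumed parameters.
    A trailing partial frame is dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(
            [samples[start + i] * window[i] for i in range(frame_len)]
        )
    return frames
```

For example, `frame_signal(pre_emphasize(x), frame_len=256, hop=128)` would yield half-overlapping windowed frames, each of which could then be passed to a feature extractor such as an MFCC computation.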
1. Introduction
1.1. Arabic language
Arabic is a Semitic language, and it is one of the oldest languages in the world today. It is the fifth most widely used language nowadays. Arabic is the first language in the Arab world, and the Arabic alphabet is used in several other languages, such as Persian and Urdu.
Standard Arabic has 34 basic phonemes, of which six are vowels and 28 are consonants. A phoneme is the smallest element of speech that indicates a difference in meaning between words or sentences. Arabic has fewer vowels than English: it has three long and three short vowels, while American English has at least 12 vowels. Arabic phonemes include two distinctive classes, the pharyngeal and the emphatic phonemes. These two classes can be found only in Semitic languages like Hebrew [2,4].
The allowed syllables in Arabic are CV, CVC, and CVCC, where V indicates a (long or short) vowel and C indicates a consonant. Arabic utterances can only start with a consonant. All Arabic syllables must contain at least one vowel. Also, Arabic vowels cannot occur word-initially; they occur either between two consonants or at the end of a word. Arabic syllables can be classified as short or long: the CV type is short, while all others are long. Syllables can also be classified as open or closed. An open syllable ends with a vowel, while a closed syllable ends with a consonant. For Arabic, a vowel always forms a...