Speech Compression Using Wavelets
JNTU College of Engineering,
Ch. Naresh Kumar,
JNTU College of Engineering,
Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. The wavelet transform decomposes a signal into wavelet coefficients at different scales and positions. These coefficients represent the signal in the wavelet domain, and all data operations can be performed using just the corresponding wavelet coefficients. The major issues in the design of this wavelet-based speech coder are choosing optimal wavelets for speech signals, the decomposition level of the DWT, the thresholding criteria for coefficient truncation, and efficient encoding of the truncated coefficients. The performance of the wavelet compression scheme is compared on both male and female spoken sentences. On a male spoken sentence, the scheme reaches a signal-to-noise ratio of 17.45 dB and a compression ratio of 3.88 using a level-dependent thresholding approach.
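The abstract names the ingredients of the coder (wavelet decomposition, level-dependent thresholding, SNR and compression ratio) without giving the exact rules, so the following Python sketch only illustrates the idea under stated assumptions. It uses the Haar wavelet for brevity (not necessarily the wavelet chosen in the paper), a hypothetical per-level threshold equal to a fraction `alpha` of the largest detail magnitude at that level, SNR defined as 10 log10 of signal energy over reconstruction-error energy, and a compression ratio taken as total coefficients divided by retained nonzero coefficients, which ignores the bits needed to encode positions and values. The names `haar_dwt`, `haar_idwt`, `compress` and `alpha` are illustrative, and the input is a synthetic two-tone signal rather than recorded speech.

```python
import numpy as np

def haar_dwt(x, levels):
    """Orthonormal Haar DWT; len(x) must be divisible by 2**levels."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail coefficients at this level
        a = (even + odd) / np.sqrt(2)              # approximation for the next level
    return a, details

def haar_idwt(a, details):
    """Invert haar_dwt, starting from the coarsest level."""
    for d in reversed(details):
        x = np.empty(2 * a.size)
        x[0::2] = (a + d) / np.sqrt(2)
        x[1::2] = (a - d) / np.sqrt(2)
        a = x
    return a

def compress(x, levels, alpha):
    """HYPOTHETICAL level-dependent hard thresholding.

    At each level, zero every detail coefficient whose magnitude is below
    alpha times the largest detail magnitude at that level; keep all
    approximation coefficients. Returns (x_hat, snr_db, compression_ratio).
    """
    x = np.asarray(x, dtype=float)
    a, details = haar_dwt(x, levels)
    kept = [np.where(np.abs(d) < alpha * np.max(np.abs(d)), 0.0, d) for d in details]
    x_hat = haar_idwt(a, kept)
    noise = np.sum((x - x_hat) ** 2)
    snr_db = np.inf if noise == 0 else 10 * np.log10(np.sum(x ** 2) / noise)
    total = a.size + sum(d.size for d in details)
    nonzero = np.count_nonzero(a) + sum(np.count_nonzero(d) for d in kept)
    return x_hat, snr_db, total / nonzero  # ratio ignores encoding overhead

# Synthetic two-tone test signal at an assumed 8 kHz sampling rate (not speech data).
fs = 8000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
_, snr, cr = compress(x, levels=4, alpha=0.1)
print(f"SNR = {snr:.2f} dB, compression ratio = {cr:.2f}")
```

Raising `alpha` zeroes more coefficients, which raises this compression ratio and lowers the SNR; that trade-off is what a thresholding rule has to balance.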
1. INTRODUCTION
Speech is a very basic way for humans to convey information to one another. With a bandwidth of only 4 kHz, speech can convey information along with the emotion of a human voice. People want to be able to hear someone's voice from anywhere in the world as if the person were in the same room. As a result, greater emphasis is being placed on the design of new and efficient speech coders for voice communication and transmission, and applications of speech coding and compression have become very numerous. This paper looks at a new technique for analyzing and compressing speech signals using wavelets. Any signal can be represented by a set of scaled and translated versions of a basic function called the mother wavelet. Taking the wavelet transform of the original signal yields the wavelet coefficients, which weight these scaled and translated wavelet functions at different scales and positions. Speech is a non-stationary random process because the human speech production system varies over time. Non-stationary signals are characterized by numerous transitory drifts, trends, and abrupt changes. The localization of wavelets, together with their time-frequency resolution properties, makes them well suited for coding speech signals.
2. WAVELETS VS. FOURIER ANALYSIS
A major drawback of Fourier analysis is that in transforming to the frequency domain, the time-domain information is lost: the spectrum shows which frequencies are present, but not when they occur. The most important difference between these two kinds of transforms is that individual wavelet functions are localized in time. In contrast, the Fourier sine and cosine functions are non-local and are active for all time t.
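A small numerical illustration of the localization argument, in Python with NumPy: a single impulse is spread over every DFT coefficient, while a one-level Haar wavelet transform confines it to one detail coefficient whose index still reveals where the transient occurred. The Haar wavelet is used here only because it is the simplest orthonormal wavelet; it is an assumption for illustration, not the paper's choice.

```python
import numpy as np

N = 256
x = np.zeros(N)
x[100] = 1.0  # a single abrupt transient at n = 100

# Fourier: every DFT coefficient of an impulse has magnitude 1,
# so the transient is spread over all N frequency bins.
X = np.fft.fft(x)
fourier_nonzero = np.count_nonzero(np.abs(X) > 1e-12)

# One level of the orthonormal Haar wavelet transform: each detail
# coefficient only depends on the sample pair (2k, 2k+1).
detail = (x[0::2] - x[1::2]) / np.sqrt(2)
wavelet_nonzero = np.count_nonzero(np.abs(detail) > 1e-12)

print(fourier_nonzero, wavelet_nonzero)  # 256 1
print(np.flatnonzero(detail))            # [50]: the transient's position is kept
```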
3. DISCRETE WAVELET TRANSFORM
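The Discrete Wavelet Transform (DWT) chooses scales and positions based on powers of two, the so-called dyadic scales and positions. The mother wavelet ψ(t) is rescaled, or dilated, by powers of two and translated by integers. The numbers a(L, k) are known as the approximation coefficients at scale L, while d(j, k) are known as the detail coefficients at scale j. The approximation and detail coefficients can be expressed as:

$$a(L,k) = 2^{-L/2} \int_{-\infty}^{\infty} x(t)\,\phi(2^{-L}t - k)\,dt, \qquad d(j,k) = 2^{-j/2} \int_{-\infty}^{\infty} x(t)\,\psi(2^{-j}t - k)\,dt, \quad j = 1, \dots, L,$$

where x(t) is the speech signal and φ(t) is the scaling function associated with the mother wavelet ψ(t).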
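In discrete time, dyadic coefficients like these are usually computed with Mallat's pyramid algorithm: at each level the current approximation is filtered with a lowpass filter h and a highpass filter g, and every second output is kept, so each level halves the number of coefficients. The sketch below assumes the Daubechies-2 (4-tap) filter and periodic extension at the signal boundaries; the paper's own wavelet and boundary handling are not stated in this section, and the names `dwt`, `dwt_step`, `qmf` and `H_DB2` are hypothetical.

```python
import numpy as np

# Daubechies-2 (4-tap) orthonormal lowpass (scaling) filter.
S3 = np.sqrt(3.0)
H_DB2 = np.array([1 + S3, 3 + S3, 3 - S3, 1 - S3]) / (4 * np.sqrt(2.0))

def qmf(h):
    """Highpass (wavelet) filter g[i] = (-1)^i h[L-1-i] built from the lowpass filter."""
    L = len(h)
    return np.array([(-1) ** i * h[L - 1 - i] for i in range(L)])

def dwt_step(a, h):
    """One dyadic level: filter, then keep every second output (periodic extension)."""
    n, L = a.size, len(h)
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(L)[None, :]) % n
    return a[idx] @ h, a[idx] @ qmf(h)

def dwt(x, h, levels):
    """Return the approximation a(L, k) and the details [d(1, k), ..., d(L, k)]."""
    a, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        a, d = dwt_step(a, h)
        details.append(d)
    return a, details

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
a, ds = dwt(x, H_DB2, levels=3)
print([d.size for d in ds], a.size)  # [32, 16, 8] 8
energy = np.sum(a ** 2) + sum(np.sum(d ** 2) for d in ds)
print(np.isclose(energy, np.sum(x ** 2)))  # True
```

Because the periodized db2 transform is orthonormal, the total coefficient energy equals the signal energy, so discarding small coefficients removes only a correspondingly small amount of signal energy.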
3.1. VANISHING MOMENTS
The number of vanishing moments of a wavelet indicates the smoothness of the wavelet function as well as the flatness of the frequency response of the wavelet filters (the filters used to compute the DWT). A wavelet with p vanishing moments satisfies the following equation:

$$\int_{-\infty}^{\infty} t^{m}\,\psi(t)\,dt = 0, \qquad m = 0, 1, \dots, p-1.$$
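For a compactly supported orthonormal wavelet, p vanishing moments of ψ(t) correspond to the highpass filter g annihilating polynomials of degree below p, that is, the sum over n of n^m g[n] is zero for m = 0, ..., p-1. The sketch below counts these discrete moments for the Haar and Daubechies-2 filters, building g from the lowpass filter h by the usual alternating-flip rule; the function names and the tolerance are illustrative assumptions. Note that the filter lengths (2 taps for Haar, 4 for Daubechies-2) grow with the number of vanishing moments.

```python
import numpy as np

def highpass(h):
    """Alternating flip: g[n] = (-1)^n h[L-1-n]."""
    L = len(h)
    return np.array([(-1) ** n * h[L - 1 - n] for n in range(L)])

def vanishing_moments(h, tol=1e-10):
    """Count m = 0, 1, ... for which sum_n n^m g[n] is (numerically) zero."""
    g = highpass(h)
    n = np.arange(len(g), dtype=float)
    p = 0
    while p < len(g) and abs(np.sum(n ** p * g)) < tol:
        p += 1
    return p

S3 = np.sqrt(3.0)
haar = np.array([1.0, 1.0]) / np.sqrt(2.0)
db2 = np.array([1 + S3, 3 + S3, 3 - S3, 1 - S3]) / (4 * np.sqrt(2.0))
print(vanishing_moments(haar), vanishing_moments(db2))  # 1 2
```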
Wavelets with a high number of vanishing moments lead to a more compact signal representation and are hence useful in coding applications. However, in general, the length of the filters increases with the number of vanishing moments and the complexity of computing the DWT coefficients increases with...
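The practical effect of vanishing moments on compactness can be checked on a linear ramp, a degree-1 polynomial: Haar (one vanishing moment) produces a nonzero detail coefficient at every position, whereas Daubechies-2 (two vanishing moments, but twice the filter length) produces details that are zero up to rounding, leaving the ramp entirely in the approximation coefficients. To avoid boundary wrap-around, this sketch computes only the detail coefficients whose filter support lies inside the signal; `detail_coeffs` and `FILTERS` are illustrative names.

```python
import numpy as np

S3 = np.sqrt(3.0)
FILTERS = {
    "haar": np.array([1.0, 1.0]) / np.sqrt(2.0),
    "db2": np.array([1 + S3, 3 + S3, 3 - S3, 1 - S3]) / (4 * np.sqrt(2.0)),
}

def detail_coeffs(x, h):
    """One-level details d[k] = sum_i g[i] x[2k+i], with no boundary wrap."""
    L = len(h)
    g = np.array([(-1) ** i * h[L - 1 - i] for i in range(L)])  # highpass filter
    starts = np.arange(0, len(x) - L + 1, 2)
    return np.array([g @ x[s:s + L] for s in starts])

x = np.arange(32, dtype=float)  # a linear ramp
for name, h in FILTERS.items():
    d = detail_coeffs(x, h)
    print(name, len(h), np.count_nonzero(np.abs(d) > 1e-9), "of", d.size)
# prints "haar 2 16 of 16", then "db2 4 0 of 15"
```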