Jeremy Bradbury December 5, 2000
I. II. Proposal Introduction A. Speech Coding B. Voice Coders C. LPC Overview III. Historical Perspective of Linear Predictive Coding A. B. C. IV. V. VI. History of Speech & Audio Compression History of Speech Synthesis Analysis/Synthesis Techniques
Human Speech Production LPC Model LPC Analysis/Encoding A. B. C. D. E. Input speech Voice/Unvoiced Determination Pitch Period Estimation Vocal Tract Filter Transmitting the Parameters
LPC Synthesis/Decoding LPC Applications A. B. C. D. Telephone Systems Text-to-Speech Synthesis Voice Mail Systems Multimedia
Linear predictive coding(LPC) is defined as a digital method for encoding an analog signal in which a particular value is predicted by a linear function of the past values of the signal. It was first proposed as a method for encoding human speech by the United States Department of Defence in federal standard 1015, published in 1984. Human speech is produced in the vocal tract which can be approximated as a variable diameter tube. The linear predictive coding (LPC) model is based on a mathematical approximation of the vocal tract represented by this tube of a varying diameter. At a particular time, t, the speech sample s(t) is represented as a linear sum of the p previous samples. The most important aspect of LPC is the linear predictive filter which allows the value of the next sample to be determined by a linear combination of previous samples. Under normal circumstances, speech is sampled at 8000 samples/second with 8 bits used to represent each sample. This provides a rate of 64000 bits/second. Linear predictive coding reduces this to 2400 bits/second. At this reduced rate the speech has a distinctive synthetic sound and there is a noticeable loss of quality. However, the speech is still audible and it can still be easily understood. Since there is information loss in linear predictive coding, it is a lossy form of compression. I will describe the necessary background needed to understand how the vocal tract produces speech. I will also explain how linear predictive coding mathematically approximates the parameters of the vocal tract to reduce a speech signal to a state that is noticeably synthetic but still understandable. I will conclude by discussing other speech encoding schemes that have been based on LPC and by discussing possible disadvantages and applications of the LPC model.
There exist many different types of speech compression that make use of a variety of different techniques. However, most methods of speech compression exploit the fact that speech production occurs through slow anatomical movements and that the speech produced has a limited frequency range. The frequency of human speech production ranges from around 300 Hz to 3400 Hz. Speech compression is often referred to as speech coding which is defined as a method for reducing the amount of information needed to represent a speech signal. Most forms of speech coding are usually based on a lossy algorithm. Lossy algorithms are considered acceptable when encoding speech because the loss of quality is often undetectable to the human ear. There are many other characteristics about speech production that can be exploited by speech coding algorithms. One fact that is often used is that period of silence take up greater than 50% of conversations. An easy way to save bandwidth and reduce the amount of information needed to represent the speech signal is to not transmit the silence. Another fact about speech production that can be taken advantage of is that mechanically there is a high correlation between adjacent samples of speech. Most forms of speech compression are achieved by modelling the process of speech production as a linear digital filter. The digital filter and its slow changing parameters are usually encoded to achieve...