i. Table of Contents
ii. Scope of the document
v. Existing and emerging codecs
vi. Popularity of MPEG
vii. Basic MP3 Encoder
1. SCOPE OF THE DOCUMENT
This document provides a high-level overview of audio coding, with a brief description of the MPEG-1/2 Layer 3 (MP3) codec.
The basic task of an audio coding/compression system is to compress digital audio data as efficiently as possible: the compressed file should be as small as possible, and the decoded audio should sound exactly like (or as close as possible to) the original audio before compression. In the larger picture, the important issues are the tradeoffs among compression ratio, audio bandwidth and artifacts.
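As a rough illustration of what "compression ratio" means here, the sketch below compares the bitrate of uncompressed PCM audio with the bitrate of a coded stream. The concrete numbers (CD-style audio at 44.1 kHz, 16 bits, stereo, and a 128 kbit/s coded stream) and the function names are assumptions chosen for the example; they are not taken from this document.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(pcm_kbps, coded_kbps):
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_kbps / coded_kbps


# Assumed example values: CD-style PCM input and a 128 kbit/s coded stream.
cd_kbps = pcm_bitrate_kbps(44100, 16, 2)
ratio = compression_ratio(cd_kbps, 128.0)
print(f"PCM: {cd_kbps:.1f} kbit/s, ratio: {ratio:.1f}:1")
```

With these assumed values the coded stream is roughly eleven times smaller than the PCM input; how audible the resulting artifacts are depends on the codec and the material.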
Other requirements for audio compression techniques include low complexity (to enable software decoders or inexpensive hardware decoders with low power consumption) and flexibility for different application scenarios. The technique that meets these goals is called perceptual encoding; it uses knowledge from psychoacoustics to achieve efficient compression whose losses are inaudible. Perceptual encoding is a lossy compression technique, i.e. the decoded file is not a bit-exact replica of the original digital audio data. Perceptual coders for high-quality audio coding have been a research topic since the late 1970s, with most activity occurring since about 1986.
Fig.1 The combined result of frequency and time masking. Signals under the curve are inaudible.
Most modern codecs rely on the celebrated acoustic masking principle, a remarkable property of the human ear/brain aural perception system. When audio is present at a particular frequency, you cannot hear audio at nearby frequencies that is sufficiently low in volume. The inaudible components are masked owing to properties of the human ear that operate at a very low 'hardware' level: researchers say the information is dropped immediately within the ear and is not passed to the brain. This appears to be a kind of natural rate reduction that helps to keep the brain from being overloaded with unnecessary information. A similar effect works in the time domain: a sufficiently quiet signal arriving shortly after a louder one has stopped is also inaudible.
Put simply, information that the human ear does not need in order to perceive the audio is discarded before encoding. The masking threshold is predicted by well-established psychoacoustic models.
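To make the idea of a masking threshold more concrete, here is a minimal toy sketch of frequency masking by a single tone. It is not the psychoacoustic model of MP3 or of any standard: the triangular threshold shape, the slopes in dB per Bark and the 10 dB offset are illustrative assumptions, and all function names are hypothetical. Only the Hz-to-Bark conversion follows a widely used approximation of the critical-band scale.

```python
import math


def hz_to_bark(f_hz):
    """Common approximation of the Bark (critical-band) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masking_threshold_db(masker_hz, masker_db, probe_hz,
                         lower_slope=27.0, upper_slope=10.0, offset_db=10.0):
    """Toy threshold (in dB) that a single masker casts at probe_hz.

    Assumption: a triangle in the Bark domain that falls off by
    lower_slope dB/Bark below the masker and upper_slope dB/Bark above it,
    with its peak offset_db below the masker level.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)


def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the probe tone lies under the masker's toy threshold."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)


# A quiet tone close to a loud 1 kHz masker falls under the curve ...
near = is_masked(1000.0, 70.0, 1100.0, 40.0)
# ... while the same tone far away in frequency does not.
far = is_masked(1000.0, 70.0, 4000.0, 40.0)
print(near, far)
```

A real perceptual model combines many maskers, adds the absolute threshold of hearing and accounts for masking in time as well; the sketch only shows why a component near a strong signal can be dropped while an equally loud component further away cannot.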
Fig.2 Block diagram of a basic perceptual encoding/decoding system
In Fig.2, the basic building blocks of the encoding/decoding system are:
o Filter bank: A filter bank is used to decompose the input signal into subsampled spectral components (time/frequency domain). Together with the corresponding filter bank in the decoder, it forms an analysis/synthesis system.
o Perceptual model: Using the time-domain input signal and/or the output of the analysis filter bank, an estimate of the actual (time- and frequency-dependent) masking threshold is computed using rules known from psychoacoustics. This is called the perceptual model of the perceptual encoding system.
o Quantization and coding: The spectral components are quantized and coded with the aim of keeping the noise introduced by quantization below the masking threshold. Depending on the algorithm, this step is done in very different ways, from simple block companding to analysis-by-synthesis systems using additional noiseless compression.
o Encoding of bitstream: A bitstream formatter is used to assemble the bitstream, which typically consists of the quantized and coded spectral coefficients and some side information, e.g. bit allocation information.
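The last two building blocks can be sketched in a few lines. The example below is a simplified illustration, not the MP3 method: it assumes a plain uniform quantizer whose step is chosen so that the approximate quantization noise power (step squared divided by 12) equals a given masking threshold, and it uses a hypothetical frame layout (step size as side information, a count, then 16-bit indices). The function names, the frame layout and the sample values are all assumptions made for illustration.

```python
import math
import struct


def step_for_threshold(threshold_power):
    """Uniform quantizer step whose approximate noise power
    (step**2 / 12) equals the masking threshold."""
    return math.sqrt(12.0 * threshold_power)


def quantize(coeffs, step):
    """Map each spectral coefficient to the nearest integer multiple of step."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """Decoder side: rebuild approximate coefficients from the indices."""
    return [q * step for q in indices]


def pack_frame(indices, step):
    """Hypothetical frame: float64 step (side info), uint16 count, int16 indices."""
    return struct.pack("<dH%dh" % len(indices), step, len(indices), *indices)


def unpack_frame(frame):
    """Inverse of pack_frame; returns (indices, step)."""
    step, count = struct.unpack_from("<dH", frame, 0)
    indices = struct.unpack_from("<%dh" % count, frame, 10)
    return list(indices), step


# Assumed spectral coefficients and masking threshold for one frame.
coeffs = [0.9, -0.35, 0.12, 0.004, -0.6]
step = step_for_threshold(1e-4)

indices = quantize(coeffs, step)
frame = pack_frame(indices, step)

decoded_indices, decoded_step = unpack_frame(frame)
decoded = dequantize(decoded_indices, decoded_step)
max_error = max(abs(c - d) for c, d in zip(coeffs, decoded))
print(indices, len(frame), max_error <= step / 2)
```

A coarser step (a higher threshold) produces smaller indices that need fewer bits, which is where the bitrate saving comes from. Real coders additionally apply noiseless (entropy) coding to the indices and choose a separate step per frequency band.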