Basics of Audio Compression
Advances in digital
audio technology are fueled by two sources: hardware developments and new
signal processing techniques. When processors dissipated tens of watts of power
and memory densities were on the order of kilobits per square inch, portable
playback devices like MP3 players were not possible. Now, however, power
dissipation, memory densities, and processor speeds have improved by several
orders of magnitude.
Audio Compression vs. Speech Compression
This paper focuses on
audio compression techniques, which differ from those used in speech
compression. Speech compression uses a model of the human vocal tract to
express a particular signal in a compressed format. This technique is not usually
applied in the field of audio compression due to the vast array of sounds that
can be generated; models that could represent arbitrary audio sources would be too complex
to implement. So instead of modeling the source of sounds, modern audio
compression models the receiver, i.e., the human ear.
Spectral Analysis
Of the three masking
phenomena explained above, two are best described in the frequency domain.
Thus, a frequency domain representation, also called the "spectrum"
of a signal, is a useful tool for analyzing the signal's frequency
characteristics and determining thresholds. There are several different
techniques for converting a finite time sequence into its spectral
representation, and these typically fall into one of two categories: transforms
and filter banks. Transforms calculate the spectrum of their inputs in terms of
a set of basis sequences; e.g., the Fourier Transform uses basis sequences that
are complex exponentials. Filter banks apply several different band pass
filters to the input. Typically the result is several time sequences, each of
which corresponds to a particular frequency band. Taking the spectrum of a signal serves two purposes: it allows the masking thresholds to be computed, and it provides the frequency-domain components that are ultimately quantized and coded.
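To make the two approaches concrete, the sketch below computes the spectrum of one short block both ways. All parameters here (the 44.1 kHz sample rate, the test tones, and the band edges of the toy filter bank) are illustrative assumptions, not values from any coding standard; it requires NumPy and SciPy.

    import numpy as np
    from scipy import signal

    fs = 44100                                 # assumed CD-quality sample rate (Hz)
    t = np.arange(1024) / fs                   # one short analysis block
    x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 4000 * t)

    # 1. Transform approach: express the block in terms of complex
    #    exponential basis sequences (the Fourier Transform).
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    print(f"strongest component near {freqs[np.argmax(np.abs(spectrum))]:.0f} Hz")

    # 2. Filter-bank approach: pass the same block through several
    #    band-pass filters; each output is a time sequence confined
    #    to one frequency band.
    for lo, hi in [(20, 2000), (2000, 8000), (8000, 20000)]:   # illustrative bands
        sos = signal.butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        y = signal.sosfilt(sos, x)
        print(f"{lo}-{hi} Hz band energy: {np.sum(y**2):.1f}")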
Frequency Masking
Even if a signal
component exceeds the hearing threshold, it may still be masked by louder
components that are near it in frequency. This phenomenon is known as frequency
masking or simultaneous masking. Each component in a signal can cast a
"shadow" over neighboring components. If the neighboring components
are covered by this shadow, they will not be heard. The effective result is
that one component, the masker, shifts the hearing threshold. Figure 4 shows a
situation in which this occurs.
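The idea can be illustrated with a deliberately simplified model: treat the masker's shadow as a triangular spreading function on the Bark (critical-band) scale. The slopes and offset used below (25 dB per Bark below the masker, 10 dB per Bark above, and a 14 dB drop at the masker itself) are rough illustrative values, not the spreading function of any actual MPEG psychoacoustic model.

    import math

    def bark(f_hz):
        # Zwicker's approximation of the Bark (critical-band) scale
        return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)

    def masked_threshold_db(masker_hz, masker_db, probe_hz):
        """Hearing threshold near a masker, raised by its 'shadow' (dB)."""
        dz = bark(probe_hz) - bark(masker_hz)
        slope = 25 if dz < 0 else 10    # masking decays faster below the masker
        return masker_db - 14 - slope * abs(dz)

    # A 60 dB tone at 1 kHz shadows a nearby 1.1 kHz component; a 4 kHz
    # component lies far outside the shadow and keeps its normal threshold.
    for probe_hz in (1100, 2000, 4000):
        print(f"{probe_hz} Hz: masked threshold {masked_threshold_db(1000, 60, probe_hz):.1f} dB")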
MP3 Decoding
Most of the computational work in the MP3 system falls on the encoder. This makes sense: a file is typically played far more often than it is encoded.
Decoders do not need to store or work with a model of human psychoacoustic
principles, nor do they require a bit allocation procedure. An MP3 player only needs to parse the bitstream's header and data frames for spectral components and the side information stored alongside them, and then reconstruct an audio signal from that information.
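As a small illustration of that first parsing step, the sketch below decodes a 4-byte MPEG-1 Layer III frame header: sync word, bitrate, sample rate, and the resulting frame length. The bitrate and sample-rate tables are the standard MPEG-1 Layer III values; everything after the header (side information, Huffman data, and the actual signal reconstruction) is omitted, and error handling is minimal.

    BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96,
                     112, 128, 160, 192, 224, 256, 320]   # MPEG-1 Layer III
    SAMPLE_RATES_HZ = [44100, 48000, 32000]

    def parse_frame_header(b: bytes):
        """Decode a 4-byte MPEG-1 Layer III frame header."""
        if len(b) < 4 or b[0] != 0xFF or (b[1] & 0xE0) != 0xE0:
            raise ValueError("no frame sync (11 set bits) at this offset")
        bitrate = BITRATES_KBPS[(b[2] >> 4) & 0x0F] * 1000
        sample_rate = SAMPLE_RATES_HZ[(b[2] >> 2) & 0x03]
        padding = (b[2] >> 1) & 0x01
        # A Layer III frame holds 1152 samples, hence the factor of
        # 144 (= 1152 / 8) bytes per (bit/s) / (sample/s).
        frame_len = 144 * bitrate // sample_rate + padding
        return bitrate, sample_rate, frame_len

    # A typical 128 kbps, 44.1 kHz frame header:
    print(parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x00])))   # (128000, 44100, 417)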
Future of Digital Music
Today's music
technologies have turned passive listeners into active participants who can
capture, record, transform, edit, and save their music in a variety of digital
formats. One emerging technology that can significantly reduce the size of digital music files while maintaining their original sound quality is mp3PRO. The MPEG-1 audio standard on which it builds reduces the size of audio files using three coding layers of increasing complexity and efficiency. The third layer, commonly known as MP3, uses audio coding and psychoacoustic compression to remove the information, or sounds, that cannot be perceived by the human ear.
Conclusion
By eliminating audio
information that the human ear cannot detect, modern audio coding standards are
able to compress a typical 1.4 Mbps signal by a factor of about twelve. This is
done by employing several different methodologies, including noise allocation
techniques based on psychoacoustic models. Future goals for the field of audio
compression are quite broad.
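The factor-of-twelve figure is easy to check by hand: CD-quality audio runs at 44,100 samples per second, 16 bits per sample, over two channels.

    cd_bps = 44100 * 16 * 2      # 1,411,200 bits per second, about 1.4 Mbps
    mp3_bps = 128_000            # a common MP3 bitrate
    print(cd_bps / mp3_bps)      # ~11x at 128 kbps; a 112 kbps stream gives ~12.6x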