Psychoacoustics


Basics of Audio Compression

Advances in digital audio technology are fueled by two sources: hardware developments and new signal processing techniques. When processors dissipated tens of watts of power and memory densities were on the order of kilobits per square inch, portable playback devices such as MP3 players were not possible. Now, however, power dissipation, memory densities, and processor speeds have improved by several orders of magnitude.

Audio Compression vs. Speech Compression

This paper focuses on audio compression techniques, which differ from those used in speech compression. Speech compression uses a model of the human vocal tract to express a particular signal in a compressed format. This technique is not usually applied in audio compression because of the vast array of sounds that can be generated: models that represent audio generation would be too complex to implement. So instead of modeling the source of sounds, modern audio compression models the receiver, i.e., the human ear.


Spectral Analysis

Of the three masking phenomena explained above, two are best described in the frequency domain. Thus, a frequency domain representation, also called the "spectrum" of a signal, is a useful tool for analyzing the signal's frequency characteristics and determining thresholds. There are several different techniques for converting a finite time sequence into its spectral representation, and these typically fall into one of two categories: transforms and filter banks. Transforms calculate the spectrum of their inputs in terms of a set of basis sequences; e.g., the Fourier Transform uses basis sequences that are complex exponentials. Filter banks apply several different band-pass filters to the input. Typically the result is several time sequences, each of which corresponds to a particular frequency band. Taking the spectrum of a signal has two purposes.
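To make the transform approach concrete, here is a minimal sketch of a discrete Fourier transform written directly from its definition, using complex exponentials as the basis sequences. The sample rate, block length, and test tone are assumptions chosen for illustration; a real encoder would use a fast transform or a filter bank rather than this direct O(N^2) loop.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of every DFT bin of a real-valued sequence."""
    n = len(samples)
    magnitudes = []
    for k in range(n):
        # Correlate the input with the k-th complex exponential basis sequence.
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        magnitudes.append(abs(acc))
    return magnitudes


# Illustrative values (assumptions, not taken from the text above).
SAMPLE_RATE_HZ = 8000
BLOCK_LENGTH = 64
TONE_HZ = 1000

# One block of a pure sine tone.
signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE_HZ)
          for t in range(BLOCK_LENGTH)]

mags = dft_magnitudes(signal)

# Only the first half of the bins is unique for a real-valued input.
peak_bin = max(range(BLOCK_LENGTH // 2), key=lambda k: mags[k])
peak_hz = peak_bin * SAMPLE_RATE_HZ / BLOCK_LENGTH
print(f"peak at bin {peak_bin}, about {peak_hz:.0f} Hz")
```

Each bin spans SAMPLE_RATE_HZ / BLOCK_LENGTH = 125 Hz here, so the 1000 Hz tone falls exactly on one bin and all of its energy appears there.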

Frequency Masking

Even if a signal component exceeds the hearing threshold, it may still be masked by louder components that are near it in frequency. This phenomenon is known as frequency masking or simultaneous masking. Each component in a signal can cast a "shadow" over neighboring components. If the neighboring components are covered by this shadow, they will not be heard. The effective result is that one component, the masker, shifts the hearing threshold. Figure 4 shows a situation in which this occurs.
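The "shadow" cast by a masker can be sketched as a raised threshold that falls away on either side of the masker's frequency. The example below is a simplified illustration, not the model used by any particular standard: it assumes a triangular spreading shape on the Bark scale, and the slopes (27 dB per Bark below the masker, 10 dB per Bark above) and the 10 dB offset are illustrative assumptions. The Hz-to-Bark conversion uses Zwicker's widely cited approximation.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masked_threshold_db(masker_hz, masker_db, probe_hz,
                        offset_db=10.0, lower_slope=27.0, upper_slope=10.0):
    """Threshold (dB) that the masker imposes at probe_hz.

    offset_db, lower_slope and upper_slope are assumed, illustrative values.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    # The shadow extends further toward higher frequencies than lower ones.
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)


def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the probe component lies under the masker's shadow."""
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz)


# A 70 dB masker at 1 kHz hides a 40 dB neighbor at 1.1 kHz,
# but not an equally loud component far away at 4 kHz.
near = is_masked(1000, 70, 1100, 40)
far = is_masked(1000, 70, 4000, 40)
print(near, far)
```

A full psychoacoustic model would combine the shadows of all maskers with the absolute hearing threshold; this sketch handles a single masker only.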

MP3 Decoding

The great bulk of the work in the MP3 system as a whole is placed on the encoding process. Since one typically plays files more frequently than one encodes them, this makes sense. Decoders do not need to store or work with a model of human psychoacoustic principles, nor do they require a bit allocation procedure. All the MP3 player has to worry about is examining the bitstream of header and data frames for spectral components and the side information stored alongside them, and then reconstructing this information to create an audio signal.
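As a rough illustration of the first step a decoder takes, the sketch below reads the 4-byte header that begins each frame and extracts the bit rate, sampling rate, and channel mode. It is a minimal example that handles only MPEG-1 Layer III headers; the function name, the returned dictionary, and the sample header bytes are hypothetical choices for illustration, and a real decoder must also read the side information and main data that follow.

```python
# Lookup tables for MPEG-1 Layer III only; other versions and layers differ.
# Index 0 ("free" bit rate) and index 15 (invalid) are treated as unsupported.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_frame_header(header_bytes):
    """Decode the fields of one MPEG-1 Layer III frame header."""
    if len(header_bytes) < 4:
        raise ValueError("a frame header is 4 bytes long")
    word = int.from_bytes(header_bytes[:4], "big")

    # The header starts with 11 set bits that mark the frame boundary.
    if (word >> 21) & 0x7FF != 0x7FF:
        raise ValueError("sync word not found")
    version_id = (word >> 19) & 0x3
    layer_id = (word >> 17) & 0x3
    if version_id != 0b11 or layer_id != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")

    bitrate_kbps = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate_hz = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    if bitrate_kbps is None or sample_rate_hz is None:
        raise ValueError("unsupported bit rate or sampling rate index")
    padding = (word >> 9) & 0x1

    return {
        "bitrate_kbps": bitrate_kbps,
        "sample_rate_hz": sample_rate_hz,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0x3],
        # Frame length in bytes, header included.
        "frame_bytes": 144 * bitrate_kbps * 1000 // sample_rate_hz + padding,
    }


# Example header bytes (made up for illustration): 128 kbps, 44.1 kHz, stereo.
info = parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x00]))
print(info)
```

Knowing the frame length lets the player step from one header to the next without decoding the audio data in between.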

Future of Digital Music

Today's music technologies have turned passive listeners into active participants who can capture, record, transform, edit, and save their music in a variety of digital formats. An emerging technology that can significantly reduce the size of digital music files while maintaining their original sound quality is mp3PRO. MPEG, a coding scheme for compressing audio signals, reduces the size of audio files using three coding schemes, or layers. The third layer, commonly known as MP3, uses audio coding and psychoacoustic compression to remove information or sounds that cannot be perceived by the human ear.

Conclusion

By eliminating audio information that the human ear cannot detect, modern audio coding standards are able to compress a typical 1.4 Mbps signal by a factor of about twelve. This is done by employing several different methodologies, including noise allocation techniques based on psychoacoustic models. Future goals for the field of audio compression are quite broad.
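The 1.4 Mbps figure can be checked with a short calculation. The sketch below assumes the signal is CD-quality stereo PCM (44.1 kHz sampling, 16 bits per sample, two channels), which is the usual source of that number; those parameters are an assumption, since the text only gives the total rate.

```python
# Assumed CD-quality PCM parameters.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2
COMPRESSION_FACTOR = 12  # "a factor of about twelve"

# Uncompressed bit rate in bits per second.
pcm_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS

# Bit rate after compression by the stated factor.
compressed_bps = pcm_bps / COMPRESSION_FACTOR

print(f"uncompressed: {pcm_bps / 1e6:.2f} Mbps")
print(f"compressed:   {compressed_bps / 1e3:.1f} kbps")
```

The result, a little under 120 kbps, is close to the 128 kbps rate commonly used for MP3 files.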

