MP3

MPEG-1 Audio Layer 3, more commonly referred to as MP3, is a popular digital audio encoding and lossy compression format invented and standardized in 1991 by a team of engineers directed by the Fraunhofer Society in Erlangen, Germany. It was designed to greatly reduce the amount of data required to represent audio, yet still sound like a faithful reproduction of the original uncompressed audio to most listeners. In popular usage, MP3 also refers to files of sound or music recordings stored in the MP3 format on computers.

Overview

MP3 is a lossy compression format. It represents pulse-code modulation (PCM) audio data in much less space by discarding the portions considered less important to human hearing, much as JPEG does for images.

A number of techniques, including psychoacoustic modeling, are used in MP3 to determine which portions of the audio can be discarded. MP3 audio can be compressed at different bit rates, providing a range of tradeoffs between data size and sound quality.

The MP3 format uses a hybrid transformation to convert a time-domain signal into a frequency-domain signal:

32-band polyphase quadrature filter
36- or 12-tap MDCT; the size can be selected independently for sub-bands 0...1 and 2...31
Aliasing reduction postprocessing
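The MDCT stage can be sketched in Python. This is a minimal illustration, not an MP3 filterbank: it omits the 32-band polyphase filter and the aliasing-reduction step and transforms a single signal directly. It demonstrates the property that makes the long 36-tap transform work: each 36-sample block yields only 18 coefficients, yet overlapping blocks by half and adding the windowed inverse transforms cancels the time-domain aliasing and restores the input. The sine window and the 2/N inverse scaling are choices made for this sketch, and the function names are hypothetical.

```python
import math

def sine_window(length):
    """Sine window; w[n]**2 + w[n + length//2]**2 == 1 (Princen-Bradley condition)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Forward MDCT: 2N time samples -> N frequency coefficients."""
    n = len(block) // 2
    return [
        sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.

    Scaled by 2/N because the window is applied twice (analysis and synthesis).
    """
    n = len(coeffs)
    return [
        2 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                    for k in range(n))
        for i in range(2 * n)
    ]

def mdct_round_trip(signal, n=18):
    """Window, MDCT, IMDCT, window again, then overlap-add with a hop of n samples.

    n=18 mirrors the long blocks (36 inputs -> 18 coefficients).
    Assumes len(signal) is a multiple of n.
    """
    if len(signal) % n:
        raise ValueError("signal length must be a multiple of n")
    window = sine_window(2 * n)
    # One block of zeros on each side so every real sample lies in two frames.
    padded = [0.0] * n + list(signal) + [0.0] * n
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        frame = [w * s for w, s in zip(window, padded[start:start + 2 * n])]
        rebuilt = imdct(mdct(frame))
        for i in range(2 * n):
            output[start + i] += window[i] * rebuilt[i]
    return output[n:n + len(signal)]

signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.7 * t) for t in range(54)]
rebuilt = mdct_round_trip(signal)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # only floating-point error remains
```

Each frame's aliasing terms are cancelled by the neighbouring frame's, so the coefficients are critically sampled: 18 new coefficients per 18 new samples.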

MP3 Surround, a version of the format supporting 5.1 channels for surround sound, was introduced in December 2004. MP3 Surround is backward compatible with standard stereo MP3, and file sizes are similar. Within the MPEG specifications, AAC (Advanced Audio Coding), defined in MPEG-2 and extended in MPEG-4, is intended as the successor to MP3, although there has also been a significant movement to create and popularize other audio formats. Nevertheless, any succession is unlikely to happen for a significant amount of time because of MP3's overwhelming popularity: it is widely supported not only by end users and software but also by hardware such as DVD and CD players.

History

In October 1993, MP2 (MPEG-1 Audio Layer 2) files appeared on the Internet. They were often played back with the Xing MPEG Audio Player and later with MAPlay, a Unix program by Tobias Bading first released on February 22, 1994 (MAPlay was also ported to Microsoft Windows).

Initially, the only encoder available for producing MP2 files was the Xing Encoder, used alongside CDDA2WAV, a CD ripper that converted CD audio tracks into computer data files.

The Internet Underground Music Archive (IUMA) is generally recognized as the start of the on-line music revolution. IUMA was the Internet's first high-fidelity music web site, hosting thousands of authorized MP2 recordings before MP3 or the web was popularized.

From the first half of 1995 through the late 1990s, MP3 files flourished on the Internet. MP3's popularity was largely due to, and closely tied to, the success of companies and software packages such as Nullsoft's Winamp (released in 1997), mpg123, and Napster (released in 1999). These programs made it very easy for the average user to play back, create, share, and collect MP3s.

Controversies over peer-to-peer sharing of MP3 files have flourished in recent years, largely because high compression makes it practical to share files that would otherwise be too large and cumbersome. In response to the rapid spread of MP3s through the Internet, several major record labels filed a lawsuit against Napster to protect their copyrights (see also intellectual property).

Commercial online music distribution services (like the iTunes Music Store) usually prefer other, often proprietary, file formats that support Digital Rights Management (DRM) to control and restrict the use of digital music. These formats are used in an attempt to prevent piracy of copyrighted material, although a computer-savvy user can often strip the DRM from a song file, producing a file that is no longer locked to a particular computer.

Quality of MP3 audio

Because MP3 is a lossy format, it offers a range of options for its "bit rate", that is, the number of bits of encoded data used to represent each second of audio. Typical rates are between 128 and 320 kbit/s. By contrast, uncompressed audio as stored on a compact disc has a bit rate of 1411.2 kbit/s (16 bits/sample × 44,100 samples/second × 2 channels).
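The bit-rate arithmetic in this paragraph, and the approximately 11:1 ratio cited for 128 kbit/s, can be checked with a short Python sketch. The function names are illustrative, and 1 MB is taken to mean 10^6 bytes.

```python
def pcm_bitrate_kbps(bits_per_sample, sample_rate_hz, channels):
    """Bit rate of uncompressed PCM audio, in kbit/s."""
    return bits_per_sample * sample_rate_hz * channels / 1000

def compression_ratio(source_kbps, encoded_kbps):
    """How many times smaller the encoded stream is than the source."""
    return source_kbps / encoded_kbps

def megabytes_per_minute(kbps):
    """Storage needed for one minute of audio at a given bit rate."""
    return kbps * 1000 * 60 / 8 / 1_000_000

CD_KBPS = pcm_bitrate_kbps(16, 44_100, 2)  # 16 bits x 44,100 Hz x 2 channels

for rate in (128, 192, 320):
    print(f"{rate} kbit/s: {compression_ratio(CD_KBPS, rate):.1f}:1, "
          f"{megabytes_per_minute(rate):.2f} MB per minute")
print(f"CD audio: {megabytes_per_minute(CD_KBPS):.2f} MB per minute")
```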

MP3 files encoded at a lower bit rate generally play back at lower quality. With too low a bit rate, "compression artifacts" (i.e., sounds that were not present in the original recording) may appear in the reproduction. Applause is a good demonstration of compression artifacts: its randomness and sharp attacks make it hard to compress, so the failings of the encoder are more obvious and are audible as ringing or pre-echo.
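Pre-echo arises because a transform coder quantizes a whole block of samples at once, so the quantization noise is spread across the entire block, including the quiet stretch before a sharp attack. The Python sketch below illustrates this with an orthonormal DCT-II as a simplified stand-in for MP3's hybrid filterbank (an assumption; MP3 itself uses the polyphase filter plus MDCT). The 64-sample block, the attack waveform, and the quantization steps are hypothetical values.

```python
import math

def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]

def quantize(coeffs, step):
    """Uniform quantization; a larger step stands in for a smaller bit budget."""
    return [step * round(c / step) for c in coeffs]

def pre_echo_energy(block, silent_samples, step):
    """Energy that leaks into the initially silent part of the block after coding."""
    decoded = idct(quantize(dct(block), step))
    return sum(v * v for v in decoded[:silent_samples])

# 48 samples of silence followed by a sharp 16-sample attack.
block = [0.0] * 48 + [0.8 * math.sin(2 * math.pi * 0.2 * i) for i in range(16)]
print(pre_echo_energy(block, 48, 0.1))   # coarse step: nonzero noise before the attack
print(pre_echo_energy(block, 48, 1e-9))  # near-lossless step: essentially no pre-echo
```

Shorter transform blocks confine this spread to a shorter span, which is why the filterbank can switch to the 12-tap MDCT around transients.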

Besides the bit rate of the encoded file, the quality of MP3 audio depends on the quality of the encoder and the difficulty of the signal being encoded. For average signals with good encoders, some listeners accept an MP3 bit rate of 128 kbit/s at the CD sampling rate of 44.1 kHz as near enough to compact disc quality, a compression ratio of approximately 11:1. MP3s properly compressed at this ratio can achieve sound quality superior to that of FM radio and cassette tape[citation needed], primarily because of the limited bandwidth, signal-to-noise ratio (SNR), and other limitations of these analog media. However, listening tests show that with a bit of practice many listeners can reliably distinguish 128 kbit/s MP3s from CD originals[citation needed], in many cases to the point where they consider the MP3 audio unacceptably low in quality. Yet other listeners, and the same listeners in other environments (such as a noisy moving vehicle or a party), will consider the quality acceptable. Imperfections in an MP3 encode are much less apparent on low-end computer speakers than on a good stereo system connected to a computer or, especially, through high-quality headphones.

Fraunhofer Gesellschaft (FhG) publishes on its official web page the following data rates and compression ratios for MPEG-1 Layers 1, 2 and 3, intended for comparison:

Layer 1: 384 kbit/s, compression 4:1
Layer 2: 192...256 kbit/s, compression 8:1...6:1
Layer 3: 112...128 kbit/s, compression 12:1...10:1

The differences between the layers are caused by the different psychoacoustic models they use; the Layer 1 algorithm is typically substantially simpler, so a higher bit rate is needed for transparent encoding. However, because different encoders use different models, it is difficult to draw absolute comparisons of this kind.

Some consider these quoted rates heavily skewed in favour of Layer 2 and Layer 3. They contend that more realistic rates are as follows:

Layer 1: excellent at 384 kbit/s
Layer 2: excellent at 256...384 kbit/s, very good at 224...256 kbit/s, good at 192...224 kbit/s
Layer 3: excellent at 224...320 kbit/s, very good at 192...224 kbit/s, good at 128...192 kbit/s

When comparing compression schemes, it is important to use encoders of equivalent quality. Tests may be biased against older formats in favour of new ones by using older encoders based on out-of-date technology, or even buggy encoders, for the old format. Because lossy encoding discards information, MP3 encoders work hard to ensure that the discarded parts cannot be detected by human listeners, by modeling the general characteristics of human hearing (e.g., noise masking). Different encoders achieve this with varying degrees of success.
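A toy version of such a hearing model can be sketched in Python using three textbook approximations: Zwicker's formula for the Bark (critical-band) scale, Terhardt's approximation of the threshold of hearing in quiet, and the Schroeder spreading function for how a tone's masking extends into neighbouring bands. None of these is taken from a particular MP3 encoder, and the 10 dB offset between a masker's level and its masking threshold is a hypothetical placeholder; real encoders derive thresholds per band from the signal's spectrum.

```python
import math

def bark(freq_hz):
    """Zwicker's approximation of the critical-band (Bark) scale."""
    return 13 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500) ** 2)

def absolute_threshold_db(freq_hz):
    """Terhardt's approximation of the threshold in quiet (dB SPL); freq_hz > 0."""
    f = freq_hz / 1000
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

def spreading_db(dz):
    """Schroeder spreading function; dz = maskee Bark minus masker Bark.

    Masking falls off slowly towards higher frequencies and steeply towards lower ones.
    """
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1 + (dz + 0.474) ** 2)

def is_masked(masker_hz, masker_db, probe_hz, probe_db, offset_db=10.0):
    """True if a probe tone would be inaudible next to a single masking tone.

    offset_db is a hypothetical masker-to-threshold offset, not a standard value.
    """
    dz = bark(probe_hz) - bark(masker_hz)
    masking_threshold = masker_db + spreading_db(dz) - offset_db
    return probe_db < max(masking_threshold, absolute_threshold_db(probe_hz))

# A loud 1 kHz tone hides a quieter tone nearby, but not one two octaves up.
print(is_masked(1000, 70, 1100, 40))
print(is_masked(1000, 70, 4000, 40))
```

An encoder can let quantization noise rise up to such a threshold in each band without, by the model's assumptions, the noise becoming audible.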

Some encoders in current use:

LAME, first created by Mike Cheng in early 1998. Unlike the others, it is a fully LGPL-licensed MP3 encoder, with excellent speed and quality that rival even MP3's technological successors.
Fraunhofer Gesellschaft: quality varies between its encoders, and some of them have bugs.

Early encoders that are no longer widely used include:

ISO dist10 reference code
Xing
BladeEnc
ACM Producer Pro.

Good encoders produce acceptable quality at 128 to 160 kbit/s and near-transparency at 160 to 192 kbit/s, while low-quality encoders may never reach transparency, even at 320 kbit/s. It is therefore misleading to speak of "128 kbit/s quality" or "192 kbit/s quality" except in the context of a particular encoder or of the best available encoders. A 128 kbit/s MP3 produced by a good encoder might sound better than a 192 kbit/s MP3 produced by a bad one.

It is important to note that the quality of an audio signal is subjective: a given bit rate suffices for some listeners but not for others. Because individual acoustic perception varies, no single psychoacoustic model is guaranteed to give satisfactory results for everyone. Merely changing the listening conditions, such as the playback system or environment, can expose unwanted distortions caused by lossy compression. The numbers given above are rough guidelines that work for many people, but in lossy audio compression the only true measure of quality is to listen to the results.

If the aim is to archive sound files with no loss of quality (or to work on them in a studio, for example), lossless compression algorithms should be used instead. These are currently capable of compressing 16-bit PCM audio to 38% while leaving the audio identical to the original; examples include Lossless Audio (LA), Apple Lossless, TTA, FLAC, Windows Media Audio 9 Lossless (WMA) and Monkey's Audio, among others. Lossless formats are strongly preferred for material that will be edited, mixed, or otherwise processed, because the perceptual assumptions made by lossy encoders may not hold after processing. The losses from multiple stages of coding may also compound, becoming more evident when the signal is re-encoded after processing. Lossless formats produce the best possible result, at the expense of a lower compression ratio.
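Lossless coders keep the audio bit-identical by predicting each sample and entropy-coding only the prediction error. The Python sketch below shows that principle with an order-1 fixed predictor and Rice codes, two building blocks that FLAC also uses; it only estimates the coded size and is not a FLAC implementation. The function names are hypothetical, and the input is a synthetic 16-bit sine tone, which is far more predictable than real music, so its ratio says little about real recordings.

```python
import math

def predict_residuals(samples):
    """Order-1 fixed predictor: keep the first sample, then store differences.

    Assumes a non-empty list of integer samples.
    """
    return [samples[0]] + [samples[i] - samples[i - 1] for i in range(1, len(samples))]

def reconstruct(residuals):
    """Invert the predictor exactly with a running sum (integers, so no rounding)."""
    samples = [residuals[0]]
    for r in residuals[1:]:
        samples.append(samples[-1] + r)
    return samples

def zigzag(value):
    """Fold signed integers onto unsigned ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * value if value >= 0 else -2 * value - 1

def rice_bits(residuals, k):
    """Size of a Rice code with parameter k: unary quotient, stop bit, k low bits."""
    return sum((zigzag(r) >> k) + 1 + k for r in residuals)

def best_rice_bits(residuals, max_k=16):
    """Smallest coded size over the candidate Rice parameters."""
    return min(rice_bits(residuals, k) for k in range(max_k + 1))

# Synthetic 16-bit mono tone: 0.1 s of 440 Hz sampled at 44.1 kHz.
pcm = [round(10000 * math.sin(2 * math.pi * 440 * t / 44100)) for t in range(4410)]
residuals = predict_residuals(pcm)
assert reconstruct(residuals) == pcm  # lossless: bit-identical round trip
coded_bits = best_rice_bits(residuals)
print(f"{coded_bits / (16 * len(pcm)):.0%} of the raw 16-bit size")
```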

Some simple editing operations, such as cutting sections of audio, can be performed directly on the encoded MP3 data without re-encoding. For these operations the concerns above do not necessarily apply, as long as appropriate software (such as mp3DirectCut and MP3Gain) is used to avoid extra decoding and encoding steps.

Bit rate

MP3 files can be encoded at a range of bit rates. As a general rule, the higher the bit rate, the more information from the original sound is retained, and thus the higher the quality during playback. In the early days of MP3 encoding, a single fixed bit rate was used for the entire file.

Bit rates available in MPEG-1 Layer 3 are 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, and the available sampling frequencies are 32, 44.1 and 48 kHz. 44.1 kHz, which matches the sampling rate of compact discs, is almost always used, and 128 kbit/s has become the de facto "good enough" standard, although 192 kbit/s is becoming increasingly popular on peer-to-peer file sharing networks. MPEG-2 and the unofficial MPEG-2.5 add some bit rates; their full set is 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144 and 160 kbit/s.
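Each MP3 frame begins with a 4-byte header that records its bit rate and sampling frequency as small table indices. The Python sketch below decodes those fields for MPEG-1 Layer III using the header layout of ISO/IEC 11172-3 (11-bit sync, version, layer, protection bit, bit-rate index, sampling-frequency index, padding bit) and computes the frame length as 144 × bit rate ÷ sampling rate, plus one byte when the padding bit is set. Bit-rate index 0 denotes free format, the mode used for non-standard rates, whose frame length cannot be computed from the header alone. The function name and the returned dictionary are illustrative; CRC, channel mode, ID3 tags, and MPEG-2/2.5 headers are not handled.

```python
# MPEG-1 Layer III tables, indexed by the 4-bit and 2-bit header fields.
BITRATES_KBPS = (None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None)  # 0 = free format, 15 = invalid
SAMPLE_RATES_HZ = (44100, 48000, 32000, None)     # 3 = reserved

def parse_mpeg1_layer3_header(header):
    """Decode bit rate, sampling rate, padding, and frame length from a frame header."""
    if len(header) < 4:
        raise ValueError("a frame header is 4 bytes long")
    word = int.from_bytes(header[:4], "big")
    if ((word >> 21) & 0x7FF) != 0x7FF:
        raise ValueError("missing 11-bit frame sync")
    version_bits = (word >> 19) & 0b11  # 0b11 = MPEG-1
    layer_bits = (word >> 17) & 0b11    # 0b01 = Layer III
    if version_bits != 0b11 or layer_bits != 0b01:
        raise ValueError("not an MPEG-1 Layer III header")
    bitrate_index = (word >> 12) & 0xF
    rate_index = (word >> 10) & 0b11
    padding = (word >> 9) & 0b1
    if bitrate_index == 15 or rate_index == 3:
        raise ValueError("invalid or reserved header field")
    bitrate = BITRATES_KBPS[bitrate_index]
    sample_rate = SAMPLE_RATES_HZ[rate_index]
    if bitrate is None:
        frame_bytes = None  # free format: the bit rate is not stored in the header
    else:
        # 1152 samples per frame / 8 bits per byte = 144
        frame_bytes = 144 * bitrate * 1000 // sample_rate + padding
    return {
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "padding": padding,
        "frame_bytes": frame_bytes,
    }

print(parse_mpeg1_layer3_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```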

Variable bit rates (VBR) are also possible. Audio in an MP3 file is divided into frames, each with its own bit rate, so the bit rate can change dynamically as the file is encoded (VBR was not implemented originally but is in extensive use today). This makes it possible to spend more bits on parts of the sound with higher dynamics (more sound movement) and fewer bits on parts with lower dynamics, improving quality while reducing storage space. The method is comparable to a sound-activated tape recorder that saves tape by not recording silence. Some encoders make extensive use of this technique.
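The effect of per-frame bit rates can be illustrated with a small Python sketch. It assumes MPEG-1 Layer III frames, each of which carries 1152 samples, and uses the standard bit-rate list for that layer. The per-frame "demand" values and the rule that turns them into a bit rate are hypothetical stand-ins for an encoder's psychoacoustic decisions; real encoders choose rates differently.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III
ALLOWED_KBPS = (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)

def pick_frame_bitrate(demand_kbps):
    """Hypothetical rule: the smallest standard rate that meets a frame's demand."""
    for rate in ALLOWED_KBPS:
        if rate >= demand_kbps:
            return rate
    return ALLOWED_KBPS[-1]

def vbr_summary(frame_kbps, sample_rate_hz=44_100):
    """Duration, average bit rate, and approximate size; assumes at least one frame."""
    frame_seconds = SAMPLES_PER_FRAME / sample_rate_hz
    duration_s = len(frame_kbps) * frame_seconds
    total_bits = sum(rate * 1000 * frame_seconds for rate in frame_kbps)
    return {
        "duration_s": duration_s,
        "average_kbps": total_bits / duration_s / 1000,
        "approx_bytes": total_bits / 8,
    }

# Hypothetical demands: a quiet passage followed by a busy one.
demands = [20] * 50 + [250] * 50
frames = [pick_frame_bitrate(d) for d in demands]
summary = vbr_summary(frames)
cbr = vbr_summary([256] * len(frames))  # same duration at a constant 256 kbit/s
print(summary["average_kbps"], summary["approx_bytes"] / cbr["approx_bytes"])
```

Because every frame covers the same duration, the average bit rate is simply the mean of the per-frame rates; here the quiet half is coded at 32 kbit/s instead of the 256 kbit/s a constant-rate file would need for the busy half.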

Non-standard bit rates up to 640 kbit/s can be achieved with the LAME encoder's --freeformat option; however, few MP3 players can play such files.