Joint encoding

In audio engineering, joint refers to a joining of several channels of similar information in order to obtain higher quality or smaller file size.

Joint stereo

The term joint stereo has become prominent as the Internet has allowed for the transfer of relatively low bit rate, acceptable-quality audio with modest Internet access speeds. Joint stereo refers to any number of encoding techniques used for this purpose. Two forms are described here, both of which are implemented in various ways with different codecs, such as MP3, AAC and Ogg Vorbis.

Intensity stereo coding

This form of joint stereo uses a technique known as joint frequency encoding, which functions on the principle of sound localization. Human hearing is predominantly less acute at perceiving the direction of certain audio frequencies. By exploiting this 'limitation', intensity stereo coding can reduce the data rate of an audio stream with little or no perceived change in apparent quality.

More specifically, the dominance of inter-aural time differences (ITD) for sound localization by humans is only given for lower frequencies. That leaves inter-aural amplitude differences (IAD) as the dominant location indicator for higher frequencies. The idea of intensity stereo coding is to merge the upper spectrum into just one channel (thus reducing overall differences between channels) and to transmit a little side information about how to pan certain frequency regions to recover the IAD cues.

This type of coding does not perfectly reconstruct the original audio because of the loss of information which results in the simplification of the stereo image. It can produce unwanted artifacts that affect this image to a perceivable extent. However, for very low bitrates this tool usually provides a gain of perceived quality.

It is supported by many audio compression formats (including MP3, AAC and Vorbis) but not always by every encoder.

M/S stereo coding

M/S stereo coding transforms the left and right channels into a mid channel and a side channel, hence the name. The mid channel is the sum of the left and right channels, or $L+R$ . The side channel is the difference of the left and right channels, i.e., $L-R$ . Unlike intensity stereo coding, M/S coding is a special case of transform coding, and retains the audio perfectly without introducing artifacts. Lossless codecs such as FLAC or Monkey's Audio use M/S stereo coding because of this.

This kind of coding is also sometimes known as matrix stereo, and is used in many different forms of audio processing and recording equipment. It is therefore not limited to digital systems, and can even be created with passive audio transformers or analog amplifiers.

One example of the use of M/S stereo is in FM stereo broadcasting, where $L+R$ modulates the carrier wave and $L-R$ modulates a subcarrier.

Another example of M/S stereo is the stereophonic microgroove record. Lateral motions of a stylus represent the sum of two channels. The vertical motion represents the difference between the channels. These data are either added (L + R) + (L - R) = 2L or subtracted (L + R) - (L - R) = 2R to reconstruct the two channels.

Joint frequency encoding

Joint frequency encoding is an encoding technique used in audio data compression to reduce the data rate.

The idea is to merge a given frequency range of multiple sound channels together so that the resulting encoding will preserve the sound information of that range not as a bundle of separate channels but as one homogeneous data stream. This will naturally destroy the original channel separation for good, as the information cannot be accurately reconstructed, but this process will greatly lessen the amount of required storage space. Only some forms of joint stereo use the joint frequency encoding technique, such as intensity stereo coding.

Implementations

When used within the MP3 compression process, joint stereo normally employs multiple techniques and can switch between them for each MPEG frame, including simple L/R stereo. With some encoding software it is possible to force the use of joint stereo without ever using L/R stereo. Within the LAME encoder, this is known as forced joint-stereo.

Ogg Vorbis stereo files can employ either LR stereo or joint-stereo. When using joint-stereo, both MS stereo and intensity stereo methods may be used. As opposed to MP3 where MS stereo (when used) is applied before quantization, an Ogg Vorbis encoder applies MS stereo to samples in the frequency domain after quantization, making application of MS stereo a lossless step. After this step any frequency area can be converted to intensity stereo by removing the corresponding part of the MS signal's S part. Ogg Vorbis' floor function will take care of the required left-right panning.

External links

Jürgen Herre, Fraunhofer IIS. From Joint Stereo to Spatial Audio Coding - Recent Progress and Standardization. October 2004, Paper 157, DAFx'04 7th International Conference of Digital Audio Effects.

References