Ambisonic decoding

This page focusses on decoding of classic first-order Ambisonics. Other relevant information is available on the Ambisonic reproduction systems page.

The Ambisonic B-format WXYZ signals define what the listener should hear. How these signals are best presented to the listener through the speakers depends on the number of speakers and their locations. Ambisonics treats directions in which no speakers are placed as being just as important as the speaker positions themselves. It is undesirable for the listener to be conscious that the sound is coming from a discrete number of speakers. Some simple decoding equations are known to give good results for common speaker arrangements.

But Ambisonic speaker decoders can use much more information about the speakers, including their exact positions and their distances from the listener. Because human beings use different mechanisms to locate sounds at low and at high frequencies, it is desirable for classic Ambisonic decoders to modify the speaker feeds as a function of frequency, using shelf filters, so that the best localisation information is presented in each frequency range.
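A shelf filter of this kind can be sketched as a complementary two-band split: the low band is a low-pass copy of the signal, the high band is the signal minus that copy, and each band gets its own gain. The sketch below is a minimal illustration under stated assumptions: the one-pole low-pass, the 400 Hz crossover and the high-frequency X/Y gain of about 0.707 (the figure often quoted for first-order horizontal "max-rE" decoding) are all illustrative choices, and the function names are hypothetical. Practical decoders use carefully phase-matched shelf filters.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in signal:
        state += a * (x - state)
        out.append(state)
    return out


def shelf(signal, lf_gain, hf_gain, cutoff_hz=400.0, sample_rate=48000.0):
    """Shelf built from complementary bands: low = LP(x), high = x - LP(x).

    Output = lf_gain * low + hf_gain * high, so the gain tends to lf_gain
    at DC and to hf_gain towards high frequencies.
    """
    low = one_pole_lowpass(signal, cutoff_hz, sample_rate)
    return [lf_gain * lo + hf_gain * (x - lo) for x, lo in zip(signal, low)]


def shelve_bformat(w, x, y, hf_xy_gain=1.0 / math.sqrt(2.0), **kwargs):
    """Change the W versus X/Y balance with frequency (hypothetical helper).

    Assumption: W passes unchanged; X and Y keep unity gain at low
    frequencies and are scaled by hf_xy_gain at high frequencies.
    """
    return (list(w),
            shelf(x, 1.0, hf_xy_gain, **kwargs),
            shelf(y, 1.0, hf_xy_gain, **kwargs))
```

Building the shelf from complementary bands keeps the two bands summing back to the input when both gains are 1, which makes the behaviour easy to reason about in a sketch.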

Some views on the complexities of Shelf Filters and Distance Compensation are explained in "Ambisonic Surround Decoders" [1] and "SHELF FILTERS for Ambisonic Decoders". [2]
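The simplest part of distance compensation is time and level alignment: closer speakers are delayed and attenuated so that every speaker appears to be as far away as the farthest one. The sketch below shows only that step, assuming a speed of sound of 343 m/s and the 1/r level law; the function name is hypothetical, and Ambisonic decoders may additionally apply near-field compensation filters, which are not shown.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def distance_compensation(distances_m, sample_rate=48000):
    """Return (delay_samples, gain) per speaker so that all speakers
    appear to be at the distance of the farthest one.

    Closer speakers are delayed by the extra travel time to the farthest
    speaker and attenuated following the 1/r law.
    """
    r_max = max(distances_m)
    result = []
    for r in distances_m:
        delay_s = (r_max - r) / SPEED_OF_SOUND
        result.append((round(delay_s * sample_rate), r / r_max))
    return result


# Square with one speaker 0.5 m closer than the others.
print(distance_compensation([2.0, 2.0, 1.5, 2.0]))
```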

There are specialised decoders for large audiences in large spaces.

Hardware decoders have been commercially available since the late 1970s; currently, Ambisonics is standard in surround products offered by Meridian Audio, Ltd. Ad hoc software decoders are also available.

There are five main types of decoder:

Diametric decoders

This design is intended for a domestic, small room setting, and allows speakers to be arranged in diametrically opposed pairs.

Regular Polygon decoders

This design is intended for a domestic, small-room setting. The speakers are equidistant from the listener and equally spaced around the circumference of a circle. The simplest Regular Polygon decoder is a Square with the listener in the centre. At least four speakers are required. Triangles do not work, exhibiting large "holes" between the speakers. Regular Hexagons perform better than Squares, especially to the sides.

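For the simplest (two-dimensional) case, with no height information and the loudspeakers spaced equally around a circle, the loudspeaker signals are derived from the B-format W, X and Y channels:

$$P_n = \frac{1}{N}\left(\sqrt{2}\,W + 2X\cos\phi_n + 2Y\sin\phi_n\right)$$

where $P_n$ is the feed to the $n$th of $N$ speakers and $\phi_n$ is the direction of the speaker under consideration. This form assumes the traditional B-format convention in which W carries the omnidirectional signal scaled by $1/\sqrt{2}$.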
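A minimal per-sample sketch of a basic horizontal regular-polygon decoder, assuming traditional B-format scaling (W is the omnidirectional signal times 1/√2) and the gain law P_n = (√2·W + 2X·cos φ_n + 2Y·sin φ_n)/N. The function name and this particular normalisation are assumptions for illustration; a real decoder applies the same matrix to every sample, usually after shelf filtering.

```python
import math


def regular_polygon_decode(w, x, y, num_speakers, first_speaker_rad=0.0):
    """Decode one horizontal B-format sample (W, X, Y) to a regular polygon.

    Speakers sit at first_speaker_rad + 2*pi*n/N, measured anticlockwise
    from the front. Feed n is (sqrt(2)*W + 2*X*cos(phi_n) + 2*Y*sin(phi_n)) / N.
    """
    if num_speakers < 4:
        raise ValueError("regular polygon decoding needs at least four speakers")
    feeds = []
    for n in range(num_speakers):
        phi = first_speaker_rad + 2.0 * math.pi * n / num_speakers
        feeds.append((math.sqrt(2.0) * w + 2.0 * x * math.cos(phi)
                      + 2.0 * y * math.sin(phi)) / num_speakers)
    return feeds


# Unit source at 45 degrees (front left), square with speakers at 45, 135, 225, 315.
src = math.pi / 4
feeds = regular_polygon_decode(1.0 / math.sqrt(2.0), math.cos(src), math.sin(src),
                               4, first_speaker_rad=math.pi / 4)
print([round(f, 3) for f in feeds])  # [0.75, 0.25, -0.25, 0.25]
```

With this normalisation the feeds of a unit source always sum to 1, and the speaker opposite the source receives a small anti-phase feed.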

The most useful of these is the Square 4.0 decoder.
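One way to see why triangles misbehave is to compute the length of Gerzon's energy vector r_E (a predictor of how sharply a source is localised) for every source direction. The sketch below assumes the basic gain law in which a unit source at angle θ gives speaker n a gain proportional to 1 + 2cos(θ − φ_n), with φ_n = 2πn/N; the function name is hypothetical. With this decoder r_E is 2/3 in every direction for four or six speakers, but for three it swings between 1 (source on a speaker) and 1/3 (source midway between two speakers). The metric rates squares and hexagons equally, so it does not capture the hexagon's advantage described in the text.

```python
import math


def energy_vector_length(num_speakers, source_rad):
    """Gerzon energy-vector magnitude r_E = |sum g_n^2 u_n| / sum g_n^2.

    Gains follow the basic horizontal decode of a unit source,
    g_n = 1 + 2*cos(source - phi_n), with phi_n = 2*pi*n/N.
    """
    ex = ey = total = 0.0
    for n in range(num_speakers):
        phi = 2.0 * math.pi * n / num_speakers
        g = 1.0 + 2.0 * math.cos(source_rad - phi)
        e = g * g
        ex += e * math.cos(phi)
        ey += e * math.sin(phi)
        total += e
    return math.hypot(ex, ey) / total


for n in (3, 4, 6):
    values = [energy_vector_length(n, math.radians(a)) for a in range(0, 360, 15)]
    print(n, round(min(values), 3), round(max(values), 3))
# 3 0.333 1.0
# 4 0.667 0.667
# 6 0.667 0.667
```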

The coordinate system used in Ambisonics follows the right-hand rule convention, with positive X pointing forwards, positive Y pointing to the left and positive Z pointing upwards. Horizontal angles run anticlockwise from due front, and vertical angles are positive above the horizontal and negative below.
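The convention can be checked by encoding a mono source into first-order B-format. The sketch below assumes the traditional (FuMa) scaling, with W equal to the signal times 1/√2; other conventions, such as AmbiX (ACN channel order, SN3D normalisation), order and scale the channels differently. The function name is hypothetical.

```python
import math


def encode_fuma(sample, azimuth_deg, elevation_deg=0.0):
    """Encode a mono sample as first-order B-format (W, X, Y, Z).

    Azimuth runs anticlockwise from due front and elevation is positive
    above the horizontal; X points forwards, Y to the left, Z upwards.
    W uses the traditional 1/sqrt(2) scaling.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return w, x, y, z


# A source hard left (azimuth 90 degrees) gives positive Y and (almost) zero X.
print([round(c, 3) for c in encode_fuma(1.0, 90.0)])  # [0.707, 0.0, 1.0, 0.0]
```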

Auditorium decoders

This design is intended for a large, public space setting.

"Vienna" decoders

These are so named because the paper introducing a method for deriving Ambisonic decoders for irregular loudspeaker layouts was presented at the 1992 AES conference held in Vienna. The design was covered by a 1998 patent [3] assigned to Trifield Productions. The technology provides one approach to decoding Ambisonic signals to the irregular loudspeaker arrays (such as the ITU layout) commonly used for 5.1 surround sound replay. A slight flaw in the decoder coefficients published in the 1992 paper was reported by Wiggins et al. in 2003, [4] who also used heuristic search algorithms to solve the set of non-linear simultaneous equations needed to generate such decoders; this work was later extended to higher-order irregular decoders in 2004. [5]
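To give a feel for what a heuristic search over decoder coefficients involves, the sketch below tunes a first-order horizontal decoder for an assumed ITU-style five-speaker layout (0°, ±30°, ±110°) by random hill climbing. It is neither the Gerzon-Barton design procedure nor the Tabu search used by Wiggins et al.: the fitness function (errors in pressure gain, velocity vector and energy-vector direction over test directions every 10°), the step size and all names are simplifying assumptions for illustration. Real designs also target the energy-vector length and usually optimise separate low- and high-frequency decoders.

```python
import math
import random

# Assumed ITU-style five-speaker horizontal layout (degrees, anticlockwise from front).
SPEAKERS_DEG = [0.0, 30.0, 110.0, -110.0, -30.0]
SPK_UNIT = [(math.cos(math.radians(s)), math.sin(math.radians(s))) for s in SPEAKERS_DEG]
TEST_DIRECTIONS_RAD = [math.radians(d) for d in range(0, 360, 10)]


def cost(coeffs):
    """Fitness of a decoder; coeffs[n] = (w_n, x_n, y_n) for speaker n.

    Sums, over test directions, the squared errors of the pressure gain,
    the velocity vector (target: unit vector towards the source) and the
    direction of the energy vector.
    """
    total = 0.0
    for src in TEST_DIRECTIONS_RAD:
        ux, uy = math.cos(src), math.sin(src)
        # Unit source in FuMa B-format: W = 1/sqrt(2), X = cos, Y = sin.
        g = [cw / math.sqrt(2.0) + cx * ux + cy * uy for cw, cx, cy in coeffs]
        p = sum(g)
        e = sum(gi * gi for gi in g)
        if abs(p) < 1e-9 or e < 1e-12:
            return float("inf")
        vx = sum(gi * lx for gi, (lx, _) in zip(g, SPK_UNIT)) / p
        vy = sum(gi * ly for gi, (_, ly) in zip(g, SPK_UNIT)) / p
        ex = sum(gi * gi * lx for gi, (lx, _) in zip(g, SPK_UNIT))
        ey = sum(gi * gi * ly for gi, (_, ly) in zip(g, SPK_UNIT))
        e_len = math.hypot(ex, ey) or 1e-12
        total += (p - 1.0) ** 2
        total += (vx - ux) ** 2 + (vy - uy) ** 2
        total += (ex / e_len - ux) ** 2 + (ey / e_len - uy) ** 2
    return total


def hill_climb(iterations=1000, step=0.05, seed=0):
    """Random hill climbing: keep a perturbed decoder only if it lowers the cost."""
    rng = random.Random(seed)
    n = len(SPK_UNIT)
    # Starting point: the regular-polygon formula applied naively to this layout.
    coeffs = [(math.sqrt(2.0) / n, 2.0 * lx / n, 2.0 * ly / n) for lx, ly in SPK_UNIT]
    start = best = cost(coeffs)
    for _ in range(iterations):
        candidate = [tuple(c + rng.gauss(0.0, step) for c in triple) for triple in coeffs]
        c = cost(candidate)
        if c < best:
            coeffs, best = candidate, c
    return coeffs, start, best


coeffs, start, best = hill_climb()
print(f"cost before: {start:.3f}, after: {best:.3f}")
```

Because only improvements are accepted, the cost never increases; like any local search it can stall in a local minimum, which is one reason more elaborate heuristics such as Tabu search are used in practice.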

Parametric decoders

The idea behind parametric decoding is to treat the sound's direction of incidence as a parameter that can be estimated through time–frequency analysis. A large body of research into human spatial hearing [6] [7] suggests that our auditory cortex applies similar techniques in its auditory scene analysis, which may help explain why these methods work well.
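The core of the analysis step can be sketched with one observation: for a single plane-wave source, the time-averaged products W·X and W·Y point towards the source, so their angle gives its azimuth. The sketch below is a deliberate simplification, assuming one dominant source and a single broadband block; parametric methods such as DirAC perform this estimate per time-frequency tile (for example after a short-time Fourier transform) and also estimate diffuseness. The function name is hypothetical.

```python
import math


def estimate_azimuth(w, x, y):
    """Estimate the azimuth (radians) of the dominant source in a block of
    horizontal B-format samples from the averaged products W*X and W*Y."""
    cx = sum(wi * xi for wi, xi in zip(w, x))
    cy = sum(wi * yi for wi, yi in zip(w, y))
    return math.atan2(cy, cx)


# Encode a test signal at 120 degrees (FuMa scaling) and recover its direction.
s = [math.sin(0.1 * n) + 0.5 * math.sin(0.37 * n) for n in range(1000)]
az = math.radians(120.0)
w = [v / math.sqrt(2.0) for v in s]
x = [v * math.cos(az) for v in s]
y = [v * math.sin(az) for v in s]
print(round(math.degrees(estimate_azimuth(w, x, y)), 6))
```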

The major benefits of parametric decoding are a greatly increased angular resolution and the separation of analysis and synthesis into distinct processing steps. This separation allows B-format recordings to be rendered using any panning technique, including delay panning, VBAP [8] and HRTF-based synthesis.
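Because synthesis is independent of analysis, an estimated direction can simply be handed to a panner. The sketch below is a minimal pairwise two-dimensional form of vector base amplitude panning, assuming a horizontal layout and power-normalised gains; the function name is hypothetical and it omits refinements found in full VBAP implementations.

```python
import math


def vbap_2d(source_deg, speakers_deg):
    """Pairwise 2-D vector base amplitude panning.

    Finds the adjacent speaker pair whose arc contains the source direction,
    solves g1*l1 + g2*l2 = p for that pair (Cramer's rule) and normalises
    the gains to unit power.
    """
    order = sorted(range(len(speakers_deg)), key=lambda i: speakers_deg[i] % 360.0)
    p = (math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg)))
    gains = [0.0] * len(speakers_deg)
    for k in range(len(order)):
        i, j = order[k], order[(k + 1) % len(order)]
        a, b = math.radians(speakers_deg[i]), math.radians(speakers_deg[j])
        l1 = (math.cos(a), math.sin(a))
        l2 = (math.cos(b), math.sin(b))
        det = l1[0] * l2[1] - l1[1] * l2[0]
        if abs(det) < 1e-12:
            continue  # collinear pair cannot span the plane
        g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
        g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
        if g1 >= -1e-9 and g2 >= -1e-9:  # source lies between this pair
            norm = math.hypot(g1, g2)
            gains[i], gains[j] = max(g1, 0.0) / norm, max(g2, 0.0) / norm
            return gains
    raise ValueError("no speaker pair encloses the source direction")


# A source hard left, between the two left speakers of a square.
print([round(g, 3) for g in vbap_2d(90.0, [45.0, 135.0, 225.0, 315.0])])
```

Unlike an Ambisonic decode, only the two speakers nearest the estimated direction are driven, which is where the sharper angular resolution of parametric rendering comes from.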

Parametric decoding was pioneered by Lake DSP [9] in the late 1990s and independently suggested by Farina and Ugolotti in 1999. [10] Later work in this domain includes the DirAC method [11] and the Harpex method. [12]

Irregular Layout Decoders

The Rapture3D decoder from Blue Ripple Sound supports decoding to irregular loudspeaker layouts and is already used in a number of computer games that use OpenAL.

References

  1. Lee, Richard (18 February 2007). "Ambisonic Surround Decoder". Ambisonia.com. Archived from the original on 19 March 2009. Retrieved 4 April 2009.
  2. Lee, Richard (14 April 2007). "SHELF FILTERS for Ambisonic Decoders". Ambisonia.com. Archived from the original (Zipped Microsoft Word document) on 15 April 2009. Retrieved 4 April 2009.
  3. US 5757927, Gerzon, Michael Anthony & Barton, Geoffrey James, "Surround sound apparatus", published 1998-05-26, assigned to Trifield Productions Ltd.
  4. Wiggins, Bruce; Paterson-Stephens, Iain; Lowndes, Val; Berry, Stuart (2003). "The Design and Optimisation of Surround Sound Decoders Using Heuristic Methods". Proceedings of UKSim 2003, Conference of the UK Simulation Society: 106–114.
  5. Wiggins, Bruce (2004). An Investigation into the Real-time Manipulation and Control of Three-dimensional Sound Fields (PhD). University of Derby. hdl:10.48773/93q0q.
  6. Blauert, Jens (1997). Spatial Hearing: The Psychophysics of Human Sound Localization (Revised ed.). Cambridge, MA: MIT Press. ISBN 978-0-262-02413-6.
  7. Bregman, Albert S. (29 September 1994). Auditory Scene Analysis: The Perceptual Organization of Sound. Bradford Books. Cambridge, MA: MIT Press. ISBN 978-0-262-52195-6.
  8. "Vector base amplitude panning". Research / Spatial sound. Otakaari, Finland: TKK Acoustics. 18 January 2006. Retrieved 12 May 2012.
  9. US 6628787, McGrath, David Stanley & McKeag, Adam Richard, "Wavelet conversion of 3-D audio signals", published 2003-09-30, assigned to Lake Technology Ltd.
  10. Farina, Angelo; Ugolotti, Emanuele (April 1999). "Subjective Comparison Between Stereo Dipole and 3D Ambisonic Surround Systems for Automotive Applications" (PDF). Proceedings of the AES 16th International Conference. AES 16th International conference on Spatial Sound Reproduction. Rovaniemi, Finland: AES. s78357. Retrieved 12 May 2012.
  11. "Directional Audio Coding". Research / Spatial sound. Otakaari, Finland: TKK Acoustics. 23 May 2011. Retrieved 12 May 2012.
  12. "Harpex". Oslo, Norway: Harpex Limited. 2011. Retrieved 12 May 2012.