Ambisonic reproduction systems

The design of speaker systems for Ambisonic playback is governed by several constraints:

This page discusses the interaction of these constraints and their various trade-offs in theory and practice, as well as perceptual advantages or drawbacks of specific speaker layouts which have been observed in actual deployments.

General considerations

Near-field effect

In its original formulation, Ambisonics assumed plane-wave sources for reproduction, which implies speakers that are infinitely far away. This assumption will lead to a pronounced bass boost for speaker rigs of small diameter, which increases with Ambisonic order. The cause is the very same proximity effect that occurs with directional microphones. Therefore, appropriate near-field compensation (bass equalisation) is beneficial.
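As a rough illustration of the size of this effect, the following sketch evaluates the low-frequency boost of a first-order component when a loudspeaker at finite distance stands in for a plane wave. It assumes the commonly quoted first-order near-field factor 1 + c/(jωr); the function name and the 343 m/s speed of sound are choices made for this example only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def first_order_nearfield_boost_db(freq_hz, radius_m, c=SPEED_OF_SOUND):
    """Magnitude in dB of 1 + c/(j*omega*r) for a first-order component."""
    omega = 2.0 * math.pi * freq_hz
    ratio = c / (omega * radius_m)
    # |1 + ratio/j|^2 = 1 + ratio^2, hence 10*log10 of the squared magnitude
    return 10.0 * math.log10(1.0 + ratio * ratio)


if __name__ == "__main__":
    for radius in (1.0, 2.0, 5.0):
        boost = first_order_nearfield_boost_db(50.0, radius)
        print(f"r = {radius} m: {boost:.1f} dB boost at 50 Hz")
```

Under this assumption the boost reaches about 3 dB where ωr = c, which is roughly 55 Hz for a rig of 1 m radius. A near-field compensation filter would apply the inverse response.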

Speaker distance vs. angles

This same plane-wave assumption makes it possible to vary the distance of speakers within reasonable limits without upsetting the correct function of the decoder, provided that the difference is compensated with delay, the power is adjusted for uniform loudness at the center, and that per-speaker near-field compensation is used. Distance does not affect the decoder matrix.
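The delay and level part of this compensation can be sketched as follows. The example assumes free-field conditions with a 1/r amplitude law and aligns every speaker to the most distant one; the function name is hypothetical, and near-field compensation is deliberately left out.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def distance_compensation(distances_m, c=SPEED_OF_SOUND):
    """Return one (delay_s, gain) pair per speaker, aligned to the farthest one."""
    farthest = max(distances_m)
    result = []
    for distance in distances_m:
        delay_s = (farthest - distance) / c  # nearer speakers are delayed more
        gain = distance / farthest           # 1/r law: nearer speakers are attenuated
        result.append((delay_s, gain))
    return result


if __name__ == "__main__":
    for delay_s, gain in distance_compensation([2.0, 2.5, 3.0]):
        print(f"delay {delay_s * 1000:.2f} ms, gain {gain:.3f}")
```

The gains here are linear amplitude factors. In a reverberant room the perceived level will not follow the 1/r law exactly, so a measured calibration would normally replace the computed gains.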

Variable speaker distance is therefore the most important degree of freedom when deploying idealized layouts in actual rooms. It is constrained by the reverberation of the room which leads to uneven direct-to-reverb ratios between speakers at different distances, and the power handling capability of the most distant speaker. If speakers have to be moved very close, care must be taken to ensure they still cover the entire listening area with reasonably flat frequency response.

Speaker angles on the other hand should be adhered to as precisely as possible, unless an optimised irregular decoder can be generated in-the-field.

Horizontal vs. full-sphere accuracy

For horizontal-only content, horizontal systems provide more stable localisation at high frequencies than full-sphere ones, as shown by a simulation of the energy vector. Therefore, if occasional horizontal-only reproduction at the highest precision is desired, full-sphere layouts with a dense horizontal ring are preferable.
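For readers unfamiliar with the energy vector, the sketch below computes it for a horizontal ring, using the usual definition as the energy-weighted mean of the speaker direction vectors. The function name and the gain values in the example are illustrative assumptions, not taken from the cited simulation.

```python
import math


def energy_vector(gains, azimuths_deg):
    """Return (magnitude, direction_deg) of the horizontal energy vector."""
    energies = [g * g for g in gains]
    total = sum(energies)
    x = sum(e * math.cos(math.radians(a)) for e, a in zip(energies, azimuths_deg)) / total
    y = sum(e * math.sin(math.radians(a)) for e, a in zip(energies, azimuths_deg)) / total
    return math.hypot(x, y), math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Two equally loud speakers at +/-45 degrees, all others silent.
    magnitude, direction = energy_vector([1.0, 1.0, 0.0, 0.0], [45.0, -45.0, 135.0, -135.0])
    print(f"|rE| = {magnitude:.3f}, direction = {direction:.1f} deg")
```

A magnitude of 1 means all energy arrives from a single direction. The more the energy is spread over speakers in different directions, the shorter the vector becomes.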

Phasing

Since multiple speakers will inevitably radiate highly correlated content, a moving listener may experience a phasing effect that affects the perceived timbre and can upset localisation. Phasing artefacts are most prominent in dry rooms on very precisely calibrated systems. They can be reduced by adding height speakers, which tend to smooth the effect, or tuned to a subjective minimum by introducing staggered delays to the speakers, with the understanding that this may adversely affect low-frequency localisation if overdone.

Phasing problems usually become evident in walk-around environments, and are of less concern for a seated audience, unless the interference pattern is so dense that it is perceived by small head movements.
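To give a feeling for how quickly the interference pattern changes with listener position, the following sketch computes the first comb-filter notches for the idealised case of two speakers radiating identical signals at equal level in a free field. The function name and the geometry in the example are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def comb_notches_hz(listener, speaker_a, speaker_b, count=3, c=SPEED_OF_SOUND):
    """First `count` cancellation frequencies for two fully correlated speakers."""
    path_diff = abs(math.dist(listener, speaker_a) - math.dist(listener, speaker_b))
    if path_diff == 0.0:
        return []  # equidistant listener: no path difference, no notches
    # Cancellation occurs where the path difference is an odd number of half wavelengths.
    return [(2 * k + 1) * c / (2.0 * path_diff) for k in range(count)]


if __name__ == "__main__":
    left, right = (-1.0, 0.0), (1.0, 0.0)
    for offset in (0.05, 0.10, 0.20):
        notches = comb_notches_hz((offset, 0.0), left, right)
        print(offset, [round(f) for f in notches])
```

Real systems involve more than two speakers with unequal gains, so the notches are shallower than in this two-speaker model, but the dependence on path difference is the same.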

Loudspeaker occlusion

For multi-listener environments and auditoria, the occlusion of speakers by other listeners must not be underestimated. Generally, the higher the order and the more physically accurate the reproduction, the more robust it is, up to the point where occlusion produces realistic effects that are consistent with the affected listener's visual perception. For low-order systems, however, reconstruction can easily fail entirely when line-of-sight to speakers is blocked, which has led to odd seating arrangements in listening tests. [1]

With-height systems usually provide more unhindered lines-of-sight per direction for a given audience, which might increase their robustness.

Number of loudspeakers vs. source material resolution

Solvang [2] and others have shown that using many more speakers than the minimum required can be detrimental. The reason is simple: more speakers with constant angular resolution means higher crosstalk and thus higher correlation between speakers. If not managed, this leads to stronger comb-filtering effects and phasing artefacts when the listener moves.

Therefore, with some decoding techniques, it may be advisable to consider if and how a reasonably regular lower-order decoder that omits some speakers can be fitted into any higher-order system design. For example, the third-order octagon allows for a perfectly regular first-order square using only every other speaker.
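The octagon example can be checked with a few lines of code. The helper name `regular_ring` and the choice of 0° for the first speaker are assumptions for this sketch.

```python
def regular_ring(count, first_azimuth_deg=0.0):
    """Azimuths in degrees of `count` equally spaced speakers on a ring."""
    step = 360.0 / count
    return [(first_azimuth_deg + i * step) % 360.0 for i in range(count)]


octagon = regular_ring(8)
square = octagon[::2]  # every other speaker of the octagon

if __name__ == "__main__":
    print("octagon:", octagon)
    print("embedded square:", square)
```

The same slicing idea applies to any ring whose speaker count is a multiple of the lower-order layout's count, for example a triangle inside a hexagon.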

Horizontal-only systems

Horizontal-only playback rigs are the most commonly deployed and most extensively researched Ambisonic speaker systems, because they constitute an economic next step after conventional stereo. They can reproduce full-sphere content, but elevated sources will be projected onto the horizontal plane, and sources at zenith and nadir will be reproduced in mono by all available speakers.

The literature is rife with horizontal decoders based on the simpler cylindrical harmonics, which do not depend on the elevation angle. Their use is discouraged, because they wrongly assume cylindrical waves which would require perfect line sources for reproduction. Actual speakers are point sources and will inevitably leak energy along the vertical axis, which has consequences for near-field compensation and the tuning of dual-band decoders. Hence, cylindrical decoders do not usually fulfill the Ambisonic criteria.

Triangle

The theoretical minimum number of speakers for horizontal playback at order N is 2N + 1, the number of Ambisonic components. However, the triangle demonstrates that at least one more speaker is necessary for proper soundfield reconstruction, since it exhibits extreme speaker detention: when panned around, sounds will stick to speaker locations and then jump across to the next speaker, rather than showing uniform motion. As a consequence, the directions of the velocity and energy vectors do not match between speakers, which causes localisation errors. [3]

Hence, the triangle is a suitable setup for Ambisonic reproduction only at low frequencies.
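The component counts behind such statements are easy to tabulate: a horizontal-only signal set of order N has 2N + 1 components, a full-sphere set has (N + 1)². The sketch below derives the highest order a given speaker count could support by this count alone; the function names are hypothetical, and the result is only an upper bound, since (as the triangle shows) matching the component count does not guarantee good reproduction.

```python
import math


def horizontal_components(order):
    """Number of signals in a horizontal-only Ambisonic set."""
    return 2 * order + 1


def full_sphere_components(order):
    """Number of signals in a full-sphere Ambisonic set."""
    return (order + 1) ** 2


def max_order(speaker_count, horizontal_only):
    """Highest order whose component count does not exceed the speaker count."""
    if speaker_count < 1:
        raise ValueError("need at least one speaker")
    if horizontal_only:
        return (speaker_count - 1) // 2
    return math.isqrt(speaker_count) - 1


if __name__ == "__main__":
    for name, count, flat in [("triangle", 3, True), ("hexagon", 6, True),
                              ("octagon", 8, True), ("cube", 8, False),
                              ("icosahedron", 12, False), ("dodecahedron", 20, False)]:
        print(f"{name}: up to order {max_order(count, flat)}")
```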

Square or rectangular setups

Four-speaker setups are the most economical way of reproducing first-order horizontal material, and a rectangular layout is most easily fitted into a living room, which makes these setups the most common in domestic environments. With rectangles, there is a localisation performance trade-off: the short sides will localise more stably than the square, the long sides worse. Consequently, for predominantly frontal sound stages, Benjamin, Lee, and Heller (2006) have observed a preference for rectangular layouts over squares. [4]

All legacy domestic hardware decoders supported rectangular layouts, usually with variable aspect ratios.
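Since speaker angles rather than distances determine the decoder, it can be useful to convert a room-friendly rectangle into azimuths as seen from the centre. The sketch below does this under an assumed convention (0° is straight ahead, positive angles are to the left, width is left to right, depth is front to back); the function name is hypothetical.

```python
import math


def rectangle_azimuths_deg(width_m, depth_m):
    """Azimuths of front-left, front-right, rear-left and rear-right corners."""
    front = math.degrees(math.atan2(width_m / 2.0, depth_m / 2.0))
    rear = 180.0 - front
    return [front, -front, rear, -rear]


if __name__ == "__main__":
    print("square 3 m x 3 m:   ", rectangle_azimuths_deg(3.0, 3.0))
    print("rectangle 3 m x 4 m:", [round(a, 1) for a in rectangle_azimuths_deg(3.0, 4.0)])
```

A rectangle that is deeper than it is wide narrows the front and rear pairs and widens the gaps at the sides, which is the trade-off described above.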

ITU 5.1

It is tempting to consider 5.1 systems for Ambisonic playback due to their wide availability, but the ITU-R BS.775 layout is quite hostile to Ambisonics due to its extreme irregularity. The three front speakers are so close together (-30°, 0°, +30°) that they will exhibit significant crosstalk at first order, which causes irritating phasing artefacts without any benefit. Therefore, it is advisable to omit the center speaker and decode only for L, R, Ls and Rs, as has been done in all pre-decoded G-format releases for 5.1. These G-format discs also assume a rectangular layout. If first-order playback is desired, the rear speakers should be moved accordingly; otherwise the Ambisonic imaging will be very unstable due to the wide angle between the surround speakers.

Decoding approaches to 5.1 were first suggested by Gerzon and Barton in 1992 [5] and subsequently patented. [6] Adriaensen provides a free second-order decoder obtained by genetic search, [7] and Wiggins (2007) has shown that source material as high as fourth order can be beneficial in order to 'steer' the decoding functions, even though the system is unable to reproduce the full spatial resolution. [8]

Second and third-order material can be played satisfactorily over the ITU 5.1 layout, but due to the problems with first-order reproduction, it should not be considered for Ambisonics except as a compromise when 5.1 content predominates.

Hexagon

If six speakers and sufficient space are available, the hexagon is a very good option that has outperformed four-channel setups for first-order reproduction in listening tests [4] and is capable of second-order reproduction. It can be driven by an inexpensive 5.1 sound card and domestic 5.1 amplifier, provided the LFE output is full-range.

When used with one speaker in front, the hexagon can be abused for native 5.1 playback at the expense of a significantly wider and blurrier stereo stage (120° as opposed to 60° between L and R as per ITU-R BS.775). Alternatively, reasonably sharp virtual speakers at the canonical ITU locations can be created with second-order panners. This is an interesting option if a phantom center is tolerable, and it will also work with a two-in-front orientation, which leaves more room for a TV or projection screen.

Octagon

The octagon is a flexible choice for up to third-order playback. When oriented one-in-front, it can be used for reasonably accurate native 5.1 playback (L and R at +/- 45° vs. 30°, and surrounds within the standardized sector at +/- 112.5°). For first order, phasing artefacts might become obvious under non-reverberant listening conditions due to the use of significantly more speakers than required, and Solvang's results (2008) suggest slightly increased timbral defects outside the sweet spot. [9]

With eight channels, an octagon can be driven by affordable 7.1 consumer equipment, again as long as the LFE output is full-range. Driven in third order, it is a reasonable lower bound for concert sound reinforcement over an extended listening area, either for native Ambisonic content or to produce virtual speakers, [10] which has been found to scale to several hundred listeners under favourable conditions. [11]

Systems with limited height reproduction

Stacked rings

Stacked rings have been a popular way of obtaining limited with-height reproduction. Spatial resolution will be weak close to the zenith and nadir, but these are somewhat rare positions for sound sources. Rings are generally easier to rig than (hemi-)spherical setups because they do not require overhead trussing, speaker stands can be shared unless the rings are twisted, and entrances, fire escape routes etc. can be more easily accommodated.

Double hexagons and octagons are the most common variations.

Since the introduction of #H#V mixed-order schemes by Travis (2009), [12] stacked rings can be operated at their full horizontal resolution even for elevated sources. #H#V decoding matrices for common layouts are available from Adriaensen (2012). [7]

Triple rings are rare, but have been used to good effect. [13]

Upper hemisphere systems

Since stacked rings are somewhat wasteful at higher elevations and necessarily have a hole at the zenith, they have been largely surpassed by hemispherical layouts since mature methods for decoder generation have become available. As they are difficult to rig and require overhead points, hemispheres are usually found either in permanent installations or experimental studios, where expensive and visually intrusive trussing is not an issue.

Full-sphere systems: Platonic solids

The regular Platonic solids are the only full-sphere layouts for which closed-form solutions for decoding matrices exist. Before the development and adoption of modern mathematical tools for the optimisation of irregular layouts and the generation of T-designs and Lebedev grids with higher numbers of speakers, the regular polyhedra were the only tractable options.

Tetrahedron

Tetrahedral speaker setups were used in the 1970s for first trials of full-sphere sound reproduction. One such experiment conducted by the Oxford University Tape Recording Society was documented by Michael Gerzon in 1971. [14] [15] [16] In this setup, the tetrahedron was inscribed into a cuboid, using every other corner.

Despite Gerzon's somewhat over-enthusiastic description (which pre-dates the introduction of Ambisonics and the proper formulation of its psychoacoustic criteria), the tetrahedron exhibits the same stability problems in 3D that plague the triangle for horizontal-only reproduction. It is a viable option for adequate full-sphere reproduction only at low frequencies.
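The "every other corner" construction used in that experiment can be sketched as follows. The function name and the axis convention (x to the front, y to the left, z up, origin at the centre of the cuboid) are assumptions for this example.

```python
import itertools
import math


def tetrahedron_in_cuboid(width, depth, height):
    """Pick every other corner of a cuboid centred on the origin."""
    corners = itertools.product((-1, 1), repeat=3)
    # Adjacent corners differ in exactly one sign, so keeping only the corners
    # whose sign product is +1 never selects two neighbours.
    return [(sx * depth / 2.0, sy * width / 2.0, sz * height / 2.0)
            for sx, sy, sz in corners if sx * sy * sz == 1]


if __name__ == "__main__":
    for vertex in tetrahedron_in_cuboid(2.0, 2.0, 2.0):
        print(vertex)
```

For a cube, the four selected corners form a regular tetrahedron. For a general cuboid the tetrahedron is distorted, in the same way that a rectangle distorts the square.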

Octahedron

The octahedron is difficult to set up in "upright" orientation, since the listener would occlude the floor speaker. Hence, a "slanted" setup is usually preferred. It provides basic full-sphere first-order reproduction for a single listener.

Goodwin (2009) has suggested a slanted octahedron with separate front center (which he calls 3D7.1) [17] as an alternative way of using 7.1 systems to achieve with-height Ambisonic reproduction in games, and to allow reasonably accurate native 5.1 playback. An OpenAL game audio backend and decoder for this setup is commercially available. [18]

Cube

The most commonly encountered full-sphere systems are cubes or rectangular cuboids. The same localisation trade-offs apply as for square vs. rectangle (see above). Cuboids are easily fitted into standard rooms and provide precise localisation in first order for a single listener plus enjoyable envelopment for one or two more, and they can be built using off-the-shelf 7.1 components. If all speakers are placed in room corners, their acoustic loading and resulting bass boost will be uniform, which means they can all be equalised in the same way.

Icosahedron

For the sake of consistency, we consider the vertices of the regular polyhedra as speaker positions, which makes the twelve-vertex icosahedron the next in the list. [note 1] If suitable rigging options are available, it is capable of second-order full-sphere reproduction. A good and slightly more practical alternative is a horizontal hexagon complemented by two twisted triangles on floor and ceiling.
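The twelve vertex directions can be generated from the standard coordinates of a regular icosahedron, the cyclic permutations of (0, ±1, ±φ) with φ the golden ratio. The sketch below converts them to azimuth and elevation; the function name and the orientation (which places no speaker at the zenith or nadir) are choices made for this example, and a real rig would be rotated to suit the room.

```python
import math


def icosahedron_directions():
    """Return 12 (azimuth_deg, elevation_deg) pairs for the icosahedron vertices."""
    phi = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio
    vertices = []
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            # cyclic permutations of (0, a, b)
            vertices += [(0.0, a, b), (b, 0.0, a), (a, b, 0.0)]
    directions = []
    for x, y, z in vertices:
        radius = math.sqrt(x * x + y * y + z * z)
        azimuth = math.degrees(math.atan2(y, x))
        elevation = math.degrees(math.asin(z / radius))
        directions.append((azimuth, elevation))
    return directions


if __name__ == "__main__":
    for azimuth, elevation in sorted(icosahedron_directions(), key=lambda d: -d[1]):
        print(f"az {azimuth:7.1f}  el {elevation:6.1f}")
```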

Dodecahedron

With twenty vertices, [note 1] the dodecahedron is capable of third-order full-sphere playback. Budget dodecahedra can be built by combining four domestic 5.1 sets as demonstrated at IRCAM's Studio 4, [19] which would also allow for a square horizontal subwoofer decode.

Irregular speaker layouts

It is possible to decode Ambisonics and Higher-Order Ambisonics onto fairly arbitrary speaker arrays, and this is a subject of ongoing research. A number of free decoding toolkits as well as a commercial implementation [20] are available.

Binaural stereo

Higher-Order Ambisonics can be decoded to produce 3D stereo headphone output similar to that produced using binaural recording. This can be done in a number of ways, including the use of virtual loudspeakers in combination with HRTF data. [21] Other methods are possible. [22]
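The virtual-loudspeaker approach can be outlined as follows: the Ambisonic signal is first decoded to a set of virtual speaker feeds, each feed is convolved with the left-ear and right-ear head-related impulse responses (HRIRs) for that speaker's direction, and the results are summed per ear. The sketch assumes the feeds have already been decoded, that all feeds share one length and all HRIRs share another, and it uses tiny made-up impulse responses in place of measured HRTF data. All names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def virtual_speaker_binaural(speaker_feeds, hrirs):
    """Mix virtual speaker feeds to two ears.

    speaker_feeds: one sample list per virtual speaker.
    hrirs: one (left_ir, right_ir) pair per virtual speaker.
    """
    length = len(speaker_feeds[0]) + len(hrirs[0][0]) - 1
    left = [0.0] * length
    right = [0.0] * length
    for feed, (ir_left, ir_right) in zip(speaker_feeds, hrirs):
        for k, value in enumerate(convolve(feed, ir_left)):
            left[k] += value
        for k, value in enumerate(convolve(feed, ir_right)):
            right[k] += value
    return left, right


if __name__ == "__main__":
    feeds = [[1.0, 0.0], [0.0, 1.0]]           # placeholder decoded feeds
    toy_hrirs = [([1.0, 0.5], [0.25, 0.0]),    # placeholder, not measured data
                 ([0.0, 0.2], [1.0, 0.0])]
    print(virtual_speaker_binaural(feeds, toy_hrirs))
```

A practical implementation would use measured HRIRs and fast (FFT-based) convolution; the direct form is used here only to keep the example self-contained.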

Notes

  1. Unfortunately, in the literature the icosahedral layout is commonly called a dodecahedron and vice versa, without justification as to why we should now consider faces rather than vertices.

References

  1. Stephen Thornton, Surround sound from two-channel stereo, see photos, retrieved 2014-01-02
  2. Audun Solvang, Spectral Impairment of Two-Dimensional Higher Order Ambisonics, JAES Vol.56 No. 4, April 2008
  3. Bruce Wiggins, Has Ambisonics Come of Age?, Reproduced Sound 24 - Proceedings of the Institute of Acoustics, Vol 30. Pt 6, 2008, Fig. 7
  4. Eric Benjamin, Richard Lee, and Aaron Heller, Localisation in Horizontal-only Ambisonic Systems, 121st AES Convention, San Francisco 2006
  5. Michael A Gerzon, Geoffrey J Barton, "Ambisonic Decoders for HDTV", 92nd AES Convention, Vienna 1992. http://www.aes.org/e-lib/browse.cfm?elib=6788
  6. US 5757927, Gerzon, Michael Anthony & Barton, Geoffrey James, "Surround sound apparatus", published 1998-05-26, assigned to Trifield Productions Ltd.
  7. Fons Adriaensen, AmbDec Ambisonic Decoder, 2012
  8. Bruce Wiggins, The Generation of Panning Laws for Irregular Speaker Arrays Using Heuristic Methods, 31st AES Conference, London 2007
  9. Audun Solvang, Spectral Impairment for Two-Dimensional Higher-Order Ambisonics, JAES Vol. 56, No. 4, April 2008, http://www.aes.org/e-lib/browse.cfm?elib=14385
  10. Jörn Nettingsmeier, General-purpose Ambisonic playback systems for electroacoustic concerts, 2nd International Symposium on Ambisonics and Spherical Acoustics, Paris 2010
  11. Jörn Nettingsmeier and David Dohrmann, Preliminary studies on large-scale higher-order Ambisonic sound reinforcement systems, Ambisonics Symposium 2011, Lexington (KY) 2011
  12. Chris Travis, A new mixed-order scheme for Ambisonic signals, Ambisonics Symposium, Graz 2009
  13. Jörn Nettingsmeier, Field Report II A contemporary music recording in Higher-order Ambisonics, Linux Audio Conference 2012, Stanford 2012, p.8
  14. Michael Gerzon, Experimental Tetrahedral Recording: part one, Studio Sound, Vol. 13, August 1971, pp 396-398
  15. Michael Gerzon, Experimental Tetrahedral Recording: part two, Studio Sound, Vol. 13, September 1971, pp 472, 473 and 475
  16. Michael Gerzon, Experimental Tetrahedral Recording: part three, Studio Sound, Vol. 13, October 1971, pp 510, 511, 513 and 515
  17. Simon Goodwin, 3D sound for 3D games - beyond 5.1, AES 35th International Conference, London 2009
  18. Blue Ripple Sound, HOA Technical Notes - 3D7.1, retrieved 2014-01-02
  19. 2nd International Symposium on Ambisonics and Spherical Acoustics, IRCAM, Paris 2010, demo of Blue Ripple Sound's Rapture3D engine
  20. Blue Ripple Sound, HOA Technical Notes - Custom Layouts in Rapture3D Advanced Edition, retrieved 2014-01-24
  21. Richard Furse, Building an OpenAL Implementation Using Ambisonics, AES 35th International Conference, London 2009
  22. Blue Ripple Sound, HOA Technical Notes - Amber HRTF, retrieved 2014-01-24