Ambisonic data exchange formats

Data exchange formats for Ambisonics have undergone radical changes since the early days of four-track magnetic tape. Researchers working on very high-order systems found no straightforward way to extend the traditional formats to suit their needs. Furthermore, there was no widely accepted formulation of spherical harmonics for acoustics, so one was borrowed from chemistry, quantum mechanics, computer graphics, or other fields, each of which had subtly different conventions. This led to an unfortunate proliferation of mutually incompatible ad hoc formats and much head-scratching.

This page attempts to document the different existing formats, their rationales and history, for the terminally curious and those unfortunate enough to have to deal with them in detail.

Most modern applications use ACN and SN3D, although traditional first order is still common.

Spherical harmonics in Ambisonics

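A common formulation for spherical harmonics in the context of Ambisonics is [1]

$$Y_\ell^m(\theta,\phi) = N_\ell^{|m|}\, P_\ell^{|m|}(\sin\phi) \begin{cases} \sin(|m|\theta) & \text{if } m < 0 \\ \cos(|m|\theta) & \text{if } m \ge 0 \end{cases}$$

where $Y_\ell^m$ denotes a spherical harmonic of degree $\ell$ and index $m$ with a range of $-\ell \le m \le \ell$.

(Note that if $m = 0$, then $\cos(|m|\theta) = 1$.)

$N_\ell^{|m|}$ is a normalisation factor (see below), and $P_\ell^{|m|}$ is the associated Legendre polynomial of degree $\ell$ and order $|m|$. The azimuth angle $\theta$ is zero straight ahead and increases counter-clockwise. The elevation angle $\phi$ is zero on the horizontal plane and positive in the upper hemisphere.

Unfortunately, the "Ambisonic order" $\ell$ is called the degree in mathematical parlance, which uses order for the "Ambisonic index" $m$.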

Relationship of spherical harmonics and B-format signals

For a source signal $S$ in direction $(\theta,\phi)$, the Ambisonic components $B_\ell^m$ are given by

$$B_\ell^m = Y_\ell^m(\theta,\phi) \cdot S$$

If we span a direction vector from the origin towards the source until it intersects the respective spherical harmonic, the length of this vector is the coefficient that gets multiplied with the source signal. Repeat for all spherical harmonics up to the desired Ambisonic order.
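The encoding step described above can be sketched for first order, where the harmonics have simple closed forms. This is a minimal illustration, not a reference implementation: it assumes SN3D weights and ACN channel order (W, Y, Z, X), angles in radians, and the function name `encode_first_order` is made up for this example.

```python
import math

def encode_first_order(sample, azimuth, elevation):
    """Encode one mono sample into first-order components [W, Y, Z, X].

    Assumptions (not from a specific library): ACN order, SN3D weights,
    azimuth counter-clockwise from straight ahead, elevation positive upwards.
    """
    cos_el = math.cos(elevation)
    w = sample                                # degree 0, index 0
    y = sample * math.sin(azimuth) * cos_el   # degree 1, index -1
    z = sample * math.sin(elevation)          # degree 1, index 0
    x = sample * math.cos(azimuth) * cos_el   # degree 1, index +1
    return [w, y, z, x]
```

A source straight ahead excites only W and X, a source hard left only W and Y, and a source directly overhead only W and Z.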

Prerequisites for successful data exchange

For successful exchange of Ambisonic material, the sender and receiver have to agree on the ordering of the components, their normalisation or weighting, and the relative polarity of the harmonics.

Since it is possible to omit parts of the spherical harmonic multipole expansion for content that has non-uniform, direction-dependent resolution (known as mixed-order), it might also be necessary to define how to deal with missing components.

In the case of transmission "by wire", be it an actual digital multichannel link or any number of virtual patchcords within an audio processing engine, these properties must be explicitly matched on both ends, since there is usually no provision for metadata exchange and parameter negotiation. In the case of files, some flexibility might be possible, depending on the file format and the expressiveness of its metadata set.

However, in practice, just two formats are in widespread use. The first is the Furse-Malham higher-order format, which is an extension of traditional B-Format; the second is the more modern SN3D normalisation in ACN channel order. In neither case is there any ambiguity about ordering, normalisation, weighting or polarity, and it is rare to see cases with missing components. A third format is in limited use: N3D, also in ACN channel order.

Component ordering

Spherical Harmonics up to Ambisonic order 5 as commonly displayed, sorted by increasing Ambisonic Channel Number (ACN), aligned for symmetry.

The traditional B-format (W, X, Y, Z) only concerned itself with zeroth and first Ambisonic order. Because of a strong correspondence between the spherical harmonics and microphone polar patterns, and the fact that those polar patterns have clearly defined directions, it seemed natural to order and name the components in the same way as the axes of a right-hand coordinate system.

For higher orders, this precedent becomes awkward, because spherical harmonics are most intuitively arranged in symmetric fashion around the single z-rotationally symmetric member m=0 of each order, with the horizontal sine terms m<0 to the left, and the cosine terms m>0 to the right (see illustration).

Furse-Malham

In Furse-Malham higher-order format, an extension of traditional B-format up to third order, [2] orders 2 (R, S, T, U, V) and 3 (K, L, M, N, O, P, Q) begin with their z-rotationally symmetric member and then jump outward right and left (see the reference table below), with the horizontal components at the end. Higher-order extensions are trivially defined, but are not used. [2]

SID

         0
      2  3  1
   5  7  8  6  4
10 12 14 15 13 11  9

In his seminal 2001 thesis, [3] Daniel used a three-index nomenclature for the spherical harmonics, which corresponds to in the notation used here. [note 1] He implied yet another channel ordering, subsequently developed into an explicit proposal called SID for Single Index Designation [4] which was adopted by a number of researchers. This scheme is compatible with first-order B-format, and continues to traverse the higher spherical harmonics in the same fashion, with the z-rotationally symmetric component at the end, going through the horizontal components first. It is, however, incompatible with Furse-Malham. SID ordering is not in widespread use.

ACN

         0
      1  2  3
   4  5  6  7  8
 9 10 11 12 13 14 15

For future higher-order systems, adoption of the Ambisonic Channel Number (ACN) [5] has reached wide consensus. It is determined algorithmically as $ACN = \ell^2 + \ell + m$.
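As a small sketch of that rule, the mapping between a channel number and its (degree, index) pair can be written in a few lines. The function names are illustrative only; the inverse relies on the fact that the degree is the integer square root of the ACN.

```python
import math

def acn_index(degree, index):
    """Return the Ambisonic Channel Number for a given degree and index."""
    if degree < 0 or abs(index) > degree:
        raise ValueError("index must satisfy -degree <= index <= degree")
    return degree * (degree + 1) + index

def acn_to_degree_index(acn):
    """Invert acn_index: recover (degree, index) from a channel number."""
    if acn < 0:
        raise ValueError("ACN must be non-negative")
    degree = math.isqrt(acn)
    return degree, acn - degree * (degree + 1)
```

For instance, `acn_index(2, -2)` gives 4 and `acn_to_degree_index(15)` gives `(3, 3)`, matching the ACN layout shown above.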

ACN is used widely with SN3D and N3D, below.

More simply:

FuMa = WXYZ | RSTUV | KLMNOPQ

ACN = WYZX | VTRSU | QOMKLNP

SID = WXYZ | UVSTR | PQNOLMK

SID ordering is used by iem_ambi in Pure Data.
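The letter sequences above are enough to build a channel reordering routine. The following is a sketch under the assumption that the input holds a complete signal set of order 0 to 3; the names `ORDERINGS` and `reorder` are hypothetical. It only permutes channels and does not touch normalisation.

```python
# Channel names per ordering, taken from the three sequences listed above.
ORDERINGS = {
    "FuMa": "WXYZRSTUVKLMNOPQ",
    "ACN": "WYZXVTRSUQOMKLNP",
    "SID": "WXYZUVSTRPQNOLMK",
}

def reorder(channels, source, target):
    """Permute a complete signal set from one channel ordering to another."""
    count = len(channels)
    if count not in (1, 4, 9, 16):
        raise ValueError("expected a complete set of order 0 to 3")
    # Label each channel with its letter in the source ordering ...
    by_name = dict(zip(ORDERINGS[source], channels))
    # ... then read the letters back in the target ordering.
    return [by_name[name] for name in ORDERINGS[target][:count]]
```

Because every order occupies a contiguous block in all three schemes, truncating the letter strings to the channel count is sufficient for lower-order material.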

Normalisation

For successful reconstruction of the sound field, it is important to agree on a normalisation method for the spherical harmonic components. The following approaches are common:

maxN

The maxN scheme by Daniel normalises each single component to never exceed a gain of 1.0 for a panned monophonic source. Malham states that "[w]hilst this approach is not rigorously 'correct' in mathematical terms, it has significant engineering advantages in that it restricts the maximum levels a panned mono source will generate in some of the higher-order channels." [2] This property is particularly interesting for fixed-point digital interfaces. The maxN weights may be determined by visual inspection up to the third order; above this value the maxima of each polynomial need to be determined explicitly. [2]

MaxN is used in the Furse-Malham format (with the exception of a -3 dB correction factor for W, which makes it directly compatible with traditional B-Format). Otherwise, it is not in widespread use.

SN3D

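SN3D stands for Schmidt semi-normalisation and is commonly used in geology and magnetics. The weighting coefficients are [6]

$$N_{\ell,m}^{\text{SN3D}} = \sqrt{(2-\delta_m)\,\frac{(\ell-|m|)!}{(\ell+|m|)!}}, \qquad \delta_m = \begin{cases} 1 & \text{if } m = 0 \\ 0 & \text{if } m \ne 0 \end{cases}$$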

Daniel, who originally introduced it into Ambisonic use, notes: "High degree of generality - the encoding coefficients are recursively computable, and the first-order components are unity vectors in their respective directions of incidence". [7]

With SN3D, unlike N3D, no component will ever exceed the peak value of the 0th order component for single point sources. [1] This scheme has been adopted by the proposed AmbiX format.

SN3D (in the ACN channel order) is in widespread use and a common choice in new software development.

In the AmbiX specification paper, the term $(2-\delta_m)$ is replaced with $\frac{2-\delta_m}{4\pi}$.
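As an illustration, the SN3D weight can be computed directly from factorials. The sketch assumes the commonly quoted form $\sqrt{(2-\delta_m)(\ell-|m|)!/(\ell+|m|)!}$ without any $4\pi$ term; the function name `sn3d_weight` is made up for this example.

```python
import math

def sn3d_weight(degree, index):
    """SN3D normalisation factor, assuming the form without a 4*pi term."""
    m = abs(index)
    if degree < 0 or m > degree:
        raise ValueError("index must satisfy -degree <= index <= degree")
    delta = 1 if m == 0 else 0  # Kronecker delta for the index-0 component
    ratio = math.factorial(degree - m) / math.factorial(degree + m)
    return math.sqrt((2 - delta) * ratio)
```

With this form all zeroth- and first-order weights equal 1, which is consistent with the remark that the first-order components are unit vectors in their directions of incidence.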

N3D

N3D or full three-D normalisation is the most obvious approach to normalisation. Daniel describes it as follows: "Orthonormal basis for 3D decomposition. Simple relationship to SN3D [..]. Ensures equal power of the encoded components in the case of a perfectly diffuse 3D field. [..] Obvious significance for solving decoding problems [..] (3D reconstruction)." [8]

The relation to SN3D is [9]

$$N_{\ell,m}^{\text{N3D}} = N_{\ell,m}^{\text{SN3D}} \sqrt{2\ell+1}$$

This normalisation is standard in physics and mathematics and is supported by some Ambisonic software packages. It is used in MPEG-H. However, SN3D is now much more common.

As N3D and SN3D differ only by scaling factors, care is needed when working with both, as it may not be obvious on first listening if an error has been made, particularly on a system with a small number of speakers.
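Because the two schemes differ only by a per-degree gain, conversion is a simple scaling. The sketch below assumes a frame of samples in ACN channel order and the factor $\sqrt{2\ell+1}$ between SN3D and N3D; the function names are illustrative.

```python
import math

def _n3d_gain(acn):
    """Gain sqrt(2*degree + 1); the degree is the integer square root of the ACN."""
    return math.sqrt(2 * math.isqrt(acn) + 1)

def sn3d_to_n3d(frame):
    """Scale one frame of ACN-ordered SN3D samples to N3D."""
    return [value * _n3d_gain(acn) for acn, value in enumerate(frame)]

def n3d_to_sn3d(frame):
    """Scale one frame of ACN-ordered N3D samples to SN3D."""
    return [value / _n3d_gain(acn) for acn, value in enumerate(frame)]
```

The zeroth-order channel is unchanged, so a mix-up between the two schemes leaves the overall level plausible, which is one reason such errors are easy to miss.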

N2D / SN2D

Additionally, two schemes exist which consider only the horizontal components. This has practical advantages for fixed-point media in the common situation where sources are concentrated on the horizontal plane, but the normalisation is somewhat arbitrary and its assumptions do not hold for strongly diffuse soundfields and sound scenes with strong elevated sources. Since Ambisonics is meant to be isotropic and the 2D schemes definitely are not, their use is discouraged.

Polarity

A third complication arises from the quantum mechanical formulation of spherical harmonics, which was adopted by some Ambisonics researchers. It includes a factor of $(-1)^m$, a convention called Condon–Shortley phase, which will invert the relative polarity of every other component within a given Ambisonic order. The term can be folded either into the formulation of the associated Legendre polynomials or into the normalisation coefficient, so it may not always be obvious.

MATLAB and GNU Octave both include Condon–Shortley phase in their legendre(ℓ,X) functions, but undo it by applying the factor again in the Schmidt semi-normalized form legendre(ℓ,X,'sch'). [10] [11]

Wolfram Language also includes C-S phase in its LegendreP[ℓ,m,x] implementation, [12] and retains it in SphericalHarmonicY[ℓ,m,θ,φ], which is fully normalized. [13] Note that this function returns complex values and uses the physics convention for spherical coordinates, where θ is the zenith angle (angle from the positive Z-axis) and φ is the azimuth (counter-clockwise angle around the positive Z-axis).

The presence of Condon–Shortley phase in parts of the signal chain usually manifests itself in erratic panning behaviour and increasing apparent source width when going to higher orders, which can be somewhat difficult to diagnose and much harder to eliminate. Hence, its use is strongly discouraged in the context of Ambisonics.

None of the ambisonic exchange formats described above use Condon–Shortley phase. Polarity is generally only a concern when trying to reconcile theoretical formulations of the spherical harmonics from other academic disciplines.
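To make the effect of the phase term concrete, here is a sketch of the associated Legendre functions computed with the standard upward recurrence, with the Condon–Shortley factor $(-1)^m$ as an explicit switch. The function name and the `condon_shortley` flag are inventions for this example and do not mirror any particular library.

```python
import math

def assoc_legendre(degree, order, x, condon_shortley=False):
    """Associated Legendre function P_degree^order(x) for 0 <= order <= degree.

    By default no Condon-Shortley phase is applied; pass condon_shortley=True
    to multiply by (-1)**order, as the quantum mechanical convention does.
    """
    if not 0 <= order <= degree:
        raise ValueError("need 0 <= order <= degree")
    root = math.sqrt(max(0.0, 1.0 - x * x))
    # Start value P_m^m = (2m-1)!! * (1 - x^2)^(m/2)
    value = 1.0
    for k in range(1, order + 1):
        value *= (2 * k - 1) * root
    if degree > order:
        previous, value = value, x * (2 * order + 1) * value  # P_{m+1}^m
        for l in range(order + 2, degree + 1):
            previous, value = value, (
                (2 * l - 1) * x * value - (l + order - 1) * previous
            ) / (l - order)
    if condon_shortley:
        value *= (-1) ** order
    return value
```

Comparing both variants for odd orders shows the sign flip that a library with built-in phase would introduce if its output were used unmodified.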

Reference table of layouts and normalisations

The following table gives an overview of all Ambisonic formats published so far.

Conversion factors can be applied either to the Ambisonic components $B_\ell^m$ or the spherical harmonics $Y_\ell^m$.

The data is taken from Chapman (2008). [14]

Please do not rely on this table until it has been thoroughly checked and the "Under construction" notice has been removed.

Conversion factors
ACN  FuMa  SID  ℓ   m   Spherical harmonic in N3D  to SN3D  to maxN* [note 2]
0    0     0    0   0
1    2     2    1  -1
2    3     3    1   0
3    1     1    1   1
4    8     5    2  -2
5    6     7    2  -1
6    4     8    2   0
7    5     6    2   1
8    7     4    2   2
9    15    10   3  -3
10   13    12   3  -2
11   11    14   3  -1
12   9     15   3   0
13   10    13   3   1
14   12    11   3   2
15   14    9    3   3
16   ø     17   4  -4                                       ø
17   ø     19   4  -3                                       ø
18   ø     21   4  -2                                       ø
19   ø     23   4  -1                                       ø
20   ø     24   4   0                                       ø
21   ø     22   4   1                                       ø
22   ø     20   4   2                                       ø
23   ø     18   4   3                                       ø
24   ø     16   4   4                                       ø

However, please note that only the Furse-Malham and SN3D/ACN encodings are in wide use. (Traditional B-Format is a subset of Furse-Malham.) For both of these encodings, the equations can be expressed directly, without separate normalisation or conversion factors, and there is no ambiguity around ordering.
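For first-order material, moving between the two common encodings is a one-line operation. The sketch below assumes the usual reading of the -3 dB correction as a factor of $1/\sqrt{2}$ on W in Furse-Malham, with X, Y and Z carrying identical weights in both schemes; the function names are hypothetical.

```python
import math

def fuma_first_order_to_acn_sn3d(w, x, y, z):
    """Convert one first-order Furse-Malham frame (W, X, Y, Z) to ACN/SN3D."""
    # Undo the assumed 1/sqrt(2) gain on W, then reorder to W, Y, Z, X.
    return [w * math.sqrt(2.0), y, z, x]

def acn_sn3d_to_fuma_first_order(frame):
    """Convert one first-order ACN/SN3D frame [W, Y, Z, X] to Furse-Malham."""
    w, y, z, x = frame
    return [w / math.sqrt(2.0), x, y, z]
```

Higher orders need per-channel gains in addition to the reordering and are not covered by this sketch.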

File formats and metadata

For file-based storage and transmission, additional properties need to be defined, such as the base file format and, if desired, accompanying metadata.

AMB

The .amb file format was proposed and defined by Richard Dobson in 2001, [15] based on Microsoft's WAVE_FORMAT_EXTENSIBLE amendment to the WAV audio file format. It mandates the use of Furse-Malham encoding.

From its parent, it inherits a maximum file size of 4GB, which is a serious limitation for live recording in higher orders.

.amb files are distinguished from other multichannel content by their suffix and by setting the file subtype Globally Unique Identifier in their header data to either of the following values:

The definition mandates that the WAVE_EX dwChannelMask must be set to zero. Furthermore, it recommends that the file should contain a PEAK chunk, containing the value and position of the highest sample in each channel.

The channels within an .amb file are interleaved, and any unused channels are omitted. This makes it possible to identify traditional #H#P mixed-order content by the number of channels present, as per the following table: [15]

The free and open source C library libsndfile has included .amb support since 2007.

Dobson's format has been instrumental in making native Ambisonic content easily accessible to enthusiasts, and in paving the way for research and deployment of Higher-order Ambisonics. While it cannot scale any further than third order and does not accommodate #H#V mixed order sets, its capabilities are more than sufficient for most Ambisonic content in existence today, and backwards-compatibility to .amb is an important feature of any real-world Ambisonic workflow.

AmbiX

AmbiX [1] adopts Apple's Core Audio Format or .caf. It scales to arbitrarily high orders and has no practically relevant limitation of file size. AmbiX files contain linear PCM data with word lengths of 16, 24, or 32 bit fixed point, or 32 or 64 bit float, at any sample rate valid for .caf. It uses ACN channel ordering with SN3D normalisation.

The basic format of AmbiX mandates a complete full-sphere signal set, the order of which can be uniquely and trivially deduced from the number of channels. Only the minimum header information required by the .caf specification is present and no other metadata is included.
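The deduction mentioned above follows from the fact that a complete full-sphere set of order N has (N+1)² channels. A minimal sketch, with a made-up function name:

```python
import math

def ambisonic_order(channel_count):
    """Return the order of a complete full-sphere set with this many channels."""
    if channel_count < 1:
        raise ValueError("channel count must be positive")
    root = math.isqrt(channel_count)
    if root * root != channel_count:
        raise ValueError("not a complete full-sphere set: count is not a perfect square")
    return root - 1
```

A count that is not a perfect square cannot be a basic-format payload and would need the extended format or another container to describe it.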

The extended format is marked by the presence of a User-Defined Chunk with the UUID

1AD318C3-00E5-5576-BE2D-0DCA2460BC89.

(The original specifications used 49454D2E-4154-2F41-4D42-49582F584D4C, which is an invalid UUID. [16]) Additionally, the header now contains an adaptor matrix of coefficients, which needs to be applied to the data streams before they can be played back. This matrix provides a generic way of mapping payloads in any previous format and any mix of orders to canonical periphony, ACN ordering and SN3D normalisation. Theoretically, it can even accommodate sound fields that span only subsets of the sphere.
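Conceptually, applying the adaptor matrix is a matrix-vector product per sample frame. The sketch below is only an illustration of that idea: how the matrix is stored in the file is not shown, and the example matrix for a horizontal-only first-order payload (channels W, Y, X, already SN3D-weighted) is a hypothetical choice, not taken from the specification.

```python
def apply_adaptor(matrix, payload_frame):
    """Map one frame of payload samples to the canonical channel set."""
    for row in matrix:
        if len(row) != len(payload_frame):
            raise ValueError("matrix width must match the payload channel count")
    return [sum(c * s for c, s in zip(row, payload_frame)) for row in matrix]

# Hypothetical adaptor: 3 payload channels (W, Y, X) to 4 canonical channels.
HORIZONTAL_FIRST_ORDER = [
    [1.0, 0.0, 0.0],  # ACN 0 (W) taken from payload channel 0
    [0.0, 1.0, 0.0],  # ACN 1 (Y) taken from payload channel 1
    [0.0, 0.0, 0.0],  # ACN 2 (Z) absent in the payload, filled with silence
    [0.0, 0.0, 1.0],  # ACN 3 (X) taken from payload channel 2
]
```

Reordering, gain changes and missing components all reduce to different coefficient patterns in the same matrix, which is what makes the mechanism generic.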

AmbiX was originally proposed at the Ambisonic Symposium 2011, building upon previous work by Travis [17] and Chapman et al. [5]

Notes

  1. sgn(x) is the Sign function.
  2. MaxN* (starred) denotes maxN normalisation with the additional -3 dB correction factor for W.
  3. Dobson (2001) uses "1", which would imply a complete set of horizontal components WXY.

References

  1. Christian Nachbar; Franz Zotter; Etienne Deleflie; Alois Sontacchi (June 2–3, 2011). AmbiX – A Suggested Ambisonics Format. Ambisonics Symposium 2011. Lexington (KY).
  2. Malham, David (April 2003). "Higher order Ambisonic systems" (PDF). Space in Music – Music in Space (Mphil thesis). University of York. pp. 2–3. Retrieved 2 November 2007. Max-Normalisation (MaxN) by Daniel [...] The factors for this can be obtained by inspection up to about third order but over this point it becomes more difficult and requires the maxima of each polynomial to be determined (either mathematically or numerically) explicitly and then inverted.
  3. Jérôme Daniel, Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia, Paris 2001, p.151
  4. Jérôme Daniel, Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format, 23rd AES Conference, Copenhagen 2003, p. 13
  5. Michael Chapman et al., A standard for interchange of Ambisonic signal sets, Ambisonics Symposium, Graz 2009
  6. Nachbar, Zotter, Deleflie, and Sontacchi (2011) lc, p.3, eq(3)
  7. Daniel (2001), lc, p.156, translated from French "Grande généricité: calcul récursif des coefficients d'encodage, les composantes d'ordre 1 étant celles du vecteur incidence (unitaire)."
  8. Daniel (2001) lc, p.156, translated from French "Base orthonormée pour la décomposition 3D. Relation simple à SN3D (facteur ). Assure une puissance égale des composantes encodées dans le cas d'un champ parfaitement diffus 3D (intérêt dans le domaine analogique). Intérêt évident pour la résolution (en 3.3) des problèmes de décodage (restitution 3D)."
  9. Daniel (2001), lc, p.150 eq(3.9)
  10. MathWorks documentation: legendre
  11. GNU Octave documentation: legendre
  12. Wolfram language documentation: LegendreP
  13. Wolfram language documentation: SphericalHarmonicY
  14. Michael Chapman, Ambisonic channel sequence (proposed standard). Archived 2012-09-30 at the Wayback Machine
  15. Richard Dobson, The AMB Ambisonic File Format. Archived 2014-04-22 at the Wayback Machine
  16. IEM, AmbiX reference implementation (API documentation)
  17. Travis, Chris, A new mixed-order scheme for Ambisonic signals, Ambisonics Symposium, Graz 2009. Archived 2009-10-04 at the Wayback Machine