Mixed-order Ambisonics

It is possible to define an Ambisonic signal set with non-uniform resolution depending on source direction. This practice is called mixed-order, and it has consequences for the layout and interpretation of files, streams, or physical connections in Ambisonic data exchange. As with all things Ambisonic, complexity has increased as research progressed, and the term has grown to include new concepts which were not anticipated when Ambisonics was first invented in the 1970s.

When dealing with subsets of the multipole expansion, the base ordering of the components (according to whatever ordering scheme has been agreed on) is usually not changed. Instead, the unused components are simply left out, or, where bandwidth or space is not at a premium, zeroed.
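The two options can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses ACN channel numbering (index = l² + l + m, as in AmbiX) as the agreed ordering, and the helper names `zero_unused` and `drop_unused` are hypothetical.

```python
def acn(l, m):
    """ACN channel index of the component with order l and degree m."""
    return l * l + l + m

def zero_unused(frame, keep):
    """Keep the full channel layout; silence components outside `keep`."""
    return [s if i in keep else 0.0 for i, s in enumerate(frame)]

def drop_unused(frame, keep):
    """Omit unused components; the survivors keep their relative order."""
    return [s for i, s in enumerate(frame) if i in keep]

# One first-order sample frame in ACN order: W, Y, Z, X (indices 0..3).
frame = [1.0, 0.5, 0.25, -0.5]
# Assumed subset for the example: everything except the vertical Z component.
keep = {acn(0, 0), acn(1, -1), acn(1, 1)}

zeroed = zero_unused(frame, keep)   # still four channels, Z silenced
dropped = drop_unused(frame, keep)  # three channels, base order unchanged
```

In both cases the remaining components appear in the same relative order as in the full set, which is the point made above.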

True mixed-order schemes

Horizontal-only

The Ambisonic components up to third order.

Horizontal-only recordings or mixes are the most commonly encountered mixed-order signals, because the horizontal plane is by far the most likely location of musical or other performers, and the vast majority of playback systems deployed today do not have height capability.

Horizontal-only signal sets are derived by following along the edges of the pyramid of spherical harmonics (see illustration) and taking only the outermost components of each order. The other components in each order either have nulls on the equator or have lobes that are broader than those of the outer components, so they add no further horizontal resolution.
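As a minimal sketch of this selection rule, the outermost components can be described as the (l, m) pairs with |m| = l, where l is the order and m the degree. The function name `horizontal_components` is hypothetical and the indexing convention is an assumption made for the example.

```python
def horizontal_components(order):
    """(l, m) pairs on the outer edges of the pyramid, i.e. |m| == l."""
    return [(l, m)
            for l in range(order + 1)
            for m in range(-l, l + 1)
            if abs(m) == l]

# Third-order horizontal-only set: one component for order 0, two per higher order.
third = horizontal_components(3)
```

Under this rule a horizontal-only set of order N has 2N + 1 components, compared with (N + 1)² for the full sphere.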

Superimposed horizontal and full-sphere signals (#H#P)

Often, height localisation is of secondary importance and full-sphere information is only sought for a more pleasant rendering of diffuse room reverberation. Hence, it is customary to take a first-order recording of a tetrahedral microphone for ambiance and augment it with artificially panned direct signals or spot microphones in higher order on the equator, resulting for example in a 2H1P (or second-order horizontal, superimposed with a first-order periphonic) signal.
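A 2H1P set can be enumerated with a short sketch, assuming components are indexed by order l and degree m: every component up to the periphonic order is kept, and above it only the outermost (|m| = l) ones. The name `hp_components` is hypothetical.

```python
def hp_components(h, p):
    """(l, m) pairs of an #H#P set: the full sphere up to order p,
    plus only the outermost components (|m| == l) up to order h."""
    return [(l, m)
            for l in range(h + 1)
            for m in range(-l, l + 1)
            if l <= p or abs(m) == l]

# 2H1P: the four first-order components plus two second-order ones.
set_2h1p = hp_components(2, 1)
```

For 2H1P this yields six components, against nine for a full second-order set.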

If sources are concentrated on the horizontal plane, this approach is quite efficient in terms of bandwidth and reproduction complexity. However, such a system has an often-overlooked misfeature: as soon as a source leaves the equator, its sharpness and area of satisfactory reconstruction degrade rapidly to those of the (lower) periphonic order.

Complete mixed-order sets (#H#V)

To address this issue, Travis (2009) suggested an augmented mixed-order scheme which retains its horizontal resolution even for elevated sources, while limiting the lower-order "smear" to the vertical dimension. [1] For such sets, he introduced the #H#V notation, where #H denotes the horizontal and #V the vertical resolution, with the understanding that both remain constant for all source positions on the sphere.

#H#V sets are derived by following the pyramid of spherical harmonics along the edges, taking the outermost components of each row on each side. For example, a 3H2V set has 15 components, leaving out only the third-order z-rotationally symmetric one in the middle of the bottom row. Since this scheme usually saves very little bandwidth during transmission, it is most often employed on the decoding side, when the loudspeaker density is not quite sufficient for full-sphere high-order decoding. It is particularly useful for minimizing resolution loss on the popular stacked-ring speaker layouts.
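The 3H2V example can be reproduced with a small sketch. The inclusion rule used here, l - |m| <= V for order l and degree m, is an assumption chosen to match the description and the 15-component example above; it is not quoted from Travis's paper, and the name `hv_components` is hypothetical.

```python
def hv_components(h, v):
    """(l, m) pairs of an #H#V set under the assumed rule l - |m| <= v."""
    return [(l, m)
            for l in range(h + 1)
            for m in range(-l, l + 1)
            if l - abs(m) <= v]

set_3h2v = hv_components(3, 2)

# Components of a full third-order set that 3H2V leaves out.
full_third = [(l, m) for l in range(4) for m in range(-l, l + 1)]
omitted = [c for c in full_third if c not in set_3h2v]
```

With this rule only the third-order component with m = 0 is omitted, which illustrates why the bandwidth saving is small.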

With the introduction of #H#V, it has become impossible to uniquely identify the mixed-order signal sets by the number of component channels.
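A quick count shows such a collision. The sketch below assumes the same selection rules as described in the text (for #H#P: full sphere up to order P plus the outermost components up to order H; for #H#V: the assumed rule l - |m| <= V); the function names are hypothetical.

```python
def count(max_order, keep):
    """Number of (l, m) components up to max_order accepted by `keep`."""
    return sum(1
               for l in range(max_order + 1)
               for m in range(-l, l + 1)
               if keep(l, m))

def hp_count(h, p):
    """Channel count of an #H#P set."""
    return count(h, lambda l, m: l <= p or abs(m) == l)

def hv_count(h, v):
    """Channel count of an #H#V set (assumed rule: l - |m| <= v)."""
    return count(h, lambda l, m: l - abs(m) <= v)

channels_5h1p = hp_count(5, 1)
channels_3h1v = hv_count(3, 1)
```

Under these assumptions both 5H1P and 3H1V come out at 12 channels, so a receiver needs explicit metadata rather than the channel count alone to tell them apart.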

Signals defined for particular subsets of the sphere

A proposed set of basis functions for decoding to a hemispherical speaker layout.

In their basic AmbiX file format proposal, Nachbar et al. (2011) [2] deal with mixed-order signals by zeroing the unneeded components, with the understanding that any lossless compression algorithm used on an AmbiX file prior to transmission will deal with this situation efficiently.

However, they also propose an extended specification which makes use of an adaptor matrix that maps arbitrary data formats (such as legacy Furse-Malham signals) to the desired standardized outputs, and encompasses both #H#P and #H#V mixed-order schemes. Furthermore, it can deal with input that is defined only for a subset of the sphere, such as a hemisphere or an even more constrained "window". Such a reduced input set lacks some components and relies on the adaptor matrix to map it to a fully periphonic output signal set again.

This approach would in theory allow for a more compact transmission format if the part of the sphere relevant for reproduction is known in advance.
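The idea of an adaptor matrix can be illustrated with a minimal sketch. It is not the AmbiX file layout: the function names are hypothetical, the output is assumed to be in ACN order (index = l² + l + m), and all coefficients are simply 1. A real adaptor matrix may also rescale or mix channels, for example when converting from another normalisation.

```python
def acn(l, m):
    """ACN channel index of the component with order l and degree m."""
    return l * l + l + m

def adaptor_matrix(components, out_order):
    """Rows: full periphonic output channels in ACN order.
    Columns: input channels, in the order listed in `components`."""
    n_out = (out_order + 1) ** 2
    matrix = [[0.0] * len(components) for _ in range(n_out)]
    for col, (l, m) in enumerate(components):
        matrix[acn(l, m)][col] = 1.0
    return matrix

def apply_matrix(matrix, frame):
    """Multiply the adaptor matrix with one input sample frame."""
    return [sum(g * s for g, s in zip(row, frame)) for row in matrix]

# Hypothetical 2H1P input: four first-order components plus the two
# outermost second-order ones, six channels in total.
components = [(0, 0), (1, -1), (1, 0), (1, 1), (2, -2), (2, 2)]
matrix = adaptor_matrix(components, out_order=2)
full = apply_matrix(matrix, [1.0, 0.5, 0.25, -0.5, 0.125, -0.125])
```

Here six transmitted channels expand to a nine-channel second-order output in which the three missing components are zero.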

References

  1. Travis, Chris: "A new mixed-order scheme for Ambisonic signals". Ambisonics Symposium, Graz, 2009.
  2. Nachbar, Christian; Zotter, Franz; Deleflie, Etienne; Sontacchi, Alois: "AmbiX - A Suggested Ambisonics Format". Ambisonics Symposium, Lexington (KY), 2011.