3D sound synthesis


3D sound is most commonly defined as the everyday human experience of sound: sounds arrive at the ears from every direction and from varying distances, and together they form the three-dimensional aural image a listener perceives. Scientists and engineers who work with 3D sound aim to synthesize this real-world complexity accurately.


Purpose

Because 3D sound is part of daily life and 3D sound localization is widely used, 3D sound synthesis has grown in popularity in fields such as games, home theater, and human aid systems. The purpose of 3D sound synthesis is to interpret the information gathered from 3D sound in a way that allows the data to be studied and applied.

Applications

One application of 3D sound synthesis is creating a sense of presence in virtual environments by producing more realistic surroundings and sensations in games, teleconferencing systems, and tele-ensemble systems. 3D sound can also help people with sensory impairments, such as the visually impaired, by acting as a substitute for other sensory feedback.

The synthesized 3D sound may encode the location of a source in three-dimensional space, as well as the three-dimensional sound radiation characteristics of that source. [1]

Problem statement and basics

The three main problems in 3D sound synthesis are front-to-back reversals, intracranially heard sounds, and HRTF measurements.

Front-to-back reversals occur when a sound located behind the listener is heard as coming from directly in front, and vice versa. The problem can be reduced by accurately including the subject's head movement and pinna response; when these two cues are omitted from the HRTF calculation, reversals occur. Another solution is the early echo response, which exaggerates the differences between sounds arriving from different directions and strengthens the pinna effects, reducing the front-to-back reversal rate. [2] [3]

Intracranially heard sounds are sounds that are meant to be perceived as external but seem to originate inside the listener's head. This can be resolved by adding reverberation cues.

HRTF measurement problems include measurement noise and non-linearity. By using several primary auditory cues with a subject who is skilled in localization, an effective HRTF can be generated for most cases.

Methods

The three main methods used in 3D sound synthesis are the head-related transfer function, sound rendering, and synthesizing 3D sound with speaker location.

Figure: Synthesis structure combining PCA and BMT

Head-related transfer function (HRTF) is a linear function based on the sound source position and considers other information humans use to localize the sounds, such as the interaural time difference, head shadow, pinna response, shoulder echo, head motion, early echo response, reverberation, and vision.

The system attempts to model the human acoustic system by using an array of microphones to record sounds at the human ears, which allows more accurate synthesis of 3D sounds. The HRTF is obtained by comparing these recordings to the original sounds. The HRTF is then used to develop pairs of finite impulse response (FIR) filters for specific sound positions, with two filters per position, one for the left ear and one for the right. To place a sound at a given position in 3D space, the pair of FIR filters corresponding to that position is applied to the incoming sound, yielding a spatial sound. [4] The computation involved in convolving the sound signal for a particular point in space is typically heavy, so considerable work has gone into reducing the complexity. One approach combines Principal Component Analysis (PCA) and Balanced Model Truncation (BMT): PCA, a widely used method in data mining and data reduction, is applied before the BMT to reduce redundancy, and the BMT is then applied to lower the computational complexity.
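
As a concrete illustration of the FIR-filter step, the minimal sketch below convolves a mono signal with a hypothetical left/right impulse-response pair for one source position. The filter coefficients here are placeholders standing in for measured HRTF data, and the signal is random noise; only the convolution structure is meant to reflect the method described above.

```python
# Minimal sketch of HRTF-based spatialization. The HRIRs below are
# illustrative placeholders, not measured head-related impulse responses.
import numpy as np
from scipy.signal import fftconvolve

fs = 44100                                  # sample rate in Hz
mono = np.random.randn(fs)                  # one second of a stand-in source signal

# Hypothetical 128-tap FIR filters for the desired position (one per ear).
hrir_left = np.random.randn(128) * np.hanning(128)
hrir_right = np.random.randn(128) * np.hanning(128)

# Convolving the source with each ear's filter places it at that position.
left = fftconvolve(mono, hrir_left)
right = fftconvolve(mono, hrir_right)
binaural = np.stack([left, right], axis=1)  # two-channel output for headphones
```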

Sound rendering

The method of sound rendering creates a sound world by attaching a characteristic sound to each object in the scene and synthesizing it as 3D sound. The sound sources can be obtained either by sampling or by artificial methods. The method has two distinct passes. The first pass computes the propagation paths from each object to the microphone, and the result is a set of geometric transformations of the sound source, controlled by delay and attenuation. The second pass creates the final soundtrack once the sound objects have been created, modulated, and summed. [5]
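
The sketch below illustrates this two-pass structure under simplifying assumptions: only direct, straight-line paths are considered, attenuation is a simple inverse-distance gain, and the object sounds and distances are placeholders. It is not the rendering pipeline of [5], only the delay-and-attenuate-then-sum idea.

```python
# Minimal two-pass sketch: pass 1 turns each object's path to the microphone
# into a delay and an attenuation; pass 2 applies them and sums the results.
import numpy as np

fs = 44100
c = 343.0                                   # speed of sound in m/s

objects = [                                 # (characteristic sound, distance to microphone in m)
    (np.random.randn(fs), 2.0),
    (np.random.randn(fs), 10.0),
]

# Pass 1: geometric transformation of each source -> (delay in samples, gain).
paths = [(int(fs * d / c), 1.0 / max(d, 1.0)) for _, d in objects]

# Pass 2: delay, attenuate, and sum every object's sound into the final track.
length = max(len(s) + delay for (s, _), (delay, _) in zip(objects, paths))
track = np.zeros(length)
for (sound, _), (delay, gain) in zip(objects, paths):
    track[delay:delay + len(sound)] += gain * sound
```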

The rendering method, which is simpler than HRTF generation, exploits the similarity between light and sound waves: sound in space propagates in all directions, and sound waves reflect and refract much like light. The final sound heard is the integral of the multi-path transmitted signals.

The processing procedure has four steps. The first step generates the characteristic sound of each object. In the second step, the sound is created and attached to the moving objects. The third step calculates the convolutions, which account for the effect of reverberation; sound rendering approximates this by noting that an object diffuses reflections of sound whose wavelength is similar to the object's size, which smooths the sound. The last step applies the calculated convolutions to the sound sources from step two. These steps allow a simplified soundtracking algorithm to be used without a noticeable difference in the result.
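
As a rough illustration of steps three and four, the sketch below stands in for the calculated convolution with a short, exponentially decaying noise kernel, a common crude approximation of diffuse reverberation. A real sound-rendering system would derive the convolution kernels from the scene geometry rather than use this placeholder.

```python
# Steps 3-4 sketch: build a smoothing/diffusion kernel as a stand-in for
# reverberation, then apply it to the sound attached to an object.
import numpy as np
from scipy.signal import fftconvolve

fs = 44100
dry = np.random.randn(fs)                      # sound attached to a moving object (placeholder)

# Step 3: short, exponentially decaying noise tail (about 300 ms).
t = np.arange(int(0.3 * fs)) / fs
kernel = np.random.randn(t.size) * np.exp(-t / 0.1)
kernel /= np.sqrt(np.sum(kernel ** 2))         # normalize the kernel's energy

# Step 4: apply the calculated convolution to the source from step two.
wet = fftconvolve(dry, kernel)
```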

Synthesizing 3D sound with speaker location

Figure: Sound field reproduction system

This method simulates spatial sound by strategically placing eight loudspeakers, rather than attaching sampled sounds to objects. [6] First, the sound is captured by a cubic microphone array in the original sound field. It is then reproduced by a cubic loudspeaker array in the reproduction sound field. A listener inside the loudspeaker array perceives the sound as moving above their head when the sound moves above the microphone array. [6]
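
For intuition only, the sketch below drives eight loudspeakers at the corners of a unit cube with simple inverse-distance gains for a virtual source position. This generic nearest-corner weighting is an assumption for illustration and is not the reproduction method evaluated in [6].

```python
# Illustrative only: inverse-distance gains for eight cube-corner speakers.
import itertools
import numpy as np

corners = np.array(list(itertools.product([0.0, 1.0], repeat=3)))  # 8 speaker positions
source = np.array([0.3, 0.8, 0.9])            # virtual source inside the cube (example)

dist = np.linalg.norm(corners - source, axis=1)
gains = 1.0 / np.maximum(dist, 1e-3)
gains /= gains.sum()                          # normalize so the total gain is one

for pos, g in zip(corners, gains):
    print(pos, round(float(g), 3))
```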

Wave field synthesis is a spatial audio rendering technique that synthesizes wavefronts using the Huygens–Fresnel principle. The original sound is first recorded by microphone arrays, and loudspeaker arrays are then used to reproduce the sound in the listening area. The arrays are placed along the boundaries of their respective areas. This technique allows multiple listeners to move within the listening area and still hear the same sound from all directions, which binaural and crosstalk-cancellation techniques cannot achieve. Sound reproduction systems using wave field synthesis generally place the loudspeakers in a line or around the listener in a 2D plane.
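
The simplified sketch below conveys the Huygens-style idea for a line of loudspeakers: each speaker emits a delayed, attenuated copy of the virtual source's signal according to its distance from the virtual source. A real wave field synthesis driving function also includes a spectral pre-filter and amplitude corrections that are omitted here, and all positions and signals are placeholders.

```python
# Simplified WFS-style sketch: per-speaker delay and 1/r gain for a line array.
import numpy as np

fs = 48000
c = 343.0
speakers_x = np.linspace(-2.0, 2.0, 16)          # 16 speakers along the line y = 0
virtual_source = np.array([0.5, -1.5])           # virtual source 1.5 m behind the array

signal = np.random.randn(fs)                     # stand-in source signal
dists = np.hypot(speakers_x - virtual_source[0], -virtual_source[1])
delays = np.round(fs * dists / c).astype(int)    # per-speaker delay in samples
gains = 1.0 / np.maximum(dists, 0.1)             # rough 1/r amplitude weighting

length = len(signal) + delays.max()
feeds = np.zeros((len(speakers_x), length))      # one loudspeaker feed per row
for i, (d, g) in enumerate(zip(delays, gains)):
    feeds[i, d:d + len(signal)] = g * signal
```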

Related Research Articles

Binaural recording

Binaural recording is a method of recording sound that uses two microphones, arranged with the intent to create a 3-D stereo sound sensation for the listener of actually being in the room with the performers or instruments. This effect is often created using a technique known as dummy head recording, wherein a mannequin head is fitted with a microphone in each ear. Binaural recording is intended for replay using headphones and will not translate properly over stereo speakers. This idea of a three-dimensional or "internal" form of sound has also led to useful technological advances, such as stethoscopes creating "in-head" acoustics and IMAX movies creating a three-dimensional acoustic experience.

Head-related transfer function

A head-related transfer function (HRTF), also known as a head shadow, is a response that characterizes how an ear receives a sound from a point in space. As sound strikes the listener, the size and shape of the head, ears, ear canal, density of the head, size and shape of nasal and oral cavities, all transform the sound and affect how it is perceived, boosting some frequencies and attenuating others. Generally speaking, the HRTF boosts frequencies from 2–5 kHz with a primary resonance of +17 dB at 2,700 Hz. But the response curve is more complex than a single bump, affects a broad frequency spectrum, and varies significantly from person to person.

Ambisonics

Ambisonics is a full-sphere surround sound format: in addition to the horizontal plane, it covers sound sources above and below the listener.

Surround sound

Surround sound is a technique for enriching the fidelity and depth of sound reproduction by using multiple audio channels from speakers that surround the listener. Its first application was in movie theaters. Prior to surround sound, theater sound systems commonly had three screen channels of sound that played from three loudspeakers located in front of the audience. Surround sound adds one or more channels from loudspeakers to the side or behind the listener that are able to create the sensation of sound coming from any horizontal direction around the listener.

Acoustical engineering

Acoustical engineering is the branch of engineering dealing with sound and vibration. It includes the application of acoustics, the science of sound and vibration, in technology. Acoustical engineers are typically concerned with the design, analysis and control of sound.


3D audio effects are a group of sound effects that manipulate the sound produced by stereo speakers, surround-sound speakers, speaker-arrays, or headphones. This frequently involves the virtual placement of sound sources anywhere in three-dimensional space, including behind, above or below the listener.

Sound localization is a listener's ability to identify the location or origin of a detected sound in direction and distance.

Wave field synthesis

Wave field synthesis (WFS) is a spatial audio rendering technique, characterized by the creation of virtual acoustic environments. It produces artificial wavefronts synthesized from elementary waves by a large number of individually driven loudspeakers. Such wavefronts seem to originate from a virtual starting point, the virtual sound source. Unlike traditional phantom sound sources, the localization of virtual sound sources established by WFS does not depend on the listener's position. Like a genuine sound source, the virtual source remains at a fixed starting point.

Acoustic location

Acoustic location is a method of determining the position of an object or sound source by using sound waves. Location can take place in gases, liquids, and in solids.

Ambisonic decoding concerns the decoding of classic first-order Ambisonics for playback; related information is covered under Ambisonic reproduction systems.

Ambiophonics is a method in the public domain that employs digital signal processing (DSP) and two loudspeakers directly in front of the listener in order to improve reproduction of stereophonic and 5.1 surround sound for music, movies, and games in home theaters, gaming PCs, workstations, or studio monitoring applications. First implemented using mechanical means in 1986, today a number of hardware and VST plug-in makers offer Ambiophonic DSP. Ambiophonics eliminates crosstalk inherent in the conventional stereo triangle speaker placement, and thereby generates a speaker-binaural soundfield that emulates headphone-binaural sound, and creates for the listener improved perception of reality of recorded auditory scenes. A second speaker pair can be added in back in order to enable 360° surround sound reproduction. Additional surround speakers may be used for hall ambience, including height, if desired.

LARES is an electronic sound enhancement system that uses microprocessors to control multiple loudspeakers and microphones placed around a performance space for the purpose of providing active acoustic treatment. LARES was invented in Massachusetts in 1988 by Dr David Griesinger and Steve Barbar, who were working at Lexicon, Inc. LARES was given its own company division in 1990, and LARES Associates was formed in 1995 as a separate corporation. Since then, hundreds of LARES systems have been used in concert halls, opera houses, performance venues, and houses of worship, from outdoor music festivals to permanent indoor symphony halls.

A parabolic loudspeaker is a loudspeaker which seeks to focus its sound in coherent plane waves either by reflecting sound output from a speaker driver to a parabolic reflector aimed at the target audience, or by arraying drivers on a parabolic surface. The resulting beam of sound travels farther, with less dissipation in air, than horn loudspeakers, and can be more focused than line array loudspeakers allowing sound to be sent to isolated audience targets. The parabolic loudspeaker has been used for such diverse purposes as directing sound at faraway targets in performing arts centers and stadia, for industrial testing, for intimate listening at museum exhibits, and as a sonic weapon.

The sweet spot is a term used by audiophiles and recording engineers to describe the focal point between two speakers, where an individual is fully capable of hearing the stereo audio mix the way it was intended to be heard by the mixer. The sweet spot is the location which creates an equilateral triangle together with the stereo loudspeakers, the stereo triangle. In the case of surround sound, this is the focal point between four or more speakers, i.e., the location at which all wave fronts arrive simultaneously. In international recommendations the sweet spot is referred to as reference listening point.

Auralization is a procedure designed to model and simulate the experience of acoustic phenomena rendered as a soundfield in a virtualized space. This is useful in configuring the soundscape of architectural structures, concert venues, and public spaces, as well as in making coherent sound environments within virtual immersion systems.

3D sound localization refers to an acoustic technology that is used to locate the source of a sound in a three-dimensional space. The source location is usually determined by the direction of the incoming sound waves and the distance between the source and sensors. It involves the structure arrangement design of the sensors and signal processing techniques.

Perceptual-based 3D sound localization is the application of knowledge of the human auditory system to develop 3D sound localization technology.

3D sound reconstruction is the application of reconstruction techniques to 3D sound localization technology. These methods of reconstructing three-dimensional sound are used to recreate sounds to match natural environments and provide spatial cues of the sound source. They also see applications in creating 3D visualizations on a sound field to include physical aspects of sound waves including direction, pressure, and intensity. This technology is used in entertainment to reproduce a live performance through computer speakers. The technology is also used in military applications to determine location of sound sources. Reconstructing sound fields is also applicable to medical imaging to measure points in ultrasound.

Apparent source width (ASW) is the audible impression of a spatially extended sound source. This psychoacoustic impression results from the sound radiation characteristics of the source and the properties of the acoustic space into which it is radiating. Wide source widths are desired by listeners of music because these are associated with the sound of acoustic music, opera, classical music, and historically informed performance. Research concerning ASW comes from the field of room acoustics, architectural acoustics and auralization, as well as musical acoustics, psychoacoustics and systematic musicology.

References

  1. Ziemer, Tim (2020). Psychoacoustic Music Sound Field Synthesis. Current Research in Systematic Musicology. Vol. 7. Cham: Springer. p. 287. doi:10.1007/978-3-030-23033-3. ISBN 978-3-030-23033-3. S2CID 201136171.
  2. Burgess, David A. (1992). "Techniques for low cost spatial audio". Proceedings of the 5th Annual ACM Symposium on User Interface Software and Technology. pp. 53–59. CiteSeerX 10.1.1.464.4403. doi:10.1145/142621.142628. ISBN 978-0897915496. S2CID 7413673.
  3. Zhang, Ming; Tan, Kah-Chye; Er, M. H. (1998). "A refined algorithm of 3-D sound synthesis". ICSP '98. 1998 Fourth International Conference on Signal Processing (Cat. No.98TH8344). Vol. 2. pp. 1408–1411. doi:10.1109/ICOSP.1998.770884. ISBN 978-0-7803-4325-2. S2CID 57484436.
  4. Tonnesen, Cindy; Steinmetz, Joe. "3D Sound Synthesis".
  5. Takala, Tapio; Hahn, James (1992). "Sound rendering". Proceedings of the 19th Annual Conference on Computer Graphics and Interactive Techniques. Vol. 26. pp. 211–220. doi:10.1145/133994.134063. ISBN 978-0897914796. S2CID 6252100.
  6. Naoe, M.; Kimura, T.; Yamakata, Y.; Katsumoto, M. (2008). "Performance Evaluation of 3D Sound Field Reproduction System Using a Few Loudspeakers and Wave Field Synthesis". 2008 Second International Symposium on Universal Communication. pp. 36–41. doi:10.1109/ISUC.2008.35. ISBN 978-0-7695-3433-6. S2CID 16506730.