3D sound reconstruction

3D sound reconstruction is the application of reconstruction techniques to 3D sound localization technology. These methods of reconstructing three-dimensional sound are used to recreate sounds that match natural environments and provide spatial cues of the sound source. They also see applications in creating 3D visualizations of a sound field that include physical aspects of sound waves such as direction, pressure, and intensity. This technology is used in entertainment to reproduce a live performance through computer speakers, in military applications to determine the location of sound sources, and in medical imaging to measure sound pressure at points in an ultrasound field.[1]

Techniques

To reproduce robust and natural-sounding audio from a three-dimensional audio recording, sound localization and reverberation reconstruction techniques are used. These techniques process sound to reproduce the spatial cues.

  1. The location of a sound source is determined through three-dimensional sound localization, using microphone arrays, binaural hearing methods, and the head-related transfer function (HRTF).
  2. After the direction is identified, further signal processing measures the impulse response over time to determine the intensity components in different directions. Combining the intensity of the sound with its direction yields a three-dimensional sound field, from which the physical qualities that produce the observed changes in intensity are reconstructed, as sketched below.
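
The two steps can be made concrete with a minimal sketch. The Python fragment below is illustrative only: it assumes a two-microphone far-field geometry and plain cross-correlation for the time difference of arrival, not the specific arrays or HRTF processing used in the cited systems.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

def estimate_azimuth(mic_a, mic_b, fs, spacing):
    """Step 1: estimate the arrival angle of a source from the time
    difference of arrival (TDOA) between two microphone signals."""
    corr = np.correlate(mic_a, mic_b, mode="full")
    lag = np.argmax(corr) - (len(mic_b) - 1)   # delay in samples
    tdoa = lag / fs                            # delay in seconds
    # Far-field model: sin(theta) = c * tdoa / spacing
    sin_theta = np.clip(SPEED_OF_SOUND * tdoa / spacing, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))

def frame_intensities(signal, frame_len=1024):
    """Step 2: mean-square pressure per time frame, the intensity
    component later paired with a direction for each frame."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    return (frames ** 2).mean(axis=1)
```

Pairing the per-frame azimuth with the per-frame intensity gives (direction, intensity) samples of the sound field, the raw material for the reconstruction.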

As a result of this two-step process, the reconstructed three-dimensional sound field contains information not only on the localization of the sound source, but also on the physical aspects of the environment of the original signal source. This distinguishes it from the output of the sound localization process alone.

After the sound is reconstructed and the spatial cues are available, they need to be delivered to the listener. The methods for doing so are described below.

Listening room

Loudspeaker locations from the ITU-R recommendation

In the listening room method, the listener receives the sound either through headphones or through loudspeakers. Headphones introduce enough sound sources for a listener to experience 3D sound with directionality. With loudspeakers, the placement and number of loudspeakers affect the depth of reproduction. There are various methods for selecting the speaker locations. A simple model consists of five speakers placed in the ITU-R recommended formation: center, 30° to the left, 110° to the left, 30° to the right, and 110° to the right. This setup is used with several three-dimensional sound systems and reconstruction techniques.[2] As an alternative, the head-related transfer function can be applied to the sound source signal and its convolution panned to each loudspeaker depending on the speaker's direction and location. This allows the signal energy for each speaker to be calculated by evaluating the sound at several control points within the listening room.[3]
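
As a rough illustration of steering a source to the ITU-R layout, the sketch below distributes a source between its two nearest loudspeakers with a constant-power tangent panning law. This is a simple stand-in for the HRTF convolution and control-point evaluation described above; the angle convention (positive = left) and the gain law are assumptions.

```python
import numpy as np

# ITU-R five-speaker azimuths in degrees (0 = center, positive = left)
SPEAKER_ANGLES = np.array([-110.0, -30.0, 0.0, 30.0, 110.0])

def pan_gains(source_deg):
    """Constant-power tangent-law panning between the two loudspeakers
    adjacent to the source azimuth; all other speakers stay silent."""
    source_deg = float(np.clip(source_deg, SPEAKER_ANGLES[0], SPEAKER_ANGLES[-1]))
    idx = int(np.clip(np.searchsorted(SPEAKER_ANGLES, source_deg),
                      1, len(SPEAKER_ANGLES) - 1))
    lo, hi = SPEAKER_ANGLES[idx - 1], SPEAKER_ANGLES[idx]
    bisector, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    # Tangent law relative to the pair's bisector, normalized for equal power
    t = np.tan(np.radians(source_deg - bisector)) / np.tan(np.radians(half))
    norm = np.sqrt(2.0 * (1.0 + t * t))
    gains = np.zeros(len(SPEAKER_ANGLES))
    gains[idx], gains[idx - 1] = (1.0 + t) / norm, (1.0 - t) / norm
    return gains

# e.g. a source 70 degrees to the left lands between the 30 and 110 degree
# speakers with equal gains: pan_gains(70.0)
```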

Reverberation reconstruction

3D sound system with reverberation reconstruction flowchart

Reverberation reconstruction involves measuring the sound with a four-point microphone array to capture its real arrival delays at different locations. Each microphone measures an impulse response from a time-stretched pulse signal over various time frames and with various sound sources. The obtained data are applied to the five-speaker three-dimensional sound system, as in the listening room technique. The system convolves the head-related transfer function with the impulse response recorded by the microphones, adjusts the energy to the original time frame of the sound signal, and adds a delay so that the sound matches the time frame of the impulse response. The convolutions and delays are applied to all of the recorded sound source data and summed to form the resulting signal.
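
A minimal sketch of the convolve-scale-delay-sum step for one loudspeaker channel follows. It assumes the per-frame impulse responses, HRTFs, gains, and delays have already been measured; the function and variable names are hypothetical, and the cited system's energy adjustment is reduced here to a simple gain.

```python
import numpy as np
from scipy.signal import fftconvolve

def reconstruct_channel(frames, irs, hrtfs, delays_s, gains, fs, out_len):
    """Convolve each time frame with its measured impulse response and the
    HRTF for its direction, scale it, delay it to its original time frame,
    and sum all contributions into one loudspeaker signal."""
    out = np.zeros(out_len)
    for x, ir, hrtf, delay, g in zip(frames, irs, hrtfs, delays_s, gains):
        contrib = g * fftconvolve(fftconvolve(x, ir), hrtf)
        start = int(round(delay * fs))          # align to the original frame
        stop = min(start + len(contrib), out_len)
        out[start:stop] += contrib[: stop - start]
    return out
```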

This technique improves the directionality, naturalness, and clarity of the reconstructed sound with respect to the original. A drawback is its assumption of a single sound source, whereas real-life reverberation contains many overlapping sounds; as a result, summing all of the contributions does not improve the listener's perception of the size of the room, and the perception of distance is likewise not improved.[3]

Laser projections

Sound waves cause changes in air density, which in turn produce sound pressure changes. These changes are measured and then processed with tomographic signal processing to reconstruct the sound field. The measurements can be made from laser projections, eliminating the need for multiple microphones to determine separate impulse responses. The projections use a laser Doppler vibrometer to measure variations in the refractive index of the medium along the laser path.[1] These measurements are processed by tomographic reconstruction to reproduce the three-dimensional sound field, and convolution back projection is then used to visualize it.
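
The tomography step can be sketched with scikit-image, using the Radon transform as a stand-in for the laser line integrals; the Gaussian test field and the 180 projection angles are arbitrary choices, not the cited measurement setup.

```python
import numpy as np
from skimage.transform import radon, iradon

# Synthetic 2-D pressure cross-section standing in for the real field:
# a single Gaussian pressure peak.
n = 128
y, x = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
field = np.exp(-((x - 0.2) ** 2 + y ** 2) / 0.02)

# Each laser path integrates the refractive-index change along a line;
# the Radon transform produces the same kind of projection data.
angles = np.linspace(0.0, 180.0, 180, endpoint=False)
projections = radon(field, theta=angles)

# Convolution (filtered) back projection recovers the cross-section.
reconstruction = iradon(projections, theta=angles, filter_name="ramp")
```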

Near-field acoustical holography

In near-field acoustical holography, light refraction is measured over a two-dimensional area in the medium (this two-dimensional sound field is a cross section of the three-dimensional sound field) to produce a hologram. The wave number of the medium is then estimated through analysis of the water temperature. Multiple two-dimensional sound fields are calculated, from which the three-dimensional sound field is reconstructed.

This method applies primarily to ultrasound and to lower sound pressures, often in water and in medical imaging. It works under the assumption that the wave number of the medium is constant; if the wave number varies throughout the medium, the method cannot reconstruct the three-dimensional sound field as accurately.[4]
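
The constant-wave-number assumption can be illustrated with a planar angular-spectrum sketch: a measured 2-D hologram is propagated to parallel planes with a single wave number k, and stacking the planes rebuilds the 3-D field. The planar geometry and variable names are illustrative, not the optical setup of the cited work.

```python
import numpy as np

def propagate_plane(hologram, dx, k, dz):
    """Propagate a 2-D complex pressure hologram a distance dz using the
    angular-spectrum method with one constant wave number k. Components
    with kx^2 + ky^2 > k^2 are evanescent and decay exponentially."""
    ny, nx = hologram.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dx)
    kxx, kyy = np.meshgrid(kx, ky)
    # Imaginary kz for evanescent components makes them decay with dz > 0
    kz = np.sqrt((k ** 2 - kxx ** 2 - kyy ** 2).astype(complex))
    return np.fft.ifft2(np.fft.fft2(hologram) * np.exp(1j * kz * dz))

# Stacking several propagated planes rebuilds the 3-D sound field:
# volume = np.stack([propagate_plane(h, dx, k, z) for z in z_planes])
```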

References

  1. Oikawa; Goto; Ikeda; Takizawa; Yamasaki (2005). "Sound Field Measurements Based on Reconstruction from Laser Projections". Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. Vol. 4. pp. iv/661–iv/664. doi:10.1109/ICASSP.2005.1416095. ISBN 978-0-7803-8874-1. S2CID 15044296.
  2. Kim; Jee; Park; Yoon; Choi (2004). "The real-time implementation of 3D sound system using DSP". IEEE 60th Vehicular Technology Conference, 2004. VTC2004-Fall. 2004. Vol. 7. pp. 4798–480. doi:10.1109/VETECF.2004.1405005. ISBN 978-0-7803-8521-4. S2CID 9906064.
  3. Tanno; Saiji; Huang (2013). "A new 5-loudspeaker 3D sound system with a reverberation reconstruction method". 2013 International Joint Conference on Awareness Science and Technology & Ubi-Media Computing (ICAST 2013 & UMEDIA 2013). pp. 174–179. doi:10.1109/ICAwST.2013.6765429. ISBN 978-1-4799-2364-9. S2CID 11582154.
  4. Ohbuchi; Mizutani; Wakatsuki; Nishimiya; Masuyama (2009). "Reconstruction of Three-Dimensional Sound Field from Two-Dimensional Sound Field Using Optical Computerized Tomography and Near-Field Acoustical Holography". Japanese Journal of Applied Physics. 48 (7): 07. Bibcode:2009JaJAP..48gGC03O. doi:10.1143/JJAP.48.07GC03. S2CID 119815337.