Not to be confused with audiation.

Auralization is a procedure designed to model and simulate the experience of acoustic phenomena rendered as a soundfield in a virtualized space. This is useful in configuring the soundscape of architectural structures, concert venues, and public spaces, as well as in making coherent sound environments within virtual immersion systems.

A soundscape is the acoustic environment as perceived by humans, in context. The term was originally coined by Michael Southworth, and popularised by R. Murray Schafer. There is a varied history of the use of soundscape depending on discipline, ranging from urban design to wildlife ecology to computer science. An important distinction is to separate soundscape from the broader acoustic environment. The acoustic environment is the combination of all the acoustic resources, natural and artificial, within a given area as modified by the environment. The International Organization for Standardization (ISO) standardized these definitions in 2014 (ISO 12913-1:2014).



The English term auralization was used for the first time by Kleiner et al. in a 1991 article in the journal of the AES.[1]

Established in 1948, the Audio Engineering Society (AES) draws its membership from engineers, scientists, and other individuals with an interest or involvement in the professional audio industry. The membership largely comprises engineers developing devices or products for audio, and persons working in audio content production. It also includes acousticians, audiologists, academics, and those in other disciplines related to audio. The AES is the only worldwide professional society devoted exclusively to audio technology.

Increasing computational power allowed the development of the first acoustic simulation software toward the end of the 1960s.[2]


Auralizations are experienced through systems rendering virtual acoustic models. These are made by convolving acoustic events recorded 'dry' (in an anechoic chamber) with a virtual model of an acoustic space, whose characteristics are determined by measuring its impulse response (IR). Once the IR has been determined, the resulting soundfield in the target environment is simulated by convolution:

Impulse response: a dynamic system's output when presented with a brief input signal

In signal processing, the impulse response, or impulse response function (IRF), of a dynamic system is its output when presented with a brief input signal, called an impulse. More generally, an impulse response is the reaction of any dynamic system to some external change. In both cases, the impulse response describes the reaction of the system as a function of time.
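As a minimal illustration (assuming NumPy and SciPy are available; the filter here is a hypothetical example system, not part of any auralization toolchain), feeding a unit impulse into a simple linear system directly yields its impulse response:

```python
import numpy as np
from scipy.signal import lfilter

# Unit impulse: 1 at t = 0, 0 elsewhere
impulse = np.zeros(8)
impulse[0] = 1.0

# A simple first-order recursive system: y[n] = x[n] + 0.5 * y[n-1]
b, a = [1.0], [1.0, -0.5]

# The system's output for an impulse input IS its impulse response
h = lfilter(b, a, impulse)
print(h)  # geometrically decaying: 1, 0.5, 0.25, ...
```

The same principle underlies room IR measurement: excite the room with an impulse-like signal and record what comes back.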

Convolution: mathematical operation on two functions

In mathematics, convolution is an operation on two functions that produces a third function expressing how the shape of one is modified by the other. The term convolution refers both to the result function and to the process of computing it. It is defined as the integral of the product of the two functions after one is reversed and shifted.
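In symbols, for a source signal x and an impulse response h, the definition just described reads:

```latex
(x * h)(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau
```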

The resulting sound is heard as it would be if emitted in that acoustic space.
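A sketch of this operation, assuming NumPy/SciPy; the signals here are synthetic stand-ins for a real anechoic recording and a measured room IR:

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000  # sample rate in Hz

# Stand-in for a 'dry' (anechoic) recording: a short 440 Hz tone
t = np.arange(fs // 10) / fs
dry = np.sin(2 * np.pi * 440 * t)

# Stand-in for a measured room impulse response: exponentially
# decaying noise, a crude model of reverberant decay
rng = np.random.default_rng(0)
decay = np.exp(-np.arange(fs // 2) / (fs * 0.1))
ir = rng.standard_normal(fs // 2) * decay

# The auralized signal is the convolution of the two; 'full' mode
# keeps the reverberant tail beyond the end of the dry signal
wet = fftconvolve(dry, ir, mode="full")
print(wet.shape)  # (len(dry) + len(ir) - 1,)
```

With a real measured IR in place of the synthetic one, `wet` is what a listener would hear if the dry source played in that room.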


For auralizations to be perceived as realistic, it is critical to emulate human hearing with respect to the position and orientation of the listener's head relative to the sources of sound. For IR data to be convolved convincingly, acoustic events are captured using a dummy head with a microphone positioned at each ear, recording an emulation of the sound arriving at the locations of human ears, or using an ambisonic microphone array whose output is mixed down for binaural playback. Head-related transfer function (HRTF) datasets can simplify the process: a monaural IR is measured or simulated, and audio content is convolved with it to place the sound in the target acoustic space. In rendering the experience, the transfer function corresponding to the orientation of the head is applied to simulate the corresponding spatial emanation of sound.
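For binaural output, the same convolution is simply performed once per ear, with a left and a right head-related impulse response (HRIR). A minimal sketch, with synthetic HRIRs standing in for a measured dataset:

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000
rng = np.random.default_rng(1)

# Mono source signal (stand-in for dry content)
mono = rng.standard_normal(fs // 10)

# Synthetic HRIRs: the right ear gets a slightly delayed,
# attenuated copy, mimicking a source off to the listener's left
hrir_l = np.zeros(256)
hrir_l[0] = 1.0
hrir_r = np.zeros(256)
hrir_r[12] = 0.6  # 12 samples at 48 kHz ~ 0.25 ms interaural delay

left = fftconvolve(mono, hrir_l)
right = fftconvolve(mono, hrir_r)
binaural = np.stack([left, right])  # 2 x N stereo buffer
```

In a real renderer the HRIR pair would be selected (or interpolated) from a measured HRTF dataset according to the head's current orientation.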

Ambisonics: full-sphere surround sound format

Ambisonics is a full-sphere surround sound format: in addition to the horizontal plane, it covers sound sources above and below the listener.
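A first-order (B-format) encoding can be sketched directly from a source's direction: the mono signal is weighted onto four channels W, X, Y, Z according to its azimuth and elevation. Conventions vary between formats; this hypothetical sketch uses the traditional B-format gains with W scaled by 1/sqrt(2):

```python
import numpy as np

def encode_bformat(signal, azimuth, elevation):
    """Encode a mono signal into first-order B-format (W, X, Y, Z).

    azimuth and elevation are in radians; traditional convention
    with the omnidirectional W channel scaled by 1/sqrt(2).
    """
    w = signal / np.sqrt(2)
    x = signal * np.cos(azimuth) * np.cos(elevation)
    y = signal * np.sin(azimuth) * np.cos(elevation)
    z = signal * np.sin(elevation)
    return np.stack([w, x, y, z])

# A source straight ahead (azimuth 0, elevation 0) excites W and X only
s = np.ones(4)
b = encode_bformat(s, 0.0, 0.0)
```

Decoding such a B-format signal for headphones is one way to obtain the binaural mixdown mentioned above.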

Head-related transfer function

A head-related transfer function (HRTF), also sometimes known as the anatomical transfer function (ATF), is a response that characterizes how an ear receives a sound from a point in space. As sound strikes the listener, the size and shape of the head, ears, and ear canal, the density of the head, and the size and shape of the nasal and oral cavities all transform the sound and affect how it is perceived, boosting some frequencies and attenuating others. Generally speaking, the HRTF boosts frequencies from 2–5 kHz, with a primary resonance of +17 dB at 2,700 Hz. But the response curve is more complex than a single bump, affects a broad frequency spectrum, and varies significantly from person to person.


Notes and references

  2. M. Vorländer, Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality.
