# Wave field synthesis

Wave field synthesis (WFS) is a spatial audio rendering technique, characterized by creation of virtual acoustic environments. It produces artificial wavefronts synthesized by a large number of individually driven loudspeakers. Such wavefronts seem to originate from a virtual starting point, the virtual source or notional source. Contrary to traditional spatialization techniques such as stereo or surround sound, the localization of virtual sources in WFS does not depend on or change with the listener's position.

Acoustic space is an acoustic environment in which sound can be heard by an observer. The term "acoustic space" was first mentioned by Marshall McLuhan, a professor and a philosopher.

In physics, a wavefront of a time-varying field is the set (locus) of all points where the wave has the same phase of the sinusoid. The term is generally meaningful only for fields that, at each point, vary sinusoidally in time with a single temporal frequency.

A loudspeaker is an electroacoustic transducer: a device that converts an electrical audio signal into a corresponding sound. The most widely used type of speaker in the 2010s is the dynamic speaker, invented in 1924 by Edward W. Kellogg and Chester W. Rice. The dynamic speaker operates on the same basic principle as a dynamic microphone, but in reverse, to produce sound from an electrical signal. When an alternating-current electrical audio signal is applied to its voice coil, a coil of wire suspended in a circular gap between the poles of a permanent magnet, the coil is forced to move rapidly back and forth by the Lorentz force that the magnetic field exerts on the current-carrying wire. This motion causes a diaphragm attached to the coil to move back and forth, pushing on the air to create sound waves. Besides this most common method, there are several alternative technologies that can be used to convert an electrical signal into sound. The audio signal must be amplified with an audio power amplifier before it is sent to the speaker.

## Physical fundamentals

WFS is based on the Huygens–Fresnel principle, which states that any wavefront can be regarded as a superposition of elementary spherical waves. Therefore, any wavefront can be synthesized from such elementary waves. In practice, a computer controls a large array of individual loudspeakers and actuates each one at exactly the time when the desired virtual wavefront would pass through it.
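The timing rule in the paragraph above can be illustrated with a minimal sketch. It assumes a straight loudspeaker line along the x-axis, a virtual point source behind it, and a speed of sound of 343 m/s; the function name `driving_delays` and all parameters are hypothetical and chosen for illustration only. Amplitude weighting and the filtering used in practical WFS systems are deliberately left out.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def driving_delays(speaker_xs, source_pos):
    """Return one delay in seconds per loudspeaker.

    speaker_xs: x-coordinates (m) of loudspeakers placed on the line y = 0.
    source_pos: (x, y) of the virtual source, with y < 0 (behind the array).

    Each loudspeaker fires when the virtual spherical wavefront would reach
    it, so its delay is its distance to the source divided by the speed of
    sound. Delays are given relative to the loudspeaker nearest the source.
    """
    sx, sy = source_pos
    distances = [math.hypot(x - sx, sy) for x in speaker_xs]
    nearest = min(distances)
    return [(d - nearest) / SPEED_OF_SOUND for d in distances]


if __name__ == "__main__":
    # Three loudspeakers 1 m apart, virtual source 2 m behind the middle one.
    for x, delay in zip([-1.0, 0.0, 1.0], driving_delays([-1.0, 0.0, 1.0], (0.0, -2.0))):
        print(f"speaker at x={x:+.1f} m: delay {delay * 1000:.3f} ms")
```

The loudspeaker closest to the virtual source fires first and the outer ones follow, which is what gives the synthesized wavefront its convex shape.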

The Huygens–Fresnel principle is a method of analysis applied to problems of wave propagation both in the far-field limit and in near-field diffraction and also reflection. It states that every point on a wavefront is itself the source of spherical wavelets. The sum of these spherical wavelets forms the wavefront.

The basic procedure was developed in 1988 by Professor A.J. Berkhout at the Delft University of Technology. [1] Its mathematical basis is the Kirchhoff–Helmholtz integral, which states that the sound pressure is completely determined within a source-free volume if the sound pressure and velocity are determined at all points on its surface.

Delft University of Technology, also known as TU Delft, is the largest and oldest Dutch public technological university, located in Delft, Netherlands. As of 2019, it is ranked among the top 20 universities worldwide for engineering and technology and is the highest-ranked university in the Netherlands.

The Kirchhoff–Helmholtz integral combines the Helmholtz equation with the Kirchhoff integral theorem to produce a method applicable to acoustics, seismology and other disciplines involving wave propagation.

${\displaystyle {\boldsymbol {P}}(w,z)=\iint _{dA}\left(G(w,z\vert z'){\frac {\partial }{\partial n}}P(w,z')-P(w,z'){\frac {\partial }{\partial n}}G(w,z\vert z')\right)dz'}$

Therefore, any sound field can be reconstructed, if sound pressure and acoustic velocity are restored on all points of the surface of its volume. This approach is the underlying principle of holophony.

For reproduction, the entire surface of the volume would have to be covered with closely spaced loudspeakers, each individually driven with its own signal. Moreover, the listening area would have to be anechoic in order to avoid sound reflections that would violate the source-free volume assumption. In practice, this is hardly feasible. Because our acoustic perception is most exact in the horizontal plane, practical approaches generally reduce the problem to a horizontal loudspeaker line, circle or rectangle around the listener.

Reverberation, in psychoacoustics and acoustics, is a persistence of sound after the sound is produced. A reverberation, or reverb, is created when a sound or signal is reflected causing numerous reflections to build up and then decay as the sound is absorbed by the surfaces of objects in the space – which could include furniture, people, and air. This is most noticeable when the sound source stops but the reflections continue, decreasing in amplitude, until they reach zero amplitude.

The origin of the synthesized wavefront can be at any point on the horizontal plane of the loudspeakers. For sources behind the loudspeakers, the array produces convex wavefronts. Sources in front of the loudspeakers can be rendered by concave wavefronts that focus at the virtual source and diverge again. Hence the reproduction inside the volume is incomplete: it breaks down if the listener sits between the loudspeakers and the inner virtual source. The origin represents the virtual acoustic source, which approximates an acoustic source at the same position. Unlike conventional (stereo) reproduction, the perceived position of the virtual sources is independent of listener position, which allows the listener to move and gives an entire audience a consistent perception of audio source location.

A sound field with very stable position of the acoustic sources can be established using wave field synthesis. In principle, it is possible to establish a virtual copy of a genuine sound field indistinguishable from the real sound. Changes of the listener position in the rendition area can produce the same impression as an appropriate change of location in the recording room. Listeners are no longer relegated to a sweet spot area within the room.

The sweet spot is a term used by audiophiles and recording engineers to describe the focal point between two speakers, where an individual is fully capable of hearing the stereo audio mix the way it was intended to be heard by the mixer. The sweet spot is the location which creates an equilateral triangle together with the stereo loudspeakers. In the case of surround sound, this is the focal point between four or more speakers, i.e., the location at which all wave fronts arrive simultaneously. In international recommendations the sweet spot is referred to as "reference listening point".

The Moving Picture Experts Group standardized the object-oriented transmission standard MPEG-4, which allows separate transmission of content (the dry recorded audio signal) and form (the impulse response or the acoustic model). Each virtual acoustic source needs its own (mono) audio channel. The spatial sound field in the recording room consists of the direct wave of the acoustic source and a spatially distributed pattern of mirror acoustic sources caused by reflections from the room surfaces. Reducing that spatial mirror source distribution onto a few transmission channels causes a significant loss of spatial information. This spatial distribution can be synthesized much more accurately on the rendition side.

MPEG-4 is a method of defining compression of audio and visual (AV) digital data. It was introduced in late 1998 and designated a standard for a group of audio and video coding formats and related technology agreed upon by the ISO/IEC Moving Picture Experts Group (MPEG) under the formal standard ISO/IEC 14496 – Coding of audio-visual objects. Uses of MPEG-4 include compression of AV data for web and CD distribution, voice and broadcast television applications. The MPEG-4 standard was developed by a group led by Touradj Ebrahimi and Fernando Pereira.

Compared to conventional channel-oriented rendition procedures, WFS provides a clear advantage: virtual acoustic sources guided by the signal content of the associated channels can be positioned far beyond the conventional material rendition area. This reduces the influence of the listener position, because the relative changes in angles and levels are clearly smaller than with conventional loudspeakers located within the rendition area. This extends the sweet spot considerably; it can now cover nearly the entire rendition area. WFS is thus not only compatible with conventional channel-oriented methods but can potentially improve their reproduction.
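The geometric argument about angles can be checked with a short calculation. The sketch below is only an illustration under assumed positions (a listener who steps 1 m sideways, a nearby source 2 m ahead and a distant virtual source 20 m ahead); the function names are hypothetical.

```python
import math


def source_bearing(listener, source):
    """Bearing of the source seen from the listener, in degrees.

    0 degrees means straight ahead (+y direction); positive is to the right.
    """
    dx = source[0] - listener[0]
    dy = source[1] - listener[1]
    return math.degrees(math.atan2(dx, dy))


def bearing_shift(source, step=1.0):
    """Change in bearing when the listener moves `step` metres sideways
    from the origin along the x-axis."""
    before = source_bearing((0.0, 0.0), source)
    after = source_bearing((step, 0.0), source)
    return abs(after - before)


if __name__ == "__main__":
    print(f"source 2 m ahead:  shift {bearing_shift((0.0, 2.0)):.1f} degrees")
    print(f"source 20 m ahead: shift {bearing_shift((0.0, 20.0)):.1f} degrees")
```

With these assumed positions the nearby source appears to swing by roughly 27 degrees while the distant virtual source moves by about 3 degrees, which is the sense in which a far-away virtual source widens the usable listening area.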

## Challenges

### Sensitivity to room acoustics

Since WFS attempts to simulate the acoustic characteristics of the recording space, the acoustics of the rendition area must be suppressed. One possible solution is to use acoustic damping or otherwise arrange the walls in an absorbing, non-reflective configuration. A second possibility is playback within the near field. For this to work effectively, the loudspeakers must couple very closely at the hearing zone or the diaphragm surface must be very large.

In some cases, the most perceptible difference compared to the original sound field is the reduction of the sound field to two dimensions along the horizontal of the loudspeaker lines. This is particularly noticeable for reproduction of ambience. The suppression of acoustics in the rendition area does not complement playback of natural acoustic ambient sources.

### Aliasing

Spatial aliasing produces undesirable distortions in the form of position-dependent, narrow-band breakdowns in the frequency response within the rendition range. Their frequency depends on the angle of the virtual acoustic source and on the angle of the listener to the loudspeaker arrangement:

${\displaystyle f_{\text{alias}}={\frac {c}{\Delta x\left|\sin \Theta ^{\text{sec}}-\sin \Theta ^{\text{v}}\right|}}}$

For aliasing-free rendition over the entire audio range, an emitter spacing below 2 cm would be necessary. Fortunately, the ear is not particularly sensitive to spatial aliasing, and an emitter spacing of 10–15 cm is generally sufficient. [2]
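The aliasing formula above can be evaluated directly. The following sketch assumes that the two angles in the formula are the listener angle and the virtual source angle mentioned in the text, both given in degrees, and that the speed of sound is 343 m/s; the function name `alias_frequency` and its parameter names are hypothetical.

```python
import math


def alias_frequency(spacing_m, theta_sec_deg, theta_v_deg, c=343.0):
    """Evaluate f_alias = c / (dx * |sin(theta_sec) - sin(theta_v)|) in Hz.

    spacing_m:     loudspeaker spacing dx in metres.
    theta_sec_deg: angle written as theta^sec in the formula (degrees).
    theta_v_deg:   angle written as theta^v in the formula (degrees).
    c:             speed of sound in m/s (assumed default for air).
    """
    diff = abs(math.sin(math.radians(theta_sec_deg))
               - math.sin(math.radians(theta_v_deg)))
    if diff == 0.0:
        # Equal angles: the formula places no finite limit on the frequency.
        return math.inf
    return c / (spacing_m * diff)


if __name__ == "__main__":
    for spacing in (0.02, 0.10, 0.15):
        f = alias_frequency(spacing, 90.0, 0.0)
        print(f"spacing {spacing * 100:.0f} cm: f_alias = {f:.0f} Hz")
```

Because the spacing sits in the denominator, halving the distance between emitters doubles the aliasing frequency for any fixed pair of angles.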

### Truncation effect

Another cause for disturbance of the spherical wavefront is the truncation effect. Because the resulting wavefront is a composite of elementary waves, a sudden change of pressure can occur if no further speakers deliver elementary waves where the speaker row ends. This causes a 'shadow-wave' effect. For virtual acoustic sources placed in front of the loudspeaker arrangement this pressure change hurries ahead of the actual wavefront whereby it becomes clearly audible.

In signal processing terms, this is spectral leakage in the spatial domain and is caused by application of a rectangular function as a window function on what would otherwise be an infinite array of speakers. The shadow wave can be reduced if the volume of the outer loudspeakers is reduced; this corresponds to using a different window function which tapers off instead of being truncated.
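As an illustration of such a tapering window, the sketch below assigns full gain to the inner loudspeakers and a raised-cosine fade to the outer ones. The raised-cosine shape, the function name `tapered_gains` and the `taper_fraction` parameter are assumptions made for this example; the text does not prescribe a particular window.

```python
import math


def tapered_gains(num_speakers, taper_fraction=0.2):
    """Return one gain per loudspeaker for a linear array.

    The inner loudspeakers keep gain 1.0. At each end of the array,
    `taper_fraction` of the loudspeakers fade out with a raised-cosine
    curve instead of ending abruptly (a rectangular window).
    """
    if not 0.0 <= taper_fraction <= 0.5:
        raise ValueError("taper_fraction must be between 0 and 0.5")
    n_taper = int(num_speakers * taper_fraction)
    gains = [1.0] * num_speakers
    for i in range(n_taper):
        # Rises smoothly from near 0 at the edge towards 1 further inside.
        w = 0.5 * (1.0 - math.cos(math.pi * (i + 1) / (n_taper + 1)))
        gains[i] = w
        gains[num_speakers - 1 - i] = w
    return gains


if __name__ == "__main__":
    print([round(g, 2) for g in tapered_gains(10, 0.2)])
```

A wider taper suppresses the shadow wave more strongly but also reduces the part of the array that contributes at full level, so the fraction is a trade-off to be tuned for a given installation.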

### High cost

A further and resultant problem is high cost. A large number of individual transducers must be very close together. Reducing the number of transducers by increasing their spacing introduces spatial aliasing artifacts. Reducing the number of transducers at a given spacing reduces the size of the emitter field and limits the representation range; outside of its borders no virtual acoustic sources can be produced.

## Research and market maturity

Early development of WFS began in 1988 at Delft University.[citation needed] Further work was carried out from January 2001 to June 2003 in the context of the CARROUSO project by the European Union, which included ten institutes.[citation needed] The WFS sound system IOSONO was developed by the Fraunhofer Institute for Digital Media Technology (IDMT) by the Technical University of Ilmenau in 2004.

Loudspeaker arrays implementing WFS have been installed in some cinemas and theatres and in public spaces with good success. The first live WFS transmission took place in July 2008, recreating an organ recital at Cologne Cathedral in lecture hall 104 of the Technical University of Berlin. [3] The room contains the world’s largest speaker system with 2700 loudspeakers on 832 independent channels.

Development of home-audio applications of WFS has only recently begun, for example with the foundation in 2002 of Sonic Emotion, which implements wave field synthesis technology in sound bars for home cinema. [4] [5]

Sonic Emotion [6] also develops a hardware processor, the Sonic Wave I, which has come into use in the entertainment industry for live music and theater and makes it possible to apply the wave field synthesis approach with only a few loudspeakers. The general idea is to limit the rendering to virtual sound sources that are positioned behind the curtain of loudspeakers. With version 5, Sonic Emotion also allows native 3D rendering, that is, rendering with elevation, provided that loudspeakers are positioned at different heights.

Research trends in wave field synthesis include the application of psychoacoustics to reduce the necessary number of loudspeakers, and the reproduction of complicated sound radiation properties so that a virtual grand piano sounds as grand as in real life. [7] [8] [9]

## References

1. Brandenburg, Karlheinz; Brix, Sandra; Sporer, Thomas (2009). 2009 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video. pp. 1–4. doi:10.1109/3DTV.2009.5069680. ISBN 978-1-4244-4317-8.
2. SonicEmotion (6 January 2012). Stereo VS WFS. Retrieved 20 April 2017.
3. SonicEmotion (12 April 2016). Sonic Emotion Absolute 3D sound in a nutshell. Retrieved 20 April 2017.
4. "Sonic Emotion Absolute 3D Sound / professional". Retrieved 11 November 2017.
5. Ziemer, Tim (2018). "Wave Field Synthesis". In Bader, Rolf (ed.). Springer Handbook of Systematic Musicology. Springer Handbooks. Berlin / Heidelberg: Springer. pp. 329–347. doi:10.1007/978-3-662-55004-5_18. ISBN 978-3-662-55004-5.
6. Ziemer, Tim (2017). "Source Width in Music Production. Methods in Stereo, Ambisonics, and Wave Field Synthesis". In Schneider, Albrecht (ed.). Studies in Musical Acoustics and Psychoacoustics. Current Research in Systematic Musicology. 4. Cham: Springer. pp. 299–340. doi:10.1007/978-3-319-47292-8_10. ISBN 978-3-319-47292-8.
7. Ziemer, Tim (2020). Psychoacoustic Music Sound Field Synthesis (PDF). Cham: Springer International Publishing. ISBN 978-3-030-23033-3. Retrieved 19 August 2019.