Binaural unmasking

Binaural unmasking is a phenomenon of auditory perception discovered by Ira Hirsh. [1] In binaural unmasking, the brain combines information from the two ears in order to improve signal detection and identification in noise. The phenomenon is most commonly observed when the interaural phase of the signal differs from the interaural phase of the noise. When such a difference is present, the detection threshold is lower than in a reference situation in which the interaural phases are the same, or in which the stimulus is presented monaurally; those two reference cases usually give very similar thresholds. The size of the improvement is known as the "binaural masking level difference" (BMLD), or simply as the "masking level difference".

Binaural unmasking is most effective at low frequencies. The BMLD for pure tones in broadband noise reaches a maximum value of about 15 decibels (dB) at 250 Hz and progressively declines to 2–3 dB at 1500 Hz. The BMLD then stabilises at 2–3 dB for all higher frequencies, up to at least 4 kHz. [2] Binaural unmasking can also be observed for narrowband masking noises, but the effect behaves differently: larger BMLDs can be observed and there is little evidence of a decline with increasing frequency. [3]

Improved identification of speech in noise was first reported by J.C.R. Licklider. [4] Licklider noted that the difference in interaural phase that produced the unmasking is closely related to the interaural time difference, which varies with the direction of a sound source and is involved in sound localisation. The fact that speech can be unmasked, and that the underlying cues vary with sound direction, raised the possibility that binaural unmasking plays a role in the cocktail party effect.

Labelling system

A systematic labelling system for different stimulus configurations, first used by Jeffress, [5] has been adopted by most authors in the area. The condition names are written NxSy, where x is the interaural configuration of the noise and y is the interaural configuration of the signal. Some common values for x and y include:

0: the component is the same at the two ears (zero interaural phase difference)
π: the component is inverted at one ear relative to the other (interaural phase difference of π radians)
m: the component is presented to one ear only (monaural)
u: the component is uncorrelated at the two ears

For example, N0S0 denotes identical noise and signal at the two ears, while N0Sπ denotes identical noise with the signal inverted at one ear.
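
The following minimal sketch (an illustration, not from the article; the sample rate, duration, and tone level are assumed values) constructs the N0S0 and N0Sπ conditions in Python:

```python
import numpy as np

fs = 44100                     # assumed sample rate (Hz)
t = np.arange(0, 0.5, 1 / fs)  # 500 ms stimulus

noise = np.random.default_rng(0).standard_normal(t.size)  # masking noise
signal = 0.1 * np.sin(2 * np.pi * 500 * t)                # 500 Hz tone

# N0S0: noise and signal identical at the two ears (diotic reference).
left_n0s0 = noise + signal
right_n0s0 = noise + signal

# N0Spi: same noise at the two ears, signal inverted at one ear.
left_n0spi = noise + signal
right_n0spi = noise - signal
```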

Theories

Binaural unmasking has two main explanatory frameworks. These are based on interaural cross-correlation [6] and interaural subtraction. [7]

The cross-correlation account relies on the existence of a coincidence detection network in the midbrain similar to that proposed by Lloyd A. Jeffress [8] to account for sensitivity to interaural time differences in sound localization. Each coincidence detector receives a stream of action potentials from the two ears via a network of axons that introduce differential transmission delays. A signal is thought to be detected when its presence reduces the response rate of the most active coincidence detector. Cross-correlation of the signals at the two ears is often used as a mathematical surrogate for modelling such an array of coincidence-detecting neurons; the reduced response rate translates into a reduction in the cross-correlation maximum.
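
As a toy illustration of this idea (an assumed setup, not Colburn's published model), the sketch below shows that adding an antiphasic signal to diotic noise reduces the normalized interaural correlation, the statistic such models monitor:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 44100                                  # assumed sample rate (Hz)
t = np.arange(0, 0.5, 1 / fs)
noise = rng.standard_normal(t.size)         # diotic masking noise
signal = 0.3 * np.sin(2 * np.pi * 500 * t)  # 500 Hz target tone

def interaural_correlation(left, right):
    """Normalized cross-correlation of the ear signals at zero lag."""
    return np.dot(left, right) / np.sqrt(np.dot(left, left) * np.dot(right, right))

print(interaural_correlation(noise + signal, noise + signal))  # N0S0: exactly 1
print(interaural_correlation(noise + signal, noise - signal))  # N0Spi: below 1
```

A full model would apply this statistic after cochlear filtering and across a range of internal delays; the zero-lag value is enough here to show the decorrelation produced by the antiphasic signal.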

The subtractive account is known as "equalization-cancellation" or "EC" theory. In this account, the waveforms at the two ears (or their internal representations) are temporally aligned (equalized) by the brain before being subtracted one from the other. In effect, the coincidence detectors are replaced with neurons that are excited by action potentials from one ear but inhibited by action potentials from the other. However, EC theory is not generally framed in such explicit neurological terms, and no suitable neural substrate has been identified in the brain. Nonetheless, EC theory has proved a very popular modelling framework, and has fared well in direct comparison with cross-correlation models in psychoacoustic experiments. [9]
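
A minimal sketch of the cancellation step, assuming an N0Sπ stimulus and omitting the internal time and level errors that full EC models include:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 44100                                  # assumed sample rate (Hz)
t = np.arange(0, 0.5, 1 / fs)
noise = rng.standard_normal(t.size)
signal = 0.1 * np.sin(2 * np.pi * 500 * t)  # faint 500 Hz target

left = noise + signal   # N0Spi: identical noise, signal inverted at one ear
right = noise - signal

# Equalization: the ear signals already match in time and level here, so
# no internal delay or gain is needed. Cancellation: subtract the ears.
residue = left - right  # the common noise cancels, leaving 2 * signal

print(np.allclose(residue, 2 * signal))  # True: the noise is removed
```

For N0S0 the same subtraction removes the signal along with the noise, which is why EC models predict no advantage in that reference condition.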

Perceptual cues

The ear filters incoming sound into different frequencies: a given place in the cochlea, and a given auditory nerve fibre, respond only to a limited range of frequencies. Consequently, researchers have examined the cues that are generated by mixtures of signal and noise at the two ears within a narrow frequency band around the signal frequency. When a signal and narrowband noise are added, a vector summation occurs in which the resultant amplitude and phase differ from those of the noise or signal alone. For a binaural unmasking stimulus, the differences between the interaural parameters of the signal and noise mean that the vector summation is different at each ear. [5] Consequently, whatever the exact stimulus construction, there tend to be fluctuations over time in both the interaural level differences and the interaural phase differences of the combined stimulus at the listener's ears.
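
The sketch below illustrates these fluctuations under assumed parameters (a 100 Hz wide noise band centred on 500 Hz; it is not drawn from the cited experiments). For an N0Sπ stimulus the vector sums at the two ears differ, so the interaural level and phase differences vary over time, whereas for N0S0 both would be constantly zero:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

rng = np.random.default_rng(2)
fs = 44100                     # assumed sample rate (Hz)
t = np.arange(0, 0.5, 1 / fs)

# Narrowband masking noise: 100 Hz wide, centred on 500 Hz.
b, a = butter(4, [450 / (fs / 2), 550 / (fs / 2)], btype="band")
noise = filtfilt(b, a, rng.standard_normal(t.size))
signal = 0.3 * np.std(noise) * np.sin(2 * np.pi * 500 * t)

# Analytic signals give instantaneous amplitude and phase at each ear.
left = hilbert(noise + signal)    # N0Spi stimulus
right = hilbert(noise - signal)

ild = 20 * np.log10(np.abs(left) / np.abs(right))  # level difference (dB)
ipd = np.angle(left * np.conj(right))              # phase difference (rad)
print(ild.std(), ipd.std())  # nonzero: both cues fluctuate over time
```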

Experiments have examined which of these cues the auditory system can best detect. These have shown that, at low frequencies (specifically 500 Hz), the auditory system is most sensitive to the interaural time differences. [10] At higher frequencies, however, there seems to be a transition to using interaural level differences. [11]

Practical implications

In everyday life, speech is more easily understood in noise when speech and noise come from different directions, a phenomenon known as "spatial release from masking". In this situation, the speech and noise have distinct interaural time differences and interaural level differences: the time differences are produced by differences in the length of the sound path to the two ears, and the level differences are caused by the acoustic shadowing effect of the head. These two cues play a major role in sound localisation, and both have been shown to have independent effects in spatial release from masking. [12] The interaural level differences can give one ear or the other a better signal-to-noise ratio, allowing the listener to gain an intelligibility improvement simply by attending to that ear. The interaural time differences, in contrast, can only be exploited by comparing the waveforms at the two ears. Successful models of spatial release from masking tend to use equalization-cancellation theory to generate the effects of interaural time differences. [13]
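
As a toy illustration of the better-ear component alone (the levels below are assumed round numbers, not measured head-shadow data):

```python
# Speech arrives from the listener's left, noise from the right; the head
# shadow attenuates each source at the far ear (all values assumed).
speech_db = {"left": 65.0, "right": 59.0}
noise_db = {"left": 54.0, "right": 60.0}

snr_db = {ear: speech_db[ear] - noise_db[ear] for ear in ("left", "right")}
better_ear = max(snr_db, key=snr_db.get)
print(better_ear, snr_db)  # left: +11 dB SNR versus -1 dB at the right ear
```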

Related Research Articles

Auditory system

The auditory system is the sensory system for the sense of hearing. It includes both the sensory organs and the auditory parts of the sensory system.

Sound localization is a listener's ability to identify the location or origin of a detected sound in direction and distance.

Interaural time difference

The interaural time difference is the difference in the arrival time of a sound between the two ears of a human or animal. It is important in the localisation of sounds, as it provides a cue to the direction or angle of the sound source relative to the head. If a signal arrives at the head from one side, it has further to travel to reach the far ear than the near ear. This path-length difference results in a time difference between the sound's arrivals at the two ears, which is detected and aids the process of identifying the direction of the sound source.
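
A worked example under the spherical-head (Woodworth) approximation, with an assumed head radius and speed of sound:

```python
import numpy as np

r = 0.0875         # assumed head radius (m)
c = 343.0          # speed of sound in air (m/s)
theta = np.pi / 2  # source directly to one side (90 degrees)

# Woodworth's formula for the interaural time difference.
itd = (r / c) * (theta + np.sin(theta))
print(f"{itd * 1e6:.0f} microseconds")  # about 656 us at 90 degrees
```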

Ultrasonic hearing is a recognised auditory effect which allows humans to perceive sounds of a much higher frequency than would ordinarily be audible using the inner ear, usually by stimulation of the base of the cochlea through bone conduction. Normal human hearing is recognised as having an upper bound of 15–28 kHz, depending on the person.

Binaural fusion or binaural integration is a cognitive process that involves the combination of different auditory information presented binaurally, or to each ear. In humans, this process is essential in understanding speech as one ear may pick up more information about the speech stimuli than the other.

Computational auditory scene analysis (CASA) is the study of auditory scene analysis by computational means. In essence, CASA systems are "machine listening" systems that aim to separate mixtures of sound sources in the same way that human listeners do. CASA differs from the field of blind signal separation in that it is based on the mechanisms of the human auditory system, and thus uses no more than two microphone recordings of an acoustic environment. It is related to the cocktail party problem.

Duplex perception refers to the linguistic phenomenon whereby "part of the acoustic signal is used for both a speech and a nonspeech percept." A listener is presented with two simultaneous, dichotic stimuli. One ear receives an isolated third-formant transition that sounds like a nonspeech chirp. At the same time the other ear receives a base syllable. This base syllable consists of the first two formants, complete with formant transitions, and the third formant without a transition. Normally, there would be peripheral masking in such a binaural listening task but this does not occur. Instead, the listener's percept is duplex, that is, the completed syllable is perceived and the nonspeech chirp is heard at the same time. This is interpreted as being due to the existence of a special speech module.

Delayed Auditory Feedback (DAF), also called delayed sidetone, is a type of altered auditory feedback that consists of extending the time between speech and auditory perception. It can consist of a device that enables a user to speak into a microphone and then hear their voice in headphones a fraction of a second later. Some DAF devices are hardware; DAF computer software is also available. Most delays that produce a noticeable effect are between 50–200 milliseconds (ms). DAF usage has been shown to induce mental stress.

The ASA Silver Medal is an award presented by the Acoustical Society of America to individuals, without age limitation, for contributions to the advancement of science, engineering, or human welfare through the application of acoustic principles or through research accomplishments in acoustics. The medal is awarded in a number of categories depending on the technical committee responsible for making the nomination.

Dichotic pitch is a pitch heard due to binaural processing, when the brain combines two noises presented simultaneously to the ears. In other words, it cannot be heard when the sound stimulus is presented monaurally but, when it is presented binaurally, a sensation of pitch can be heard. The binaural stimulus is presented to both ears through headphones simultaneously, and is the same in several respects except for a narrow frequency band that is manipulated. The most common variation is the Huggins pitch, which presents white noise that differs between the ears only in the interaural phase relation over a narrow range of frequencies. For humans, this phenomenon is restricted to fundamental frequencies lower than 330 Hz and extremely low sound pressure levels. Studies of the effects of dichotic pitch on the brain suggest that it evokes activation at the lateral end of Heschl's gyrus.

Diplacusis, also known as diplacusis binauralis, binauralis disharmonica or interaural pitch difference (IPD), is a hearing disorder whereby a single auditory stimulus is perceived as different pitches between ears. It is typically experienced as a secondary symptom of sensorineural hearing loss, although not all patients with sensorineural hearing loss experience diplacusis or tinnitus. The onset is usually spontaneous and can occur following an acoustic trauma, for example an explosive noise, or in the presence of an ear infection. Sufferers may experience the effect permanently, or it may resolve on its own. Diplacusis can be particularly disruptive to individuals working within fields requiring acute audition, such as musicians, sound engineers or performing artists.

William M. Hartmann

William M. Hartmann is a noted physicist, psychoacoustician, author, and former president of the Acoustical Society of America. His major contributions in psychoacoustics are in pitch perception, binaural hearing, and sound localization. Working with junior colleagues, he discovered several major pitch effects: the binaural edge pitch, the binaural coherence edge pitch, the pitch shifts of mistuned harmonics, and the harmonic unmasking effect. His textbook, Signals, Sound and Sensation, is widely used in courses on psychoacoustics. He is currently a professor of physics at Michigan State University.

Lombard effect

The Lombard effect or Lombard reflex is the involuntary tendency of speakers to increase their vocal effort when speaking in loud noise to enhance the audibility of their voice. This change includes not only loudness but also other acoustic features such as pitch, rate, and duration of syllables. This compensation effect maintains the auditory signal-to-noise ratio of the speaker's spoken words.

Infrasound is sound at frequencies below the low-frequency limit of human hearing, 20 Hz. It is known, however, that humans can perceive sounds below this frequency at very high pressure levels. Infrasound can come from many natural as well as man-made sources, including weather patterns, topographic features, ocean wave activity, thunderstorms, geomagnetic storms, earthquakes, jet streams, mountain ranges, and rocket launchings. Infrasounds are also present in the vocalizations of some animals. Low frequency sounds can travel for long distances with very little attenuation and can be detected hundreds of miles away from their sources.

Perceptual-based 3D sound localization is the application of knowledge of the human auditory system to develop 3D sound localization technology.

Lloyd A. Jeffress

Lloyd Alexander Jeffress was an acoustical scientist, a professor of experimental psychology at the University of Texas at Austin, and a developer of mine-hunting models for the US Navy during World War II and after. Jeffress was known to psychologists for his pioneering research on auditory masking in psychoacoustics, his stimulus-oriented approach to signal-detection theory in psychophysics, and his "ingenious" electronic and mathematical models of the auditory process.

Temporal envelope (ENV) and temporal fine structure (TFS) are changes in the amplitude and frequency of sound perceived by humans over time. These temporal changes are responsible for several aspects of auditory perception, including loudness, pitch and timbre perception and spatial hearing.

Brian Moore (scientist)

Brian C.J. Moore FMedSci, FRS is an Emeritus Professor of Auditory Perception in the University of Cambridge and an Emeritus Fellow of Wolfson College, Cambridge. His research focuses on psychoacoustics, audiology, and the development and assessment of hearing aids.

Christian Lorenzi

Christian Lorenzi is Professor of Experimental Psychology at École Normale Supérieure in Paris, France, where he has served as Director of the Department of Cognitive Studies and Director of Scientific Studies. Lorenzi works on auditory perception.

Quentin Summerfield

Quentin Summerfield is a British psychologist, specialising in hearing. He joined the Medical Research Council Institute of Hearing Research in 1977 and served as its deputy director from 1993 to 2004, before moving to a chair in psychology at the University of York. He served as head of the Psychology department from 2011 to 2017 and retired in 2018, becoming an emeritus professor. From 2013 to 2018, he was a member of the University of York's Finance & Policy Committee. From 2015 to 2018, he was a member of the university's governing body, the Council.

References

  1. Hirsh IJ (1948). "The influence of interaural phase on interaural summation and inhibition". J. Acoust. Soc. Am. 20 (4): 536–544. Bibcode:1948ASAJ...20..536H. doi:10.1121/1.1906407.
  2. Hirsh IJ, Burgeat M (1958). "Binaural Effects in Remote Masking". J. Acoust. Soc. Am. 30 (9): 827–832. Bibcode:1958ASAJ...30..827H. doi:10.1121/1.1909781.
  3. McFadden D, Pasanen EG (1978). "Binaural detection at high frequencies with time-delayed waveforms". J. Acoust. Soc. Am. 63 (4): 1120–1131. Bibcode:1978ASAJ...63.1120M. doi:10.1121/1.381820. PMID 649871.
  4. Licklider JC (1948). "The influence of interaural phase relations upon the masking of speech by white noise". J. Acoust. Soc. Am. 20 (2): 150–159. Bibcode:1948ASAJ...20..150L. doi:10.1121/1.1906358.
  5. Jeffress LA, Blodgett HC, Sandel TT, Wood CL (1956). "Masking of Tonal Signals". J. Acoust. Soc. Am. 28 (3): 416–426. Bibcode:1956ASAJ...28..416J. doi:10.1121/1.1908346.
  6. Colburn HS (1977). "Theory of binaural interaction based on auditory-nerve data. II. Detection of tones in noise". J. Acoust. Soc. Am. 61 (2): 525–533. Bibcode:1977ASAJ...61..525C. doi:10.1121/1.381294. PMID 845314.
  7. Durlach NI (1963). "Equalization and cancellation theory of binaural masking-level differences". J. Acoust. Soc. Am. 35 (8): 1206–1218. doi:10.1121/1.1918675.
  8. Jeffress LA (1948). "A Place Theory of Sound Localization". Journal of Comparative and Physiological Psychology. 41 (1): 35–39. doi:10.1037/h0061495. PMID 18904764.
  9. Culling JF (2007). "Evidence specifically favoring the equalization-cancellation theory of binaural unmasking". J. Acoust. Soc. Am. 122 (5): 2803–2813. Bibcode:2007ASAJ..122.2803C. doi:10.1121/1.2785035. PMID 18189570.
  10. van der Heijden M, Joris PX (2010). "Interaural correlation fails to account for detection in a classic binaural task: Dynamic ITDs dominate N0Sπ detection". J. Assoc. Res. Otolaryngol. 11 (1): 113–131. doi:10.1007/s10162-009-0185-8. PMC 2820206. PMID 19760461.
  11. Culling JF (2011). "Subcomponent cues in binaural unmasking". J. Acoust. Soc. Am. 129 (6): 3846–3855. Bibcode:2011ASAJ..129.3846C. doi:10.1121/1.3560944. PMID 21682408.
  12. Bronkhorst AW, Plomp R (1988). "The effect of head-induced interaural time and level differences on speech intelligibility in noise". J. Acoust. Soc. Am. 83 (4): 1508–1516. doi:10.1121/1.395906. PMID 3372866.
  13. Beutelmann R, Brand T (2006). "Prediction of speech intelligibility in spatial noise and reverberation for normal-hearing and hearing-impaired listeners". J. Acoust. Soc. Am. 120 (1): 331–342. Bibcode:2006ASAJ..120..331B. doi:10.1121/1.2202888. PMID 16875230.