Auditory masking

Auditory masking occurs when the perception of one sound is affected by the presence of another sound. [1]

Auditory masking in the frequency domain is known as simultaneous masking, frequency masking or spectral masking. Auditory masking in the time domain is known as temporal masking or non-simultaneous masking.

Masked threshold

The unmasked threshold is the quietest level of the signal which can be perceived without a masking signal present. The masked threshold is the quietest level of the signal perceived when combined with a specific masking noise. The amount of masking is the difference between the masked and unmasked thresholds.

Figure A – adapted from Gelfand (2004)

Gelfand provides a basic example. [1] Let us say that for a given individual, the sound of a cat scratching a post in an otherwise quiet environment is first audible at a level of 10 dB SPL. However, in the presence of a masking noise (for example, a vacuum cleaner that is running simultaneously) that same individual cannot detect the sound of the cat scratching unless the level of the scratching sound is at least 26 dB SPL. We would say that the unmasked threshold for that individual for the target sound (i.e., the cat scratching) is 10 dB SPL, while the masked threshold is 26 dB SPL. The amount of masking is simply the difference between these two thresholds: 16 dB.

The amount of masking will vary depending on the characteristics of both the target signal and the masker, and will also be specific to an individual listener. While the person in the example above was able to detect the cat scratching at 26 dB SPL, another person with the same unmasked threshold of 10 dB SPL may not be able to hear the cat scratching while the vacuum cleaner is running until the level of the scratching sound is increased to 30 dB SPL, making the amount of masking for the second listener 20 dB.
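The arithmetic in the example can be written out as a short Python sketch. The function name is illustrative, and the second listener's unmasked threshold is assumed to be 10 dB SPL, as the 20 dB figure implies.

```python
def amount_of_masking(unmasked_db_spl, masked_db_spl):
    """Return the threshold shift in dB caused by the masker."""
    return masked_db_spl - unmasked_db_spl


# Thresholds taken from the cat-scratching example.
listener_1 = amount_of_masking(unmasked_db_spl=10, masked_db_spl=26)
# Assumption: the second listener also has a 10 dB SPL unmasked threshold.
listener_2 = amount_of_masking(unmasked_db_spl=10, masked_db_spl=30)

print(listener_1, listener_2)  # 16 20
```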

Simultaneous masking

Simultaneous masking occurs when a sound is made inaudible by a noise or unwanted sound of the same duration as the original sound. [2] For example, a powerful spike at 1 kHz will tend to mask out a lower-level tone at 1.1 kHz. Similarly, two sine tones at 440 Hz and 450 Hz can each be perceived clearly when presented separately, but not when presented simultaneously.

Critical bandwidth

If two sounds of different frequencies are played at the same time, two separate sounds can often be heard rather than a combination tone. The ability to hear frequencies separately is known as frequency resolution or frequency selectivity. When signals are perceived as a combination tone, they are said to reside in the same critical bandwidth. This effect is thought to occur due to filtering within the cochlea, the hearing organ in the inner ear. A complex sound is split into different frequency components, and each component causes a peak in the pattern of vibration at a specific place along the basilar membrane within the cochlea. These components are then coded independently on the auditory nerve, which transmits sound information to the brain. This individual coding only occurs if the components are sufficiently different in frequency; otherwise they fall in the same critical band, are coded at the same place, and are perceived as one sound instead of two. [3]
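Whether two tones are likely to fall within the same critical band can be estimated with a rough Python sketch. It is not taken from the sources cited here. It assumes the equivalent rectangular bandwidth (ERB) approximation commonly attributed to Glasberg and Moore as a stand-in for the critical bandwidth, and `likely_resolved` is a hypothetical helper that ignores level and listener differences.

```python
def erb_hz(center_hz):
    """Approximate auditory filter bandwidth in Hz (ERB formula, an assumption)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def likely_resolved(f1_hz, f2_hz):
    """True if the tones are further apart than one ERB at their mean frequency."""
    center_hz = (f1_hz + f2_hz) / 2.0
    return abs(f2_hz - f1_hz) > erb_hz(center_hz)


print(likely_resolved(440, 450))  # False: 10 Hz apart, well inside one band
print(likely_resolved(440, 880))  # True: an octave apart
```

Under this approximation the 440 Hz and 450 Hz tones share a filter, which is consistent with their not being heard clearly as two tones when presented together.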

The filters that distinguish one sound from another are called auditory filters, listening channels or critical bandwidths. Frequency resolution occurs on the basilar membrane due to the listener choosing a filter which is centered over the frequency they expect to hear, the signal frequency. A sharply tuned filter has good frequency resolution, as it passes frequencies near its center frequency but not other frequencies (Pickles 1982). Damage to the cochlea and the outer hair cells in the cochlea can impair the ability to tell sounds apart (Moore 1986). This explains why someone with a hearing loss due to cochlear damage would have more difficulty than a person with normal hearing in distinguishing between different consonants in speech. [4]

Masking illustrates the limits of frequency selectivity. If a signal is masked by a masker with a different frequency from the signal, then the auditory system is unable to distinguish between the two frequencies. By experimenting with conditions where one sound can mask a previously heard signal, the frequency selectivity of the auditory system can be tested. [5]

Similar frequencies

Figure B – Adapted from Ehmer

How effective the masker is at raising the threshold of the signal depends on the frequency of the signal and the frequency of the masker. The graphs in Figure B are a series of masking patterns, also known as masking audiograms. Each graph shows the amount of masking produced by a masker at the frequency shown in its top corner: 250, 500, 1000 or 2000 Hz. For example, in the first graph the masker is presented at a frequency of 250 Hz at the same time as the signal. The amount by which the masker increases the threshold of the signal is plotted, and this is repeated for different signal frequencies, shown on the X axis. The frequency of the masker is kept constant. The masking effect is shown in each graph at various masker sound levels.

Figure C – Adapted from Gelfand 2004
Figure D – Adapted from Gelfand 2004

Figure B shows along the Y axis the amount of masking. The greatest masking is when the masker and the signal are the same frequency and this decreases as the signal frequency moves further away from the masker frequency. [1] This phenomenon is called on-frequency masking and occurs because the masker and signal are within the same auditory filter (Figure C). This means that the listener cannot distinguish between them and they are perceived as one sound with the quieter sound masked by the louder one (Figure D).

Figure E – adapted from Moore 1998

The amount by which the masker raises the threshold of the signal is much less in off-frequency masking, but it does have some masking effect because some of the masker overlaps into the auditory filter of the signal (Figure E). [5]

Figure F – adapted from Moore 1998

Off-frequency masking requires the level of the masker to be greater in order to have a masking effect; this is shown in Figure F. This is because only a certain amount of the masker overlaps into the auditory filter of the signal and more masker is needed to cover the signal. [5]

Lower frequencies

The masking pattern changes depending on the frequency of the masker and the intensity (Figure B). For low masker levels on the 1000 Hz graph, such as the 20–40 dB range, the curves are relatively parallel. As the masker intensity increases the curves separate, especially for signals at a frequency higher than the masker. This shows that there is a spread of the masking effect upward in frequency as the intensity of the masker is increased. Each curve is much shallower on the high-frequency side than on the low-frequency side. This flattening is called upward spread of masking and is why an interfering sound masks high-frequency signals much better than low-frequency signals. [1]

Figure B also shows that as the masker frequency increases, the masking patterns become increasingly compressed. This demonstrates that high frequency maskers are only effective over a narrow range of frequencies, close to the masker frequency. Low frequency maskers on the other hand are effective over a wide frequency range. [1]

Figure G – adapted from a diagram by Gelfand

Harvey Fletcher carried out an experiment to discover how much of a band of noise contributes to the masking of a tone. In the experiment, a fixed tone signal had various bandwidths of noise centered on it. The masked threshold was recorded for each bandwidth. His research showed that there is a critical bandwidth of noise which causes the maximum masking effect and energy outside that band does not affect the masking. This can be explained by the auditory system having an auditory filter which is centered over the frequency of the tone. The bandwidth of the masker that is within this auditory filter effectively masks the tone but the masker outside of the filter has no effect (Figure G).
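The outcome of this band-widening experiment can be illustrated with a deliberately idealized Python sketch. It assumes a perfectly rectangular auditory filter and assumes that the tone reaches threshold when its power equals the noise power passing through that filter. The 160 Hz critical bandwidth and 30 dB spectrum level are illustrative values, not measurements.

```python
import math


def masked_threshold_db(noise_bandwidth_hz, critical_bandwidth_hz, spectrum_level_db):
    """Tone threshold in dB for an idealized rectangular auditory filter."""
    # Only the part of the noise that lies inside the filter contributes.
    effective_bw_hz = min(noise_bandwidth_hz, critical_bandwidth_hz)
    return spectrum_level_db + 10.0 * math.log10(effective_bw_hz)


# Hypothetical values: 160 Hz critical band, 30 dB noise spectrum level.
for bandwidth_hz in (50, 100, 160, 400, 1000):
    threshold_db = masked_threshold_db(bandwidth_hz, 160, 30)
    print(bandwidth_hz, round(threshold_db, 1))
```

In this model the threshold rises by about 3 dB for each doubling of the noise bandwidth and then stays constant once the noise is wider than the critical bandwidth, which is the plateau described above.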

This principle is used in MP3 encoding to reduce the size of audio files. Parts of the signal which are outside the critical bandwidth are represented with reduced precision, while the parts which are perceived by the listener are reproduced with higher fidelity. [6]
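The idea of spending precision only where the listener can hear it can be sketched in Python as follows. This is not the MP3 algorithm. It assumes that each quantizer bit buys roughly 6.02 dB of signal-to-noise ratio, and the function name and example levels are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # approximate signal-to-noise gain per quantizer bit


def bits_needed(signal_db, masked_threshold_db):
    """Bits needed to keep quantization noise below the masked threshold."""
    signal_to_mask_db = signal_db - masked_threshold_db
    if signal_to_mask_db <= 0:
        # The component is itself masked, so it needs no bits at all.
        return 0
    return math.ceil(signal_to_mask_db / DB_PER_BIT)


print(bits_needed(40, 50))  # 0: component lies below the masked threshold
print(bits_needed(60, 30))  # 5: a 30 dB signal-to-mask ratio
```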

Effects of intensity

Figure H – adapted from Moore 1998

Varying intensity levels can also have an effect on masking. The lower end of the filter becomes flatter with increasing decibel level, whereas the higher end becomes slightly steeper. Changes in slope of the high frequency side of the filter with intensity are less consistent than they are at low frequencies. At medium center frequencies (1–4 kHz) the slope increases as intensity increases, but at low center frequencies there is no clear trend with level, and the filters at high center frequencies show a small decrease in slope with increasing level. The sharpness of the filter depends on the input level and not the output level to the filter. The lower side of the auditory filter also broadens with increasing level. [5] These observations are illustrated in Figure H.

Temporal masking

Temporal masking or non-simultaneous masking occurs when a sudden stimulus sound makes inaudible other sounds which are present immediately preceding or following the stimulus. Masking which obscures a sound immediately preceding the masker is called backward masking or pre-masking and masking which obscures a sound immediately following the masker is called forward masking or post-masking. [5] Temporal masking's effectiveness attenuates exponentially from the onset and offset of the masker, with the onset attenuation lasting approximately 20 ms and the offset attenuation lasting approximately 100 ms.
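The asymmetry between backward and forward masking can be illustrated with a simple Python sketch of an exponential decay. The window lengths come from the text. The decision to treat each window as the time taken for the effect to fall to 1% of its peak is an assumption made only so that a time constant can be derived.

```python
import math

BACKWARD_WINDOW_MS = 20.0   # before masker onset (from the text)
FORWARD_WINDOW_MS = 100.0   # after masker offset (from the text)
RESIDUAL = 0.01             # assumed fraction left at the end of a window


def relative_masking(gap_ms):
    """Relative masking effectiveness, from 1.0 at the masker edge toward 0.

    gap_ms < 0: the probe precedes the masker onset (backward masking).
    gap_ms >= 0: the probe follows the masker offset (forward masking).
    """
    window_ms = BACKWARD_WINDOW_MS if gap_ms < 0 else FORWARD_WINDOW_MS
    tau_ms = window_ms / math.log(1.0 / RESIDUAL)
    return math.exp(-abs(gap_ms) / tau_ms)


for gap_ms in (-20, -10, 0, 10, 50, 100):
    print(gap_ms, round(relative_masking(gap_ms), 3))
```

With these assumptions a probe 10 ms before the masker is affected far less than a probe 10 ms after it.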

Similar to simultaneous masking, temporal masking reveals the frequency analysis performed by the auditory system; forward masking thresholds for complex harmonic tones (e.g., a sawtooth probe with a fundamental frequency of 500 Hz) exhibit threshold peaks (i.e., high masking levels) for frequency bands centered on the first several harmonics. In fact, auditory bandwidths measured from forward masking thresholds are narrower and more accurate than those measured using simultaneous masking.

Temporal masking should not be confused with the ear's acoustic reflex, an involuntary response in the middle ear that is activated to protect the ear's delicate structures from loud sounds.

Other masking conditions

Figure I – ipsilateral simultaneous masking

Ipsilateral ("same side") masking is not the only condition where masking takes place. Another is contralateral ("other side") simultaneous masking, in which a signal that would otherwise be audible in one ear is deliberately made inaudible by applying a masker to the other ear.

The last situation where masking occurs is called central masking. This refers to the case where a masker causes a threshold elevation. This can be in the absence of, or in addition to, another effect and is due to interactions within the central nervous system between the separate neural inputs obtained from the masker and the signal. [1]

Effects of different stimulus types

Experiments have been carried out to see the different masking effects when using a masker which is either in the form of a narrow band noise or a sinusoidal tone.

When a sinusoidal signal and a sinusoidal masker (tone) are presented simultaneously the envelope of the combined stimulus fluctuates in a regular pattern described as beats. The fluctuations occur at a rate defined by the difference between the frequencies of the two sounds. If the frequency difference is small then the sound is perceived as a periodic change in the loudness of a single tone. If the beats are fast then this can be described as a sensation of roughness. When there is a large frequency separation, the two components are heard as separate tones without roughness or beats. Beats can be a cue to the presence of a signal even when the signal itself is not audible. The influence of beats can be reduced by using a narrowband noise rather than a sinusoidal tone for either signal or masker. [3]
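The beat rate and the fluctuating envelope follow from the identity sin A + sin B = 2 sin((A + B)/2) cos((A − B)/2). The Python sketch below assumes two equal-amplitude sine tones that start in phase, and the function names are illustrative.

```python
import math


def beat_rate_hz(f1_hz, f2_hz):
    """Rate of the envelope fluctuation: the difference of the two frequencies."""
    return abs(f1_hz - f2_hz)


def envelope(t_s, f1_hz, f2_hz):
    """Envelope of sin(2*pi*f1*t) + sin(2*pi*f2*t) at time t_s, in seconds."""
    return abs(2.0 * math.cos(math.pi * (f1_hz - f2_hz) * t_s))


print(beat_rate_hz(440, 450))             # 10
print(envelope(0.0, 440, 450))            # 2.0: the tones reinforce each other
print(round(envelope(0.05, 440, 450), 6)) # 0.0: the tones cancel
```

For 440 Hz and 450 Hz the loudness rises and falls ten times per second, which is slow enough to be heard as a periodic change in a single tone.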

Mechanisms of masking

There are many different mechanisms of masking, one being suppression. This is when there is a reduction of a response to a signal due to the presence of another. This happens because the original neural activity caused by the first signal is reduced by the neural activity of the other sound. [7]

Combination tones are products of a signal and a masker. They arise when the two sounds interact to produce a new sound, which can be more audible than the original signal. This is caused by nonlinear distortion in the ear. For example, the combination tone of two maskers can be a better masker than the two original maskers alone. [5]

The sounds interact in many ways depending on the difference in frequency between the two sounds. The two most important products are cubic difference tones and quadratic difference tones. [5]

Cubic difference tones occur at the frequency

2F1 – F2 [8]

where F1 is the lower of the two primary frequencies and F2 the higher. They are audible most of the time, especially when the level of the original tone is low. Hence they have a greater effect on psychoacoustic tuning curves than quadratic difference tones.

Quadratic difference tones occur at the frequency

F2 – F1

They arise only at relatively high levels and hence have a lesser effect on psychoacoustic tuning curves. [5]

Combination tones are stimulus-like: they behave much like the primary tones that produced them, so they can in turn interact with the primary tones to produce secondary combination tones. An example of this is

3F1 – 2F2

Secondary combination tones are again similar to the combination tones of the primary tone. [5]
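The three expressions above can be evaluated with a small Python sketch. It assumes F1 is the lower primary frequency, and the function name and the 1000 Hz and 1200 Hz example tones are chosen only for illustration.

```python
def combination_tones(f1_hz, f2_hz):
    """Return the combination-tone frequencies named in the text, in Hz."""
    if not f1_hz < f2_hz:
        raise ValueError("expected f1_hz < f2_hz")
    return {
        "quadratic_difference": f2_hz - f1_hz,      # F2 - F1
        "cubic_difference": 2 * f1_hz - f2_hz,      # 2F1 - F2
        "secondary": 3 * f1_hz - 2 * f2_hz,         # 3F1 - 2F2
    }


tones = combination_tones(1000, 1200)
print(tones["quadratic_difference"])  # 200
print(tones["cubic_difference"])      # 800
print(tones["secondary"])             # 600
```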

Off frequency listening

Off-frequency listening occurs when a listener uses a filter centered just below the signal frequency to improve their auditory performance. This "off-frequency" filter reduces the level of the masker more than that of the signal at the output of the filter, so the signal is heard more clearly. [2]
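The benefit of shifting the filter can be illustrated with a Python sketch that is not taken from the cited sources. It assumes a symmetric rounded-exponential (roex) filter shape, a bandwidth given by the ERB approximation, and a tonal masker above the signal. All frequencies and levels are hypothetical.

```python
import math


def erb_hz(center_hz):
    """Approximate auditory filter bandwidth in Hz (ERB formula, an assumption)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def roex_gain_db(freq_hz, center_hz):
    """Gain in dB of a symmetric roex(p) filter, 0 dB at its center."""
    p = 4.0 * center_hz / erb_hz(center_hz)
    g = abs(freq_hz - center_hz) / center_hz
    weight = (1.0 + p * g) * math.exp(-p * g)
    return 10.0 * math.log10(weight)


def output_snr_db(signal_hz, masker_hz, center_hz, signal_db, masker_db):
    """Signal-to-masker ratio in dB at the output of the chosen filter."""
    signal_out_db = signal_db + roex_gain_db(signal_hz, center_hz)
    masker_out_db = masker_db + roex_gain_db(masker_hz, center_hz)
    return signal_out_db - masker_out_db


# Hypothetical case: 50 dB signal at 1000 Hz, 60 dB masker at 1100 Hz.
on_frequency = output_snr_db(1000, 1100, 1000, 50, 60)
off_frequency = output_snr_db(1000, 1100, 950, 50, 60)
print(round(on_frequency, 1), round(off_frequency, 1))
```

Because the filter top is rounded, moving the center to 950 Hz costs the signal a few decibels but attenuates the masker by more, so the ratio at the filter output improves.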

Applications

Auditory masking is used in tinnitus maskers to suppress the annoying ringing, hissing, or buzzing of tinnitus, which is often associated with hearing loss. It is also used in various kinds of audiometry, including pure tone audiometry and the standard hearing test, to test each ear unilaterally and to test speech recognition in the presence of partially masking noise.

Auditory masking is also exploited to perform data compression of sound signals, as in the MP3 format.

References

  1. Gelfand, S.A. (2004) Hearing: An Introduction to Psychological and Physiological Acoustics, 4th Ed. New York, Marcel Dekker
  2. Moore, B.C.J. (2004) An Introduction to the Psychology of Hearing, 5th Ed. London, Elsevier Academic Press
  3. Moore, B.C.J. (1986) Frequency Selectivity in Hearing, London, Academic Press
  4. Moore, B.C.J. (1995) Perceptual Consequences of Cochlear Damage, Oxford, Oxford University Press
  5. Moore, B.C.J. (1998) Cochlear Hearing Loss, London, Whurr Publishers Ltd
  6. Sellars, P. (2000) Perceptual Coding: How MP3 Compression Works, Cambridge, Sound on Sound, archived from the original on 2015-07-31, retrieved 12 December 2020
  7. Oxenham, A.J. and Plack, C.J. Suppression and the upward spread of masking, Journal of the Acoustical Society of America, 104 (6), pp. 3500–10
  8. Lee, Kyogu and Kim, Minjong (2005) Estimating the Amplitude of the Cubic Difference Tone Using a Third-Order Adaptive Volterra Filter, Proceedings of the 8th International Conference on Digital Audio Effects (DAFx'05), Madrid, Spain, September 20–22, 2005, p. 297