The missing fundamental phenomenon is the perception of a pitch whose first harmonic (the fundamental) is absent from the waveform. [1]
It is well established in psychoacoustics that the auditory system, which naturally distinguishes one tone from another, will persistently assign a pitch to a complex tone provided a sufficient set of harmonics is present in the spectrum. [2]
For example, when a note (that is not a pure tone) has a pitch of 100 Hz, it will consist of frequency components that are integer multiples of that value (e.g. 100, 200, 300, 400 and 500 Hz). However, small loudspeakers may not produce low frequencies, so in this example the 100 Hz component may be missing. Nevertheless, a pitch corresponding to the fundamental may still be heard.
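This can be illustrated numerically. The following is a minimal NumPy sketch (the frequencies are the example values above): a tone built from harmonics 2 through 5 alone has essentially no energy at 100 Hz in its spectrum, yet its waveform still repeats every 10 ms, the period of the absent fundamental.

```python
import numpy as np

fs = 8000                # sample rate in Hz
t = np.arange(800) / fs  # 100 ms of signal

# Complex tone with a 100 Hz fundamental, but only harmonics 2-5 present
x = sum(np.sin(2 * np.pi * f * t) for f in [200, 300, 400, 500])

# The spectrum contains essentially no energy at 100 Hz...
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / fs)
print(spectrum[np.argmin(np.abs(freqs - 100))] < spectrum.max() / 100)  # True

# ...yet the waveform still repeats every 10 ms, the period of the
# absent 100 Hz fundamental
period = fs // 100       # 80 samples
print(np.allclose(x[:len(x) - period], x[period:]))  # True
```

The repetition period of the waveform, rather than the lowest spectral component, is what corresponds to the perceived pitch.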
A low pitch (also known as the pitch of the missing fundamental or virtual pitch [3] ) can sometimes be heard when there is no apparent source or component of that frequency. This perception is due to the brain interpreting repetition patterns that are present. [4] [5] [6]
It was once thought that this effect arose because the missing fundamental was replaced by distortions introduced by the physics of the ear. However, experiments subsequently showed that when noise was added that would have masked these distortions had they been present, listeners still heard a pitch corresponding to the missing fundamental, as reported by J. C. R. Licklider in 1954. [7] It is now widely accepted that the brain processes the information present in the overtones to calculate the fundamental frequency. The precise way in which it does so is still a matter of debate, but the processing seems to be based on an autocorrelation involving the timing of neural impulses in the auditory nerve. [8] However, it has long been noted that no neural mechanism that could accomplish a delay (a necessary operation of a true autocorrelation) has been found. [6] At least one model shows that a temporal delay is unnecessary to produce an autocorrelation model of pitch perception, appealing instead to phase shifts between cochlear filters; [9] however, earlier work has shown that certain sounds with a prominent peak in their autocorrelation function do not elicit a corresponding pitch percept, [10] [11] and that certain sounds without a peak in their autocorrelation function nevertheless elicit a pitch. [12] [13] Autocorrelation can thus be considered, at best, an incomplete model.
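As a toy illustration of the autocorrelation idea (not a model of actual auditory-nerve processing), the sketch below picks the lag that maximizes the autocorrelation of a tone whose fundamental is absent; the winning lag corresponds to the missing 100 Hz pitch. The harmonic numbers and the search range are illustrative assumptions.

```python
import numpy as np

fs = 16000               # sample rate in Hz
t = np.arange(800) / fs  # 50 ms of signal

# Complex tone whose 100 Hz fundamental is missing (harmonics 3-6 only)
x = sum(np.sin(2 * np.pi * k * 100 * t) for k in range(3, 7))

# Autocorrelation over lags covering a plausible pitch range (50-400 Hz);
# the strongest peak marks the repetition period of the waveform
lags = np.arange(fs // 400, fs // 50)
ac = np.array([np.sum(x[:-lag] * x[lag:]) for lag in lags])

best_lag = lags[np.argmax(ac)]
print(fs / best_lag)     # 100.0, the missing fundamental in Hz
```

The best lag is 160 samples, i.e. 10 ms, even though no 100 Hz component is present in the stimulus.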
The pitch of the missing fundamental, usually at the greatest common divisor of the frequencies present, [14] is not, however, always perceived. Research conducted at Heidelberg University shows that, under narrow stimulus conditions with a small number of harmonics, the general population can be divided into those who perceive missing fundamentals and those who primarily hear the overtones instead. [15] This was done by asking subjects to judge the direction of motion (up or down) of two complexes played in succession. The authors used structural MRI and MEG to show that a preference for missing-fundamental hearing correlated with left-hemisphere lateralization of pitch perception, whereas a preference for spectral hearing correlated with right-hemisphere lateralization; those who exhibited the latter preference tended to be musicians.
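The greatest-common-divisor rule can be stated in a couple of lines (the partial frequencies below are illustrative):

```python
from functools import reduce
from math import gcd

# Partials present in the stimulus (Hz); the fundamental itself is absent
partials = [200, 300, 400]

# The perceived low pitch typically sits at their greatest common divisor
low_pitch = reduce(gcd, partials)
print(low_pitch)  # 100
```

Real stimuli rarely have exactly integer-related partials, so in practice the rule is an idealization of what pitch models estimate.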
In Parsing the Spectral Envelope: Toward a General Theory of Vocal Tone Color (2016), Ian Howell wrote that although not everyone can hear missing fundamentals, noticing them can be taught and learned. [16] In a related study, D. Robert Ladd et al. claim that most people can switch from listening for the pitch implied by the evident harmonics to hearing those component pitches spectrally. [17]
Timpani produce inharmonic overtones, but are constructed and tuned to produce near-harmonic overtones to an implied missing fundamental. Hit in the usual way (half to three-quarters the distance from the center to the rim), the fundamental note of a timpani is very weak in relation to its second through fifth "harmonic" overtones. [18] A timpani might be tuned to produce sound most strongly at 200, 302, 398, and 488 Hz, for instance, implying a missing fundamental at 100 Hz (though the actual dampened fundamental is 170 Hz). [19]
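Using the figures quoted above, the implied fundamental can be estimated by letting each partial "vote" for f/k, assuming it approximates the k-th harmonic. This averaging is a back-of-the-envelope sketch, not the method used in the cited work.

```python
# Timpani partials (Hz) from the example above, keyed by the harmonic
# numbers of the implied fundamental that they approximate
partials = {2: 200, 3: 302, 4: 398, 5: 488}

# Each partial "votes" for a fundamental of f / k; average the votes
estimates = [f / k for k, f in partials.items()]
implied = sum(estimates) / len(estimates)
print(round(implied, 1))  # 99.4
```

The estimate lands near the 100 Hz implied fundamental, well away from the actual dampened 170 Hz fundamental of the drumhead.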
A violin's lowest air and body resonances generally fall between 250 Hz and 300 Hz. The fundamental frequency of the open G3 string is below 200 Hz in modern tunings as well as most historical tunings, so the lowest notes of a violin have an attenuated fundamental, although listeners seldom notice this.
Most common telephones cannot reproduce sounds lower than about 300 Hz, yet a typical male voice has a fundamental frequency of approximately 150 Hz. Because of the missing fundamental effect, the pitch of a male voice is still perceived correctly over the telephone. [20]
The missing fundamental phenomenon is used electronically by some pro audio manufacturers to allow sound systems to seem to produce notes that are lower in pitch than they are capable of reproducing. [21] In a hardware effects unit or a software plugin, a crossover filter is set at a low frequency above which the sound system is capable of safely reproducing tones. Musical signal content above the high-pass part of the crossover filter is sent to the main output which is amplified by the sound system. Low frequency content below the low-pass part of the crossover filter is sent to a circuit where harmonics are synthesized above the low notes. The newly created harmonics are mixed back into the main output to create a perception of the filtered-out low notes. [22] Using a device with this synthetic process can reduce complaints from low frequency noise carrying through walls and it can be employed to reduce low frequency content in loud music that might otherwise vibrate and damage breakable valuables. [23]
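A minimal sketch of the harmonic-synthesis step, assuming full-wave rectification as the nonlinearity (commercial products use proprietary, tuned processing and filtering): a 60 Hz tone, taken here to lie below the crossover frequency, is rectified so that its energy reappears at 120 Hz and above, which the target speaker can reproduce.

```python
import numpy as np

fs = 8000                 # sample rate in Hz
t = np.arange(4000) / fs  # half a second of signal

# A 60 Hz bass note, assumed to lie below the crossover frequency and
# therefore beyond what the small speaker can reproduce
bass = np.sin(2 * np.pi * 60 * t)

# Full-wave rectification is one simple nonlinearity that synthesizes
# harmonics (120, 240, 360 Hz, ...) from the filtered-out low note
harmonics = np.abs(bass)

# Ignoring the DC bin (which a high-pass stage would remove), the
# strongest synthesized component sits at 120 Hz
spectrum = np.abs(np.fft.rfft(harmonics))
freqs = np.fft.rfftfreq(len(t), 1 / fs)
peak = freqs[1 + np.argmax(spectrum[1:])]
print(peak)  # ~120 Hz
```

Mixing these synthesized harmonics back into the high-passed main signal creates the perception of the filtered-out 60 Hz note.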
Some pipe organs make use of this phenomenon as a resultant tone, which allows relatively smaller bass pipes to produce very low-pitched sounds.
The reconstruction of a missing fundamental from the overtones present in a tone has been used to create the illusion of bass in sound systems that are incapable of such bass. In mid-1999, Meir Shashoua of Tel Aviv, co-founder of Waves Audio, patented an algorithm to create the sense of the missing fundamental by synthesizing higher harmonics. [24] Waves Audio released the MaxxBass plug-in to allow computer users to apply the synthesized harmonics to their audio files. Later, Waves Audio produced small subwoofers that relied on the missing fundamental concept to give the illusion of low bass. [25] Both products processed certain overtones selectively to help small loudspeakers, which could not reproduce low-frequency components, sound as if they were capable of low bass. Both products included a high-pass filter which greatly attenuated the low-frequency tones that were expected to be beyond the capabilities of the target sound system. [26] One example of a popular song recorded with MaxxBass processing is "Lady Marmalade", the 2001 Grammy award-winning version sung by Christina Aguilera, Lil' Kim, Mýa, and Pink, produced by Missy Elliott. [26]
Other software and hardware companies have developed their own versions of missing fundamental-based bass augmentation products. The poor bass reproduction of earbuds has been identified as a possible target for such processing. [27] Many computer sound systems are not capable of low bass, and songs offered to consumers via computer have been identified as ones that may benefit from augmented bass harmonics processing. [28]
In speech science and phonetics, a formant is the broad spectral maximum that results from an acoustic resonance of the human vocal tract. In acoustics, a formant is usually defined as a broad peak, or local maximum, in the spectrum. For harmonic sounds, with this definition, the formant frequency is sometimes taken as that of the harmonic that is most augmented by a resonance. The difference between these two definitions resides in whether "formants" characterise the production mechanisms of a sound or the produced sound itself. In practice, the frequency of a spectral peak differs slightly from the associated resonance frequency, except when, by luck, harmonics are aligned with the resonance frequency, or when the sound source is mostly non-harmonic, as in whispering and vocal fry.
In physics, acoustics, and telecommunications, a harmonic is a sinusoidal wave with a frequency that is a positive integer multiple of the fundamental frequency of a periodic signal. The fundamental frequency is also called the 1st harmonic; the other harmonics are known as higher harmonics. As all harmonics are periodic at the fundamental frequency, the sum of harmonics is also periodic at that frequency. The set of harmonics forms a harmonic series.
In music, timbre, also known as tone color or tone quality, is the perceived sound quality of a musical note, sound or tone. Timbre distinguishes different types of sound production, such as choir voices and musical instruments. It also enables listeners to distinguish different instruments in the same category.
Pitch is a perceptual property that allows sounds to be ordered on a frequency-related scale, or more commonly, pitch is the quality that makes it possible to judge sounds as "higher" and "lower" in the sense associated with musical melodies. Pitch is a major auditory attribute of musical tones, along with duration, loudness, and timbre.
Auditory illusions are false perceptions of a real sound or outside stimulus. They are the auditory equivalent of optical illusions: the listener hears either sounds which are not present in the stimulus, or sounds that should not be possible given the circumstances of how they were created.
The tritone paradox is an auditory illusion in which a sequentially played pair of Shepard tones separated by an interval of a tritone, or half octave, is heard as ascending by some people and as descending by others. Different populations tend to favor one of a limited set of different spots around the chromatic circle as central to the set of "higher" tones. Roger Shepard in 1963 had argued that such tone pairs would be heard ambiguously as either ascending or descending. However, psychology of music researcher Diana Deutsch in 1986 discovered that when the judgments of individual listeners were considered separately, their judgments depended on the positions of the tones along the chromatic circle. For example, one listener would hear the tone pair C–F♯ as ascending and the tone pair G–C♯ as descending. Yet another listener would hear the tone pair C–F♯ as descending and the tone pair G–C♯ as ascending. Furthermore, the way these tone pairs were perceived varied depending on the listener's language or dialect.
In psychoacoustics, a pure tone is a sound with a sinusoidal waveform; that is, a sine wave of constant frequency, phase-shift, and amplitude. By extension, in signal processing a single-frequency tone or pure tone is a purely sinusoidal signal. A pure tone has the property – unique among real-valued wave shapes – that its wave shape is unchanged by linear time-invariant systems; that is, only the phase and amplitude change between such a system's pure-tone input and its output.
An equal-loudness contour is a measure of sound pressure level, over the frequency spectrum, for which a listener perceives a constant loudness when presented with pure steady tones. The unit of measurement for loudness levels is the phon and is arrived at by reference to equal-loudness contours. By definition, two sine waves of differing frequencies are said to have equal-loudness level measured in phons if they are perceived as equally loud by the average young person without significant hearing impairment.
A pitch detection algorithm (PDA) is an algorithm designed to estimate the pitch or fundamental frequency of a quasiperiodic or oscillating signal, usually a digital recording of speech or a musical note or tone. This can be done in the time domain, the frequency domain, or both.
In audiology and psychoacoustics the concept of critical bands, introduced by Harvey Fletcher in 1933 and refined in 1940, describes the frequency bandwidth of the "auditory filter" created by the cochlea, the sense organ of hearing within the inner ear. Roughly, the critical band is the band of audio frequencies within which a second tone will interfere with the perception of the first tone by auditory masking.
Volley theory states that groups of neurons of the auditory system respond to a sound by firing action potentials slightly out of phase with one another so that when combined, a greater frequency of sound can be encoded and sent to the brain to be analyzed. The theory was proposed by Ernest Wever and Charles Bray in 1930 as a supplement to the frequency theory of hearing. It was later discovered that this only occurs in response to sounds that are about 500 Hz to 5000 Hz.
The temporal theory of hearing, also called frequency theory or timing theory, states that human perception of sound depends on temporal patterns with which neurons respond to sound in the cochlea. Therefore, in this theory, the pitch of a pure tone is determined by the period of neuron firing patterns—either of single neurons, or groups as described by the volley theory. Temporal theory competes with the place theory of hearing, which instead states that pitch is signaled according to the locations of vibrations along the basilar membrane.
Bandwidth extension of a signal is the deliberate process of expanding the frequency range (bandwidth) over which the signal contains appreciable and useful content. Its significant advancement in recent years has led to the technology being adopted commercially in several areas, including psychoacoustic bass enhancement of small loudspeakers and the high-frequency enhancement of coded speech and audio.
Computational auditory scene analysis (CASA) is the study of auditory scene analysis by computational means. In essence, CASA systems are "machine listening" systems that aim to separate mixtures of sound sources in the same way that human listeners do. CASA differs from the field of blind signal separation in that it is based on the mechanisms of the human auditory system, and thus uses no more than two microphone recordings of an acoustic environment. It is related to the cocktail party problem.
In audio signal processing, auditory masking occurs when the perception of one sound is affected by the presence of another sound.
In physics, sound is a vibration that propagates as an acoustic wave through a transmission medium such as a gas, liquid or solid. In human physiology and psychology, sound is the reception of such waves and their perception by the brain. Only acoustic waves that have frequencies lying between about 20 Hz and 20 kHz, the audio frequency range, elicit an auditory percept in humans. In air at atmospheric pressure, these represent sound waves with wavelengths of 17 meters (56 ft) to 1.7 centimeters (0.67 in). Sound waves above 20 kHz are known as ultrasound and are not audible to humans. Sound waves below 20 Hz are known as infrasound. Different animal species have varying hearing ranges.
Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branch of science studying the psychological responses associated with sound including noise, speech, and music. Psychoacoustics is an interdisciplinary field including psychology, acoustics, electronic engineering, physics, biology, physiology, and computer science.
Ernst Terhardt is a German engineer and psychoacoustician who made significant contributions in diverse areas of audio communication including pitch perception, music cognition, and Fourier transformation. He was professor in the area of acoustic communication at the Institute of Electroacoustics, Technical University of Munich, Germany.
Temporal envelope (ENV) and temporal fine structure (TFS) are changes in the amplitude and frequency of sound perceived by humans over time. These temporal changes are responsible for several aspects of auditory perception, including loudness, pitch and timbre perception and spatial hearing.
Brian C.J. Moore FMedSci, FRS is an Emeritus Professor of Auditory Perception in the University of Cambridge and an Emeritus Fellow of Wolfson College, Cambridge. His research focuses on psychoacoustics, audiology, and the development and assessment of hearing aids.