Volley theory states that groups of neurons of the auditory system respond to a sound by firing action potentials slightly out of phase with one another so that when combined, a greater frequency of sound can be encoded and sent to the brain to be analyzed. The theory was proposed by Ernest Wever and Charles Bray in 1930 [1] as a supplement to the frequency theory of hearing. It was later discovered that this only occurs in response to sounds that are about 500 Hz to 5000 Hz.
The volley theory was explained in depth in Ernest Wever's 1949 book, Theory of Hearing. [2] Groups of neurons in the cochlea individually fire at subharmonic frequencies of a sound being heard and collectively phase-lock to match the total frequencies of the sound. The reason for this is that an individual neuron can fire at a maximum of only about 500 Hz, yet other theories of hearing did not account for the hearing of sounds below about 5000 Hz.
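The volley idea can be illustrated with a small idealized simulation. The sketch below is not taken from Wever's work: the function name `volley`, the perfectly regular spike timing, and the 2000 Hz example tone are assumptions made for illustration, and the 500 Hz ceiling is the approximate per-neuron limit quoted above. Each simulated fiber fires on every n-th cycle of the tone, staggered by one cycle from its neighbour, so that the pooled spike train has one spike per cycle.

```python
import math

def volley(tone_hz, max_rate_hz, duration_s):
    """Return (per-fiber spike times, pooled spike times) for an idealized volley.

    Assumption: every fiber fires with perfect regularity, which real
    auditory nerve fibers do not.
    """
    # Smallest number of fibers such that each one stays at or below max_rate_hz.
    n_fibers = math.ceil(tone_hz / max_rate_hz)
    n_cycles = round(duration_s * tone_hz)
    period = 1.0 / tone_hz
    # Fiber i fires on cycles i, i + n_fibers, i + 2 * n_fibers, ...
    fibers = [
        [k * period for k in range(i, n_cycles, n_fibers)]
        for i in range(n_fibers)
    ]
    # Pooling the fibers gives one spike on every cycle of the tone.
    combined = sorted(t for spikes in fibers for t in spikes)
    return fibers, combined

DURATION_S = 0.01
fibers, combined = volley(tone_hz=2000.0, max_rate_hz=500.0, duration_s=DURATION_S)
per_fiber_rate = len(fibers[0]) / DURATION_S  # firing rate of a single fiber
combined_rate = len(combined) / DURATION_S    # rate of the pooled volley
print(len(fibers), per_fiber_rate, combined_rate)
```

In this toy setting no single fiber exceeds the assumed ceiling, while the pooled train follows the tone cycle by cycle.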
Sounds are often sums of multiple frequency tones. When these frequencies are whole number multiples of a fundamental frequency they create a harmonic. When groups of auditory neurons are presented with harmonics, each neuron fires at one frequency and when combined, the entire harmonic is encoded into the primary auditory cortex of the brain. This is the basis of volley theory.
Phase-locking is known as matching amplitude times to a certain phase of another waveform. In the case of auditory neurons, this means firing an action potential at a certain phase of a stimulus sound being delivered. It has been observed that, when presented with a pure tone, auditory nerve fibers fire at the same frequency as the tone. [3] Volley theory suggests that groups of auditory neurons use phase-locking to represent subharmonic frequencies of one harmonic sound. This has been shown in guinea pig and cat models.
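One common way to quantify phase-locking is the vector strength of a spike train relative to the stimulus frequency. This measure is not mentioned in the text above and is introduced here only as an illustration. Each spike is mapped to a phase on the unit circle and the length of the mean vector is taken: a value near 1 means the spikes cluster at one phase, and a value near 0 means they are spread over the whole cycle. The spike trains `locked` and `spread` and the 400 Hz tone are made-up examples.

```python
import math

def vector_strength(spike_times, freq_hz):
    """Length of the mean phase vector of the spikes relative to freq_hz (0 to 1)."""
    if not spike_times:
        return 0.0
    phases = [2.0 * math.pi * freq_hz * t for t in spike_times]
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return math.hypot(c, s) / len(spike_times)

TONE_HZ = 400.0
# Hypothetical fiber that always fires at the same phase of the tone.
locked = [k / TONE_HZ + 0.0005 for k in range(100)]
# Hypothetical fiber whose spikes fall evenly on four phases of the cycle.
spread = [k / (4.0 * TONE_HZ) for k in range(400)]

vs_locked = vector_strength(locked, TONE_HZ)
vs_spread = vector_strength(spread, TONE_HZ)
print(vs_locked, vs_spread)
```

The first train gives a vector strength close to 1 and the second a value close to 0, which is the sense in which a spike train is or is not phase-locked to the stimulus.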
In 1980, Don Johnson experimentally revealed phase-locking in the auditory nerve fibers of the adult cat. [4] In the presence of -40 to -100 decibel single tones lasting 15 or 30 seconds, recordings from the auditory nerve fibers showed firing fluctuations in synchrony with the stimulus. Johnson observed that at frequencies below 1000 Hz, two peaks are recorded for every cycle of the stimulus, with phases that vary according to stimulation frequency. This phenomenon was interpreted as the result of a second harmonic phase-locking to the stimulus waveform. However, at frequencies between about 1000 Hz and 5000 Hz, phase-locking becomes progressively inaccurate and intervals tend to become more random. [5]
Pitch is an assigned, perceptual property by which a listener orders sound frequencies from low to high. Pitch is hypothesized to be determined by receiving phase-locked input from neuronal axons and combining that information into harmonics. In simple sounds consisting of one frequency, the pitch is equivalent to the frequency. There are two models of pitch perception: a spectral model and a temporal model. Low frequency sounds evoke the strongest pitches, suggesting that pitch is based on the temporal components of the sound. [6] Historically, there have been many models of pitch perception (Terhardt, 1974; [7] Goldstein, 1973; [8] Wightman, 1973). Many consisted of a peripheral spectral-analysis stage and a central periodicity-analysis stage. In his model, Terhardt claims that the spectral-analysis output of complex sounds, specifically low frequency ones, is a learned entity which eventually allows easy identification of the virtual pitch. [7] The volley principle is predominantly seen during the pitch perception of lower frequencies, where sounds are often resolved. [9] Goldstein proposed that, through phase-locking and temporal frequencies encoded in neuron firing rates, the brain obtains an itemization of frequencies that can then be used to estimate pitch. [8]
Throughout the nineteenth century, many theories and concepts of hearing were created. Ernest Wever, who had first proposed the volley theory with Charles Bray in 1930, set it out in the 1937 paper "The Perception of Low Tones and the Resonance-Volley Theory". [1] In this paper, Wever discusses previous theories of hearing and introduces volley theory using support from his own experiments and research. The theory was introduced as a supplement to the frequency theory or temporal theory of hearing, which was in contrast to the place theory of hearing.
The most prominent figure in the creation of the place theory of hearing is Hermann von Helmholtz, who published his finished theory in 1885. Helmholtz claimed that the cochlea contained individual fibers for analyzing each pitch and delivering that information to the brain. Many followers revised and added to Helmholtz's theory and the consensus soon became that high frequency sounds were encoded near the base of the cochlea and that middle frequency sounds were encoded near the apex. Georg von Békésy developed a novel method of dissecting the inner ear and using stroboscopic illumination to observe the basilar membrane move, adding evidence to support the theory. [10]
Ideas related to the frequency theory of hearing came about in the late 1800s as a result of the research of many individuals. In 1865, Heinrich Adolf Rinne challenged the place theory; he claimed that it is not very efficient for complex sounds to be broken into simple sounds and then reconstructed in the brain. Later, Friedrich Voltolini added to this by proposing that every auditory hair cell is stimulated by any sound. Correspondingly, William Rutherford provided evidence that this hypothesis was true, allowing greater accuracy of the cochlea. In 1886, Rutherford also proposed that the brain interpreted the vibrations of the hair cells and that the cochlea did no frequency or pitch analysis of the sound. Soon after, Max Friedrich Meyer, among other ideas, theorized that nerves would be excited at the same frequency as the stimulus. [10]
Out of the various theories and notions created by Rinne, Rutherford, and their followers, the frequency theory was born. In general, it claimed that all sounds were encoded to the brain by neurons firing at a rate that mimics the frequency of the sound. However, because humans can hear frequencies up to 20,000 Hz but neurons cannot fire at these rates, the frequency theory had a major flaw. In an effort to address this flaw, Ernest Wever and Charles Bray proposed the volley theory in 1930, claiming that multiple neurons could fire in a volley whose combined output equals the frequency of the original sound stimulus. Through more research, it was determined that because phase synchrony is only accurate up to about 1000 Hz, volley theory cannot account for all frequencies at which we hear. [10]
Ultimately, as new methods of studying the inner ear came about, a combination of place theory and frequency theory was adopted. Today, it is widely believed that hearing follows the rules of the frequency theory, including volley theory, at frequencies below 1000 Hz and place theory at frequencies above 5000 Hz. For sounds with frequencies between 1000 and 5000 Hz, both theories come into play so the brain can utilize the basilar membrane location and the rate of the impulse. [10]
Due to the invasiveness of most hearing-related experiments, it is difficult to use human models in the study of the auditory system. However, many findings have been revealed in cats and guinea pigs. Additionally, there are few ways to study the basilar membrane in vivo.
Many revolutionary concepts regarding hearing and encoding sound in the brain were founded in the late nineteenth and early twentieth centuries. Various tools were used to induce a response in auditory nerves that was then recorded. Experiments by Helmholtz, Wever, and Bray often involved the use of organ pipes, stretched springs, loaded reeds, lamellas, vibrating forks, beats, and interruption tones to create “clicks”, harmonics, or pure tones. [11] Today, electronic oscillators are often used to create sinusoidal or square waves of precise frequencies.
Attempts to electrically record from the auditory nerve began as early as 1896. Electrodes were placed into the auditory nerve of various animal models to give insight on the rate at which the neurons are firing. In a 1930 experiment involving the auditory nerve of a cat, Wever and Bray found that 100–5000 Hz sounds played to the cat produced similar frequency firing in the nerve. This supported the frequency theory and the volley theory. [10]
Pioneered by Georg von Békésy, a method to observe the basilar membrane in action came about in the mid-1900s. Békésy isolated the cochlea from human and animal cadavers and labeled the basilar membrane with silver flakes. This allowed strobe imaging to capture the movement of the membrane as sounds stimulated the hair cells. This solidified the idea that high frequencies excite the basal end of the cochlea and provided new information that low frequencies excite a large area of the cochlea. This new finding suggested that specialized properties are at work in high frequency hearing and that low frequencies involve mechanisms explained in the frequency theory. [10]
A fundamental frequency is the lowest frequency of a harmonic. In some cases, a sound can have all the frequencies of a harmonic but be missing the fundamental frequency; this is known as a missing fundamental. When listening to a sound with a missing fundamental, the human brain still receives information for all frequencies, including the fundamental frequency, which does not exist in the sound. [12] This implies that sound is encoded by neurons firing at all frequencies of a harmonic; therefore, the neurons must be locked in some way to result in the hearing of one sound. [8]
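The missing fundamental can be demonstrated numerically. The sketch below builds a tone from 400, 600, and 800 Hz components (harmonics of 200 Hz, with no 200 Hz component) and estimates its periodicity by autocorrelation. Autocorrelation is used here only as a simple stand-in for a temporal periodicity analysis and is not a claim about how neurons perform the computation. The function name `autocorr_pitch`, the 8000 Hz sample rate, and the 150 to 500 Hz search range are assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 8000            # samples per second (assumed for the example)
HARMONICS_HZ = (400.0, 600.0, 800.0)  # harmonics of 200 Hz; 200 Hz itself is absent

# 50 ms of the complex tone.
signal = [
    sum(math.sin(2.0 * math.pi * f * n / SAMPLE_RATE) for f in HARMONICS_HZ)
    for n in range(400)
]

def autocorr_pitch(samples, sample_rate, min_hz, max_hz):
    """Estimate the repetition rate as the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    window = len(samples) - max_lag  # same number of products for every lag

    def score(lag):
        return sum(samples[n] * samples[n + lag] for n in range(window))

    best_lag = max(range(min_lag, max_lag + 1), key=score)
    return sample_rate / best_lag

pitch_hz = autocorr_pitch(signal, SAMPLE_RATE, min_hz=150.0, max_hz=500.0)
print(pitch_hz)
```

The waveform repeats every 1/200 of a second even though it contains no energy at 200 Hz, so a periodicity-based estimate lands on the absent fundamental rather than on any of the components that are present.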
Congenital deafness or sensorineural hearing loss is an often-used model for the study of the inner ear regarding pitch perception and theories of hearing in general. Frequency analysis of these individuals’ hearing has given insight into common deviations from normal tuning curves, [13] excitation patterns, and frequency discrimination ranges. By applying pure or complex tones, information on pitch perception can be obtained. In 1983, it was shown that subjects with low frequency sensorineural hearing loss demonstrated abnormal psychophysical tuning curves. Despite the changes in their spatial responses, these subjects showed pitch judgment abilities similar to those of subjects with normal spatial responses. This was especially true regarding low frequency stimuli. These results suggest that the place theory of hearing does not explain pitch perception at low frequencies, but that the temporal (frequency) theory is more likely. This conclusion is due to the finding that when deprived of basilar membrane place information, these patients still demonstrated normal pitch perception. [14] Computer models for pitch perception and loudness perception are often used during hearing studies on acoustically impaired subjects. The combination of this modeling and knowledge of natural hearing allows for better development of hearing aids. [15]