Psychoacoustics is the branch of psychophysics involving the scientific study of sound perception and audiology—how humans perceive various sounds. More specifically, it is the branch of science studying the psychological responses associated with sound (including noise, speech, and music). Psychoacoustics is an interdisciplinary field of many areas, including psychology, acoustics, electronic engineering, physics, biology, physiology, and computer science.
Hearing is not a purely mechanical phenomenon of wave propagation, but is also a sensory and perceptual event; in other words, when a person hears something, that something arrives at the ear as a mechanical sound wave traveling through the air, but within the ear it is transformed into neural action potentials. The outer hair cells (OHC) of a mammalian cochlea give rise to an enhanced sensitivity and better[ clarification needed ] frequency resolution of the mechanical response of the cochlear partition. These nerve pulses then travel to the brain where they are perceived. Hence, in many problems in acoustics, such as for audio processing, it is advantageous to take into account not just the mechanics of the environment, but also the fact that both the ear and the brain are involved in a person's listening experience.[ clarification needed ][ citation needed ]
The inner ear, for example, does significant signal processing in converting sound waveforms into neural stimuli, so certain differences between waveforms may be imperceptible.Data compression techniques, such as MP3, make use of this fact. In addition, the ear has a nonlinear response to sounds of different intensity levels; this nonlinear response is called loudness. Telephone networks and audio noise reduction systems make use of this fact by nonlinearly compressing data samples before transmission, and then expanding them for playback. Another effect of the ear's nonlinear response is that sounds that are close in frequency produce phantom beat notes, or intermodulation distortion products.
The term "psychoacoustics" also arises in discussions about cognitive psychology and the effects that personal expectations, prejudices, and predispositions may have on listeners' relative evaluations and comparisons of sonic aesthetics and acuity and on listeners' varying determinations about the relative qualities of various musical instruments and performers. The expression that one "hears what one wants (or expects) to hear" may pertain in such discussions.[ citation needed ]
The human ear can nominally hear sounds in the range 20 Hz (0.02 kHz) to 20,000 Hz(20 kHz). The upper limit tends to decrease with age; most adults are unable to hear above 16 kHz. The lowest frequency that has been identified as a musical tone is 12 Hz under ideal laboratory conditions. Tones between 4 and 16 Hz can be perceived via the body's sense of touch.
Frequency resolution of the ear is about 3.6 Hz within the octave of 1000–2000 Hz. That is, changes in pitch larger than 3.6 Hz can be perceived in a clinical setting. However, even smaller pitch differences can be perceived through other means. For example, the interference of two pitches can often be heard as a repetitive variation in volume of the tone. This amplitude modulation occurs with a frequency equal to the difference in frequencies of the two tones and is known as beating.
The semitone scale used in Western musical notation is not a linear frequency scale but logarithmic. Other scales have been derived directly from experiments on human hearing perception, such as the mel scale and Bark scale (these are used in studying perception, but not usually in musical composition), and these are approximately logarithmic in frequency at the high-frequency end, but nearly linear at the low-frequency end.
The intensity range of audible sounds is enormous. Human ear drums are sensitive to variations in the sound pressure, and can detect pressure changes from as small as a few micropascals (μPa) to greater than 100 kPa. For this reason, sound pressure level is also measured logarithmically, with all pressures referenced to 20 μPa (or 1.97385×10−10 atm). The lower limit of audibility is therefore defined as 0 dB, but the upper limit is not as clearly defined. The upper limit is more a question of the limit where the ear will be physically harmed or with the potential to cause noise-induced hearing loss.
A more rigorous exploration of the lower limits of audibility determines that the minimum threshold at which a sound can be heard is frequency dependent. By measuring this minimum intensity for testing tones of various frequencies, a frequency dependent absolute threshold of hearing (ATH) curve may be derived. Typically, the ear shows a peak of sensitivity (i.e., its lowest ATH) between 1–5 kHz, though the threshold changes with age, with older ears showing decreased sensitivity above 2 kHz.
The ATH is the lowest of the equal-loudness contours. Equal-loudness contours indicate the sound pressure level (dB SPL), over the range of audible frequencies, that are perceived as being of equal loudness. Equal-loudness contours were first measured by Fletcher and Munson at Bell Labs in 1933 using pure tones reproduced via headphones, and the data they collected are called Fletcher–Munson curves. Because subjective loudness was difficult to measure, the Fletcher–Munson curves were averaged over many subjects.
Robinson and Dadson refined the process in 1956 to obtain a new set of equal-loudness curves for a frontal sound source measured in an anechoic chamber. The Robinson-Dadson curves were standardized as ISO 226 in 1986. In 2003, ISO 226 was revised as equal-loudness contour using data collected from 12 international studies.
Sound localization is the process of determining the location of a sound source. The brain utilizes subtle differences in loudness, tone and timing between the two ears to allow us to localize sound sources.Localization can be described in terms of three-dimensional position: the azimuth or horizontal angle, the zenith or vertical angle, and the distance (for static sounds) or velocity (for moving sounds). Humans, as most four-legged animals, are adept at detecting direction in the horizontal, but less so in the vertical due to the ears being placed symmetrically. Some species of owls have their ears placed asymmetrically, and can detect sound in all three planes, an adaption to hunt small mammals in the dark.
Suppose a listener can hear a given acoustical signal under silent condition. When a signal is playing while another sound is being played (a masker), the signal has to be stronger for the listener to hear it. The masker does not need to have the frequency components of the original signal for masking to happen. A masked signal can be heard even though it is weaker than the masker. Masking happens when a signal and a masker are played together—for instance, when one person whispers while another person shouts—and the listener doesn't hear the weaker signal as it has been masked by the louder masker. Masking can also happen when a signal starts after a masker stops. For example, a single sudden loud clap sound can make sounds that follow inaudible. The effects of backward masking is weaker than forward masking. The masking effect has been widely studied in psychoacoustical research. One can change the level of the masker and measure the threshold, then create a diagram of a psychophysical tuning curve that will reveal similar features. Masking effects are also used in lossy audio encoding, such as MP3.
When presented with a harmonic series of frequencies in the relationship 2f, 3f, 4f, 5f, etc. (where f is a specific frequency), humans tend to perceive that the pitch is f. An audible example can be found on YouTube.
The psychoacoustic model provides for high quality lossy signal compression by describing which parts of a given digital audio signal can be removed (or aggressively compressed) safely—that is, without significant losses in the (consciously) perceived quality of the sound.
It can explain how a sharp clap of the hands might seem painfully loud in a quiet library, but is hardly noticeable after a car backfires on a busy, urban street. This provides great benefit to the overall compression ratio, and psychoacoustic analysis routinely leads to compressed music files that are 1/10th to 1/12th the size of high quality masters, but with discernibly less proportional quality loss. Such compression is a feature of nearly all modern lossy audio compression formats. Some of these formats include Dolby Digital (AC-3), MP3, Opus, Ogg Vorbis, AAC, WMA, MPEG-1 Layer II (used for digital audio broadcasting in several countries) and ATRAC, the compression used in MiniDisc and some Walkman models.
Psychoacoustics is based heavily on human anatomy, especially the ear's limitations in perceiving sound as outlined previously. To summarize, these limitations are:
A compression algorithm can assign a lower priority to sounds outside the range of human hearing. By carefully shifting bits away from the unimportant components and toward the important ones, the algorithm ensures that the sounds a listener is most likely to perceive are most accurately represented.
Psychoacoustics includes topics and studies that are relevant to music psychology and music therapy. Theorists such as Benjamin Boretz consider some of the results of psychoacoustics to be meaningful only in a musical context.
Irv Teibel's Environments series LPs (1969–79) are an early example of commercially available sounds released expressly for enhancing psychological abilities.
Psychoacoustics has long enjoyed a symbiotic relationship with computer science, computer engineering, and computer networking. Internet pioneers J. C. R. Licklider and Bob Taylor both completed graduate-level work in psychoacoustics, while BBN Technologies originally specialized in consulting on acoustics issues before it began building the first packet-switched network.
Licklider wrote a paper entitled "A duplex theory of pitch perception".
Psychoacoustics is applied within many fields of software development, where developers map proven and experimental mathematical patterns in digital signal processing. Many audio compression codecs such as MP3 and Opus use a psychoacoustic model to increase compression ratios. The success of conventional audio systems for the reproduction of music in theatres and homes can be attributed to psychoacousticsand psychoacoustic considerations gave rise to novel audio systems, such as psychoacoustic sound field synthesis. Furthermore, scientists have experimented with limited success in creating new acoustic weapons, which emit frequencies that may impair, harm, or kill. Psychoacoustics are also leveraged in sonification to make multiple independent data dimensions audible and easily interpretable. This enables auditory guidance without the need for spatial audio and in sonification computer games and other applications, such as drone flying and image-guided surgery. It is also applied today within music, where musicians and artists continue to create new auditory experiences by masking unwanted frequencies of instruments, causing other frequencies to be enhanced. Yet another application is in the design of small or lower-quality loudspeakers, which can use the phenomenon of missing fundamentals to give the effect of bass notes at lower frequencies than the loudspeakers are physically able to produce (see references).
Automobile manufacturers engineer their engines and even doors to have a certain sound.
Pitch is a perceptual property of sounds that allows their ordering on a frequency-related scale, or more commonly, pitch is the quality that makes it possible to judge sounds as "higher" and "lower" in the sense associated with musical melodies. Pitch can be determined only in sounds that have a frequency that is clear and stable enough to distinguish from noise. Pitch is a major auditory attribute of musical tones, along with duration, loudness, and timbre.
Auditory illusions are false perceptions of a real sound or outside stimulus. These false perceptions are the equivalent of an optical illusion: the listener hears either sounds which are not present in the stimulus, or sounds that should not be possible given the circumstance on how they were created.
A harmonic sound is said to have a missing fundamental, suppressed fundamental, or phantom fundamental when its overtones suggest a fundamental frequency but the sound lacks a component at the fundamental frequency itself. The brain perceives the pitch of a tone not only by its fundamental frequency, but also by the periodicity implied by the relationship between the higher harmonics; we may perceive the same pitch even if the fundamental frequency is missing from a tone.
The absolute threshold of hearing (ATH) is the minimum sound level of a pure tone that an average human ear with normal hearing can hear with no other sound present. The absolute threshold relates to the sound that can just be heard by the organism. The absolute threshold is not a discrete point, and is therefore classed as the point at which a sound elicits a response a specified percentage of the time. This is also known as the auditory threshold.
The Bark scale is a psychoacoustical scale proposed by Eberhard Zwicker in 1961. It is named after Heinrich Barkhausen who proposed the first subjective measurements of loudness. One definition of the term is "...a frequency scale on which equal distances correspond with perceptually equal distances. Above about 500 Hz this scale is more or less equal to a logarithmic frequency axis. Below 500 Hz the Bark scale becomes more and more linear."
In acoustics, loudness is the subjective perception of sound pressure. More formally, it is defined as, "That attribute of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to loud". The relation of physical attributes of sound to perceived loudness consists of physical, physiological and psychological components. The study of apparent loudness is included in the topic of psychoacoustics and employs methods of psychophysics.
Sonification is the use of non-speech audio to convey information or perceptualize data. Auditory perception has advantages in temporal, spatial, amplitude, and frequency resolution that open possibilities as an alternative or complement to visualization techniques.
Sound localization is a listener's ability to identify the location or origin of a detected sound in direction and distance.
An equal-loudness contour is a measure of sound pressure level, over the frequency spectrum, for which a listener perceives a constant loudness when presented with pure steady tones. The unit of measurement for loudness levels is the phon and is arrived at by reference to equal-loudness contours. By definition, two sine waves of differing frequencies are said to have equal-loudness level measured in phons if they are perceived as equally loud by the average young person without significant hearing impairment.
In audiology and psychoacoustics the concept of critical bands, introduced by Harvey Fletcher in 1933 and refined in 1940, describes the frequency bandwidth of the "auditory filter" created by the cochlea, the sense organ of hearing within the inner ear. Roughly, the critical band is the band of audio frequencies within which a second tone will interfere with the perception of the first tone by auditory masking.
Hearing range describes the range of frequencies that can be heard by humans or other animals, though it can also refer to the range of levels. The human range is commonly given as 20 to 20,000 Hz, although there is considerable variation between individuals, especially at high frequencies, and a gradual loss of sensitivity to higher frequencies with age is considered normal. Sensitivity also varies with frequency, as shown by equal-loudness contours. Routine investigation for hearing loss usually involves an audiogram which shows threshold levels relative to a normal.
In perception and psychophysics, auditory scene analysis (ASA) is a proposed model for the basis of auditory perception. This is understood as the process by which the human auditory system organizes sound into perceptually meaningful elements. The term was coined by psychologist Albert Bregman. The related concept in machine perception is computational auditory scene analysis (CASA), which is closely related to source separation and blind signal separation.
Auditory masking occurs when the perception of one sound is affected by the presence of another sound.
The hypersonic effect is a phenomenon reported in a controversial scientific study by Tsutomu Oohashi et al., which claims that, although humans cannot consciously hear ultrasound, the presence or absence of those frequencies has a measurable effect on their physiological and psychological reactions.
Virtual pitch is the pitch of a complex tone. Virtual pitch corresponds approximately to the fundamental of a harmonic series that is recognized among the audible partials. A virtual pitch may be perceived even if the perceived pattern is incomplete or mistuned. In that respect, virtual pitch perception is similar to other forms of pattern recognition. It corresponds to the phenomenon whereby one's brain extracts tones from everyday signals and music, even if parts of the signal are masked by other sounds. Virtual pitch is contrasted to spectral pitch, which is the pitch of a pure tone or spectral component. Virtual pitch is called "virtual" because there is no acoustical correlate at the frequency corresponding to the pitch: even when a virtual pitch corresponds to a physically present fundamental, as it often does in everyday harmonic complex tones, the exact virtual pitch depends on the exact frequencies of higher harmonics and is almost independent of the exact frequency of the fundamental.
In physics, sound is a vibration that propagates as an acoustic wave, through a transmission medium such as a gas, liquid or solid.
In signal processing, sub-band coding (SBC) is any form of transform coding that breaks a signal into a number of different frequency bands, typically by using a fast Fourier transform, and encodes each one independently. This decomposition is often the first step in data compression for audio and video signals.
Ernst Terhardt is a German engineer and psychoacoustician who made significant contributions in diverse areas of audio communication including pitch perception, music cognition, and Fourier transformation. He was professor in the area of acoustic communication at the Institute of Electroacoustics, Technical University of Munich, Germany.
Temporal envelope (ENV) and temporal fine structure (TFS) are changes in the amplitude and frequency of sound perceived by humans over time. These temporal changes are responsible for several aspects of auditory perception, including loudness, pitch and timbre perception and spatial hearing.
Apparent source width (ASW) is the audible impression of a spatially extended sound source. This psychoacoustic impression results from sound radiation characteristics and properties of an acoustic space. Wide sources are desired by listeners of music because these are associated with sound of acoustic music, opera, classical music, historically informed performance. Research concerning ASW comes from the field of room acoustics, architectural acoustics and auralization as well as musical acoustics, psychoacoustics and systematic musicology.
|Wikimedia Commons has media related to Psychoacoustics .|