Audio analysis refers to the extraction of information and meaning from audio signals for analysis, classification, storage, retrieval, synthesis, and other purposes. The observation media and interpretation methods vary: audio analysis can refer to the human ear and how people interpret the audible sound source, or to the use of technology such as an audio analyzer to evaluate other qualities of a sound source, such as amplitude, distortion, and frequency response. Once an audio source has been observed, the information revealed can be processed for logical, emotional, descriptive, or otherwise relevant interpretation by the user.
The most prevalent form of audio analysis derives from the sense of hearing, a type of sensory perception that occurs in much of the planet's fauna and a fundamental process of many living beings. Sounds made by the surrounding environment or by other living beings provide input to the hearing mechanism, from which the listener's brain interprets the sound and decides how to respond. Examples of functions include speech, the startle response, music listening, and more.
An inherent ability of humans, hearing is fundamental to communication across the globe, and the process of assigning meaning and value to speech is a complex but necessary function of the human body. The study of the auditory system has relied heavily on mathematics and on the analysis of sinusoidal vibrations and sounds. The Fourier transform has been an essential tool in understanding how the human ear processes moving air into the audible frequency range of about 20 to 20,000 Hz. [1] The ear is able to decompose one complex waveform into its constituent frequency ranges thanks to structures of the inner ear, notably along the cochlea's basilar membrane, that are tuned to specific frequency ranges. [2] This initial sensory input is then analyzed further up the neurological system, where the perception of sound takes place.
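As a concrete illustration of this Fourier decomposition (a mathematical analogy, not the ear's physiological mechanism), the following sketch, assuming NumPy and an illustrative sample rate, separates a two-tone waveform into its component frequencies:

```python
# A two-tone "complex waveform": 440 Hz plus a quieter 1000 Hz component.
import numpy as np

fs = 8000                          # sample rate in Hz (illustrative)
t = np.arange(fs) / fs             # one second of samples
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.fft.rfft(x)                    # one-sided FFT of the real signal
freqs = np.fft.rfftfreq(len(x), 1 / fs)      # frequency of each FFT bin

# The two strongest bins recover the 440 Hz and 1000 Hz components.
peaks = freqs[np.argsort(np.abs(spectrum))[-2:]]
print(sorted(peaks.tolist()))                # -> [440.0, 1000.0]
```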
The auditory system also works in tandem with the neural system so that the listener can locate the direction from which a sound source originated. This ability, known as sound localization, is possible because of the listener's two ears, or auditory receptors: the difference in the time it takes a sound to reach each ear, together with the level difference between them, provides the information the brain needs to calculate the spatial position of the source. [3] A closely related phenomenon is the precedence (Haas) effect, in which the first-arriving wavefront dominates the perceived direction of a sound.
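The following hedged sketch, assuming NumPy and an illustrative two-receptor geometry, estimates a source's azimuth from the arrival-time difference between two signals using cross-correlation:

```python
import numpy as np

fs = 48000                         # sample rate, Hz
c = 343.0                          # speed of sound in air, m/s
d = 0.18                           # assumed spacing of the two receptors, m

rng = np.random.default_rng(0)
source = rng.standard_normal(4800)           # 0.1 s of broadband test sound
delay = 10                                   # true lag in samples (left leads)
left = source
right = np.concatenate([np.zeros(delay), source[:-delay]])

# The peak of the cross-correlation gives the estimated lag in samples.
corr = np.correlate(right, left, mode="full")
lag = int(np.argmax(corr)) - (len(left) - 1)
itd = lag / fs                               # interaural time difference, s
azimuth = np.degrees(np.arcsin(np.clip(itd * c / d, -1.0, 1.0)))
print(lag, round(float(azimuth), 1))         # -> 10, about 23.4 degrees
```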
Audio signals can be analyzed in several different ways, depending on the kind of information desired from the signal.
Types of signal analysis include the hardware, software, and perceptual approaches surveyed below.
Hardware analyzers have been the primary means of signal analysis since the earliest electronic audio test equipment, such as Hewlett-Packard's HP200A. They are typically used in the engineering, testing, and manufacturing of professional and consumer-grade products. As computer technology progressed, integrated software found its way into these hardware systems, and eventually audio analysis tools appeared that required no hardware beyond the computer running the software. Software audio analyzers are regularly used in various stages of music production, such as live audio, mixing, and mastering. These products tend to employ fast Fourier transform (FFT) algorithms to provide a visual representation of the signal being analyzed. Display and information types include frequency spectrum, stereo field, surround field, spectrogram, and more.
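A minimal sketch of what such a software analyzer does internally, assuming NumPy and SciPy: frames of the signal are windowed, transformed with an FFT, and accumulated into a spectrogram for display.

```python
import numpy as np
from scipy import signal

fs = 16000
t = np.arange(2 * fs) / fs
# A swept tone (200 Hz to 4 kHz) stands in for programme material.
x = signal.chirp(t, f0=200, f1=4000, t1=2.0, method="logarithmic")

f, frames, sxx = signal.spectrogram(x, fs=fs, nperseg=1024, noverlap=512)
# sxx[i, j] is the power near frequency f[i] in the frame at time frames[j];
# an on-screen analyzer renders this matrix as a scrolling image.
print(sxx.shape)                   # -> (513, 61): 513 frequency bins per frame
```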
A head-related transfer function (HRTF) is a response that characterizes how an ear receives a sound from a point in space. As sound strikes the listener, the size and shape of the head, ears, and ear canal, the density of the head, and the size and shape of the nasal and oral cavities all transform the sound and affect how it is perceived, boosting some frequencies and attenuating others. Generally speaking, the HRTF boosts frequencies from 2–5 kHz, with a primary resonance of +17 dB at 2,700 Hz. But the response curve is more complex than a single bump: it affects a broad frequency spectrum and varies significantly from person to person.
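As a crude illustration only, the sketch below (assuming SciPy, and an arbitrarily chosen Q of 1) imposes a single +17 dB peaking-filter resonance at 2.7 kHz on a test signal; a real HRTF is a far more complex, individual response.

```python
import numpy as np
from scipy.signal import lfilter

def peaking_biquad(f0, gain_db, q, fs):
    """Peaking-EQ biquad coefficients (Audio EQ Cookbook formulas)."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return b / a[0], a / a[0]

fs = 48000
b, a = peaking_biquad(f0=2700, gain_db=17.0, q=1.0, fs=fs)  # Q=1 is a guess
x = np.random.default_rng(0).standard_normal(fs)            # 1 s of noise
y = lfilter(b, a, x)   # the noise now carries the 2.7 kHz resonance
```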
In signal processing and electronics, the frequency response of a system is the quantitative measure of the magnitude and phase of its output as a function of input frequency. The frequency response is widely used in the design and analysis of systems such as audio and control systems, where it simplifies mathematical analysis by converting governing differential equations into algebraic equations. In an audio system, it may be used to minimize audible distortion by designing components so that the overall response is as flat (uniform) as possible across the system's bandwidth. In control systems, such as a vehicle's cruise control, it may be used to assess system stability, often through the use of Bode plots. Systems with a specific frequency response can be designed using analog and digital filters.
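A short sketch of such a measurement, assuming SciPy: a simple low-pass filter stands in for the system under test, and its magnitude and phase are evaluated as functions of frequency.

```python
import numpy as np
from scipy import signal

fs = 48000
b, a = signal.butter(4, 1000, btype="low", fs=fs)  # 4th-order 1 kHz low-pass
w, h = signal.freqz(b, a, worN=2048, fs=fs)        # complex response H(f)

magnitude_db = 20 * np.log10(np.abs(h))            # gain in dB vs frequency
phase_deg = np.degrees(np.unwrap(np.angle(h)))     # phase in degrees

# A Butterworth filter is 3 dB down at its cutoff (a "flatness" check).
idx = int(np.argmin(np.abs(w - 1000)))
print(round(float(magnitude_db[idx]), 1))          # -> approximately -3 dB
```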
A spectrum analyzer measures the magnitude of an input signal versus frequency within the full frequency range of the instrument. The primary use is to measure the power of the spectrum of known and unknown signals. The input signal that most common spectrum analyzers measure is electrical; however, spectral compositions of other signals, such as acoustic pressure waves and optical light waves, can be considered through the use of an appropriate transducer. Spectrum analyzers for other types of signals also exist, such as optical spectrum analyzers which use direct optical techniques such as a monochromator to make measurements.
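A hedged sketch of the core measurement, assuming SciPy: Welch's method estimates the power of a signal's spectrum, and the strongest component is reported.

```python
import numpy as np
from scipy import signal

fs = 10000
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 1250 * t) + 0.1 * rng.standard_normal(fs)

f, pxx = signal.welch(x, fs=fs, nperseg=2048)   # power spectral density
print(float(f[np.argmax(pxx)]))                 # -> 1250.0, the tone's frequency
```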
A hearing aid is a device designed to improve hearing by making sound audible to a person with hearing loss. Hearing aids are classified as medical devices in most countries and regulated accordingly. Small audio amplifiers such as personal sound amplification products (PSAPs) and other plain sound-reinforcing systems cannot be sold as "hearing aids".
Sound localization is a listener's ability to identify the location or origin of a detected sound in direction and distance.
An equal-loudness contour is a measure of sound pressure level, over the frequency spectrum, for which a listener perceives a constant loudness when presented with pure steady tones. The unit of measurement for loudness levels is the phon and is arrived at by reference to equal-loudness contours. By definition, two sine waves of differing frequencies are said to have equal-loudness level measured in phons if they are perceived as equally loud by the average young person without significant hearing impairment.
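Equal-loudness data underlie common weighting curves; the A-weighting curve, for example, approximates the inverse of the 40-phon contour. The sketch below, assuming NumPy, evaluates the standard A-weighting formula:

```python
import numpy as np

def a_weighting_db(f):
    """A-weighting gain in dB at frequency f in Hz (IEC 61672 formula)."""
    f = np.asarray(f, dtype=float)
    ra = (12194.0**2 * f**4) / (
        (f**2 + 20.6**2)
        * np.sqrt((f**2 + 107.7**2) * (f**2 + 737.9**2))
        * (f**2 + 12194.0**2)
    )
    return 20 * np.log10(ra) + 2.00   # offset so the gain at 1 kHz is 0 dB

for freq in (100, 1000, 10000):
    print(freq, round(float(a_weighting_db(freq)), 1))
# -> low frequencies are strongly attenuated, 1 kHz is 0 dB, and the top
#    of the range rolls off, mirroring the shape of the contours.
```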
Hearing range describes the frequency range that can be heard by humans or other animals, though it can also refer to the range of levels. The human range is commonly given as 20 to 20,000 Hz, although there is considerable variation between individuals, especially at high frequencies, and a gradual loss of sensitivity to higher frequencies with age is considered normal. Sensitivity also varies with frequency, as shown by equal-loudness contours. Routine investigation for hearing loss usually involves an audiogram, which shows threshold levels relative to normal hearing.
Perceptual Evaluation of Audio Quality (PEAQ) is a standardized algorithm for objectively measuring perceived audio quality, developed in 1994–1998 by a joint venture of experts within Task Group 6Q of the International Telecommunication Union's Radiocommunication Sector (ITU-R). It was originally released as ITU-R Recommendation BS.1387 in 1998 and last updated in 2023. It utilizes software to simulate perceptual properties of the human ear and then integrates multiple model output variables into a single metric.
Computational auditory scene analysis (CASA) is the study of auditory scene analysis by computational means. In essence, CASA systems are "machine listening" systems that aim to separate mixtures of sound sources in the same way that human listeners do. CASA differs from the field of blind signal separation in that it is based on the mechanisms of the human auditory system, and thus uses no more than two microphone recordings of an acoustic environment. It is related to the cocktail party problem.
Computer audition (CA) or machine listening is the general field of study of algorithms and systems for audio interpretation by machines. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer Paris Smaragdis, interviewed in Technology Review, describes these systems as "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents."
In audio signal processing, auditory masking occurs when the perception of one sound is affected by the presence of another sound.
A real-time analyzer (RTA) is a professional audio device that measures and displays the frequency spectrum of an audio signal; a spectrum analyzer that works in real time. An RTA can range from a small PDA-sized device to a rack-mounted hardware unit to software running on a laptop. It works by measuring and displaying sound input, often from an integrated microphone or with a signal from a PA system. Basic RTAs show three measurements per octave at 3 or 6 dB increments; sophisticated software solutions can show 24 or more measurements per octave as well as 0.1 dB resolution.
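A hedged sketch of an RTA's core computation, assuming NumPy and nominal third-octave centre frequencies: FFT power is summed into bands around each centre and reported in decibels.

```python
import numpy as np

fs = 48000
x = np.random.default_rng(2).standard_normal(fs)   # 1 s of test noise

power = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(len(x), 1 / fs)

centres = 1000 * 2.0 ** (np.arange(-10, 11) / 3)   # ~99 Hz to ~10 kHz
for fc in centres:
    lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)  # third-octave band edges
    level_db = 10 * np.log10(power[(freqs >= lo) & (freqs < hi)].sum())
    print(f"{fc:7.0f} Hz  {level_db:6.1f} dB")     # one bar of the display
```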
In physics, sound is a vibration that propagates as an acoustic wave through a transmission medium such as a gas, liquid or solid. In human physiology and psychology, sound is the reception of such waves and their perception by the brain. Only acoustic waves that have frequencies lying between about 20 Hz and 20 kHz, the audio frequency range, elicit an auditory percept in humans. In air at atmospheric pressure, these represent sound waves with wavelengths of 17 meters (56 ft) to 1.7 centimeters (0.67 in). Sound waves above 20 kHz are known as ultrasound and are not audible to humans. Sound waves below 20 Hz are known as infrasound. Different animal species have varying hearing ranges.
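These wavelengths follow from λ = v/f (wavelength equals propagation speed divided by frequency); the snippet below checks the quoted figures, assuming a speed of sound of 343 m/s in air:

```python
v = 343.0                             # assumed speed of sound in air, m/s
for f_hz in (20.0, 20000.0):
    print(f_hz, "Hz ->", round(v / f_hz, 4), "m")
# -> about 17 m at 20 Hz and about 0.0172 m (1.7 cm) at 20 kHz
```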
Hearing, or auditory perception, is the ability to perceive sounds through an organ, such as an ear, by detecting vibrations as periodic changes in the pressure of a surrounding medium. The academic field concerned with hearing is auditory science.
In sound recording and reproduction, audio mixing is the process of optimizing and combining multitrack recordings into a final mono, stereo, or surround sound product. In the process of combining the separate tracks, their relative levels are adjusted and balanced, and various processes such as equalization and compression are commonly applied to individual tracks, groups of tracks, and the overall mix. In stereo and surround sound mixing, the placement of each track within the stereo field is also adjusted and balanced. Audio mixing techniques and approaches vary widely and have a significant influence on the final product.
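A minimal sketch of the mixdown step described above, assuming NumPy and a constant-power pan law; the levels, pans, and stand-in stems are illustrative only:

```python
import numpy as np

def mixdown(tracks, gains_db, pans):
    """tracks: list of mono arrays; pans: -1 (left) .. +1 (right)."""
    n = min(len(t) for t in tracks)
    master = np.zeros((n, 2))
    for track, gain_db, pan in zip(tracks, gains_db, pans):
        g = 10 ** (gain_db / 20)                        # dB to linear gain
        theta = (pan + 1) * np.pi / 4                   # constant-power pan law
        master[:, 0] += g * np.cos(theta) * track[:n]   # left channel
        master[:, 1] += g * np.sin(theta) * track[:n]   # right channel
    return master

rng = np.random.default_rng(3)
stems = [rng.standard_normal(48000) for _ in range(3)]  # stand-in stems
mix = mixdown(stems, gains_db=[0.0, -6.0, -3.0], pans=[0.0, -0.5, 0.5])
```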
In signal processing, sub-band coding (SBC) is any form of transform coding that breaks a signal into a number of different frequency bands, typically by using a fast Fourier transform, and encodes each one independently. This decomposition is often the first step in data compression for audio and video signals.
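A toy illustration of the idea, assuming NumPy: the signal is split into frequency bands with an FFT, each band is quantized independently (coarser in the upper bands), and the signal is rebuilt. Real codecs use filter banks and psychoacoustic models; this shows only the decomposition step.

```python
import numpy as np

def quantize(x, step):
    return np.round(x / step) * step

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 300 * t) + 0.3 * np.sin(2 * np.pi * 2500 * t)

spectrum = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), 1 / fs)

coded = np.zeros_like(spectrum)
edges = [0, 500, 1500, 4001]              # illustrative band edges in Hz
steps = [1.0, 4.0, 16.0]                  # coarser quantization per band
for lo, hi, step in zip(edges[:-1], edges[1:], steps):
    band = (freqs >= lo) & (freqs < hi)
    coded[band] = (quantize(spectrum[band].real, step)
                   + 1j * quantize(spectrum[band].imag, step))

decoded = np.fft.irfft(coded, n=len(x))   # reconstruction from coded bands
# Maximum reconstruction error introduced by the coarse coding:
print(round(float(np.max(np.abs(x - decoded))), 3))
```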
The neural encoding of sound is the representation of auditory sensation and perception in the nervous system. Because contemporary neuroscience is continually being refined, what is known of the auditory system is continually changing. The encoding of sounds includes the transduction of sound waves into electrical impulses along auditory nerve fibers and further processing in the brain.
Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branch of science studying the psychological responses associated with sound including noise, speech, and music. Psychoacoustics is an interdisciplinary field including psychology, acoustics, electronic engineering, physics, biology, physiology, and computer science.
A mixing engineer is responsible for combining ("mixing") different sonic elements of an auditory piece into a complete rendition, whether in music, film, or any other content of auditory nature. The finished piece, recorded or live, must achieve a good balance of properties, such as volume, pan positioning, and other effects, while resolving any arising frequency conflicts from various sound sources. These sound sources can comprise the different musical instruments or vocals in a band or orchestra, dialogue or Foley in a film, and more.
Ernst Terhardt is a German engineer and psychoacoustician who made significant contributions in diverse areas of audio communication including pitch perception, music cognition, and Fourier transformation. He was professor in the area of acoustic communication at the Institute of Electroacoustics, Technical University of Munich, Germany.