Audio analysis


Audio analysis refers to the extraction of information and meaning from audio signals for analysis, classification, storage, retrieval, synthesis, and other purposes. The observation media and interpretation methods vary: audio analysis can refer to the human ear and how people interpret an audible sound source, or to the use of technology such as an audio analyzer to evaluate other qualities of a sound source, such as amplitude, distortion, and frequency response. Once an audio source has been observed, the information revealed can then be processed for whatever logical, emotional, descriptive, or otherwise relevant interpretation the user requires.


Natural Analysis

The most prevalent form of audio analysis derives from the sense of hearing, a type of sensory perception found in much of the planet's fauna and a fundamental process for many living beings. Sounds made by the surrounding environment or by other living beings provide input to the hearing mechanism, and the listener's brain interprets that sound to decide how to respond. Functions that rely on this analysis include speech, the startle response, music listening, and more.

An inherent ability of humans, hearing is fundamental to communication across the globe, and the process of assigning meaning and value to speech is a complex but necessary function of the human body. The study of the auditory system has largely been centered on mathematics and the analysis of sinusoidal vibrations and sounds. The Fourier transform has been an essential tool in understanding how the human ear turns moving air into the audible frequency range, roughly 20 to 20,000 Hz. [1] The ear is able to take one complex waveform and separate it into distinct frequency ranges thanks to structures of the inner ear that are tuned to specific frequency bands. [2] This initial sensory input is then analyzed further up the neurological system, where the perception of sound takes place.
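The decomposition of a complex waveform into individual frequency components can be illustrated with a discrete Fourier transform. The sketch below only illustrates that frequency-domain view (it is not a model of the cochlea); the sample rate, tone frequencies, and the use of NumPy's FFT routines are choices made for the example.

    # Illustrative sketch: decomposing a complex waveform into its frequency
    # components with the discrete Fourier transform (values below are arbitrary).
    import numpy as np

    fs = 8000                                   # sample rate in Hz
    t = np.arange(0, 1.0, 1.0 / fs)             # one second of time samples
    # A "complex waveform": the sum of a 440 Hz tone and a quieter 1000 Hz tone.
    x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

    spectrum = np.fft.rfft(x)                   # frequency-domain representation
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    peaks = freqs[np.argsort(np.abs(spectrum))[-2:]]
    print(np.sort(peaks))                       # the 440 Hz and 1000 Hz components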

The auditory system also works in tandem with the neural system so that a listener can identify the direction from which a sound source originates. This ability, associated with the Haas or precedence effect, is possible because the listener has two ears, or auditory receptors: the difference in the time it takes a sound to reach each ear gives the brain the information needed to calculate the spatial position of the source. [3]
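A minimal sketch of that interaural time difference cue is shown below, assuming a simplified far-field model in which the delay is d * sin(theta) / c; the ear spacing and speed of sound used here are illustrative values, not figures from the article.

    # Interaural time difference (ITD) under a simplified far-field model:
    # delta_t = d * sin(theta) / c, where theta is the source azimuth.
    import math

    d = 0.21      # assumed distance between the ears, in metres
    c = 343.0     # speed of sound in air, in metres per second

    def itd_seconds(azimuth_deg):
        """Arrival-time difference between the ears for a source at the given
        azimuth (0 deg = straight ahead, 90 deg = directly to one side)."""
        return d * math.sin(math.radians(azimuth_deg)) / c

    for angle in (0, 30, 90):
        print(angle, "deg ->", round(itd_seconds(angle) * 1e6), "microseconds")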

Signal Analysis

[Image: Example of a frequency analyzer (spectrum analyzer display showing the noise floor)]

Audio signals can be analyzed in several different ways, depending on the kind of information desired from the signal, such as its amplitude, distortion, frequency content, or how its spectrum changes over time.

[Image: A spectrogram of the THX "Deep Note" sound]

Hardware analyzers have been the primary means of signal analysis since the earliest dedicated audio test instruments, such as Hewlett-Packard's HP 200A. Hardware analyzers are typically used in the engineering, testing, and manufacturing of professional and consumer-grade products. As computer technology progressed, integrated software found its way into these hardware systems, and eventually audio analysis tools appeared that required no hardware beyond the computer running the software. Software audio analyzers are regularly used in various stages of music production, such as live sound, mixing, and mastering. These products tend to employ fast Fourier transform (FFT) algorithms and processing to provide a visual representation of the signal being analyzed. Display and information types include the frequency spectrum, stereo field, surround field, spectrogram, and more.
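The FFT-based processing such tools employ can be sketched as follows: window a block of samples, transform it, and convert the magnitudes to decibels for display. The block size, window choice, and normalization below are assumptions made for the example, not a description of any particular product.

    # Sketch of the core of a software spectrum analyzer: one windowed FFT block
    # converted to a dB magnitude spectrum for display.
    import numpy as np

    def magnitude_spectrum_db(block, fs):
        """Return (frequencies, magnitudes in dBFS) for one block of samples."""
        window = np.hanning(len(block))
        spectrum = np.fft.rfft(block * window)
        mags = np.abs(spectrum) / (np.sum(window) / 2)    # full-scale sine -> ~0 dBFS
        mags_db = 20 * np.log10(np.maximum(mags, 1e-12))  # avoid log of zero
        freqs = np.fft.rfftfreq(len(block), d=1.0 / fs)
        return freqs, mags_db

    fs = 48000
    t = np.arange(4096) / fs
    freqs, mags_db = magnitude_spectrum_db(np.sin(2 * np.pi * 1000 * t), fs)
    peak = freqs[np.argmax(mags_db)]
    print(f"peak near {peak:.0f} Hz at {mags_db.max():.1f} dBFS")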

See also

Related Research Articles

In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.
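As a small illustration of lossless coding, the sketch below uses Python's standard zlib module (chosen here purely for illustration) to compress a highly redundant byte string and then recover it exactly.

    # Lossless compression sketch: redundant data shrinks, and decompression
    # recovers the original bytes exactly, so no information is lost.
    import zlib

    original = b"audio audio audio audio audio audio audio audio"
    compressed = zlib.compress(original)
    restored = zlib.decompress(compressed)

    print(len(original), "->", len(compressed), "bytes")
    print(restored == original)   # True: the encoder/decoder pair is lossless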


A head-related transfer function (HRTF), also known as anatomical transfer function (ATF), or a head shadow, is a response that characterizes how an ear receives a sound from a point in space. As sound strikes the listener, the size and shape of the head, ears, ear canal, density of the head, size and shape of nasal and oral cavities, all transform the sound and affect how it is perceived, boosting some frequencies and attenuating others. Generally speaking, the HRTF boosts frequencies from 2–5 kHz with a primary resonance of +17 dB at 2,700 Hz. But the response curve is more complex than a single bump, affects a broad frequency spectrum, and varies significantly from person to person.
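In binaural synthesis an HRTF is typically applied by convolving a sound with a measured impulse response for each ear. The sketch below only illustrates that convolution step; the "impulse responses" are placeholder delays invented for the example, not real HRTF data.

    # Hedged sketch of HRTF-style filtering: convolve a mono signal with a
    # separate impulse response per ear. Real HRIRs are measured; these are
    # placeholders (a slight delay and attenuation for the far ear).
    import numpy as np

    fs = 44100
    mono = np.random.randn(fs)                    # one second of test signal
    hrir_left = np.zeros(64)
    hrir_left[0] = 1.0                            # placeholder: no delay
    hrir_right = np.zeros(64)
    hrir_right[20] = 0.8                          # placeholder: ~0.45 ms later, quieter

    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    binaural = np.stack([left, right], axis=1)    # 2-channel output for headphones
    print(binaural.shape)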


In signal processing, sampling is the reduction of a continuous-time signal to a discrete-time signal. A common example is the conversion of a sound wave to a sequence of "samples". A sample is a value of the signal at a point in time and/or space; this definition differs from the term's usage in statistics, which refers to a set of such values.
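A short sketch of that reduction, assuming an idealized 440 Hz sine as the continuous-time signal and an arbitrary 8 kHz sample rate:

    # Sampling sketch: the continuous signal x_c(t) = sin(2*pi*440*t) reduced to
    # the discrete sequence x[n] = x_c(n / fs).
    import numpy as np

    fs = 8000                                     # samples per second
    n = np.arange(16)                             # discrete sample indices
    samples = np.sin(2 * np.pi * 440 * n / fs)    # each value is one "sample"
    print(samples[:4])                            # the first few sample values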

In signal processing and electronics, the frequency response of a system is the quantitative measure of the magnitude and phase of its output as a function of input frequency. The frequency response is widely used in the design and analysis of systems, such as audio and control systems, where it simplifies mathematical analysis by converting the governing differential equations into algebraic equations. In an audio system, it may be used to minimize audible distortion by designing components so that the overall response is as flat (uniform) as possible across the system's bandwidth. In control systems, such as a vehicle's cruise control, it may be used to assess system stability, often through the use of Bode plots. Systems with a specific frequency response can be designed using analog and digital filters.
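As a worked example of a frequency response, the sketch below evaluates the magnitude and phase of a first-order RC low-pass filter, H(jw) = 1 / (1 + jwRC), at a few input frequencies; the component values are chosen only to place the corner near 1 kHz.

    # Frequency response of a first-order RC low-pass filter at a few frequencies.
    import cmath
    import math

    R, C = 1.0e3, 159.0e-9          # illustrative values: corner frequency ~1 kHz

    def response(f_hz):
        h = 1.0 / (1.0 + 1j * 2 * math.pi * f_hz * R * C)
        return 20 * math.log10(abs(h)), math.degrees(cmath.phase(h))

    for f in (100, 1000, 10000):
        mag_db, phase_deg = response(f)
        print(f, "Hz:", round(mag_db, 1), "dB,", round(phase_deg, 1), "deg")
    # Near the corner frequency the magnitude is about -3 dB with ~-45 deg of phase.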


A spectrum analyzer measures the magnitude of an input signal versus frequency within the full frequency range of the instrument. The primary use is to measure the power of the spectrum of known and unknown signals. The input signal that most common spectrum analyzers measure is electrical; however, spectral compositions of other signals, such as acoustic pressure waves and optical light waves, can be considered through the use of an appropriate transducer. Spectrum analyzers for other types of signals also exist, such as optical spectrum analyzers which use direct optical techniques such as a monochromator to make measurements.

Sound localization is a listener's ability to identify the location or origin of a detected sound in direction and distance.


An equal-loudness contour is a measure of sound pressure level, over the frequency spectrum, for which a listener perceives a constant loudness when presented with pure steady tones. The unit of measurement for loudness levels is the phon and is arrived at by reference to equal-loudness contours. By definition, two sine waves of differing frequencies are said to have equal-loudness level measured in phons if they are perceived as equally loud by the average young person without significant hearing impairment.

Binaural fusion or binaural integration is a cognitive process that involves the combination of different auditory information presented binaurally, or to each ear. In humans, this process is essential in understanding speech as one ear may pick up more information about the speech stimuli than the other.

Perceptual Evaluation of Audio Quality (PEAQ) is a standardized algorithm for objectively measuring perceived audio quality, developed in 1994-1998 by a joint venture of experts within Task Group 6Q of the International Telecommunication Union's Radiocommunication Sector (ITU-R). It was originally released as ITU-R Recommendation BS.1387 in 1998 and last updated in 2023. It utilizes software to simulate perceptual properties of the human ear and then integrates multiple model output variables into a single metric.

Computational auditory scene analysis (CASA) is the study of auditory scene analysis by computational means. In essence, CASA systems are "machine listening" systems that aim to separate mixtures of sound sources in the same way that human listeners do. CASA differs from the field of blind signal separation in that it is based on the mechanisms of the human auditory system, and thus uses no more than two microphone recordings of an acoustic environment. It is related to the cocktail party problem.

In audio signal processing, auditory masking occurs when the perception of one sound is affected by the presence of another sound.


A real-time analyzer (RTA) is a professional audio device that measures and displays the frequency spectrum of an audio signal; a spectrum analyzer that works in real time. An RTA can range from a small PDA-sized device to a rack-mounted hardware unit to software running on a laptop. It works by measuring and displaying sound input, often from an integrated microphone or with a signal from a PA system. Basic RTAs show three measurements per octave at 3 or 6 dB increments; sophisticated software solutions can show 24 or more measurements per octave as well as 0.1 dB resolution.
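The band summaries an RTA displays can be approximated in software by grouping an FFT power spectrum into fractional-octave bands. The sketch below uses third-octave bands with the usual base-2 spacing; it is an uncalibrated illustration, not a description of any particular device.

    # Group an FFT power spectrum into third-octave bands, one level per band.
    import numpy as np

    def third_octave_levels(x, fs):
        spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
        centres = 1000.0 * 2.0 ** (np.arange(-10, 11) / 3.0)   # ~100 Hz to ~10 kHz
        levels = []
        for fc in centres:
            lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)      # band edges
            band = spectrum[(freqs >= lo) & (freqs < hi)]
            levels.append(10 * np.log10(band.sum() + 1e-20))   # relative dB
        return centres, levels

    fs = 48000
    noise = np.random.randn(1 << 15)                           # white-noise test signal
    centres, levels = third_octave_levels(noise, fs)
    print(f"{centres[0]:.0f} Hz band: {levels[0]:.1f} dB (uncalibrated)")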


In physics, sound is a vibration that propagates as an acoustic wave, through a transmission medium such as a gas, liquid or solid. In human physiology and psychology, sound is the reception of such waves and their perception by the brain. Only acoustic waves that have frequencies lying between about 20 Hz and 20 kHz, the audio frequency range, elicit an auditory percept in humans. In air at atmospheric pressure, these represent sound waves with wavelengths of 17 meters (56 ft) to 1.7 centimeters (0.67 in). Sound waves above 20 kHz are known as ultrasound and are not audible to humans. Sound waves below 20 Hz are known as infrasound. Different animal species have varying hearing ranges.
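The wavelength figures quoted above follow from lambda = c / f; a quick check with c taken as roughly 343 m/s for air at room temperature:

    # Worked check of the quoted wavelength range: lambda = c / f.
    c = 343.0                      # approximate speed of sound in air, m/s
    for f in (20.0, 20000.0):
        print(f, "Hz ->", round(c / f, 3), "m")
    # 20 Hz gives about 17 m and 20 kHz about 0.017 m (1.7 cm), matching the range above.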


Hearing, or auditory perception, is the ability to perceive sounds through an organ, such as an ear, by detecting vibrations as periodic changes in the pressure of a surrounding medium. The academic field concerned with hearing is auditory science.


In sound recording and reproduction, audio mixing is the process of optimizing and combining multitrack recordings into a final mono, stereo or surround sound product. In the process of combining the separate tracks, their relative levels are adjusted and balanced, and various processes such as equalization and compression are commonly applied to individual tracks, groups of tracks, and the overall mix. In stereo and surround sound mixing, the placement of the tracks within the stereo field is adjusted and balanced. Audio mixing techniques and approaches vary widely and have a significant influence on the final product.
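A heavily simplified sketch of the level-and-placement part of that process is given below: two mono tracks are combined into a stereo mix with per-track gain and constant-power panning. The gains, pan positions, and test tones are invented for the example, and real mixes involve far more processing.

    # Minimal mixing sketch: combine mono tracks into stereo with gain and panning.
    import numpy as np

    def mix(tracks, gains_db, pans):
        """tracks: equal-length mono arrays; pans range from -1 (left) to +1 (right)."""
        out = np.zeros((len(tracks[0]), 2))
        for track, gain_db, pan in zip(tracks, gains_db, pans):
            g = 10 ** (gain_db / 20.0)               # dB gain to linear factor
            angle = (pan + 1) * np.pi / 4            # constant-power pan law
            out[:, 0] += g * np.cos(angle) * track   # left channel
            out[:, 1] += g * np.sin(angle) * track   # right channel
        return out

    fs = 44100
    t = np.arange(fs) / fs
    vocal = np.sin(2 * np.pi * 440 * t)              # stand-in "vocal" track
    bass = np.sin(2 * np.pi * 110 * t)               # stand-in "bass" track
    stereo = mix([vocal, bass], gains_db=[-6.0, -3.0], pans=[-0.3, 0.0])
    print(stereo.shape)                              # (samples, 2 channels)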


In signal processing, sub-band coding (SBC) is any form of transform coding that breaks a signal into a number of different frequency bands, typically by using a fast Fourier transform, and encodes each one independently. This decomposition is often the first step in data compression for audio and video signals.
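A toy version of that decomposition is sketched below: the signal is split into a low and a high band (here with an FFT, for simplicity), each band is "coded" with a different quantization step, and the bands are summed to reconstruct the signal. Real codecs use filter banks and psychoacoustic models; everything here is an assumption made for illustration.

    # Toy sub-band coding sketch: split into two bands, quantize each band
    # independently (coarser in the upper band), then recombine.
    import numpy as np

    def split_bands(x, fs, cutoff_hz):
        spectrum = np.fft.rfft(x)
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
        low = spectrum.copy()
        low[freqs >= cutoff_hz] = 0
        high = spectrum.copy()
        high[freqs < cutoff_hz] = 0
        return np.fft.irfft(low, n=len(x)), np.fft.irfft(high, n=len(x))

    def quantize(band, step):
        return np.round(band / step) * step          # stand-in for per-band coding

    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 300 * t) + 0.3 * np.sin(2 * np.pi * 5000 * t)
    low, high = split_bands(x, fs, cutoff_hz=2000)
    reconstructed = quantize(low, 0.01) + quantize(high, 0.05)
    print(round(float(np.max(np.abs(reconstructed - x))), 3))  # small residual error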

The neural encoding of sound is the representation of auditory sensation and perception in the nervous system. As contemporary neuroscience continues to develop, what is known of the auditory system is continually being refined. The encoding of sounds includes the transduction of sound waves into electrical impulses along auditory nerve fibers, and further processing in the brain.

Psychoacoustics is the branch of psychophysics involving the scientific study of sound perception and audiology, that is, how the human auditory system perceives various sounds. More specifically, it is the branch of science studying the psychological responses associated with sound. Psychoacoustics is an interdisciplinary field drawing on many areas, including psychology, acoustics, electronic engineering, physics, biology, physiology, and computer science.

Ernst Terhardt is a German engineer and psychoacoustician who made significant contributions in diverse areas of audio communication including pitch perception, music cognition, and Fourier transformation. He was professor in the area of acoustic communication at the Institute of Electroacoustics, Technical University of Munich, Germany.


In the physical sciences, the term spectrum was introduced first into optics by Isaac Newton in the 17th century, referring to the range of colors observed when white light was dispersed through a prism. Soon the term referred to a plot of light intensity or power as a function of frequency or wavelength, also known as a spectral density plot.

References

  1. Acton, Ciaran; Miller, Robert; Maltby, John; Fullerton, Deirdre (2009). "Analysis of Variance (ANOVA)". SPSS for Social Scientists. Macmillan Education UK. pp. 183–198. doi:10.1007/978-1-137-01390-3_9. ISBN 9780230209930.
  2. Guha, Martin (December 2006). Review of Elsevier's Dictionary of Psychological Theories, compiled by J.E. Roeckelein (Amsterdam: Elsevier, 2006). pp. 10–11. doi:10.1108/09504120610709402. ISBN 0-444-51750-2. ISSN 0950-4125.
  3. Farmer, Lesley (2011-01-18). Review of A/V A to Z: An Encyclopedic Dictionary of Media, Entertainment and Other Audiovisual Terms, by Richard W. Kroon (Jefferson, NC: McFarland, 2010). p. 50. doi:10.1108/09504121111103335. ISBN 978-0-7864-4405-2. ISSN 0950-4125.