Spectral centroid

The spectral centroid is a measure used in digital signal processing to characterise a spectrum. It indicates where the center of mass of the spectrum is located. Perceptually, it has a robust connection with the impression of brightness of a sound. [1] It is sometimes called the center of spectral mass. [2]

Calculation

It is calculated as the weighted mean of the frequencies present in the signal, determined using a Fourier transform, with their magnitudes as the weights: [3]

$$\text{Centroid} = \frac{\sum_{n=0}^{N-1} f(n)\, x(n)}{\sum_{n=0}^{N-1} x(n)}$$

where x(n) represents the weighted frequency value, or magnitude, of bin number n, f(n) represents the center frequency of that bin, and N is the number of bins.
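The weighted mean described above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the function name `spectral_centroid`, the use of a single unwindowed frame, and the choice to return 0 Hz for a silent frame are all assumptions made for this example.

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Return the magnitude-weighted mean frequency (in Hz) of a real signal."""
    # x(n): magnitude of each bin of the one-sided Fourier transform
    magnitudes = np.abs(np.fft.rfft(signal))
    # f(n): center frequency of each bin, in Hz
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0:
        # Assumption: define the centroid of an all-zero frame as 0 Hz
        return 0.0
    return float((freqs * magnitudes).sum() / total)

# One second of audio at 8 kHz, so the bins are spaced 1 Hz apart
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)          # single partial at 440 Hz
mix = tone + np.sin(2 * np.pi * 1320 * t)   # add an equal-amplitude partial at 1320 Hz

print(spectral_centroid(tone, sample_rate))
print(spectral_centroid(mix, sample_rate))
```

Because both test frequencies fall exactly on a bin, the single tone gives a centroid of about 440 Hz, and adding the equal-amplitude partial at 1320 Hz moves it to about 880 Hz, the midpoint of the two. Real signals are usually analysed frame by frame with a window function, which this sketch omits.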

Alternative usage

Some people use "spectral centroid" to refer to the median of the spectrum. This is a different statistic; the distinction is essentially the same as that between the unweighted median and mean. Since both are measures of central tendency, they behave similarly in some situations. But because typical audio spectra are not normally distributed, the two measures often give markedly different values. Grey and Gordon (1978) found the mean a better fit than the median. [1]
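The gap between the two statistics can be illustrated with a small Python sketch. It assumes that the "median of the spectrum" means the frequency at which the cumulative magnitude first reaches half of the total. That definition, the function name `centroid_and_median`, and the toy spectrum are assumptions for illustration only.

```python
import numpy as np

def centroid_and_median(freqs, magnitudes):
    """Return (weighted mean, spectral median) for a magnitude spectrum."""
    freqs = np.asarray(freqs, dtype=float)
    magnitudes = np.asarray(magnitudes, dtype=float)
    total = magnitudes.sum()
    # Weighted mean: the spectral centroid as defined in the Calculation section
    centroid = (freqs * magnitudes).sum() / total
    # Assumed median: first bin where cumulative magnitude reaches half the total
    cumulative = np.cumsum(magnitudes)
    median_index = np.searchsorted(cumulative, total / 2.0)
    return float(centroid), float(freqs[median_index])

# Toy spectrum: most magnitude at low frequencies plus one strong high partial
freqs = np.array([100.0, 200.0, 300.0, 400.0, 5000.0])
magnitudes = np.array([4.0, 3.0, 2.0, 1.0, 2.0])

centroid, median = centroid_and_median(freqs, magnitudes)
print(centroid, median)  # 1000.0 200.0
```

The single high-frequency partial pulls the weighted mean up to 1000 Hz, while the median stays at 200 Hz. This is the kind of divergence to expect from a skewed spectrum.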

Applications

Because the spectral centroid is a good predictor of the "brightness" of a sound, [1] it is widely used in digital audio and music processing as an automatic measure of musical timbre. [4]

References

  1. Grey, John M.; Gordon, John W. (1978). "Perceptual effects of spectral modifications on musical timbres". The Journal of the Acoustical Society of America. 63 (5). Acoustical Society of America (ASA): 1493–1500. Bibcode:1978ASAJ...63.1493G. doi:10.1121/1.381843. ISSN 0001-4966.
  2. Pulavarti, Surya V. S. R. K.; Maguire, Jack B.; Yuen, Shirley; Harrison, Joseph S.; Griffin, Jermel; Premkumar, Lakshmanane; Esposito, Edward A.; Makhatadze, George I.; Garcia, Angel E.; Weiss, Thomas M.; Snell, Edward H. (2022-02-17). "From Protein Design to the Energy Landscape of a Cold Unfolding Protein". The Journal of Physical Chemistry B. 126 (6): 1212–1231. doi:10.1021/acs.jpcb.1c10750. ISSN 1520-6106. PMC 9281400. PMID 35128921.
  3. A Large Set of Audio Features for Sound Description. Technical report published by IRCAM in 2003. Section 6.1.1 describes the spectral centroid.
  4. Schubert, Emery; Wolfe, Joe; Tarnopolsky, Alex (2004). "Spectral centroid and timbre in complex, multiple instrumental textures" (PDF). Proceedings of the 8th International Conference on Music Perception & Cognition, Northwestern University, Illinois. International Conference on Music Perception & Cognition. Lipscomb, S.D.; Ashley, R.; Gjerdingen, R. O.; Webster, P. (Eds.). Sydney, Australia: School of Music and Music Education; School of Physics, University of New South Wales. Archived from the original (PDF) on 2011-08-10.