Spectral centroid

The spectral centroid is a measure used in digital signal processing to characterise a spectrum. It indicates where the center of mass of the spectrum is located. Perceptually, it has a robust connection with the impression of brightness of a sound. [1] It is sometimes called the center of spectral mass. [2]

Calculation

It is calculated as the weighted mean of the frequencies present in the signal, determined using a Fourier transform, with their magnitudes as the weights: [3]

$$\mathrm{Centroid} = \frac{\sum_{n=0}^{N-1} f(n)\, x(n)}{\sum_{n=0}^{N-1} x(n)}$$

where x(n) represents the weighted frequency value, or magnitude, of bin number n, f(n) represents the center frequency of that bin, and N is the number of bins.
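The weighted mean described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes NumPy is available, a real-valued single-frame signal, and no windowing. The function name `spectral_centroid` and the choice to return 0 Hz for a silent input are assumptions made for this example.

```python
import numpy as np


def spectral_centroid(signal, sample_rate):
    """Return the magnitude-weighted mean frequency (in Hz) of a real signal.

    Hypothetical helper for illustration; not taken from any library.
    """
    signal = np.asarray(signal, dtype=float)
    # x(n): magnitude of each bin of the one-sided Fourier transform.
    magnitudes = np.abs(np.fft.rfft(signal))
    # f(n): center frequency of each bin, in Hz.
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0:
        # Assumption: a silent frame has no defined centroid; report 0 Hz.
        return 0.0
    return float((freqs * magnitudes).sum() / total)


if __name__ == "__main__":
    sample_rate = 8000
    t = np.arange(sample_rate) / sample_rate  # one second of samples
    low = np.sin(2 * np.pi * 200 * t)
    high = np.sin(2 * np.pi * 600 * t)
    # Two equal-amplitude tones: the centroid lies midway between them.
    print(spectral_centroid(low + high, sample_rate))
```

In practice the centroid is usually computed per short frame, often after applying a window function, so that its change over time can be tracked.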

Alternative usage

Some authors use "spectral centroid" to refer to the median of the spectrum. This is a different statistic, the difference being essentially the same as the difference between the unweighted median and mean statistics. Since both are measures of central tendency, in some situations they will behave similarly. But since typical audio spectra are not normally distributed, the two measures will often give strongly different values. Grey and Gordon in 1978 found that the mean was a better fit than the median. [1]
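To make the difference concrete, the sketch below computes both statistics for a small, made-up spectrum. It assumes NumPy, bin frequencies sorted in ascending order, and one particular reading of "median of the spectrum": the lowest bin frequency at which the cumulative magnitude reaches half of the total. The function name `centroid_and_median` and the sample values are hypothetical.

```python
import numpy as np


def centroid_and_median(freqs, magnitudes):
    """Return (centroid, median) in Hz for a magnitude spectrum.

    Hypothetical helper for illustration. The median is taken here as the
    lowest bin frequency at which the cumulative magnitude reaches half of
    the total, which is an assumption about the intended definition.
    """
    freqs = np.asarray(freqs, dtype=float)
    magnitudes = np.asarray(magnitudes, dtype=float)
    total = magnitudes.sum()
    if total <= 0:
        raise ValueError("spectrum must contain some non-zero magnitude")
    # Mean: magnitude-weighted average of the bin frequencies.
    centroid = float((freqs * magnitudes).sum() / total)
    # Median: first bin where the running sum reaches half the total.
    cumulative = np.cumsum(magnitudes)
    median_index = int(np.searchsorted(cumulative, total / 2.0))
    return centroid, float(freqs[median_index])


if __name__ == "__main__":
    # Made-up skewed spectrum: most magnitude at low frequencies,
    # plus one isolated high-frequency component.
    freqs = [100, 200, 300, 400, 5000]
    magnitudes = [4, 3, 2, 1, 2]
    print(centroid_and_median(freqs, magnitudes))
```

For this skewed example the centroid is 1000 Hz while the median is 200 Hz: the single high-frequency component pulls the mean upward but barely moves the median.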

Applications

Because the spectral centroid is a good predictor of the "brightness" of a sound, [1] it is widely used in digital audio and music processing as an automatic measure of musical timbre. [4]

References

  1. Grey, John M.; Gordon, John W. (1978). "Perceptual effects of spectral modifications on musical timbres". The Journal of the Acoustical Society of America. Acoustical Society of America (ASA). 63 (5): 1493–1500. doi:10.1121/1.381843. ISSN 0001-4966.
  2. Pulavarti, Surya V. S. R. K.; Maguire, Jack B.; Yuen, Shirley; Harrison, Joseph S.; Griffin, Jermel; Premkumar, Lakshmanane; Esposito, Edward A.; Makhatadze, George I.; Garcia, Angel E.; Weiss, Thomas M.; Snell, Edward H. (2022-02-17). "From Protein Design to the Energy Landscape of a Cold Unfolding Protein". The Journal of Physical Chemistry B. 126 (6): 1212–1231. doi:10.1021/acs.jpcb.1c10750. ISSN 1520-6106. PMC 9281400. PMID 35128921.
  3. A Large Set of Audio Features for Sound Description - technical report published by IRCAM in 2003. Section 6.1.1 describes the spectral centroid.
  4. Schubert, Emery; Wolfe, Joe; Tarnopolsky, Alex (2004). "Spectral centroid and timbre in complex, multiple instrumental textures" (PDF). Proceedings of the 8th International Conference on Music Perception & Cognition, North Western University, Illinois. International Conference on Music Perception & Cognition. Lipscomb, S.D.; Ashley, R.; Gjerdingen, R. O.; Webster, P. (Eds.). Sydney, Australia: School of Music and Music Education; School of Physics, University of New South Wales. Archived from the original (PDF) on 2011-08-10.