# Time–frequency analysis for music signals


Time–frequency analysis for music signals is one application of time–frequency analysis. Musical sound can be more complicated than human vocal sound, occupying a wider frequency band. Music signals are time-varying signals. The classic Fourier transform, which gives only a single spectrum for the whole signal, is not sufficient to analyze them, whereas time–frequency analysis is an efficient tool for this purpose. Time–frequency analysis extends the classic Fourier approach. The short-time Fourier transform (STFT), Gabor transform (GT) and Wigner distribution function (WDF) are well-known time–frequency methods. They are useful for analyzing music signals such as notes played on a piano, a flute or a guitar.

In signal processing, time–frequency analysis comprises those techniques that study a signal in both the time and frequency domains simultaneously, using various time–frequency representations. Rather than viewing a one-dimensional signal and some transform of it, time–frequency analysis studies a two-dimensional function whose domain is the two-dimensional real plane. This function is obtained from the signal via a time–frequency transform.

The short-time Fourier transform (STFT) is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. In practice, an STFT is computed by dividing a longer time signal into shorter segments of equal length and then computing the Fourier transform separately on each segment. This reveals the Fourier spectrum of each segment. The changing spectra are then usually plotted as a function of time.

The Gabor transform, named after Dennis Gabor, is a special case of the short-time Fourier transform. Like the STFT, it determines the sinusoidal frequency and phase content of local sections of a signal as it changes over time. The function to be transformed is first multiplied by a Gaussian function, which can be regarded as a window function. The resulting function is then Fourier-transformed to obtain the time–frequency representation. The window gives more weight to the part of the signal near the time being analyzed. The Gabor transform of a signal x(t) is defined by

$G_{x}(t,f)=\int _{-\infty }^{\infty }e^{-\pi (\tau -t)^{2}}e^{-j2\pi f\tau }x(\tau )\,d\tau$
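The segment-by-segment procedure described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a reference implementation, and it rests on several assumptions. The helper name `framed_stft` is hypothetical. The segments do not overlap and no taper is applied, which amounts to a rectangular window. It therefore shows the plainest form of the STFT rather than the Gabor transform.

```python
import numpy as np


def framed_stft(x, fs, seg_len):
    """Split x into consecutive, equal-length, non-overlapping segments and
    Fourier-transform each one separately (rectangular window, no overlap).

    Returns (times, freqs, spectra): segment centres in seconds, bin
    frequencies in Hz, and a complex array of shape (n_segments, n_bins).
    """
    x = np.asarray(x, dtype=float)
    n_segs = len(x) // seg_len                  # drop an incomplete tail segment
    segments = x[: n_segs * seg_len].reshape(n_segs, seg_len)
    spectra = np.fft.rfft(segments, axis=1)     # one spectrum per segment
    times = (np.arange(n_segs) + 0.5) * seg_len / fs
    freqs = np.fft.rfftfreq(seg_len, d=1 / fs)
    return times, freqs, spectra


# Usage: a tone that jumps from 200 Hz to 300 Hz after one second.
fs = 1000
t = np.arange(2 * fs) / fs
x = np.where(t < 1.0, np.sin(2 * np.pi * 200 * t), np.sin(2 * np.pi * 300 * t))
times, freqs, spectra = framed_stft(x, fs, seg_len=100)
peak_hz = freqs[np.argmax(np.abs(spectra), axis=1)]   # dominant frequency per segment
print(peak_hz)
```

Plotting `np.abs(spectra)` against `times` and `freqs` gives the "changing spectra as a function of time". The segment length trades time resolution against frequency resolution: 100 samples at 1 kHz give 0.1 s segments and 10 Hz bins.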

## Music signals

Music is a type of sound that has some stable frequencies over a period of time. Music can be produced in several ways. For example, the sound of a piano is produced by striking strings, and the sound of a violin is produced by bowing. Every musical sound has a fundamental frequency and overtones. The fundamental frequency is the lowest frequency in the harmonic series. In a periodic signal, the fundamental frequency is the inverse of the period length. Overtones are integer multiples of the fundamental frequency.

In music, a bow is a tensioned stick with hair affixed to it that is moved across some part of a musical instrument to cause vibration, which the instrument emits as sound. The vast majority of bows are used with string instruments, such as the violin, although some bows are used with musical saws and other bowed idiophones.

The fundamental frequency, often referred to simply as the fundamental, is defined as the lowest frequency of a periodic waveform. In music, the fundamental is the musical pitch of a note that is perceived as the lowest partial present. In terms of a superposition of sinusoids, the fundamental frequency is the lowest-frequency sinusoid in the sum. In some contexts, the fundamental is abbreviated as f0, indicating the lowest frequency counting from zero. In other contexts, it is more common to abbreviate it as f1, the first harmonic.

Since the fundamental is the lowest frequency and is also perceived as the loudest, the ear identifies it as the specific pitch of the musical tone [harmonic spectrum].... The individual partials are not heard separately but are blended together by the ear into a single tone.

Table 1. The fundamental frequency and overtones

| Frequency | Order | Overtone name | Harmonic name |
|---|---|---|---|
| f = 440 Hz | N = 1 | Fundamental frequency | 1st harmonic |
| f = 880 Hz | N = 2 | 1st overtone | 2nd harmonic |
| f = 1320 Hz | N = 3 | 2nd overtone | 3rd harmonic |
| f = 1760 Hz | N = 4 | 3rd overtone | 4th harmonic |
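The fundamental is the inverse of the period, and the N-th harmonic is N times the fundamental, which makes it the (N-1)-th overtone. Table 1 can therefore be regenerated from the period alone. The sketch below assumes a 440 Hz tone, that is, a period of 1/440 s. The function names are hypothetical.

```python
def ordinal(k: int) -> str:
    """English ordinal: 1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 11 <= k % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(k % 10, "th")
    return f"{k}{suffix}"


def harmonic_rows(period_s: float, n_rows: int = 4):
    """Rows (frequency_hz, N, overtone_name, harmonic_name) for a periodic tone.

    The fundamental is the inverse of the period; the N-th harmonic is N times
    the fundamental, which is also the (N-1)-th overtone.
    """
    f0 = 1.0 / period_s
    rows = []
    for n in range(1, n_rows + 1):
        overtone = "Fundamental frequency" if n == 1 else f"{ordinal(n - 1)} overtone"
        rows.append((n * f0, n, overtone, f"{ordinal(n)} harmonic"))
    return rows


for freq, n, overtone, harmonic in harmonic_rows(period_s=1 / 440):
    print(f"f = {freq:.0f} Hz  N = {n}  {overtone:<22} {harmonic}")
```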

In music theory, pitch represents the perceived fundamental frequency of a sound. However, the actual fundamental frequency may differ from the perceived fundamental frequency because of overtones.

## Short-time Fourier transform

### Continuous STFT

The short-time Fourier transform is a basic type of time–frequency analysis. For a continuous signal x(t), we can compute the short-time Fourier transform by

$\mathbf {STFT} \left\{x(t)\right\}\equiv X(t,f)=\int _{-\infty }^{\infty }x(\tau )w(t-\tau )e^{-j2\pi f\tau }\,d\tau$

where w(t) is a window function. When w(t) is a rectangular function, the transform is called the Rec-STFT. When w(t) is a Gaussian function, the transform is called the Gabor transform.

In signal processing and statistics, a window function is a mathematical function that is zero-valued outside some chosen interval. It is normally symmetric around the middle of the interval, usually has a maximum near the middle, and usually tapers away from the middle. Mathematically, when another function or waveform/data sequence is "multiplied" by a window function, the product is also zero-valued outside the interval. All that is left is the part where they overlap, the "view through the window". Equivalently, and in actual practice, the segment of data within the window is first isolated, and then only that data is multiplied by the window function values. Thus, tapering, not segmentation, is the main purpose of window functions.
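To make the role of w(t) concrete, the following sketch defines two windows: a rectangular window of half-width B and a Gaussian window. It then approximates the STFT integral at a single point (t, f) with a Riemann sum over a finely sampled signal. The helper names, the Gaussian scale parameter `sigma`, and the 5 Hz test tone are illustrative assumptions rather than anything prescribed by the text.

```python
from functools import partial

import numpy as np


def rectangular_window(t, B):
    """w(t) = 1 for |t| <= B and 0 otherwise (the window of the Rec-STFT)."""
    return (np.abs(t) <= B).astype(float)


def gaussian_window(t, sigma=1.0):
    """w(t) = exp(-pi (t / sigma)^2); sigma = 1 gives the Gabor window exp(-pi t^2)."""
    return np.exp(-np.pi * (np.asarray(t, dtype=float) / sigma) ** 2)


def stft_at(x, tau, t, f, window):
    """Riemann-sum approximation of X(t, f) = integral of x(tau) w(t - tau) exp(-j2 pi f tau) dtau.

    x holds samples of the signal on the uniform time grid tau (seconds).
    """
    dtau = tau[1] - tau[0]
    return np.sum(x * window(t - tau) * np.exp(-2j * np.pi * f * tau)) * dtau


tau = np.arange(0, 10, 1e-3)                 # 10 s sampled every 1 ms
x = np.cos(2 * np.pi * 5 * tau)              # a 5 Hz test tone
rec = partial(rectangular_window, B=1.0)     # 2 s wide rectangular window
print(abs(stft_at(x, tau, t=5.0, f=5.0, window=rec)))              # close to 1
print(abs(stft_at(x, tau, t=5.0, f=5.0, window=gaussian_window)))  # close to 0.5
print(abs(stft_at(x, tau, t=5.0, f=7.0, window=rec)))              # close to 0
```

The cosine splits its energy between +5 Hz and -5 Hz, so each window returns half its area at f = 5 Hz. The rectangular window has an area of 2 s and gives about 1. The Gaussian exp(-πt²) has unit area and gives about 0.5.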

### Discrete STFT

In practice, however, a recorded music signal is not continuous: it is sampled at a fixed sampling frequency. Therefore, the integral above cannot be used directly to compute the Rec-STFT, whose rectangular window has half-width B. We change the original form to

$X(n\,\Delta t,m\,\Delta f)=\sum _{p=n-Q}^{n+Q}x(p\,\Delta t)e^{-j2\pi pm\,\Delta t\,\Delta f}\,\Delta t$

where $t=n\,\Delta t$, $f=m\,\Delta f$, $\tau =p\,\Delta t$ and $B=Q\,\Delta t$. The discrete short-time Fourier transform has the following constraints:

• $\Delta t\,\Delta f={\frac {1}{N}}$, where N is an integer.
• $N\geq 2Q+1$
• $\Delta t<{\frac {1}{2f_{\max }}}$, where $f_{\max }$ is the highest frequency in the signal.
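The discrete Rec-STFT formula can be evaluated directly. The sketch below is a deliberately unoptimized transcription. It uses Δt Δf = 1/N to write the exponent as -j2πpm/N. It also treats samples outside the recording as zero, which is an assumption, since the formula does not say how to handle the edges. Because of the 1/N constraint, the inner sum has the form of an N-point DFT, and N ≥ 2Q + 1 lets the 2Q + 1 windowed samples fit in one such transform. Practical implementations therefore usually compute it with an FFT. The function name `rec_stft` is hypothetical.

```python
import numpy as np


def rec_stft(x, dt, Q, N):
    """Discrete Rec-STFT X(n*dt, m*df) with df = 1/(N*dt).

    Direct evaluation of
        X(n dt, m df) = sum_{p=n-Q}^{n+Q} x(p dt) exp(-j 2 pi p m / N) dt,
    with samples outside the signal treated as 0.
    Returns an array of shape (len(x), N): rows are time n, columns frequency m.
    """
    if N < 2 * Q + 1:
        raise ValueError("the constraint N >= 2Q + 1 is violated")
    x = np.asarray(x, dtype=complex)
    m = np.arange(N)                                  # frequency index, f = m * df
    X = np.zeros((len(x), N), dtype=complex)
    for n in range(len(x)):                           # time index, t = n * dt
        p = np.arange(max(n - Q, 0), min(n + Q, len(x) - 1) + 1)
        kernel = np.exp(-2j * np.pi * np.outer(m, p) / N)   # exp(-j2 pi p m dt df)
        X[n] = (kernel @ x[p]) * dt
    return X


dt = 1e-3                                  # 1 kHz sampling, far below 1/(2 f_max) for 125 Hz
t = np.arange(400) * dt
x = np.cos(2 * np.pi * 125 * t)
Q, N = 50, 200                             # B = Q*dt = 50 ms, df = 1/(N*dt) = 5 Hz
X = rec_stft(x, dt, Q, N)
m_peak = int(np.argmax(np.abs(X[200, : N // 2])))
print(m_peak, m_peak / (N * dt))           # bin 25, i.e. about 125 Hz
```

With Δt = 1 ms, Q = 50 and N = 200, the window half-width is B = 50 ms and Δf = 5 Hz. The 125 Hz cosine therefore peaks at m = 25.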

## STFT example

Fig. 1 shows the waveform of a piano music audio file with a 44,100 Hz sampling frequency. Fig. 2 shows the short-time Fourier transform of the audio file, computed here as a Gabor transform. The time–frequency plot shows a chord of three notes from t = 0 to 0.5 s. The chord changes at t = 0.5 s and changes again at t = 1 s. The fundamental frequency of each note in each chord can be read from the time–frequency plot.
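The piano recording behind Fig. 1 and Fig. 2 is not available here, so the sketch below uses a synthetic stand-in (an assumption). The stand-in is three 0.5 s chords of pure sine tones with arbitrarily chosen frequencies, analyzed with a sampled Gabor transform. The Gaussian width `sigma`, the hop size, the FFT length, and the helper names are hypothetical choices. Picking the three strongest spectral peaks at the middle of each chord recovers the note frequencies, which mirrors how the fundamentals are read from the time–frequency plot. Real piano notes also carry overtones, so peak picking on a recording would need more care.

```python
import numpy as np


def gabor_transform(x, fs, sigma=0.05, hop=None, n_fft=None):
    """Sampled Gabor transform: an STFT with Gaussian window w(t) = exp(-pi (t/sigma)^2).

    Returns (times_s, freqs_hz, G) with G of shape (n_frames, n_fft // 2 + 1).
    Phase is referenced to each frame's start, so only |G| matches |X(t, f)|.
    """
    x = np.asarray(x, dtype=float)
    hop = hop or max(1, round(0.01 * fs))             # default: one frame every 10 ms
    half = int(np.ceil(2 * sigma * fs))               # exp(-4 pi) < 4e-6 beyond 2 sigma
    n_fft = n_fft or int(2 ** np.ceil(np.log2(2 * half + 1)))
    if n_fft < 2 * half + 1:
        raise ValueError("n_fft must cover the whole window")
    offsets = np.arange(-half, half + 1) / fs         # window time axis in seconds
    window = np.exp(-np.pi * (offsets / sigma) ** 2)
    padded = np.pad(x, half)                          # zeros outside the recording
    centres = np.arange(0, len(x), hop)
    frames = np.stack([padded[c : c + 2 * half + 1] * window for c in centres])
    G = np.fft.rfft(frames, n=n_fft, axis=1) / fs     # sum * dt approximates the integral
    return centres / fs, np.fft.rfftfreq(n_fft, d=1 / fs), G


def strongest_peaks(magnitude, freqs, k=3):
    """Frequencies of the k largest local maxima of one spectrum, in ascending order."""
    inner = magnitude[1:-1]
    peaks = np.flatnonzero((inner > magnitude[:-2]) & (inner >= magnitude[2:])) + 1
    top = peaks[np.argsort(magnitude[peaks])[-k:]]
    return np.sort(freqs[top])


# Hypothetical stand-in for the piano recording: three 0.5 s chords of pure tones (Hz).
fs = 8000
t = np.arange(int(1.5 * fs)) / fs
chords = [(262, 330, 392), (349, 440, 523), (392, 494, 587)]
x = np.zeros_like(t)
for i, chord in enumerate(chords):
    segment = (t >= 0.5 * i) & (t < 0.5 * (i + 1))
    for f0 in chord:
        x[segment] += np.sin(2 * np.pi * f0 * t[segment])

times, freqs, G = gabor_transform(x, fs, n_fft=fs)    # n_fft = fs gives 1 Hz bins
for t_probe in (0.25, 0.75, 1.25):                    # middle of each chord
    frame = np.argmin(np.abs(times - t_probe))
    print(t_probe, strongest_peaks(np.abs(G[frame]), freqs))
```

A narrower `sigma` localizes the chord changes more sharply in time but widens each spectral peak (by roughly 1/sigma Hz). This is the time–frequency trade-off that the window controls.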

## Spectrogram

Figure 3 shows the spectrogram of the audio file shown in Figure 1. A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. When the data are represented in a 3D plot, they may be called waterfalls. The spectrogram is a time-varying spectral representation: the spectrogram of a signal s(t) can be estimated by computing the squared magnitude of the STFT of s(t):

$\mathbf {spectrogram} (t,f)=\left|\mathbf {STFT} (t,f)\right|^{2}$

In mathematics, magnitude is the size of a mathematical object, a property which determines whether the object is larger or smaller than other objects of the same kind. More formally, an object's magnitude is the displayed result of an ordering of the class of objects to which it belongs.

Although the spectrogram is very useful, it has one drawback: it displays frequencies on a uniform (linear) scale. Musical scales, however, are based on a logarithmic frequency scale, in which each octave doubles the frequency. Therefore, frequency should be described on a logarithmic scale related to human hearing.
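Both the squared-magnitude definition and the logarithmic frequency axis can be sketched briefly. The code below assumes an equal-tempered scale referenced to A4 = 440 Hz. It pools linear-frequency spectrogram columns into one band per semitone. The function names and the 55 Hz to 4 kHz range are hypothetical choices, not a standard from the text.

```python
import numpy as np


def spectrogram(stft_values):
    """spectrogram(t, f) = |STFT(t, f)|^2, computed element-wise."""
    return np.abs(stft_values) ** 2


def hz_to_semitones(f_hz, f_ref=440.0):
    """Log-frequency coordinate: equal-tempered semitones above f_ref (assumed A4 = 440 Hz).
    One octave (doubling the frequency) always adds exactly 12."""
    return 12.0 * np.log2(np.asarray(f_hz, dtype=float) / f_ref)


def semitone_spectrogram(S, freqs, f_min=55.0, f_max=4000.0, f_ref=440.0):
    """Pool the columns of a linear-frequency spectrogram S (n_times x n_freqs)
    into one band per semitone between f_min and f_max.

    Returns (band_centres_hz, pooled) with pooled of shape (n_times, n_bands).
    """
    freqs = np.asarray(freqs, dtype=float)
    semis = np.full(freqs.shape, np.nan)              # NaN marks the 0 Hz (DC) bin
    positive = freqs > 0
    semis[positive] = np.round(hz_to_semitones(freqs[positive], f_ref))
    bands = np.arange(int(round(hz_to_semitones(f_min, f_ref))),
                      int(round(hz_to_semitones(f_max, f_ref))) + 1)
    # Sum every linear-frequency column whose nearest semitone is band b.
    pooled = np.stack([S[:, semis == b].sum(axis=1) for b in bands], axis=1)
    return f_ref * 2.0 ** (bands / 12.0), pooled


# Toy usage: four linear-frequency columns for a single time frame.
S = spectrogram(np.array([[1.0, np.sqrt(2), np.sqrt(3), 2.0]]))   # -> [[1, 2, 3, 4]]
centres, pooled = semitone_spectrogram(S, [0.0, 440.0, 466.16, 880.0])
```

Pooling into semitone bands spaces musical notes evenly along the frequency axis. At low frequencies a semitone is narrower than the resolution of a typical STFT (about 3.3 Hz near 55 Hz). Some low bands may therefore receive no columns and stay at zero.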

## Wigner distribution function

The Wigner distribution function (WDF) is used in signal processing as a transform in time–frequency analysis, and it can also be used to analyze music signals. The advantage of the Wigner distribution function is its high clarity, that is, sharp concentration of energy in the time–frequency plane. However, it requires heavy computation and suffers from the cross-term problem. It is therefore better suited to signals that contain no more than one frequency component at a time.

### Formula

The Wigner distribution function $W_{x}(t,f)$ is:

$\mathbf {W} _{x}(t,f)=\int _{-\infty }^{\infty }x(t+\tau /2)x^{*}(t-\tau /2)e^{-j2\pi \tau \,f}\,d\tau ,$

where x(t) is the signal and x*(t) is its complex conjugate.
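A direct discretization of this formula illustrates both its sharpness and the cross-term problem. The sketch below is minimal and unoptimized, and it rests on several assumptions. It substitutes τ = 2u so that lags fall on the sample grid, and it truncates the lag sum at the signal edges. It also expects a complex (analytic) test signal with frequencies in [0, 1/(2Δt)), because the discretized lag sum repeats every 1/(2Δt) in f. The helper name `wigner` and the test tones are hypothetical.

```python
import numpy as np


def wigner(x, dt, freqs):
    """Discretized Wigner distribution W_x(t_n, f) for every sample t_n = n*dt.

    Substituting tau = 2u gives W(t, f) = 2 * integral of x(t+u) x*(t-u) exp(-j4 pi u f) du,
    approximated by a sum over lags u = m*dt, truncated where t +/- u leaves the signal.
    """
    x = np.asarray(x, dtype=complex)
    freqs = np.asarray(freqs, dtype=float)
    W = np.zeros((len(x), len(freqs)))
    for n in range(len(x)):
        max_lag = min(n, len(x) - 1 - n)
        m = np.arange(-max_lag, max_lag + 1)
        product = x[n + m] * np.conj(x[n - m])               # x(t+u) x*(t-u)
        phase = np.exp(-4j * np.pi * np.outer(freqs, m) * dt)
        W[n] = np.real(2 * dt * (phase @ product))           # real: product is Hermitian in m
    return W


fs = 1000
dt = 1 / fs
t = np.arange(256) * dt
freqs = np.arange(250.0)                                     # 1 Hz grid, 0 to 249 Hz
one_tone = np.exp(2j * np.pi * 100 * t)
two_tones = one_tone + np.exp(2j * np.pi * 200 * t)
W1 = wigner(one_tone, dt, freqs)
W2 = wigner(two_tones, dt, freqs)
print(freqs[np.argmax(W1[128])])        # 100 Hz: a single tone is sharply localized
print(W2[100, 100], W2[100, 150])       # about 0.41 and 0.81: cross term at 150 Hz dominates
```

For two simultaneous tones at 100 Hz and 200 Hz, the distribution at t = 0.1 s is larger at 150 Hz, where no tone exists, than at 100 Hz. The cross term sits midway between the two components and oscillates in time. This is why the WDF is better suited to signals with one frequency component at a time.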

