# Short-time Fourier transform

Last updated

The short-term Fourier transform (STFT), is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time.  In practice, the procedure for computing STFTs is to divide a longer time signal into shorter segments of equal length and then compute the Fourier transform separately on each shorter segment. This reveals the Fourier spectrum on each shorter segment. One then usually plots the changing spectra as a function of time.

## Forward STFT

### Continuous-time STFT

Simply, in the continuous-time case, the function to be transformed is multiplied by a window function which is nonzero for only a short period of time. The Fourier transform (a one-dimensional function) of the resulting signal is taken as the window is slid along the time axis, resulting in a two-dimensional representation of the signal. Mathematically, this is written as:

$\mathbf {STFT} \{x(t)\}(\tau ,\omega )\equiv X(\tau ,\omega )=\int _{-\infty }^{\infty }x(t)w(t-\tau )e^{-i\omega t}\,dt$ where $w(\tau )$ is the window function, commonly a Hann window or Gaussian window centered around zero, and $x(t)$ is the signal to be transformed (note the difference between the window function $w$ and the frequency $\omega$ ). $X(\tau ,\omega )$ is essentially the Fourier Transform of $x(t)w(t-\tau )$ , a complex function representing the phase and magnitude of the signal over time and frequency. Often phase unwrapping is employed along either or both the time axis, $\tau$ , and frequency axis, $\omega$ , to suppress any jump discontinuity of the phase result of the STFT. The time index $\tau$ is normally considered to be "slow" time and usually not expressed in as high resolution as time $t$ .

### Discrete-time STFT

In the discrete time case, the data to be transformed could be broken up into chunks or frames (which usually overlap each other, to reduce artifacts at the boundary). Each chunk is Fourier transformed, and the complex result is added to a matrix, which records magnitude and phase for each point in time and frequency. This can be expressed as:

$\mathbf {STFT} \{x[n]\}(m,\omega )\equiv X(m,\omega )=\sum _{n=-\infty }^{\infty }x[n]w[n-m]e^{-j\omega n}$ likewise, with signal x[n] and window w[n]. In this case, m is discrete and ω is continuous, but in most typical applications the STFT is performed on a computer using the Fast Fourier Transform, so both variables are discrete and quantized.

The magnitude squared of the STFT yields the spectrogram representation of the Power Spectral Density of the function:

$\operatorname {spectrogram} \{x(t)\}(\tau ,\omega )\equiv |X(\tau ,\omega )|^{2}$ See also the modified discrete cosine transform (MDCT), which is also a Fourier-related transform that uses overlapping windows.

#### Sliding DFT

If only a small number of ω are desired, or if the STFT is desired to be evaluated for every shift m of the window, then the STFT may be more efficiently evaluated using a sliding DFT algorithm. 

## Inverse STFT

The STFT is invertible, that is, the original signal can be recovered from the transform by the Inverse STFT. The most widely accepted way of inverting the STFT is by using the overlap-add (OLA) method, which also allows for modifications to the STFT complex spectrum. This makes for a versatile signal processing method,  referred to as the overlap and add with modifications method.

### Continuous-time STFT

Given the width and definition of the window function w(t), we initially require the area of the window function to be scaled so that

$\int _{-\infty }^{\infty }w(\tau )\,d\tau =1.$ It easily follows that

$\int _{-\infty }^{\infty }w(t-\tau )\,d\tau =1\quad \forall \ t$ and

$x(t)=x(t)\int _{-\infty }^{\infty }w(t-\tau )\,d\tau =\int _{-\infty }^{\infty }x(t)w(t-\tau )\,d\tau .$ The continuous Fourier Transform is

$X(\omega )=\int _{-\infty }^{\infty }x(t)e^{-j\omega t}\,dt.$ Substituting x(t) from above:

$X(\omega )=\int _{-\infty }^{\infty }\left[\int _{-\infty }^{\infty }x(t)w(t-\tau )\,d\tau \right]\,e^{-j\omega t}\,dt$ $=\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }x(t)w(t-\tau )\,e^{-j\omega t}\,d\tau \,dt.$ Swapping order of integration:

$X(\omega )=\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }x(t)w(t-\tau )\,e^{-j\omega t}\,dt\,d\tau$ $=\int _{-\infty }^{\infty }\left[\int _{-\infty }^{\infty }x(t)w(t-\tau )\,e^{-j\omega t}\,dt\right]\,d\tau$ $=\int _{-\infty }^{\infty }X(\tau ,\omega )\,d\tau .$ So the Fourier Transform can be seen as a sort of phase coherent sum of all of the STFTs of x(t). Since the inverse Fourier transform is

$x(t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }X(\omega )e^{+j\omega t}\,d\omega ,$ then x(t) can be recovered from X(τ,ω) as

$x(t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }X(\tau ,\omega )e^{+j\omega t}\,d\tau \,d\omega .$ or

$x(t)=\int _{-\infty }^{\infty }\left[{\frac {1}{2\pi }}\int _{-\infty }^{\infty }X(\tau ,\omega )e^{+j\omega t}\,d\omega \right]\,d\tau .$ It can be seen, comparing to above that windowed "grain" or "wavelet" of x(t) is

$x(t)w(t-\tau )={\frac {1}{2\pi }}\int _{-\infty }^{\infty }X(\tau ,\omega )e^{+j\omega t}\,d\omega .$ the inverse Fourier transform of X(τ,ω) for τ fixed.

## Resolution issues

One of the pitfalls of the STFT is that it has a fixed resolution. The width of the windowing function relates to how the signal is represented—it determines whether there is good frequency resolution (frequency components close together can be separated) or good time resolution (the time at which frequencies change). A wide window gives better frequency resolution but poor time resolution. A narrower window gives good time resolution but poor frequency resolution. These are called narrowband and wideband transforms, respectively.

This is one of the reasons for the creation of the wavelet transform and multiresolution analysis, which can give good time resolution for high-frequency events and good frequency resolution for low-frequency events, the combination best suited for many real signals.

This property is related to the Heisenberg uncertainty principle, but not directly – see Gabor limit for discussion. The product of the standard deviation in time and frequency is limited. The boundary of the uncertainty principle (best simultaneous resolution of both) is reached with a Gaussian window function, as the Gaussian minimizes the Fourier uncertainty principle. This is called the Gabor transform (and with modifications for multiresolution becomes the Morlet wavelet transform).

One can consider the STFT for varying window size as a two-dimensional domain (time and frequency), as illustrated in the example below, which can be calculated by varying the window size. However, this is no longer a strictly time–frequency representation – the kernel is not constant over the entire signal.

### Example

Using the following sample signal $x(t)$ that is composed of a set of four sinusoidal waveforms joined together in sequence. Each waveform is only composed of one of four frequencies (10, 25, 50, 100 Hz). The definition of $x(t)$ is:

$x(t)={\begin{cases}\cos(2\pi 10t)&0\,\mathrm {s} \leq t<5\,\mathrm {s} \\\cos(2\pi 25t)&5\,\mathrm {s} \leq t<10\,\mathrm {s} \\\cos(2\pi 50t)&10\,\mathrm {s} \leq t<15\,\mathrm {s} \\\cos(2\pi 100t)&15\,\mathrm {s} \leq t<20\,\mathrm {s} \\\end{cases}}$ Then it is sampled at 400 Hz. The following spectrograms were produced:

The 25 ms window allows us to identify a precise time at which the signals change but the precise frequencies are difficult to identify. At the other end of the scale, the 1000 ms window allows the frequencies to be precisely seen but the time between frequency changes is blurred.

### Explanation

It can also be explained with reference to the sampling and Nyquist frequency.

Take a window of N samples from an arbitrary real-valued signal at sampling rate fs . Taking the Fourier transform produces N complex coefficients. Of these coefficients only half are useful (the last N/2 being the complex conjugate of the first N/2 in reverse order, as this is a real valued signal).

These N/2 coefficients represent the frequencies 0 to fs/2 (Nyquist) and two consecutive coefficients are spaced apart by fs/N Hz.

To increase the frequency resolution of the window the frequency spacing of the coefficients needs to be reduced. There are only two variables, but decreasing fs (and keeping N constant) will cause the window size to increase — since there are now fewer samples per unit time. The other alternative is to increase N, but this again causes the window size to increase. So any attempt to increase the frequency resolution causes a larger window size and therefore a reduction in time resolution—and vice versa.

## Rayleigh frequency

As the Nyquist frequency is a limitation in the maximum frequency that can be meaningfully analysed, so is the Rayleigh frequency a limitation on the minimum frequency.

The Rayleigh frequency is the minimum frequency that can be resolved by a finite duration time window.  

Given a time window that is Τ seconds long, the minimum frequency that can be resolved is 1/Τ Hz.

The Rayleigh frequency is an important consideration in applications of the short-time Fourier transform (STFT), as well as any other method of harmonic analysis on a signal of finite record-length.  

## Application

STFTs as well as standard Fourier transforms and other tools are frequently used to analyze music. The spectrogram can, for example, show frequency on the horizontal axis, with the lowest frequencies at left, and the highest at the right. The height of each bar (augmented by color) represents the amplitude of the frequencies within that band. The depth dimension represents time, where each new bar was a separate distinct transform. Audio engineers use this kind of visual to gain information about an audio sample, for example, to locate the frequencies of specific noises (especially when used with greater frequency resolution) or to find frequencies which may be more or less resonant in the space where the signal was recorded. This information can be used for equalization or tuning other audio effects.

## Implementation

Original function

$X(t,f)=\int _{-\infty }^{\infty }w(t-\tau )x(\tau )e^{-j2\pi f\tau }d\tau$ Converting into the discrete form:

$t=n\Delta _{t},f=m\Delta _{f},\tau =p\Delta _{t}$ $X(n\Delta _{t},m\Delta _{f})=\sum _{-\infty }^{\infty }w((n-p)\Delta _{t})x(p\Delta _{t})e^{-j2\pi pm\Delta _{t}\Delta _{f}}\Delta _{t}$ Suppose that

$w(t)\cong 0{\text{ for }}|t|>B,{\frac {B}{\Delta _{t}}}=Q$ Then we can write the original function into

$X(n\Delta _{t},m\Delta _{f})=\sum _{p=n-Q}^{n+Q}w((n-p)\Delta _{t})x(p\Delta _{t})e^{-j2\pi pm\Delta _{t}\Delta _{f}}\Delta _{t}$ ### Direct implementation

#### Constraints

a. Nyquist criterion (Avoiding the aliasing effect):

$\Delta _{t}<{\frac {1}{2\Omega }}$ , where $\Omega$ is the bandwidth of $x(\tau )w(t-\tau )$ ### FFT-based method

#### Constraint

a. $\Delta _{t}\Delta _{f}={\tfrac {1}{N}}$ , where $N$ is an integer

b. $N\geq 2Q+1$ c. Nyquist criterion (Avoiding the aliasing effect):

$\Delta _{t}<{\frac {1}{2\Omega }}$ , $\Omega$ is the bandwidth of $x(\tau )w(t-\tau )$ $X(n\Delta _{t},m\Delta _{f})=\sum _{p=n-Q}^{n+Q}w((n-p)\Delta _{t})x(p\Delta _{t})e^{-{\frac {2\pi jpm}{N}}}\Delta _{t}$ ${\text{if we have }}q=p-(n-Q),{\text{ then }}p=(n-Q)+q$ $X(n\Delta _{t},m\Delta _{f})=\Delta _{t}e^{\frac {2\pi j(Q-n)m}{N}}\sum _{q=0}^{N-1}x_{1}(q)e^{-{\frac {2\pi jqm}{N}}}$ ${\text{where }}x_{1}(q)={\begin{cases}w((Q-q)\Delta _{t})x((n-Q+q)\Delta _{t})&0\leq q\leq 2Q\\0&2Q ### Recursive method

#### Constraint

a. $\Delta _{t}\Delta _{f}={\tfrac {1}{N}}$ , where $N$ is an integer

b. $N\geq 2Q+1$ c. Nyquist criterion (Avoiding the aliasing effect):

$\Delta _{t}<{\frac {1}{2\Omega }}$ , $\Omega$ is the bandwidth of $x(\tau )w(t-\tau )$ d. Only for implementing the rectangular-STFT

Rectangular window imposes the constraint

$w((n-p)\Delta _{t})=1$ Substitution gives:

{\begin{aligned}X(n\Delta _{t},m\Delta _{f})&=\sum _{p=n-Q}^{n+Q}w((n-p)\Delta _{t})&x(p\Delta _{t})e^{-{\frac {j2\pi pm}{N}}}\Delta _{t}\\&=\sum _{p=n-Q}^{n+Q}&x(p\Delta _{t})e^{-{\frac {j2\pi pm}{N}}}\Delta _{t}\\\end{aligned}} Change of variable n-1 for n:

$X((n-1)\Delta _{t},m\Delta _{f})=\sum _{p=n-1-Q}^{n-1+Q}x(p\Delta _{t})e^{-{\frac {j2\pi pm}{N}}}\Delta _{t}$ Calculate $X(\min {n}\Delta _{t},m\Delta _{f})$ by the N-point FFT:

$X(n_{0}\Delta _{t},m\Delta _{f})=\Delta _{t}e^{\frac {j2\pi (Q-n_{0})m}{N}}\sum _{q=0}^{N-1}x_{1}(q)e^{-j{\frac {2\pi qm}{N}}},\qquad n_{0}=\min {(n)}$ where

$x_{1}(q)={\begin{cases}x((n-Q+q)\Delta _{t})&q\leq 2Q\\0&q>2Q\end{cases}}$ Applying the recursive formula to calculate $X(n\Delta _{t},m\Delta _{f})$ $X(n\Delta _{t},m\Delta _{f})=X((n-1)\Delta _{t},m\Delta _{f})-x((n-Q-1)\Delta _{t})e^{-{\frac {j2\pi (n-Q-1)m}{N}}}\Delta _{t}+x((n+Q)\Delta _{t})e^{-{\frac {j2\pi (n+Q)m}{N}}}\Delta _{t}$ ### Chirp Z transform

#### Constraint

$\exp {(-j2\pi pm\Delta _{t}\Delta _{f})}=\exp {(-j\pi p^{2}\Delta _{t}\Delta _{f})}\cdot \exp {(j\pi (p-m)^{2}\Delta _{t}\Delta _{f})}\cdot \exp {(-j\pi m^{2}\Delta _{t}\Delta _{f})}$ so

$X(n\Delta _{t},m\Delta _{f})=\Delta _{t}\sum _{p=n-Q}^{n+Q}w((n-p)\Delta _{t})x(p\Delta _{t})e^{-j2\pi pm\Delta _{t}\Delta _{f}}$ $X(n\Delta _{t},m\Delta _{f})=\Delta _{t}e^{-j2\pi m^{2}\Delta _{t}\Delta _{f}}\sum _{p=n-Q}^{n+Q}w((n-p)\Delta _{t})x(p\Delta _{t})e^{-j\pi p^{2}\Delta _{t}\Delta _{f}}e^{j\pi (p-m)^{2}\Delta _{t}\Delta _{f}}$ ### Implementation comparison

MethodComplexity
Direct implementation$O(TFQ)$ FFT-based$O(TN\log _{2}N)$ Recursive$O(TF)$ Chirp Z transform$O(TN\log _{2}N)$ Other time-frequency transforms:

## Related Research Articles The power spectrum of a time series describes the distribution of power into frequency components composing that signal. According to Fourier analysis, any physical signal can be decomposed into a number of discrete frequencies, or a spectrum of frequencies over a continuous range. The statistical average of a certain signal or sort of signal as analyzed in terms of its frequency content, is called its spectrum. In mathematics, Weierstrass's elliptic functions are elliptic functions that take a particularly simple form; they are named for Karl Weierstrass. This class of functions are also referred to as p-functions and generally written using the symbol ℘. The ℘ functions constitute branched double coverings of the Riemann sphere by the torus, ramified at four points. They can be used to parametrize elliptic curves over the complex numbers, thus establishing an equivalence to complex tori. Genus one solutions of differential equations can be written in terms of Weierstrass elliptic functions. Notably, the simplest periodic solutions of the Korteweg–de Vries equation are often written in terms of Weierstrass p-functions.

In mathematics and in signal processing, the Hilbert transform is a specific linear operator that takes a function, u(t) of a real variable and produces another function of a real variable H(u)(t). This linear operator is given by convolution with the function :

Stransform as a time–frequency distribution was developed in 1994 for analyzing geophysics data. In this way, the S transform is a generalization of the short-time Fourier transform (STFT), extending the continuous wavelet transform and overcoming some of its disadvantages. For one, modulation sinusoids are fixed with respect to the time axis; this localizes the scalable Gaussian window dilations and translations in S transform. Moreover, the S transform doesn't have a cross-term problem and yields a better signal clarity than Gabor transform. However, the S transform has its own disadvantages: the clarity is worse than Wigner distribution function and Cohen's class distribution function.

In mathematics, the discrete-time Fourier transform (DTFT) is a form of Fourier analysis that is applicable to a sequence of values.

Linear time-invariant theory, commonly known as LTI system theory, investigates the response of a linear and time-invariant system to an arbitrary input signal. Trajectories of these systems are commonly measured and tracked as they move through time, but in applications like image processing and field theory, the LTI systems also have trajectories in spatial dimensions. Thus, these systems are also called linear translation-invariant to give the theory the most general reach. In the case of generic discrete-time systems, linear shift-invariant is the corresponding term. A good example of LTI systems are electrical circuits that can be made up of resistors, capacitors, and inductors.. It has been used in applied mathematics and has direct applications in NMR spectroscopy, seismology, circuits, signal processing, control theory, and other technical areas.

The Havriliak–Negami relaxation is an empirical modification of the Debye relaxation model in electromagnetism. Unlike the Debye model, the Havriliak–Negami relaxation accounts for the asymmetry and broadness of the dielectric dispersion curve. The model was first used to describe the dielectric relaxation of some polymers, by adding two exponential parameters to the Debye equation: In signal processing, a causal filter is a linear and time-invariant causal system. The word causal indicates that the filter output depends only on past and present inputs. A filter whose output also depends on future inputs is non-causal, whereas a filter whose output depends only on future inputs is anti-causal. Systems that are realizable must be causal because such systems cannot act on a future input. In effect that means the output sample that best represents the input at time comes out slightly later. A common design practice for digital filters is to create a realizable filter by shortening and/or time-shifting a non-causal impulse response. If shortening is necessary, it is often accomplished as the product of the impulse-response with a window function. In mathematics, a wavelet series is a representation of a square-integrable function by a certain orthonormal series generated by a wavelet. This article provides a formal, mathematical definition of an orthonormal wavelet and of the integral wavelet transform. The Wigner distribution function (WDF) is used in signal processing as a transform in time-frequency analysis. The Gabor transform, named after Dennis Gabor, is a special case of the short-time Fourier transform. It is used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. The function to be transformed is first multiplied by a Gaussian function, which can be regarded as a window function, and the resulting function is then transformed with a Fourier transform to derive the time-frequency analysis. The window function means that the signal near the time being analyzed will have higher weight. The Gabor transform of a signal x(t) is defined by this formula:

In many-body theory, the term Green's function is sometimes used interchangeably with correlation function, but refers specifically to correlators of field operators or creation and annihilation operators.

A Modified Wigner distribution function is a variation of the Wigner distribution function (WD) with reduced or removed cross-terms.

Bilinear time–frequency distributions, or quadratic time–frequency distributions, arise in a sub-field of signal analysis and signal processing called time–frequency signal processing, and, in the statistical analysis of time series data. Such methods are used where one needs to deal with a situation where the frequency composition of a signal may be changing over time; this sub-field used to be called time–frequency signal analysis, and is now more often called time–frequency signal processing due to the progress in using these methods to a wide range of signal-processing problems.

Time-domain thermoreflectance (TDTR) is a method by which the thermal properties of a material can be measured, most importantly thermal conductivity. This method can be applied most notably to thin film materials, which have properties that vary greatly when compared to the same materials in bulk. The idea behind this technique is that once a material is heated up, the change in the reflectance of the surface can be utilized to derive the thermal properties. The reflectivity is measured with respect to time, and the data received can be matched to a model with coefficients that correspond to thermal properties.

Time–frequency analysis for music signals is one of the applications of time–frequency analysis. Musical sound can be more complicated than human vocal sound, occupying a wider band of frequency. Music signals are time-varying signals; while the classic Fourier transform is not sufficient to analyze them, time–frequency analysis is an efficient tool for such use. Time–frequency analysis is extended from the classic Fourier approach. Short-time Fourier transform (STFT), Gabor transform (GT) and Wigner distribution function (WDF) are famous time–frequency methods, useful for analyzing music signals such as notes played on a piano, a flute or a guitar.

The spectrum of a chirp pulse describes its characteristics in terms of its frequency components. This frequency-domain representation is an alternative to the more familiar time-domain waveform, and the two versions are mathematically related by the Fourier transform.
The spectrum is of particular interest when pulses are subject to signal processing. For example, when a chirp pulse is compressed by its matched filter, the resulting waveform contains not only a main narrow pulse but, also, a variety of unwanted artifacts many of which are directly attributable to features in the chirp's spectral characteristics.
The simplest way to derive the spectrum of a chirp, now that computers are widely available, is to sample the time-domain waveform at a frequency well above the Nyquist limit and call up an FFT algorithm to obtain the desired result. As this approach was not an option for the early designers, they resorted to analytic analysis, where possible, or to graphical or approximation methods, otherwise. These early methods still remain helpful, however, as they give additional insight into the behavior and properties of chirps. In mathematics, a rectangular mask short-time Fourier transform has the simple form of short-time Fourier transform. Other types of the STFT may require more computation time than the rec-STFT. Define its mask function

Beamforming is a signal processing technique used to spatially select propagating waves. In order to implement beamforming on digital hardware the received signals need to be discretized. This introduces quantization error, perturbing the array pattern. For this reason, the sample rate must be generally much greater than the Nyquist rate.

Multidimensional seismic data processing forms a major component of seismic profiling, a technique used in geophysical exploration. The technique itself has various applications, including mapping ocean floors, determining the structure of sediments, mapping subsurface currents and hydrocarbon exploration. Since geophysical data obtained in such techniques is a function of both space and time, multidimensional signal processing techniques may be better suited for processing such data.

1. Sejdić E.; Djurović I.; Jiang J. (2009). "Time-frequency feature representation using energy concentration: An overview of recent advances". Digital Signal Processing. 19 (1): 153–183. doi:10.1016/j.dsp.2007.12.004.
2. E. Jacobsen and R. Lyons, The sliding DFT, Signal Processing Magazine vol. 20, issue 2, pp. 74–80 (March 2003).
3. Jont B. Allen (June 1977). "Short Time Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform". IEEE Transactions on Acoustics, Speech, and Signal Processing. ASSP-25 (3): 235–238.
4. Zeitler M, Fries P, Gielen S (2008). "Biased competition through variations in amplitude of gamma-oscillations". J Comput Neurosci. 25 (1): 89–107. doi:10.1007/s10827-007-0066-2. PMC  . PMID   18293071.
5. Wingerden, Marijn van; Vinck, Martin; Lankelma, Jan; Pennartz, Cyriel M. A. (2010-05-19). "Theta-Band Phase Locking of Orbitofrontal Neurons during Reward Expectancy". Journal of Neuroscience. 30 (20): 7078–7087. doi:10.1523/JNEUROSCI.3860-09.2010. ISSN   0270-6474. PMID   20484650.