In mathematics and signal processing, the constant-Q transform and variable-Q transform, simply known as CQT and VQT, transforms a data series to the frequency domain. It is related to the Fourier transform [1] and very closely related to the complex Morlet wavelet transform. [2] Its design is suited for musical representation.
The transform can be thought of as a series of filters fk, logarithmically spaced in frequency, with the k-th filter having a spectral width δfk equal to a multiple of the previous filter's width:
where δfk is the bandwidth of the k-th filter, fmin is the central frequency of the lowest filter, and n is the number of filters per octave.
The short-time Fourier transform of x[n] for a frame shifted to sample m is calculated as follows:
Given a data series at sampling frequency fs = 1/T, T being the sampling period of our data, for each frequency bin we can define the following:
The equivalent transform kernel can be found by using the following substitutions:
After these modifications, we are left with
This section needs additional citations for verification .(August 2022) |
The variable-Q transform is the same as constant-Q transform, but the only difference is the filter Q is variable, hence the name variable-Q transform. The variable-Q transform is useful where time resolution on low frequencies is important[ examples needed ]. There are ways to calculate the bandwidth of the VQT, one of them using equivalent rectangular bandwidth as a value for VQT bin's bandwidth. [3]
The simplest way to implement a variable-Q transform is add a bandwidth offset called γ like this one:[ citation needed ]
This formula can be modified to have extra parameters to adjust sharpness of the transition between constant-Q and constant-bandwidth like this:[ citation needed ]
with α as a parameter for transition sharpness and where α of 2 is equals to hyperbolic sine frequency scale, in terms of frequency resolution.
The direct calculation of the constant-Q transform (either using naive discrete Fourier transform or slightly faster Goertzel algorithm) is slow when compared against the fast Fourier transform. However, the fast Fourier transform can itself be employed, in conjunction with the use of a kernel, to perform the equivalent calculation but much faster. [4] An approximate inverse to such an implementation was proposed in 2006; it works by going back to the discrete Fourier transform, and is only suitable for pitch instruments. [5]
A development on this method with improved invertibility involves performing CQT (via fast Fourier transform) octave-by-octave, using lowpass filtered and downsampled results for consecutively lower pitches. [6] Implementations of this method include the MATLAB implementation and LibROSA's Python implementation. [7] LibROSA combines the subsampled method with the direct fast Fourier transform method (which it dubs "pseudo-CQT") by having the latter process higher frequencies as a whole. [7]
The sliding discrete Fourier transform can be used for faster calculation of constant-Q transform, since the sliding discrete Fourier transform does not have to be linear-frequency spacing and same window size per bin. [8]
Alternatively, the constant-Q transform can be approximated by using multiple fast Fourier transforms of different window sizes and/or sampling rate at different frequency ranges then stitch it together. This is called multiresolution short-time Fourier transform, however the window sizes for multiresolution fast Fourier transforms are different per-octave, rather than per-bin. [9] [ ambiguous ]
This section needs expansionwith: nonstationary Gabor frames as described here (and as used by Matlab). You can help by adding to it. (December 2018) |
In general, the transform is well suited to musical data, and this can be seen in some of its advantages compared to the fast Fourier transform. As the output of the transform is effectively amplitude/phase against log frequency, fewer frequency bins are required to cover a given range effectively, and this proves useful where frequencies span several octaves. As the range of human hearing covers approximately ten octaves from 20 Hz to around 20 kHz, this reduction in output data is significant.
The transform exhibits a reduction in frequency resolution with higher frequency bins, which is desirable for auditory applications. The transform mirrors the human auditory system, whereby at lower-frequencies spectral resolution is better, whereas temporal resolution improves at higher frequencies. At the bottom of the piano scale (about 30 Hz), a difference of 1 semitone is a difference of approximately 1.5 Hz, whereas at the top of the musical scale (about 5 kHz), a difference of 1 semitone is a difference of approximately 200 Hz. [10] So for musical data the exponential frequency resolution of constant-Q transform is ideal.
In addition, the harmonics of musical notes form a pattern characteristic of the timbre of the instrument in this transform. Assuming the same relative strengths of each harmonic, as the fundamental frequency changes, the relative position of these harmonics remains constant. This can make identification of instruments much easier. The constant Q transform can also be used for automatic recognition of musical keys based on accumulated chroma content. [11]
Relative to the Fourier transform, implementation of this transform is more tricky. This is due to the varying number of samples used in the calculation of each frequency bin, which also affects the length of any windowing function implemented. [12]
Also note that because the frequency scale is logarithmic, there is no true zero-frequency / DC term present, which may be a drawback in applications that are interested in the DC term. Although for applications that are not interested in the DC such as audio, this is not a drawback.
In mathematics, the discrete Fourier transform (DFT) converts a finite sequence of equally-spaced samples of a function into a same-length sequence of equally-spaced samples of the discrete-time Fourier transform (DTFT), which is a complex-valued function of frequency. The interval at which the DTFT is sampled is the reciprocal of the duration of the input sequence. An inverse DFT (IDFT) is a Fourier series, using the DTFT samples as coefficients of complex sinusoids at the corresponding DTFT frequencies. It has the same sample-values as the original input sequence. The DFT is therefore said to be a frequency domain representation of the original input sequence. If the original sequence spans all the non-zero values of a function, its DTFT is continuous, and the DFT provides discrete samples of one cycle. If the original sequence is one cycle of a periodic function, the DFT provides all the non-zero values of one DTFT cycle.
In mathematical analysis, the Dirac delta function, also known as the unit impulse, is a generalized function on the real numbers, whose value is zero everywhere except at zero, and whose integral over the entire real line is equal to one. Since there is no function having this property, modelling the delta "function" rigorously involves the use of limits or, as is common in mathematics, measure theory and the theory of distributions.
A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases or decreases, and then returns to zero one or more times. Wavelets are termed a "brief oscillation". A taxonomy of wavelets has been established, based on the number and direction of its pulses. Wavelets are imbued with specific properties that make them useful for signal processing.
A low-pass filter is a filter that passes signals with a frequency lower than a selected cutoff frequency and attenuates signals with frequencies higher than the cutoff frequency. The exact frequency response of the filter depends on the filter design. The filter is sometimes called a high-cut filter, or treble-cut filter in audio applications. A low-pass filter is the complement of a high-pass filter.
A high-pass filter (HPF) is an electronic filter that passes signals with a frequency higher than a certain cutoff frequency and attenuates signals with frequencies lower than the cutoff frequency. The amount of attenuation for each frequency depends on the filter design. A high-pass filter is usually modeled as a linear time-invariant system. It is sometimes called a low-cut filter or bass-cut filter in the context of audio engineering. High-pass filters have many uses, such as blocking DC from circuitry sensitive to non-zero average voltages or radio frequency devices. They can also be used in conjunction with a low-pass filter to produce a band-pass filter.
In mathematics and signal processing, the Z-transform converts a discrete-time signal, which is a sequence of real or complex numbers, into a complex valued frequency-domain representation.
In mathematics, the Morlet wavelet is a wavelet composed of a complex exponential (carrier) multiplied by a Gaussian window (envelope). This wavelet is closely related to human perception, both hearing and vision.
In physics and engineering, the quality factor or Q factor is a dimensionless parameter that describes how underdamped an oscillator or resonator is. It is defined as the ratio of the initial energy stored in the resonator to the energy lost in one radian of the cycle of oscillation. Q factor is alternatively defined as the ratio of a resonator's centre frequency to its bandwidth when subject to an oscillating driving force. These two definitions give numerically similar, but not identical, results. Higher Q indicates a lower rate of energy loss and the oscillations die out more slowly. A pendulum suspended from a high-quality bearing, oscillating in air, has a high Q, while a pendulum immersed in oil has a low one. Resonators with high quality factors have low damping, so that they ring or vibrate longer.
In signal processing and statistics, a window function is a mathematical function that is zero-valued outside of some chosen interval. Typically, window functions are symmetric around the middle of the interval, approach a maximum in the middle, and taper away from the middle. Mathematically, when another function or waveform/data-sequence is "multiplied" by a window function, the product is also zero-valued outside the interval: all that is left is the part where they overlap, the "view through the window". Equivalently, and in actual practice, the segment of data within the window is first isolated, and then only that data is multiplied by the window function values. Thus, tapering, not segmentation, is the main purpose of window functions.
The short-time Fourier transform (STFT) is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. In practice, the procedure for computing STFTs is to divide a longer time signal into shorter segments of equal length and then compute the Fourier transform separately on each shorter segment. This reveals the Fourier spectrum on each shorter segment. One then usually plots the changing spectra as a function of time, known as a spectrogram or waterfall plot, such as commonly used in software defined radio (SDR) based spectrum displays. Full bandwidth displays covering the whole range of an SDR commonly use fast Fourier transforms (FFTs) with 2^24 points on desktop computers.
In mathematics, the Poisson summation formula is an equation that relates the Fourier series coefficients of the periodic summation of a function to values of the function's continuous Fourier transform. Consequently, the periodic summation of a function is completely defined by discrete samples of the original function's Fourier transform. And conversely, the periodic summation of a function's Fourier transform is completely defined by discrete samples of the original function. The Poisson summation formula was discovered by Siméon Denis Poisson and is sometimes called Poisson resummation.
Stransform as a time–frequency distribution was developed in 1994 for analyzing geophysics data. In this way, the S transform is a generalization of the short-time Fourier transform (STFT), extending the continuous wavelet transform and overcoming some of its disadvantages. For one, modulation sinusoids are fixed with respect to the time axis; this localizes the scalable Gaussian window dilations and translations in S transform. Moreover, the S transform doesn't have a cross-term problem and yields a better signal clarity than Gabor transform. However, the S transform has its own disadvantages: the clarity is worse than Wigner distribution function and Cohen's class distribution function.
In mathematics, in the area of harmonic analysis, the fractional Fourier transform (FRFT) is a family of linear transformations generalizing the Fourier transform. It can be thought of as the Fourier transform to the n-th power, where n need not be an integer — thus, it can transform a function to any intermediate domain between time and frequency. Its applications range from filter design and signal analysis to phase retrieval and pattern recognition.
In mathematics, a Dirac comb is a periodic function with the formula for some given period . Here t is a real variable and the sum extends over all integers k. The Dirac delta function and the Dirac comb are tempered distributions. The graph of the function resembles a comb, hence its name and the use of the comb-like Cyrillic letter sha (Ш) to denote the function.
In mathematics, a wavelet series is a representation of a square-integrable function by a certain orthonormal series generated by a wavelet. This article provides a formal, mathematical definition of an orthonormal wavelet and of the integral wavelet transform.
In mathematics, Fourier–Bessel series is a particular kind of generalized Fourier series based on Bessel functions.
Time–frequency analysis for music signals is one of the applications of time–frequency analysis. Musical sound can be more complicated than human vocal sound, occupying a wider band of frequency. Music signals are time-varying signals; while the classic Fourier transform is not sufficient to analyze them, time–frequency analysis is an efficient tool for such use. Time–frequency analysis is extended from the classic Fourier approach. Short-time Fourier transform (STFT), Gabor transform (GT) and Wigner distribution function (WDF) are famous time–frequency methods, useful for analyzing music signals such as notes played on a piano, a flute or a guitar.
Fractional wavelet transform (FRWT) is a generalization of the classical wavelet transform (WT). This transform is proposed in order to rectify the limitations of the WT and the fractional Fourier transform (FRFT). The FRWT inherits the advantages of multiresolution analysis of the WT and has the capability of signal representations in the fractional domain which is similar to the FRFT.
In rotordynamics, order tracking is a family of signal processing tools aimed at transforming a measured signal from time domain to angular domain. These techniques are applied to asynchronously sampled signals to obtain the same signal sampled at constant angular increments of a reference shaft. In some cases the outcome of the Order Tracking is directly the Fourier transform of such angular domain signal, whose frequency counterpart is defined as "order". Each order represents a fraction of the angular velocity of the reference shaft.
Multidimensional seismic data processing forms a major component of seismic profiling, a technique used in geophysical exploration. The technique itself has various applications, including mapping ocean floors, determining the structure of sediments, mapping subsurface currents and hydrocarbon exploration. Since geophysical data obtained in such techniques is a function of both space and time, multidimensional signal processing techniques may be better suited for processing such data.