Phase vocoder


A phase vocoder is a type of vocoder algorithm which can interpolate information present in the frequency and time domains of audio signals by using phase information extracted from a frequency transform. [1] The algorithm allows frequency-domain modifications to a digital sound file (typically time expansion/compression and pitch shifting).


At the heart of the phase vocoder is the short-time Fourier transform (STFT), typically implemented using fast Fourier transforms. The STFT converts a time-domain representation of sound into a time-frequency representation (the "analysis" phase), allowing modifications to the amplitudes or phases of specific frequency components of the sound before resynthesis of the time-frequency representation into the time domain by the inverse STFT. The time evolution of the resynthesized sound can be changed by modifying the time positions of the STFT frames prior to resynthesis, which allows time-scale modification of the original sound file.
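As a concrete illustration of this analysis-modification-resynthesis chain, the following is a minimal sketch in Python (NumPy only). The function name, window and hop sizes, and the nearest-frame magnitude choice are illustrative assumptions rather than any particular published implementation; the sketch performs time-scale modification by reading analysis frames at a different rate than the fixed hop used for overlap-add resynthesis.

```python
# Minimal phase-vocoder time-stretch sketch (assumes a mono float signal `x`;
# window/hop sizes are illustrative choices).
import numpy as np

def phase_vocoder_stretch(x, rate, n_fft=2048, hop=512):
    """Time-stretch x by a factor of 1/rate without changing pitch (rate > 1 shortens)."""
    win = np.hanning(n_fft)
    # Analysis: overlapping windowed FFT frames (the STFT).
    frames = np.array([np.fft.rfft(win * x[i:i + n_fft])
                       for i in range(0, len(x) - n_fft, hop)])
    # Expected phase advance per analysis hop for each bin's centre frequency.
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) * hop / n_fft

    positions = np.arange(0, len(frames) - 1, rate)   # fractional read positions
    phase = np.angle(frames[0])
    out = np.zeros(len(positions) * hop + n_fft)
    for k, pos in enumerate(positions):
        i = int(pos)
        mag = np.abs(frames[i])                       # nearest-frame magnitude (simplified)
        # Deviation of the measured phase increment from the expected one,
        # wrapped to [-pi, pi]: this estimates each bin's true frequency.
        dphi = np.angle(frames[i + 1]) - np.angle(frames[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase += omega + dphi                         # propagate the synthesis phase
        # Synthesis: inverse FFT and overlap-add at the fixed synthesis hop.
        frame = np.fft.irfft(mag * np.exp(1j * phase))
        out[k * hop:k * hop + n_fft] += win * frame
    return out
```

A pitch shift is then commonly obtained by combining such a time-scale change with resampling: for example, stretching a sound to twice its duration and resampling the result back to the original length raises the pitch by an octave while keeping the original duration.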

Phase coherence problem

The main problem that has to be solved for all manipulations of the STFT is that individual signal components (sinusoids, impulses) are spread over multiple frames and multiple STFT frequency locations (bins). This is because the STFT analysis is done using overlapping analysis windows. The windowing results in spectral leakage, so that the information of an individual sinusoidal component is spread over adjacent STFT bins. To avoid border effects from the tapering of the analysis windows, the STFT analysis windows overlap in time. Because of this time overlap, adjacent STFT analyses are strongly correlated: a sinusoid present in the analysis frame at time "t" will be present in the subsequent frames as well.

The difficulty of signal transformation with the phase vocoder is that any modification made in the STFT representation needs to preserve the appropriate correlation between adjacent frequency bins (vertical coherence) and between adjacent time frames (horizontal coherence). Except in the case of extremely simple synthetic sounds, these correlations can be preserved only approximately, and since the invention of the phase vocoder, research has mainly been concerned with finding algorithms that preserve the vertical and horizontal coherence of the STFT representation after modification. The phase coherence problem was investigated for quite a while before appropriate solutions emerged.
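As a concrete statement of horizontal coherence, a commonly used per-bin phase-propagation rule can be written as follows (the notation is a common textbook convention and is not taken verbatim from the cited works). Here X(t_a^u, k) denotes the STFT value of bin k at the u-th analysis time, Ω_k the bin's centre frequency, R_a the analysis hop, and R_s the synthesis hop:

```latex
\begin{align}
  \Delta\phi_k^{u} &= \operatorname{princarg}\!\left[\angle X(t_a^{u+1},k) - \angle X(t_a^{u},k) - R_a\,\Omega_k\right]\\
  \hat{\omega}_k^{u} &= \Omega_k + \frac{\Delta\phi_k^{u}}{R_a}\\
  \phi_k^{u+1} &= \phi_k^{u} + R_s\,\hat{\omega}_k^{u}
\end{align}
```

Here princarg wraps its argument to the interval (-π, π], ω̂ is the estimated instantaneous frequency, and φ is the synthesis phase. Applying this rule to every bin independently maintains horizontal coherence, but by itself it lets the phase relationships between the bins of a single sinusoid drift apart, which is exactly the loss of vertical coherence discussed above.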

History

The phase vocoder was introduced in 1966 by Flanagan and Golden as an algorithm that preserves horizontal coherence between the phases of bins that represent sinusoidal components. [2] This original phase vocoder did not take into account the vertical coherence between adjacent frequency bins, and therefore time stretching with this system produced sound signals that lacked clarity.

An optimal reconstruction of the sound signal from the STFT after amplitude modifications was proposed by Griffin and Lim in 1984. [3] This algorithm does not address the problem of producing a coherent STFT, but it allows finding the sound signal whose STFT is as close as possible to the modified STFT, even if the modified STFT is not coherent (does not represent any signal).
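A minimal sketch of the Griffin and Lim iteration follows, assuming SciPy and illustrative window/hop parameters (not the paper's exact setup): the magnitude of a modified, possibly inconsistent STFT is kept fixed while the phases are repeatedly re-estimated by inverting and re-analyzing the signal.

```python
# Sketch of the Griffin-Lim iteration: estimate a signal whose STFT magnitude
# approximates a given (possibly inconsistent) magnitude spectrogram `mag`
# of shape (n_fft // 2 + 1, frames).
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_fft=1024, hop=256, n_iter=50):
    phase = np.exp(2j * np.pi * np.random.rand(*mag.shape))   # random initial phases
    noverlap = n_fft - hop
    for _ in range(n_iter):
        # Invert the current complex spectrogram, then re-analyze the result.
        _, x = istft(mag * phase, nperseg=n_fft, noverlap=noverlap)
        _, _, X = stft(x, nperseg=n_fft, noverlap=noverlap)
        X = X[:, :mag.shape[1]]                                # align frame counts
        if X.shape[1] < mag.shape[1]:
            X = np.pad(X, ((0, 0), (0, mag.shape[1] - X.shape[1])))
        phase = np.exp(1j * np.angle(X))                       # keep phases, reimpose magnitudes
    _, x = istft(mag * phase, nperseg=n_fft, noverlap=noverlap)
    return x
```

In practice the iteration is often initialized with the phases of the original, unmodified STFT rather than random phases, which typically speeds convergence.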

The problem of vertical coherence remained a major issue for the quality of time-scaling operations until 1999, when Laroche and Dolson [4] proposed a means to preserve phase consistency across spectral bins. Their proposal can be seen as a turning point in phase vocoder history: it showed that, by ensuring vertical phase consistency, very high-quality time-scaling transformations can be obtained.
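The following is a sketch of the idea of identity phase locking in the spirit of Laroche and Dolson; the peak-picking rule and function signature are illustrative assumptions, not the paper's exact formulation. Only the phases of spectral peaks are propagated in time, and the bins surrounding each peak are rotated by the same amount, so the phase relationships within each sinusoid's group of bins (vertical coherence) are carried over from the analysis frame.

```python
# Illustrative identity phase locking for one STFT frame.
import numpy as np

def phase_lock_frame(mag, analysis_phase, propagated_phase):
    """Return synthesis phases for one frame, locking each bin to its nearest
    spectral peak. `propagated_phase` holds horizontally propagated synthesis
    phases, assumed valid at (at least) the peak bins."""
    # Local maxima of the magnitude spectrum act as sinusoid "anchors".
    peaks = np.flatnonzero((mag[1:-1] >= mag[:-2]) & (mag[1:-1] >= mag[2:])) + 1
    if peaks.size == 0:
        return propagated_phase
    # Assign every bin to its nearest peak ...
    bins = np.arange(mag.size)
    nearest = peaks[np.argmin(np.abs(bins[:, None] - peaks[None, :]), axis=1)]
    # ... and rotate it by the same phase shift its anchor peak received,
    # preserving the analysis-time phase relationships within each peak region.
    rotation = propagated_phase[nearest] - analysis_phase[nearest]
    return analysis_phase + rotation
```

Laroche and Dolson also describe refinements of this idea, such as scaled phase locking, which additionally tracks peaks from frame to frame.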

The algorithm proposed by Laroche and Dolson did not allow preservation of vertical phase coherence at sound onsets (note onsets). A solution for this problem was proposed by Roebel. [5]

An example of a software implementation of phase-vocoder-based signal transformation that uses means similar to those described here to achieve high-quality transformations is Ircam's SuperVP. [6] [verification needed]

Use in music

British composer Trevor Wishart used phase vocoder analyses and transformations of a human voice as the basis for his composition Vox 5 (part of his larger Vox Cycle). [7] Transfigured Wind by American composer Roger Reynolds uses the phase vocoder to perform time-stretching of flute sounds. [8] The music of JoAnn Kuchera-Morin makes some of the earliest and most extensive use of phase vocoder transformations, such as in Dreampaths (1989). [9]

See also

Related Research Articles

Additive synthesis is a sound synthesis technique that creates timbre by adding sine waves together.

Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are a sequence of numbers that represent samples of a continuous variable in a domain such as time, space, or frequency. In digital electronics, a digital signal is represented as a pulse train, which is typically generated by the switching of a transistor.

<span class="mw-page-title-main">Fourier analysis</span> Branch of mathematics

In mathematics, Fourier analysis is the study of the way general functions may be represented or approximated by sums of simpler trigonometric functions. Fourier analysis grew from the study of Fourier series, and is named after Joseph Fourier, who showed that representing a function as a sum of trigonometric functions greatly simplifies the study of heat transfer.

Harmonic analysis is a branch of mathematics concerned with investigating the connections between a function and its representation in frequency. The frequency representation is found by using the Fourier transform for functions on the real line, or by Fourier series for periodic functions. Generalizing these transforms to other domains is usually called Fourier analysis, although the term is sometimes used interchangeably with harmonic analysis. Harmonic analysis has become a vast subject with applications in areas as diverse as number theory, representation theory, signal processing, quantum mechanics, tidal analysis and neuroscience.

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling is the opposite: the process of changing the pitch without affecting the speed. Pitch shift is pitch scaling implemented in an effects unit and intended for live performance. Pitch control is a simpler process which affects pitch and speed simultaneously by slowing down or speeding up a recording.

<span class="mw-page-title-main">Wavelet</span> Function for integral Fourier-like transform

A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases or decreases, and then returns to zero one or more times. Wavelets are termed "brief oscillations". A taxonomy of wavelets has been established, based on the number and direction of their pulses. Wavelets are imbued with specific properties that make them useful for signal processing.


<span class="mw-page-title-main">Spectrogram</span> Visual representation of the spectrum of frequencies of a signal as it varies with time

A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. When the data are represented in a 3D plot they may be called waterfall displays.

<span class="mw-page-title-main">Frequency domain</span> Signal representation

In mathematics, physics, electronics, control systems engineering, and statistics, the frequency domain refers to the analysis of mathematical functions or signals with respect to frequency, rather than time. Put simply, a time-domain graph shows how a signal changes over time, whereas a frequency-domain graph shows how the signal is distributed within different frequency bands over a range of frequencies. A frequency-domain representation consists of both the magnitude and the phase of a set of sinusoids at the frequency components of the signal. Although it is common to refer to the magnitude portion as the frequency response of a signal, the phase portion is required to uniquely define the signal.

<span class="mw-page-title-main">Short-time Fourier transform</span> Fourier-related transform suited to signals that change rather quickly in time

The short-time Fourier transform (STFT) is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. In practice, the procedure for computing STFTs is to divide a longer time signal into shorter segments of equal length and then compute the Fourier transform separately on each shorter segment. This reveals the Fourier spectrum of each shorter segment. One then usually plots the changing spectra as a function of time, known as a spectrogram or waterfall plot, such as commonly used in software defined radio (SDR) based spectrum displays. Full-bandwidth displays covering the whole range of an SDR commonly use fast Fourier transforms (FFTs) with 2^24 points on desktop computers.

A time–frequency representation (TFR) is a view of a signal represented over both time and frequency. Time–frequency analysis means analysis into the time–frequency domain provided by a TFR. This is achieved by using a formulation often called "Time–Frequency Distribution", abbreviated as TFD.

The S transform as a time–frequency distribution was developed in 1994 for analyzing geophysics data. The S transform is a generalization of the short-time Fourier transform (STFT), extending the continuous wavelet transform and overcoming some of its disadvantages. For one, modulation sinusoids are fixed with respect to the time axis; this localizes the scalable Gaussian window dilations and translations in the S transform. Moreover, the S transform doesn't have a cross-term problem and yields better signal clarity than the Gabor transform. However, the S transform has its own disadvantages: its clarity is worse than that of the Wigner distribution function and Cohen's class distribution functions.

<span class="mw-page-title-main">Image scaling</span> Changing the resolution of a digital image

In computer graphics and digital imaging, image scaling refers to the resizing of a digital image. In video technology, the magnification of digital material is known as upscaling or resolution enhancement.

<span class="mw-page-title-main">Wavelet transform</span> Mathematical technique used in data compression and analysis

In mathematics, a wavelet series is a representation of a square-integrable function by a certain orthonormal series generated by a wavelet. This article provides a formal, mathematical definition of an orthonormal wavelet and of the integral wavelet transform.

Geophysical survey is the systematic collection of geophysical data for spatial studies. Detection and analysis of geophysical signals forms the core of geophysical signal processing. The magnetic and gravitational fields emanating from the Earth's interior hold essential information concerning seismic activities and the internal structure; hence, detection and analysis of the electric and magnetic fields is crucial. As electromagnetic and gravitational waves are multi-dimensional signals, the 1-D transformation techniques can be extended to the analysis of these signals as well, so multi-dimensional signal processing techniques also apply.

<span class="mw-page-title-main">Constant-Q transform</span> Short-time Fourier transform with variable resolution

In mathematics and signal processing, the constant-Q transform and variable-Q transform, known simply as the CQT and VQT, transform a data series to the frequency domain. They are related to the Fourier transform and very closely related to the complex Morlet wavelet transform. Their design is suited for musical representation.

The method of reassignment is a technique for sharpening a time-frequency representation by mapping the data to time-frequency coordinates that are nearer to the true region of support of the analyzed signal. The method has been independently introduced by several parties under various names, including method of reassignment, remapping, time-frequency reassignment, and modified moving-window method. In the case of the spectrogram or the short-time Fourier transform, the method of reassignment sharpens blurry time-frequency data by relocating the data according to local estimates of instantaneous frequency and group delay. This mapping to reassigned time-frequency coordinates is very precise for signals that are separable in time and frequency with respect to the analysis window.

Phase retrieval is the process of algorithmically finding solutions to the phase problem: recovering the phase of a complex signal when only its amplitude (and, typically, additional constraints) are known.

<span class="mw-page-title-main">Least-squares spectral analysis</span> Periodicity computation method

Least-squares spectral analysis (LSSA) is a method of estimating a frequency spectrum based on a least-squares fit of sinusoids to data samples, similar to Fourier analysis. Fourier analysis, the most used spectral method in science, generally boosts long-periodic noise in the long and gapped records; LSSA mitigates such problems. Unlike in Fourier analysis, data need not be equally spaced to use LSSA.

Spectroscopic optical coherence tomography (SOCT) is an optical imaging and sensing technique which provides localized spectroscopic information of a sample based on the principles of optical coherence tomography (OCT) and low coherence interferometry. The general principles behind SOCT arise from the large optical bandwidths involved in OCT, where information on the spectral content of backscattered light can be obtained by detection and processing of the interferometric OCT signal. The SOCT signal can be used to quantify depth-resolved spectra to retrieve the concentration of tissue chromophores (e.g., hemoglobin and bilirubin), characterize tissue light scattering, and/or serve as a functional contrast enhancement for conventional OCT imaging.

References

  1. Sethares, William. "A Phase Vocoder in Matlab". sethares.engr.wisc.edu. Retrieved 6 December 2020.
  2. Flanagan, J. L. and Golden, R. M. (1966). "Phase vocoder". Bell System Technical Journal. 45 (9): 1493–1509. doi:10.1002/j.1538-7305.1966.tb01706.x.
  3. Griffin, D. and Lim, J. (1984). "Signal Estimation from Modified Short-Time Fourier Transform". IEEE Transactions on Acoustics, Speech, and Signal Processing. 32 (2): 236–243. CiteSeerX 10.1.1.306.7858. doi:10.1109/TASSP.1984.1164317.
  4. Laroche, J. and Dolson, M. (1999). "Improved Phase Vocoder Time-Scale Modification of Audio". IEEE Transactions on Speech and Audio Processing. 7 (3): 323–332. doi:10.1109/89.759041.
  5. Roebel, A. (2003). "A New Approach to Transient Processing in the Phase Vocoder". Proceedings of the International Conference on Digital Audio Effects (DAFx). Archived 2004-06-17 at the Wayback Machine.
  6. "SuperVP". Ircam.fr.
  7. Wishart, T. (1988). "The Composition of Vox 5". Computer Music Journal. 12 (4).
  8. Serra, X. (1989). A System for Sound Analysis/Transformation/Synthesis based on Deterministic plus Stochastic Decomposition. PhD thesis, p. 12.
  9. Roads, Curtis (2004). Microsound, p. 318. MIT Press. ISBN 9780262681544.