# Fast Fourier transform

Last updated

A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in the frequency domain and vice versa. The DFT is obtained by decomposing a sequence of values into components of different frequencies. [1] This operation is useful in many fields, but computing it directly from the definition is often too slow to be practical. An FFT rapidly computes such transformations by factorizing the DFT matrix into a product of sparse (mostly zero) factors. [2] As a result, it manages to reduce the complexity of computing the DFT from ${\displaystyle O\left(N^{2}\right)}$, which arises if one simply applies the definition of DFT, to ${\displaystyle O(N\log N)}$, where ${\displaystyle N}$ is the data size. The difference in speed can be enormous, especially for long data sets where N may be in the thousands or millions. In the presence of round-off error, many FFT algorithms are much more accurate than evaluating the DFT definition directly or indirectly. There are many different FFT algorithms based on a wide range of published theories, from simple complex-number arithmetic to group theory and number theory.

## Contents

Fast Fourier transforms are widely used for applications in engineering, music, science, and mathematics. The basic ideas were popularized in 1965, but some algorithms had been derived as early as 1805. [1] In 1994, Gilbert Strang described the FFT as "the most important numerical algorithm of our lifetime", [3] [4] and it was included in Top 10 Algorithms of 20th Century by the IEEE magazine Computing in Science & Engineering. [5]

The best-known FFT algorithms depend upon the factorization of N, but there are FFTs with O(N log N) complexity for all N, even for prime  N. Many FFT algorithms depend only on the fact that ${\displaystyle e^{-2\pi i/N}}$ is an N-th primitive root of unity, and thus can be applied to analogous transforms over any finite field, such as number-theoretic transforms. Since the inverse DFT is the same as the DFT, but with the opposite sign in the exponent and a 1/N factor, any FFT algorithm can easily be adapted for it.

## History

The development of fast algorithms for DFT can be traced to Carl Friedrich Gauss's unpublished work in 1805 when he needed it to interpolate the orbit of asteroids Pallas and Juno from sample observations. [6] [7] His method was very similar to the one published in 1965 by James Cooley and John Tukey, who are generally credited for the invention of the modern generic FFT algorithm. While Gauss's work predated even Joseph Fourier's results in 1822, he did not analyze the computation time and eventually used other methods to achieve his goal.

Between 1805 and 1965, some versions of FFT were published by other authors. Frank Yates in 1932 published his version called interaction algorithm, which provided efficient computation of Hadamard and Walsh transforms. [8] Yates' algorithm is still used in the field of statistical design and analysis of experiments. In 1942, G. C. Danielson and Cornelius Lanczos published their version to compute DFT for x-ray crystallography, a field where calculation of Fourier transforms presented a formidable bottleneck. [9] [10] While many methods in the past had focused on reducing the constant factor for ${\displaystyle O\left(N^{2}\right)}$ computation by taking advantage of "symmetries", Danielson and Lanczos realized that one could use the "periodicity" and apply a "doubling trick" to "double [N] with only slightly more than double the labor", though like Gauss they did not analyze that this led to ${\displaystyle O(N\log N)}$ scaling. [11]

James Cooley and John Tukey independently rediscovered these earlier algorithms [7] and published a more general FFT in 1965 that is applicable when N is composite and not necessarily a power of 2, as well as analyzing the ${\displaystyle O(N\log N)}$ scaling. [12] Tukey came up with the idea during a meeting of President Kennedy's Science Advisory Committee where a discussion topic involved detecting nuclear tests by the Soviet Union by setting up sensors to surround the country from outside. To analyze the output of these sensors, an FFT algorithm would be needed. In discussion with Tukey, Richard Garwin recognized the general applicability of the algorithm not just to national security problems, but also to a wide range of problems including one of immediate interest to him, determining the periodicities of the spin orientations in a 3-D crystal of Helium-3. [13] Garwin gave Tukey's idea to Cooley (both worked at IBM's Watson labs) for implementation. [14] Cooley and Tukey published the paper in a relatively short time of six months. [15] As Tukey did not work at IBM, the patentability of the idea was doubted and the algorithm went into the public domain, which, through the computing revolution of the next decade, made FFT one of the indispensable algorithms in digital signal processing.

## Definition

Let ${\displaystyle x_{0}}$, …, ${\displaystyle x_{N-1}}$ be complex numbers. The DFT is defined by the formula

${\displaystyle X_{k}=\sum _{n=0}^{N-1}x_{n}e^{-i2\pi kn/N}\qquad k=0,\ldots ,N-1,}$

where ${\displaystyle e^{i2\pi /N}}$ is a primitive Nth root of 1.

Evaluating this definition directly requires ${\displaystyle O\left(N^{2}\right)}$ operations: there are N outputs Xk, and each output requires a sum of N terms. An FFT is any method to compute the same results in ${\displaystyle O(N\log N)}$ operations. All known FFT algorithms require ${\displaystyle \Theta (N\log N)}$ operations, although there is no known proof that a lower complexity score is impossible. [16]

To illustrate the savings of an FFT, consider the count of complex multiplications and additions for N=4096 data points. Evaluating the DFT's sums directly involves N2 complex multiplications and N(N − 1) complex additions, of which ${\displaystyle O(N)}$ operations can be saved by eliminating trivial operations such as multiplications by 1, leaving about 30 million operations. In contrast, the radix-2 Cooley–Tukey algorithm, for N a power of 2, can compute the same result with only (N/2)log2(N) complex multiplications (again, ignoring simplifications of multiplications by 1 and similar) and N log2(N) complex additions, in total about 30,000 operations  a thousand times less than with direct evaluation. In practice, actual performance on modern computers is usually dominated by factors other than the speed of arithmetic operations and the analysis is a complicated subject (for example, see Frigo & Johnson, 2005), [17] but the overall improvement from ${\displaystyle O\left(N^{2}\right)}$ to ${\displaystyle O(N\log N)}$ remains.

## Algorithms

### Cooley–Tukey algorithm

By far the most commonly used FFT is the Cooley–Tukey algorithm. This is a divide-and-conquer algorithm that recursively breaks down a DFT of any composite size ${\displaystyle N=N_{1}N_{2}}$ into many smaller DFTs of sizes ${\displaystyle N_{1}}$ and ${\displaystyle N_{2}}$, along with ${\displaystyle O(N)}$ multiplications by complex roots of unity traditionally called twiddle factors (after Gentleman and Sande, 1966 [18] ).

This method (and the general idea of an FFT) was popularized by a publication of Cooley and Tukey in 1965, [12] but it was later discovered [1] that those two authors had independently re-invented an algorithm known to Carl Friedrich Gauss around 1805 [19] (and subsequently rediscovered several times in limited forms).

The best known use of the Cooley–Tukey algorithm is to divide the transform into two pieces of size N/2 at each step, and is therefore limited to power-of-two sizes, but any factorization can be used in general (as was known to both Gauss and Cooley/Tukey [1] ). These are called the radix-2 and mixed-radix cases, respectively (and other variants such as the split-radix FFT have their own names as well). Although the basic idea is recursive, most traditional implementations rearrange the algorithm to avoid explicit recursion. Also, because the Cooley–Tukey algorithm breaks the DFT into smaller DFTs, it can be combined arbitrarily with any other algorithm for the DFT, such as those described below.

### Other FFT algorithms

There are FFT algorithms other than Cooley–Tukey.

For N = N1N2 with coprime N1 and N2, one can use the prime-factor (Good–Thomas) algorithm (PFA), based on the Chinese remainder theorem, to factorize the DFT similarly to Cooley–Tukey but without the twiddle factors. The Rader–Brenner algorithm (1976) [20] is a Cooley–Tukey-like factorization but with purely imaginary twiddle factors, reducing multiplications at the cost of increased additions and reduced numerical stability; it was later superseded by the split-radix variant of Cooley–Tukey (which achieves the same multiplication count but with fewer additions and without sacrificing accuracy). Algorithms that recursively factorize the DFT into smaller operations other than DFTs include the Bruun and QFT algorithms. (The Rader–Brenner [20] and QFT algorithms were proposed for power-of-two sizes, but it is possible that they could be adapted to general composite N. Bruun's algorithm applies to arbitrary even composite sizes.) Bruun's algorithm, in particular, is based on interpreting the FFT as a recursive factorization of the polynomial zN  1, here into real-coefficient polynomials of the form zM  1 and z2M + azM + 1.

Another polynomial viewpoint is exploited by the Winograd FFT algorithm, [21] [22] which factorizes zN  1 into cyclotomic polynomials—these often have coefficients of 1, 0, or −1, and therefore require few (if any) multiplications, so Winograd can be used to obtain minimal-multiplication FFTs and is often used to find efficient algorithms for small factors. Indeed, Winograd showed that the DFT can be computed with only O(N) irrational multiplications, leading to a proven achievable lower bound on the number of multiplications for power-of-two sizes; unfortunately, this comes at the cost of many more additions, a tradeoff no longer favorable on modern processors with hardware multipliers. In particular, Winograd also makes use of the PFA as well as an algorithm by Rader for FFTs of prime sizes.

Rader's algorithm, exploiting the existence of a generator for the multiplicative group modulo prime N, expresses a DFT of prime size N as a cyclic convolution of (composite) size N − 1, which can then be computed by a pair of ordinary FFTs via the convolution theorem (although Winograd uses other convolution methods). Another prime-size FFT is due to L. I. Bluestein, and is sometimes called the chirp-z algorithm; it also re-expresses a DFT as a convolution, but this time of the same size (which can be zero-padded to a power of two and evaluated by radix-2 Cooley–Tukey FFTs, for example), via the identity

${\displaystyle nk=-{\frac {(k-n)^{2}}{2}}+{\frac {n^{2}}{2}}+{\frac {k^{2}}{2}}.}$

Hexagonal fast Fourier transform (HFFT) aims at computing an efficient FFT for the hexagonally-sampled data by using a new addressing scheme for hexagonal grids, called Array Set Addressing (ASA).

## FFT algorithms specialized for real or symmetric data

In many applications, the input data for the DFT are purely real, in which case the outputs satisfy the symmetry

${\displaystyle X_{N-k}=X_{k}^{*}}$

and efficient FFT algorithms have been designed for this situation (see e.g. Sorensen, 1987). [23] [24] One approach consists of taking an ordinary algorithm (e.g. Cooley–Tukey) and removing the redundant parts of the computation, saving roughly a factor of two in time and memory. Alternatively, it is possible to express an even-length real-input DFT as a complex DFT of half the length (whose real and imaginary parts are the even/odd elements of the original real data), followed by O(N) post-processing operations.

It was once believed that real-input DFTs could be more efficiently computed by means of the discrete Hartley transform (DHT), but it was subsequently argued that a specialized real-input DFT algorithm (FFT) can typically be found that requires fewer operations than the corresponding DHT algorithm (FHT) for the same number of inputs. [23] Bruun's algorithm (above) is another method that was initially proposed to take advantage of real inputs, but it has not proved popular.

There are further FFT specializations for the cases of real data that have even/odd symmetry, in which case one can gain another factor of roughly two in time and memory and the DFT becomes the discrete cosine/sine transform(s) (DCT/DST). Instead of directly modifying an FFT algorithm for these cases, DCTs/DSTs can also be computed via FFTs of real data combined with O(N) pre- and post-processing.

## Computational issues

### Bounds on complexity and operation counts

Unsolved problem in computer science:

What is the lower bound on the complexity of fast Fourier transform algorithms? Can they be faster than ${\displaystyle O(N\log N)}$?

A fundamental question of longstanding theoretical interest is to prove lower bounds on the complexity and exact operation counts of fast Fourier transforms, and many open problems remain. It is not rigorously proved whether DFTs truly require Ω(N log N) (i.e., order N log N or greater) operations, even for the simple case of power of two sizes, although no algorithms with lower complexity are known. In particular, the count of arithmetic operations is usually the focus of such questions, although actual performance on modern-day computers is determined by many other factors such as cache or CPU pipeline optimization.

Following work by Shmuel Winograd (1978), [21] a tight Θ(N) lower bound is known for the number of real multiplications required by an FFT. It can be shown that only ${\displaystyle 4N-2\log _{2}^{2}(N)-2\log _{2}(N)-4}$ irrational real multiplications are required to compute a DFT of power-of-two length ${\displaystyle N=2^{m}}$. Moreover, explicit algorithms that achieve this count are known (Heideman & Burrus, 1986; [25] Duhamel, 1990 [26] ). However, these algorithms require too many additions to be practical, at least on modern computers with hardware multipliers (Duhamel, 1990; [26] Frigo & Johnson, 2005). [17]

A tight lower bound is not known on the number of required additions, although lower bounds have been proved under some restrictive assumptions on the algorithms. In 1973, Morgenstern [27] proved an Ω(N log N) lower bound on the addition count for algorithms where the multiplicative constants have bounded magnitudes (which is true for most but not all FFT algorithms). Pan (1986) [28] proved an Ω(N log N) lower bound assuming a bound on a measure of the FFT algorithm's "asynchronicity", but the generality of this assumption is unclear. For the case of power-of-two N, Papadimitriou (1979) [29] argued that the number ${\displaystyle N\log _{2}N}$ of complex-number additions achieved by Cooley–Tukey algorithms is optimal under certain assumptions on the graph of the algorithm (his assumptions imply, among other things, that no additive identities in the roots of unity are exploited). (This argument would imply that at least ${\displaystyle 2N\log _{2}N}$ real additions are required, although this is not a tight bound because extra additions are required as part of complex-number multiplications.) Thus far, no published FFT algorithm has achieved fewer than ${\displaystyle N\log _{2}N}$ complex-number additions (or their equivalent) for power-of-two N.

A third problem is to minimize the total number of real multiplications and additions, sometimes called the "arithmetic complexity" (although in this context it is the exact count and not the asymptotic complexity that is being considered). Again, no tight lower bound has been proven. Since 1968, however, the lowest published count for power-of-two N was long achieved by the split-radix FFT algorithm, which requires ${\displaystyle 4N\log _{2}(N)-6N+8}$ real multiplications and additions for N > 1. This was recently reduced to ${\displaystyle \sim {\frac {34}{9}}N\log _{2}N}$ (Johnson and Frigo, 2007; [16] Lundy and Van Buskirk, 2007 [30] ). A slightly larger count (but still better than split radix for N ≥ 256) was shown to be provably optimal for N ≤ 512 under additional restrictions on the possible algorithms (split-radix-like flowgraphs with unit-modulus multiplicative factors), by reduction to a satisfiability modulo theories problem solvable by brute force (Haynal & Haynal, 2011). [31]

Most of the attempts to lower or prove the complexity of FFT algorithms have focused on the ordinary complex-data case, because it is the simplest. However, complex-data FFTs are so closely related to algorithms for related problems such as real-data FFTs, discrete cosine transforms, discrete Hartley transforms, and so on, that any improvement in one of these would immediately lead to improvements in the others (Duhamel & Vetterli, 1990). [32]

### Approximations

All of the FFT algorithms discussed above compute the DFT exactly (i.e. neglecting floating-point errors). A few "FFT" algorithms have been proposed, however, that compute the DFT approximately, with an error that can be made arbitrarily small at the expense of increased computations. Such algorithms trade the approximation error for increased speed or other properties. For example, an approximate FFT algorithm by Edelman et al. (1999) [33] achieves lower communication requirements for parallel computing with the help of a fast multipole method. A wavelet-based approximate FFT by Guo and Burrus (1996) [34] takes sparse inputs/outputs (time/frequency localization) into account more efficiently than is possible with an exact FFT. Another algorithm for approximate computation of a subset of the DFT outputs is due to Shentov et al. (1995). [35] The Edelman algorithm works equally well for sparse and non-sparse data, since it is based on the compressibility (rank deficiency) of the Fourier matrix itself rather than the compressibility (sparsity) of the data. Conversely, if the data are sparse—that is, if only K out of N Fourier coefficients are nonzero—then the complexity can be reduced to O(Klog(N)log(N/K)), and this has been demonstrated to lead to practical speedups compared to an ordinary FFT for N/K > 32 in a large-N example (N = 222) using a probabilistic approximate algorithm (which estimates the largest K coefficients to several decimal places). [36]

### Accuracy

FFT algorithms have errors when finite-precision floating-point arithmetic is used, but these errors are typically quite small; most FFT algorithms, e.g. Cooley–Tukey, have excellent numerical properties as a consequence of the pairwise summation structure of the algorithms. The upper bound on the relative error for the Cooley–Tukey algorithm is O(ε log N), compared to O(εN3/2) for the naïve DFT formula, [18] where ε is the machine floating-point relative precision. In fact, the root mean square (rms) errors are much better than these upper bounds, being only O(εlog N) for Cooley–Tukey and O(εN) for the naïve DFT (Schatzman, 1996). [37] These results, however, are very sensitive to the accuracy of the twiddle factors used in the FFT (i.e. the trigonometric function values), and it is not unusual for incautious FFT implementations to have much worse accuracy, e.g. if they use inaccurate trigonometric recurrence formulas. Some FFTs other than Cooley–Tukey, such as the Rader–Brenner algorithm, are intrinsically less stable.

In fixed-point arithmetic, the finite-precision errors accumulated by FFT algorithms are worse, with rms errors growing as O(N) for the Cooley–Tukey algorithm (Welch, 1969). [38] Achieving this accuracy requires careful attention to scaling to minimize loss of precision, and fixed-point FFT algorithms involve rescaling at each intermediate stage of decompositions like Cooley–Tukey.

To verify the correctness of an FFT implementation, rigorous guarantees can be obtained in O(N log N) time by a simple procedure checking the linearity, impulse-response, and time-shift properties of the transform on random inputs (Ergün, 1995). [39]

## Multidimensional FFTs

As defined in the multidimensional DFT article, the multidimensional DFT

${\displaystyle X_{\mathbf {k} }=\sum _{\mathbf {n} =0}^{\mathbf {N} -1}e^{-2\pi i\mathbf {k} \cdot (\mathbf {n} /\mathbf {N} )}x_{\mathbf {n} }}$

transforms an array xn with a d-dimensional vector of indices ${\displaystyle \mathbf {n} =\left(n_{1},\ldots ,n_{d}\right)}$ by a set of d nested summations (over ${\displaystyle n_{j}=0\ldots N_{j}-1}$ for each j), where the division n/N, defined as ${\displaystyle \mathbf {n} /\mathbf {N} =\left(n_{1}/N_{1},\ldots ,n_{d}/N_{d}\right)}$, is performed element-wise. Equivalently, it is the composition of a sequence of d sets of one-dimensional DFTs, performed along one dimension at a time (in any order).

This compositional viewpoint immediately provides the simplest and most common multidimensional DFT algorithm, known as the row-column algorithm (after the two-dimensional case, below). That is, one simply performs a sequence of d one-dimensional FFTs (by any of the above algorithms): first you transform along the n1 dimension, then along the n2 dimension, and so on (or actually, any ordering works). This method is easily shown to have the usual O(N log N) complexity, where ${\displaystyle N=N_{1}\cdot N_{2}\cdot \cdots \cdot N_{d}}$ is the total number of data points transformed. In particular, there are N/N1 transforms of size N1, etcetera, so the complexity of the sequence of FFTs is:

{\displaystyle {\begin{aligned}&{\frac {N}{N_{1}}}O(N_{1}\log N_{1})+\cdots +{\frac {N}{N_{d}}}O(N_{d}\log N_{d})\\[6pt]={}&O\left(N\left[\log N_{1}+\cdots +\log N_{d}\right]\right)=O(N\log N).\end{aligned}}}

In two dimensions, the xk can be viewed as an ${\displaystyle n_{1}\times n_{2}}$ matrix, and this algorithm corresponds to first performing the FFT of all the rows (resp. columns), grouping the resulting transformed rows (resp. columns) together as another ${\displaystyle n_{1}\times n_{2}}$ matrix, and then performing the FFT on each of the columns (resp. rows) of this second matrix, and similarly grouping the results into the final result matrix.

In more than two dimensions, it is often advantageous for cache locality to group the dimensions recursively. For example, a three-dimensional FFT might first perform two-dimensional FFTs of each planar "slice" for each fixed n1, and then perform the one-dimensional FFTs along the n1 direction. More generally, an asymptotically optimal cache-oblivious algorithm consists of recursively dividing the dimensions into two groups ${\displaystyle (n_{1},\ldots ,n_{d/2})}$ and ${\displaystyle (n_{d/2+1},\ldots ,n_{d})}$ that are transformed recursively (rounding if d is not even) (see Frigo and Johnson, 2005). [17] Still, this remains a straightforward variation of the row-column algorithm that ultimately requires only a one-dimensional FFT algorithm as the base case, and still has O(N log N) complexity. Yet another variation is to perform matrix transpositions in between transforming subsequent dimensions, so that the transforms operate on contiguous data; this is especially important for out-of-core and distributed memory situations where accessing non-contiguous data is extremely time-consuming.

There are other multidimensional FFT algorithms that are distinct from the row-column algorithm, although all of them have O(N log N) complexity. Perhaps the simplest non-row-column FFT is the vector-radix FFT algorithm, which is a generalization of the ordinary Cooley–Tukey algorithm where one divides the transform dimensions by a vector ${\displaystyle \mathbf {r} =\left(r_{1},r_{2},\ldots ,r_{d}\right)}$ of radices at each step. (This may also have cache benefits.) The simplest case of vector-radix is where all of the radices are equal (e.g. vector-radix-2 divides all of the dimensions by two), but this is not necessary. Vector radix with only a single non-unit radix at a time, i.e. ${\displaystyle \mathbf {r} =\left(1,\ldots ,1,r,1,\ldots ,1\right)}$, is essentially a row-column algorithm. Other, more complicated, methods include polynomial transform algorithms due to Nussbaumer (1977), [40] which view the transform in terms of convolutions and polynomial products. See Duhamel and Vetterli (1990) [32] for more information and references.

## Other generalizations

An O(N5/2log N) generalization to spherical harmonics on the sphere S2 with N2 nodes was described by Mohlenkamp, [41] along with an algorithm conjectured (but not proven) to have O(N2 log2(N)) complexity; Mohlenkamp also provides an implementation in the libftsh library. [42] A spherical-harmonic algorithm with O(N2log N) complexity is described by Rokhlin and Tygert. [43]

The fast folding algorithm is analogous to the FFT, except that it operates on a series of binned waveforms rather than a series of real or complex scalar values. Rotation (which in the FFT is multiplication by a complex phasor) is a circular shift of the component waveform.

Various groups have also published "FFT" algorithms for non-equispaced data, as reviewed in Potts et al. (2001). [44] Such algorithms do not strictly compute the DFT (which is only defined for equispaced data), but rather some approximation thereof (a non-uniform discrete Fourier transform, or NDFT, which itself is often computed only approximately). More generally there are various other methods of spectral estimation.

## Applications

The FFT is used in digital recording, sampling, additive synthesis and pitch correction software. [45]

The FFT's importance derives from the fact that it has made working in the frequency domain equally computationally feasible as working in the temporal or spatial domain. Some of the important applications of the FFT include: [15] [46]

## Research areas

Big FFTs
With the explosion of big data in fields such as astronomy, the need for 512K FFTs has arisen for certain interferometry calculations. The data collected by projects such as WMAP and LIGO require FFTs of tens of billions of points. As this size does not fit into main memory, so called out-of-core FFTs are an active area of research. [48]
Approximate FFTs
For applications such as MRI, it is necessary to compute DFTs for nonuniformly spaced grid points and/or frequencies. Multipole based approaches can compute approximate quantities with factor of runtime increase. [49]
Group FFTs
The FFT may also be explained and interpreted using group representation theory allowing for further generalization. A function on any compact group, including non-cyclic, has an expansion in terms of a basis of irreducible matrix elements. It remains active area of research to find efficient algorithm for performing this change of basis. Applications including efficient spherical harmonic expansion, analyzing certain Markov processes, robotics etc. [50]
Quantum FFTs
Shor's fast algorithm for integer factorization on a quantum computer has a subroutine to compute DFT of a binary vector. This is implemented as sequence of 1- or 2-bit quantum gates now known as quantum FFT, which is effectively the Cooley–Tukey FFT realized as a particular factorization of the Fourier matrix. Extension to these ideas is currently being explored.

## Language reference

LanguageCommand/MethodPre-requisites
R stats::fft(x)None
Octave/MATLAB fft(x)None
Python fft.fft(x) numpy
Mathematica Fourier[x]None
Fortran fftw_one(plan,in,out) FFTW
Julia fft(A [,dims]) FFTW
Rust fft.process(&mut x); rustfft

FFT-related algorithms:

FFT implementations:

• ALGLIB – a dual/GPL-licensed C++ and C# library (also supporting other languages), with real/complex FFT implementation
• FFTPACK – another Fortran FFT library (public domain)
• Architecture-specific:
• Many more implementations are available, [52] for CPUs and GPUs, such as PocketFFT for C++

## Related Research Articles

In mathematics, the discrete Fourier transform (DFT) converts a finite sequence of equally-spaced samples of a function into a same-length sequence of equally-spaced samples of the discrete-time Fourier transform (DTFT), which is a complex-valued function of frequency. The interval at which the DTFT is sampled is the reciprocal of the duration of the input sequence. An inverse DFT is a Fourier series, using the DTFT samples as coefficients of complex sinusoids at the corresponding DTFT frequencies. It has the same sample-values as the original input sequence. The DFT is therefore said to be a frequency domain representation of the original input sequence. If the original sequence spans all the non-zero values of a function, its DTFT is continuous, and the DFT provides discrete samples of one cycle. If the original sequence is one cycle of a periodic function, the DFT provides all the non-zero values of one DTFT cycle.

A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. The DCT, first proposed by Nasir Ahmed in 1972, is a widely used transformation technique in signal processing and data compression. It is used in most digital media, including digital images, digital video, digital audio, digital television, digital radio, and speech coding. DCTs are also important to numerous other applications in science and engineering, such as digital signal processing, telecommunication devices, reducing network bandwidth usage, and spectral methods for the numerical solution of partial differential equations.

In mathematics, the discrete sine transform (DST) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using a purely real matrix. It is equivalent to the imaginary parts of a DFT of roughly twice the length, operating on real data with odd symmetry, where in some variants the input and/or output data are shifted by half a sample.

In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

A discrete Hartley transform (DHT) is a Fourier-related transform of discrete, periodic data similar to the discrete Fourier transform (DFT), with analogous applications in signal processing and related fields. Its main distinction from the DFT is that it transforms real inputs to real outputs, with no intrinsic involvement of complex numbers. Just as the DFT is the discrete analogue of the continuous Fourier transform (FT), the DHT is the discrete analogue of the continuous Hartley transform (HT), introduced by Ralph V. L. Hartley in 1942.

Rader's algorithm (1968), named for Charles M. Rader of MIT Lincoln Laboratory, is a fast Fourier transform (FFT) algorithm that computes the discrete Fourier transform (DFT) of prime sizes by re-expressing the DFT as a cyclic convolution.

The chirp Z-transform (CZT) is a generalization of the discrete Fourier transform (DFT). While the DFT samples the Z plane at uniformly-spaced points along the unit circle, the chirp Z-transform samples along spiral arcs in the Z-plane, corresponding to straight lines in the S plane. The DFT, real DFT, and zoom DFT can be calculated as special cases of the CZT.

Bruun's algorithm is a fast Fourier transform (FFT) algorithm based on an unusual recursive polynomial-factorization approach, proposed for powers of two by G. Bruun in 1978 and generalized to arbitrary even composite sizes by H. Murakami in 1996. Because its operations involve only real coefficients until the last computation stage, it was initially proposed as a way to efficiently compute the discrete Fourier transform (DFT) of real data. Bruun's algorithm has not seen widespread use, however, as approaches based on the ordinary Cooley–Tukey FFT algorithm have been successfully adapted to real data with at least as much efficiency. Furthermore, there is evidence that Bruun's algorithm may be intrinsically less accurate than Cooley–Tukey in the face of finite numerical precision.

The Cooley–Tukey algorithm, named after J. W. Cooley and John Tukey, is the most common fast Fourier transform (FFT) algorithm. It re-expresses the discrete Fourier transform (DFT) of an arbitrary composite size in terms of N1 smaller DFTs of sizes N2, recursively, to reduce the computation time to O(N log N) for highly composite N. Because of the algorithm's importance, specific variants and implementation styles have become known by their own names, as described below.

The Schönhage–Strassen algorithm is an asymptotically fast multiplication algorithm for large integers. It was developed by Arnold Schönhage and Volker Strassen in 1971. The run-time bit complexity is, in Big O notation, for two n-digit numbers. The algorithm uses recursive Fast Fourier transforms in rings with 2n+1 elements, a specific type of number theoretic transform.

The Goertzel algorithm is a technique in digital signal processing (DSP) for efficient evaluation of the individual terms of the discrete Fourier transform (DFT). It is useful in certain practical applications, such as recognition of dual-tone multi-frequency signaling (DTMF) tones produced by the push buttons of the keypad of a traditional analog telephone. The algorithm was first described by Gerald Goertzel in 1958.

In the context of fast Fourier transform algorithms, a butterfly is a portion of the computation that combines the results of smaller discrete Fourier transforms (DFTs) into a larger DFT, or vice versa. The name "butterfly" comes from the shape of the data-flow diagram in the radix-2 case, as described below. The earliest occurrence in print of the term is thought to be in a 1969 MIT technical report. The same structure can also be found in the Viterbi algorithm, used for finding the most likely sequence of hidden states.

The Fastest Fourier Transform in the West (FFTW) is a software library for computing discrete Fourier transforms (DFTs) developed by Matteo Frigo and Steven G. Johnson at the Massachusetts Institute of Technology.

The split-radix FFT is a fast Fourier transform (FFT) algorithm for computing the discrete Fourier transform (DFT), and was first described in an initially little-appreciated paper by R. Yavne (1968) and subsequently rediscovered simultaneously by various authors in 1984. In particular, split radix is a variant of the Cooley–Tukey FFT algorithm that uses a blend of radices 2 and 4: it recursively expresses a DFT of length N in terms of one smaller DFT of length N/2 and two smaller DFTs of length N/4.

In signal processing, the overlap–add method is an efficient way to evaluate the discrete convolution of a very long signal with a finite impulse response (FIR) filter :

In mathematics, the discrete Fourier transform over an arbitrary ring generalizes the discrete Fourier transform of a function whose values are complex numbers.

In signal processing, overlap–save is the traditional name for an efficient way to evaluate the discrete convolution between a very long signal and a finite impulse response (FIR) filter :

In applied mathematics, a bit-reversal permutation is a permutation of a sequence of n items, where n = 2k is a power of two. It is defined by indexing the elements of the sequence by the numbers from 0 to n − 1 and then reversing the binary representations of each of these numbers. Each item is then mapped to the new position given by this reversed value. The bit reversal permutation is an involution, so repeating the same permutation twice returns to the original ordering on the items.

The vector-radix FFT algorithm, is a multidimensional fast Fourier transform (FFT) algorithm, which is a generalization of the ordinary Cooley–Tukey FFT algorithm that divides the transform dimensions by arbitrary radices. It breaks a multidimensional (MD) discrete Fourier transform (DFT) down into successively smaller MD DFTs until, ultimately, only trivial MD DFTs need to be evaluated.

The sparse Fourier transform (SFT) is a kind of discrete Fourier transform (DFT) for handling big data signals. Specifically, it is used in GPS synchronization, spectrum sensing and analog-to-digital converters.:

## References

1. Heideman, Michael T.; Johnson, Don H.; Burrus, Charles Sidney (1984). "Gauss and the history of the fast Fourier transform" (PDF). IEEE ASSP Magazine. 1 (4): 14–21. CiteSeerX  . doi:10.1109/MASSP.1984.1162257. S2CID   10032502.
2. Van Loan, Charles (1992). Computational Frameworks for the Fast Fourier Transform. SIAM.
3. Strang, Gilbert (May–June 1994). "Wavelets". American Scientist . 82 (3): 250–255. JSTOR   29775194.
4. Kent, Ray D.; Read, Charles (2002). Acoustic Analysis of Speech. ISBN   0-7693-0112-6.
5. Dongarra, Jack; Sullivan, Francis (January 2000). "Guest Editors Introduction to the top 10 algorithms". Computing in Science & Engineering. 2 (1): 22–23. Bibcode:2000CSE.....2a..22D. doi:10.1109/MCISE.2000.814652. ISSN   1521-9615.
6. Gauss, Carl Friedrich (1866). "Theoria interpolationis methodo nova tractata" [Theory regarding a new method of interpolation]. Nachlass (Unpublished manuscript). Werke (in Latin and German). 3. Göttingen, Germany: Königlichen Gesellschaft der Wissenschaften zu Göttingen. pp. 265–303.
7. Heideman, Michael T.; Johnson, Don H.; Burrus, Charles Sidney (1985-09-01). "Gauss and the history of the fast Fourier transform". Archive for History of Exact Sciences. 34 (3): 265–277. CiteSeerX  . doi:10.1007/BF00348431. ISSN   0003-9519. S2CID   122847826.
8. Yates, Frank (1937). "The design and analysis of factorial experiments". Technical Communication No. 35 of the Commonwealth Bureau of Soils. 142 (3585): 90–92. Bibcode:1938Natur.142...90F. doi:10.1038/142090a0. S2CID   23501205.
9. Danielson, Gordon C.; Lanczos, Cornelius (1942). "Some improvements in practical Fourier analysis and their application to x-ray scattering from liquids". Journal of the Franklin Institute. 233 (4): 365–380. doi:10.1016/S0016-0032(42)90767-1.
10. Lanczos, Cornelius (1956). . Prentice–Hall.
11. Cooley, James W.; Lewis, Peter A. W.; Welch, Peter D. (June 1967). "Historical notes on the fast Fourier transform". IEEE Transactions on Audio and Electroacoustics . 15 (2): 76–79. CiteSeerX  . doi:10.1109/TAU.1967.1161903. ISSN   0018-9278.
12. Cooley, James W.; Tukey, John W. (1965). "An algorithm for the machine calculation of complex Fourier series". Mathematics of Computation . 19 (90): 297–301. doi:. ISSN   0025-5718.
13. Cooley, James W. (1987). The Re-Discovery of the Fast Fourier Transform Algorithm (PDF). Microchimica Acta. III. Vienna, Austria. pp. 33–45.
14. Garwin, Richard (June 1969). "The Fast Fourier Transform As an Example of the Difficulty in Gaining Wide Use for a New Technique" (PDF). IEEE Transactions on Audio and Electroacoustics . AU-17 (2): 68–72.
15. Rockmore, Daniel N. (January 2000). "The FFT: an algorithm the whole family can use". Computing in Science & Engineering. 2 (1): 60–64. Bibcode:2000CSE.....2a..60R. CiteSeerX  . doi:10.1109/5992.814659. ISSN   1521-9615.
16. Frigo, Matteo; Johnson, Steven G. (January 2007) [2006-12-19]. "A Modified Split-Radix FFT With Fewer Arithmetic Operations". IEEE Transactions on Signal Processing . 55 (1): 111–119. Bibcode:2007ITSP...55..111J. CiteSeerX  . doi:10.1109/tsp.2006.882087. S2CID   14772428.
17. Frigo, Matteo; Johnson, Steven G. (2005). "The Design and Implementation of FFTW3" (PDF). Proceedings of the IEEE . 93 (2): 216–231. CiteSeerX  . doi:10.1109/jproc.2004.840301. S2CID   6644892.
18. Gentleman, W. Morven; Sande, G. (1966). "Fast Fourier transforms—for fun and profit". Proceedings of the AFIPS . 29: 563–578. doi:10.1145/1464291.1464352. S2CID   207170956.
19. Gauss, Carl Friedrich (1866) [1805]. Theoria interpolationis methodo nova tractata. Werke (in Latin and German). 3. Göttingen, Germany: Königliche Gesellschaft der Wissenschaften. pp. 265–327.
20. Brenner, Norman M.; Rader, Charles M. (1976). "A New Principle for Fast Fourier Transformation". IEEE Transactions on Acoustics, Speech, and Signal Processing . 24 (3): 264–266. doi:10.1109/TASSP.1976.1162805.
21. Winograd, Shmuel (1978). "On computing the discrete Fourier transform". Mathematics of Computation . 32 (141): 175–199. doi:10.1090/S0025-5718-1978-0468306-4. JSTOR   2006266. PMC  . PMID   16592303.
22. Winograd, Shmuel (1979). "On the multiplicative complexity of the discrete Fourier transform". Advances in Mathematics . 32 (2): 83–117. doi:.
23. Sorensen, Henrik V.; Jones, Douglas L.; Heideman, Michael T.; Burrus, Charles Sidney (1987). "Real-valued fast Fourier transform algorithms". IEEE Transactions on Acoustics, Speech, and Signal Processing . 35 (6): 849–863. CiteSeerX  . doi:10.1109/TASSP.1987.1165220.
24. Sorensen, Henrik V.; Jones, Douglas L.; Heideman, Michael T.; Burrus, Charles Sidney (1987). "Corrections to "Real-valued fast Fourier transform algorithms"". IEEE Transactions on Acoustics, Speech, and Signal Processing . 35 (9): 1353. doi:10.1109/TASSP.1987.1165284.
25. Heideman, Michael T.; Burrus, Charles Sidney (1986). "On the number of multiplications necessary to compute a length-2n DFT". IEEE Transactions on Acoustics, Speech, and Signal Processing . 34 (1): 91–95. doi:10.1109/TASSP.1986.1164785.
26. Duhamel, Pierre (1990). "Algorithms meeting the lower bounds on the multiplicative complexity of length-2n DFTs and their connection with practical algorithms". IEEE Transactions on Acoustics, Speech, and Signal Processing . 38 (9): 1504–1511. doi:10.1109/29.60070.
27. Morgenstern, Jacques (1973). "Note on a lower bound of the linear complexity of the fast Fourier transform". Journal of the ACM . 20 (2): 305–306. doi:10.1145/321752.321761. S2CID   2790142.
28. Pan, Victor Ya. (1986-01-02). "The trade-off between the additive complexity and the asynchronicity of linear and bilinear algorithms". Information Processing Letters . 22 (1): 11–14. doi:10.1016/0020-0190(86)90035-9 . Retrieved 2017-10-31.
29. Papadimitriou, Christos H. (1979). "Optimality of the fast Fourier transform". Journal of the ACM . 26: 95–102. doi:10.1145/322108.322118. S2CID   850634.
30. Lundy, Thomas J.; Van Buskirk, James (2007). "A new matrix approach to real FFTs and convolutions of length 2k". Computing . 80 (1): 23–45. doi:10.1007/s00607-007-0222-6. S2CID   27296044.
31. Haynal, Steve; Haynal, Heidi (2011). "Generating and Searching Families of FFT Algorithms" (PDF). Journal on Satisfiability, Boolean Modeling and Computation. 7 (4): 145–187. arXiv:. Bibcode:2011arXiv1103.5740H. doi:10.3233/SAT190084. S2CID   173109. Archived from the original (PDF) on 2012-04-26.
32. Duhamel, Pierre; Vetterli, Martin (1990). "Fast Fourier transforms: a tutorial review and a state of the art". Signal Processing. 19 (4): 259–299. doi:10.1016/0165-1684(90)90158-U.
33. Edelman, Alan; McCorquodale, Peter; Toledo, Sivan (1999). "The Future Fast Fourier Transform?" (PDF). SIAM Journal on Scientific Computing . 20 (3): 1094–1114. CiteSeerX  . doi:10.1137/S1064827597316266.
34. Guo, Haitao; Burrus, Charles Sidney (1996). "Fast approximate Fourier transform via wavelets transform". Proceedings of SPIE . Wavelet Applications in Signal and Image Processing IV. 2825: 250–259. Bibcode:1996SPIE.2825..250G. CiteSeerX  . doi:10.1117/12.255236. S2CID   120514955.
35. Shentov, Ognjan V.; Mitra, Sanjit K.; Heute, Ulrich; Hossen, Abdul N. (1995). "Subband DFT. I. Definition, interpretations and extensions". Signal Processing. 41 (3): 261–277. doi:10.1016/0165-1684(94)00103-7.
36. Hassanieh, Haitham; Indyk, Piotr; Katabi, Dina; Price, Eric (January 2012). "Simple and Practical Algorithm for Sparse Fourier Transform" (PDF). ACM-SIAM Symposium on Discrete Algorithms. (NB. See also the sFFT Web Page.)
37. Schatzman, James C. (1996). "Accuracy of the discrete Fourier transform and the fast Fourier transform". SIAM Journal on Scientific Computing . 17 (5): 1150–1166. CiteSeerX  . doi:10.1137/s1064827593247023.
38. Welch, Peter D. (1969). "A fixed-point fast Fourier transform error analysis". IEEE Transactions on Audio and Electroacoustics . 17 (2): 151–157. doi:10.1109/TAU.1969.1162035.
39. Ergün, Funda (1995). Testing multivariate linear functions: Overcoming the generator bottleneck. Proceedings of the 27th ACM Symposium on the Theory of Computing. Kyoto, Japan. pp. 407–416. doi:10.1145/225058.225167. ISBN   978-0897917186. S2CID   15512806.
40. Nussbaumer, Henri J. (1977). "Digital filtering using polynomial transforms". Electronics Letters . 13 (13): 386–387. doi:10.1049/el:19770280.
41. Mohlenkamp, Martin J. (1999). "A Fast Transform for Spherical Harmonics" (PDF). Journal of Fourier Analysis and Applications. 5 (2–3): 159–184. CiteSeerX  . doi:10.1007/BF01261607. S2CID   119482349 . Retrieved 2018-01-11.
42. "libftsh library". Archived from the original on 2010-06-23. Retrieved 2007-01-09.
43. Rokhlin, Vladimir; Tygert, Mark (2006). "Fast Algorithms for Spherical Harmonic Expansions" (PDF). SIAM Journal on Scientific Computing . 27 (6): 1903–1928. CiteSeerX  . doi:10.1137/050623073 . Retrieved 2014-09-18.
44. Potts, Daniel; Steidl, Gabriele; Tasche, Manfred (2001). "Fast Fourier transforms for nonequispaced data: A tutorial" (PDF). In Benedetto, J. J.; Ferreira, P. (eds.). Modern Sampling Theory: Mathematics and Applications. Birkhäuser.
45. Burgess, Richard James (2014). The History of Music Production. Oxford University Press. ISBN   978-0199357178 . Retrieved 1 August 2019.
46. Chu, Eleanor; George, Alan (1999-11-11) [1999-11-11]. "Chapter 16". Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms. CRC Press. pp. 153–168. ISBN   978-1-42004996-1.
47. Fernandez-de-Cossio Diaz, Jorge; Fernandez-de-Cossio, Jorge (2012-08-08). "Computation of Isotopic Peak Center-Mass Distribution by Fourier Transform". Analytical Chemistry. 84 (16): 7052–7056. doi:10.1021/ac301296a. ISSN   0003-2700. PMID   22873736.
48. Cormen, Thomas H.; Nicol, David M. (1998). "Performing out-of-core FFTs on parallel disk systems". Parallel Computing. 24 (1): 5–20. CiteSeerX  . doi:10.1016/S0167-8191(97)00114-2.
49. Dutt, Alok; Rokhlin, Vladimir (1993-11-01). "Fast Fourier Transforms for Nonequispaced Data". SIAM Journal on Scientific Computing . 14 (6): 1368–1393. doi:10.1137/0914081. ISSN   1064-8275.
50. Rockmore, Daniel N. (2004). "Recent Progress and Applications in Group FFTs". In Byrnes, Jim (ed.). Computational Noncommutative Algebra and Applications. NATO Science Series II: Mathematics, Physics and Chemistry. 136. Springer Netherlands. pp. 227–254. CiteSeerX  . doi:10.1007/1-4020-2307-3_9. ISBN   978-1-4020-1982-1. S2CID   1412268.
51. "Arm Performance Libraries". Arm. 2020. Retrieved 2020-12-16.
52. "Complete list of C/C++ FFT libraries". VCV Community. 2020-04-05. Retrieved 2021-03-03.