FFTW

Developer(s): Matteo Frigo and Steven G. Johnson
Initial release: 24 March 1997
Stable release: 3.3.10 [1] / 15 September 2021
Written in: C, OCaml
Type: Numerical software
License: GPL, commercial
Website: www.fftw.org

The Fastest Fourier Transform in the West (FFTW) is a software library for computing discrete Fourier transforms (DFTs) developed by Matteo Frigo and Steven G. Johnson at the Massachusetts Institute of Technology. [2] [3] [4]


FFTW is one of the fastest free software implementations of the fast Fourier transform (FFT). It implements the FFT algorithm for real and complex-valued arrays of arbitrary size and dimension.

Library

FFTW expeditiously transforms data by supporting a variety of algorithms and choosing the one (a particular decomposition of the transform into smaller transforms) it estimates or measures to be preferable in the particular circumstances. It works best on arrays of sizes with small prime factors, with powers of two being optimal and large primes being worst case (but still O(n log n)). To decompose transforms of composite sizes into smaller transforms, it chooses among several variants of the Cooley–Tukey FFT algorithm (corresponding to different factorizations and/or different memory-access patterns), while for prime sizes it uses either Rader's or Bluestein's FFT algorithm. [2] Once the transform has been broken up into subtransforms of sufficiently small sizes, FFTW uses hard-coded unrolled FFTs for these small sizes that were produced (at compile time, not at run time) by code generation; these routines use a variety of algorithms including Cooley–Tukey variants, Rader's algorithm, and prime-factor FFT algorithms. [2]
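As a minimal sketch of how a caller lets the planner pick a decomposition, the following uses the standard FFTW 3 C API; the transform size of 1024 is an arbitrary choice for illustration. FFTW_MEASURE asks the planner to time candidate plans at run time, whereas FFTW_ESTIMATE would choose one heuristically without timing:

#include <fftw3.h>

int main(void)
{
    const int n = 1024;                       /* arbitrary transform size */
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * n);

    /* FFTW_MEASURE times candidate decompositions on this machine;
       FFTW_ESTIMATE would instead pick one heuristically. */
    fftw_plan p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);

    /* Fill 'in' with data only after planning, since FFTW_MEASURE may
       overwrite the arrays; then the plan can be executed repeatedly. */
    fftw_execute(p);

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}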

For a sufficiently large number of repeated transforms it is advantageous to measure the performance of some or all of the supported algorithms on the given array size and platform. These measurements, which the authors refer to as "wisdom", can be stored in a file or string for later use.
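A sketch of saving and reloading this wisdom with the FFTW 3 calls fftw_import_wisdom_from_filename and fftw_export_wisdom_to_filename (available in the 3.3 series); the file name and helper function name here are illustrative assumptions:

#include <fftw3.h>

/* Reload previously measured plans (if any) before planning,
   and save whatever the planner has learned afterwards. */
void with_wisdom(void)
{
    fftw_import_wisdom_from_filename("fftw_wisdom.dat");  /* failure is fine: no file yet */

    fftw_complex *buf = fftw_malloc(sizeof(fftw_complex) * 4096);
    fftw_plan p = fftw_plan_dft_1d(4096, buf, buf, FFTW_FORWARD, FFTW_MEASURE);

    fftw_execute(p);   /* ... execute as often as needed ... */

    fftw_export_wisdom_to_filename("fftw_wisdom.dat");

    fftw_destroy_plan(p);
    fftw_free(buf);
}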

FFTW has a "guru interface" that intends "to expose as much as possible of the flexibility in the underlying FFTW architecture". This allows, among other things, multi-dimensional transforms and multiple transforms in a single call (e.g., where the data is interleaved in memory).

FFTW has limited support for out-of-order transforms (using the Message Passing Interface (MPI) version). The data reordering incurs an overhead, which for in-place transforms of arbitrary size and dimension is non-trivial to avoid. It is not documented for which transforms this overhead is significant.
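A sketch of how the MPI interface can be asked to leave the output out of order, using the fftw3-mpi routines and the FFTW_MPI_TRANSPOSED_OUT flag, which skips the final data redistribution; the grid size and in-place layout are illustrative assumptions:

#include <mpi.h>
#include <fftw3-mpi.h>

int main(int argc, char **argv)
{
    const ptrdiff_t n0 = 512, n1 = 512;   /* arbitrary global grid size */
    ptrdiff_t local_n0, local_0_start, local_n1, local_1_start;

    MPI_Init(&argc, &argv);
    fftw_mpi_init();

    /* How much of the slab-decomposed array this rank owns, sized for
       both the normal input layout and the transposed output layout. */
    ptrdiff_t alloc = fftw_mpi_local_size_2d_transposed(
        n0, n1, MPI_COMM_WORLD,
        &local_n0, &local_0_start, &local_n1, &local_1_start);
    fftw_complex *data = fftw_alloc_complex(alloc);

    /* FFTW_MPI_TRANSPOSED_OUT leaves the result distributed in a
       transposed (out-of-order) layout, avoiding the final transpose. */
    fftw_plan p = fftw_mpi_plan_dft_2d(n0, n1, data, data, MPI_COMM_WORLD,
                                       FFTW_FORWARD,
                                       FFTW_MEASURE | FFTW_MPI_TRANSPOSED_OUT);

    fftw_execute(p);   /* ... fill local slab before, use transposed output after ... */

    fftw_destroy_plan(p);
    fftw_free(data);
    fftw_mpi_cleanup();
    MPI_Finalize();
    return 0;
}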

FFTW is licensed under the GNU General Public License. It is also licensed commercially (for a cost of up to $12,500) by MIT [5] and is used in the commercial MATLAB [6] matrix package for calculating FFTs. FFTW is written in the C language, but Fortran and Ada interfaces exist, as well as interfaces for a few other languages. While the library itself is C, the code is actually generated from a program called 'genfft', which is written in OCaml. [7]

In 1999, FFTW won the J. H. Wilkinson Prize for Numerical Software.


Related Research Articles

Fast Fourier transform

A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain to a representation in the frequency domain and vice versa. The DFT is obtained by decomposing a sequence of values into components of different frequencies. This operation is useful in many fields, but computing it directly from the definition is often too slow to be practical. An FFT rapidly computes such transformations by factorizing the DFT matrix into a product of sparse factors. As a result, it manages to reduce the complexity of computing the DFT from O(n²), which arises if one simply applies the definition of the DFT, to O(n log n), where n is the data size. The difference in speed can be enormous, especially for long data sets where n may be in the thousands or millions. In the presence of round-off error, many FFT algorithms are much more accurate than evaluating the DFT definition directly or indirectly. There are many different FFT algorithms based on a wide range of published theories, from simple complex-number arithmetic to group theory and number theory.
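For comparison, the O(n²) cost comes from evaluating the definition directly, as in this small illustrative routine (not part of FFTW; the function name is hypothetical):

#include <complex.h>

/* Direct evaluation of the DFT definition,
   X[k] = sum_{t=0}^{n-1} x[t] * exp(-2*pi*i*t*k/n),
   which costs on the order of n*n complex multiply-adds. */
void dft_naive(const double complex *x, double complex *X, int n)
{
    const double pi = 3.14159265358979323846;
    for (int k = 0; k < n; k++) {
        double complex sum = 0;
        for (int t = 0; t < n; t++)
            sum += x[t] * cexp(-2.0 * I * pi * (double)t * k / n);
        X[k] = sum;
    }
}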

A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. The DCT, first proposed by Nasir Ahmed in 1972, is a widely used transformation technique in signal processing and data compression. It is used in most digital media, including digital images, digital video, digital audio, digital television, digital radio, and speech coding. DCTs are also important to numerous other applications in science and engineering, such as digital signal processing, telecommunication devices, reducing network bandwidth usage, and spectral methods for the numerical solution of partial differential equations.

In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

Rader's algorithm (1968), named for Charles M. Rader of MIT Lincoln Laboratory, is a fast Fourier transform (FFT) algorithm that computes the discrete Fourier transform (DFT) of prime sizes by re-expressing the DFT as a cyclic convolution.

The chirp Z-transform (CZT) is a generalization of the discrete Fourier transform (DFT). While the DFT samples the Z plane at uniformly-spaced points along the unit circle, the chirp Z-transform samples along spiral arcs in the Z-plane, corresponding to straight lines in the S plane. The DFT, real DFT, and zoom DFT can be calculated as special cases of the CZT.

The prime-factor algorithm (PFA), also called the Good–Thomas algorithm (1958/1963), is a fast Fourier transform (FFT) algorithm that re-expresses the discrete Fourier transform (DFT) of a size N = N1N2 as a two-dimensional N1×N2 DFT, but only for the case where N1 and N2 are relatively prime. These smaller transforms of size N1 and N2 can then be evaluated by applying PFA recursively or by using some other FFT algorithm.

Bruun's algorithm is a fast Fourier transform (FFT) algorithm based on an unusual recursive polynomial-factorization approach, proposed for powers of two by G. Bruun in 1978 and generalized to arbitrary even composite sizes by H. Murakami in 1996. Because its operations involve only real coefficients until the last computation stage, it was initially proposed as a way to efficiently compute the discrete Fourier transform (DFT) of real data. Bruun's algorithm has not seen widespread use, however, as approaches based on the ordinary Cooley–Tukey FFT algorithm have been successfully adapted to real data with at least as much efficiency. Furthermore, there is evidence that Bruun's algorithm may be intrinsically less accurate than Cooley–Tukey in the face of finite numerical precision.

The Cooley–Tukey algorithm, named after J. W. Cooley and John Tukey, is the most common fast Fourier transform (FFT) algorithm. It re-expresses the discrete Fourier transform (DFT) of an arbitrary composite size N = N1N2 in terms of N1 smaller DFTs of sizes N2, recursively, to reduce the computation time to O(N log N) for highly composite N (smooth numbers). Because of the algorithm's importance, specific variants and implementation styles have become known by their own names.
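An illustrative radix-2 sketch of this recursion (textbook code assuming the length is a power of two, not FFTW's implementation); the final loop performs the butterflies, and cexp(...) supplies the twiddle factors discussed below:

#include <complex.h>
#include <stdlib.h>

/* Recursive radix-2 Cooley–Tukey FFT, in place; n must be a power of two. */
void fft_radix2(double complex *x, int n)
{
    if (n <= 1) return;

    /* Split into even- and odd-indexed subsequences (two DFTs of size n/2). */
    double complex *even = malloc(sizeof(double complex) * (n / 2));
    double complex *odd  = malloc(sizeof(double complex) * (n / 2));
    for (int i = 0; i < n / 2; i++) {
        even[i] = x[2 * i];
        odd[i]  = x[2 * i + 1];
    }

    fft_radix2(even, n / 2);
    fft_radix2(odd,  n / 2);

    /* Combine with butterflies; cexp(...) is the twiddle factor. */
    const double pi = 3.14159265358979323846;
    for (int k = 0; k < n / 2; k++) {
        double complex w = cexp(-2.0 * I * pi * k / n) * odd[k];
        x[k]         = even[k] + w;
        x[k + n / 2] = even[k] - w;
    }

    free(even);
    free(odd);
}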

James William Cooley was an American mathematician. Cooley received a B.A. degree in 1949 from Manhattan College, Bronx, NY, an M.A. degree in 1951 from Columbia University, New York, NY, and a Ph.D. degree in 1961 in applied mathematics from Columbia University. He was a programmer on John von Neumann's computer at the Institute for Advanced Study, Princeton, NJ, from 1953 to 1956, where he notably programmed the Blackman–Tukey transformation.

In computing, a cache-oblivious algorithm is an algorithm designed to take advantage of a processor cache without having the size of the cache as an explicit parameter. An optimal cache-oblivious algorithm is a cache-oblivious algorithm that uses the cache optimally. Thus, a cache-oblivious algorithm is designed to perform well, without modification, on multiple machines with different cache sizes, or for a memory hierarchy with different levels of cache having different sizes. Cache-oblivious algorithms are contrasted with explicit loop tiling, which explicitly breaks a problem into blocks that are optimally sized for a given cache.

Butterfly diagram

In the context of fast Fourier transform algorithms, a butterfly is a portion of the computation that combines the results of smaller discrete Fourier transforms (DFTs) into a larger DFT, or vice versa. The name "butterfly" comes from the shape of the data-flow diagram in the radix-2 case. The earliest occurrence in print of the term is thought to be in a 1969 MIT technical report. The same structure can also be found in the Viterbi algorithm, used for finding the most likely sequence of hidden states.

A twiddle factor, in fast Fourier transform (FFT) algorithms, is any of the trigonometric constant coefficients that are multiplied by the data in the course of the algorithm. This term was apparently coined by Gentleman & Sande in 1966, and has since become widespread in thousands of papers of the FFT literature.

The split-radix FFT is a fast Fourier transform (FFT) algorithm for computing the discrete Fourier transform (DFT), and was first described in an initially little-appreciated paper by R. Yavne (1968) and subsequently rediscovered simultaneously by various authors in 1984. In particular, split radix is a variant of the Cooley–Tukey FFT algorithm that uses a blend of radices 2 and 4: it recursively expresses a DFT of length N in terms of one smaller DFT of length N/2 and two smaller DFTs of length N/4.

Bit-reversal permutation

In applied mathematics, a bit-reversal permutation is a permutation of a sequence of n items, where n is a power of two. It is defined by indexing the elements of the sequence by the numbers from 0 to n − 1, representing each of these numbers by its binary representation, and mapping each item to the item whose representation has the same bits in the reversed order.
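A small illustrative routine computing this index mapping (a hypothetical helper, not tied to any particular library): for n = 8 items (3 bits), index 3 (binary 011) maps to 6 (binary 110).

/* Bit-reversed counterpart of index i for a sequence of n = 2^bits items. */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1);   /* shift the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}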

The vector-radix FFT algorithm is a multidimensional fast Fourier transform (FFT) algorithm that generalizes the ordinary Cooley–Tukey FFT algorithm by dividing the transform dimensions by arbitrary radices. It breaks a multidimensional (MD) discrete Fourier transform (DFT) down into successively smaller MD DFTs until, ultimately, only trivial MD DFTs need to be evaluated.

Parallel multidimensional digital signal processing (mD-DSP) is defined as the application of parallel programming and multiprocessing to digital signal processing techniques to process digital signals that have more than a single dimension. The use of mD-DSP is fundamental to many application areas such as digital image and video processing, medical imaging, geophysical signal analysis, sonar, radar, lidar, array processing, computer vision, computational photography, and augmented and virtual reality. However, as the number of dimensions of a signal increases, the computational complexity of operating on the signal increases rapidly. This relationship between the number of dimensions and the amount of complexity, related to both time and space, as studied in the field of algorithm analysis, is analogous to the concept of the curse of dimensionality. This large complexity generally results in extremely long execution times, rendering mD-DSP impractical for many applications, especially real-time ones. This long run-time is the primary motivation for applying parallel algorithmic techniques to mD-DSP problems.

Steven Glenn Johnson is an American mathematician known for being a co-creator of the FFTW library for software-based fast Fourier transforms and for his work on photonic crystals. He is professor of Applied Mathematics and Physics at MIT where he leads a group on Nanostructures and Computation.

Bailey's FFT algorithm

Bailey's FFT is a high-performance algorithm for computing the fast Fourier transform (FFT). This variation of the Cooley–Tukey FFT algorithm was originally designed for systems with the hierarchical memory common in modern computers. The algorithm treats the samples as a two-dimensional matrix and executes short FFT operations on the columns and rows of the matrix, with a correction multiplication by "twiddle factors" in between.

References

  1. "The FFTW Release Notes" . Retrieved 16 September 2021.
  2. 1 2 3 Frigo M, Johnson SG (February 2005). "The design and implementation of FFTW3" (PDF). Proceedings of the IEEE. 93 (2): 216–231. CiteSeerX   10.1.1.66.3097 . doi:10.1109/JPROC.2004.840301. S2CID   6644892.
  3. Frigo M, Johnson SG (1998). "FFTW: An adaptive software architecture for the FFT". Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181). Vol. 3. pp. 1381–1384. CiteSeerX   10.1.1.47.8661 . doi:10.1109/ICASSP.1998.681704. ISBN   978-0-7803-4428-0. S2CID   12560207.
  4. Johnson SG, Frigo M (September 2008). "ch.11: Implementing FFTs in practice". In C. S. Burrus (ed.). Fast Fourier Transforms. Houston TX: Connexions: Rice University.
  5. "View Technologies | MIT Technology Licensing Office".
  6. "FFTW - Fastest Fourier Transform in the West | MIT Technology Licensing Office". tlo.mit.edu. Retrieved 2023-02-07.
  7. "FFTW FAQ"