Pipelining (DSP implementation)

Last updated April 17, 2024

Pipelining is an important technique used in several applications such as digital signal processing (DSP) systems, microprocessors, etc. It originates from the idea of a water pipe with continuous water sent in without waiting for the water in the pipe to come out. Accordingly, it results in speed enhancement for the critical path in most DSP systems. For example, it can either increase the clock speed or reduce the power consumption at the same speed in a DSP system.

Concept

Pipelining allows different functional units of a system to run concurrently. Consider an informal example in the following figure. A system includes three sub-function units (F₀, F₁ and F₂). Assume that there are three independent tasks (T₀, T₁ and T₂) being performed by these three function units. The time for each function unit to complete a task is the same and will occupy a slot in the schedule.

If we put these three units and tasks in a sequential order, the required time to complete them is five slots.

However, if we pipeline T₀ to T₂ concurrently, the aggregate time is reduced to three slots.

Therefore, it is possible for an adequate pipelined design to achieve significant enhancement on speed.

Costs and disadvantages

Pipelining cannot decrease the processing time required for a single task. The advantage of pipelining is that it increases the throughput of the system when processing a stream of tasks.

Applying too many pipelined functions can lead to increased latency - that is, the time required for a single task to propagate through the full pipe is prolonged. A pipelined system may also require more resources (buffers, circuits, processing units, memory etc.), if the reuse of resources across different stages is restricted.

Comparison with parallel approaches

Another technique to enhance the efficiency through concurrency is parallel processing. The core difference is that parallel techniques usually duplicate function units and distribute multiple input tasks at once amongst them. Therefore, it can complete more tasks per unit time but may suffer more expensive resource costs.

For the previous example, the parallel technique duplicates each function units into another two. Accordingly, all the tasks can be operated upon by the duplicated function units with the same function simultaneously. The time to complete these three tasks is reduced to three slots.

Pipelining in FIR filters

Consider a 3-tap FIR filter:^[1]

y(n)=ax(n)+bx(n-1)+cx(n-2)

which is as shown in the following figure.

Assume the calculation time for multiplication units is T_m and T_a for add units. The critical path, representing the minimum time required for processing a new sample, is limited by 1 multiplication and 2 add function units. Therefore, the sample period is given by

T_{\text{sample}}\geq T_{m}+2T_{a}

However, such structure may not be suitable for the design with the requirement of high speed. To reduce the sampling period, we can introduce extra pipelining registers along the critical data path. Then the structure is partitioned into two stages and the data produced in the first stage will be stored in the introduced registers, delaying one clock to the second stage. The data in first three clocks is recorded in the following table. Under such pipelined structure, the sample period is reduced to

T_{\text{sample}}\geq T_{m}+T_{a}.

Pipelining in 1st-order IIR filters

By combining look-ahead techniques and pipelining,^[2] we are able to enhance the sample rate of target design. Look-ahead pipelining will add canceling poles and zeroes to the transfer function such that the coefficients of the following terms in the denominator of the transfer function are zero.

\{z^{-1},\ldots ,z^{-(M-1)}\}

Then, the output sample y(n) can be computed in terms of the inputs and the output sample y(n − M) such that there are M delay elements in the critical loop. These elements are then used to pipeline the critical loop by M stages so that the sample rate can be increased by a factor M.

Consider the 1st-order IIR filter transfer function

H(z)={\frac {1}{1-az^{-1}}}

The output y(n) can be computed in terms of the input u(n) and the previous output.

y(n)=ay(n-1)+u(n)

In a straightforward structure to design such function, the sample rate of this recursive filter is restricted by the calculation time of one multiply-add operation.

To pipeline such design, we observe that H has a pole at

z=a,a\leq 1

Therefore, in a 3-stage pipelined equivalent stable filter, the transfer function can be derived by adding poles and zeros at

z=ae^{\pm ({\frac {2j\pi }{3}})}

and is given by

H(z)={\frac {1+az^{-1}+a^{2}z^{-2}}{1-a^{3}z^{-3}}}

Therefore, the corresponding sample rate can be increased by a factor 3.

Other examples of pipelined DSP systems

Pipelined Walsh–Fourier transform^[3]
Pipelined unitary transforms^[4]
Pipelined DFT^[5]
Pipelined FFT^[6]

Related Research Articles

Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are a sequence of numbers that represent samples of a continuous variable in a domain such as time, space, or frequency. In digital electronics, a digital signal is represented as a pulse train, which is typically generated by the switching of a transistor.

In signal processing, a digital filter is a system that performs mathematical operations on a sampled, discrete-time signal to reduce or enhance certain aspects of that signal. This is in contrast to the other major type of electronic filter, the analog filter, which is typically an electronic circuit operating on continuous-time analog signals.

A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases or decreases, and then returns to zero one or more times. Wavelets are termed a "brief oscillation". A taxonomy of wavelets has been established, based on the number and direction of its pulses. Wavelets are imbued with specific properties that make them useful for signal processing.

In mathematics and signal processing, the Z-transform converts a discrete-time signal, which is a sequence of real or complex numbers, into a complex valued frequency-domain representation.

The chirp Z-transform (CZT) is a generalization of the discrete Fourier transform (DFT). While the DFT samples the Z plane at uniformly-spaced points along the unit circle, the chirp Z-transform samples along spiral arcs in the Z-plane, corresponding to straight lines in the S plane. The DFT, real DFT, and zoom DFT can be calculated as special cases of the CZT.

In signal processing, a finite impulse response (FIR) filter is a filter whose impulse response is of finite duration, because it settles to zero in finite time. This is in contrast to infinite impulse response (IIR) filters, which may have internal feedback and may continue to respond indefinitely.

Infinite impulse response (IIR) is a property applying to many linear time-invariant systems that are distinguished by having an impulse response $that does not become exactly zero past a certain point but continues indefinitely. This is in contrast to a finite impulse response (FIR) system, in which the impulse response does become exactly zero at times for some finite, thus being of finite duration. Common examples of linear time-invariant systems are most electronic and digital filters. Systems with this property are known as IIR systems or IIR filters .$

In signal processing, time–frequency analysis comprises those techniques that study a signal in both the time and frequency domains simultaneously, using various time–frequency representations. Rather than viewing a 1-dimensional signal and some transform, time–frequency analysis studies a two-dimensional signal – a function whose domain is the two-dimensional real plane, obtained from the signal via a time–frequency transform.

In mathematics, the discrete-time Fourier transform (DTFT) is a form of Fourier analysis that is applicable to a sequence of discrete values.

In digital signal processing, downsampling, compression, and decimation are terms associated with the process of resampling in a multi-rate digital signal processing system. Both downsampling and decimation can be synonymous with compression, or they can describe an entire process of bandwidth reduction (filtering) and sample-rate reduction. When the process is performed on a sequence of samples of a signal or a continuous function, it produces an approximation of the sequence that would have been obtained by sampling the signal at a lower rate.

In digital signal processing, upsampling, expansion, and interpolation are terms associated with the process of resampling in a multi-rate digital signal processing system. Upsampling can be synonymous with expansion, or it can describe an entire process of expansion and filtering (interpolation). When upsampling is performed on a sequence of samples of a signal or other continuous function, it produces an approximation of the sequence that would have been obtained by sampling the signal at a higher rate. For example, if compact disc audio at 44,100 samples/second is upsampled by a factor of 5/4, the resulting sample-rate is 55,125.

In system analysis, among other fields of study, a linear time-invariant (LTI) system is a system that produces an output signal from any input signal subject to the constraints of linearity and time-invariance; these terms are briefly defined below. These properties apply (exactly or approximately) to many important physical systems, in which case the response $y (t)$ of the system to an arbitrary input $x (t)$ can be found directly using convolution: $y (t) = (x * h)(t)$ where $h (t)$ is called the system's impulse response and ∗ represents convolution (not to be confused with multiplication). What's more, there are systematic methods for solving any such system (determining $h (t)$ ), whereas systems not meeting both properties are generally more difficult (or impossible) to solve analytically. A good example of an LTI system is any electrical circuit consisting of resistors, capacitors, inductors and linear amplifiers.

In signal processing, a filter bank is an array of bandpass filters that separates the input signal into multiple components, each one carrying a sub-band of the original signal. One application of a filter bank is a graphic equalizer, which can attenuate the components differently and recombine them into a modified version of the original signal. The process of decomposition performed by the filter bank is called analysis ; the output of analysis is referred to as a subband signal with as many subbands as there are filters in the filter bank. The reconstruction process is called synthesis, meaning reconstitution of a complete signal resulting from the filtering process.

The Goertzel algorithm is a technique in digital signal processing (DSP) for efficient evaluation of the individual terms of the discrete Fourier transform (DFT). It is useful in certain practical applications, such as recognition of dual-tone multi-frequency signaling (DTMF) tones produced by the push buttons of the keypad of a traditional analog telephone. The algorithm was first described by Gerald Goertzel in 1958.

In mathematics, a wavelet series is a representation of a square-integrable function by a certain orthonormal series generated by a wavelet. This article provides a formal, mathematical definition of an orthonormal wavelet and of the integral wavelet transform.

In digital signal processing (DSP), parallel processing is a technique duplicating function units to operate different tasks (signals) simultaneously. Accordingly, we can perform the same processing for different signals on the corresponding duplicated function units. Further, due to the features of parallel processing, the parallel DSP design often contains multiple outputs, resulting in higher throughput than not parallel.

Two dimensional filters have seen substantial development effort due to their importance and high applicability across several domains. In the 2-D case the situation is quite different from the 1-D case, because the multi-dimensional polynomials cannot in general be factored. This means that an arbitrary transfer function cannot generally be manipulated into a form required by a particular implementation. The input-output relationship of a 2-D IIR filter obeys a constant-coefficient linear partial difference equation from which the value of an output sample can be computed using the input samples and previously computed output samples. Because the values of the output samples are fed back, the 2-D filter, like its 1-D counterpart, can be unstable.

Multidimensional Digital Signal Processing (MDSP) refers to the extension of Digital signal processing (DSP) techniques to signals that vary in more than one dimension. While conventional DSP typically deals with one-dimensional data, such as time-varying audio signals, MDSP involves processing signals in two or more dimensions. Many of the principles from one-dimensional DSP, such as Fourier transforms and filter design, have analogous counterparts in multidimensional signal processing.

Multidimensional seismic data processing forms a major component of seismic profiling, a technique used in geophysical exploration. The technique itself has various applications, including mapping ocean floors, determining the structure of sediments, mapping subsurface currents and hydrocarbon exploration. Since geophysical data obtained in such techniques is a function of both space and time, multidimensional signal processing techniques may be better suited for processing such data.

Parallelmultidimensionaldigitalsignal processing (mD-DSP) is defined as the application of parallel programming and multiprocessing to digital signal processing techniques to process digital signals that have more than a single dimension. The use of mD-DSP is fundamental to many application areas such as digital image and video processing, medical imaging, geophysical signal analysis, sonar, radar, lidar, array processing, computer vision, computational photography, and augmented and virtual reality. However, as the number of dimensions of a signal increases the computational complexity to operate on the signal increases rapidly. This relationship between the number of dimensions and the amount of complexity, related to both time and space, as studied in the field of algorithm analysis, is analogues to the concept of the curse of dimensionality. This large complexity generally results in an extremely long execution run-time of a given mD-DSP application rendering its usage to become impractical for many applications; especially for real-time applications. This long run-time is the primary motivation of applying parallel algorithmic techniques to mD-DSP problems.

References

↑ Keshab K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley, 1999
↑ Slides for VLSI Digital Signal Processing Systems: Design and Implementation John Wiley & Sons, 1999 ( ISBN 0-471-24186-5): http://www.ece.umn.edu/users/parhi/SLIDES/chap10.pdf
↑ M. R. Ashouri and Anthony G. Constantinides, "A pipeline fast Walsh Fourier transform," in Proc. IEEE Int. Conf. ASSP Hartford, CT, May 9–11), pp. 515-518, 1977.
↑ Fino, B.J.; Algazi, V.R.; "Parallel and pipeline computation of fast unitary transforms," Electronics Letters, vol.11, no.5, pp.93-94, March 6, 1975
↑ Tzou, K.-H.; Morgan, N.P.; "A fast pipelined DFT processor and its programming consideration," Electronic Circuits and Systems, IEE Proceedings G, vol.132, no.6, pp.273-276, December 1985
↑ H. L. Gorginsky and G. A. Works, "A pipeline fast Fourier transform," IEEE Trans. Comput., vol. C-19, pp. 1015-1019, Nov. 1970.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Keshab K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley, 1999

[2] Slides for VLSI Digital Signal Processing Systems: Design and Implementation John Wiley & Sons, 1999 ( ISBN 0-471-24186-5): http://www.ece.umn.edu/users/parhi/SLIDES/chap10.pdf

[3] M. R. Ashouri and Anthony G. Constantinides, "A pipeline fast Walsh Fourier transform," in Proc. IEEE Int. Conf. ASSP Hartford, CT, May 9–11), pp. 515-518, 1977.

[4] Fino, B.J.; Algazi, V.R.; "Parallel and pipeline computation of fast unitary transforms," Electronics Letters, vol.11, no.5, pp.93-94, March 6, 1975

[5] Tzou, K.-H.; Morgan, N.P.; "A fast pipelined DFT processor and its programming consideration," Electronic Circuits and Systems, IEE Proceedings G, vol.132, no.6, pp.273-276, December 1985

[6] H. L. Gorginsky and G. A. Works, "A pipeline fast Fourier transform," IEEE Trans. Comput., vol. C-19, pp. 1015-1019, Nov. 1970.

[1]

[2]

[3]

[4]

[5]

[6]