Noise shaping

Last updated

Noise shaping is a technique typically used in digital audio, image, and video processing, usually in combination with dithering, as part of the process of quantization or bit-depth reduction of a digital signal. Its purpose is to increase the apparent signal-to-noise ratio of the resultant signal. It does this by altering the spectral shape of the error that is introduced by dithering and quantization; such that the noise power is at a lower level in frequency bands at which noise is considered to be less desirable and at a correspondingly higher level in bands where it is considered to be more desirable. A popular noise shaping algorithm used in image processing is known as ‘Floyd Steinberg dithering’; and many noise shaping algorithms used in audio processing are based on an ‘Absolute threshold of hearing’ model.

Introduction

Noise shaping works by putting the quantization error in a feedback loop. Any feedback loop functions as a filter, so by creating a feedback loop for the error itself, the error can be filtered as desired.

For example, consider the feedback system:

${\displaystyle \ y[n]=x[n]+e[n-1],}$

where y[n] is the output sample value that is to be quantized, x[n] is the input sample value, n is the sample number, and e[n] is the quantization error introduced at sample n:

${\displaystyle \ e[n]=y_{\text{quantized}}[n]-y[n].}$

In this model, when any sample's bit depth is reduced, the quantization error between the quantized value and the original value is measured and stored. That "error value" is then re-added into the next sample prior to its quantization. The effect is that the quantization error is low-pass filtered by a 2-sample boxcar filter (also known as a simple moving average filter). As a result, compared to before, the quantization error has lower power at higher frequencies and higher power at lower frequencies.

Note that we can adjust the cutoff frequency of the filter by modifying the proportion, b, of the error from the previous sample that is fed back:

${\displaystyle \ y[n]=x[n]+be[n-1]}$

More generally, any FIR filter or IIR filter can be used to create a more complex frequency response curve. Such filters can be designed using the weighted least squares method. [1] In the case of digital audio, typically the weighting function used is one divided by the absolute threshold of hearing curve, i.e.

${\displaystyle \ W(f)={\frac {1}{A(f)}}.}$

Noise shaping should also always involve an appropriate amount of dither within the process itself so as to prevent determinable and correlated errors to the signal itself. If dither is not used then noise shaping effectively functions merely as distortion shaping — pushing the distortion energy around to different frequency bands, but it is still distortion. If dither is added to the process as

${\displaystyle \ y[n]=x[n]+be[n-1]+\mathrm {dither} ,}$

then the quantization error truly becomes noise, and the process indeed yields noise shaping.

In digital audio

Noise shaping in audio is most commonly applied as a bit-reduction scheme. The most basic form of dither is flat, white noise. The ear, however, is less sensitive to certain frequencies than others at low levels (see Fletcher-Munson curves). By using noise shaping the quantization error can be effectively spread around so that more of it is focused on frequencies that can't be heard as well and less of it is focused on frequencies that can. The result is that where the ear is most critical the quantization error can be reduced greatly and where the ears are less sensitive the noise is much greater. This can give a perceived noise reduction of 4 bits compared to straight dither. [2] While 16-bit audio is typically thought to have 96 dB of dynamic range (see quantization distortion calculations), it can actually be increased to 120 dB using noise-shaped dither. [3]

Noise shaping and 1-bit converters

Since around 1989, 1 bit delta-sigma modulators have been used in analog-to-digital converters. This involves sampling the audio at a very high rate (2.8224 million samples per second, for example) but only using a single bit. Because only 1 bit is used, this converter only has 6.02 dB of dynamic range. The noise floor, however, is spread throughout the entire "legal" frequency range below the Nyquist frequency of 1.4112 MHz. Noise shaping is used to lower the noise present in the audible range (20 Hz to 20 kHz) and increase the noise above the audible range. This results in a broadband dynamic range of only 7.78 dB, but it is not consistent among frequency bands, and in the lowest frequencies (the audible range) the dynamic range is much greater — over 100 dB. Noise Shaping is inherently built into the delta-sigma modulators.

The 1 bit converter is the basis of the DSD format by Sony. One criticism of the 1 bit converter (and thus the DSD system) is that because only 1 bit is used in both the signal and the feedback loop, adequate amounts of dither cannot be used in the feedback loop and distortion can be heard under some conditions. [4] [5] Most A/D converters made since 2000 use multi-bit or multi-level delta sigma modulators that yield more than 1 bit output so that proper dither can be added in the feedback loop. For traditional PCM sampling the signal is then decimated to 44.1 kHz or other appropriate sample rates.

Analog Devices uses what they refer to as "Noise Shaping Requantizer", [6] and Texas Instruments uses what they refer to as "SNRBoost" [7] [8] to lower the noise floor approximately 30db compared to the surrounding frequencies. This comes at a cost of non-continuous operation but produces a nice bathtub shape to the spectrum floor. This can be combined with other techniques such as Bit-Boost[ specify ] to further enhance the resolution of the spectrum.

Related Research Articles

In electronics, an analog-to-digital converter is a system that converts an analog signal, such as a sound picked up by a microphone or light entering a digital camera, into a digital signal. An ADC may also provide an isolated measurement such as an electronic device that converts an input analog voltage or current to a digital number representing the magnitude of the voltage or current. Typically the digital output is a two's complement binary number that is proportional to the input, but there are other possibilities.

A delta modulation is an analog-to-digital and digital-to-analog signal conversion technique used for transmission of voice information where quality is not of primary importance. DM is the simplest form of differential pulse-code modulation (DPCM) where the difference between successive samples is encoded into n-bit data streams. In delta modulation, the transmitted data are reduced to a 1-bit data stream. Its main features are:

Signal-to-noise ratio is a measure used in science and engineering that compares the level of a desired signal to the level of background noise. SNR is defined as the ratio of signal power to the noise power, often expressed in decibels. A ratio higher than 1:1 indicates more signal than noise.

In electronics, a digital-to-analog converter is a system that converts a digital signal into an analog signal. An analog-to-digital converter (ADC) performs the reverse function.

Sound can be recorded and stored and played using either digital or analog techniques. Both techniques introduce errors and distortions in the sound, and these methods can be systematically compared. Musicians and listeners have argued over the superiority of digital versus analog sound recordings. Arguments for analog systems include the absence of fundamental error mechanisms which are present in digital audio systems, including aliasing and quantization noise. Advocates of digital point to the high levels of performance possible with digital audio, including excellent linearity in the audible band and low levels of noise and distortion.

In signal processing, sampling is the reduction of a continuous-time signal to a discrete-time signal. A common example is the conversion of a sound wave to a sequence of samples.

Audio system measurements are a means of quantifying system performance. These measurements are made for several purposes. Designers take measurements so that they can specify the performance of a piece of equipment. Maintenance engineers make them to ensure equipment is still working to specification, or to ensure that the cumulative defects of an audio path are within limits considered acceptable. Audio system measurements often accommodate psychoacoustic principles to measure the system in a way that relates to human hearing.

Direct Stream Digital (DSD) is a trademark used by Sony and Philips for their system of digitally recreating audible signals for the Super Audio CD (SACD).

Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a (countable) smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes. Quantization is involved to some degree in nearly all digital signal processing, as the process of representing a signal in digital form ordinarily involves rounding. Quantization also forms the core of essentially all lossy compression algorithms.

Dither is an intentionally applied form of noise used to randomize quantization error, preventing large-scale patterns such as color banding in images. Dither is routinely used in processing of both digital audio and video data, and is often one of the last stages of mastering audio to a CD.

In signal processing, oversampling is the process of sampling a signal at a sampling frequency significantly higher than the Nyquist rate. Theoretically, a bandwidth-limited signal can be perfectly reconstructed if sampled at the Nyquist rate or above it. The Nyquist rate is defined as twice the bandwidth of the signal. Oversampling is capable of improving resolution and signal-to-noise ratio, and can be helpful in avoiding aliasing and phase distortion by relaxing anti-aliasing filter performance requirements.

Code-excited linear prediction (CELP) is a linear predictive speech coding algorithm originally proposed by Manfred R. Schroeder and Bishnu S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algorithms, such as residual-excited linear prediction (RELP) and linear predictive coding (LPC) vocoders. Along with its variants, such as algebraic CELP, relaxed CELP, low-delay CELP and vector sum excited linear prediction, it is currently the most widely used speech coding algorithm. It is also used in MPEG-4 Audio speech coding. CELP is commonly used as a generic term for a class of algorithms and not for a particular codec.

Delta-sigma modulation is a method for encoding analog signals into digital signals as found in an analog-to-digital converter (ADC). It is also used to convert high bit-count, low-frequency digital signals into lower bit-count, higher-frequency digital signals as part of the process to convert digital signals into analog as part of a digital-to-analog converter (DAC).

A class-D amplifier or switching amplifier is an electronic amplifier in which the amplifying devices operate as electronic switches, and not as linear gain devices as in other amplifiers. They operate by rapidly switching back and forth between the supply rails, being fed by a modulator using pulse width, pulse density, or related techniques to encode the audio input into a pulse train. The audio escapes through a simple low-pass filter into the loudspeaker. The high-frequency pulses are blocked. Since the pairs of output transistors are never conducting at the same time, there is no other path for current flow apart from the low-pass filter/loudspeaker. For this reason, efficiency can exceed 90%.

In digital audio using pulse-code modulation (PCM), bit depth is the number of bits of information in each sample, and it directly corresponds to the resolution of each sample. Examples of bit depth include Compact Disc Digital Audio, which uses 16 bits per sample, and DVD-Audio and Blu-ray Disc which can support up to 24 bits per sample.

Effective number of bits (ENOB) is a measure of the dynamic range of an analog-to-digital converter (ADC), digital-to-analog converter, or their associated circuitry. The resolution of an ADC is specified by the number of bits used to represent the analog value. Ideally, a 12-bit ADC will have an effective number of bits of almost 12. However, real signals have noise, and real circuits are imperfect and introduce additional noise and distortion. Those imperfections reduce the number of bits of accuracy in the ADC. The ENOB describes the effective resolution of the system in bits. An ADC may have 12-bit resolution, but the effective number of bits when used in a system may be 9.5.

A Bitcrusher is an audio effect that produces distortion by reducing of the resolution or bandwidth of digital audio data. The resulting quantization noise may produce a “warmer” sound impression, or a harsh one, depending on the amount of reduction.

In signal processing, sub-band coding (SBC) is any form of transform coding that breaks a signal into a number of different frequency bands, typically by using a fast Fourier transform, and encodes each one independently. This decomposition is often the first step in data compression for audio and video signals.

Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, compact discs, digital telephony and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled regularly at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps.

Pulse-density modulation, or PDM, is a form of modulation used to represent an analog signal with a binary signal. In a PDM signal, specific amplitude values are not encoded into codewords of pulses of different weight as they would be in pulse-code modulation (PCM); rather, the relative density of the pulses corresponds to the analog signal's amplitude. The output of a 1-bit DAC is the same as the PDM encoding of the signal. Pulse-width modulation (PWM) is a special case of PDM where the switching frequency is fixed and all the pulses corresponding to one sample are contiguous in the digital signal. For a 50% voltage with a resolution of 8-bits, a PWM waveform will turn on for 128 clock cycles and then off for the remaining 128 cycles. With PDM and the same clock rate the signal would alternate between on and off every other cycle. The average is 50% for both waveforms, but the PDM signal switches more often. For 100% or 0% level, they are the same.

References

1. Verhelst, Werner; De Koning, Dreten (24 October 2001). Noise shaping filter design for minimally audible signal requantization. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. IEEE.
2. Gerzon, Michael; Peter Craven; Robert Stuart; Rhonda Wilson (16–19 March 1993). Psychoacoustic Noise Shaped Improvements in CD and Other Linear Digital Media. 94th Convention of the Audio Engineering Society, Berlin. AES. Preprint 3501.