Lulu smoothing

In signal processing, Lulu smoothing is a nonlinear mathematical technique for removing impulsive noise from a data sequence such as a time series. It is a nonlinear equivalent to taking a moving average (or other smoothing technique) of a time series, and is similar to other nonlinear smoothing techniques, such as Tukey or median smoothing. [1]

[Figure: LU smoother of width 1 applied to a noisy sequence]

LULU smoothers were compared in detail with median smoothers by Jankowitz, who found them superior in some respects, particularly in mathematical properties such as idempotence. [2]

Properties

Lulu operators have a number of attractive mathematical properties, among them idempotence – meaning that repeated application of the operator yields the same result as a single application – and co-idempotence. An interpretation of these two properties is that "idempotence means that there is no 'noise' left in the smoothed data and co-idempotence means that there is no 'signal' left in the residual." [3]

When studying smoothers there are four properties that are useful to optimize: [4]

  1. Effectiveness
  2. Consistency
  3. Stability
  4. Efficiency

The operators can also be used to decompose a signal into various subcomponents similar to wavelet or Fourier decomposition. [5]
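As an illustration only, a decomposition in this spirit can be sketched in Python: repeatedly smooth with increasing width and keep the differences as detail components. The smoother composition (L applied after U at each width), the edge handling, and the function names here are assumptions for illustration, not the published algorithm:

```python
def lulu_decompose(x, max_width):
    """Sketch of a LULU-style multiresolution decomposition: at each
    width n the signal is smoothed, and the part removed at that width
    is kept as the detail component for that scale."""
    def smooth(seq, n, inner, outer):
        # windowed min/max smoother; edge replication is an assumption,
        # since the operators are defined for an infinite sequence
        p = [seq[0]] * n + list(seq) + [seq[-1]] * n
        return [outer(inner(p[j + k - n : j + k + 1]) for k in range(n + 1))
                for j in range(n, n + len(seq))]
    details, current = [], list(x)
    for n in range(1, max_width + 1):
        u = smooth(current, n, max, min)   # U_n: removes downward impulses
        smoothed = smooth(u, n, min, max)  # L_n applied to U_n
        details.append([a - b for a, b in zip(current, smoothed)])
        current = smoothed
    # by construction, the signal equals the sum of the details plus the rest
    return details, current
```

The telescoping construction guarantees exact reconstruction: summing all detail components and the final smoothed sequence recovers the original signal pointwise.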

History

Lulu smoothers were developed by C. H. Rohwer and have been studied since the late 1980s. [6] [7] Their exact and asymptotic distributions have been derived. [3]

Operation

Applying a Lulu smoother consists of repeated applications of the min and max operators over a given subinterval of the data. As with other smoothers, a width or interval must be specified. The Lulu smoothers are composed of repeated applications of the L (lower) and U (upper) operators, which are defined as follows:

L operator

For an L operator of width n over an infinite sequence of values (..., xj-1, xj, xj+1, ...), the operation on xj is calculated as follows:

  1. First, create (n + 1) mini-sequences of length (n + 1), each containing the element xj. For width 1, these mini-sequences are (xj-1, xj) and (xj, xj+1). For width 2, they are (xj-2, xj-1, xj), (xj-1, xj, xj+1) and (xj, xj+1, xj+2), referred to below as seq-1, seq0 and seq+1.
  2. Then take the minimum of each mini-sequence. For width 2 this gives (Min(seq-1), Min(seq0), Min(seq+1)), i.e. (n + 1) numbers for each point.
  3. Lastly, take the maximum of these minimums, Max(Min(seq-1), Min(seq0), Min(seq+1)), and this becomes L(xj).

Thus for width 2, the L operator is:

L(xj) = Max(Min(seq-1), Min(seq0), Min(seq+1))
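The steps above can be sketched in Python. This is a minimal illustration; the operator is defined for an infinite sequence, so replicating the edge values at the boundaries is an assumption made here:

```python
def L_operator(x, n):
    """L (lower) smoother of width n: for each point, take the max of
    the mins of the (n + 1) windows of length (n + 1) containing it."""
    # pad by edge replication so every point has its full set of windows
    # (an assumption: the definition above is for an infinite sequence)
    p = [x[0]] * n + list(x) + [x[-1]] * n
    out = []
    for j in range(n, n + len(x)):
        # the (n + 1) mini-sequences containing element j
        mins = [min(p[j + k - n : j + k + 1]) for k in range(n + 1)]
        out.append(max(mins))
    return out

L_operator([0, 0, 9, 0, 0], 1)   # upward impulse removed: [0, 0, 0, 0, 0]
L_operator([0, 0, -9, 0, 0], 1)  # downward impulse kept: [0, 0, -9, 0, 0]
```

Taking the minimum first means a single upward outlier cannot survive, since no window's minimum equals it, while downward excursions pass through unchanged.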

U Operator

This is identical to the L operator, except that the order of Min and Max is reversed, i.e. for width 2:

U(xj) = Min(Max(seq-1), Max(seq0), Max(seq+1))
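A minimal Python sketch of the U operator follows; as with any finite implementation of an operator defined on an infinite sequence, edge replication at the boundaries is an assumption:

```python
def U_operator(x, n):
    """U (upper) smoother of width n: for each point, take the min of
    the maxes of the (n + 1) windows of length (n + 1) containing it."""
    # pad by edge replication (assumption; the definition is stated
    # for an infinite sequence)
    p = [x[0]] * n + list(x) + [x[-1]] * n
    out = []
    for j in range(n, n + len(x)):
        maxes = [max(p[j + k - n : j + k + 1]) for k in range(n + 1)]
        out.append(min(maxes))
    return out

U_operator([0, 0, -9, 0, 0], 1)  # downward impulse removed: [0, 0, 0, 0, 0]
U_operator([0, 0, 9, 0, 0], 1)   # upward impulse kept: [0, 0, 9, 0, 0]
```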

Examples

Examples of the U and L operators, as well as combined UL and LU operators on a sample data set are shown in the following figures.

[Figure: L smoother of width 1]
[Figure: U smoother of width 1]

It can be seen that the results of the UL and LU operators can differ. The combined operators are very effective at removing impulsive noise; the only case where noise is not removed effectively is when multiple noise impulses occur very close together, so that the filter 'sees' them as part of the signal.
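A hedged, self-contained Python sketch of the combined smoothers (the helper names and the edge replication at the boundaries are assumptions for illustration):

```python
def lulu(x, n, order="LU"):
    """Combined smoother of width n: "LU" applies U first and then L,
    "UL" the reverse, so impulses of either sign are removed."""
    def smooth(seq, inner, outer):
        # outer over the inner of the (n + 1) windows of length (n + 1)
        # containing each point; edges padded by replication (assumption)
        p = [seq[0]] * n + list(seq) + [seq[-1]] * n
        return [outer(inner(p[j + k - n : j + k + 1]) for k in range(n + 1))
                for j in range(n, n + len(seq))]
    L = lambda s: smooth(s, min, max)  # removes upward impulses
    U = lambda s: smooth(s, max, min)  # removes downward impulses
    return L(U(x)) if order == "LU" else U(L(x))

lulu([1, 1, 8, 1, -6, 1, 1], 1)  # both impulses removed: [1, 1, 1, 1, 1, 1, 1]
```

On this input both orders give the same flat result, but in general the LU and UL outputs can differ, as the figures illustrate.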

[Figure: LU smoother of width 1]
[Figure: UL smoother of width 1]

References

  1. Tukey, JW (1974). "Nonlinear (nonsuperposable) methods for smoothing data". Cong. Rec. EASCON: 673.
  2. Jankowitz (2007). Some statistical aspects of LULU smoothers (PhD Thesis). University of Stellenbosch.
  3. Conradie, W. J.; de Wet, T.; Jankowitz, M. (2006). "Exact and asymptotic distributions of LULU smoothers". Journal of Computational and Applied Mathematics. 186 (1): 253–267. Bibcode:2006JCoAM.186..253C. doi:10.1016/j.cam.2005.03.073.
  4. Rohwer, Carl (2005). Nonlinear smoothing and multiresolution analysis. Vol. 150. Birkhäuser Basel.
  5. Fabris-Rotelli, Inger Nicolette (2009). LULU operators on multidimensional arrays and applications (MSc Thesis). University of Pretoria.
  6. Rohwer, CH (1989). "Idempotent one-sided approximation of median smoothers". Journal of Approximation Theory. 58 (2): 151–163. doi:10.1016/0021-9045(89)90017-8.
  7. Rohwer, CH (1999). "Projections and separators". Quaestiones Mathematicae. 22 (2): 219–230. doi:10.1080/16073606.1999.9632077.