Kernel adaptive filter

Last updated

In signal processing, a kernel adaptive filter is a type of nonlinear adaptive filter. [1] An adaptive filter is a filter that adapts its transfer function to changes in signal properties over time by minimizing an error or loss function that characterizes how far the filter deviates from ideal behavior. The adaptation process is based on learning from a sequence of signal samples and is thus an online algorithm. A nonlinear adaptive filter is one in which the transfer function is nonlinear.

Kernel adaptive filters implement a nonlinear transfer function using kernel methods. [1] In these methods, the signal is mapped to a high-dimensional linear feature space and a nonlinear function is approximated as a sum over kernels, whose domain is the feature space. If this is done in a reproducing kernel Hilbert space, a kernel method can be a universal approximator for a nonlinear function. Kernel methods have the advantage of having convex loss functions, with no local minima, and of being only moderately complex to implement.

Because high-dimensional feature space is linear, kernel adaptive filters can be thought of as a generalization of linear adaptive filters. As with linear adaptive filters, there are two general approaches to adapting a filter: the least mean squares filter (LMS) [2] and the recursive least squares filter (RLS). [3]

Self organising kernel adaptive filters that use iteration to achieve convex LMS error minimisation address some of the statistical and practical issues of non-linear models that do not arise in the linear case [4] . Regularisation is particularly important feature for non-linear models and also often used in linear adaptive filters to reduce statistical uncertainties. However because nonlinear filters typically have a much higher potential structural complexity (or higher dimensional feature space) compared to the subspace actually required, regularisation of some kind must deal with the under-determined model. Though some specific forms of parameter regularisation such as prescribed by Vapink's SRM & SVM address the dimensionality problem statistically to some extent, there remain further statistical and practical issues for truly adaptive non-linear filters. Adaptive filters are often used for tracking the behaviour of a time-varying system or systems which cannot be fully modelled from the data and structure available, hence the models may not only need to adapt parameters, but structure too.

Where structural parameters of kernels are derived directly from data being processed (as in the above "Support Vector" approach) there are convenient opportunities for analytically robust methods of self organisation of the kernels available to the filter. The linearised feature space induced by kernels allows linear projection of new samples on to the current structure of the model where novelty in new data can be easily differentiated from noise-born errors which should not result in a change to model structure. Analytical metrics for structure analysis can be used to parsimoniously grow model complexity when required or optimally prune the existing structure when processor resource limits are reached. Structure updates are also relevant when system variation is detected and the long-term memory of the model should be updated as for the Kalman Filter case in linear filters.

Iterative gradient descent that is typically used in adaptive filters has also gained popularity in offline batch-mode support vector based machine learning because of its computational efficiency for large data set processing. Both time series and batch data processing performance is reported [5] to be able to easily handle over 100,000 training examples using as little as 10kB RAM. Data sizes this large are challenging to the original formulations of support vector machines and other kernel methods, which for example relied on constrained optimisation using linear or quadratic programming techniques.

Related Research Articles

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.

Signal processing Academic subfield of electrical engineering

Signal processing is an electrical engineering subfield that focuses on analysing, modifying, and synthesizing signals such as sound, images, and scientific measurements. Signal processing techniques can be used to improve transmission, storage efficiency and subjective quality and to also emphasize or detect components of interest in a measured signal.

In machine learning, support-vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories by Vapnik with colleagues, SVMs are one of the most robust prediction methods, being based on statistical learning frameworks or VC theory proposed by Vapnik and Chervonenkis (1974) and Vapnik. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM maps training examples to points in space so as to maximise the width of the gap between the two categories. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.

An adaptive filter is a system with a linear filter that has a transfer function controlled by variable parameters and a means to adjust those parameters according to an optimization algorithm. Because of the complexity of the optimization algorithms, almost all adaptive filters are digital filters. Adaptive filters are required for some applications because some parameters of the desired processing operation are not known in advance or are changing. The closed loop adaptive filter uses feedback in the form of an error signal to refine its transfer function.

Nonlinear dimensionality reduction Summary of algorithms for nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Time series Sequence of data points over time

A time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.

Recursive least squares (RLS) is an adaptive filter algorithm that recursively finds the coefficients that minimize a weighted linear least squares cost function relating to the input signals. This approach is in contrast to other algorithms such as the least mean squares (LMS) that aim to reduce the mean square error. In the derivation of the RLS, the input signals are considered deterministic, while for the LMS and similar algorithm they are considered stochastic. Compared to most of its competitors, the RLS exhibits extremely fast convergence. However, this benefit comes at the cost of high computational complexity.

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over pairs of data points in raw representation.

In statistics and image processing, to smooth a data set is to create an approximating function that attempts to capture important patterns in the data, while leaving out noise or other fine-scale structures/rapid phenomena. In smoothing, the data points of a signal are modified so individual points higher than the adjacent points are reduced, and points that are lower than the adjacent points are increased leading to a smoother signal. Smoothing may be used in two important ways that can aid in data analysis (1) by being able to extract more information from the data as long as the assumption of smoothing is reasonable and (2) by being able to provide analyses that are both flexible and robust. Many different algorithms are used in smoothing.

Cerebellar model articulation controller

The cerebellar model arithmetic computer (CMAC) is a type of neural network based on a model of the mammalian cerebellum. It is also known as the cerebellar model articulation controller. It is a type of associative memory.

Linear-nonlinear-Poisson cascade model model of neural responses

The linear-nonlinear-Poisson (LNP) cascade model is a simplified functional model of neural spike responses. It has been successfully used to describe the response characteristics of neurons in early sensory pathways, especially the visual system. The LNP model is generally implicit when using reverse correlation or the spike-triggered average to characterize neural responses with white-noise stimuli.

In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique used in areas of machine learning where it is computationally infeasible to train over the entire dataset, requiring the need of out-of-core algorithms. It is also used in situations where it is necessary for the algorithm to dynamically adapt to new patterns in the data, or when the data itself is generated as a function of time, e.g., stock price prediction. Online learning algorithms may be prone to catastrophic interference, a problem that can be addressed by incremental learning approaches.

System identification is a method of identifying or measuring the mathematical model of a system from measurements of the system inputs and outputs. The applications of system identification include any system where the inputs and outputs can be measured and include industrial processes, control systems, economic data, biology and the life sciences, medicine, social systems and many more.

A two-dimensional (2D) adaptive filter is very much like a one-dimensional adaptive filter in that it is a linear system whose parameters are adaptively updated throughout the process, according to some optimization approach. The main difference between 1D and 2D adaptive filters is that the former usually take as inputs signals with respect to time, what implies in causality constraints, while the latter handles signals with 2 dimensions, like x-y coordinates in the space domain, which are usually non-causal. Moreover, just like 1D filters, most 2D adaptive filters are digital filters, because of the complex and iterative nature of the algorithms.

The following outline is provided as an overview of and topical guide to machine learning. Machine learning is a subfield of soft computing within computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from an example training set of input observations in order to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions.

Chris J. Harris British engineer and academic (born 1945)

Chris Harris FREng is a control & signal process engineer, and Emeritus Professor of Computational Intelligence at the University of Southampton, UK.

References

  1. 1 2 Weifeng Liu; José C. Principe; Simon Haykin (March 2010). Kernel Adaptive Filtering: A Comprehensive Introduction (PDF). Wiley. pp. 12–20. ISBN   978-0-470-44753-6.
  2. Liu, Weifeng; Pokharel, P.P.; Principe, J.C. (2008-02-01). "The Kernel Least-Mean-Square Algorithm". IEEE Transactions on Signal Processing. 56 (2): 543–554. doi:10.1109/TSP.2007.907881. ISSN   1053-587X. S2CID   206797001.
  3. Engel, Y.; Mannor, S.; Meir, R. (2004-08-01). "The kernel recursive least-squares algorithm". IEEE Transactions on Signal Processing. 52 (8): 2275–2285. doi:10.1109/TSP.2004.830985. ISSN   1053-587X. S2CID   10220028.
  4. Pierre Drezet (2001). Kernel Methods and their Application to Systems Identification and Signal Processing (Thesis).
  5. Pierre Drezet; Robert F Harrison. "An Online Support Vector Learning Method". Sheffield University.