Maximum likelihood sequence estimation

Last updated

Maximum likelihood sequence estimation (MLSE) is a mathematical algorithm to extract useful data out of a noisy data stream.

Contents

Theory

For an optimized detector for digital signals the priority is not to reconstruct the transmitter signal, but it should do a best estimation of the transmitted data with the least possible number of errors. The receiver emulates the distorted channel. All possible transmitted data streams are fed into this distorted channel model. The receiver compares the time response with the actual received signal and determines the most likely signal. In cases that are most computationally straightforward, root mean square deviation can be used as the decision criterion [1] for the lowest error probability.

Background

Suppose that there is an underlying signal {x(t)}, of which an observed signal {r(t)} is available. The observed signal r is related to x via a transformation that may be nonlinear and may involve attenuation, and would usually involve the incorporation of random noise. The statistical parameters of this transformation are assumed known. The problem to be solved is to use the observations {r(t)} to create a good estimate of {x(t)}.

Maximum likelihood sequence estimation is formally the application of maximum likelihood to this problem. That is, the estimate of {x(t)} is defined to be sequence of values which maximize the functional

where p(r | x) denotes the conditional joint probability density function of the observed series {r(t)} given that the underlying series has the values {x(t)}.

In contrast, the related method of maximum a posteriori estimation is formally the application of the maximum a posteriori (MAP) estimation approach. This is more complex than maximum likelihood sequence estimation and requires a known distribution (in Bayesian terms, a prior distribution) for the underlying signal. In this case the estimate of {x(t)} is defined to be sequence of values which maximize the functional

where p(x | r) denotes the conditional joint probability density function of the underlying series {x(t)} given that the observed series has taken the values {r(t)}. Bayes' theorem implies that

In cases where the contribution of random noise is additive and has a multivariate normal distribution, the problem of maximum likelihood sequence estimation can be reduced to that of a least squares minimization.

See also

Related Research Articles

The likelihood function is the joint probability of the observed data viewed as a function of the parameters of the chosen statistical model.

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it — with unobservable ("hidden") states. As part of the definition, HMM requires that there be an observable process whose outcomes are "influenced" by the outcomes of in a known way. Since cannot be observed directly, the goal is to learn about by observing HMM has an additional requirement that the outcome of at time must be "influenced" exclusively by the outcome of at and that the outcomes of and at must not affect the outcome of at

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.

In statistics, point estimation involves the use of sample data to calculate a single value which is to serve as a "best guess" or "best estimate" of an unknown population parameter. More formally, it is the application of a point estimator to the data to obtain a point estimate.

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

The posterior probability is a type of conditional probability that results from updating the prior probability with information summarized by the likelihood, through an application of Bayes' theorem. From an epistemological perspective, the posterior probability contains everything there is to know about an uncertain proposition, given prior knowledge and a mathematical model describing the observations available at a particular time. After the arrival of new information, the current posterior probability may serve as the prior in another round of Bayesian updating.

<span class="mw-page-title-main">Expectation–maximization algorithm</span> Iterative method for finding maximum likelihood estimates in statistical models

In statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step.

In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into account. For example, the prior could be the probability distribution representing the relative proportions of voters who will vote for a particular politician in a future election. The unknown quantity may be a parameter of the model or a latent variable rather than an observable variable.

<span class="mw-page-title-main">Independent component analysis</span>

In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that at most one subcomponent is Gaussian and that the subcomponents are statistically independent from each other. ICA is a special case of blind source separation. A common example application is the "cocktail party problem" of listening in on one person's speech in a noisy room.

In electrical engineering, statistical computing and bioinformatics, the Baum–Welch algorithm is a special case of the expectation–maximization algorithm used to find the unknown parameters of a hidden Markov model (HMM). It makes use of the forward-backward algorithm to compute the statistics for the expectation step.

Amplitude-shift keying (ASK) is a form of amplitude modulation that represents digital data as variations in the amplitude of a carrier wave. In an ASK system, a symbol, representing one or more bits, is sent by transmitting a fixed-amplitude carrier wave at a fixed frequency for a specific time duration. For example, if each symbol represents a single bit, then the carrier signal could be transmitted at nominal amplitude when the input value is 1, but transmitted at reduced amplitude or not at all when the input value is 0.

In signal processing, a matched filter is obtained by correlating a known delayed signal, or template, with an unknown signal to detect the presence of the template in the unknown signal. This is equivalent to convolving the unknown signal with a conjugated time-reversed version of the template. The matched filter is the optimal linear filter for maximizing the signal-to-noise ratio (SNR) in the presence of additive stochastic noise.

Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements. In estimation theory, two approaches are generally considered:

In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity, that equals the mode of the posterior distribution. The MAP can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to the method of maximum likelihood (ML) estimation, but employs an augmented optimization objective which incorporates a prior distribution over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of maximum likelihood estimation.

Estimation of a Rasch model is used to estimate the parameters of the Rasch model. Various techniques are employed to estimate the parameters from matrices of response data. The most common approaches are types of maximum likelihood estimation, such as joint and conditional maximum likelihood estimation. Joint maximum likelihood (JML) equations are efficient, but inconsistent for a finite number of items, whereas conditional maximum likelihood (CML) equations give consistent and unbiased item estimates. Person estimates are generally thought to have bias associated with them, although weighted likelihood estimation methods for the estimation of person parameters reduce the bias.

Isoline retrieval is a remote sensing inverse method that retrieves one or more isolines of a trace atmospheric constituent or variable. When used to validate another contour, it is the most accurate method possible for the task. When used to retrieve a whole field, it is a general, nonlinear inverse method and a robust estimator.

In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usually abbreviated as i.i.d., iid, or IID. IID was first defined in statistics and finds application in different fields such as data mining and signal processing.

Noise-Predictive Maximum-Likelihood (NPML) is a class of digital signal-processing methods suitable for magnetic data storage systems that operate at high linear recording densities. It is used for retrieval of data recorded on magnetic media.

Dynamic discrete choice (DDC) models, also known as discrete choice models of dynamic programming, model an agent's choices over discrete options that have future implications. Rather than assuming observed choices are the result of static utility maximization, observed choices in DDC models are assumed to result from an agent's maximization of the present value of utility, generalizing the utility theory upon which discrete choice models are based.

References

  1. G. Bosco, P. Poggiolini, and M. Visintin, "Performance Analysis of MLSE Receivers Based on the Square-Root Metric," J. Lightwave Technol. 26, 20982109 (2008)

Further reading