Isoline retrieval

Isoline retrieval is a remote sensing inverse method that retrieves one or more isolines of a trace atmospheric constituent or other atmospheric variable. When used to validate an advected contour, it is the most accurate method possible for the task. When used to retrieve a whole field, it is a general, nonlinear inverse method and a robust estimator.

For validating advected contours

Rationale

Suppose we have, as in contour advection, inferred knowledge of a single contour or isoline of an atmospheric constituent, q, and we wish to validate it against satellite remote-sensing data. Since satellite instruments cannot measure the constituent directly, we need to perform some sort of inversion. To validate the contour, it is not necessary to know the exact value of the constituent at any given point; we only need to know whether it falls inside or outside the contour, that is, whether it is greater than or less than the contour value, q0.

This is a classification problem. Let:

$$ j = \begin{cases} 1, & q < q_0 \\ 2, & q \ge q_0 \end{cases} $$

be the discretized variable. This will be related to the satellite measurement vector, y, by some conditional probability, P(j|y), which we approximate by collecting samples, called training data, of both the measurement vector, y, and the state variable, q. By generating classification results over the region of interest and using any contouring algorithm to separate the two classes, the isoline will have been "retrieved".

The accuracy of a retrieval will be given by integrating the conditional probability over the area of interest, A:

$$ a = \frac{1}{A} \int_A P\bigl(c(\vec r) \mid \vec y(\vec r)\bigr)\, dA $$

where c is the retrieved class at position r. We can maximize this quantity by maximizing the value of the integrand at each point:

$$ c(\vec r) = \arg\max_j P\bigl(j \mid \vec y(\vec r)\bigr) $$

Since this is the definition of maximum likelihood, a classification algorithm based on maximum likelihood is the most accurate method possible of validating an advected contour. A good method for performing maximum likelihood classification from a set of training data is variable kernel density estimation.
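A minimal sketch of this classification step in Python, using synthetic training data: class-conditional densities are estimated with a fixed-bandwidth Gaussian kernel density estimator (standing in for the variable-kernel estimator mentioned above), combined with class priors, and the class with the larger conditional probability is returned. All data and parameter values here are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
q0 = 5.0                                   # isoline value
q_train = rng.gamma(shape=2.0, scale=3.0, size=2000)        # state samples
y_train = np.column_stack([q_train + rng.normal(0.0, 1.0, q_train.size),
                           0.5 * q_train + rng.normal(0.0, 1.0, q_train.size)])
j_train = np.where(q_train < q0, 1, 2)     # discretized variable

# Fixed-bandwidth Gaussian KDE per class, standing in for the variable-kernel
# estimator; gaussian_kde expects data with shape (dimensions, samples).
kde = {j: gaussian_kde(y_train[j_train == j].T) for j in (1, 2)}
prior = {j: np.mean(j_train == j) for j in (1, 2)}

def classify(y):
    """Return (class, P(class | y)) for a single measurement vector y."""
    p = np.array([prior[j] * kde[j](y)[0] for j in (1, 2)])
    p /= p.sum()                           # conditional probabilities P(j | y)
    return (1, p[0]) if p[0] >= p[1] else (2, p[1])

print(classify(np.array([3.0, 1.5])))
```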

Training data

There are two methods of generating the training data. The most obvious is empirical: simply match measurements of the variable, q, with collocated measurements from the satellite instrument. In this case, no knowledge of the physics that produce the measurement is required and the retrieval algorithm is purely statistical. The second is with a forward model:

$$ \vec y = f(\vec x) $$

where x is the state vector and q = xk is a single component of it. An advantage of this method is that the state vectors need not reflect actual atmospheric configurations; they need only take on states that could reasonably occur in the real atmosphere. It also avoids the errors inherent in most collocation procedures, e.g. offset errors in the locations of the paired samples and differences in the footprint sizes of the two instruments. Since retrievals will be biased towards more common states, however, the statistics of the simulated states ought to reflect those of the real atmosphere.
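A sketch of the forward-model route to training data, assuming a hypothetical forward_model function in place of a real radiative transfer code such as RTTOV; the state statistics and weighting matrix are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward_model(x):
    """Hypothetical stand-in for a radiative transfer model, y = f(x).
    A crude linear map plus noise; a real application would call e.g. RTTOV."""
    H = np.array([[1.0, 0.3, 0.0],
                  [0.2, 0.8, 0.1]])        # assumed weighting functions
    return H @ x + rng.normal(0.0, 0.05, size=2)

# Sample state vectors with plausible statistics (a toy Gaussian climatology),
# run each one through the forward model, and keep q = x_k as the target.
k = 1                                       # index of the retrieved component
states = rng.multivariate_normal(mean=[250.0, 5.0, 0.3],
                                 cov=np.diag([25.0, 4.0, 0.01]),
                                 size=1000)
y_train = np.array([forward_model(x) for x in states])
q_train = states[:, k]                      # training values of the constituent
```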

Error characterization

The conditional probabilities, P(j|y), provide excellent error characterization, so the classification algorithm ought to return them. We define the confidence rating by rescaling the conditional probability of the retrieved class:

$$ C = \frac{n_c\, P(c \mid \vec y) - 1}{n_c - 1} $$

where nc is the number of classes (in this case, two). If C is zero, the classification is no better than chance, while if it is one, it should be perfect. To transform the confidence rating into a statistical tolerance, the following line integral can be applied to an isoline retrieval for which the true isoline is known:

$$ t(C) = \frac{1}{l} \int_0^l H\bigl(C - C_r(\vec r(s))\bigr)\, ds $$

where s is the path along the true isoline, l is its length, C_r is the retrieved confidence as a function of position and H is the Heaviside step function. While it appears that the integral must be evaluated separately for each value of the confidence rating, C, in fact it may be done for all values of C at once by sorting the retrieved confidence ratings along the isoline. The function t(C) relates a threshold value of the confidence rating to the tolerance for which it is applicable. That is, the set of points with confidence below the threshold defines a region that contains a fraction of the true isoline equal to the tolerance.
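A sketch in Python of this error characterization: the confidence rescaling, the sorting shortcut for the calibration function t(C), and reading off the confidence threshold for a 90 percent tolerance. The sampled confidences along the true isoline are synthetic.

```python
import numpy as np

def confidence(p_class, n_classes=2):
    """Rescaled conditional probability of the retrieved class:
    C = (n_c * P - 1) / (n_c - 1), so C = 0 is chance and C = 1 is certainty."""
    return (n_classes * p_class - 1.0) / (n_classes - 1.0)

def tolerance_from_confidence(conf_on_true_isoline):
    """Empirical calibration curve t(C): for each threshold C, the fraction of
    the true isoline whose retrieved confidence falls below C.  Sorting the
    sampled confidences gives the whole curve at once."""
    c_sorted = np.sort(conf_on_true_isoline)
    tolerance = np.arange(1, c_sorted.size + 1) / c_sorted.size
    return c_sorted, tolerance

# Usage: find the confidence threshold whose shaded region (all points with
# C below the threshold) should contain the true isoline 90 percent of the time.
conf_samples = confidence(np.random.default_rng(2).uniform(0.5, 1.0, 500))
c_sorted, tolerance = tolerance_from_confidence(conf_samples)
c_90 = np.interp(0.9, tolerance, c_sorted)
print(f"shade all points with C < {c_90:.2f} for a 90% tolerance region")
```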

Example: water vapour from AMSU

Figure: Statistical tolerance versus confidence rating for water-vapour isoline retrieval.

The Advanced Microwave Sounding Unit (AMSU) series of satellite instruments is designed to detect temperature and water vapour. It has a high horizontal resolution (as fine as 15 km) and, because the instruments are mounted on more than one satellite, full global coverage can be obtained in less than a day. Training data were generated using the second method above: European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-40 data were fed to a fast radiative transfer model called RTTOV. The function t(C) was generated from simulated retrievals and is shown in the figure above. It is used to set the 90 percent tolerance in the figure below by shading all confidence ratings less than 0.8; thus we expect the true isoline to fall within the shading 90 percent of the time.

Figure: Water vapour isoline retrieved from AMSU measurements and compared with ECMWF reanalysis.

For continuum retrievals

Figure: Specific humidity versus conditional probabilities from water-vapour isoline retrieval.

Isoline retrieval is also useful for retrieving a continuum variable, and it constitutes a general, nonlinear inverse method. It has an advantage over both neural networks and iterative methods that invert the forward model directly, such as optimal estimation: there is no possibility of getting stuck in a local minimum.

There are a number of methods of reconstituting the continuum variable from the discretized one. Once a sufficient number of contours has been retrieved, it is straightforward to interpolate between them. Alternatively, the conditional probabilities themselves make a good proxy for the continuum value.
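As a crude stand-in for interpolating between several retrieved isolines, the sketch below simply counts, at each pixel, how many retrieved isolines the field lies above and assigns the midpoint of the bracketing threshold values. The field, thresholds and class maps are synthetic; a real application would interpolate spatially between the retrieved contours.

```python
import numpy as np

rng = np.random.default_rng(3)
thresholds = np.array([2.0, 4.0, 6.0, 8.0])        # retrieved isoline values
truth = rng.uniform(0.0, 10.0, size=(50, 50))      # field the retrievals came from

# Pretend each map is the output of a separate isoline retrieval at one threshold:
# class 2 where the field exceeds the threshold, class 1 elsewhere.
class_maps = np.stack([np.where(truth >= t, 2, 1) for t in thresholds])

# Count how many isolines each pixel lies above and assign the midpoint of the
# bracketing threshold values (end bins just repeat the outermost thresholds).
n_above = (class_maps == 2).sum(axis=0)
edges = np.concatenate([[thresholds[0]], thresholds, [thresholds[-1]]])
q_rec = 0.5 * (edges[n_above] + edges[n_above + 1])

print(np.abs(q_rec - truth).mean())                # mean reconstruction error
```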

Consider the transformation from a continuum to a discrete variable:

$$ P(j = 2 \mid \vec y) = \int_{q_0}^{\infty} P(q \mid \vec y)\, dq $$

Suppose that P(q|y) is given by a Gaussian:

$$ P(q \mid \vec y) = \frac{1}{\sqrt{2\pi}\,\sigma_q} \exp\!\left[-\frac{(q - \bar q)^2}{2\sigma_q^2}\right] $$

where $\bar q$ is the expectation value and $\sigma_q$ is the standard deviation; then the conditional probability is related to the continuum variable, q, by the error function:

$$ P(j = 2 \mid \vec y) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{\bar q - q_0}{\sigma_q\sqrt{2}}\right)\right] $$

The figure shows conditional probability versus specific humidity for the example retrieval discussed above.
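A short sketch of this Gaussian/error-function relation, with an assumed isoline value and standard deviation: the class-2 conditional probability increases monotonically with the expected value of q, so it can be inverted with the inverse error function to recover a continuum estimate, which is the sense in which it serves as a proxy.

```python
import numpy as np
from scipy.special import erf, erfinv

q0 = 5.0        # isoline value (assumed)
sigma = 1.2     # assumed standard deviation of P(q | y)

def prob_class2(q_bar):
    """P(j = 2 | y): probability that q >= q0 when P(q | y) is Gaussian
    with mean q_bar and standard deviation sigma."""
    return 0.5 * (1.0 + erf((q_bar - q0) / (sigma * np.sqrt(2.0))))

def q_from_prob(p2):
    """Invert the relation to recover the expected q from P(j = 2 | y)."""
    return q0 + sigma * np.sqrt(2.0) * erfinv(2.0 * p2 - 1.0)

print(prob_class2(6.0))                  # about 0.80
print(q_from_prob(prob_class2(6.0)))     # recovers 6.0
```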

As a robust estimator

The location of the q0 isoline is found by setting the conditional probabilities of the two classes to be equal:

$$ P(j = 1 \mid \vec y) = P(j = 2 \mid \vec y) = \frac{1}{2} $$

or, equivalently,

$$ \int_{-\infty}^{q_0} P(q \mid \vec y)\, dq = \int_{q_0}^{\infty} P(q \mid \vec y)\, dq $$

In other words, equal amounts of the "zeroth-order moment" of P(q|y) lie on either side of q0: the isoline is placed where q0 is the conditional median. This type of formulation is characteristic of a robust estimator.
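A toy illustration along a one-dimensional transect, with an assumed conditional mean profile: the isoline is placed where the two class probabilities cross 1/2, which under the Gaussian model above is exactly where the conditional median equals q0, independent of the spread.

```python
import numpy as np
from scipy.special import erf

q0 = 5.0
sigma = 1.5
x = np.linspace(0.0, 10.0, 201)            # position along a transect
q_bar = 2.0 + 0.8 * x                      # assumed conditional mean of q

# Conditional probability of class 2 (q >= q0) at each position, as above.
p2 = 0.5 * (1.0 + erf((q_bar - q0) / (sigma * np.sqrt(2.0))))

# Place the isoline where the two class probabilities are equal (p2 = 1/2),
# i.e. where equal probability mass lies on either side of q0.
x_isoline = np.interp(0.5, p2, x)
print(x_isoline)                           # ~3.75, where q_bar equals q0
```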
