Isoline retrieval

Isoline retrieval is a remote sensing inverse method that retrieves one or more isolines of a trace atmospheric constituent or variable. When used to validate another contour, it is the most accurate method possible for the task. When used to retrieve a whole field, it is a general, nonlinear inverse method and a robust estimator.

For validating advected contours

Rationale

Suppose we have, as in contour advection, inferred knowledge of a single contour or isoline of an atmospheric constituent, q, and we wish to validate this against satellite remote-sensing data. Since satellite instruments cannot measure the constituent directly, we need to perform some sort of inversion. In order to validate the contour, it is not necessary to know, at any given point, the exact value of the constituent. We only need to know whether it falls inside or outside, that is, whether it is greater than or less than the value of the contour, q0.

This is a classification problem. Let:

    j = \begin{cases} 1, & q > q_0 \\ 0, & q \le q_0 \end{cases}

be the discretized variable. This will be related to the satellite measurement vector, \vec{y}, by some conditional probability, P(j | \vec{y}), which we approximate by collecting samples, called training data, of both the measurement vector and the state variable, q. By generating classification results over the region of interest and using any contouring algorithm to separate the two classes, the isoline will have been "retrieved."

The accuracy of a retrieval is given by integrating the conditional probability over the area of interest, A:

    a = \frac{1}{A} \int_A P(c | \vec{y}) \, dA

where c is the retrieved class at position, \vec{r}. We can maximize this quantity by maximizing the value of the integrand at each point:

    c = \arg\max_j P(j | \vec{y})

Since this is the definition of maximum likelihood, a classification algorithm based on maximum likelihood is the most accurate possible method of validating an advected contour. A good method of performing maximum-likelihood classification from a set of training data is variable kernel density estimation.
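
As an illustration, the following is a minimal sketch of such a classifier. The training arrays y_train (measurement vectors) and q_train (collocated constituent values) are hypothetical, and SciPy's fixed-bandwidth Gaussian kernel density estimate is used as a simple stand-in for variable kernel density estimation:

    # Minimal sketch of maximum-likelihood isoline classification.
    # y_train (n_samples x n_channels) and q_train (n_samples) are hypothetical
    # training data; q0 is the isoline value.  A fixed-bandwidth Gaussian KDE
    # stands in for variable kernel density estimation.
    import numpy as np
    from scipy.stats import gaussian_kde

    def train_isoline_classifier(y_train, q_train, q0):
        j_train = (q_train > q0).astype(int)                              # discretized variable j
        kdes = [gaussian_kde(y_train[j_train == j].T) for j in (0, 1)]    # p(y | j)
        priors = np.array([np.mean(j_train == j) for j in (0, 1)])        # P(j)
        return kdes, priors

    def classify(y, kdes, priors):
        """Return the maximum-likelihood class and the conditional probabilities P(j|y)."""
        like = np.array([prior * kde(y)[0] for kde, prior in zip(kdes, priors)])
        post = like / like.sum()
        return int(np.argmax(post)), post

Evaluating classify on the measurement vector at each grid point and running any contouring routine on the resulting classes then retrieves the isoline, as described above.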

Training data

There are two methods of generating the training data. The first, and most obvious, is empirical: simply match measurements of the variable, q, with collocated measurements from the satellite instrument. In this case, no knowledge of the actual physics that produce the measurement is required and the retrieval algorithm is purely statistical. The second is with a forward model:

    \vec{y} = \vec{f}(\vec{x})

where \vec{x} is the state vector and q = x_k is a single component. An advantage of this method is that the state vectors need not reflect actual atmospheric configurations; they need only take on a state that could reasonably occur in the real atmosphere. There are also none of the errors inherent in most collocation procedures, e.g. offset errors in the locations of the paired samples and differences in the footprint sizes of the two instruments. Since retrievals will be biased towards more common states, however, the statistics ought to reflect those in the real world.
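
A sketch of this second approach is given below; sample_state_vector and forward_model are hypothetical stand-ins for drawing state vectors with realistic statistics and for a fast radiative-transfer code such as RTTOV:

    # Sketch of forward-model training-data generation.  sample_state_vector and
    # forward_model are hypothetical stand-ins; k is the index of the state-vector
    # component holding the constituent of interest, so that q = x[k].
    import numpy as np

    def generate_training_data(sample_state_vector, forward_model, k, n_samples=10000):
        y_train, q_train = [], []
        for _ in range(n_samples):
            x = sample_state_vector()          # state vector with realistic statistics
            y_train.append(forward_model(x))   # simulated measurement vector, y = f(x)
            q_train.append(x[k])               # q is a single component of x
        return np.asarray(y_train), np.asarray(q_train)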

Error characterization

The conditional probabilities, P(j | \vec{y}), provide excellent error characterization; therefore, the classification algorithm ought to return them. We define the confidence rating by rescaling the conditional probability:

    C = \frac{n_c P(c | \vec{y}) - 1}{n_c - 1}

where n_c is the number of classes (in this case, two). If C is zero, then the classification is little better than chance, while if it is one, then it should be perfect. To transform the confidence rating to a statistical tolerance, the following line integral can be applied to an isoline retrieval for which the true isoline is known:

    t(C) = \frac{1}{l} \int_0^l H[C - C(s)] \, ds

where s is the position along the path, l is the length of the isoline, C(s) is the retrieved confidence as a function of position, and H is the Heaviside step function. While it appears that the integral must be evaluated separately for each value of the confidence rating, C, in fact it may be done for all values of C at once by sorting the confidence ratings of the results. The function, t(C), relates the tolerance to the threshold value of the confidence rating for which it applies. That is, it defines a region that contains a fraction of the true isoline equal to the tolerance.
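
A short sketch of this calibration, assuming conf_on_isoline is a hypothetical array of confidence ratings sampled at evenly spaced points along the known, true isoline:

    # Sketch of the confidence rating and its calibration to a statistical tolerance.
    # conf_on_isoline is a hypothetical array of confidence ratings sampled at
    # evenly spaced points along the true isoline.
    import numpy as np

    def confidence(post, nc=2):
        """Rescale the winning conditional probability P(c|y) to the confidence rating C."""
        return (nc * np.max(post) - 1.0) / (nc - 1.0)

    def tolerance_from_confidence(conf_on_isoline):
        """Empirical t(C): fraction of the true isoline with retrieved confidence <= C."""
        c_sorted = np.sort(conf_on_isoline)
        tol = np.arange(1, c_sorted.size + 1) / c_sorted.size
        return c_sorted, tol

    # e.g. threshold whose shaded region (confidence below it) holds 90 % of the true isoline:
    # c_sorted, tol = tolerance_from_confidence(conf_on_isoline)
    # c90 = np.interp(0.9, tol, c_sorted)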

Example: water vapour from AMSU

Figure: Statistical tolerance versus confidence rating for water-vapour isoline retrieval.

The Advanced Microwave Sounding Unit (AMSU) series of satellite instruments is designed to detect temperature and water vapour. The instruments have a high horizontal resolution (as little as 15 km), and because they are mounted on more than one satellite, full global coverage can be obtained in less than one day. Training data was generated using the second method described above, with European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-40 data fed to a fast radiative transfer model called RTTOV. The function, t(C), has been generated from simulated retrievals and is shown in the figure above. This is then used to set the 90 percent tolerance in the figure below by shading all the confidence ratings less than 0.8. Thus we expect the true isoline to fall within the shading 90 percent of the time.

Figure: Water vapour isoline retrieved from AMSU measurements and compared with ECMWF reanalysis.

For continuum retrievals

Figure: Specific humidity versus conditional probabilities from water-vapour isoline retrieval.

Isoline retrieval is also useful for retrieving a continuum variable and constitutes a general, nonlinear inverse method. It has an advantage over both neural networks and iterative methods, such as optimal estimation, that invert the forward model directly: there is no possibility of getting stuck in a local minimum.

There are a number of methods of reconstituting the continuum variable from the discretized one. Once a sufficient number of contours have been retrieved, it is straightforward to interpolate between them. Conditional probabilities make a good proxy for the continuum value.

Consider the transformation from a continuum to a discrete variable:

    P(j = 1 | \vec{y}) = \int_{q_0}^{\infty} P(q | \vec{y}) \, dq

Suppose that P(q | \vec{y}) is given by a Gaussian:

    P(q | \vec{y}) = \frac{1}{\sqrt{2 \pi} \, \sigma} \exp \left[ - \frac{(q - \bar{q})^2}{2 \sigma^2} \right]

where \bar{q} is the expectation value and \sigma is the standard deviation; then the conditional probability is related to the continuum variable, q, by the error function:

    P(j = 1 | \vec{y}) = \frac{1}{2} \left\{ 1 + \mathrm{erf} \left[ \frac{\bar{q} - q_0}{\sqrt{2} \, \sigma} \right] \right\}
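
Under this Gaussian assumption the relation can be inverted, so that a retrieved conditional probability maps directly onto an estimate of the continuum variable. A minimal sketch, assuming the spread sigma is known or estimated separately:

    # Sketch of using the conditional probability as a proxy for the continuum
    # variable under the Gaussian assumption above.  q0 is the isoline value and
    # sigma the assumed standard deviation; both are assumptions of this
    # illustration, not outputs of the classifier.
    import numpy as np
    from scipy.special import erfinv

    def q_from_probability(p1, q0, sigma, eps=1e-6):
        """Invert P(j=1|y) = 0.5*(1 + erf((qbar - q0)/(sqrt(2)*sigma))) for qbar."""
        p1 = np.clip(p1, eps, 1.0 - eps)      # keep erfinv finite at 0 and 1
        return q0 + np.sqrt(2.0) * sigma * erfinv(2.0 * p1 - 1.0)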

The figure shows conditional probability versus specific humidity for the example retrieval discussed above.

As a robust estimator

The location of q0 is found by setting the conditional probabilities of the two classes to be equal:

    P(j = 0 | \vec{y}) = P(j = 1 | \vec{y})

or, equivalently,

    \int_{-\infty}^{q_0} P(q | \vec{y}) \, dq = \int_{q_0}^{\infty} P(q | \vec{y}) \, dq = \frac{1}{2}

In other words, equal amounts of the "zeroth order moment" of P(q | \vec{y}) lie on either side of q0, placing q0 at the conditional median. This type of formulation is characteristic of a robust estimator.
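
As a simple illustration, along a hypothetical one-dimensional transect of retrieved conditional probabilities the isoline sits where P(j = 1 | \vec{y}) crosses one half:

    # Sketch: locate the isoline along a 1-D transect x of retrieved conditional
    # probabilities p1 = P(j=1|y); q0 lies where the two class probabilities are
    # equal, i.e. where p1 crosses 0.5.
    import numpy as np

    def isoline_crossings(x, p1):
        d = p1 - 0.5
        i = np.where(np.sign(d[:-1]) != np.sign(d[1:]))[0]                # sign changes
        return x[i] + (x[i + 1] - x[i]) * (-d[i]) / (d[i + 1] - d[i])     # linear interpolation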
