Elastic map

Last updated
Linear PCA versus nonlinear Principal Manifolds for visualization of breast cancer microarray data: a) Configuration of nodes and 2D Principal Surface in the 3D PCA linear manifold. The dataset is curved and can not be mapped adequately on a 2D principal plane; b) The distribution in the internal 2D non-linear principal surface coordinates (ELMap2D) together with an estimation of the density of points; c) The same as b), but for the linear 2D PCA manifold (PCA2D). The "basal" breast cancer subtype is visualized more adequately with ELMap2D and some features of the distribution become better resolved in comparison to PCA2D. Principal manifolds are produced by the elastic maps algorithm. Data are available for public competition. Software is available for free non-commercial use. Elmap breastcancer wiki.png
Linear PCA versus nonlinear Principal Manifolds for visualization of breast cancer microarray data: a) Configuration of nodes and 2D Principal Surface in the 3D PCA linear manifold. The dataset is curved and can not be mapped adequately on a 2D principal plane; b) The distribution in the internal 2D non-linear principal surface coordinates (ELMap2D) together with an estimation of the density of points; c) The same as b), but for the linear 2D PCA manifold (PCA2D). The “basal” breast cancer subtype is visualized more adequately with ELMap2D and some features of the distribution become better resolved in comparison to PCA2D. Principal manifolds are produced by the elastic maps algorithm. Data are available for public competition. Software is available for free non-commercial use.

Elastic maps provide a tool for nonlinear dimensionality reduction. By their construction, they are a system of elastic springs embedded in the data space. [1] This system approximates a low-dimensional manifold. The elastic coefficients of this system allow the switch from completely unstructured k-means clustering (zero elasticity) to the estimators located closely to linear PCA manifolds (for high bending and low stretching modules). With some intermediate values of the elasticity coefficients, this system effectively approximates non-linear principal manifolds. This approach is based on a mechanical analogy between principal manifolds, that are passing through "the middle" of the data distribution, and elastic membranes and plates. The method was developed by A.N. Gorban, A.Y. Zinovyev and A.A. Pitenko in 1996–1998.

Contents

Energy of elastic map

Let be a data set in a finite-dimensional Euclidean space. Elastic map is represented by a set of nodes in the same space. Each datapoint has a host node, namely the closest node (if there are several closest nodes then one takes the node with the smallest number). The data set is divided into classes .

The approximation energy D is the distortion

,

which is the energy of the springs with unit elasticity which connect each data point with its host node. It is possible to apply weighting factors to the terms of this sum, for example to reflect the standard deviation of the probability density function of any subset of data points .

On the set of nodes an additional structure is defined. Some pairs of nodes, , are connected by elastic edges. Call this set of pairs . Some triplets of nodes, , form bending ribs. Call this set of triplets .

The stretching energy is ,
The bending energy is ,

where and are the stretching and bending moduli respectively. The stretching energy is sometimes referred to as the membrane, while the bending energy is referred to as the thin plate term. [5]

For example, on the 2D rectangular grid the elastic edges are just vertical and horizontal edges (pairs of closest vertices) and the bending ribs are the vertical or horizontal triplets of consecutive (closest) vertices.

The total energy of the elastic map is thus

The position of the nodes is determined by the mechanical equilibrium of the elastic map, i.e. its location is such that it minimizes the total energy .

Expectation-maximization algorithm

For a given splitting of dataset in classes , minimization of the quadratic functional is a linear problem with the sparse matrix of coefficients. Therefore, similar to principal component analysis or k-means, a splitting method is used:

This expectation-maximization algorithm guarantees a local minimum of . For improving the approximation various additional methods are proposed. For example, the softening strategy is used. This strategy starts with a rigid grids (small length, small bending and large elasticity modules and coefficients) and finishes with soft grids (small and ). The training goes in several epochs, each epoch with its own grid rigidness. Another adaptive strategy is growing net: one starts from a small number of nodes and gradually adds new nodes. Each epoch goes with its own number of nodes.

Applications

Application of principal curves build by the elastic maps method: Nonlinear quality of life index. Points represent data of the UN 171 countries in 4-dimensional space formed by the values of 4 indicators: gross product per capita, life expectancy, infant mortality, tuberculosis incidence. Different forms and colors correspond to various geographical locations and years. Red bold line represents the principal curve, approximating the dataset. SlideQualityLife.png
Application of principal curves build by the elastic maps method: Nonlinear quality of life index. Points represent data of the UN 171 countries in 4-dimensional space formed by the values of 4 indicators: gross product per capita, life expectancy, infant mortality, tuberculosis incidence. Different forms and colors correspond to various geographical locations and years. Red bold line represents the principal curve, approximating the dataset.

Most important applications of the method and free software [3] are in bioinformatics [7] [8] for exploratory data analysis and visualisation of multidimensional data, for data visualisation in economics, social and political sciences, [9] as an auxiliary tool for data mapping in geographic informational systems and for visualisation of data of various nature.

The method is applied in quantitative biology for reconstructing the curved surface of a tree leaf from a stack of light microscopy images. [10] This reconstruction is used for quantifying the geodesic distances between trichomes and their patterning, which is a marker of the capability of a plant to resist to pathogenes.

Recently, the method is adapted as a support tool in the decision process underlying the selection, optimization, and management of financial portfolios. [11]

The method of elastic maps has been systematically tested and compared with several machine learning methods on the applied problem of identification of the flow regime of a gas-liquid flow in a pipe. [12] There are various regimes: Single phase water or air flow, Bubbly flow, Bubbly-slug flow, Slug flow, Slug-churn flow, Churn flow, Churn-annular flow, and Annular flow. The simplest and most common method used to identify the flow regime is visual observation. This approach is, however, subjective and unsuitable for relatively high gas and liquid flow rates. Therefore, the machine learning methods are proposed by many authors. The methods are applied to differential pressure data collected during a calibration process. The method of elastic maps provided a 2D map, where the area of each regime is represented. The comparison with some other machine learning methods is presented in Table 1 for various pipe diameters and pressure.

TABLE 1. Flow regime identification accuracy (%)
of different machine learning algorithms
CalibrationTestingLarger diameterHigher pressure
Elastic map10098.2100100
ANN99.189.276.270.5
SVM10088.561.770.5
SOM (small)94.994.283.688.6
SOM (large)10094.682.184.1

Here, ANN stands for the backpropagation artificial neural networks, SVM stands for the support vector machine, SOM for the self-organizing maps. The hybrid technology was developed for engineering applications. [13] In this technology, elastic maps are used in combination with Principal Component Analysis (PCA), Independent Component Analysis (ICA) and backpropagation ANN.

The textbook [14] provides a systematic comparison of elastic maps and self-organizing maps (SOMs) in applications to economic and financial decision-making.

Related Research Articles

Principal component analysis Conversion of a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components

Given a collection of points in two, three, or higher dimensional space, a "best fitting" line can be defined as one that minimizes the average squared perpendicular distance from a point to the line. The next best-fitting line can be similarly chosen from directions perpendicular to the first. Repeating this process yields an orthogonal basis in which different individual dimensions of the data are uncorrelated. These basis vectors are called Principal Components, and several related procedures Principal Component Analysis (PCA).

Self-organizing map type of artificial neural network

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional, discretized representation of the input space of the training samples, called a map, and is therefore a method to do dimensionality reduction. Self-organizing maps differ from other artificial neural networks as they apply competitive learning as opposed to error-correction learning, and in the sense that they use a neighborhood function to preserve the topological properties of the input space.

Soft tissue the tissues that connect, support, or surround other structures and organs of the body, not being hard tissue; tendons, ligaments, fascia, skin, fibrous tissues, fat, and synovial membranes (connective tissue), and muscles, nerves and blood vessels

Soft tissue is all the tissue in the body that is not hardened by the processes of ossification or calcification such as bones and teeth. Soft tissue connects, surrounds or supports internal organs and bones, and includes muscle, tendons, ligaments, fat, fibrous tissue, skin, lymph and blood vessels, fasciae, and synovial membranes. 

Nonlinear dimensionality reduction Summary of algorithms for nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

In mathematics, the GrassmannianGr(k, V) is a space that parameterizes all k-dimensional linear subspaces of the n-dimensional vector space V. For example, the Grassmannian Gr(1, V) is the space of lines through the origin in V, so it is the same as the projective space of one dimension lower than V.

Buckling the sudden change in shape of a structural component under load

In engineering, buckling is the sudden change in shape of a structural component under load such as the bowing of a column under compression or the wrinkling of a plate under shear. If a structure is subjected to a gradually increasing load, when the load reaches a critical level, a member may suddenly change shape and the structure and component is said to have buckled.

Fracture mechanics Field of mechanics concerned with the study of the propagation of cracks in materials

Fracture mechanics is the field of mechanics concerned with the study of the propagation of cracks in materials. It uses methods of analytical solid mechanics to calculate the driving force on a crack and those of experimental solid mechanics to characterize the material's resistance to fracture.

Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables as well as unknown parameters and latent variables, with various sorts of relationships among the three types of random variables, as might be described by a graphical model. As typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:

  1. To provide an analytical approximation to the posterior probability of the unobserved variables, in order to do statistical inference over these variables.
  2. To derive a lower bound for the marginal likelihood of the observed data. This is typically used for performing model selection, the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question was the one that generated the data.

In theoretical physics, there are many theories with supersymmetry (SUSY) which also have internal gauge symmetries. Supersymmetric gauge theory generalizes this notion.

Semi-supervised learning class of machine learning techniques

Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning and supervised learning.

Elastic energy is the mechanical potential energy stored in the configuration of a material or physical system as it is subjected to elastic deformation by work performed upon it. Elastic energy occurs when objects are impermanently compressed, stretched or generally deformed in any manner. Elasticity theory primarily develops formalisms for the mechanics of solid bodies and materials. The elastic potential energy equation is used in calculations of positions of mechanical equilibrium. The energy is potential as it will be converted into other forms of energy, such as kinetic energy and sound energy, when the object is allowed to return to its original shape (reformation) by its elasticity.

Hyperelastic material material for which the stress–strain relationship derives from a strain energy density function

A hyperelastic or Green elastic material is a type of constitutive model for ideally elastic material for which the stress–strain relationship derives from a strain energy density function. The hyperelastic material is a special case of a Cauchy elastic material.

Rubber elasticity refers to the remarkable property of crosslinked rubber: it can be stretched by up to a factor of 10 from its original length and, when released, returns very nearly to its original length. This can be repeated many times with no apparent degradation to the rubber. Rubber is a member of a larger class of materials call elastomers and it is difficult to overestimate their economic and technological importance. Elastomers have played a key role in the development of new technologies in the 20th century and make a substantial contribution to the global economy. Water-tight seals, latex gloves and tires are examples of elastomer-enabled materials that are now crucial to modern society. Rubber elasticity is a complex process. It is fundamentally different from the restoring force in a spring, which relies on the mechanical stiffness of the material. In rubber the elastic force is due to the thermal interactions of the molecules within the material. There are several molecular mechanisms that work together to produce the elastic force. Understanding the physics of elasticity requires a knowledge of advanced mathematics, chemistry and statistical physics, particularly the concept of entropy. Entropy may be thought of as a measure of the thermal energy that is stored in a molecule. Common rubbers, such as polybutadiene and polyisoprene, are produced by a process called polymerization. Very long molecules (polymers) are built up sequentially by adding short molecular backbone units through chemical reactions. The backbone unit for rubber is a short chain composed of four carbon atoms. The chemical bonds that connect the carbon atoms to one another do not form a straight line and usually four sequential carbon atoms do not even lie in a plane. The result is that a rubber polymer follows a random, zigzag path in three dimensions, intermingling with many other rubber molecules. The polymer chain may be regarded as a series of local kinks, each composed of a few backbone units. To produce an elastomer from the long rubber molecules, a molecular network must be created. This is accomplished by the addition of a few percent of a cross linking molecule such as sulfur. When heated, the crosslinking molecule causes a reaction that chemically joins (bonds) two of the rubber molecules together at some point. Because the rubber molecules are so long, each one participates in many crosslinks with many other rubber molecules. The polymer segments between two crosslinks are called network chains. The crosslinks transform all of the initially separate polymers into a single enormous molecule. A rubber band is a single molecule as is a latex glove! When a rubber band is stretched, some of the network chains are forced to become straight. Individual 4-carbon backbone units are forced into rotational conformations that have longer end-to-end distances and this restricts the number of conformations (states) that are thermally accessible. Fewer available conformational states means that a stretched backbone unit can store less thermal energy. Hence its entropy must decrease. It is this decrease in entropy that gives rise to the elastic force in the network chains.

<i>Iris</i> flower data set

The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the British statistician, eugenicist, and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis. It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. Two of the three species were collected in the Gaspé Peninsula "all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus". Fisher's paper was published in the journal, the Annals of Eugenics, creating controversy about the continued use of the Iris dataset for teaching statistical techniques today.

The energy release rate, , is the rate at which energy is transformed as a material undergoes fracture and has units of energy-per-unit-area. Mathematically, the energy release rate is expressed as the decrease in total potential energy per increase in fracture surface area. Various energy balances can be constructed relating the energy released during fracture to the energy of the resulting new surface, as well as other dissipative processes such as plasticity and heat generation. The energy release rate is central to the field of fracture mechanics when solving problems and estimating material properties related to fracture and fatigue.

The acoustoelastic effect is how the sound velocities of an elastic material change if subjected to an initial static stress field. This is a non-linear effect of the constitutive relation between mechanical stress and finite strain in a material of continuous mass. In classical linear elasticity theory small deformations of most elastic materials can be described by a linear relation between the applied stress and the resulting strain. This relationship is commonly known as the generalised Hooke's law. The linear elastic theory involves second order elastic constants and yields constant longitudinal and shear sound velocities in an elastic material, not affected by an applied stress. The acoustoelastic effect on the other hand include higher order expansion of the constitutive relation between the applied stress and resulting strain, which yields longitudinal and shear sound velocities dependent of the stress state of the material. In the limit of an unstressed material the sound velocities of the linear elastic theory are reproduced.

Resonant ultrasound spectroscopy (RUS) is a laboratory technique used in geology and material science to measure fundamental material properties involving elasticity. This technique relies on the fact that solid objects have natural frequencies at which they vibrate when mechanically excited. The natural frequency depends on the elasticity, size, and shape of the object—RUS exploits this property of solids to determine the elastic tensor of the material. The great advantage of this technique is that the entire elastic tensor is obtained from a single crystal sample in a single rapid measurement. At lower or more general frequencies, this method is known as acoustic resonance spectroscopy.

Aleksandr Gorban Russian mathematician

Alexander Nikolaevich Gorban is a scientist of Soviet origin, working in the United Kingdom. He is a professor at the University of Leicester, and director of its Mathematical Modeling Centre. Gorban has contributed to many areas of fundamental and applied science, including statistical physics, non-equilibrium thermodynamics, machine learning and mathematical biology.

Computational anatomy is an interdisciplinary field of biology focused on quantitative investigation and modelling of anatomical shapes variability. It involves the development and application of mathematical, statistical and data-analytical methods for modelling and simulation of biological structures.

In solid mechanics, the linear stability analysis of an elastic solution is studied using the method of incremental deformations superposed on finite deformations. The method of incremental deformation can be used to solve static, quasi-static and time-dependent problems. The governing equations of the motion are ones of the classical mechanics, such as the conservation of mass and the balance of linear and angular momentum, which provide the equilibrium configuration of the material. The main corresponding mathematical framework is described in the main Raymond Ogden's book Non-linear elastic deformations and in Biot's book Mechanics of incremental deformations, which is a collection of his main papers.

References

  1. 1 2 A. N. Gorban, A. Y. Zinovyev, Principal Graphs and Manifolds, In: Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques, Olivas E.S. et al. Eds. Information Science Reference, IGI Global: Hershey, PA, USA, 2009. 28–59.
  2. Wang, Y., Klijn, J.G., Zhang, Y., Sieuwerts, A.M., Look, M.P., Yang, F., Talantov, D., Timmermans, M., Meijer-van Gelder, M.E., Yu, J. et al.: Gene expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671–679 (2005); Data online
  3. 1 2 A. Zinovyev, ViDaExpert - Multidimensional Data Visualization Tool (free for non-commercial use). Institut Curie, Paris.
  4. A. Zinovyev, ViDaExpert overview, IHES (Institut des Hautes Études Scientifiques), Bures-Sur-Yvette, Île-de-France.
  5. Michael Kass, Andrew Witkin, Demetri Terzopoulos, Snakes: Active contour models, Int.J. Computer Vision, 1988 vol 1-4 pp.321-331
  6. A. N. Gorban, A. Zinovyev, Principal manifolds and graphs in practice: from molecular biology to dynamical systems, International Journal of Neural Systems, Vol. 20, No. 3 (2010) 219–232.
  7. A.N. Gorban, B. Kegl, D. Wunsch, A. Zinovyev (Eds.), Principal Manifolds for Data Visualisation and Dimension Reduction, LNCSE 58, Springer: Berlin Heidelberg New York, 2007. ISBN   978-3-540-73749-0
  8. M. Chacón, M. Lévano, H. Allende, H. Nowak, Detection of Gene Expressions in Microarrays by Applying Iteratively Elastic Neural Net, In: B. Beliczynski et al. (Eds.), Lecture Notes in Computer Sciences, Vol. 4432, Springer: Berlin Heidelberg 2007, 355–363.
  9. A. Zinovyev, Data visualization in political and social sciences, In: SAGE "International Encyclopedia of Political Science", Badie, B., Berg-Schlosser, D., Morlino, L. A. (Eds.), 2011.
  10. H. Failmezger, B. Jaegle, A. Schrader, M. Hülskamp, A. Tresch., Semi-automated 3D leaf reconstruction and analysis of trichome patterning from light microscopic images, PLoS Computational Biology, 2013, 9(4):e1003029.
  11. M. Resta, Portfolio optimization through elastic maps: Some evidence from the Italian stock exchange, Knowledge-Based Intelligent Information and Engineering Systems, B. Apolloni, R.J. Howlett and L. Jain (eds.), Lecture Notes in Computer Science, Vol. 4693, Springer: Berlin Heidelberg, 2010, 635-641.
  12. H. Shaban, S. Tavoularis, Identification of flow regime in vertical upward air–water pipe flow using differential pressure signals and elastic maps, International Journal of Multiphase Flow 61 (2014) 62-72.
  13. H. Shaban, S. Tavoularis, Measurement of gas and liquid flow rates in two-phase pipe flows by the application of machine learning techniques to differential pressure signals, International Journal of Multiphase Flow 67(2014), 106-117
  14. M. Resta, Computational Intelligence Paradigms in Economic and Financial Decision Making, Series Intelligent Systems Reference Library, Volume 99, Springer International Publishing, Switzerland 2016.