Group method of data handling

Last updated

Group method of data handling (GMDH) is a family of inductive algorithms for computer-based mathematical modeling of multi-parametric datasets that features fully automatic structural and parametric optimization of models.

Contents

GMDH is used in such fields as data mining, knowledge discovery, prediction, complex systems modeling, optimization and pattern recognition. [1] GMDH algorithms are characterized by inductive procedure that performs sorting-out of gradually complicated polynomial models and selecting the best solution by means of the external criterion. The last section of [2] contains a summary of the applications of GMDH in the 1970s.

Other names include "polynomial feedforward neural network", [3] or "self-organization of models". It was one of the first deep learning methods, used to train an eight-layer neural net in 1971. [4] [5]

Mathematical content

Polynomial regression

This section is based on. [2]

This is the general problem of statistical modelling of data: Consider a dataset , with points. Each point contains observations, and one target to predict. How to best predict the target based on the observations?

First, we split the full dataset into two parts: a training set and a validation set. The training set would be used to fit more and more model parameters, and the validation set would be used to decide which parameters to include, and when to stop fitting completely.

The GMDH starts by considering degree-2 polynomial in 2 variables. Suppose we want to predict the target using just the parts of the observation, and using only degree-2 polynomials, then the most we can do is this:

where the parameters are computed by linear regression. Now, the parameters depend on which we have chosen, and we do not know which we should choose, so we choose all of them. That is, we perform all such polynomial regressions:

obtaining polynomial models of the dataset.

We do not want to accept all the polynomial models, since it would contain too many models. To only select the best subset of these models, we run each model on the validation dataset, and select the models whose mean-square-error is below a threshold. We also write down the smallest mean-square-error achieved as .

Suppose that after this process, we have obtained a set of models. We now run the models on the training dataset, to obtain a sequence of transformed observations: . The same algorithm can now be run again.

Fig.1. A typical distribution of minimal errors. The process terminates when a minimum is reached. Combinatorial GMDH optimal complexity.png
Fig.1. A typical distribution of minimal errors. The process terminates when a minimum is reached.

The algorithm continues, giving us . As long as each is smaller than the previous one, the process continues, giving us increasingly deep models. As soon as some , the algorithm terminates. The last layer fitted (layer ) is discarded, as it has overfit the training set. The previous layers are outputted.

More sophisticated methods for deciding when to terminate are possible. For example, one might keep running the algorithm for several more steps, in the hope of passing a temporary rise in .

In general

Instead of a degree-2 polynomial in 2 variables, each unit may use higher-degree polynomials in more variables: [1]

And more generally, a GMDH model with multiple inputs and one output is a subset of components of the base function (1):

where fi are elementary functions dependent on different sets of inputs, ai are coefficients and m is the number of the base function components.

External criteria

External criteria are optimization objectives for the model, such as minimizing mean-squared error on the validation set, as given above. The most common criteria are:

Idea

Like linear regression, which fits a linear equation over data, GMDH fits arbitrarily high orders of polynomial equations over data. [6] [7]

To choose between models, two or more subsets of a data sample are used, similar to the train-validation-test split.

GMDH combined ideas from: [8] black box modeling, successive genetic selection of pairwise features, [9] the Gabor's principle of "freedom of decisions choice", [10] and the Beer's principle of external additions. [11]

Inspired by an analogy between constructing a model out of noisy data, and sending messages through a noisy channel, [12] they proposed "noise-immune modelling": [6] the higher the noise, the less parameters must the optimal model have, since the noisy channel does not allow more bits to be sent through.

The model is structured as a feedforward neural network, but without restrictions on the depth, they had a procedure for automatic models structure generation, which imitates the process of biological selection with pairwise genetic features.

History

GMDH author - Soviet scientist Prof. Alexey G. Ivakhnenko. Photo of Prof. Alexey G. Ivakhnenko.jpg
GMDH author – Soviet scientist Prof. Alexey G. Ivakhnenko.

The method was originated in 1968 by Prof. Alexey G. Ivakhnenko in the Institute of Cybernetics in Kyiv.

Period 1968–1971 is characterized by application of only regularity criterion for solving of the problems of identification, pattern recognition and short-term forecasting. As reference functions polynomials, logical nets, fuzzy Zadeh sets and Bayes probability formulas were used. Authors were stimulated by very high accuracy of forecasting with the new approach. Noise immunity was not investigated.

Period 1972–1975. The problem of modeling of noised data and incomplete information basis was solved. Multicriteria selection and utilization of additional priory information for noiseimmunity increasing were proposed. Best experiments showed that with extended definition of the optimal model by additional criterion noise level can be ten times more than signal. Then it was improved using Shannon's Theorem of General Communication theory.

Period 1976–1979. The convergence of multilayered GMDH algorithms was investigated. It was shown that some multilayered algorithms have "multilayerness error" – analogous to static error of control systems. In 1977 a solution of objective systems analysis problems by multilayered GMDH algorithms was proposed. It turned out that sorting-out by criteria ensemble finds the only optimal system of equations and therefore to show complex object elements, their main input and output variables.

Period 1980–1988. Many important theoretical results were received. It became clear that full physical models cannot be used for long-term forecasting. It was proved, that non-physical models of GMDH are more accurate for approximation and forecast than physical models of regression analysis. Two-level algorithms which use two different time scales for modeling were developed.

Since 1989 the new algorithms (AC, OCC, PF) for non-parametric modeling of fuzzy objects and SLP for expert systems were developed and investigated. [13] Present stage of GMDH development can be described as blossom out of deep learning neuronets and parallel inductive algorithms for multiprocessor computers. Such procedure is currently used in deep learning networks. [14]

GMDH-type neural networks

There are many different ways to choose an order for partial models consideration. The very first consideration order used in GMDH and originally called multilayered inductive procedure is the most popular one. It is a sorting-out of gradually complicated models generated from base function. The best model is indicated by the minimum of the external criterion characteristic. Multilayered procedure is equivalent to the Artificial Neural Network with polynomial activation function of neurons. Therefore, the algorithm with such an approach usually referred as GMDH-type Neural Network or Polynomial Neural Network. Li showed that GMDH-type neural network performed better than the classical forecasting algorithms such as Single Exponential Smooth, Double Exponential Smooth, ARIMA and back-propagation neural network. [15]

Combinatorial GMDH

Another important approach to partial models consideration that becomes more and more popular is a combinatorial search that is either limited or full. This approach has some advantages against Polynomial Neural Networks, but requires considerable computational power and thus is not effective for objects with a large number of inputs. An important achievement of Combinatorial GMDH is that it fully outperforms linear regression approach if noise level in the input data is greater than zero. It guarantees that the most optimal model will be founded during exhaustive sorting.

Basic Combinatorial algorithm makes the following steps:

In contrast to GMDH-type neural networks Combinatorial algorithm usually does not stop at the certain level of complexity because a point of increase of criterion value can be simply a local minimum, see Fig.1.

Algorithms

Software implementations

Related Research Articles

In mathematics and computer science, Horner's method is an algorithm for polynomial evaluation. Although named after William George Horner, this method is much older, as it has been attributed to Joseph-Louis Lagrange by Horner himself, and can be traced back many hundreds of years to Chinese and Persian mathematicians. After the introduction of computers, this algorithm became fundamental for computing efficiently with polynomials.

<span class="mw-page-title-main">Supervised learning</span> A paradigm in machine learning

Supervised learning (SL) is a paradigm in machine learning where input objects and a desired output value train a model. The training data is processed, building a function that maps new data on expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured through the so-called generalization error.

<span class="mw-page-title-main">Perceptron</span> Algorithm for supervised learning of binary classifiers

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

In numerical analysis, polynomial interpolation is the interpolation of a given bivariate data set by the polynomial of lowest possible degree that passes through the points of the dataset.

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). It is one of several forms of causal notation. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

Forecasting is the process of making predictions based on past and present data. Later these can be compared (resolved) against what happens. For example, a company might estimate their revenue in the next year, then compare it against the actual results creating a variance actual analysis. Prediction is a similar but more general term. Forecasting might refer to specific formal statistical methods employing time series, cross-sectional or longitudinal data, or alternatively to less formal judgmental methods or the process of prediction and resolution itself. Usage can vary between areas of application: for example, in hydrology the terms "forecast" and "forecasting" are sometimes reserved for estimates of values at certain specific future times, while the term "prediction" is used for more general estimates, such as the number of times floods will occur over a long period.

In mathematics, and more specifically in computer algebra, computational algebraic geometry, and computational commutative algebra, a Gröbner basis is a particular kind of generating set of an ideal in a polynomial ring K[x1, ..., xn] over a field K. A Gröbner basis allows many important properties of the ideal and the associated algebraic variety to be deduced easily, such as the dimension and the number of zeros when it is finite. Gröbner basis computation is one of the main practical tools for solving systems of polynomial equations and computing the images of algebraic varieties under projections or rational maps.

<span class="mw-page-title-main">Regression analysis</span> Set of statistical processes for estimating the relationships among variables

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables. The most common form of regression analysis is linear regression, in which one finds the line that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line that minimizes the sum of squared differences between the true data and that line. For specific mathematical reasons, this allows the researcher to estimate the conditional expectation of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters or estimate the conditional expectation across a broader collection of non-linear models.

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately. Early versions of MTL were called "hints".

<span class="mw-page-title-main">Feature selection</span> Procedure in machine learning and statistics

Feature selection is the process of selecting a subset of relevant features for use in model construction. Stylometry and DNA microarray analysis are two cases where feature selection is used. It should be distinguished from feature extraction.

<span class="mw-page-title-main">Granger causality</span> Statistical hypothesis test for forecasting

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of "true causality" is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only "predictive causality". Using the term "causality" alone is a misnomer, as Granger-causality is better described as "precedence", or, as Granger himself later claimed in 1977, "temporally related". Rather than testing whether Xcauses Y, the Granger causality tests whether X forecastsY.

<span class="mw-page-title-main">Feedforward neural network</span> One of two broad types of artificial neural network

A feedforward neural network (FNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. Its flow is uni-directional, meaning that the information in the model flows in only one direction—forward—from the input nodes, through the hidden nodes and to the output nodes, without any cycles or loops, in contrast to recurrent neural networks, which have a bi-directional flow. Modern feedforward networks are trained using the backpropagation method and are colloquially referred to as the "vanilla" neural networks.

<span class="mw-page-title-main">Quantum neural network</span> Quantum Mechanics in Neural Networks

Quantum neural networks are computational neural network models which are based on the principles of quantum mechanics. The first ideas on quantum neural computation were published independently in 1995 by Subhash Kak and Ron Chrisley, engaging with the theory of quantum mind, which posits that quantum effects play a role in cognitive function. However, typical research in quantum neural networks involves combining classical artificial neural network models with the advantages of quantum information in order to develop more efficient algorithms. One important motivation for these investigations is the difficulty to train classical neural networks, especially in big data applications. The hope is that features of quantum computing such as quantum parallelism or the effects of interference and entanglement can be used as resources. Since the technological implementation of a quantum computer is still in a premature stage, such quantum neural network models are mostly theoretical proposals that await their full implementation in physical experiments.

In mathematical optimization, the ellipsoid method is an iterative method for minimizing convex functions. When specialized to solving feasible linear optimization problems with rational data, the ellipsoid method is an algorithm which finds an optimal solution in a number of steps that is polynomial in the input size.

<span class="mw-page-title-main">Alexey Ivakhnenko</span> Soviet–Ukrainian mathematician and computer scientist

Alexey Grigoryevich Ivakhnenko was a Soviet and Ukrainian mathematician most famous for developing the group method of data handling (GMDH), a method of inductive statistical learning, for which he is sometimes referred to as the "Father of deep learning".

<span class="mw-page-title-main">Online machine learning</span> Method of machine learning

In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique used in areas of machine learning where it is computationally infeasible to train over the entire dataset, requiring the need of out-of-core algorithms. It is also used in situations where it is necessary for the algorithm to dynamically adapt to new patterns in the data, or when the data itself is generated as a function of time, e.g., stock price prediction. Online learning algorithms may be prone to catastrophic interference, a problem that can be addressed by incremental learning approaches.

Least-squares support-vector machines (LS-SVM) for statistics and in statistical modeling, are least-squares versions of support-vector machines (SVM), which are a set of related supervised learning methods that analyze data and recognize patterns, and which are used for classification and regression analysis. In this version one finds the solution by solving a set of linear equations instead of a convex quadratic programming (QP) problem for classical SVMs. Least-squares SVM classifiers were proposed by Johan Suykens and Joos Vandewalle. LS-SVMs are a class of kernel-based learning methods.

System identification is a method of identifying or measuring the mathematical model of a system from measurements of the system inputs and outputs. The applications of system identification include any system where the inputs and outputs can be measured and include industrial processes, control systems, economic data, biology and the life sciences, medicine, social systems and many more.

<span class="mw-page-title-main">Bias–variance tradeoff</span> Property of a model

In statistics and machine learning, the bias–variance tradeoff describes the relationship between a model's complexity, the accuracy of its predictions, and how well it can make predictions on previously unseen data that were not used to train the model. In general, as we increase the number of tunable parameters in a model, it becomes more flexible, and can better fit a training data set. It is said to have lower error, or bias. However, for more flexible models, there will tend to be greater variance to the model fit each time we take a set of samples to create a new training data set. It is said that there is greater variance in the model's estimated parameters.

<span class="mw-page-title-main">Multiple kernel learning</span> Set of machine learning methods

Multiple kernel learning refers to a set of machine learning methods that use a predefined set of kernels and learn an optimal linear or non-linear combination of kernels as part of the algorithm. Reasons to use multiple kernel learning include a) the ability to select for an optimal kernel and parameters from a larger set of kernels, reducing bias due to kernel selection while allowing for more automated machine learning methods, and b) combining data from different sources that have different notions of similarity and thus require different kernels. Instead of creating a new kernel, multiple kernel algorithms can be used to combine kernels already established for each individual data source.

References

  1. 1 2 3 Madala, H.R.; Ivakhnenko, O.G. (1994). Inductive Learning Algorithms for Complex Systems Modeling. Boca Raton: CRC Press. ISBN   978-0849344381. Archived from the original on 2017-12-31. Retrieved 2019-11-17.
  2. 1 2 Farlow, Stanley J. (November 1981). "The GMDH Algorithm of Ivakhnenko". The American Statistician. 35 (4): 210–215. doi:10.1080/00031305.1981.10479358. ISSN   0003-1305.
  3. Nikolaev, N.Y.; Iba, H. (March 2003). "Learning polynomial feedforward neural networks by genetic programming and backpropagation". IEEE Transactions on Neural Networks. 14 (2): 337–350. doi:10.1109/TNN.2003.809405. ISSN   1045-9227.
  4. Ivakhnenko, Alexey (1971). "Polynomial theory of complex systems" (PDF). IEEE Transactions on Systems, Man, and Cybernetics. SMC-1 (4): 364–378. doi:10.1109/TSMC.1971.4308320.
  5. Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv: 1404.7828 . doi:10.1016/j.neunet.2014.09.003. PMID   25462637. S2CID   11715509.
  6. 1 2 Ivakhnenko, O.G.; Stepashko, V.S. (1985). Pomekhoustojchivost' Modelirovanija (Noise Immunity of Modeling) (PDF). Kyiv: Naukova Dumka. Archived from the original (PDF) on 2017-12-31. Retrieved 2019-11-18.
  7. Ivakhnenko, O.G.; Lapa, V.G. (1967). Cybernetics and Forecasting Techniques (Modern Analytic and Computational Methods in Science and Mathematics, v.8 ed.). American Elsevier.
  8. Ivakhenko, A.G.; Savchenko, E.A..; Ivakhenko, G.A. (October 2003). "Problems of future GMDH algorithms development". Systems Analysis Modelling Simulation. 43 (10): 1301–1309. doi:10.1080/0232929032000115029. ISSN   0232-9298.
  9. Ivakhnenko, Aleksei G., and Grigorii A. Ivakhnenko. "Problems of further development of the group method of data handling algorithms. Part I." Pattern Recognition and Image Analysis c/c of raspoznavaniye obrazov i analiz izobrazhenii 10.2 (2000): 187-194.
  10. Gabor, D. (1971). Perspectives of Planing. Organization of Economic Cooperation and Development. London: Imp.Coll.
  11. Beer, S. (1959). Cybernetics and Management. London: English Univ. Press.
  12. Ivahnenko, O.G. (1982). Inductive Method of Models Self-organisation for Complex Systems (PDF). Kyiv: Naukova Dumka. Archived from the original (PDF) on 2017-12-31. Retrieved 2019-11-18.
  13. Ivakhnenko, O.G.; Ivakhnenko, G.A. (1995). "The Review of Problems Solvable by Algorithms of the Group Method of Data Handling (GMDH)" (PDF). Pattern Recognition and Image Analysis. 5 (4): 527–535. CiteSeerX   10.1.1.19.2971 .
  14. Takao, S.; Kondo, S.; Ueno, J.; Kondo, T. (2017). "Deep feedback GMDH-type neural network and its application to medical image analysis of MRI brain images". Artificial Life and Robotics. 23 (2): 161–172. doi:10.1007/s10015-017-0410-1. S2CID   44190434.
  15. Li, Rita Yi Man; Fong, Simon; Chong, Kyle Weng Sang (2017). "Forecasting the REITs and stock indices: Group Method of Data Handling Neural Network approach". Pacific Rim Property Research Journal. 23 (2): 123–160. doi:10.1080/14445921.2016.1225149. S2CID   157150897.

Further reading