Nonlinear system identification


System identification is a method of identifying or measuring the mathematical model of a system from measurements of the system inputs and outputs. It applies to any system whose inputs and outputs can be measured, including industrial processes, control systems, economic data, biology and the life sciences, medicine, social systems and many more.


A nonlinear system is defined as any system that is not linear, that is, any system that does not satisfy the superposition principle. This negative definition tends to obscure the fact that there are very many different types of nonlinear systems. Historically, system identification for nonlinear systems [1] [2] has developed by focusing on specific classes of system and can be broadly categorized into five basic approaches, each defined by a model class:

  1. Volterra series models,
  2. Block-structured models,
  3. Neural network models,
  4. NARMAX models, and
  5. State-space models.
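
The superposition principle used in the definition above can be probed numerically. The following is a minimal sketch, not a proof of linearity: it tests a single pair of inputs and one pair of scaling factors, and the two example systems (`linear_fir` and `squarer`) are hypothetical illustrations that do not come from the article.

```python
def satisfies_superposition(system, u1, u2, a=2.0, b=-3.0, tol=1e-9):
    """Check whether system(a*u1 + b*u2) equals a*system(u1) + b*system(u2)."""
    combined = [a * x + b * y for x, y in zip(u1, u2)]
    lhs = system(combined)
    y1, y2 = system(u1), system(u2)
    rhs = [a * p + b * q for p, q in zip(y1, y2)]
    return all(abs(l - r) <= tol for l, r in zip(lhs, rhs))


def linear_fir(u):
    """Hypothetical linear system: y(k) = 0.5 u(k) + 0.25 u(k-1)."""
    return [0.5 * u[k] + (0.25 * u[k - 1] if k > 0 else 0.0)
            for k in range(len(u))]


def squarer(u):
    """Hypothetical static nonlinear system: y(k) = u(k)^2."""
    return [x * x for x in u]


u1 = [1.0, 2.0, 3.0]
u2 = [0.5, -1.0, 4.0]
print(satisfies_superposition(linear_fir, u1, u2))
print(satisfies_superposition(squarer, u1, u2))
```

A failed check is enough to show that a system is nonlinear, whereas a passed check for one input pair only fails to rule out linearity.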

System identification follows four steps: data gathering, model postulation, parameter identification, and model validation. Data gathering is the first and essential step, since the collected data are the input to the model that is built later. It consists of selecting an appropriate data set, pre-processing, and processing. In flight vehicle identification, for example, it involves implementing known algorithms together with the transcription of flight tapes, data storage and data management, calibration, processing, analysis, and presentation. Model validation is necessary to gain confidence in, or reject, a particular model; together with parameter estimation it is an integral part of system identification. Validation refers to the process of confirming the conceptual model and demonstrating an adequate correspondence between the computational results of the model and the actual data. [3]
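
The four steps can be illustrated on a deliberately tiny problem. The sketch below assumes a postulated model y(k) = theta * u(k)^2, which is nonlinear in the input but linear in the parameter, so the least-squares estimate has a closed form. The model, the function names, and the data values are all hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def estimate_theta(u, y):
    """Least-squares estimate for the postulated model y = theta * u^2."""
    num = sum((a * a) * b for a, b in zip(u, y))
    den = sum((a * a) ** 2 for a in u)
    return num / den


def validation_mse(theta, u, y):
    """Mean squared prediction error on data not used for estimation."""
    return sum((b - theta * a * a) ** 2 for a, b in zip(u, y)) / len(u)


# Step 1, data gathering: made-up estimation and validation records.
u_est, y_est = [1.0, 2.0, 3.0], [2.0, 8.0, 18.0]
u_val, y_val = [4.0], [32.0]

# Steps 2 and 3, model postulate and parameter identification.
theta_hat = estimate_theta(u_est, y_est)

# Step 4, model validation on held-out data.
print(theta_hat, validation_mse(theta_hat, u_val, y_val))
```

Keeping the validation record separate from the estimation record is the design choice that matters here: a small error on the estimation data alone says little about whether the model should be accepted.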

Volterra series methods

The early work was dominated by methods based on the Volterra series, which in the discrete time case can be expressed as

$$y(k) = h_0 + \sum_{m_1=1}^{M} h_1(m_1)\,u(k-m_1) + \sum_{m_1=1}^{M}\sum_{m_2=1}^{M} h_2(m_1,m_2)\,u(k-m_1)\,u(k-m_2) + \sum_{m_1=1}^{M}\sum_{m_2=1}^{M}\sum_{m_3=1}^{M} h_3(m_1,m_2,m_3)\,u(k-m_1)\,u(k-m_2)\,u(k-m_3) + \cdots$$

where u(k), y(k); k = 1, 2, 3, ... are the measured input and output respectively and $h_l(m_1, \ldots, m_l)$ is the lth-order Volterra kernel, or lth-order nonlinear impulse response. The Volterra series is an extension of the linear convolution integral. Most of the earlier identification algorithms assumed that just the first two, linear and quadratic, Volterra kernels are present and used special inputs such as Gaussian white noise and correlation methods to identify the two Volterra kernels. In most of these methods the input has to be Gaussian and white, which is a severe restriction for many real processes. These results were later extended to include the first three Volterra kernels, to allow different inputs, and other related developments including the Wiener series. A very important body of work was developed by Wiener, Lee, Bose and colleagues at MIT from the 1940s to the 1960s, including the famous Lee and Schetzen method. [4] [5] While these methods are still actively studied today, there are several basic restrictions. These include the necessity of knowing the number of Volterra series terms a priori, the use of special inputs, and the large number of estimates that have to be identified. For example, for a system where the first-order Volterra kernel is described by, say, 30 samples, 30 × 30 points will be required for the second-order kernel, 30 × 30 × 30 for the third-order kernel, and so on, and hence the amount of data required to provide good estimates becomes excessively large. [6] These numbers can be reduced by exploiting certain symmetries, but the requirements are still excessive irrespective of what algorithm is used for the identification.
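
The growth in the number of kernel values described above can be made concrete. The sketch below evaluates a Volterra series truncated at second order and counts kernel coefficients with and without symmetry. The truncation order, the zero initial conditions, and the function names are assumptions made for illustration; the symmetric count uses the standard number of multisets of size `order` drawn from `M` lags.

```python
from math import comb


def volterra_output(u, h0, h1, h2):
    """Second-order truncated Volterra series with lags m = 1..M."""
    M = len(h1)
    y = []
    for k in range(len(u)):
        # u(k-m) for m = 1..M, taken as zero before the record starts
        past = [u[k - m] if k - m >= 0 else 0.0 for m in range(1, M + 1)]
        linear = sum(h1[i] * past[i] for i in range(M))
        quadratic = sum(h2[i][j] * past[i] * past[j]
                        for i in range(M) for j in range(M))
        y.append(h0 + linear + quadratic)
    return y


def kernel_sizes(M, order):
    """Return (full, symmetric) coefficient counts for one kernel."""
    full = M ** order
    symmetric = comb(M + order - 1, order)
    return full, symmetric


print(volterra_output([1.0, 2.0, 3.0], 0.0, [1.0], [[0.5]]))
for order in (1, 2, 3):
    print(order, kernel_sizes(30, order))
```

Even with symmetry, a 30-sample memory needs 465 distinct second-order values and 4960 third-order values, which is why the data requirements remain heavy.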

Block-structured systems

Because of the problems of identifying Volterra models, other model forms were investigated as a basis for system identification for nonlinear systems. Various forms of block-structured nonlinear models have been introduced or re-introduced. [6] [7] The Hammerstein model consists of a static single-valued nonlinear element followed by a linear dynamic element. [8] The Wiener model is the reverse of this combination, so that the linear element occurs before the static nonlinear characteristic. [9] The Wiener-Hammerstein model consists of a static nonlinear element sandwiched between two dynamic linear elements, and several other model forms are available. The Hammerstein-Wiener model consists of a linear dynamic block sandwiched between two static nonlinear blocks. [10] The Urysohn model [11] [12] differs from the other block models: it does not consist of a sequence of linear and nonlinear blocks, but describes both dynamic and static nonlinearities in the expression of the kernel of an operator. [13] All these models can be represented by a Volterra series, but the Volterra kernels take on a special form in each case. Identification methods are either correlation based or parameter estimation based. The correlation methods exploit certain properties of these systems, which means that if specific inputs are used, often white Gaussian noise, the individual elements can be identified one at a time. This results in manageable data requirements, and the individual blocks can sometimes be related to components in the system under study.
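
The difference between the Hammerstein and Wiener structures is only the order of the two blocks, which a short simulation makes visible. The sketch below assumes a first-order linear block and a squaring nonlinearity; both choices, and all names, are hypothetical and serve only to show that swapping the blocks changes the output.

```python
def linear_block(u, a=0.5, b=1.0):
    """Hypothetical linear dynamic element: y(k) = a*y(k-1) + b*u(k)."""
    y, prev = [], 0.0
    for x in u:
        prev = a * prev + b * x
        y.append(prev)
    return y


def static_nonlinearity(u):
    """Hypothetical static single-valued element: v = u^2."""
    return [x * x for x in u]


def hammerstein(u):
    """Static nonlinearity followed by linear dynamics."""
    return linear_block(static_nonlinearity(u))


def wiener(u):
    """Linear dynamics followed by static nonlinearity."""
    return static_nonlinearity(linear_block(u))


u = [1.0, 2.0]
print(hammerstein(u))
print(wiener(u))
```

Composing the same two functions in the patterns linear, nonlinear, linear or nonlinear, linear, nonlinear would give the Wiener-Hammerstein and Hammerstein-Wiener forms respectively.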

More recent results are based on parameter estimation and neural network based solutions. Many results have been introduced and these systems continue to be studied in depth. One problem is that these methods are only applicable to a very special form of model in each case and usually this model form has to be known prior to identification.

Neural networks

Artificial neural networks loosely imitate the network of neurons in the brain, where computation takes place through a large number of simple processing elements. A typical neural network consists of a number of simple processing units interconnected to form a complex network. Layers of such units are arranged so that data is entered at the input layer and passes through either one or several intermediate layers before reaching the output layer. In supervised learning the network is trained by operating on the difference between the actual output and the desired output of the network, the prediction error, to change the connection strengths between the nodes. By iterating, the weights are modified until the output error reaches an acceptable level. This process is called machine learning because the network adjusts the weights so that the output pattern is reproduced. Neural networks have been extensively studied and there are many excellent textbooks devoted to this topic in general, [1] [14] and more focused textbooks which emphasise control and systems applications. [1] [15] There are two main problem types that can be studied using neural networks: static problems and dynamic problems. Static problems include pattern recognition, classification, and approximation. Dynamic problems involve lagged variables and are more appropriate for system identification and related applications. Depending on the architecture of the network, the training problem can be either nonlinear-in-the-parameters, which involves optimisation, or linear-in-the-parameters, which can be solved using classical approaches. The training algorithms can be categorised into supervised, unsupervised, or reinforcement learning. Neural networks have excellent approximation properties, but these are usually based on standard function approximation results using, for example, the Weierstrass theorem, which applies equally well to polynomials, rational functions, and other well-known models.

Neural networks have been applied extensively to system identification problems which involve nonlinear and dynamic relationships. However, classical neural networks are purely static approximating machines. There are no dynamics within the network. Hence when fitting dynamic models all the dynamics arise by allocating lagged inputs and outputs to the input layer of the network. The training procedure then produces the best static approximation that relates the lagged variables assigned to the input nodes to the output. There are more complex network architectures, including recurrent networks, [1] that produce dynamics by introducing increasing orders of lagged variables to the input nodes. But in these cases it is very easy to over-specify the lags, and this can lead to overfitting and poor generalisation properties. Neural networks have several advantages: they are conceptually simple, easy to train and to use, and have excellent approximation properties, and their local and parallel processing provides integrity and fault-tolerant behaviour. The biggest criticism of the classical neural network models is that the models produced are completely opaque and usually cannot be written down or analysed. It is therefore very difficult to know what is causing what, to analyse the model, or to compute dynamic characteristics from the model. Some of these points will not be relevant to all applications, but they are for dynamic modelling.
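
Allocating lagged inputs and outputs to the input layer amounts to building a regression data set in which each row holds past values and the target is the current output. A minimal sketch of that step follows; the function name and the lag arguments `n_y` and `n_u` are assumptions, and any static approximator (a network or otherwise) could then be trained on the resulting rows.

```python
def build_lagged_dataset(u, y, n_y, n_u):
    """Turn input/output records into (regressor row, target) pairs.

    Each row is [y(k-1), ..., y(k-n_y), u(k-1), ..., u(k-n_u)]
    and the matching target is y(k).
    """
    start = max(n_y, n_u)  # first k for which all lags exist
    rows, targets = [], []
    for k in range(start, len(y)):
        past_y = [y[k - i] for i in range(1, n_y + 1)]
        past_u = [u[k - i] for i in range(1, n_u + 1)]
        rows.append(past_y + past_u)
        targets.append(y[k])
    return rows, targets


u = [1, 2, 3, 4]
y = [10, 20, 30, 40]
rows, targets = build_lagged_dataset(u, y, n_y=2, n_u=1)
print(rows)
print(targets)
```

The number of columns grows with the chosen lags, so over-specifying `n_y` and `n_u` directly inflates the input layer, which is one route to the overfitting mentioned above.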

NARMAX methods

The nonlinear autoregressive moving average model with exogenous inputs (NARMAX model) can represent a wide class of nonlinear systems, [2] and is defined as

$$y(k) = F[y(k-1), y(k-2), \ldots, y(k-n_y), u(k-d), u(k-d-1), \ldots, u(k-d-n_u), e(k-1), e(k-2), \ldots, e(k-n_e)] + e(k)$$

where y(k), u(k) and e(k) are the system output, input, and noise sequences respectively; $n_y$, $n_u$, and $n_e$ are the maximum lags for the system output, input and noise; F[•] is some nonlinear function; and d is a time delay typically set to d = 1. The model is essentially an expansion of past inputs, outputs and noise terms. Because the noise is modelled explicitly, unbiased estimates of the system model can be obtained in the presence of unobserved highly correlated and nonlinear noise. The Volterra, the block structured models and many neural network architectures can all be considered as subsets of the NARMAX model. Since NARMAX was introduced, by proving what class of nonlinear systems can be represented by this model, many results and algorithms have been derived based around this description. Most of the early work was based on polynomial expansions of the NARMAX model. These are still the most popular methods today, but other more complex forms based on wavelets and other expansions have been introduced to represent severely nonlinear and highly complex nonlinear systems. A significant proportion of nonlinear systems can be represented by a NARMAX model, including systems with exotic behaviours such as chaos, bifurcations, and subharmonics. While NARMAX started as the name of a model, it has now developed into a philosophy of nonlinear system identification. [2] The NARMAX approach consists of several steps:

  1. Structure detection: which terms are in the model
  2. Parameter estimation: determine the model coefficients
  3. Model validation: is the model unbiased and correct
  4. Prediction: what is the output at some future time
  5. Analysis: what are the dynamical properties of the system

Structure detection forms the most fundamental part of NARMAX. For example, a NARMAX model which consists of one lagged input and one lagged output term, three lagged noise terms, expanded as a cubic polynomial would consist of eighty two possible candidate terms. This number of candidate terms arises because the expansion by definition includes all possible combinations within the cubic expansion. Naively proceeding to estimate a model which includes all these terms and then pruning will cause numerical and computational problems and should always be avoided. However, only a few terms are often important in the model. Structure detection, which aims to select terms one at a time, is therefore critically important. These objectives can easily be achieved by using the Orthogonal Least Squares [2] algorithm and its derivatives to select the NARMAX model terms one at a time. These ideas can also be adapted for pattern recognition and feature selection and provide an alternative to principal component analysis but with the advantage that the features are revealed as basis functions that are easily related back to the original problem.
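
Selecting terms one at a time can be sketched with a greedy forward selection that ranks candidates by their error reduction ratio (ERR), the share of output energy a candidate explains after being orthogonalised against the terms already chosen. This is a simplified illustration in the spirit of orthogonal least squares, not the algorithm as published in [2]; the function names, the candidate labels, and the tiny data vectors are made up so that the result can be checked by hand.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward_ols(candidates, y, n_terms):
    """Greedy term selection by error reduction ratio.

    candidates maps a term label to its regressor values.
    Returns the selected labels and the ERR of each selected term.
    """
    yy = dot(y, y)
    remaining = dict(candidates)
    basis, selected, errs = [], [], []
    for _ in range(n_terms):
        best = None
        for name, p in remaining.items():
            # Gram-Schmidt: remove the part explained by chosen terms
            w = list(p)
            for q in basis:
                c = dot(q, w) / dot(q, q)
                w = [wi - c * qi for wi, qi in zip(w, q)]
            ww = dot(w, w)
            if ww < 1e-12:  # candidate is redundant, skip it
                continue
            err = dot(w, y) ** 2 / (ww * yy)
            if best is None or err > best[1]:
                best = (name, err, w)
        if best is None:
            break
        name, err, w = best
        selected.append(name)
        errs.append(err)
        basis.append(w)
        del remaining[name]
    return selected, errs


# Made-up regressors: the output is 2*"y(k-1)" + 1*"u(k-1)^2".
candidates = {
    "y(k-1)": [1.0, 1.0, 0.0, 0.0],
    "u(k-1)": [1.0, 0.0, 0.0, 0.0],
    "u(k-1)^2": [0.0, 0.0, 1.0, 0.0],
}
y = [2.0, 2.0, 1.0, 0.0]
terms, errs = forward_ols(candidates, y, n_terms=2)
print(terms, errs)
```

In this toy case the candidate `u(k-1)` looks useful on its own but contributes nothing once `y(k-1)` is in the model, which is the kind of redundancy that orthogonalisation exposes.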

NARMAX methods are designed to do more than find the best approximating model. System identification can be divided into two aims. The first involves approximation, where the key aim is to develop a model that approximates the data set such that good predictions can be made. There are many applications where this approach is appropriate, for example in time series prediction of the weather, stock prices, speech, target tracking, pattern classification etc. In such applications the form of the model is not that important. The objective is to find an approximation scheme which produces the minimum prediction errors. A second objective of system identification, which includes the first objective as a subset, involves much more than just finding a model to achieve the best mean squared errors. This second aim is why the NARMAX philosophy was developed and is linked to the idea of finding the simplest model structure. The aim here is to develop models that reproduce the dynamic characteristics of the underlying system, to find the simplest possible model, and if possible to relate this to components and behaviours of the system under study. The core aim of this second approach to identification is therefore to identify and reveal the rule that represents the system. These objectives are relevant to model simulation and control systems design, but increasingly to applications in medicine, neuroscience, and the life sciences. Here the aim is to identify models, often nonlinear, that can be used to understand the basic mechanisms of how these systems operate and behave so that we can manipulate and utilise them. NARMAX methods have also been developed in the frequency and spatio-temporal domains.

Stochastic nonlinear models

In a general situation, it might be the case that some exogenous uncertain disturbance passes through the nonlinear dynamics and influences the outputs. A model class that is general enough to capture this situation is the class of stochastic nonlinear state-space models. A state-space model is usually obtained using first-principles laws, [16] such as mechanical, electrical, or thermodynamic physical laws, and the parameters to be identified usually have some physical meaning or significance.

A discrete-time state-space model may be defined by the difference equations:

$$\begin{aligned} x_{t+1} &= f(x_t, u_t, w_t; \theta) \\ y_t &= g(x_t, u_t, v_t; \theta) \end{aligned}$$

in which $t$ is a positive integer referring to time. The functions $f$ and $g$ are general nonlinear functions. The first equation is known as the state equation and the second is known as the output equation. All the signals are modeled using stochastic processes. The process $x_t$ is known as the state process; $w_t$ and $v_t$ are usually assumed independent and mutually independent such that $w_t \perp v_s$ for all $t, s$. The parameter $\theta$ is usually a finite-dimensional (real) parameter to be estimated (using experimental data). Observe that the state process does not have to be a physical signal, and it is normally unobserved (not measured). The data set is given as a set of input-output pairs $(y_t, u_t)$ for $t = 1, \ldots, N$ for some finite positive integer value $N$.
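
A model of this kind is straightforward to simulate even though it is hard to identify. The sketch below assumes a scalar state, additive Gaussian process and measurement noise, a zero initial state, and particular forms for the state and output functions; none of these choices come from the article, and the names `state_eq`, `output_eq`, `q` and `r` are hypothetical.

```python
import random


def state_eq(x, u, w, theta):
    """Hypothetical state equation: next state from x, u and noise w."""
    return theta * x / (1.0 + x * x) + u + w


def output_eq(x, v):
    """Hypothetical output equation: squared state plus noise v."""
    return x * x + v


def simulate(theta, u, q=0.1, r=0.1, seed=0):
    """Generate outputs for the inputs u.

    q and r are the standard deviations of the process and
    measurement noise; the state itself is never returned,
    mirroring the fact that it is normally unobserved.
    """
    rng = random.Random(seed)
    x, ys = 0.0, []
    for u_t in u:
        ys.append(output_eq(x, rng.gauss(0.0, r)))
        x = state_eq(x, u_t, rng.gauss(0.0, q), theta)
    return ys


print(simulate(0.5, [1.0, 1.0, 1.0], q=0.0, r=0.0))
print(simulate(0.5, [1.0, 1.0, 1.0]))
```

Setting both noise levels to zero gives a deterministic trajectory that can be verified by hand, which is a convenient sanity check before adding the stochastic terms.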

Unfortunately, due to the nonlinear transformation of unobserved random variables, the likelihood function of the outputs is analytically intractable; it is given in terms of a multidimensional marginalization integral. Consequently, commonly used parameter estimation methods such as the Maximum Likelihood Method or the Prediction Error Method based on the optimal one-step ahead predictor [16] are analytically intractable. Recently, algorithms based on sequential Monte Carlo methods have been used to approximate the conditional mean of the outputs or, in conjunction with the Expectation-Maximization algorithm, to approximate the maximum likelihood estimator. [17] These methods, albeit asymptotically optimal, are computationally demanding and their use is limited to specific cases where the fundamental limitations of the employed particle filters can be avoided. An alternative solution is to apply the prediction error method using a sub-optimal predictor. [18] [19] [20] The resulting estimator can be shown to be strongly consistent and asymptotically normal and can be evaluated using relatively simple algorithms. [21] [20]
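
To give a flavour of the sequential Monte Carlo approach, the following sketch uses a basic bootstrap particle filter to form a noisy estimate of the output log-likelihood for a given parameter value. It is not one of the cited algorithms. It assumes a scalar state, additive Gaussian noise, a known zero initial state, multinomial resampling, and hypothetical state and output functions; the names `q`, `r` and `n_particles` are illustrative.

```python
import math
import random


def bootstrap_loglik(theta, u, y, n_particles=200, q=0.1, r=0.1, seed=0):
    """Particle estimate of log p(y_1..y_N | theta).

    Assumed model: x(t+1) = theta*x/(1+x^2) + u + w,  y = x^2 + v,
    with w ~ N(0, q^2) and v ~ N(0, r^2).
    """
    rng = random.Random(seed)
    particles = [0.0] * n_particles
    loglik = 0.0
    log_norm = -0.5 * math.log(2.0 * math.pi * r * r)
    for u_t, y_t in zip(u, y):
        # Weight each particle by the measurement density p(y_t | x_t)
        logw = [log_norm - 0.5 * ((y_t - x * x) / r) ** 2 for x in particles]
        m = max(logw)  # shift by the maximum to avoid underflow
        w = [math.exp(l - m) for l in logw]
        loglik += m + math.log(sum(w) / n_particles)
        # Resample in proportion to the weights
        particles = rng.choices(particles, weights=w, k=n_particles)
        # Propagate through the state equation with fresh noise
        particles = [theta * x / (1.0 + x * x) + u_t + rng.gauss(0.0, q)
                     for x in particles]
    return loglik


u = [1.0, 1.0, 1.0]
y = [0.0, 1.0, 1.5]
print(bootstrap_loglik(0.5, u, y))
```

The estimate is itself random, so maximising it over the parameter requires care, and the cost grows with both the record length and the number of particles; this is one way to see why such methods are described as computationally demanding.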


References

  1. Nelles O. "Nonlinear System Identification: From Classical Approaches to Neural Networks". Springer Verlag, 2001
  2. Billings S.A. "Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains". Wiley, 2013
  3. Nesaei, Sepehr; Raissi, Kamran (2011-12-01). Das, Vinu V.; Ariwa, Ezendu; Rahayu, Syarifah Bahiyah (eds.). Data Processing Consideration and Model Validation in Flight Vehicle System Identification. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Springer Berlin Heidelberg. pp. 269–274. doi:10.1007/978-3-642-32573-1_46. ISBN 978-3-642-32572-4.
  4. Schetzen M. "The Volterra and Wiener Theories of Nonlinear Systems". Wiley, 1980
  5. Rugh W.J. "Nonlinear System Theory – The Volterra Wiener Approach". Johns Hopkins University Press, 1981
  6. Billings S.A. "Identification of Nonlinear Systems: A Survey". IEE Proceedings Part D 127(6), 272–285, 1980
  7. Haber R., Keviczky L. "Nonlinear System Identification-Input Output Modeling Approach". Vols I & II, Kluwer, 1980
  8. Hammerstein (Acta Math 1930) was not concerned with system analysis but with boundary-value problems and eigenvalues of nonlinear operators
  9. This term is in common use but it is quite inaccurate as Wiener never used this simple model. His model was that given immediately after p.50 in Billings 1980 survey referred to in the references below.
  10. A. Wills, T. Schön, L. Ljung, B. Ninness, Identification of Hammerstein–Wiener models, Automatica 49 (2013), 70–81
  11. M.Poluektov and A.Polar. Modelling non-linear control systems using the discrete urysohn operator. 2018. Submitted arXiv:1802.01700.
  12. A.Polar. http://ezcodesample.com/urysohn/urysohn.html
  13. M.Poluektov and A.Polar. Urysohn Adaptive Filter. 2019.
  14. Haykin S. "Neural Networks: A Comprehensive Foundation". McMillan, 1999
  15. Warwick K, Irwin G.W., Hunt K.J. "Neural Networks for Control and Systems". Peter Peregrinus, 1992
  16. Ljung, Lennart (1999). System identification: theory for the user (2nd ed.). Upper Saddle River, NJ: Prentice Hall PTR. ISBN 978-0136566953. OCLC 38884169.
  17. Schön, Thomas B.; Lindsten, Fredrik; Dahlin, Johan; Wågberg, Johan; Naesseth, Christian A.; Svensson, Andreas; Dai, Liang (2015). "Sequential Monte Carlo Methods for System Identification". IFAC-PapersOnLine. 48 (28): 775–786. arXiv:1503.06058. doi:10.1016/j.ifacol.2015.12.224. S2CID 11396163.
  18. M. Abdalmoaty, ‘Learning Stochastic Nonlinear Dynamical Systems Using Non-stationary Linear Predictors’, Licentiate dissertation, Stockholm, Sweden, 2017. Urn:nbn:se:kth:diva-218100
  19. Abdalmoaty, Mohamed Rasheed; Hjalmarsson, Håkan (2017). "Simulated Pseudo Maximum Likelihood Identification of Nonlinear Models". IFAC-PapersOnLine. 50 (1): 14058–14063. doi:10.1016/j.ifacol.2017.08.1841.
  20. Abdalmoaty, Mohamed (2019). "Identification of Stochastic Nonlinear Dynamical Models Using Estimating Functions". Diva.
  21. Abdalmoaty, Mohamed Rasheed-Hilmy; Hjalmarsson, Håkan (2019). "Linear prediction error methods for stochastic nonlinear models". Automatica. 105: 49–63. doi:10.1016/j.automatica.2019.03.006. S2CID 132768104.

Further reading