Physics-informed neural networks (PINNs), [1] also referred to as Theory-Trained Neural Networks (TTNs), [2] are a type of universal function approximator that can embed, in the learning process, knowledge of the physical laws that govern a given data set and that can be described by partial differential equations (PDEs). Low data availability in some biological and engineering problems limits the robustness of conventional machine learning models used for these applications. [1] Prior knowledge of general physical laws acts in the training of neural networks (NNs) as a regularization agent that limits the space of admissible solutions, increasing the generalizability of the function approximation. Embedding this prior information into a neural network therefore enhances the information content of the available data, helping the learning algorithm capture the right solution and generalize well even from a small number of training examples.
Most of the physical laws that govern the dynamics of a system can be described by partial differential equations. For example, the Navier–Stokes equations [3] are a set of partial differential equations derived from the conservation laws (i.e., conservation of mass, momentum, and energy) that govern fluid mechanics. The solution of the Navier–Stokes equations with appropriate initial and boundary conditions allows the quantification of flow dynamics in a precisely defined geometry. However, these equations generally cannot be solved analytically, so numerical methods such as finite differences, finite elements and finite volumes must be used. In this setting, the governing equations must be solved while accounting for prior assumptions, linearization, and adequate time and space discretization.
Recently, solving the governing partial differential equations of physical phenomena using deep learning has emerged as a new field of scientific machine learning (SciML), leveraging the universal approximation theorem [4] and the high expressivity of neural networks. In general, deep neural networks can approximate any high-dimensional function given sufficient training data. [5] However, such networks do not consider the physical characteristics underlying the problem, and the approximation accuracy they provide still depends heavily on careful specification of the problem geometry as well as the initial and boundary conditions. Without this preliminary information, the solution is not unique and may lose physical correctness. Physics-informed neural networks (PINNs), on the other hand, leverage the governing physical equations in neural network training: they are trained to satisfy both the given training data and the imposed governing equations. In this fashion, a neural network can be guided with training data that do not necessarily need to be large and complete. [5] Potentially, an accurate solution of partial differential equations can be found without knowing the boundary conditions. [6] Therefore, with some knowledge about the physical characteristics of the problem and some form of training data (even sparse and incomplete), PINNs may be used to find an optimal solution with high fidelity.
PINNs allow for addressing a wide range of problems in computational science and represent a pioneering technology leading to the development of new classes of numerical solvers for PDEs. PINNs can be thought of as a meshfree alternative to traditional approaches (e.g., CFD for fluid dynamics), and as a new data-driven approach to model inversion and system identification. [7] Notably, a trained PINN can be used to predict values on simulation grids of different resolutions without being retrained. [8] In addition, PINNs exploit automatic differentiation (AD) [9] to compute the derivatives required in the partial differential equations. AD is a class of differentiation techniques widely used to differentiate neural networks, and it is regarded as superior to numerical differentiation, which introduces truncation error, and to symbolic differentiation, whose expressions can grow rapidly in size.
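To illustrate how AD differs from numerical differentiation, the sketch below implements forward-mode AD with dual numbers in plain Python. This is an assumption made for illustration only: PINN implementations commonly use the AD built into frameworks such as TensorFlow, PyTorch or JAX. The one-neuron network `tiny_net` and its weights are hypothetical. The dual-number derivative matches the analytic derivative up to rounding, whereas the central finite difference carries a truncation error that depends on the step `h`.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """Forward-mode AD value val + der * eps, with eps**2 == 0."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + a' eps)(b + b' eps) = ab + (a'b + ab') eps
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def tanh(x):
    """tanh lifted to dual numbers using d/dx tanh(x) = 1 - tanh(x)**2."""
    if isinstance(x, Dual):
        t = math.tanh(x.val)
        return Dual(t, (1.0 - t * t) * x.der)
    return math.tanh(x)


def tiny_net(x, w1=1.5, b1=-0.3, w2=0.8):
    """Hypothetical one-neuron network u(x) = w2 * tanh(w1 * x + b1)."""
    return w2 * tanh(w1 * x + b1)


x0 = 0.4
ad = tiny_net(Dual(x0, 1.0)).der                       # seed dx/dx = 1
exact = 0.8 * 1.5 * (1.0 - math.tanh(1.5 * x0 - 0.3) ** 2)
h = 1e-3
fd = (tiny_net(x0 + h) - tiny_net(x0 - h)) / (2 * h)   # central finite difference
print(ad, exact, fd)
```

Forward mode is the simplest AD variant to write by hand; for networks with many parameters and a scalar loss, reverse mode (backpropagation) is usually more efficient.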
A general nonlinear partial differential equation can be written as:
$$u_t + \mathcal{N}[u; \lambda] = 0, \quad x \in \Omega, \; t \in [0, T]$$
where $u(t,x)$ denotes the solution, $\mathcal{N}[\cdot; \lambda]$ is a nonlinear operator parameterized by $\lambda$, and $\Omega$ is a subset of $\mathbb{R}^D$. This general form of governing equations summarizes a wide range of problems in mathematical physics, such as conservation laws, diffusion processes, advection-diffusion systems, and kinetic equations. Given noisy measurements of a generic dynamic system described by the equation above, PINNs can be designed to solve two classes of problems:
The data-driven solution of PDE [1] computes the hidden state $u(t,x)$ of the system given boundary data and/or measurements $z$, and fixed model parameters $\lambda$. We solve:
$$u_t + \mathcal{N}[u] = 0, \quad x \in \Omega, \; t \in [0, T].$$
By defining the residual $f(t,x)$ as
$$f := u_t + \mathcal{N}[u],$$
and approximating $u(t,x)$ by a deep neural network, we obtain a PINN; this network can be differentiated using automatic differentiation. The parameters of $u(t,x)$ and $f(t,x)$ can then be learned by minimizing the following loss function $L_{tot}$:
$$L_{tot} = L_u + L_f.$$
Here $L_u = \Vert u - z \Vert_\Gamma$ is the error between the PINN $u(t,x)$ and the set of boundary conditions and measured data on the set of points $\Gamma$ where the boundary conditions and data are defined, and $L_f = \Vert f \Vert_\Gamma$ is the mean-squared error of the residual function. This second term encourages the PINN to learn the structural information expressed by the partial differential equation during training.
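The composite loss can be made concrete with a small sketch. It assumes the one-dimensional heat equation $u_t - \nu u_{xx} = 0$ (so $\mathcal{N}[u] = -\nu u_{xx}$) on $x \in [0, 1]$, a hypothetical one-hidden-layer tanh network built by `make_net`, and hand-written derivatives in place of automatic differentiation. Both loss terms are mean-squared errors, and the training loop that would minimize $L_{tot}$ with a gradient-based optimizer is omitted. Evaluating the loss on the closed-form solution gives essentially zero, while an untrained network gives a positive loss.

```python
import math
import random


def make_net(width=8, seed=0):
    """Hypothetical network u(t, x) = sum_k v_k tanh(a_k t + b_k x + c_k).

    Returns a callable giving (u, u_t, u_xx); derivatives are written out by
    hand here instead of using automatic differentiation.
    """
    rng = random.Random(seed)
    params = [tuple(rng.uniform(-1.0, 1.0) for _ in range(4)) for _ in range(width)]

    def model(t, x):
        u = u_t = u_xx = 0.0
        for a, b, c, v in params:
            s = math.tanh(a * t + b * x + c)
            ds = 1.0 - s * s          # tanh'(z)
            d2s = -2.0 * s * ds       # tanh''(z)
            u += v * s
            u_t += v * a * ds
            u_xx += v * b * b * d2s
        return u, u_t, u_xx

    return model


def pinn_loss(model, data, colloc, nu):
    """L_tot = L_u + L_f for u_t + N[u] = 0 with N[u] = -nu * u_xx."""
    # L_u: misfit on points carrying initial/boundary values or measurements z.
    loss_u = sum((model(t, x)[0] - z) ** 2 for t, x, z in data) / len(data)
    # L_f: residual f = u_t - nu * u_xx on interior collocation points.
    loss_f = 0.0
    for t, x in colloc:
        _, u_t, u_xx = model(t, x)
        loss_f += (u_t - nu * u_xx) ** 2
    loss_f /= len(colloc)
    return loss_u + loss_f


nu = 0.1


def exact(t, x):
    """Closed-form solution u = exp(-nu pi^2 t) sin(pi x) and its derivatives."""
    u = math.exp(-nu * math.pi ** 2 * t) * math.sin(math.pi * x)
    return u, -nu * math.pi ** 2 * u, -math.pi ** 2 * u


# Initial condition at t = 0 and zero Dirichlet boundaries at x = 0 and x = 1.
xs = [i / 10 for i in range(11)]
ts = [j / 10 for j in range(11)]
data = [(0.0, x, math.sin(math.pi * x)) for x in xs]
data += [(t, 0.0, 0.0) for t in ts] + [(t, 1.0, 0.0) for t in ts]
colloc = [(t, x) for t in ts[1:] for x in xs[1:-1]]

print(pinn_loss(exact, data, colloc, nu))       # essentially zero
print(pinn_loss(make_net(), data, colloc, nu))  # positive for the untrained network
```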
This approach has been used to yield computationally efficient physics-informed surrogate models with applications in the forecasting of physical processes, model predictive control, multi-physics and multi-scale modeling, and simulation. [10] It has been shown to converge to the solution of the PDE. [11]
Given noisy and incomplete measurements $z$ of the state of the system, the data-driven discovery of PDE [7] results in computing the unknown state $u(t,x)$ and learning model parameters $\lambda$ that best describe the observed data, and it reads as follows:
$$u_t + \mathcal{N}[u; \lambda] = 0, \quad x \in \Omega, \; t \in [0, T].$$
By defining $f(t,x)$ as
$$f := u_t + \mathcal{N}[u; \lambda],$$
and approximating $u(t,x)$ by a deep neural network, $f(t,x)$ results in a PINN. This network can be differentiated using automatic differentiation. The parameters of $u(t,x)$ and $f(t,x)$, together with the parameter $\lambda$ of the differential operator, can then be learned by minimizing the following loss function $L_{tot}$:
$$L_{tot} = L_u + L_f.$$
Here $L_u = \Vert u - z \Vert_\Gamma$, with $u$ and $z$ the state solutions and measurements at the sparse locations $\Gamma$, respectively, and $L_f = \Vert f \Vert_\Gamma$ is the residual term. This second term requires the structured information represented by the partial differential equations to be satisfied in the training process.
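In practice, the network parameters and $\lambda$ are optimized jointly. The sketch below isolates only the $\lambda$ part, under simplifying assumptions: the operator is taken to be advection-diffusion, $\mathcal{N}[u; \lambda] = \lambda_1 u_x - \lambda_2 u_{xx}$, and a closed-form exact solution stands in for a trained surrogate. Because this residual is linear in $\lambda$, the $\lambda$ that minimizes $L_f$ for a fixed surrogate solves a 2-by-2 least-squares problem. The names `fit_lambda` and `surrogate` are hypothetical.

```python
import math


def fit_lambda(derivs):
    """Least-squares (l1, l2) minimizing sum f**2 with f = u_t + l1*u_x - l2*u_xx.

    `derivs` holds (u_t, u_x, u_xx) evaluated from a fixed surrogate at
    collocation points.
    """
    # Linear model f = u_t + l1 * p + l2 * q with p = u_x and q = -u_xx.
    spp = spq = sqq = spr = sqr = 0.0
    for u_t, u_x, u_xx in derivs:
        p, q = u_x, -u_xx
        spp += p * p
        spq += p * q
        sqq += q * q
        spr += p * u_t
        sqr += q * u_t
    # Normal equations [[spp, spq], [spq, sqq]] @ (l1, l2) = -(spr, sqr), via Cramer's rule.
    det = spp * sqq - spq * spq
    l1 = (-spr * sqq + spq * sqr) / det
    l2 = (-sqr * spp + spq * spr) / det
    return l1, l2


def surrogate(t, x, l1=0.5, l2=0.02, k=math.pi):
    """Stand-in for a trained network: exact solution of u_t + l1*u_x - l2*u_xx = 0."""
    e = math.exp(-l2 * k * k * t)
    s, c = math.sin(k * (x - l1 * t)), math.cos(k * (x - l1 * t))
    u_t = -l2 * k * k * e * s - l1 * k * e * c
    u_x = k * e * c
    u_xx = -k * k * e * s
    return u_t, u_x, u_xx


pts = [(i / 7, j / 9) for i in range(8) for j in range(10)]
print(fit_lambda([surrogate(t, x) for t, x in pts]))  # recovers (0.5, 0.02) up to rounding
```

With noisy data and an imperfect surrogate, the recovered $\lambda$ would only approximate the true values, which is why the joint optimization of $L_u$ and $L_f$ matters.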
This strategy allows the discovery of dynamic models described by nonlinear PDEs, yielding computationally efficient and fully differentiable surrogate models that may find application in predictive forecasting, control, and data assimilation. [12] [13] [14] [15]
A single PINN is often unable to approximate PDEs that have strong nonlinearity or sharp gradients, which commonly occur in practical fluid flow problems. Piecewise approximation is a long-standing practice in numerical approximation. Following this idea, extremely lightweight PINNs, each capable of approximating strong nonlinearity locally, are used to solve PDEs across many discrete subdomains, which substantially increases accuracy and also reduces computational load. [16] [17] DPINN (Distributed physics-informed neural networks) and DPIELM (Distributed physics-informed extreme learning machines) are generalizable space-time domain discretization schemes for better approximation. [16] DPIELM is an extremely fast and lightweight approximator with competitive accuracy. Applying domain scaling on top of the discretization has a notable additional effect. [17] Another line of work uses discretization for parallel computation, to make better use of the available computational resources.
XPINN [18] is a generalized space-time domain decomposition approach for physics-informed neural networks (PINNs) to solve nonlinear partial differential equations on arbitrary complex-geometry domains. XPINN further extends both PINNs and Conservative PINNs (cPINNs), [19] the latter being a spatial domain decomposition approach in the PINN framework tailored to conservation laws. Compared to PINN, XPINN has larger representation and parallelization capacity because it deploys multiple neural networks in smaller subdomains. Unlike cPINN, XPINN can be extended to any type of PDE. Moreover, the domain can be decomposed in any arbitrary way (in space and time), which is not possible in cPINN. Thus, XPINN offers both space and time parallelization, thereby reducing the training cost more effectively. XPINN is particularly effective for large-scale problems (involving large data sets) as well as for high-dimensional problems where a single-network PINN is not adequate. Rigorous bounds on the errors resulting from the approximation of nonlinear PDEs (the incompressible Navier–Stokes equations) with PINNs and XPINNs have been proved. [15] However, the DPINN work questions the use of residual (flux) matching at the domain interfaces, reporting that it hardly seems to improve the optimization. [17]
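A minimal sketch of how an XPINN-style loss couples two subnetworks, assuming a 1D Poisson problem $u_{xx} = -\pi^2 \sin(\pi x)$ on $[0, 1]$ split at $x = 0.5$. Each subdomain contributes its own residual loss, and the interface adds an average-solution continuity term and a residual continuity term, loosely following the interface conditions described for XPINN (the residual term is the kind of matching the DPINN work questions). Unit weights, the function names, and the omission of boundary-condition terms are simplifications for this illustration, not the reference XPINN implementation.

```python
import math


def split_points(points, interface=0.5):
    """Assign 1D collocation points to subdomain 1 (x < interface) or 2 (x >= interface)."""
    left = [x for x in points if x < interface]
    right = [x for x in points if x >= interface]
    return left, right


def residual(model, x):
    """f = u_xx + pi^2 sin(pi x) for the hypothetical problem u_xx = -pi^2 sin(pi x)."""
    _, u_xx = model(x)
    return u_xx + math.pi ** 2 * math.sin(math.pi * x)


def xpinn_loss(model1, model2, points, interface_pts, interface=0.5):
    """Per-subdomain residual losses plus interface coupling terms (unit weights)."""
    left, right = split_points(points, interface)
    loss = sum(residual(model1, x) ** 2 for x in left) / len(left)
    loss += sum(residual(model2, x) ** 2 for x in right) / len(right)
    for x in interface_pts:
        u1, u2 = model1(x)[0], model2(x)[0]
        u_avg = 0.5 * (u1 + u2)
        # Average-solution continuity: each subnetwork is pulled toward the mean.
        loss += ((u1 - u_avg) ** 2 + (u2 - u_avg) ** 2) / len(interface_pts)
        # Residual continuity across the interface.
        loss += (residual(model1, x) - residual(model2, x)) ** 2 / len(interface_pts)
    return loss


def exact(x):
    """u = sin(pi x) and u_xx, which solve the problem on both subdomains."""
    s = math.sin(math.pi * x)
    return s, -math.pi ** 2 * s


def shifted(x):
    """Exact solution plus a constant: same residual, but discontinuous at the interface."""
    u, u_xx = exact(x)
    return u + 0.1, u_xx


points = [i / 20 for i in range(21)]
print(xpinn_loss(exact, exact, points, [0.5]))    # no mismatch anywhere
print(xpinn_loss(exact, shifted, points, [0.5]))  # only the interface continuity term is active
```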
In the PINN framework, initial and boundary conditions are not analytically satisfied, so they need to be included in the loss function of the network and learned simultaneously with the unknown functions of the differential equation (DE). Having competing objectives during training can lead to unbalanced gradients when using gradient-based techniques, which causes PINNs to often struggle to accurately learn the underlying DE solution. This drawback can be overcome by using functional interpolation techniques such as the constrained expression of the Theory of Functional Connections (TFC), in the Deep-TFC [20] framework, which reduces the solution search space of constrained problems to the subspace of neural networks that analytically satisfy the constraints. [21] A further improvement of the PINN and functional interpolation approach is given by the Extreme Theory of Functional Connections (X-TFC) framework, where a single-layer neural network and the extreme learning machine training algorithm are employed. [22] X-TFC improves the accuracy and performance of regular PINNs, and its robustness and reliability have been demonstrated for stiff problems, optimal control, aerospace, and rarefied gas dynamics applications. [23] [24] [25]
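The key property of a TFC constrained expression is that the constraints hold for any choice of the free function. The sketch below assumes two-point boundary conditions $u(0) = a$, $u(1) = b$ on $[0, 1]$ and uses linear switching functions, one simple choice among many: $u(x) = g(x) + (1 - x)(a - g(0)) + x(b - g(1))$. In Deep-TFC the free function $g$ would be a neural network, and in X-TFC a single-layer network trained as an extreme learning machine; here arbitrary callables, including the hypothetical `g_net`, stand in for it.

```python
import math


def constrained_expression(g, a, b):
    """Build u(x) = g(x) + (1 - x)(a - g(0)) + x(b - g(1)) on [0, 1].

    u(0) = a and u(1) = b hold for any free function g (up to rounding).
    """
    g0, g1 = g(0.0), g(1.0)

    def u(x):
        return g(x) + (1.0 - x) * (a - g0) + x * (b - g1)

    return u


def g_net(x):
    """Hypothetical stand-in for an untrained one-neuron network."""
    return 0.7 * math.tanh(1.3 * x - 0.2) + 0.1


for g in (g_net, math.cos, lambda x: x ** 3 - 5.0):
    u = constrained_expression(g, a=2.0, b=-1.0)
    print(u(0.0), u(0.5), u(1.0))  # endpoints meet the constraints; u(0.5) depends on g
```

Because the constraints are built in, training only needs to minimize the differential-equation residual, which removes the competing boundary-condition term from the loss.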
Regular PINNs are only able to obtain the solution of a forward or inverse problem on a single geometry. This means that for any new geometry (computational domain), one must retrain a PINN. This limitation imposes high computational costs, specifically for a comprehensive investigation of geometric parameters in industrial designs. Physics-informed PointNet (PIPN) [26] combines the PINN loss function with PointNet. [27] Specifically, instead of using a simple fully connected neural network, PIPN uses PointNet as the core of its neural network. PointNet was originally designed for deep learning on 3D object classification and segmentation by the research group of Leonidas J. Guibas. In PIPN, PointNet extracts the geometric features of the input computational domains. Thus, PIPN is able to solve governing equations on multiple computational domains (rather than only a single domain) with irregular geometries simultaneously. The effectiveness of PIPN has been shown for incompressible flow, heat transfer and linear elasticity. [26] [28]
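The ingredient that lets PointNet take a whole computational domain given as a point cloud is a shared per-point transformation followed by a symmetric pooling operation, so the global geometric feature does not depend on the order of the points. The sketch below shows only that mechanism, with a single hypothetical tanh layer on 2D points; the full PointNet and PIPN architectures contain more layers, and PIPN would attach the PINN loss to the fields predicted from these features.

```python
import math
import random


def shared_layer(point, weights):
    """Hypothetical shared layer: the same tanh map is applied to every (x, y) point."""
    x, y = point
    return [math.tanh(wx * x + wy * y + b) for wx, wy, b in weights]


def pointnet_features(cloud, weights):
    """Return per-point features concatenated with a max-pooled global feature."""
    local = [shared_layer(p, weights) for p in cloud]
    # Max pooling over points is symmetric, so the global feature ignores point order.
    global_feat = [max(column) for column in zip(*local)]
    return [feat + global_feat for feat in local], global_feat


rng = random.Random(0)
weights = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(4)]
cloud = [(rng.random(), rng.random()) for _ in range(10)]  # hypothetical domain points
shuffled = cloud[:]
rng.shuffle(shuffled)

feats1, g1 = pointnet_features(cloud, weights)
feats2, g2 = pointnet_features(shuffled, weights)
print(g1 == g2)  # True: reordering the points leaves the global feature unchanged
```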
Physics-informed neural networks (PINNs) have proven particularly effective in solving inverse problems within differential equations, [29] demonstrating their applicability across science, engineering, and economics. They have proven useful for solving inverse problems in a variety of fields, including nano-optics, [30] topology optimization/characterization, [31] multiphase flow in porous media, [32] [33] and high-speed fluid flow. [34] [13] PINNs have demonstrated flexibility when dealing with noisy and uncertain observation datasets. They have also shown clear advantages in the inverse calculation of parameters for multi-fidelity datasets, meaning datasets with different quality, quantity, and types of observations. Uncertainties in calculations can be evaluated using ensemble-based or Bayesian-based calculations. [35]
The deep backward stochastic differential equation (deep BSDE) method is a numerical method that combines deep learning with backward stochastic differential equations (BSDEs) to solve high-dimensional problems in financial mathematics. By leveraging the powerful function approximation capabilities of deep neural networks, deep BSDE addresses the computational challenges faced by traditional numerical methods such as finite difference methods or Monte Carlo simulations, which struggle with the curse of dimensionality. Deep BSDE methods use neural networks to approximate solutions of high-dimensional partial differential equations (PDEs), effectively reducing the computational burden. Additionally, integrating physics-informed neural networks (PINNs) into the deep BSDE framework enhances its capability by embedding the underlying physical laws into the neural network architecture, helping ensure that solutions adhere to the governing stochastic differential equations and resulting in more accurate and reliable solutions. [36]
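The core of the deep BSDE method can be sketched as a loss evaluation: the forward SDE is simulated with an Euler scheme, the BSDE is propagated from a guessed initial value $Y_0$ using a process $Z$, and the mismatch with the terminal condition is penalized; training would adjust $Y_0$ and a network for $Z$ to minimize it. The one-dimensional example (the PDE $u_t + \tfrac{1}{2}\sigma^2 u_{xx} = 0$ with $u(T,x) = x^2$, so the BSDE driver is zero), its parameters, and the use of the closed-form $Y_0$ and $Z$ in place of trained networks are assumptions for illustration. The method's advantage appears in high dimensions, which this sketch does not attempt.

```python
import random


def bsde_terminal_loss(y0, z_fn, x0=1.0, sigma=1.0, T=1.0,
                       n_steps=100, n_paths=2000, seed=0):
    """Monte Carlo estimate of E[(Y_N - g(X_N))**2] for a 1D deep-BSDE-style forward pass.

    Forward SDE: dX = sigma dW. Backward equation with zero driver:
    Y_{n+1} = Y_n + Z_n dW_n. Terminal condition: g(x) = x**2.
    In the deep BSDE method, y0 and z_fn would be trainable (z_fn a neural network).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, y = x0, y0
        for n in range(n_steps):
            dw = rng.gauss(0.0, dt ** 0.5)
            y += z_fn(n * dt, x) * dw  # Euler step of the BSDE, Z evaluated at X_n
            x += sigma * dw            # Euler step of the forward SDE
        total += (y - x * x) ** 2
    return total / n_paths


# For u_t + 0.5 * sigma**2 * u_xx = 0 with u(T, x) = x**2 the solution is
# u = x**2 + sigma**2 * (T - t), so Y_0 = u(0, x0) = 2.0 and Z = sigma * u_x = 2x here.
good = bsde_terminal_loss(2.0, lambda t, x: 2.0 * x)
bad = bsde_terminal_loss(3.0, lambda t, x: 2.0 * x)
print(good, bad)  # the exact pair gives a much smaller loss than a shifted Y_0
```

The small loss that remains for the exact pair comes from time discretization and shrinks as `n_steps` grows.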
Biologically-informed neural networks (BINNs) are an extension or adaptation of PINNs. BINNs introduce two key adaptations to the typical PINN framework: (i) the mechanistic terms of the governing PDE are replaced by neural networks, and (ii) the loss function is modified to include $L_{constr}$, a term used to incorporate domain-specific knowledge that helps enforce biological applicability. Adaptation (i) has the advantage of relaxing the need to specify the governing differential equation a priori, either explicitly or by using a library of candidate terms. Additionally, this approach circumvents the potential issue of misspecifying regularization terms in stricter theory-informed cases. [37] [38]
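A natural example of BINNs can be found in cell dynamics, where the cell density $u$ is governed by a reaction-diffusion equation with diffusion and growth functions $D$ and $G$, respectively:
$$u_t = \nabla \cdot \left( D(u) \nabla u \right) + G(u)\, u, \quad x \in \Omega, \; t \in [0, T].$$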
In this case, a component of $L_{constr}$ could be $\Vert D \Vert_\Gamma$ for $D < D_{min}$, $D > D_{max}$, which penalizes values of $D$ that fall outside a biologically relevant diffusion range defined by $D_{min} \le D \le D_{max}$. Furthermore, the BINN architecture, when utilizing multilayer perceptrons (MLPs), would function as follows: an MLP is used to construct $u_{MLP}$ from model inputs $(x, t)$, serving as a surrogate model for the cell density $u$. This surrogate is then fed into the two additional MLPs, $D_{MLP}$ and $G_{MLP}$, which model the diffusion and growth functions. Automatic differentiation can then be applied to compute the necessary derivatives of $u_{MLP}$, $D_{MLP}$ and $G_{MLP}$ to form the governing reaction-diffusion equation. [37]
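Two pieces of the BINN loss can be sketched directly. The constraint term follows the description above, penalizing squared values of $D$ only where they leave $[D_{min}, D_{max}]$; averaging over all sample points is an assumption of this sketch. The residual is written for one spatial dimension, where the chain rule gives $\partial_x (D(u) u_x) = D'(u) u_x^2 + D(u) u_{xx}$. In a BINN, $D'(u)$ and the derivatives of $u_{MLP}$ would come from automatic differentiation; here they are passed in as plain numbers and hypothetical callables.

```python
import math


def constraint_loss(d_values, d_min, d_max):
    """Mean over sample points of D**2, counted only where D lies outside [d_min, d_max]."""
    penalty = sum(d * d for d in d_values if d < d_min or d > d_max)
    return penalty / len(d_values)


def binn_residual(u, u_t, u_x, u_xx, D, dD_du, G):
    """Residual of u_t = d/dx(D(u) u_x) + G(u) u in 1D, expanded with the chain rule."""
    diffusion = dD_du(u) * u_x ** 2 + D(u) * u_xx
    return u_t - (diffusion + G(u) * u)


# Hypothetical D_MLP outputs at four sample points; two fall outside [0.01, 0.1].
loss_c = constraint_loss([0.05, 0.2, 0.001, 0.08], d_min=0.01, d_max=0.1)
print(loss_c)

# Sanity check: with constant D = 1 and G = 0, u = exp(-pi^2 t) sin(pi x) has zero residual.
t, x = 0.1, 0.3
u = math.exp(-math.pi ** 2 * t) * math.sin(math.pi * x)
u_x = math.pi * math.exp(-math.pi ** 2 * t) * math.cos(math.pi * x)
res = binn_residual(u, -math.pi ** 2 * u, u_x, -math.pi ** 2 * u,
                    D=lambda v: 1.0, dD_du=lambda v: 0.0, G=lambda v: 0.0)
print(res)
```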
Note that since $u_{MLP}$ is a surrogate for the cell density, it may contain errors, particularly in regions where the PDE is not fully satisfied. Therefore, the reaction-diffusion equation with the learned diffusion and growth functions may be solved numerically, for instance using a method-of-lines approach.
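A minimal method-of-lines sketch for the 1D version of the equation, assuming zero-flux boundaries, a conservative discretization of $\partial_x (D(u) u_x)$ with $D$ evaluated at cell midpoints, and explicit Euler time stepping. The learned $D_{MLP}$ and $G_{MLP}$ would be passed in as the callables `D` and `G`; the closed-form functions in the usage lines are hypothetical stand-ins. With $G = 0$ the zero-flux scheme conserves total mass up to rounding, which gives a simple check.

```python
def method_of_lines(u0, D, G, dx, dt, n_steps):
    """Explicit-Euler method of lines for u_t = d/dx(D(u) u_x) + G(u) u on a 1D grid.

    Zero-flux (Neumann) boundaries. Stability roughly needs dt <= dx**2 / (2 * max D).
    """
    u = list(u0)
    n = len(u)
    for _ in range(n_steps):
        # D(u) * u_x at the midpoint between nodes i and i+1.
        flux = [D(0.5 * (u[i] + u[i + 1])) * (u[i + 1] - u[i]) / dx for i in range(n - 1)]
        new_u = []
        for i in range(n):
            right = flux[i] if i < n - 1 else 0.0  # zero flux at the right end
            left = flux[i - 1] if i > 0 else 0.0   # zero flux at the left end
            new_u.append(u[i] + dt * ((right - left) / dx + G(u[i]) * u[i]))
        u = new_u
    return u


dx, dt = 0.02, 0.001  # dt <= dx**2 / (2 * 0.11) holds since D stays below 0.11 here
u0 = [1.0 if 20 <= i < 30 else 0.0 for i in range(50)]
u_end = method_of_lines(u0, D=lambda v: 0.01 + 0.1 * v, G=lambda v: 0.0,
                        dx=dx, dt=dt, n_steps=200)
print(sum(u0) * dx, sum(u_end) * dx)  # equal up to rounding: mass is conserved when G = 0
```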
Translation and discontinuous behavior are hard to approximate using PINNs. [17] PINNs can fail even for differential equations with slight advective dominance, and asymptotic behaviour likewise causes the method to fail. Such PDEs could be solved by scaling variables. [17] The difficulty of training PINNs on advection-dominated PDEs can be explained by the Kolmogorov n–width of the solution. [39] PINNs also fail to solve systems of dynamical equations and hence have not been successful in solving chaotic equations. [40] One of the reasons behind the failure of regular PINNs is the soft constraining of Dirichlet and Neumann boundary conditions, which poses a multi-objective optimization problem that requires manually weighting the loss terms to make optimization possible. [17] More generally, posing the solution of a PDE as an optimization problem brings with it the usual difficulties of optimization, the major one being getting stuck in local optima. [17] [41]