Neural operators

Neural operators are a class of deep learning architectures designed to learn maps between infinite-dimensional function spaces. They extend traditional artificial neural networks, which focus on learning mappings between finite-dimensional Euclidean spaces or finite sets. Neural operators instead learn operators between function spaces directly: they can receive input functions at any discretization, and the output function can be evaluated at any discretization. [1]

The primary application of neural operators is in learning surrogate maps for the solution operators of partial differential equations (PDEs), [1] which are critical tools in modeling the natural environment. [2] [3] Standard PDE solvers can be time-consuming and computationally intensive, especially for complex systems. Neural operators have demonstrated improved performance in solving PDEs [4] compared to existing machine learning methodologies, while being significantly faster than numerical solvers. [5] [6] [7] Neural operators have also been applied to various scientific and engineering disciplines such as turbulent flow modeling, computational mechanics, graph-structured data, [8] and the geosciences. [9] In particular, they have been used for learning stress-strain fields in materials, classifying complex data such as spatial transcriptomics, predicting multiphase flow in porous media, [10] and simulating carbon dioxide migration. The operator learning paradigm, which learns maps between function spaces, is distinct from the parallel idea of learning maps from finite-dimensional spaces to function spaces, [11] [12] and subsumes that setting when the input is restricted to a fixed resolution.

Operator learning

Understanding and mapping relationships between function spaces has many applications in engineering and the sciences. In particular, one can cast the problem of solving partial differential equations as identifying a map between function spaces, such as from an initial condition to a time-evolved state. In other PDEs this map takes an input coefficient function and outputs a solution function. Operator learning is a machine learning paradigm to learn solution operators mapping the input function to the output function.

Using traditional machine learning methods, addressing this problem would involve discretizing the infinite-dimensional input and output function spaces into finite-dimensional grids and applying standard learning models, such as neural networks. This approach reduces operator learning to finite-dimensional function learning and has limitations, such as the inability to generalize to discretizations beyond the grid used in training.

The primary properties of neural operators that differentiate them from traditional neural networks are discretization invariance and discretization convergence. [1] Unlike conventional neural networks, which are tied to the discretization of the training data, neural operators can adapt to various discretizations without re-training. This property improves the robustness and applicability of neural operators in different scenarios, providing consistent performance across different resolutions and grids.

Definition and formulation

Architecturally, neural operators are similar to feed-forward neural networks in the sense that they are composed of alternating linear maps and non-linearities. Since neural operators act on and output functions, they are instead formulated as a sequence of alternating linear integral operators on function spaces and point-wise non-linearities. [1] Using an architecture analogous to that of finite-dimensional neural networks, similar universal approximation theorems have been proven for neural operators. In particular, it has been shown that neural operators can approximate any continuous operator on a compact set. [1]

Neural operators seek to approximate some operator $\mathcal{G} : \mathcal{A} \to \mathcal{U}$ between function spaces $\mathcal{A}$ and $\mathcal{U}$ by building a parametric map $\mathcal{G}_\theta : \mathcal{A} \to \mathcal{U}$. Such parametric maps can generally be defined in the form

$$\mathcal{G}_\theta := \mathcal{Q} \circ \sigma(W_T + \mathcal{K}_T + b_T) \circ \cdots \circ \sigma(W_1 + \mathcal{K}_1 + b_1) \circ \mathcal{P},$$

where $\mathcal{P}$ and $\mathcal{Q}$ are the lifting (lifting the codomain of the input function to a higher-dimensional space) and projection (projecting the codomain of the intermediate function to the output codomain) operators, respectively. These operators act pointwise on functions and are typically parametrized as multilayer perceptrons. $\sigma$ is a pointwise nonlinearity, such as a rectified linear unit (ReLU) or a Gaussian error linear unit (GeLU). Each layer $t = 1, \dots, T$ has a respective local operator $W_t$ (usually parameterized by a pointwise neural network), a kernel integral operator $\mathcal{K}_t$, and a bias function $b_t$. Given some intermediate functional representation $v_t$ with domain $D$ in the $t$-th hidden layer, a kernel integral operator $\mathcal{K}_\phi$ is defined as

$$(\mathcal{K}_\phi v_t)(x) := \int_D \kappa_\phi(x, y, v_t(x), v_t(y)) \, v_t(y) \, dy,$$

where the kernel $\kappa_\phi$ is a learnable implicit neural network, parametrized by $\phi$.

In practice, one is often given the input function to the neural operator at a specific resolution. For instance, consider the setting where one is given the evaluation of $v_t$ at $n$ points $\{y_j\}_{j=1}^n$. Borrowing from Nyström integral approximation methods, such as Riemann sum integration and Gaussian quadrature, the above integral operation can be computed as follows:

$$\int_D \kappa_\phi(x, y, v_t(x), v_t(y)) \, v_t(y) \, dy \approx \sum_{j=1}^n \kappa_\phi(x, y_j, v_t(x), v_t(y_j)) \, v_t(y_j) \, \Delta_{y_j},$$

where $\Delta_{y_j}$ is the sub-area volume or quadrature weight associated to the point $y_j$. Thus, a simplified layer can be computed as

$$v_{t+1}(x) \approx \sigma\left( \sum_{j=1}^n \kappa_\phi(x, y_j, v_t(x), v_t(y_j)) \, v_t(y_j) \, \Delta_{y_j} + W_t v_t(x) + b_t(x) \right).$$

The above approximation, together with parametrizing $\kappa_\phi$ as an implicit neural network, results in the graph neural operator (GNO). [13]
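To make the discretized kernel integration concrete, the following is a minimal NumPy sketch of a single GNO-style layer (names such as `kernel_mlp` and `gno_layer` are illustrative, not from any particular library); it assumes the kernel network maps the concatenated features $(x, y_j, v_t(x), v_t(y_j))$ to a matrix acting on the channel dimension, and uses uniform Riemann-sum quadrature weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: n grid points, c_in/c_out channels.
n, c_in, c_out, hidden = 64, 3, 3, 32

# A small random MLP standing in for the learnable kernel kappa_phi.
# It maps the concatenated features (x, y, v(x), v(y)) to a (c_out x c_in) matrix.
W1 = rng.normal(size=(2 + 2 * c_in, hidden)) / np.sqrt(2 + 2 * c_in)
W2 = rng.normal(size=(hidden, c_out * c_in)) / np.sqrt(hidden)

def kernel_mlp(feats):
    """kappa_phi: pairwise features -> (c_out, c_in) matrices, one per quadrature point."""
    h = np.tanh(feats @ W1)
    return (h @ W2).reshape(-1, c_out, c_in)

def gno_layer(x, v, quad_weights, W_local, b):
    """One discretized kernel-integral layer:
    v_{t+1}(x_i) = sigma( sum_j kappa(x_i, y_j, v_i, v_j) v_j dy_j + W_t v_i + b_t )."""
    out = np.zeros((len(x), c_out))
    for i in range(len(x)):
        # Pairwise features between the query point x_i and every quadrature point y_j.
        feats = np.concatenate(
            [np.repeat(x[i:i + 1], len(x), axis=0), x,
             np.repeat(v[i:i + 1], len(x), axis=0), v], axis=1)
        K = kernel_mlp(feats)                               # (n, c_out, c_in)
        integral = np.einsum("joc,jc,j->o", K, v, quad_weights)
        out[i] = integral + W_local @ v[i] + b
    return np.maximum(out, 0.0)                             # ReLU nonlinearity

# Uniform grid on [0, 1]; Riemann-sum quadrature weights dy = 1/n.
x = np.linspace(0.0, 1.0, n)[:, None]
v = rng.normal(size=(n, c_in))
v_next = gno_layer(x, v, np.full(n, 1.0 / n),
                   rng.normal(size=(c_out, c_in)), np.zeros(c_out))
print(v_next.shape)  # (64, 3)
```

Because the layer is written as a sum over arbitrary quadrature points, the same weights can in principle be applied to inputs sampled on a different grid, which is the property the GNO construction relies on.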

There have been various parameterizations of neural operators for different applications. [5] [13] These typically differ in their parameterization of $\kappa_\phi$. The most popular instantiation is the Fourier neural operator (FNO). FNO takes $\kappa_\phi(x, y, v_t(x), v_t(y)) := \kappa_\phi(x - y)$ and, by applying the convolution theorem, arrives at the following parameterization of the kernel integral operator:

$$(\mathcal{K}_\phi v_t)(x) = \mathcal{F}^{-1}\left( R_\phi \cdot \mathcal{F}(v_t) \right)(x),$$

where $\mathcal{F}$ represents the Fourier transform and $R_\phi$ represents the Fourier transform of some periodic function $\kappa$. That is, FNO parameterizes the kernel integration directly in Fourier space, using a prescribed number of Fourier modes. When the grid at which the input function is presented is uniform, the Fourier transform can be approximated using the discrete Fourier transform (DFT) with frequencies below some specified threshold. The discrete Fourier transform can be computed using a fast Fourier transform (FFT) implementation.
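On a uniform grid, this Fourier-space parameterization can be sketched in a few lines of NumPy. The example below is an illustrative single-channel 1D sketch, not a reference implementation: the names (`spectral_conv`, `fno_layer`, `modes`, `R`) are assumptions, and practical FNO implementations learn a separate complex weight matrix per retained mode acting across many channels.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_conv(v, R, modes):
    """Kernel integral operator of an FNO layer on a uniform 1D grid:
    (K v)(x) = IFFT( R * FFT(v) ), keeping only the lowest `modes` frequencies."""
    v_hat = np.fft.rfft(v)                  # DFT of the sampled input function
    out_hat = np.zeros_like(v_hat)
    out_hat[:modes] = R * v_hat[:modes]     # multiply retained modes by learned weights
    return np.fft.irfft(out_hat, n=len(v))  # back to physical space

def fno_layer(v, R, modes, w_local, b):
    """One FNO layer: sigma( spectral_conv(v) + w_local * v + b ), with a tanh-approximated GeLU."""
    z = spectral_conv(v, R, modes) + w_local * v + b
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

# The same learned weights R can be applied to inputs sampled at different resolutions,
# which is one way the discretization-invariance property shows up in practice.
modes = 12
R = rng.normal(size=modes) + 1j * rng.normal(size=modes)
for n in (64, 256):
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    v = np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)
    print(n, fno_layer(v, R, modes, w_local=0.3, b=0.0).shape)
```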

Training

Training neural operators is similar to the training process for a traditional neural network. Neural operators are typically trained using some $L^p$ norm or Sobolev norm. In particular, for a dataset $\{(a_i, u_i)\}_{i=1}^N$ of size $N$, neural operators minimize (a discretization of)

$$\mathcal{L}(\theta) := \sum_{i=1}^N \left\| u_i - \mathcal{G}_\theta(a_i) \right\|_{\mathcal{U}},$$

where $\| \cdot \|_{\mathcal{U}}$ is a norm on the output function space $\mathcal{U}$. Neural operators can be trained directly using backpropagation and gradient descent-based methods.
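As a minimal illustration of the data loss above, the sketch below computes a discretized $L^2$ version for output functions sampled on a grid with quadrature weights; `model` stands for any neural-operator forward pass, and the names and shapes are assumptions made for the example.

```python
import numpy as np

def l2_data_loss(model, a_batch, u_batch, quad_weights):
    """Discretized empirical loss  sum_i || u_i - G_theta(a_i) ||_{L^2}.
    a_batch, u_batch: (N, n, c) arrays of input/output functions sampled at n points;
    quad_weights: (n,) quadrature weights of the output grid;
    model: any callable mapping a sampled input function to a sampled output function."""
    total = 0.0
    for a, u in zip(a_batch, u_batch):
        pred = model(a)                                  # predicted output function, shape (n, c)
        sq_err = np.sum((pred - u) ** 2, axis=-1)        # pointwise squared error
        total += np.sqrt(np.sum(quad_weights * sq_err))  # discretized L^2 norm of the residual
    return total
```

In an actual training loop this quantity (often the relative $L^2$ error, normalized by $\|u_i\|$) is minimized with stochastic gradient descent in an automatic-differentiation framework.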

Another training paradigm is associated with physics-informed machine learning. In particular, physics-informed neural networks (PINNs) use complete physics laws to fit neural networks to solutions of PDEs. Extensions of this paradigm to operator learning are broadly called physics-informed neural operators (PINO), [14] where loss functions can include full physics equations or partial physical laws. As opposed to standard PINNs, the PINO paradigm incorporates a data loss $\mathcal{L}_{\mathrm{data}}$ (as defined above) in addition to the physics loss $\mathcal{L}_{\mathrm{pde}}$. The physics loss $\mathcal{L}_{\mathrm{pde}}$ quantifies how much the predicted solution $\mathcal{G}_\theta(a)$ violates the governing PDE for the input $a$.
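As a concrete but hypothetical illustration of combining the two terms, the sketch below forms a finite-difference residual of the 1D heat equation $u_t = \nu\, u_{xx}$ on the predicted solution and adds it to a data term; the choice of equation, discretization, weighting `lam`, and function names are assumptions for illustration, not the specific losses of [14].

```python
import numpy as np

def heat_residual(u, dx, dt, nu):
    """Finite-difference residual of u_t - nu * u_xx at interior points.
    u: (nt, nx) predicted solution sampled on a space-time grid."""
    u_t = (u[2:, 1:-1] - u[:-2, 1:-1]) / (2.0 * dt)                    # central difference in time
    u_xx = (u[1:-1, 2:] - 2.0 * u[1:-1, 1:-1] + u[1:-1, :-2]) / dx**2  # second difference in space
    return u_t - nu * u_xx

def pino_loss(model, a_batch, u_batch, dx, dt, nu, lam=1.0):
    """PINO-style objective: data loss L_data + lam * physics loss L_pde."""
    data, physics = 0.0, 0.0
    for a, u in zip(a_batch, u_batch):
        pred = model(a)                                            # predicted solution, shape (nt, nx)
        data += np.mean((pred - u) ** 2)                           # data loss (mean squared error)
        physics += np.mean(heat_residual(pred, dx, dt, nu) ** 2)   # PDE residual penalty
    return data + lam * physics
```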

References

  1. Kovachki, Nikola; Li, Zongyi; Liu, Burigede; Azizzadenesheli, Kamyar; Bhattacharya, Kaushik; Stuart, Andrew; Anandkumar, Anima (2021). "Neural operator: Learning maps between function spaces" (PDF). Journal of Machine Learning Research. 24: 1–97. arXiv:2108.08481.
  2. Evans, L. C. (1998). Partial Differential Equations. Providence: American Mathematical Society. ISBN   0-8218-0772-2.
  3. "How AI models are transforming weather forecasting: A showcase of data-driven systems". phys.org (Press release). European Centre for Medium-Range Weather Forecasts. 6 September 2023.
  4. Russ, Dan; Abinader, Sacha (23 August 2023). "Microsoft and Accenture partner to tackle methane emissions with AI technology". Microsoft Azure Blog.
  5. Li, Zongyi; Kovachki, Nikola; Azizzadenesheli, Kamyar; Liu, Burigede; Bhattacharya, Kaushik; Stuart, Andrew; Anandkumar, Anima (2020). "Fourier neural operator for parametric partial differential equations". arXiv:2010.08895 [cs.LG].
  6. Hao, Karen (30 October 2020). "AI has cracked a key mathematical puzzle for understanding our world". MIT Technology Review.
  7. Ananthaswamy, Anil (19 April 2021). "Latest Neural Nets Solve World's Hardest Equations Faster Than Ever Before". Quanta Magazine.
  8. Sharma, Anuj; Singh, Sukhdeep; Ratna, S. (15 August 2023). "Graph Neural Network Operators: a Review". Multimedia Tools and Applications. 83 (8): 23413–23436. doi:10.1007/s11042-023-16440-4.
  9. Wen, Gege; Li, Zongyi; Azizzadenesheli, Kamyar; Anandkumar, Anima; Benson, Sally M. (May 2022). "U-FNO—An enhanced Fourier neural operator-based deep-learning model for multiphase flow". Advances in Water Resources. 163: 104180. arXiv: 2109.03697 . Bibcode:2022AdWR..16304180W. doi:10.1016/j.advwatres.2022.104180.
  10. Choubineh, Abouzar; Chen, Jie; Wood, David A.; Coenen, Frans; Ma, Fei (2023). "Fourier Neural Operator for Fluid Flow in Small-Shape 2D Simulated Porous Media Dataset". Algorithms. 16 (1): 24. doi: 10.3390/a16010024 .
  11. Jiang, Chiyu "Max"; Esmaeilzadeh, Soheil; Azizzadenesheli, Kamyar; Kashinath, Karthik; Mustafa, Mustafa; Tchelepi, Hamdi A.; Marcus, Philip; Prabhat, Mr; Anandkumar, Anima (2020). "MeshfreeFlowNet: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework". SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. pp. 1–15. doi:10.1109/SC41405.2020.00013. ISBN 978-1-7281-9998-6.
  12. Lu, Lu; Jin, Pengzhan; Pang, Guofei; Zhang, Zhongqiang; Karniadakis, George Em (18 March 2021). "Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators". Nature Machine Intelligence. 3 (3): 218–229. arXiv: 1910.03193 . doi:10.1038/s42256-021-00302-5.
  13. Li, Zongyi; Kovachki, Nikola; Azizzadenesheli, Kamyar; Liu, Burigede; Bhattacharya, Kaushik; Stuart, Andrew; Anandkumar, Anima (2020). "Neural operator: Graph kernel network for partial differential equations". arXiv:2003.03485 [cs.LG].
  14. Li, Zongyi; Zheng, Hongkai; Kovachki, Nikola; Jin, David; Chen, Haoxuan; Liu, Burigede; Azizzadenesheli, Kamyar; Anandkumar, Anima (2021). "Physics-Informed Neural Operator for Learning Partial Differential Equations". arXiv:2111.03794 [cs.LG].