Neural operators

Neural operators are a class of deep learning architectures designed to learn maps between infinite-dimensional function spaces.[1] Neural operators extend traditional artificial neural networks, which typically learn mappings between finite-dimensional Euclidean spaces or finite sets. Neural operators learn operators between function spaces directly: they can receive input functions, and the output function can be evaluated at any discretization.[2]

The primary application of neural operators is in learning surrogate maps for the solution operators of partial differential equations (PDEs),[2] which are critical tools in modeling the natural environment.[3][4] Standard PDE solvers can be time-consuming and computationally intensive, especially for complex systems. Neural operators have demonstrated improved performance in solving PDEs[5] compared to existing machine learning methodologies, while being significantly faster than numerical solvers.[6][7][8][9] Neural operators have also been applied to various scientific and engineering disciplines such as turbulent flow modeling, computational mechanics, graph-structured data,[10] and the geosciences.[11] In particular, they have been used to learn stress-strain fields in materials, to classify complex data such as spatial transcriptomics, to predict multiphase flow in porous media,[12] and to simulate carbon dioxide migration. Finally, the operator learning paradigm, which learns maps between function spaces, differs from parallel approaches that learn maps from finite-dimensional spaces to function spaces,[13][14] and it subsumes those settings when restricted to a fixed input resolution.

Operator learning

Understanding and mapping relationships between function spaces has many applications in engineering and the sciences. In particular, the problem of solving a partial differential equation can be cast as identifying a map between function spaces, such as from an initial condition to a time-evolved state. In other PDEs this map takes an input coefficient function to a solution function. Operator learning is a machine learning paradigm for learning such solution operators, mapping the input function to the output function.

Using traditional machine learning methods, addressing this problem would involve discretizing the infinite-dimensional input and output function spaces into finite-dimensional grids and applying standard learning models, such as neural networks. This approach reduces operator learning to finite-dimensional function learning, with limitations such as the inability to generalize to discretizations beyond the grid used in training.

The primary properties of neural operators that differentiate them from traditional neural networks are discretization invariance and discretization convergence.[2] Unlike conventional neural networks, which are tied to the discretization of their training data, neural operators can adapt to various discretizations without re-training. This property improves the robustness and applicability of neural operators in different scenarios, providing consistent performance across different resolutions and grids.

Definition and formulation

Architecturally, neural operators are similar to feed-forward neural networks in the sense that they are composed of alternating linear maps and non-linearities. Since neural operators act on and output functions, they are instead formulated as a sequence of alternating linear integral operators on function spaces and point-wise non-linearities.[1][2] Using an architecture analogous to finite-dimensional neural networks, similar universal approximation theorems have been proven for neural operators. In particular, it has been shown that neural operators can approximate any continuous operator on a compact set.[2]

Neural operators seek to approximate some operator $\mathcal{G} : \mathcal{A} \to \mathcal{U}$ between function spaces $\mathcal{A}$ and $\mathcal{U}$ by building a parametric map $\mathcal{G}_\phi : \mathcal{A} \to \mathcal{U}$. Such parametric maps can generally be defined in the form

$$\mathcal{G}_\phi := \mathcal{Q} \circ \sigma(W_T + \mathcal{K}_T + b_T) \circ \cdots \circ \sigma(W_1 + \mathcal{K}_1 + b_1) \circ \mathcal{P},$$

where $\mathcal{P}$ and $\mathcal{Q}$ are the lifting operator (lifting the codomain of the input function to a higher-dimensional space) and the projection operator (projecting the codomain of the intermediate function to the output codomain), respectively. These operators act pointwise on functions and are typically parametrized as multilayer perceptrons. $\sigma$ is a pointwise nonlinearity, such as a rectified linear unit (ReLU) or a Gaussian error linear unit (GeLU). Each layer $t = 1, \dots, T$ has a respective local operator $W_t$ (usually parameterized by a pointwise neural network), a kernel integral operator $\mathcal{K}_t$, and a bias function $b_t$. Given some intermediate functional representation $v_t$ with domain $D$ in the $t$-th hidden layer, a kernel integral operator $\mathcal{K}_\phi$ is defined as

$$(\mathcal{K}_\phi v_t)(x) := \int_D \kappa_\phi(x, y, v_t(x), v_t(y))\, v_t(y)\, \mathrm{d}y,$$

where the kernel $\kappa_\phi$ is a learnable implicit neural network, parametrized by $\phi$.
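A minimal NumPy sketch of this composition is shown below. The lifting map P and projection map Q are pointwise MLPs applied to the function values at each grid point, and each hidden layer is left as a placeholder callable standing in for $\sigma(W_t + \mathcal{K}_t + b_t)$; a concrete kernel-integral layer is sketched after the discretization discussion below. All names, widths, and random weights are illustrative assumptions rather than a reference implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    d_a, d_v, d_u = 1, 8, 1   # codomain dimensions of the input, hidden, and output functions

    def pointwise_mlp(d_in, d_out, width=32):
        """Build a small pointwise MLP acting on function values at every grid point."""
        A = rng.normal(size=(d_in, width)) / np.sqrt(d_in)
        B = rng.normal(size=(width, d_out)) / np.sqrt(width)
        return lambda values: np.tanh(values @ A) @ B   # (n, d_in) -> (n, d_out)

    P = pointwise_mlp(d_a, d_v)   # lifting operator
    Q = pointwise_mlp(d_v, d_u)   # projection operator

    def neural_operator(a_values, layers):
        """Compose G_phi = Q o layer_T o ... o layer_1 o P on a discretized input function.

        a_values : (n, d_a) samples of the input function a at n grid points
        layers   : list of callables, each standing in for sigma(W_t + K_t + b_t)
        """
        v = P(a_values)
        for layer in layers:
            v = layer(v)
        return Q(v)

    # Usage with trivial placeholder layers; a concrete kernel-integral layer is
    # sketched after the discretization discussion below.
    n = 64
    a_values = rng.normal(size=(n, d_a))
    u_values = neural_operator(a_values, layers=[lambda v: np.maximum(0.0, v)] * 3)
    print(u_values.shape)   # (64, 1)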

In practice, one is often given the input function to the neural operator at a specific resolution. For instance, consider the setting where one is given the evaluation of $v_t$ at $n$ points $\{y_j\}_{j=1}^n$. Borrowing from Nyström integral approximation methods, such as Riemann sum integration and Gaussian quadrature, the above integral operation can be approximated by

$$\int_D \kappa_\phi(x, y)\, v_t(y)\, \mathrm{d}y \approx \sum_{j=1}^n \kappa_\phi(x, y_j)\, v_t(y_j)\, \Delta_{y_j},$$

where $\Delta_{y_j}$ is the sub-area volume or quadrature weight associated to the point $y_j$. Thus, a simplified layer can be computed as

$$v_{t+1}(x) \approx \sigma\left(\sum_{j=1}^n \kappa_\phi(x, y_j)\, v_t(y_j)\, \Delta_{y_j} + W_t v_t(x) + b_t(x)\right).$$

The above approximation, together with parametrizing $\kappa_\phi$ as an implicit neural network, results in the graph neural operator (GNO).[15]
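A minimal NumPy sketch of this discretized layer follows; it implements the Riemann-sum approximation above with an illustrative kernel network and can be plugged into the composition sketched earlier. The function names (kappa, gno_layer) and the random placeholder weights are assumptions for illustration, not a reference GNO implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    d = 1      # dimension of the spatial domain D
    d_v = 8    # channel width of the hidden representation v_t

    # Illustrative kernel network kappa_phi(x, y): a tiny MLP mapping a pair of
    # coordinates to a d_v x d_v matrix. The weights are random placeholders
    # standing in for learned parameters.
    W1 = rng.normal(size=(2 * d, 32))
    W2 = rng.normal(size=(32, d_v * d_v)) / 32.0

    def kappa(x, y):
        h = np.tanh(np.concatenate([x, y]) @ W1)
        return (h @ W2).reshape(d_v, d_v)

    # Pointwise (local) operator W_t and bias b_t of the layer, also placeholders.
    W_t = rng.normal(size=(d_v, d_v)) / d_v
    b_t = rng.normal(size=d_v)

    def gno_layer(points, v, weights):
        """One discretized neural-operator layer v_t -> v_{t+1}.

        points  : (n, d) quadrature points y_j in the domain D
        v       : (n, d_v) values v_t(y_j)
        weights : (n,) quadrature weights Delta_{y_j}
        """
        n = points.shape[0]
        out = np.empty_like(v)
        for i in range(n):   # evaluation point x = points[i]
            # Riemann-sum approximation of the kernel integral operator
            integral = sum(weights[j] * kappa(points[i], points[j]) @ v[j]
                           for j in range(n))
            out[i] = np.maximum(0.0, integral + W_t @ v[i] + b_t)   # ReLU nonlinearity
        return out

    # Usage: a uniform grid on [0, 1] with equal quadrature weights.
    n = 64
    points = np.linspace(0.0, 1.0, n).reshape(n, d)
    v0 = rng.normal(size=(n, d_v))
    v1 = gno_layer(points, v0, np.full(n, 1.0 / n))
    print(v1.shape)   # (64, 8)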

There have been various parameterizations of neural operators for different applications.[6][7][15] These typically differ in their parameterization of $\kappa$. The most popular instantiation is the Fourier neural operator (FNO). FNO takes $\kappa_\phi(x, y, v_t(x), v_t(y)) := \kappa_\phi(x - y)$ and, by applying the convolution theorem, arrives at the following parameterization of the kernel integral operator:

$$(\mathcal{K}_\phi v_t)(x) = \mathcal{F}^{-1}\bigl(R_\phi \cdot (\mathcal{F} v_t)\bigr)(x),$$

where $\mathcal{F}$ denotes the Fourier transform and $R_\phi$ is the Fourier transform of some periodic kernel function $\kappa$. That is, FNO parameterizes the kernel integration directly in Fourier space, using a prescribed number of Fourier modes. When the grid on which the input function is given is uniform, the Fourier transform can be approximated by the discrete Fourier transform (DFT), retaining only frequencies below a specified threshold. The DFT can be computed with a fast Fourier transform (FFT) algorithm.
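The sketch below illustrates this Fourier parameterization in one dimension with NumPy's FFT: the input samples are transformed, the lowest k_max modes are multiplied by learned complex weights R_phi, higher modes are dropped, and the result is transformed back. The names (spectral_conv, R_phi, k_max) and the random weights are illustrative assumptions; practical FNO implementations are typically written in a deep-learning framework and wrap this operator with the pointwise path $W_t$, bias, and nonlinearity. Because the weights act on a fixed set of Fourier modes, the same R_phi can be applied to inputs sampled at different resolutions, which is one concrete manifestation of the discretization invariance discussed above.

    import numpy as np

    rng = np.random.default_rng(0)

    d_v = 4      # channel width of v_t
    k_max = 8    # number of retained Fourier modes (illustrative truncation)

    # Learned complex weights R_phi: one d_v x d_v matrix per retained mode.
    # Random placeholders stand in for trained parameters.
    R_phi = rng.normal(size=(k_max, d_v, d_v)) + 1j * rng.normal(size=(k_max, d_v, d_v))

    def spectral_conv(v):
        """Fourier-space kernel integral (K_phi v)(x) = F^{-1}(R_phi . (F v))(x).

        v : (n, d_v) samples of v_t on a uniform grid over a periodic 1D domain.
        """
        n = v.shape[0]
        v_hat = np.fft.rfft(v, axis=0)          # DFT along the spatial axis
        out_hat = np.zeros_like(v_hat)
        k = min(k_max, v_hat.shape[0])
        for j in range(k):                      # keep only the lowest modes
            out_hat[j] = R_phi[j] @ v_hat[j]    # mix channels mode by mode
        return np.fft.irfft(out_hat, n=n, axis=0)

    # The same weights R_phi apply to inputs sampled at different resolutions.
    for n in (64, 256):
        x = np.linspace(0.0, 1.0, n, endpoint=False)
        v = np.stack([np.sin(2 * np.pi * x)] * d_v, axis=1)   # toy input samples
        print(n, spectral_conv(v).shape)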

Training

Training neural operators is similar to the training process for a traditional neural network. Neural operators are typically trained in some $L^p$ norm or Sobolev norm. In particular, for a dataset $\{(a_i, u_i)\}_{i=1}^N$ of size $N$, neural operators minimize (a discretization of)

$$\mathcal{L}_{\mathcal{U}}\bigl(\{(a_i, u_i)\}_{i=1}^N\bigr) := \sum_{i=1}^N \|u_i - \mathcal{G}_\theta(a_i)\|_{\mathcal{U}}^2,$$

where $\|\cdot\|_{\mathcal{U}}$ is a norm on the output function space $\mathcal{U}$. Neural operators can be trained directly using backpropagation and gradient descent-based methods.[1]
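The following PyTorch sketch shows this objective in a standard training loop, assuming the operator is queried on a fixed uniform grid. The model here is a plain multilayer perceptron standing in for $\mathcal{G}_\theta$, and the dataset is synthetic; the names (such as operator_loss) are illustrative, and the emphasis is on the discretized loss and the backpropagation-based update rather than on any particular architecture.

    import torch

    # Illustrative stand-in for a neural operator G_theta: here a plain MLP mapping a
    # batch of input functions sampled at n grid points to output functions on the
    # same grid.
    n = 64
    model = torch.nn.Sequential(
        torch.nn.Linear(n, 128), torch.nn.GELU(), torch.nn.Linear(128, n)
    )

    def operator_loss(pred, target):
        """Discretized squared norm on the output function space: the integral over
        the domain is replaced by a mean over grid points, summed over samples."""
        return ((pred - target) ** 2).mean(dim=-1).sum()

    # Toy dataset of N pairs (a_i, u_i); real pairs would come from a PDE solver.
    N = 256
    a = torch.randn(N, n)
    u = torch.cumsum(a, dim=-1) / n          # synthetic placeholder "solution operator"

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(100):
        opt.zero_grad()
        loss = operator_loss(model(a), u)
        loss.backward()                      # standard backpropagation
        opt.step()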

Another training paradigm is associated with physics-informed machine learning. In particular, physics-informed neural networks (PINNs) use complete physics laws to fit neural networks to solutions of PDEs. Extensions of this paradigm to operator learning are broadly called physics-informed neural operators (PINO),[16] where loss functions can include full physics equations or partial physical laws. As opposed to standard PINNs, the PINO paradigm incorporates a data loss $\mathcal{L}_{\mathrm{data}}$ (as defined above) in addition to the physics loss $\mathcal{L}_{\mathrm{pde}}$. The physics loss $\mathcal{L}_{\mathrm{pde}}$ quantifies how much the predicted solution $u$ violates the governing PDE for the input $a$.
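As a sketch of how such a combined objective can be assembled, the PyTorch fragment below uses a deliberately simple toy equation $u_x = a$ on a periodic grid, with the residual evaluated by finite differences on the predicted solution. The toy equation, the finite-difference residual, and names such as pde_residual are illustrative assumptions, not the formulation of reference [16] for any specific benchmark.

    import torch

    n = 64
    dx = 1.0 / n
    # Illustrative stand-in for a neural operator G_theta mapping a -> u on a periodic grid.
    model = torch.nn.Sequential(
        torch.nn.Linear(n, 128), torch.nn.GELU(), torch.nn.Linear(128, n)
    )

    def pde_residual(u, a):
        """Residual of the toy equation u_x = a, via periodic central differences."""
        u_x = (torch.roll(u, -1, dims=-1) - torch.roll(u, 1, dims=-1)) / (2 * dx)
        return u_x - a

    # Toy supervised pairs (a_i, u_i); real data would come from a numerical solver.
    N = 128
    a = torch.randn(N, n)
    a = a - a.mean(dim=-1, keepdim=True)     # compatibility condition for periodic u_x = a
    u = torch.cumsum(a, dim=-1) * dx         # crude reference "solution"

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(200):
        opt.zero_grad()
        pred = model(a)
        loss_data = ((pred - u) ** 2).mean()              # L_data: match observed solutions
        loss_pde = (pde_residual(pred, a) ** 2).mean()    # L_pde: penalize equation violation
        (loss_data + loss_pde).backward()                 # PINO-style combined objective
        opt.step()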

References

  1. Patel, Ravi G.; Desjardins, Olivier (2018). "Nonlinear integro-differential operator regression with neural networks". arXiv:1810.08552 [cs.LG].
  2. Kovachki, Nikola; Li, Zongyi; Liu, Burigede; Azizzadenesheli, Kamyar; Bhattacharya, Kaushik; Stuart, Andrew; Anandkumar, Anima (2021). "Neural operator: Learning maps between function spaces". Journal of Machine Learning Research. 24: 1–97. arXiv:2108.08481.
  3. Evans, L. C. (1998). Partial Differential Equations. Providence: American Mathematical Society. ISBN 0-8218-0772-2.
  4. "How AI models are transforming weather forecasting: A showcase of data-driven systems". phys.org (Press release). European Centre for Medium-Range Weather Forecasts. 6 September 2023.
  5. Russ, Dan; Abinader, Sacha (23 August 2023). "Microsoft and Accenture partner to tackle methane emissions with AI technology". Microsoft Azure Blog.
  6. Patel, Ravi G.; Trask, Nathaniel A.; Wood, Mitchell A.; Cyr, Eric C. (January 2021). "A physics-informed operator regression framework for extracting data-driven continuum models". Computer Methods in Applied Mechanics and Engineering. 373: 113500. arXiv:2009.11992. Bibcode:2021CMAME.373k3500P. doi:10.1016/j.cma.2020.113500.
  7. Li, Zongyi; Kovachki, Nikola; Azizzadenesheli, Kamyar; Liu, Burigede; Bhattacharya, Kaushik; Stuart, Andrew; Anandkumar, Anima (2020). "Fourier neural operator for parametric partial differential equations". arXiv:2010.08895 [cs.LG].
  8. Hao, Karen (30 October 2020). "AI has cracked a key mathematical puzzle for understanding our world". MIT Technology Review.
  9. Ananthaswamy, Anil (19 April 2021). "Latest Neural Nets Solve World's Hardest Equations Faster Than Ever Before". Quanta Magazine.
  10. Sharma, Anuj; Singh, Sukhdeep; Ratna, S. (15 August 2023). "Graph Neural Network Operators: a Review". Multimedia Tools and Applications. 83 (8): 23413–23436. doi:10.1007/s11042-023-16440-4.
  11. Wen, Gege; Li, Zongyi; Azizzadenesheli, Kamyar; Anandkumar, Anima; Benson, Sally M. (May 2022). "U-FNO—An enhanced Fourier neural operator-based deep-learning model for multiphase flow". Advances in Water Resources. 163: 104180. arXiv:2109.03697. Bibcode:2022AdWR..16304180W. doi:10.1016/j.advwatres.2022.104180.
  12. Choubineh, Abouzar; Chen, Jie; Wood, David A.; Coenen, Frans; Ma, Fei (2023). "Fourier Neural Operator for Fluid Flow in Small-Shape 2D Simulated Porous Media Dataset". Algorithms. 16 (1): 24. doi:10.3390/a16010024.
  13. Jiang, Chiyu Max; Esmaeilzadeh, Soheil; Azizzadenesheli, Kamyar; Kashinath, Karthik; Mustafa, Mustafa; Tchelepi, Hamdi A.; Marcus, Philip; Prabhat, Mr; Anandkumar, Anima (2020). "MeshfreeFlowNet: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework". SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. pp. 1–15. doi:10.1109/SC41405.2020.00013. ISBN 978-1-7281-9998-6.
  14. Lu, Lu; Jin, Pengzhan; Pang, Guofei; Zhang, Zhongqiang; Karniadakis, George Em (18 March 2021). "Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators". Nature Machine Intelligence. 3 (3): 218–229. arXiv:1910.03193. doi:10.1038/s42256-021-00302-5.
  15. Li, Zongyi; Kovachki, Nikola; Azizzadenesheli, Kamyar; Liu, Burigede; Bhattacharya, Kaushik; Stuart, Andrew; Anandkumar, Anima (2020). "Neural operator: Graph kernel network for partial differential equations". arXiv:2003.03485 [cs.LG].
  16. Li, Zongyi; Zheng, Hongkai; Kovachki, Nikola; Jin, David; Chen, Haoxuan; Liu, Burigede; Azizzadenesheli, Kamyar; Anandkumar, Anima (2021). "Physics-Informed Neural Operator for Learning Partial Differential Equations". arXiv:2111.03794 [cs.LG].