Neural operators
Neural operators are a class of deep learning architectures designed to learn maps between infinite-dimensional function spaces. Neural operators represent an extension of traditional artificial neural networks, marking a departure from the typical focus on learning mappings between finite-dimensional Euclidean spaces or finite sets. Neural operators directly learn operators in function spaces: they can receive input functions, and the output function can be evaluated at any discretization.[1]
The primary application of neural operators is in learning surrogate maps for the solution operators of partial differential equations (PDEs).[1] Standard PDE solvers can be time-consuming and computationally intensive, especially for complex systems. Neural operators have demonstrated improved performance in solving PDEs compared to existing machine learning methodologies, while being significantly faster than numerical solvers.[2]
Operator learning
Understanding and mapping relationships between function spaces has many applications in engineering and the sciences. In particular, the problem of solving partial differential equations can be cast as identifying a map between function spaces, such as from an initial condition to a time-evolved state. For other PDEs, this map takes a coefficient function as input and outputs a solution function. Operator learning is a machine learning paradigm for learning solution operators that map the input function to the output function.
Using traditional machine learning methods, addressing this problem would involve discretizing the infinite-dimensional input and output function spaces into finite-dimensional grids and applying standard learning models, such as neural networks. This approach reduces operator learning to finite-dimensional function learning and has limitations, such as the inability to generalize to discretizations beyond the grid used in training.
The primary properties of neural operators that differentiate them from traditional neural networks are discretization invariance and discretization convergence.[1] Unlike conventional neural networks, which are tied to the discretization of the training data, neural operators can adapt to various discretizations without re-training. This property improves the robustness and applicability of neural operators in different scenarios, providing consistent performance across different resolutions and grids.
Definition and formulation
Architecturally, neural operators are similar to feed-forward neural networks in the sense that they consist of alternating linear maps and nonlinearities. Since neural operators act on and output functions, they are instead formulated as a sequence of alternating linear integral operators on function spaces and pointwise nonlinearities.[1] Using an architecture analogous to finite-dimensional neural networks, similar universal approximation theorems have been proven for neural operators. In particular, it has been shown that neural operators can approximate any continuous operator on a compact set.
Neural operators seek to approximate some operator $\mathcal{G} : \mathcal{A} \to \mathcal{U}$ by building a parametric map $\mathcal{G}_\phi : \mathcal{A} \to \mathcal{U}$. Let $a \in \mathcal{A}$, where $\mathcal{A}$ denotes some input function space. Let $\mathcal{U}$ denote the output function space and let $u \in \mathcal{U}$. Neural operators are generally defined in the form

$$\mathcal{G}_\phi := \mathcal{Q} \circ \sigma(W_T + \mathcal{K}_T + b_T) \circ \cdots \circ \sigma(W_1 + \mathcal{K}_1 + b_1) \circ \mathcal{P},$$

where $\mathcal{P}$ and $\mathcal{Q}$ are the lifting (lifting the codomain of the input function to a higher-dimensional space) and projection (projecting the codomain of the intermediate function to the output codomain) operators, respectively. These operators act pointwise on functions and are typically parametrized as multilayer perceptrons. $\sigma$ is a pointwise nonlinearity, such as a rectified linear unit (ReLU) or a Gaussian error linear unit (GeLU). Each layer $t = 1, \dots, T$ has a respective local operator $W_t$ (usually parameterized by a pointwise neural network) and a bias function $b_t$. Given some intermediate functional representation $v_t$ with domain $D$ in a hidden layer, a kernel integral operator $\mathcal{K}_\phi$ is defined as

$$(\mathcal{K}_\phi v_t)(x) := \int_D \kappa_\phi(x, y, v_t(x), v_t(y)) \, v_t(y) \, dy,$$

where the kernel $\kappa_\phi$ is a learnable implicit neural network, parametrized by $\phi$.

In practice, the input function is often given to the neural operator at a fixed discretization for each data point. For the $j$-th data point, consider the setting where we have evaluations of $v_t$ at $n$ points $\{y_i\}_{i=1}^n$. Borrowing from Nyström integral approximation methods, such as Riemann-sum integration and Gaussian quadrature, the above integral operation can be computed as follows:

$$(\mathcal{K}_\phi v_t)(x) \approx \sum_{i=1}^n \kappa_\phi(x, y_i, v_t(x), v_t(y_i)) \, v_t(y_i) \, \Delta_{y_i},$$

where $\Delta_{y_i}$ is the sub-area volume or quadrature weight associated with the point $y_i$, and the approximation error vanishes as the discretization is refined. A simplified layer can therefore be computed as

$$v_{t+1}(x) \approx \sigma\left( \sum_{i=1}^n \kappa_\phi(x, y_i, v_t(x), v_t(y_i)) \, v_t(y_i) \, \Delta_{y_i} + W_t v_t(x) + b_t(x) \right).$$
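To make the layer above concrete, the following is a minimal, illustrative PyTorch sketch of a single neural operator layer on a one-dimensional domain, with the kernel integral approximated by a Riemann sum over the grid points. The class and variable names (KernelIntegralLayer, width, grid) are hypothetical and not taken from any particular library, and the kernel here depends only on the coordinates, a simplification of the general form above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelIntegralLayer(nn.Module):
    """One neural operator layer: kernel integral + local linear map + nonlinearity."""

    def __init__(self, width: int):
        super().__init__()
        # kappa_phi: small network mapping a coordinate pair (x, y) to a width-by-width matrix;
        # the general form may also take v_t(x) and v_t(y) as inputs
        self.kappa = nn.Sequential(nn.Linear(2, 64), nn.GELU(), nn.Linear(64, width * width))
        self.W = nn.Linear(width, width)  # local (pointwise) operator W_t
        self.width = width

    def forward(self, v: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
        # v: (n, width) values of the intermediate function v_t at the grid points
        # grid: (n, 1) coordinates y_i of a (possibly irregular) 1-D discretization
        n = grid.shape[0]
        pairs = torch.cat([grid.repeat_interleave(n, dim=0), grid.repeat(n, 1)], dim=-1)
        k = self.kappa(pairs).view(n, n, self.width, self.width)  # kappa_phi(x_i, y_j)
        delta = torch.full((n,), 1.0 / n)                         # quadrature weights Delta_{y_i}
        integral = torch.einsum("xyij,yj,y->xi", k, v, delta)     # sum_j kappa(x, y_j) v(y_j) Delta_j
        return F.gelu(integral + self.W(v))                       # sigma(K v + W v), bias omitted

# example: the same layer applied to two different resolutions of an input function
layer = KernelIntegralLayer(width=8)
for n in (32, 64):
    grid = torch.linspace(0, 1, n).unsqueeze(-1)
    out = layer(torch.randn(n, 8), grid)  # shape (n, 8)

Because the quadrature weights and the kernel network are defined for arbitrary grid points, the same learned layer can be evaluated at different resolutions, which is the discretization-invariance property discussed above.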
Many variants of this architecture have been developed in prior work, and some of them are supported in the neural operator library. The above approximation, together with parameterizing $\kappa_\phi$ as an implicit neural network, results in the graph neural operator (GNO).[3]

The various parameterizations of neural operators typically differ in their choice of parameterization for $\kappa_\phi$.
There have been various parameterizations of neural operators for different applications.[2][3] The most popular instantiation is the Fourier neural operator (FNO). FNO takes $\kappa_\phi(x, y, v_t(x), v_t(y)) := \kappa_\phi(x - y)$ and, by applying the convolution theorem, arrives at the following parameterization of the kernel integration:

$$(\mathcal{K}_\phi v_t)(x) = \mathcal{F}^{-1}\big(R_\phi \cdot (\mathcal{F} v_t)\big)(x),$$

where $\mathcal{F}$ represents the Fourier transform and $R_\phi$ represents the Fourier transform of some periodic kernel function $\kappa_\phi$. That is, FNO parameterizes the kernel integration directly in Fourier space, using a handful of Fourier modes. When the grid at which the input function is presented is uniform, the Fourier transform can be approximated by a summation, resulting in a discrete Fourier transform (DFT) with frequencies truncated at some specified threshold. The discrete Fourier transform can be computed using a fast Fourier transform (FFT) implementation, making the FNO architecture among the fastest and most sample-efficient neural operator architectures.
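Under the same caveats as before, a minimal sketch of the FNO kernel integration on a uniform one-dimensional grid is shown below: the layer applies an FFT, multiplies a truncated set of Fourier modes by learned complex weights standing in for $R_\phi$, and transforms back. The class and parameter names (SpectralConv1d, n_modes) are illustrative and do not reflect the API of any specific FNO implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralConv1d(nn.Module):
    """FNO-style layer: kernel integration as multiplication in Fourier space."""

    def __init__(self, width: int, n_modes: int):
        super().__init__()
        self.n_modes = n_modes  # number of retained (lowest) Fourier modes
        scale = 1.0 / (width * width)
        # R_phi: learnable complex weights, one width-by-width matrix per retained mode
        self.weights = nn.Parameter(scale * torch.randn(width, width, n_modes, dtype=torch.cfloat))
        self.W = nn.Linear(width, width)  # local linear operator W_t

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (batch, n, width) function values on a uniform grid of n points
        v_hat = torch.fft.rfft(v, dim=1)                   # discrete Fourier transform along the grid
        out_hat = torch.zeros_like(v_hat)
        out_hat[:, : self.n_modes] = torch.einsum(
            "bki,iok->bko", v_hat[:, : self.n_modes], self.weights
        )                                                  # R_phi * F(v) on the kept modes
        out = torch.fft.irfft(out_hat, n=v.shape[1], dim=1)  # inverse FFT back to physical space
        return F.gelu(out + self.W(v))                     # sigma(K v + W v), bias omitted

# the layer works for any uniform resolution with at least 2 * n_modes points, e.g.
layer = SpectralConv1d(width=32, n_modes=12)
out = layer(torch.randn(4, 128, 32))  # shape (4, 128, 32)

Because only a fixed number of Fourier modes carry learned parameters, the same weights can be reused when the input is sampled at a finer or coarser uniform grid.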
Training
Training neural operators is similar to the training process for a traditional neural network. Neural operators are typically trained in some $L^p$ norm or Sobolev norm. In particular, for a dataset $\{(a_i, u_i)\}_{i=1}^N$ of size $N$, neural operators minimize

$$\mathcal{L}(\{(a_i, u_i)\}_{i=1}^N) := \sum_{i=1}^N \left\| u_i - \mathcal{G}_\phi(a_i) \right\|_\mathcal{U}$$

in some norm $\|\cdot\|_\mathcal{U}$ on the output function space. Neural operators can be trained directly using backpropagation and gradient descent-based methods.
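As an illustration of this training setup, the following sketch minimizes a relative L2 loss with gradient descent (Adam) in PyTorch. The model and the random function pairs are trivial stand-ins: a real setup would compose lifting, integral layers such as those sketched above, and projection, and use actual PDE data. All names and hyperparameters are hypothetical.

import torch
import torch.nn as nn

def relative_l2(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # || u_i - G_phi(a_i) || / || u_i ||, averaged over the batch
    diff = torch.linalg.vector_norm(pred - target, dim=(-2, -1))
    norm = torch.linalg.vector_norm(target, dim=(-2, -1))
    return (diff / norm).mean()

# stand-in "operator" (a pointwise MLP) and random function pairs on a 64-point grid
model = nn.Sequential(nn.Linear(1, 64), nn.GELU(), nn.Linear(64, 1))
a = torch.randn(128, 64, 1)   # input functions a_i sampled on the grid
u = torch.randn(128, 64, 1)   # target output functions u_i

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(10):
    optimizer.zero_grad()
    loss = relative_l2(model(a), u)   # empirical loss in the chosen norm
    loss.backward()                   # standard backpropagation
    optimizer.step()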
Natural phenomena are often governed by physics equations, mostly in the form of PDEs.[4] Based on this idea, physics-informed neural networks utilize complete physics laws to fit neural networks to solutions of PDEs. A general extension of this idea to operator learning is the physics-informed neural operator (PINO) paradigm,[5] where supervision can also be channeled through physics equations, allowing learning to proceed with only partially available physics. PINO is mainly a supervised learning setting suited to cases where partial data or partial physics is available. In short, in PINO, in addition to the data loss mentioned above, a physics loss $\mathcal{L}_{PDE}(a, \mathcal{G}_\phi(a))$ is used for further training. The physics loss quantifies how much the predicted solution $\mathcal{G}_\phi(a)$ violates the PDE for the input $a$.
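A minimal sketch of such a physics term is given below, assuming the one-dimensional heat equation $u_t = \nu u_{xx}$ as a stand-in PDE and using automatic differentiation to form the residual at collocation points. The model, its input signature, and the collocation sampling are purely illustrative and not the PINO reference implementation.

import torch
import torch.nn as nn

# stand-in for an operator that maps an input function a and query coordinates (x, t)
# to a predicted solution value
model = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1))

def pde_residual(a: torch.Tensor, x: torch.Tensor, t: torch.Tensor, nu: float = 0.01) -> torch.Tensor:
    # mean squared residual of u_t - nu * u_xx, built with autograd
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(torch.cat([a, x, t], dim=-1))
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return ((u_t - nu * u_xx) ** 2).mean()  # how strongly the prediction violates the PDE

# physics loss at random collocation points; in PINO this term is added to the data loss,
# typically weighted by a hyperparameter
a = torch.randn(256, 1)
x, t = torch.rand(256, 1), torch.rand(256, 1)
physics_loss = pde_residual(a, x, t)
physics_loss.backward()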
References
1. Kovachki, Nikola; Li, Zongyi; Liu, Burigede; Azizzadenesheli, Kamyar; Bhattacharya, Kaushik; Stuart, Andrew; Anandkumar, Anima. "Neural operator: Learning maps between function spaces". Journal of Machine Learning Research. 24: 1–97.
2. Li, Zongyi; Kovachki, Nikola; Azizzadenesheli, Kamyar; Liu, Burigede; Bhattacharya, Kaushik; Stuart, Andrew; Anandkumar, Anima (2020). "Fourier neural operator for parametric partial differential equations". arXiv preprint arXiv:2010.08895.
3. Li, Zongyi; Kovachki, Nikola; Azizzadenesheli, Kamyar; Liu, Burigede; Bhattacharya, Kaushik; Stuart, Andrew; Anandkumar, Anima (2020). "Neural operator: Graph kernel network for partial differential equations". arXiv preprint arXiv:2003.03485.
4. Evans, L. C. (1998). Partial Differential Equations. Providence: American Mathematical Society. ISBN 0-8218-0772-2.
5. Li, Zongyi; Zheng, Hongkai; Kovachki, Nikola; Jin, David; Chen, Haoxuan; Liu, Burigede; Azizzadenesheli, Kamyar; Anandkumar, Anima (2021). "Physics-Informed Neural Operator for Learning Partial Differential Equations". arXiv preprint arXiv:2111.03794.