Graph neural network

A graph neural network (GNN) belongs to a class of artificial neural networks for processing data that can be represented as graphs. [1] [2] [3] [4] [5]

Basic building blocks of a graph neural network (GNN): (1) permutation equivariant layer, (2) local pooling layer, (3) global pooling (or readout) layer. Colors indicate features.

In the more general subject of "geometric deep learning", certain existing neural network architectures can be interpreted as GNNs operating on suitably defined graphs. [6] A convolutional neural network layer, in the context of computer vision, can be seen as a GNN applied to graphs whose nodes are pixels and only adjacent pixels are connected by edges in the graph. A transformer layer, in natural language processing, can be seen as a GNN applied to complete graphs whose nodes are words or tokens in a passage of natural language text.

The key design element of GNNs is the use of pairwise message passing, such that graph nodes iteratively update their representations by exchanging information with their neighbors. Since their inception, several different GNN architectures have been proposed, [2] [3] [7] [8] [9] which implement different flavors of message passing, [6] [10] beginning with recursive [2] and convolutional constructive [3] approaches. As of 2022, whether it is possible to define GNN architectures "going beyond" message passing, or whether every GNN can be built on message passing over suitably defined graphs, is an open research question. [11]

Relevant application domains for GNNs include natural language processing, [12] social networks, [13] citation networks, [14] molecular biology, [15] chemistry, [16] [17] physics, [18] and NP-hard combinatorial optimization problems. [19]

Several open source libraries implementing graph neural networks are available, such as PyTorch Geometric [20] (PyTorch), TensorFlow GNN [21] (TensorFlow), jraph [22] (Google JAX), and GraphNeuralNetworks.jl [23] /GeometricFlux.jl [24] (Julia, Flux).
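
As an illustration of how such a library is used, the following is a minimal sketch of a two-layer graph convolutional network built with PyTorch Geometric's documented GCNConv layer (the GCN architecture is described later in this article). The class name, hidden size, and feature dimensions are illustrative, and dataset loading and training are omitted.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv  # graph convolution layer from PyTorch Geometric

class TwoLayerGCN(torch.nn.Module):
    """Illustrative two-layer GCN for node classification."""
    def __init__(self, num_features, num_classes, hidden=16):
        super().__init__()
        self.conv1 = GCNConv(num_features, hidden)   # first graph convolution
        self.conv2 = GCNConv(hidden, num_classes)    # second graph convolution

    def forward(self, x, edge_index):
        # x: node feature matrix of shape [num_nodes, num_features]
        # edge_index: graph connectivity in COO format, shape [2, num_edges]
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)
```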

Architecture

The architecture of a generic GNN implements the following fundamental layers: [6]

  1. Permutation equivariant: a permutation equivariant layer maps a representation of a graph into an updated representation of the same graph. In the literature, permutation equivariant layers are implemented via pairwise message passing between graph nodes. [6] [11] Intuitively, in a message passing layer, nodes update their representations by aggregating the messages received from their immediate neighbours. As such, each message passing layer increases the receptive field of the GNN by one hop.
  2. Local pooling: a local pooling layer coarsens the graph via downsampling. Local pooling is used to increase the receptive field of a GNN, in a similar fashion to pooling layers in convolutional neural networks. Examples include k-nearest neighbours pooling, top-k pooling, [25] and self-attention pooling. [26]
  3. Global pooling: a global pooling layer, also known as a readout layer, provides a fixed-size representation of the whole graph. The global pooling layer must be permutation invariant, such that permutations in the ordering of graph nodes and edges do not alter the final output. [27] Examples include element-wise sum, mean, or maximum (a minimal sketch follows this list).
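
As a concrete illustration of the readout step, the following minimal NumPy sketch computes a sum readout and checks that reordering the nodes leaves the graph-level representation unchanged, i.e. that the readout is permutation invariant. The function and variable names are illustrative.

```python
import numpy as np

def sum_readout(H):
    """Global (readout) pooling: element-wise sum over all node representations."""
    return H.sum(axis=0)

H = np.random.default_rng(0).normal(size=(5, 8))    # 5 nodes, 8 features each
perm = np.random.default_rng(1).permutation(5)      # arbitrary reordering of the nodes
assert np.allclose(sum_readout(H), sum_readout(H[perm]))  # permutation invariance
```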

It has been demonstrated that GNNs cannot be more expressive than the Weisfeiler–Leman Graph Isomorphism Test. [28] [29] In practice, this means that there exist different graph structures (e.g., molecules with the same atoms but different bonds) that cannot be distinguished by GNNs. More powerful GNNs operating on higher-dimensional geometries such as simplicial complexes can be designed. [30] [31] [10] As of 2022, whether or not future architectures will overcome the message passing primitive is an open research question. [11]

Non-isomorphic graphs that cannot be distinguished by a GNN due to the limitations of the Weisfeiler–Leman Graph Isomorphism Test. Colors indicate node features.

Message passing layers

Node representation update in a message passing neural network (MPNN) layer. Node $\mathbf{x}_0$ receives messages sent by all of its immediate neighbours $\mathbf{x}_1$ to $\mathbf{x}_4$. Messages are computed via the message function $\phi$, which accounts for the features of both senders and receiver.

Message passing layers are permutation-equivariant layers mapping a graph into an updated representation of the same graph. Formally, they can be expressed as message passing neural networks (MPNNs). [6]

Let $G = (V, E)$ be a graph, where $V$ is the node set and $E$ is the edge set. Let $N_u$ be the neighbourhood of some node $u \in V$. Additionally, let $\mathbf{x}_u$ be the features of node $u \in V$, and $\mathbf{e}_{uv}$ be the features of edge $(u, v) \in E$. An MPNN layer can be expressed as follows: [6]

$\mathbf{h}_u = \phi\left(\mathbf{x}_u, \bigoplus_{v \in N_u} \psi(\mathbf{x}_u, \mathbf{x}_v, \mathbf{e}_{uv})\right)$

where $\phi$ and $\psi$ are differentiable functions (e.g., artificial neural networks), and $\bigoplus$ is a permutation invariant aggregation operator that can accept an arbitrary number of inputs (e.g., element-wise sum, mean, or max). In particular, $\phi$ and $\psi$ are referred to as update and message functions, respectively. Intuitively, in an MPNN computational block, graph nodes update their representations by aggregating the messages received from their neighbours.

The outputs of one or more MPNN layers are node representations for each node in the graph. Node representations can be employed for any downstream task, such as node/graph classification or edge prediction.
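
The following is a minimal NumPy sketch of a single MPNN layer under simplifying assumptions: sum aggregation, single-layer ReLU networks for the message and update functions, and no edge features. The function and variable names are illustrative, not from any particular library.

```python
import numpy as np

def mpnn_layer(x, edges, W_msg, W_upd):
    """One message passing step: h_u = ReLU(W_upd [x_u || sum_v psi(x_u, x_v)]).

    x: (num_nodes, d) node features; edges: list of directed (u, v) pairs;
    W_msg, W_upd: (d, 2d) weight matrices for message and update functions."""
    num_nodes, d = x.shape
    agg = np.zeros((num_nodes, d))
    for u, v in edges:
        # message psi(x_u, x_v) sent from neighbour v to node u
        msg = np.maximum(0.0, W_msg @ np.concatenate([x[u], x[v]]))
        agg[u] += msg                                  # permutation-invariant sum aggregation
    # update phi(x_u, aggregated messages)
    return np.maximum(0.0, np.concatenate([x, agg], axis=1) @ W_upd.T)

# toy usage: a 3-node path graph with 2-dimensional features
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]               # undirected edges listed both ways
rng = np.random.default_rng(0)
W_msg, W_upd = rng.normal(size=(2, 4)), rng.normal(size=(2, 4))
print(mpnn_layer(x, edges, W_msg, W_upd))
```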

Graph nodes in an MPNN update their representation by aggregating information from their immediate neighbours. As such, stacking $n$ MPNN layers means that one node will be able to communicate with nodes that are at most $n$ hops away. In principle, to ensure that every node receives information from every other node, one would need to stack a number of MPNN layers equal to the graph diameter. However, stacking many MPNN layers may cause issues such as oversmoothing [32] and oversquashing. [33] Oversmoothing refers to the issue of node representations becoming indistinguishable. Oversquashing refers to the bottleneck that is created by squeezing long-range dependencies into fixed-size representations. Countermeasures such as skip connections [8] [34] (as in residual neural networks), gated update rules [35] and jumping knowledge [36] can mitigate oversmoothing. Modifying the final layer to be a fully-adjacent layer, i.e., by considering the graph as a complete graph, can mitigate oversquashing in problems where long-range dependencies are required. [33]

Other "flavours" of MPNN have been developed in the literature, [6] such as graph convolutional networks [7] and graph attention networks, [9] whose definitions can be expressed in terms of the MPNN formalism.

Graph convolutional network

The graph convolutional network (GCN) was first introduced by Thomas Kipf and Max Welling in 2017. [7]

A GCN layer defines a first-order approximation of a localized spectral filter on graphs. GCNs can be understood as a generalization of convolutional neural networks to graph-structured data.

The formal expression of a GCN layer reads as follows:

$\mathbf{H} = \sigma\left(\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{X} \mathbf{\Theta}\right)$

where $\mathbf{H}$ is the matrix of node representations $\mathbf{h}_u$, $\mathbf{X}$ is the matrix of node features $\mathbf{x}_u$, $\sigma(\cdot)$ is an activation function (e.g., ReLU), $\tilde{\mathbf{A}}$ is the graph adjacency matrix with the addition of self-loops, $\tilde{\mathbf{D}}$ is the graph degree matrix with the addition of self-loops, and $\mathbf{\Theta}$ is a matrix of trainable parameters.

In particular, let $\mathbf{A}$ be the graph adjacency matrix: then, one can define $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ and $\tilde{\mathbf{D}}_{ii} = \sum_{j \in V} \tilde{A}_{ij}$, where $\mathbf{I}$ denotes the identity matrix. This normalization ensures that the eigenvalues of $\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}}$ are bounded in the range $[0, 1]$, avoiding numerical instabilities and exploding/vanishing gradients.
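
A minimal NumPy sketch of this propagation rule, assuming an unweighted graph given by its adjacency matrix and ReLU as the activation; the function and variable names are illustrative.

```python
import numpy as np

def gcn_layer(A, X, Theta):
    """Single GCN layer: H = ReLU(D~^(-1/2) A~ D~^(-1/2) X Theta).

    A: (n, n) adjacency matrix, X: (n, d_in) node features,
    Theta: (d_in, d_out) trainable weights."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                                     # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))    # D~^(-1/2)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt                   # symmetrically normalized adjacency
    return np.maximum(0.0, A_hat @ X @ Theta)                   # ReLU activation

# toy usage: 4-node cycle graph, 3-dimensional features, 2 output channels
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))
Theta = np.random.default_rng(1).normal(size=(3, 2))
print(gcn_layer(A, X, Theta))
```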

A limitation of GCNs is that they do not allow multidimensional edge features $\mathbf{e}_{uv}$. [7] It is however possible to associate scalar weights $w_{uv}$ to each edge by imposing $A_{uv} = w_{uv}$, i.e., by setting each nonzero entry in the adjacency matrix equal to the weight of the corresponding edge.

Graph attention network

The graph attention network (GAT) was introduced by Petar Veličković et al. in 2018. [9]

A graph attention network combines a graph neural network with an attention mechanism. Attention allows the network to weight the information coming from different neighbours according to its relevance, rather than treating all neighbours equally.

A multi-head GAT layer can be expressed as follows:

$\mathbf{h}_u = \overset{K}{\underset{k=1}{\Big\Vert}} \sigma\left(\sum_{v \in N_u} \alpha_{uv}^{(k)} \mathbf{W}^{(k)} \mathbf{x}_v\right)$

where $K$ is the number of attention heads, $\Big\Vert$ denotes vector concatenation, $\sigma(\cdot)$ is an activation function (e.g., ReLU), $\alpha_{uv}^{(k)}$ are attention coefficients, and $\mathbf{W}^{(k)}$ is a matrix of trainable parameters for the $k$-th attention head.

For the final GAT layer, the outputs from each attention head are averaged before the application of the activation function. Formally, the final GAT layer can be written as:

$\mathbf{h}_u = \sigma\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{v \in N_u} \alpha_{uv}^{(k)} \mathbf{W}^{(k)} \mathbf{x}_v\right)$

Attention in machine learning is a technique that mimics cognitive attention. In the context of learning on graphs, the attention coefficient $\alpha_{uv}$ measures how important node $u \in V$ is to node $v \in V$.

Normalized attention coefficients are computed as follows:

$\alpha_{uv} = \frac{\exp\left(\text{LeakyReLU}\left(\mathbf{a}^{T}\left[\mathbf{W}\mathbf{x}_u \Vert \mathbf{W}\mathbf{x}_v\right]\right)\right)}{\sum_{z \in N_u} \exp\left(\text{LeakyReLU}\left(\mathbf{a}^{T}\left[\mathbf{W}\mathbf{x}_u \Vert \mathbf{W}\mathbf{x}_z\right]\right)\right)}$

where $\mathbf{a}$ is a vector of learnable weights, $\cdot^{T}$ indicates transposition, $\Vert$ denotes vector concatenation, and $\text{LeakyReLU}$ is a modified ReLU activation function. Attention coefficients are normalized to make them easily comparable across different nodes. [9]
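
A minimal NumPy sketch of a single-head GAT layer following the equations above, assuming an unweighted graph in which every node has at least one neighbour; LeakyReLU slope, activation, and names are illustrative. The multi-head version would concatenate (or, in the final layer, average) several such heads.

```python
import numpy as np

def gat_layer(A, X, W, a, slope=0.2):
    """Single-head GAT layer: softmax-normalized attention over each node's neighbours.

    A: (n, n) adjacency matrix, X: (n, d_in) node features,
    W: (d_out, d_in) projection weights, a: (2*d_out,) attention vector."""
    H = X @ W.T                                        # projected features, shape (n, d_out)
    n = A.shape[0]
    scores = np.full((n, n), -np.inf)                  # -inf masks non-neighbours in the softmax
    for u in range(n):
        for v in range(n):
            if A[u, v] > 0:                            # node u attends only over its neighbours
                e = a @ np.concatenate([H[u], H[v]])
                scores[u, v] = e if e > 0 else slope * e   # LeakyReLU
    # row-wise softmax -> attention coefficients alpha_uv (assumes each node has a neighbour)
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return np.maximum(0.0, alpha @ H)                  # ReLU of the attention-weighted sum
```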

A GCN can be seen as a special case of a GAT where attention coefficients are not learnable, but fixed and equal to the edge weights $w_{uv}$.

Gated graph sequence neural network

The gated graph sequence neural network (GGS-NN) was introduced by Yujia Li et al. in 2015. [35] The GGS-NN extends the GNN formulation by Scarselli et al. [2] to output sequences. The message passing framework is implemented as an update rule to a gated recurrent unit (GRU) cell.

A GGS-NN can be expressed as follows:

$\mathbf{h}_u^{(0)} = \mathbf{x}_u \Vert \mathbf{0}$
$\mathbf{m}_u^{(l+1)} = \sum_{v \in N_u} \mathbf{\Theta} \mathbf{h}_v^{(l)}$
$\mathbf{h}_u^{(l+1)} = \text{GRU}\left(\mathbf{m}_u^{(l+1)}, \mathbf{h}_u^{(l)}\right)$

where $\Vert$ denotes vector concatenation, $\mathbf{0}$ is a vector of zeros, $\mathbf{\Theta}$ is a matrix of learnable parameters, $\text{GRU}$ is a gated recurrent unit cell, and $l$ denotes the sequence index. In a GGS-NN, the node representations are regarded as the hidden states of a GRU cell. The initial node features $\mathbf{x}_u$ are zero-padded up to the hidden state dimension of the GRU cell. The same GRU cell is used for updating representations for each node.
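
A minimal NumPy sketch of one GGS-NN propagation step, with the GRU cell written out explicitly; the weight shapes (all square, with message dimension equal to the hidden dimension) and names are illustrative simplifications.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(m, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """Standard GRU update with the aggregated message m as input and h as hidden state."""
    z = sigmoid(Wz @ m + Uz @ h)                 # update gate
    r = sigmoid(Wr @ m + Ur @ h)                 # reset gate
    h_tilde = np.tanh(Wh @ m + Uh @ (r * h))     # candidate hidden state
    return (1 - z) * h + z * h_tilde

def ggsnn_step(A, H, Theta, gru_params):
    """One propagation step: m_u = sum over neighbours of Theta h_v, then h_u = GRU(m_u, h_u).

    A: (n, n) adjacency matrix, H: (n, d) node hidden states, Theta: (d, d) weights,
    gru_params: tuple of six (d, d) GRU weight matrices (Wz, Uz, Wr, Ur, Wh, Uh)."""
    M = A @ H @ Theta.T                          # neighbour aggregation followed by Theta
    return np.stack([gru_cell(M[u], H[u], *gru_params) for u in range(A.shape[0])])
```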

Local pooling layers

Local pooling layers coarsen the graph via downsampling. Several learnable local pooling strategies have been proposed. [27] In each case, the input is a graph represented by a matrix $\mathbf{X}$ of node features and by the graph adjacency matrix $\mathbf{A}$. The output is the new matrix $\mathbf{X}'$ of node features and the new graph adjacency matrix $\mathbf{A}'$.

Top-k pooling

We first set

$\mathbf{y} = \frac{\mathbf{X} \mathbf{p}}{\Vert \mathbf{p} \Vert},$

where $\mathbf{p}$ is a learnable projection vector. The projection vector $\mathbf{p}$ computes a scalar projection value for each graph node.

The top-k pooling layer [25] can then be formalised as follows:

$\mathbf{X}' = (\mathbf{X} \odot \text{sigmoid}(\mathbf{y}))_{\mathbf{i}}$
$\mathbf{A}' = \mathbf{A}_{\mathbf{i}, \mathbf{i}}$

where $\mathbf{i} = \text{top}_k(\mathbf{y})$ is the subset of nodes with the top-k highest projection scores, $\odot$ denotes element-wise matrix multiplication, and $\text{sigmoid}(\cdot)$ is the sigmoid function. In other words, the nodes with the top-k highest projection scores are retained in the new adjacency matrix $\mathbf{A}'$. The $\text{sigmoid}(\cdot)$ operation makes the projection vector $\mathbf{p}$ trainable by backpropagation, which otherwise would produce discrete outputs. [25]
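
A minimal NumPy sketch of this operation; the function and variable names are illustrative.

```python
import numpy as np

def top_k_pooling(X, A, p, k):
    """Top-k pooling: keep the k nodes with the largest projections onto p.

    X: (n, d) node features, A: (n, n) adjacency matrix, p: (d,) projection vector."""
    y = X @ p / np.linalg.norm(p)              # scalar projection score per node
    idx = np.argsort(y)[-k:]                   # indices of the top-k scores
    gate = 1.0 / (1.0 + np.exp(-y[idx]))       # sigmoid gating keeps p differentiable
    X_new = X[idx] * gate[:, None]             # gated features of the retained nodes
    A_new = A[np.ix_(idx, idx)]                # adjacency of the induced subgraph
    return X_new, A_new
```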

Self-attention pooling

We first set

$\mathbf{y} = \text{GNN}(\mathbf{X}, \mathbf{A}),$

where $\text{GNN}$ is a generic permutation equivariant GNN layer (e.g., GCN, GAT, MPNN).

The self-attention pooling layer [26] can then be formalised as follows:

$\mathbf{X}' = (\mathbf{X} \odot \mathbf{y})_{\mathbf{i}}$
$\mathbf{A}' = \mathbf{A}_{\mathbf{i}, \mathbf{i}}$

where $\mathbf{i} = \text{top}_k(\mathbf{y})$ is the subset of nodes with the top-k highest projection scores, and $\odot$ denotes element-wise matrix multiplication.

The self-attention pooling layer can be seen as an extension of the top-k pooling layer. Unlike top-k pooling, the self-attention scores computed in self-attention pooling account for both the graph features and the graph topology.
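
A minimal NumPy sketch, reusing any permutation equivariant layer (for example, the gcn_layer sketch above with a single output channel) to produce the per-node scores; names are illustrative.

```python
import numpy as np

def self_attention_pooling(X, A, score_gnn, k):
    """Self-attention pooling: a GNN layer produces one score per node,
    and the k highest-scoring nodes are retained.

    score_gnn: callable mapping (X, A) to an (n, 1) score matrix, e.g.
    lambda X, A: gcn_layer(A, X, Theta_score) with Theta_score of shape (d, 1)."""
    y = score_gnn(X, A).reshape(-1)            # one attention score per node
    idx = np.argsort(y)[-k:]                   # indices of the top-k scores
    X_new = X[idx] * y[idx][:, None]           # scores gate the retained node features
    A_new = A[np.ix_(idx, idx)]                # adjacency of the induced subgraph
    return X_new, A_new
```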

Applications

Protein folding

Graph neural networks are one of the main building blocks of AlphaFold, an artificial intelligence program developed by Google's DeepMind for solving the protein folding problem in biology. AlphaFold achieved first place in several CASP competitions. [37] [38] [36]

Social networks

Social networks are a major application domain for GNNs due to their natural representation as social graphs. GNNs are used to develop recommender systems based on both social relations and item relations. [39] [13]

Combinatorial optimization

GNNs are used as fundamental building blocks for several combinatorial optimization algorithms. [40] Examples include computing shortest paths or Eulerian circuits for a given graph, [35] deriving chip placements superior or competitive to handcrafted human solutions, [41] and improving expert-designed branching rules in branch and bound. [42]

Cyber security

When viewed as a graph, a network of computers can be analyzed with GNNs for anomaly detection. Anomalies within provenance graphs often correlate to malicious activity within the network. GNNs have been used to identify these anomalies on individual nodes [43] and within paths [44] to detect malicious processes, or on the edge level [45] to detect lateral movement.

References

  1. Wu, Lingfei; Cui, Peng; Pei, Jian; Zhao, Liang (2022). "Graph Neural Networks: Foundations, Frontiers, and Applications". Springer Singapore: 725.
  2. Scarselli, Franco; Gori, Marco; Tsoi, Ah Chung; Hagenbuchner, Markus; Monfardini, Gabriele (2009). "The Graph Neural Network Model". IEEE Transactions on Neural Networks. 20 (1): 61–80. doi:10.1109/TNN.2008.2005605. ISSN 1941-0093. PMID 19068426. S2CID 206756462.
  3. Micheli, Alessio (2009). "Neural Network for Graphs: A Contextual Constructive Approach". IEEE Transactions on Neural Networks. 20 (3): 498–511. doi:10.1109/TNN.2008.2010350. ISSN 1045-9227. PMID 19193509. S2CID 17486263.
  4. Sanchez-Lengeling, Benjamin; Reif, Emily; Pearce, Adam; Wiltschko, Alex (2021-09-02). "A Gentle Introduction to Graph Neural Networks". Distill. 6 (9): e33. doi: 10.23915/distill.00033 . ISSN   2476-0757.
  5. Daigavane, Ameya; Ravindran, Balaraman; Aggarwal, Gaurav (2021-09-02). "Understanding Convolutions on Graphs". Distill. 6 (9): e32. doi: 10.23915/distill.00032 . ISSN   2476-0757. S2CID   239678898.
  6. Bronstein, Michael M.; Bruna, Joan; Cohen, Taco; Veličković, Petar (May 4, 2021). "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges". arXiv:2104.13478 [cs.LG].
  7. Kipf, Thomas N.; Welling, Max (2017). "Semi-Supervised Classification with Graph Convolutional Networks". International Conference on Learning Representations. arXiv:1609.02907.
  8. Hamilton, William; Ying, Rex; Leskovec, Jure (2017). "Inductive Representation Learning on Large Graphs" (PDF). Neural Information Processing Systems. 31. arXiv:1706.02216 – via Stanford.
  9. Veličković, Petar; Cucurull, Guillem; Casanova, Arantxa; Romero, Adriana; Liò, Pietro; Bengio, Yoshua (2018-02-04). "Graph Attention Networks". arXiv:1710.10903 [stat.ML].
  10. Hajij, M.; Zamzmi, G.; Papamarkou, T.; Miolane, N.; Guzmán-Sáenz, A.; Ramamurthy, K. N.; Schaub, M. T. (2022). "Topological deep learning: Going beyond graph data". arXiv:2206.00606.
  11. Veličković, Petar (2022). "Message passing all the way up". arXiv:2202.11097 [cs.LG].
  12. Wu, Lingfei; Chen, Yu; Shen, Kai; Guo, Xiaojie; Gao, Hanning; Li, Shucheng; Pei, Jian; Long, Bo (2023). "Graph Neural Networks for Natural Language Processing: A Survey". Foundations and Trends in Machine Learning. 16 (2): 119–328. arXiv:2106.06090. doi:10.1561/2200000096.
  13. Ying, Rex; He, Ruining; Chen, Kaifeng; Eksombatchai, Pong; Hamilton, William L.; Leskovec, Jure (2018). Graph Convolutional Neural Networks for Web-Scale Recommender Systems. pp. 974–983. arXiv:1806.01973. doi:10.1145/3219819.3219890. ISBN 9781450355520. S2CID 46949657.
  14. "Stanford Large Network Dataset Collection". snap.stanford.edu. Retrieved 2021-07-05.
  15. Zhang, Weihang; Cui, Yang; Liu, Bowen; Loza, Martin; Park, Sung-Joon; Nakai, Kenta (1 December 2023). "HyGAnno: Hybrid graph neural network-based cell type annotation for single-cell ATAC sequencing data". doi:10.1101/2023.11.29.569114.{{cite journal}}: Cite journal requires |journal= (help)
  16. Gilmer, Justin; Schoenholz, Samuel S.; Riley, Patrick F.; Vinyals, Oriol; Dahl, George E. (2017-07-17). "Neural Message Passing for Quantum Chemistry". Proceedings of Machine Learning Research: 1263–1272. arXiv: 1704.01212 .
  17. Coley, Connor W.; Jin, Wengong; Rogers, Luke; Jamison, Timothy F.; Jaakkola, Tommi S.; Green, William H.; Barzilay, Regina; Jensen, Klavs F. (2019-01-02). "A graph-convolutional neural network model for the prediction of chemical reactivity". Chemical Science. 10 (2): 370–377. doi: 10.1039/C8SC04228D . ISSN   2041-6539. PMC   6335848 . PMID   30746086.
  18. Qasim, Shah Rukh; Kieseler, Jan; Iiyama, Yutaro; Pierini, Maurizio (2019). "Learning representations of irregular particle-detector geometry with distance-weighted graph networks". The European Physical Journal C. 79 (7). arXiv:1902.07987. doi:10.1140/epjc/s10052-019-7113-9. S2CID 88518244.
  19. Li, Zhuwen; Chen, Qifeng; Koltun, Vladlen (2018). "Combinatorial optimization with graph convolutional networks and guided tree search". Neural Information Processing Systems. 31: 537–546. arXiv: 1810.10659 .
  20. Matthias, Fey; Lenssen, Jan E. (2019). "Fast Graph Representation Learning with PyTorch Geometric". arXiv: 1903.02428 [cs.LG].
  21. "Tensorflow GNN". GitHub . Retrieved 30 June 2022.
  22. "jraph". GitHub . Retrieved 30 June 2022.
  23. Lucibello, Carlo (2021). "GraphNeuralNetworks.jl" . Retrieved 2023-09-21.
  24. FluxML/GeometricFlux.jl, FluxML, 2024-01-31, retrieved 2024-02-03
  25. Gao, Hongyang; Ji, Shuiwang (2019). "Graph U-Nets". arXiv:1905.05178 [cs.LG].
  26. Lee, Junhyun; Lee, Inyeop; Kang, Jaewoo (2019). "Self-Attention Graph Pooling". arXiv:1904.08082 [cs.LG].
  27. Liu, Chuang; Zhan, Yibing; Li, Chang; Du, Bo; Wu, Jia; Hu, Wenbin; Liu, Tongliang; Tao, Dacheng (2022). "Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities". arXiv:2204.07321 [cs.LG].
  28. Douglas, B. L. (2011-01-27). "The Weisfeiler–Lehman Method and Graph Isomorphism Testing". arXiv: 1101.5211 [math.CO].
  29. Xu, Keyulu; Hu, Weihua; Leskovec, Jure; Jegelka, Stefanie (2019-02-22). "How Powerful are Graph Neural Networks?". arXiv: 1810.00826 [cs.LG].
  30. Bodnar, Christian; Frasca, Fabrizio; Guang Wang, Yu; Otter, Nina; Montúfar, Guido; Liò, Pietro; Bronstein, Michael (2021). "Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks". arXiv: 2103.03212 [cs.LG].
  31. Grady, Leo; Polimeni, Jonathan (2011). Discrete Calculus: Applied Analysis on Graphs for Computational Science (PDF). Springer.
  32. Chen, Deli; Lin, Yankai; Li, Wei; Li, Peng; Zhou, Jie; Sun, Xu (2020). "Measuring and Relieving the Over-smoothing Problem for Graph Neural Networks from the Topological View". Proceedings of the AAAI Conference on Artificial Intelligence. 34 (4): 3438–3445. arXiv: 1909.03211 . doi:10.1609/aaai.v34i04.5747. S2CID   202539008.
  33. Alon, Uri; Yahav, Eran (2021). "On the Bottleneck of Graph Neural Networks and its Practical Implications". arXiv:2006.05205 [cs.LG].
  34. Xu, Keyulu; Zhang, Mozhi; Jegelka, Stephanie; Kawaguchi, Kenji (2021). "Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth". arXiv: 2105.04550 [cs.LG].
  35. Li, Yujia; Tarlow, Daniel; Brockschmidt, Mark; Zemel, Richard (2016). "Gated Graph Sequence Neural Networks". arXiv:1511.05493 [cs.LG].
  36. Xu, Keyulu; Li, Chengtao; Tian, Yonglong; Sonobe, Tomohiro; Kawarabayashi, Ken-ichi; Jegelka, Stefanie (2018). "Representation Learning on Graphs with Jumping Knowledge Networks". arXiv:1806.03536 [cs.LG].
  37. Sample, Ian (2 December 2018). "Google's DeepMind predicts 3D shapes of proteins". The Guardian. Retrieved 30 November 2020.
  38. "DeepMind's protein-folding AI has solved a 50-year-old grand challenge of biology". MIT Technology Review. Retrieved 30 November 2020.
  39. Fan, Wenqi; Ma, Yao; Li, Qing; He, Yuan; Zhao, Eric; Tang, Jiliang; Yin, Dawei (2019). Graph Neural Networks for Social Recommendation. pp. 417–426. arXiv: 1902.07243 . doi:10.1145/3308558.3313488. hdl:10397/81232. ISBN   9781450366748. S2CID   67769538.
  40. Cappart, Quentin; Chételat, Didier; Khalil, Elias; Lodi, Andrea; Morris, Christopher; Veličković, Petar (2021). "Combinatorial optimization and reasoning with graph neural networks". arXiv: 2102.09544 [cs.LG].
  41. Mirhoseini, Azalia; Goldie, Anna; Yazgan, Mustafa; Jiang, Joe Wenjie; Songhori, Ebrahim; Wang, Shen; Lee, Young-Joon; Johnson, Eric; Pathak, Omkar; Nazi, Azade; Pak, Jiwoo; Tong, Andy; Srinivasa, Kavya; Hang, William; Tuncer, Emre; Le, Quoc V.; Laudon, James; Ho, Richard; Carpenter, Roger; Dean, Jeff (2021). "A graph placement methodology for fast chip design". Nature. 594 (7862): 207–212. doi:10.1038/s41586-021-03544-w. PMID   34108699. S2CID   235395490.
  42. Gasse, Maxime; Chételat, Didier; Ferroni, Nicola; Charlin, Laurent; Lodi, Andrea (2019). "Exact Combinatorial Optimization with Graph Convolutional Neural Networks". arXiv: 1906.01629 [cs.LG].
  43. Wang, Su; Wang, Zhiliang; Zhou, Tao; Sun, Hongbin; Yin, Xia; Han, Dongqi; Zhang, Han; Shi, Xingang; Yang, Jiahai (2022). "Threatrace: Detecting and Tracing Host-Based Threats in Node Level Through Provenance Graph Learning". IEEE Transactions on Information Forensics and Security. 17: 3972–3987. arXiv: 2111.04333 . doi:10.1109/TIFS.2022.3208815. ISSN   1556-6021. S2CID   243847506.
  44. Wang, Qi; Hassan, Wajih Ul; Li, Ding; Jee, Kangkook; Yu, Xiao (2020). "You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis". Network and Distributed Systems Security (NDSS) Symposium. doi: 10.14722/ndss.2020.24167 . ISBN   978-1-891562-61-7. S2CID   211267791.
  45. King, Isaiah J.; Huang, H. Howie (2022). "Euler: Detecting Network Lateral Movement via Scalable Temporal Link Prediction" (PDF). In Proceedings of the 29th Network and Distributed Systems Security Symposium (NDSS). doi:10.14722/ndss.2022.24107. S2CID   248221601.