Graph neural network

Last updated

A graph neural network (GNN) belongs to a class of artificial neural networks for processing data that can be represented as graphs. [1] [2] [3] [4] [5]

Contents

Basic building blocks of a graph neural network (GNN).
(
1
)
{\displaystyle (1)}
Permutation equivariant layer.
(
2
)
{\displaystyle (2)}
Local pooling layer.
(
3
)
{\displaystyle (3)}
Global pooling (or readout) layer. Colors indicate features. GNN building blocks.png
Basic building blocks of a graph neural network (GNN). Permutation equivariant layer. Local pooling layer. Global pooling (or readout) layer. Colors indicate features.

In the more general subject of "geometric deep learning", certain existing neural network architectures can be interpreted as GNNs operating on suitably defined graphs. [6] A convolutional neural network layer, in the context of computer vision, can be considered a GNN applied to graphs whose nodes are pixels and only adjacent pixels are connected by edges in the graph. A transformer layer, in natural language processing, can be considered a GNN applied to complete graphs whose nodes are words or tokens in a passage of natural language text.

The key design element of GNNs is the use of pairwise message passing, such that graph nodes iteratively update their representations by exchanging information with their neighbors. Several GNN architectures have been proposed, [2] [3] [7] [8] [9] which implement different flavors of message passing, [6] [10] started by recursive [2] or convolutional constructive [3] approaches. As of 2022, it is an open question whether it is possible to define GNN architectures "going beyond" message passing, or instead every GNN can be built on message passing over suitably defined graphs. [11]

Relevant application domains for GNNs include natural language processing, [12] social networks, [13] citation networks, [14] molecular biology, [15] chemistry, [16] [17] physics [18] and NP-hard combinatorial optimization problems. [19]

Open source libraries implementing GNNs include PyTorch Geometric [20] (PyTorch), TensorFlow GNN [21] (TensorFlow), Deep Graph Library [22] (framework agnostic), jraph [23] (Google JAX), and GraphNeuralNetworks.jl [24] /GeometricFlux.jl [25] (Julia, Flux).

Architecture

The architecture of a generic GNN implements the following fundamental layers: [6]

  1. Permutation equivariant: a permutation equivariant layer maps a representation of a graph into an updated representation of the same graph. In the literature, permutation equivariant layers are implemented via pairwise message passing between graph nodes. [6] [11] Intuitively, in a message passing layer, nodes update their representations by aggregating the messages received from their immediate neighbours. As such, each message passing layer increases the receptive field of the GNN by one hop.
  2. Local pooling: a local pooling layer coarsens the graph via downsampling. Local pooling is used to increase the receptive field of a GNN, in a similar fashion to pooling layers in convolutional neural networks. Examples include k-nearest neighbours pooling, top-k pooling, [26] and self-attention pooling. [27]
  3. Global pooling: a global pooling layer, also known as readout layer, provides fixed-size representation of the whole graph. The global pooling layer must be permutation invariant, such that permutations in the ordering of graph nodes and edges do not alter the final output. [28] Examples include element-wise sum, mean or maximum.

It has been demonstrated that GNNs cannot be more expressive than the Weisfeiler–Leman Graph Isomorphism Test. [29] [30] In practice, this means that there exist different graph structures (e.g., molecules with the same atoms but different bonds) that cannot be distinguished by GNNs. More powerful GNNs operating on higher-dimension geometries such as simplicial complexes can be designed. [31] [32] [10] As of 2022, whether or not future architectures will overcome the message passing primitive is an open research question. [11]

Non-isomorphic graphs that cannot be distinguished by a GNN due to the limitations of the Weisfeiler-Lehman Graph Isomorphism Test. Colors indicate node features. GNN representational limits.png
Non-isomorphic graphs that cannot be distinguished by a GNN due to the limitations of the Weisfeiler-Lehman Graph Isomorphism Test. Colors indicate node features.

Message passing layers

Node representation update in a Message Passing Neural Network (MPNN) layer. Node
x
0
{\displaystyle \mathbf {x} _{0}}
receives messages sent by all of its immediate neighbours
x
1
{\displaystyle \mathbf {x} _{1}}
to
x
4
{\displaystyle \mathbf {x} _{4}}
. Messages are computing via the message function
ps
{\displaystyle \psi }
, which accounts for the features of both senders and receiver. Message Passing Neural Network.png
Node representation update in a Message Passing Neural Network (MPNN) layer. Node receives messages sent by all of its immediate neighbours to . Messages are computing via the message function , which accounts for the features of both senders and receiver.

Message passing layers are permutation-equivariant layers mapping a graph into an updated representation of the same graph. Formally, they can be expressed as message passing neural networks (MPNNs). [6]

Let be a graph, where is the node set and is the edge set. Let be the neighbourhood of some node . Additionally, let be the features of node , and be the features of edge . An MPNN layer can be expressed as follows: [6]

where and are differentiable functions (e.g., artificial neural networks), and is a permutation invariant aggregation operator that can accept an arbitrary number of inputs (e.g., element-wise sum, mean, or max). In particular, and are referred to as update and message functions, respectively. Intuitively, in an MPNN computational block, graph nodes update their representations by aggregating the messages received from their neighbours.

The outputs of one or more MPNN layers are node representations for each node in the graph. Node representations can be employed for any downstream task, such as node/graph classification or edge prediction.

Graph nodes in an MPNN update their representation aggregating information from their immediate neighbours. As such, stacking MPNN layers means that one node will be able to communicate with nodes that are at most "hops" away. In principle, to ensure that every node receives information from every other node, one would need to stack a number of MPNN layers equal to the graph diameter. However, stacking many MPNN layers may cause issues such as oversmoothing [33] and oversquashing. [34] Oversmoothing refers to the issue of node representations becoming indistinguishable. Oversquashing refers to the bottleneck that is created by squeezing long-range dependencies into fixed-size representations. Countermeasures such as skip connections [8] [35] (as in residual neural networks), gated update rules [36] and jumping knowledge [37] can mitigate oversmoothing. Modifying the final layer to be a fully-adjacent layer, i.e., by considering the graph as a complete graph, can mitigate oversquashing in problems where long-range dependencies are required. [34]

Other "flavours" of MPNN have been developed in the literature, [6] such as graph convolutional networks [7] and graph attention networks, [9] whose definitions can be expressed in terms of the MPNN formalism.

Graph convolutional network

The graph convolutional network (GCN) was first introduced by Thomas Kipf and Max Welling in 2017. [7]

A GCN layer defines a first-order approximation of a localized spectral filter on graphs. GCNs can be understood as a generalization of convolutional neural networks to graph-structured data.

The formal expression of a GCN layer reads as follows:

where is the matrix of node representations , is the matrix of node features , is an activation function (e.g., ReLU), is the graph adjacency matrix with the addition of self-loops, is the graph degree matrix with the addition of self-loops, and is a matrix of trainable parameters.

In particular, let be the graph adjacency matrix: then, one can define and , where denotes the identity matrix. This normalization ensures that the eigenvalues of are bounded in the range , avoiding numerical instabilities and exploding/vanishing gradients.

A limitation of GCNs is that they do not allow multidimensional edge features . [7] It is however possible to associate scalar weights to each edge by imposing , i.e., by setting each nonzero entry in the adjacency matrix equal to the weight of the corresponding edge.

Graph attention network

The graph attention network (GAT) was introduced by Petar Veličković et al. in 2018. [9]

Graph attention network is a combination of a graph neural network and an attention layer. The implementation of attention layer in graphical neural networks helps provide attention or focus to the important information from the data instead of focusing on the whole data.

A multi-head GAT layer can be expressed as follows:

where is the number of attention heads, denotes vector concatenation, is an activation function (e.g., ReLU), are attention coefficients, and is a matrix of trainable parameters for the -th attention head.

For the final GAT layer, the outputs from each attention head are averaged before the application of the activation function. Formally, the final GAT layer can be written as:

Attention in Machine Learning is a technique that mimics cognitive attention. In the context of learning on graphs, the attention coefficient measures how important is node to node .

Normalized attention coefficients are computed as follows:

where is a vector of learnable weights, indicates transposition, are the edge features (if present), and is a modified ReLU activation function. Attention coefficients are normalized to make them easily comparable across different nodes. [9]

A GCN can be seen as a special case of a GAT where attention coefficients are not learnable, but fixed and equal to the edge weights .

Gated graph sequence neural network

The gated graph sequence neural network (GGS-NN) was introduced by Yujia Li et al. in 2015. [36] The GGS-NN extends the GNN formulation by Scarselli et al. [2] to output sequences. The message passing framework is implemented as an update rule to a gated recurrent unit (GRU) cell.

A GGS-NN can be expressed as follows:

where denotes vector concatenation, is a vector of zeros, is a matrix of learnable parameters, is a GRU cell, and denotes the sequence index. In a GGS-NN, the node representations are regarded as the hidden states of a GRU cell. The initial node features are zero-padded up to the hidden state dimension of the GRU cell. The same GRU cell is used for updating representations for each node.

Local pooling layers

Local pooling layers coarsen the graph via downsampling. We present here several learnable local pooling strategies that have been proposed. [28] For each case, the input is the initial graph is represented by a matrix of node features, and the graph adjacency matrix . The output is the new matrix of node features, and the new graph adjacency matrix .

Top-k pooling

We first set

where is a learnable projection vector. The projection vector computes a scalar projection value for each graph node.

The top-k pooling layer [26] can then be formalised as follows:

where is the subset of nodes with the top-k highest projection scores, denotes element-wise matrix multiplication, and is the sigmoid function. In other words, the nodes with the top-k highest projection scores are retained in the new adjacency matrix . The operation makes the projection vector trainable by backpropagation, which otherwise would produce discrete outputs. [26]

Self-attention pooling

We first set

where is a generic permutation equivariant GNN layer (e.g., GCN, GAT, MPNN).

The Self-attention pooling layer [27] can then be formalised as follows:

where is the subset of nodes with the top-k highest projection scores, denotes element-wise matrix multiplication.

The self-attention pooling layer can be seen as an extension of the top-k pooling layer. Differently from top-k pooling, the self-attention scores computed in self-attention pooling account both for the graph features and the graph topology.

Applications

Protein folding

Graph neural networks are one of the main building blocks of AlphaFold, an artificial intelligence program developed by Google's DeepMind for solving the protein folding problem in biology. AlphaFold achieved first place in several CASP competitions. [38] [39] [37]

Social networks

Social networks are a major application domain for GNNs due to their natural representation as social graphs. GNNs are used to develop recommender systems based on both social relations and item relations. [40] [13]

Combinatorial optimization

GNNs are used as fundamental building blocks for several combinatorial optimization algorithms. [41] Examples include computing shortest paths or Eulerian circuits for a given graph, [36] deriving chip placements superior or competitive to handcrafted human solutions, [42] and improving expert-designed branching rules in branch and bound. [43]

Cyber security

When viewed as a graph, a network of computers can be analyzed with GNNs for anomaly detection. Anomalies within provenance graphs often correlate to malicious activity within the network. GNNs have been used to identify these anomalies on individual nodes [44] and within paths [45] to detect malicious processes, or on the edge level [46] to detect lateral movement.

Water distribution networks

Water distribution systems can be modelled as graphs, being then a straightforward application of GNN. This kind of algorithm has been applied to water demand forecasting, [47] interconnecting District Measuring Areas to improve the forecasting capacity. Other application of this algorithm on water distribution modelling is the development of metamodels. [48]

Related Research Articles

Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes. Belief propagation is commonly used in artificial intelligence and information theory, and has demonstrated empirical success in numerous applications, including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.

<span class="mw-page-title-main">Long short-term memory</span> Type of recurrent neural network architecture

Long short-term memory (LSTM) is a type of recurrent neural network (RNN) aimed at mitigating the vanishing gradient problem commonly encountered by traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models, and other sequence learning methods. It aims to provide a short-term memory for RNN that can last thousands of timesteps. The name is made in analogy with long-term memory and short-term memory and their relationship, studied by cognitive psychologists since the early 20th century.

<span class="mw-page-title-main">Activation function</span> Artificial neural network node function

The activation function of a node in an artificial neural network is a function that calculates the output of the node based on its individual inputs and their weights. Nontrivial problems can be solved using only a few nodes if the activation function is nonlinear. Modern activation functions include the smooth version of the ReLU, the GELU, which was used in the 2018 BERT model, the logistic (sigmoid) function used in the 2012 speech recognition model developed by Hinton et al, the ReLU used in the 2012 AlexNet computer vision model and in the 2015 ResNet model.

Algebraic signal processing (ASP) is an emerging area of theoretical signal processing (SP). In the algebraic theory of signal processing, a set of filters is treated as an (abstract) algebra, a set of signals is treated as a module or vector space, and convolution is treated as an algebra representation. The advantage of algebraic signal processing is its generality and portability.

In the mathematical theory of artificial neural networks, universal approximation theorems are theorems of the following form: Given a family of neural networks, for each function from a certain function space, there exists a sequence of neural networks from the family, such that according to some criterion. That is, the family of neural networks is dense in the function space.

A convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns features by itself via filter optimization. This type of deep learning network has been applied to process and make predictions from many different types of data including text, images and audio. Convolution-based networks are the de-facto standard in deep learning-based approaches to computer vision and image processing, and have only recently have been replaced -- in some cases -- by newer deep learning architectures such as the transformer. Vanishing gradients and exploding gradients, seen during backpropagation in earlier neural networks, are prevented by using regularized weights over fewer connections. For example, for each neuron in the fully-connected layer, 10,000 weights would be required for processing an image sized 100 × 100 pixels. However, applying cascaded convolution kernels, only 25 neurons are required to process 5x5-sized tiles. Higher-layer features are extracted from wider context windows, compared to lower-layer features.

<span class="mw-page-title-main">Dilution (neural networks)</span>

Dilution and dropout are regularization techniques for reducing overfitting in artificial neural networks by preventing complex co-adaptations on training data. They are an efficient way of performing model averaging with neural networks. Dilution refers to thinning weights, while dropout refers to randomly "dropping out", or omitting, units during the training process of a neural network. Both trigger the same type of regularization.

A capsule neural network (CapsNet) is a machine learning system that is a type of artificial neural network (ANN) that can be used to better model hierarchical relationships. The approach is an attempt to more closely mimic biological neural organization.

<span class="mw-page-title-main">Transformer (deep learning architecture)</span> Deep learning architecture for modelling sequential data

A transformer is a deep learning architecture developed by researchers at Google and based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need". Text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished.

In the study of artificial neural networks (ANNs), the neural tangent kernel (NTK) is a kernel that describes the evolution of deep artificial neural networks during their training by gradient descent. It allows ANNs to be studied using theoretical tools from kernel methods.

A Neural Network Gaussian Process (NNGP) is a Gaussian process (GP) obtained as the limit of a certain type of sequence of neural networks. Specifically, a wide variety of network architectures converges to a GP in the infinitely wide limit, in the sense of distribution. The concept constitutes an intensional definition, i.e., a NNGP is just a GP, but distinguished by how it is obtained.

<span class="mw-page-title-main">Attention (machine learning)</span> Machine learning technique

Attention is a machine learning method that determines the relative importance of each component in a sequence relative to the other components in that sequence. In natural language processing, importance is represented by "soft" weights assigned to each word in a sentence. More generally, attention encodes vectors called token embeddings across a fixed-width sequence that can range from tens to millions of tokens in size.

A flow-based generative model is a generative model used in machine learning that explicitly models a probability distribution by leveraging normalizing flow, which is a statistical method using the change-of-variable law of probabilities to transform a simple distribution into a complex one.

<span class="mw-page-title-main">Knowledge graph embedding</span> Dimensionality reduction of graph-based semantic data objects [machine learning task]

In representation learning, knowledge graph embedding (KGE), also referred to as knowledge representation learning (KRL), or multi-relation learning, is a machine learning task of learning a low-dimensional representation of a knowledge graph's entities and relations while preserving their semantic meaning. Leveraging their embedded representation, knowledge graphs (KGs) can be used for various applications such as link prediction, triple classification, entity recognition, clustering, and relation extraction.

<span class="mw-page-title-main">Vision transformer</span> Machine learning model for vision processing

A vision transformer (ViT) is a transformer designed for computer vision. A ViT decomposes an input image into a series of patches, serializes each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings.

In machine learning, the term tensor informally refers to two different concepts for organizing and representing data. Data may be organized in a multidimensional array (M-way array), informally referred to as a "data tensor"; however, in the strict mathematical sense, a tensor is a multilinear mapping over a set of domain vector spaces to a range vector space. Observations, such as images, movies, volumes, sounds, and relationships among words and concepts, stored in an M-way array ("data tensor"), may be analyzed either by artificial neural networks or tensor methods.

Topological deep learning (TDL) is a research field that extends deep learning to handle complex, non-Euclidean data structures. Traditional deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), excel in processing data on regular grids and sequences. However, scientific and real-world data often exhibit more intricate data domains encountered in scientific computations, including point clouds, meshes, time series, scalar fields graphs, or general topological spaces like simplicial complexes and CW complexes. TDL addresses this by incorporating topological concepts to process data with higher-order relationships, such as interactions among multiple entities and complex hierarchies. This approach leverages structures like simplicial complexes and hypergraphs to capture global dependencies and qualitative spatial properties, offering a more nuanced representation of data. TDL also encompasses methods from computational and algebraic topology that permit studying properties of neural networks and their training process, such as their predictive performance or generalization properties.,.

In neural networks, a pooling layer is a kind of network layer that downsamples and aggregates information that is dispersed among many vectors into fewer vectors. It has several uses. It removes redundant information, reducing the amount of computation and memory required, makes the model more robust to small variations in the input, and increases the receptive field of neurons in later layers in the network.

In neural networks, the gating mechanism is an architectural motif for controlling the flow of activation and gradient signals. They are most prominently used in recurrent neural networks (RNNs), but have also found applications in other architectures.

In deep learning, weight initialization describes the initial step in creating a neural network. A neural network contains trainable parameters that are modified during training: weight initalization is the pre-training step of assigning initial values to these parameters.

References

  1. Wu, Lingfei; Cui, Peng; Pei, Jian; Zhao, Liang (2022). "Graph Neural Networks: Foundations, Frontiers, and Applications". Springer Singapore: 725.
  2. 1 2 3 4 Scarselli, Franco; Gori, Marco; Tsoi, Ah Chung; Hagenbuchner, Markus; Monfardini, Gabriele (2009). "The Graph Neural Network Model". IEEE Transactions on Neural Networks. 20 (1): 61–80. doi:10.1109/TNN.2008.2005605. ISSN   1941-0093. PMID   19068426. S2CID   206756462.
  3. 1 2 3 Micheli, Alessio (2009). "Neural Network for Graphs: A Contextual Constructive Approach". IEEE Transactions on Neural Networks. 20 (3): 498–511. doi:10.1109/TNN.2008.2010350. ISSN   1045-9227. PMID   19193509. S2CID   17486263.
  4. Sanchez-Lengeling, Benjamin; Reif, Emily; Pearce, Adam; Wiltschko, Alex (2021-09-02). "A Gentle Introduction to Graph Neural Networks". Distill. 6 (9): e33. doi: 10.23915/distill.00033 . ISSN   2476-0757.
  5. Daigavane, Ameya; Ravindran, Balaraman; Aggarwal, Gaurav (2021-09-02). "Understanding Convolutions on Graphs". Distill. 6 (9): e32. doi: 10.23915/distill.00032 . ISSN   2476-0757. S2CID   239678898.
  6. 1 2 3 4 5 6 7 Bronstein, Michael M.; Bruna, Joan; Cohen, Taco; Veličković, Petar (May 4, 2021). "Geometric Deep Learning: Grids, Groups, Graphs Geodesics and Gauges". arXiv: 2104.13478 [cs.LG].
  7. 1 2 3 4 Kipf, Thomas N; Welling, Max (2016). "Semi-supervised classification with graph convolutional networks". IEEE Transactions on Neural Networks. 5 (1): 61–80. arXiv: 1609.02907 . doi:10.1109/TNN.2008.2005605. PMID   19068426. S2CID   206756462.
  8. 1 2 Hamilton, William; Ying, Rex; Leskovec, Jure (2017). "Inductive Representation Learning on Large Graphs" (PDF). Neural Information Processing Systems. 31. arXiv: 1706.02216 via Stanford.
  9. 1 2 3 4 Veličković, Petar; Cucurull, Guillem; Casanova, Arantxa; Romero, Adriana; Liò, Pietro; Bengio, Yoshua (2018-02-04). "Graph Attention Networks". arXiv: 1710.10903 [stat.ML].
  10. 1 2 Hajij, M.; Zamzmi, G.; Papamarkou, T.; Miolane, N.; Guzmán-Sáenz, A.; Ramamurthy, K. N.; Schaub, M. T. (2022). "Topological deep learning: Going beyond graph data". arXiv: 2206.00606 [cs.LG].
  11. 1 2 3 Veličković, Petar (2022). "Message passing all the way up". arXiv: 2202.11097 [cs.LG].
  12. Wu, Lingfei; Chen, Yu; Shen, Kai; Guo, Xiaojie; Gao, Hanning; Li, Shucheng; Pei, Jian; Long, Bo (2023). "Graph Neural Networks for Natural Language Processing: A Survey". Foundations and Trends in Machine Learning. 16 (2): 119–328. arXiv: 2106.06090 . doi:10.1561/2200000096. ISSN   1941-0093. PMID   19068426. S2CID   206756462.
  13. 1 2 Ying, Rex; He, Ruining; Chen, Kaifeng; Eksombatchai, Pong; Hamilton, William L.; Leskovec, Jure (2018). Graph Convolutional Neural Networks for Web-Scale Recommender Systems. pp. 974–983. arXiv: 1806.01973 . doi:10.1145/3219819.3219890. ISBN   9781450355520. S2CID   46949657.
  14. "Stanford Large Network Dataset Collection". snap.stanford.edu. Retrieved 2021-07-05.
  15. Zhang, Weihang; Cui, Yang; Liu, Bowen; Loza, Martin; Park, Sung-Joon; Nakai, Kenta (5 April 2024). "HyGAnno: Hybrid graph neural network-based cell type annotation for single-cell ATAC sequencing data". Briefings in Bioinformatics. 25 (3): bbae152. doi:10.1093/bib/bbae152. PMC   10998639 . PMID   38581422.
  16. Gilmer, Justin; Schoenholz, Samuel S.; Riley, Patrick F.; Vinyals, Oriol; Dahl, George E. (2017-07-17). "Neural Message Passing for Quantum Chemistry". Proceedings of Machine Learning Research: 1263–1272. arXiv: 1704.01212 .
  17. Coley, Connor W.; Jin, Wengong; Rogers, Luke; Jamison, Timothy F.; Jaakkola, Tommi S.; Green, William H.; Barzilay, Regina; Jensen, Klavs F. (2019-01-02). "A graph-convolutional neural network model for the prediction of chemical reactivity". Chemical Science. 10 (2): 370–377. doi: 10.1039/C8SC04228D . ISSN   2041-6539. PMC   6335848 . PMID   30746086.
  18. Qasim, Shah Rukh; Kieseler, Jan; Iiyama, Yutaro; Pierini, Maurizio Pierini (2019). "Learning representations of irregular particle-detector geometry with distance-weighted graph networks". The European Physical Journal C. 79 (7): 608. arXiv: 1902.07987 . Bibcode:2019EPJC...79..608Q. doi: 10.1140/epjc/s10052-019-7113-9 . S2CID   88518244.
  19. Li, Zhuwen; Chen, Qifeng; Koltun, Vladlen (2018). "Combinatorial optimization with graph convolutional networks and guided tree search". Neural Information Processing Systems. 31: 537–546. arXiv: 1810.10659 . doi:10.1007/978-3-030-04221-9_48.
  20. Matthias, Fey; Lenssen, Jan E. (2019). "Fast Graph Representation Learning with PyTorch Geometric". arXiv: 1903.02428 [cs.LG].
  21. "Tensorflow GNN". GitHub . Retrieved 30 June 2022.
  22. "Deep Graph Library (DGL)" . Retrieved 2024-09-12.
  23. "jraph". GitHub . Retrieved 30 June 2022.
  24. Lucibello, Carlo (2021). "GraphNeuralNetworks.jl". GitHub . Retrieved 2023-09-21.
  25. FluxML/GeometricFlux.jl, FluxML, 2024-01-31, retrieved 2024-02-03
  26. 1 2 3 Gao, Hongyang; Ji, Shuiwang Ji (2019). "Graph U-Nets". arXiv: 1905.05178 [cs.LG].
  27. 1 2 Lee, Junhyun; Lee, Inyeop; Kang, Jaewoo (2019). "Self-Attention Graph Pooling". arXiv: 1904.08082 [cs.LG].
  28. 1 2 Liu, Chuang; Zhan, Yibing; Li, Chang; Du, Bo; Wu, Jia; Hu, Wenbin; Liu, Tongliang; Tao, Dacheng (2022). "Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities". arXiv: 2204.07321 [cs.LG].
  29. Douglas, B. L. (2011-01-27). "The Weisfeiler–Lehman Method and Graph Isomorphism Testing". arXiv: 1101.5211 [math.CO].
  30. Xu, Keyulu; Hu, Weihua; Leskovec, Jure; Jegelka, Stefanie (2019-02-22). "How Powerful are Graph Neural Networks?". arXiv: 1810.00826 [cs.LG].
  31. Bodnar, Christian; Frasca, Fabrizio; Guang Wang, Yu; Otter, Nina; Montúfar, Guido; Liò, Pietro; Bronstein, Michael (2021). "Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks". arXiv: 2103.03212 [cs.LG].
  32. Grady, Leo; Polimeni, Jonathan (2011). Discrete Calculus: Applied Analysis on Graphs for Computational Science (PDF). Springer.
  33. Chen, Deli; Lin, Yankai; Li, Wei; Li, Peng; Zhou, Jie; Sun, Xu (2020). "Measuring and Relieving the Over-Smoothing Problem for Graph Neural Networks from the Topological View". Proceedings of the AAAI Conference on Artificial Intelligence. 34 (4): 3438–3445. arXiv: 1909.03211 . doi:10.1609/aaai.v34i04.5747. S2CID   202539008.
  34. 1 2 Alon, Uri; Yahav, Eran (2021). "On the Bottleneck of Graph Neural Networks and its Practical Implications". arXiv: 2006.05205 [cs.LG].
  35. Xu, Keyulu; Zhang, Mozhi; Jegelka, Stephanie; Kawaguchi, Kenji (2021). "Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth". arXiv: 2105.04550 [cs.LG].
  36. 1 2 3 Li, Yujia; Tarlow, Daniel; Brockschmidt, Mark; Zemel, Richard (2016). "Gated Graph Sequence Neural Networks". arXiv: 1511.05493 [cs.LG].
  37. 1 2 Xu, Keyulu; Li, Chengtao; Tian, Yonglong; Sonobe, Tomohiro; Kawarabayashi, Ken-ichi; Jegelka, Stefanie (2018). "Representation Learning on Graphs with Jumping Knowledge Networks". arXiv: 1806.03536 [cs.LG].
  38. Sample, Ian (2 December 2018). "Google's DeepMind predicts 3D shapes of proteins". The Guardian. Retrieved 30 November 2020.
  39. "DeepMind's protein-folding AI has solved a 50-year-old grand challenge of biology". MIT Technology Review. Retrieved 30 November 2020.
  40. Fan, Wenqi; Ma, Yao; Li, Qing; He, Yuan; Zhao, Eric; Tang, Jiliang; Yin, Dawei (2019). Graph Neural Networks for Social Recommendation. pp. 417–426. arXiv: 1902.07243 . doi:10.1145/3308558.3313488. hdl:10397/81232. ISBN   9781450366748. S2CID   67769538.
  41. Cappart, Quentin; Chételat, Didier; Khalil, Elias; Lodi, Andrea; Morris, Christopher; Veličković, Petar (2021). "Combinatorial optimization and reasoning with graph neural networks". arXiv: 2102.09544 [cs.LG].
  42. Mirhoseini, Azalia; Goldie, Anna; Yazgan, Mustafa; Jiang, Joe Wenjie; Songhori, Ebrahim; Wang, Shen; Lee, Young-Joon; Johnson, Eric; Pathak, Omkar; Nazi, Azade; Pak, Jiwoo; Tong, Andy; Srinivasa, Kavya; Hang, William; Tuncer, Emre; Le, Quoc V.; Laudon, James; Ho, Richard; Carpenter, Roger; Dean, Jeff (2021). "A graph placement methodology for fast chip design". Nature. 594 (7862): 207–212. Bibcode:2021Natur.594..207M. doi:10.1038/s41586-021-03544-w. PMID   34108699. S2CID   235395490.
  43. Gasse, Maxime; Chételat, Didier; Ferroni, Nicola; Charlin, Laurent; Lodi, Andrea (2019). "Exact Combinatorial Optimization with Graph Convolutional Neural Networks". arXiv: 1906.01629 [cs.LG].
  44. Wang, Su; Wang, Zhiliang; Zhou, Tao; Sun, Hongbin; Yin, Xia; Han, Dongqi; Zhang, Han; Shi, Xingang; Yang, Jiahai (2022). "Threatrace: Detecting and Tracing Host-Based Threats in Node Level Through Provenance Graph Learning". IEEE Transactions on Information Forensics and Security. 17: 3972–3987. arXiv: 2111.04333 . doi:10.1109/TIFS.2022.3208815. ISSN   1556-6021. S2CID   243847506.
  45. Wang, Qi; Hassan, Wajih Ul; Li, Ding; Jee, Kangkook; Yu, Xiao (2020). "You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis". Network and Distributed Systems Security (NDSS) Symposium. doi: 10.14722/ndss.2020.24167 . ISBN   978-1-891562-61-7. S2CID   211267791.
  46. King, Isaiah J.; Huang, H. Howie (2022). "Euler: Detecting Network Lateral Movement via Scalable Temporal Link Prediction" (PDF). In Proceedings of the 29th Network and Distributed Systems Security Symposium (NDSS). doi:10.14722/ndss.2022.24107. S2CID   248221601.
  47. Zanfei, Ariele; et al. (2022). "Graph Convolutional Recurrent Neural Networks for Water Demand Forecasting". Water Resources Research. 58 (7). AGU. Bibcode:2022WRR....5832299Z. doi:10.1029/2022WR032299 . Retrieved June 11, 2024.
  48. Zanfei, Ariele; et al. (2023). "Shall we always use hydraulic models? A graph neural network metamodel for water system calibration and uncertainty assessment". Water Research. 242. Bibcode:2023WatRe.24220264Z. doi:10.1016/j.watres.2023.120264. PMID   37393807 . Retrieved June 11, 2024.