Graph kernel

In structure mining, a graph kernel is a kernel function that computes an inner product on graphs. [1] Graph kernels can be intuitively understood as functions measuring the similarity of pairs of graphs. They allow kernelized learning algorithms such as support vector machines to work directly on graphs, without having to do feature extraction to transform them into fixed-length, real-valued feature vectors. They find applications in bioinformatics, in chemoinformatics (as a type of molecule kernel [2]), and in social network analysis. [1]

History

Concepts of graph kernels have been around since 1999, when D. Haussler [3] introduced convolution kernels on discrete structures. The term graph kernels was coined more formally in 2002 by R. I. Kondor and J. Lafferty, [4] who proposed kernels on graphs, i.e. similarity functions between the nodes of a single graph, with the World Wide Web hyperlink graph as a suggested application. In 2003, Gärtner et al. [5] and Kashima et al. [6] defined kernels between graphs. In 2010, Vishwanathan et al. presented a unified framework for computing them. [1] In 2018, Ghosh et al. [7] surveyed the history of graph kernels and their evolution over two decades.

Applications

The marginalized graph kernel has been shown to allow accurate predictions of the atomization energy of small organic molecules. [8]

Example kernels

An example of a kernel between graphs is the random walk kernel, [5] [6] which conceptually performs random walks on two graphs simultaneously, then counts the number of matching walks produced by both. This is equivalent to performing random walks on the direct product of the pair of graphs, from which a kernel can be derived that can be computed efficiently. [1]
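The following is a minimal NumPy sketch of the geometric variant of this idea, assuming unlabeled graphs given as adjacency matrices; the decay parameter lam is an illustrative choice and must stay below the reciprocal of the spectral radius of the product graph's adjacency matrix for the underlying geometric series to converge.

```python
import numpy as np

def direct_product_adjacency(A1, A2):
    # Adjacency matrix of the direct (tensor) product graph: vertices are
    # pairs (u, v), and (u, v) ~ (u', v') iff u ~ u' in G1 and v ~ v' in G2.
    return np.kron(A1, A2)

def random_walk_kernel(A1, A2, lam=0.1):
    """Geometric random walk kernel: sums lam^k times the number of
    common walks of length k over all k, computed in closed form as
    1^T (I - lam * Ax)^{-1} 1 on the product graph."""
    Ax = direct_product_adjacency(A1, A2)
    n = Ax.shape[0]
    ones = np.ones(n)
    return ones @ np.linalg.solve(np.eye(n) - lam * Ax, ones)

# Usage: compare a triangle with a path on three vertices.
A_triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
A_path     = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(random_walk_kernel(A_triangle, A_path))
```

Solving the linear system directly avoids summing the walk series term by term, which is what makes the kernel efficiently computable in practice.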

Another example is the Weisfeiler-Leman graph kernel, [9] which runs multiple iterations of the Weisfeiler-Leman algorithm and then computes the similarity of two graphs as the inner product of their histogram vectors. In these histogram vectors the kernel collects, for every iteration, the number of times each color occurs in the graph. Note that the Weisfeiler-Leman kernel is in theory infinite-dimensional, since the number of possible colors assigned by the Weisfeiler-Leman algorithm is unbounded. Restricting the computation to the colors that occur in both graphs keeps it feasible.
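A compact Python sketch of this kernel is given below, assuming unlabeled graphs whose initial colors are vertex degrees; the string concatenation used here stands in for the hash-based color compression of practical implementations, and the graph representation as a dict of neighbor sets is an illustrative choice.

```python
from collections import Counter

def wl_histogram(adj, iterations=3):
    """Color histogram from 1-dimensional Weisfeiler-Leman refinement.
    `adj` maps each vertex to its set of neighbors; initial colors are
    vertex degrees (i.e. the graphs are assumed unlabeled)."""
    colors = {v: str(len(nbrs)) for v, nbrs in adj.items()}
    hist = Counter(colors.values())
    for _ in range(iterations):
        # Refine: a vertex's new color encodes its old color together with
        # the sorted multiset of its neighbors' old colors. Deterministic
        # string encoding keeps color names consistent across graphs.
        colors = {v: colors[v] + "|" + ",".join(sorted(colors[u] for u in adj[v]))
                  for v in adj}
        hist.update(colors.values())
    return hist

def wl_kernel(adj1, adj2, iterations=3):
    # Inner product of the two histogram vectors, implicitly restricted to
    # colors occurring in both graphs (all other products are zero).
    h1, h2 = wl_histogram(adj1, iterations), wl_histogram(adj2, iterations)
    return sum(count * h2[color] for color, count in h1.items())

# Usage: compare a triangle with a path on three vertices.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path     = {0: {1}, 1: {0, 2}, 2: {1}}
print(wl_kernel(triangle, path))
```

Only colors present in at least one of the two graphs ever enter the histograms, which is exactly the restriction that makes the nominally infinite-dimensional kernel computable.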


References

  1. S. V. N. Vishwanathan; Nicol N. Schraudolph; Risi Kondor; Karsten M. Borgwardt (2010). "Graph kernels" (PDF). Journal of Machine Learning Research. 11: 1201–1242.
  2. L. Ralaivola; S. J. Swamidass; H. Saigo; P. Baldi (2005). "Graph kernels for chemical informatics". Neural Networks. 18 (8): 1093–1110. doi:10.1016/j.neunet.2005.07.009. PMID 16157471.
  3. Haussler, David (1999). Convolution Kernels on Discrete Structures. CiteSeerX 10.1.1.110.638.
  4. Risi Imre Kondor; John Lafferty (2002). Diffusion Kernels on Graphs and Other Discrete Input Spaces (PDF). Proc. Int'l Conf. on Machine Learning (ICML).
  5. Thomas Gärtner; Peter A. Flach; Stefan Wrobel (2003). On graph kernels: Hardness results and efficient alternatives. Proc. the 16th Annual Conference on Computational Learning Theory (COLT) and the 7th Kernel Workshop. doi:10.1007/978-3-540-45167-9_11.
  6. Hisashi Kashima; Koji Tsuda; Akihiro Inokuchi (2003). Marginalized kernels between labeled graphs (PDF). Proc. the 20th International Conference on Machine Learning (ICML).
  7. Ghosh, Swarnendu; Das, Nibaran; Gonçalves, Teresa; Quaresma, Paulo; Kundu, Mahantapas (2018). "The journey of graph kernels through two decades". Computer Science Review. 27: 88–111. doi:10.1016/j.cosrev.2017.11.002.
  8. Yu-Hang Tang; Wibe A. de Jong (2019). "Prediction of atomization energy using graph kernel and active learning". The Journal of Chemical Physics. 150 (4): 044107. arXiv:1810.07310. Bibcode:2019JChPh.150d4107T. doi:10.1063/1.5078640. PMID 30709286.
  9. Shervashidze, Nino; Schweitzer, Pascal; van Leeuwen, Erik Jan; Mehlhorn, Kurt; Borgwardt, Karsten M. (2011). "Weisfeiler-Lehman graph kernels". Journal of Machine Learning Research. 12: 2539–2561.