Knowledge graph embedding

Last updated
Embedding of a knowledge graph. The vector representation of the entities and relations can be used for different machine learning applications. KnowledgeGraphEmbedding.png
Embedding of a knowledge graph. The vector representation of the entities and relations can be used for different machine learning applications.

In representation learning, knowledge graph embedding (KGE), also referred to as knowledge representation learning (KRL), or multi-relation learning, [1] is a machine learning task of learning a low-dimensional representation of a knowledge graph's entities and relations while preserving their semantic meaning. [1] [2] [3] Leveraging their embedded representation, knowledge graphs (KGs) can be used for various applications such as link prediction, triple classification, entity recognition, clustering, and relation extraction. [1] [4]

Contents

Definition

A knowledge graph is a collection of entities , relations , and facts . [5] A fact is a triple that denotes a link between the head and the tail of the triple. Another notation that is often used in the literature to represent a triple (or fact) is . This notation is called resource description framework (RDF). [1] [5] A knowledge graph represents the knowledge related to a specific domain; leveraging this structured representation, it is possible to infer a piece of new knowledge from it after some refinement steps. [6] However, nowadays, people have to deal with the sparsity of data and the computational inefficiency to use them in a real-world application. [3] [7]

The embedding of a knowledge graph translates each entity and relation of a knowledge graph, into a vector of a given dimension , called embedding dimension. [7] In the general case, we can have different embedding dimensions for the entities and the relations . [7] The collection of embedding vectors for all the entities and relations in the knowledge graph can then be used for downstream tasks.

A knowledge graph embedding is characterized by four different aspects: [1]

  1. Representation space: The low-dimensional space in which the entities and relations are represented. [1]
  2. Scoring function: A measure of the goodness of a triple embedded representation. [1]
  3. Encoding models: The modality in which the embedded representation of the entities and relations interact with each other. [1]
  4. Additional information: Any additional information coming from the knowledge graph that can enrich the embedded representation. [1] Usually, an ad hoc scoring function is integrated into the general scoring function for each additional information. [5] [1] [8]

Embedding procedure

All the different knowledge graph embedding models follow roughly the same procedure to learn the semantic meaning of the facts. [7] First of all, to learn an embedded representation of a knowledge graph, the embedding vectors of the entities and relations are initialized to random values. [7] Then, starting from a training set until a stop condition is reached, the algorithm continuously optimizes the embeddings. [7] Usually, the stop condition is given by the overfitting over the training set. [7] For each iteration, is sampled a batch of size from the training set, and for each triple of the batch is sampled a random corrupted facti.e., a triple that does not represent a true fact in the knowledge graph. [7] The corruption of a triple involves substituting the head or the tail (or both) of the triple with another entity that makes the fact false. [7] The original triple and the corrupted triple are added in the training batch, and then the embeddings are updated, optimizing a scoring function. [5] [7] At the end of the algorithm, the learned embeddings should have extracted the semantic meaning from the triples and should correctly predict unseen true facts in the knowledge graph. [5]

Pseudocode

The following is the pseudocode for the general embedding procedure. [9] [7]

algorithm Compute entity and relation embeddings isinput: The training set ,             entity set ,             relation set ,                 embedding dimension output: Entity and relation embeddings      initialization:the entitiesand relationsembeddings (vectors) are randomly initializedwhile stop condition do // From the training set randomly sample a batch of size b         for eachindo  // sample a corrupted fact of triple             end for         Update embeddings by minimizing the loss function     end while

Performance indicators

These indexes are often used to measure the embedding quality of a model. The simplicity of the indexes makes them very suitable for evaluating the performance of an embedding algorithm even on a large scale. [10] Given as the set of all ranked predictions of a model, it is possible to define three different performance indexes: Hits@K, MR, and MRR. [10]

Hits@K

Hits@K or in short, H@K, is a performance index that measures the probability to find the correct prediction in the first top K model predictions. [10] Usually, it is used . [10] Hits@K reflects the accuracy of an embedding model to predict the relation between two given triples correctly. [10]

Hits@K

Larger values mean better predictive performances. [10]

Mean rank (MR)

Mean rank is the average ranking position of the items predicted by the model among all the possible items. [10]

The smaller the value, the better the model. [10]

Mean reciprocal rank (MRR)

Mean reciprocal rank measures the number of triples predicted correctly. [10] If the first predicted triple is correct, then 1 is added, if the second is correct is summed, and so on. [10]

Mean reciprocal rank is generally used to quantify the effect of search algorithms. [10]

The larger the index, the better the model. [10]

Applications

Machine learning tasks

Knowledge graph completion (KGC) is a collection of techniques to infer knowledge from an embedded knowledge graph representation. [11] In particular, this technique completes a triple inferring the missing entity or relation. [11] The corresponding sub-tasks are named link or entity prediction (i.e., guessing an entity from the embedding given the other entity of the triple and the relation), and relation prediction (i.e., forecasting the most plausible relation that connects two entities). [11]

Triple Classification is a binary classification problem. [1] Given a triple, the trained model evaluates the plausibility of the triple using the embedding to determine if a triple is true or false. [11] The decision is made with the model score function and a given threshold. [11] Clustering is another application that leverages the embedded representation of a sparse knowledge graph to condense the representation of similar semantic entities close in a 2D space. [4]

Real world applications

The use of knowledge graph embedding is increasingly pervasive in many applications. In the case of recommender systems, the use of knowledge graph embedding can overcome the limitations of the usual reinforcement learning. [12] [13] Training this kind of recommender system requires a huge amount of information from the users; however, knowledge graph techniques can address this issue by using a graph already constructed over a prior knowledge of the item correlation and using the embedding to infer from it the recommendation. [12] Drug repurposing is the use of an already approved drug, but for a therapeutic purpose different from the one for which it was initially designed. [14] It is possible to use the task of link prediction to infer a new connection between an already existing drug and a disease by using a biomedical knowledge graph built leveraging the availability of massive literature and biomedical databases. [14] Knowledge graph embedding can also be used in the domain of social politics. [4]

Models

Publication timeline of some knowledge graph embedding models. In red the tensor decomposition models, in blue the geometric models, and in green the deep learning models. RESCAL (2011) was the first modern KGE approach. In it was applied to the YAGO knowledge graph. This was the first application of KGE to a large scale knowledge graph. KG-Embedding.svg
Publication timeline of some knowledge graph embedding models. In red the tensor decomposition models, in blue the geometric models, and in green the deep learning models. RESCAL (2011) was the first modern KGE approach. In it was applied to the YAGO knowledge graph. This was the first application of KGE to a large scale knowledge graph.

Given a collection of triples (or facts) , the knowledge graph embedding model produces, for each entity and relation present in the knowledge graph a continuous vector representation. [7] is the corresponding embedding of a triple with and , where is the embedding dimension for the entities, and for the relations. [7] The score function of a given model is denoted by and measures the distance of the embedding of the head from the embedding of tail given the embedding of the relation, or in other words, it quantifies the plausibility of the embedded representation of a given fact. [5]

Rossi et al. propose a taxonomy of the embedding models and identifies three main families of models: tensor decomposition models, geometric models, and deep learning models. [5]

Tensor decomposition model

The tensor decomposition is a family of knowledge graph embedding models that use a multi-dimensional matrix to represent a knowledge graph, [1] [5] [17] that is partially knowable due to the gaps of the knowledge graph describing a particular domain thoroughly. [5] In particular, these models use a three-way (3D) tensor, which is then factorized into low-dimensional vectors that are the entities and relations embeddings. [5] [17] The third-order tensor is a suitable methodology to represent a knowledge graph because it records only the existence or the absence of a relation between entities, [17] and for this reason is simple, and there is no need to know a priori the network structure, [15] making this class of embedding models light, and easy to train even if they suffer from high-dimensional and sparsity of data. [5] [17]

Bilinear models

This family of models uses a linear equation to embed the connection between the entities through a relation. [1] In particular, the embedded representation of the relations is a bidimensional matrix. [5] These models, during the embedding procedure, only use the single facts to compute the embedded representation and ignore the other associations to the same entity or relation. [18]

  • DistMult [19] : Since the embedding matrix of the relation is a diagonal matrix, [5] the scoring function can not distinguish asymmetric facts. [5] [18]
  • ComplEx [20] : As DistMult uses a diagonal matrix to represent the relations embedding but adds a representation in the complex vector space and the hermitian product, it can distinguish symmetric and asymmetric facts. [5] [17] This approach is scalable to a large knowledge graph in terms of time and space cost. [20]
  • ANALOGY [21] : This model encodes in the embedding the analogical structure of the knowledge graph to simulate inductive reasoning. [21] [5] [1] Using a differentiable objective function, ANALOGY has good theoretical generality and computational scalability. [21] It is proven that the embedding produced by ANALOGY fully recovers the embedding of DistMult, ComplEx, and HolE. [21]
  • SimplE [22] : This model is the improvement of canonical polyadic decomposition (CP), in which an embedding vector for the relation and two independent embedding vectors for each entity are learned, depending on whether it is a head or a tail in the knowledge graph fact. [22] SimplE resolves the problem of independent learning of the two entity embeddings using an inverse relation and average the CP score of and . [7] [17] In this way, SimplE collects the relation between entities while they appear in the role of subject or object inside a fact, and it is able to embed asymmetric relations. [5]

Non-bilinear models

  • HolE: [23] HolE uses circular correlation to create an embedded representation of the knowledge graph, [23] which can be seen as a compression of the matrix product, but is more computationally efficient and scalable while keeping the capabilities to express asymmetric relation since the circular correlation is not commutative. [18] HolE links holographic and complex embeddings since, if used together with Fourier, can be seen as a special case of ComplEx. [1]
  • TuckER: [24] TuckER sees the knowledge graph as a tensor that could be decomposed using the Tucker decomposition in a collection of vectorsi.e., the embeddings of entities and relationswith a shared core. [24] [5] The weights of the core tensor are learned together with the embeddings and represent the level of interaction of the entries. [25] Each entity and relation has its own embedding dimension, and the size of the core tensor is determined by the shape of the entities and relations that interact. [5] The embedding of the subject and object of a fact are summed in the same way, making TuckER fully expressive, and other embedding models such as RESCAL, DistMult, ComplEx, and SimplE can be expressed as a special formulation of TuckER. [24]
  • MEI: [26] MEI introduces the multi-partition embedding interaction technique with the block term tensor format, which is a generalization of CP decomposition and Tucker decomposition. It divides the embedding vector into multiple partitions and learns the local interaction patterns from data instead of using fixed special patterns as in ComplEx or SimplE models. This enables MEI to achieve optimal efficiency—expressiveness trade-off, not just being fully expressive. [26] Previous models such as TuckER, RESCAL, DistMult, ComplEx, and SimplE are suboptimal restricted special cases of MEI.
  • MEIM: [27] MEIM goes beyond the block term tensor format to introduce the independent core tensor for ensemble boosting effects and the soft orthogonality for max-rank relational mapping, in addition to multi-partition embedding interaction. MEIM generalizes several previous models such as MEI and its subsumed models, RotaE, and QuatE. [27] MEIM improves expressiveness while still being highly efficient in practice, helping it achieve good results using fairly small model sizes.

Geometric models

The geometric space defined by this family of models encodes the relation as a geometric transformation between the head and tail of a fact. [5] For this reason, to compute the embedding of the tail, it is necessary to apply a transformation to the head embedding, and a distance function is used to measure the goodness of the embedding or to score the reliability of a fact. [5]

Geometric models are similar to the tensor decomposition model, but the main difference between the two is that they have to preserve the applicability of the transformation in the geometric space in which it is defined. [5]

Pure translational models

This class of models is inspired by the idea of translation invariance introduced in word2vec. [7] A pure translational model relies on the fact that the embedding vector of the entities are close to each other after applying a proper relational translation in the geometric space in which they are defined. [18] In other words, given a fact, when the embedding of head is added to the embedding of relation, the expected result should be the embedding of the tail. [5] The closeness of the entities embedding is given by some distance measure and quantifies the reliability of a fact. [17]

TransE embedding model. The vector representation (embedding) of the head plus the vector representation of the relation should be equal to the vector representation of the tail entity. TransE.pdf
TransE embedding model. The vector representation (embedding) of the head plus the vector representation of the relation should be equal to the vector representation of the tail entity.
  • TransE [9] : This model uses a scoring function that forces the embeddings to satisfy a simple vector sum equation in each fact in which they appear: . [7] The embedding will be exact if each entity and relation appears in only one fact, and, for this reason, in practice does not well represent one-to-many, many-to-one, and asymmetric relations. [5] [7]
  • TransH [28] : It is an evolution of TransE introducing a hyperplane as geometric space to solve the problem of representing correctly the types of relations. [28] In TransH, each relation has a different embedded representation, on a different hyperplane, based on which entities it interacts with. [7] Therefore, to compute, for example, the score function of a fact, the embedded representation of the head and tail need to be projected using a relational projection matrix on the correct hyperplane of the relation. [1] [7]
  • TransR [29] : TransR is an evolution of TransH because it uses two different spaces to represent the embedded representation of the entities and the relations, [1] [18] and separate completely the semantic space of entities and relations. [7] Also TransR uses a relational projection matrix to translate the embedding of the entities to the relation space. [7]
  • TransD: [30] Given a fact, in TransR, the head and the tail of a fact could belongs to two different types of entities, for example, in the fact, Obama and USA are two entities but one is a person and the other is a country. [30] [7] The matrix multiplication also is an expensive procedure in TransR to compute the projection. [7] [30] In this context, TransD employs two vector for each entity-relation pair to compute a dynamic mapping that substitutes the projection matrix while reducing the dimensional complexity. [1] [7] [30] The first vector is used to represent the semantic meaning of the entities and relations, the second one to compute the mapping matrix. [30]
  • TransA: [31] All the translational models define a score function in their representation space, but they oversimplify this metric loss. [31] Since the vector representation of the entities and relations is not perfect, a pure translation of could be distant from , and a spherical equipotential Euclidean distance makes it hard to distinguish which is the closest entity. [31] TransA, instead, introduces an adaptive Mahalanobis distance to weights the embedding dimensions, together with elliptical surfaces to remove the ambiguity. [1] [7] [31]

Translational models with additional embeddings

It is possible to associate additional information to each element in the knowledge graph and their common representation facts. [1] Each entity and relation can be enriched with text descriptions, weights, constraints, and others in order to improve the overall description of the domain with a knowledge graph. [1] During the embedding of the knowledge graph, this information can be used to learn specialized embeddings for these characteristics together with the usual embedded representation of entities and relations, with the cost of learning a more significant number of vectors. [5]

  • STransE: [32] This model is the result of the combination of TransE and of the structure embedding [32] in such a way it is able to better represent the one-to-many, many-to-one, and many-to-many relations. [5] To do so, the model involves two additional independent matrix and for each embedded relation in the KG. [32] Each additional matrix is used based on the fact the specific relation interact with the head or the tail of the fact. [32] In other words, given a fact , before applying the vector translation, the head is multiplied by and the tail is multiplied by . [7]
  • CrossE: [33] Crossover interactions can be used for related information selection, and could be very useful for the embedding procedure. [33] Crossover interactions provide two distinct contributions in the information selection: interactions from relations to entities and interactions from entities to relations. [33] This means that a relation, e.g.'president_of' automatically selects the types of entities that are connecting the subject to the object of a fact. [33] In a similar way, the entity of a fact inderectly determine which is inference path that has to be choose to predict the object of a related triple. [33] CrossE, to do so, learns an additional interaction matrix , uses the element-wise product to compute the interaction between and . [5] [33] Even if, CrossE, does not rely on a neural network architecture, it is shown that this methodology can be encoded in such architecture. [1]

Roto-translational models

This family of models, in addition or in substitution of a translation they employ a rotation-like transformation. [5]

  • TorusE: [34] The regularization term of TransE makes the entity embedding to build a spheric space, and consequently loses the translation properties of the geometric space. [34] To address this problem, TorusE leverages the use of a compact Lie group that in this specific case is n-dimensional torus space, and avoid the use of regularization. [1] [34] TorusE defines the distance functions to substitute the L1 and L2 norm of TransE. [5]
  • RotatE: [35] RotatE is inspired by the Euler's identity and involves the use of Hadamard product to represent a relation as a rotation from the head to the tail in the complex space. [35] For each element of the triple, the complex part of the embedding describes a counterclockwise rotation respect to an axis, that can be describe with the Euler's identity, whereas the modulus of the relation vector is 1. [35] It is shown that the model is capable of embedding symmetric, asymmetric, inversion, and composition relations from the knowledge graph. [35]

Deep learning models

This group of embedding models uses deep neural network to learn patterns from the knowledge graph that are the input data. [5] These models have the generality to distinguish the type of entity and relation, temporal information, path information, underlay structured information, [18] and resolve the limitations of distance-based and semantic-matching-based models in representing all the features of a knowledge graph. [1] The use of deep learning for knowledge graph embedding has shown good predictive performance even if they are more expensive in the training phase, data-hungry, and often required a pre-trained embedding representation of knowledge graph coming from a different embedding model. [1] [5]

Convolutional neural networks

This family of models, instead of using fully connected layers, employs one or more convolutional layers that convolve the input data applying a low-dimensional filter capable of embedding complex structures with few parameters by learning nonlinear features. [1] [5] [18]

  • ConvE: [36] ConvE is an embedding model that represents a good tradeoff expressiveness of deep learning models and computational expensiveness, [17] in fact it is shown that it used 8x less parameters, when compared to DistMult. [36] ConvE uses a one-dimensional -sized embedding to represent the entities and relations of a knowledge graph. [5] [36] To compute the score function of a triple, ConvE apply a simple procedure: first concatenes and merge the embeddings of the head of the triple and the relation in a single data , then this matrix is used as input for the 2D convolutional layer. [5] [17] The result is then passed through a dense layer that apply a linear transformation parameterized by the matrix and at the end, with the inner product is linked to the tail triple. [5] [18] ConvE is also particularly efficient in the evaluation procedure: using a 1-N scoring, the model matches, given a head and a relation, all the tails at the same time, saving a lot of evaluation time when compared to the 1-1 evaluation program of the other models. [18]
  • ConvR: [37] ConvR is an adaptive convolutional network aimed to deeply represent all the possible interactions between the entities and the relations. [37] For this task, ConvR, computes convolutional filter for each relation, and, when required, applies these filters to the entity of interest to extract convoluted features. [37] The procedure to compute the score of triple is the same as ConvE. [5]
  • ConvKB: [38] ConvKB, to compute score function of a given triple , it produces an input of dimension without reshaping and passes it to series of convolutional filter of size . [38] This result feeds a dense layer with only one neuron that produces the final score. [38] The single final neuron makes this architecture as a binary classifier in which the fact could be true or false. [5] A difference with ConvE is that the dimensionality of the entities is not changed. [17]

Capsule neural networks

This family of models uses capsule neural networks to create a more stable representation that is able to recognize a feature in the input without losing spatial information. [5] The network is composed of convolutional layers, but they are organized in capsules, and the overall result of a capsule is sent to a higher-capsule decided by a dynamic process routine. [5]

  • CapsE: [39] CapsE implements a capsule network to model a fact . [39] As in ConvKB, each triple element is concatenated to build a matrix and is used to feed to a convolutional layer to extract the convolutional features. [5] [39] These features are then redirected to a capsule to produce a continuous vector, more the vector is long, more the fact is true. [39]

Recurrent neural networks

This class of models leverages the use of recurrent neural network. [5] The advantage of this architecture is to memorize a sequence of fact, rather than just elaborate single events. [40]

  • RSN: [40] During the embedding procedure is commonly assumed that, similar entities has similar relations. [40] In practice, this type of information is not leveraged, because the embedding is computed just on the undergoing fact rather than a history of facts. [40] Recurrent skipping networks (RSN) uses a recurrent neural network to learn relational path using a random walk sampling. [5] [40]

Model performance

The machine learning task for knowledge graph embedding that is more often used to evaluate the embedding accuracy of the models is the link prediction. [1] [3] [5] [6] [7] [18] Rossi et al. [5] produced an extensive benchmark of the models, but also other surveys produces similar results. [3] [7] [18] [25] The benchmark involves five datasets FB15k, [9] WN18, [9] FB15k-237, [41] WN18RR, [36] and YAGO3-10. [42] More recently, it has been discussed that these datasets are far away from real-world applications, and other datasets should be integrated as a standard benchmark. [43]

Table summary of the characteristics of the datasets used to benchmark the embedding models.
Dataset nameNumber of different entitiesNumber of different relationsNumber of triples
FB15k [9] 149511345584,113
WN18 [9] 4094318151,442
FB15k-237 [41] 14541237310,116
WN18RR [36] 409431193,003
YAGO3-10 [42] 123182371,089,040
Table summary of the memory complexity and the link prediction accuracy of the knowledge graph embedding models according to Rossi et al. [5] in terms of Hits@10, MR, and MRR. Best results on each metric for each dataset are in bold.
Model nameMemory complexityFB15K (Hits@10)FB15K (MR)FB15K (MRR)FB15K - 237 (Hits@10)FB15K - 237 (MR)FB15K - 237 (MRR)WN18 (Hits@10)WN18 (MR)WN18 (MRR)WN18RR (Hits@10)WN18RR (MR)WN18RR (MRR)YAGO3-10 (Hits@10)YAGO3-10 (MR)YAGO3-10 (MRR)
DistMul [19] 0.8631730.7840.4901990.3130.9466750.8240.50259130.4330.66111070.501
ComplEx [20] 0.905340.8480.5292020.3490.95536230.9490.52149070.4580.70311120.576
HolE [23] 0.8672110.8000.4761860.3030.9496500.9380.48784010.4320.65164890.502
ANALOGY [21] 0.8371260.7260.3534760.2020.9448080.9340.38092660.3660.45624230.283
SimplE [22] 0.8361380.7260.3436510.1790.9457590.9380.42687640.3980.63128490.453
TuckER [24] 0.888390.7880.5361620.3520.9585100.9510.51462390.4590.68024170.544
MEI [26] 0.5521450.3650.55132680.4810.7097560.578
MEIM [27] 0.5571370.3690.57724340.4990.7167470.585
TransE [9] 0.847450.6280.4972090.3100.9482790.6460.49539360.2060.67311870.501
STransE [32] 0.796690.5430.4953570.3150.9342080.6560.42251720.2260.07357970.049
CrossE [33] 0.8621360.7020.4702270.2980.9504410.8340.44952120.4050.65438390.446
TorusE [34] 0.8391430.7460.4472110.2810.9545250.9470.53548730.4630.474194550.342
RotatE [35] 0.881420.7910.5221780.3360.9602740.9490.57333180.4750.57018270.498
ConvE [36] 0.849510.6880.5212810.3050.9564130.9450.50749440.4270.65724290.488
ConvKB [38] 0.4083240.2110.5173090.2300.9482020.7090.52534290.2490.60416830.420
ConvR [37] 0.885700.7730.5262510.3460.9584710.9500.52656460.4670.67325820.527
CapsE [39] 0.2176100.0870.3564050.1600.9502330.8900.5597200.4150606760.000
RSN [40] 0.870510.7770.4442480.2800.9513460.9280.48342100.3950.66413390.511

Libraries

See also

Related Research Articles

In mathematics, a binary relation associates elements of one set called the domain with elements of another set called the codomain. Precisely, a binary relation over sets and is a set of ordered pairs where is in and is in . It encodes the common concept of relation: an element is related to an element , if and only if the pair belongs to the set of ordered pairs that defines the binary relation.

<span class="mw-page-title-main">Semantic network</span> Knowledge base that represents semantic relations between concepts in a network

A semantic network, or frame network is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields. A semantic network may be instantiated as, for example, a graph database or a concept map. Typical standardized semantic networks are expressed as semantic triples.

<span class="mw-page-title-main">Nonlinear dimensionality reduction</span> Projection of data onto lower-dimensional manifolds

Nonlinear dimensionality reduction, also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately. Inherently, Multi-task learning is a multi-objective optimization problem having trade-offs between different tasks. Early versions of MTL were called "hints".

In universal algebra and in model theory, a structure consists of a set along with a collection of finitary operations and relations that are defined on it.

A logical matrix, binary matrix, relation matrix, Boolean matrix, or (0, 1)-matrix is a matrix with entries from the Boolean domain B = {0, 1}. Such a matrix can be used to represent a binary relation between a pair of finite sets. It is an important tool in combinatorial mathematics and theoretical computer science.

<span class="mw-page-title-main">Feature learning</span> Set of learning techniques in machine learning

In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task.

In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that the words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.

Word2vec is a technique in natural language processing (NLP) for obtaining vector representations of words. These vectors capture information about the meaning of the word based on the surrounding words. The word2vec algorithm estimates these representations by modeling text in a large corpus. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. Word2vec was developed by Tomáš Mikolov and colleagues at Google and published in 2013.

<span class="mw-page-title-main">Stochastic block model</span>

The stochastic block model is a generative model for random graphs. This model tends to produce graphs containing communities, subsets of nodes characterized by being connected with one another with particular edge densities. For example, edges may be more common within communities than between communities. Its mathematical formulation was first introduced in 1983 in the field of social network analysis by Paul W. Holland et al. The stochastic block model is important in statistics, machine learning, and network science, where it serves as a useful benchmark for the task of recovering community structure in graph data.

<span class="mw-page-title-main">Transformer (deep learning architecture)</span> Machine learning architecture for modelling sequential data

A transformer is a deep learning architecture developed by researchers at Google and based on the multi-head attention mechanism, proposed in a 2017 paper "Attention Is All You Need". Text is converted to numerical representations called tokens, and each token is converted into a vector via looking up from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism allowing the signal for key tokens to be amplified and less important tokens to be diminished.

Bidirectional encoder representations from transformers (BERT) is a language model introduced in October 2018 by researchers at Google. It learns to represent text as a sequence of vectors using self-supervised learning. It uses the encoder-only transformer architecture. It is notable for its dramatic improvement over previous state-of-the-art models, and as an early example of a large language model. As of 2020, BERT is a ubiquitous baseline in natural language processing (NLP) experiments.

<span class="mw-page-title-main">Knowledge graph</span> Type of knowledge base

In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the free-form semantics or relationships underlying these entities.

In network theory, link prediction is the problem of predicting the existence of a link between two entities in a network. Examples of link prediction include predicting friendship links among users in a social network, predicting co-authorship links in a citation network, and predicting interactions between genes and proteins in a biological network. Link prediction can also have a temporal aspect, where, given a snapshot of the set of links at time , the goal is to predict the links at time . Link prediction is widely applicable. In e-commerce, link prediction is often a subtask for recommending items to users. In the curation of citation databases, it can be used for record deduplication. In bioinformatics, it has been used to predict protein-protein interactions (PPI). It is also used to identify hidden groups of terrorists and criminals in security related applications.

<span class="mw-page-title-main">Attention (machine learning)</span> Machine learning technique

Attention is a machine learning method that determines the relative importance of each component in a sequence relative to the other components in that sequence. In natural language processing, importance is represented by "soft" weights assigned to each word in a sentence. More generally, attention encodes vectors called token embeddings across a fixed-width sequence that can range from tens to millions of tokens in size.

A graph neural network (GNN) belongs to a class of artificial neural networks for processing data that can be represented as graphs.

<span class="mw-page-title-main">Vision transformer</span> Variant of Transformer designed for vision processing

A vision transformer (ViT) is a transformer designed for computer vision. A ViT breaks down an input image into a series of patches, serialises each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings.

Prompt engineering is the process of structuring an instruction that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform: a prompt for a text-to-text language model can be a query such as "what is Fermat's little theorem?", a command such as "write a poem about leaves falling", or a longer statement including context, instructions, and conversation history.

Tensor informally refers in machine learning to two different concepts that organize and represent data. Data may be organized in a multidimensional array (M-way array) that is informally referred to as a "data tensor"; however in the strict mathematical sense, a tensor is a multilinear mapping over a set of domain vector spaces to a range vector space. Observations, such as images, movies, volumes, sounds, and relationships among words and concepts, stored in an M-way array ("data tensor") may be analyzed either by artificial neural networks or tensor methods.

Topological Deep Learning (TDL) is a research field that extends deep learning to handle complex, non-Euclidean data structures. Traditional deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), excel in processing data on regular grids and sequences. However, scientific and real-world data often exhibit more intricate data domains encountered in scientific computations, including point clouds, meshes, time series, scalar fields graphs, or general topological spaces like simplicial complexes and CW complexes. TDL addresses this by incorporating topological concepts to process data with higher-order relationships, such as interactions among multiple entities and complex hierarchies. This approach leverages structures like simplicial complexes and hypergraphs to capture global dependencies and qualitative spatial properties, offering a more nuanced representation of data. TDL also encompasses methods from computational and algebraic topology that permit studying properties of neural networks and their training process, such as their predictive performance or generalization properties.,.

References

  1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 Ji, Shaoxiong; Pan, Shirui; Cambria, Erik; Marttinen, Pekka; Yu, Philip S. (2021). "A Survey on Knowledge Graphs: Representation, Acquisition, and Applications". IEEE Transactions on Neural Networks and Learning Systems. PP (2): 494–514. arXiv: 2002.00388 . doi:10.1109/TNNLS.2021.3070843. hdl:10072/416709. ISSN   2162-237X. PMID   33900922. S2CID   211010433.
  2. Mohamed, Sameh K; Nováček, Vít; Nounu, Aayah (2019-08-01). Cowen, Lenore (ed.). "Discovering Protein Drug Targets Using Knowledge Graph Embeddings". Bioinformatics. 36 (2): 603–610. doi:10.1093/bioinformatics/btz600. hdl: 10379/15375 . ISSN   1367-4803. PMID   31368482.
  3. 1 2 3 4 Lin, Yankai; Han, Xu; Xie, Ruobing; Liu, Zhiyuan; Sun, Maosong (2018-12-28). "Knowledge Representation Learning: A Quantitative Review". arXiv: 1812.10901 [cs.CL].
  4. 1 2 3 Abu-Salih, Bilal; Al-Tawil, Marwan; Aljarah, Ibrahim; Faris, Hossam; Wongthongtham, Pornpit; Chan, Kit Yan; Beheshti, Amin (2021-05-12). "Relational Learning Analysis of Social Politics using Knowledge Graph Embedding". Data Mining and Knowledge Discovery. 35 (4): 1497–1536. arXiv: 2006.01626 . doi:10.1007/s10618-021-00760-w. ISSN   1573-756X. S2CID   219179556.
  5. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Rossi, Andrea; Barbosa, Denilson; Firmani, Donatella; Matinata, Antonio; Merialdo, Paolo (2020). "Knowledge Graph Embedding for Link Prediction: A Comparative Analysis". ACM Transactions on Knowledge Discovery from Data. 15 (2): 1–49. arXiv: 2002.00819 . doi:10.1145/3424672. hdl:11573/1638610. ISSN   1556-4681. S2CID   211011226.
  6. 1 2 Paulheim, Heiko (2016-12-06). Cimiano, Philipp (ed.). "Knowledge graph refinement: A survey of approaches and evaluation methods". Semantic Web. 8 (3): 489–508. doi:10.3233/SW-160218. S2CID   13151033.
  7. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 Dai, Yuanfei; Wang, Shiping; Xiong, Neal N.; Guo, Wenzhong (May 2020). "A Survey on Knowledge Graph Embedding: Approaches, Applications and Benchmarks". Electronics. 9 (5): 750. doi: 10.3390/electronics9050750 .
  8. Guo, Shu; Wang, Quan; Wang, Bin; Wang, Lihong; Guo, Li (2015). "Semantically Smooth Knowledge Graph Embedding". Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics. pp. 84–94. doi: 10.3115/v1/P15-1009 . S2CID   205692.
  9. 1 2 3 4 5 6 7 Bordes, Antoine; Usunier, Nicolas; Garcia-Durán, Alberto; Weston, Jason; Yakhnenko, Oksana (May 2013). "Translating embeddings for modeling multi-relational data". NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems. Curran Associates Inc. pp. 2787–2795.
  10. 1 2 3 4 5 6 7 8 9 10 11 12 Chen, Zhe; Wang, Yuehan; Zhao, Bin; Cheng, Jing; Zhao, Xin; Duan, Zongtao (2020). "Knowledge Graph Completion: A Review". IEEE Access. 8: 192435–192456. Bibcode:2020IEEEA...8s2435C. doi: 10.1109/ACCESS.2020.3030076 . ISSN   2169-3536. S2CID   226230006.
  11. 1 2 3 4 5 Cai, Hongyun; Zheng, Vincent W.; Chang, Kevin Chen-Chuan (2018-02-02). "A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications". arXiv: 1709.07604 [cs.AI].
  12. 1 2 Zhou, Sijin; Dai, Xinyi; Chen, Haokun; Zhang, Weinan; Ren, Kan; Tang, Ruiming; He, Xiuqiang; Yu, Yong (2020-06-18). "Interactive Recommender System via Knowledge Graph-enhanced Reinforcement Learning". arXiv: 2006.10389 [cs.IR].
  13. Liu, Chan; Li, Lun; Yao, Xiaolu; Tang, Lin (August 2019). "A Survey of Recommendation Algorithms Based on Knowledge Graph Embedding". 2019 IEEE International Conference on Computer Science and Educational Informatization (CSEI). pp. 168–171. doi:10.1109/CSEI47661.2019.8938875. ISBN   978-1-7281-2308-0. S2CID   209459928.
  14. 1 2 Sosa, Daniel N.; Derry, Alexander; Guo, Margaret; Wei, Eric; Brinton, Connor; Altman, Russ B. (2020). "A Literature-Based Knowledge Graph Embedding Method for Identifying Drug Repurposing Opportunities in Rare Diseases". Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing. 25: 463–474. ISSN   2335-6936. PMC   6937428 . PMID   31797619.
  15. 1 2 Nickel, Maximilian; Tresp, Volker; Kriegel, Hans-Peter (2011-06-28). "A three-way model for collective learning on multi-relational data". ICML'11: Proceedings of the 28th International Conference on International Conference on Machine Learning. Omnipress. pp. 809–816. ISBN   978-1-4503-0619-5.
  16. Nickel, Maximilian; Tresp, Volker; Kriegel, Hans-Peter (2012-04-16). "Factorizing YAGO". Proceedings of the 21st international conference on World Wide Web. Association for Computing Machinery. pp. 271–280. doi:10.1145/2187836.2187874. ISBN   978-1-4503-1229-5. S2CID   6348464.
  17. 1 2 3 4 5 6 7 8 9 10 Alshahrani, Mona; Thafar, Maha A.; Essack, Magbubah (2021-02-18). "Application and evaluation of knowledge graph embeddings in biomedical data". PeerJ Computer Science. 7: e341. doi: 10.7717/peerj-cs.341 . ISSN   2376-5992. PMC   7959619 . PMID   33816992.
  18. 1 2 3 4 5 6 7 8 9 10 11 Wang, Meihong; Qiu, Linling; Wang, Xiaoli (2021-03-16). "A Survey on Knowledge Graph Embeddings for Link Prediction". Symmetry. 13 (3): 485. Bibcode:2021Symm...13..485W. doi: 10.3390/sym13030485 . ISSN   2073-8994.
  19. 1 2 Yang, Bishan; Yih, Wen-tau; He, Xiaodong; Gao, Jianfeng; Deng, Li (2015-08-29). "Embedding Entities and Relations for Learning and Inference in Knowledge Bases". arXiv: 1412.6575 [cs.CL].
  20. 1 2 3 Trouillon, Théo; Welbl, Johannes; Riedel, Sebastian; Gaussier, Éric; Bouchard, Guillaume (2016-06-20). "Complex Embeddings for Simple Link Prediction". arXiv: 1606.06357 [cs.AI].
  21. 1 2 3 4 5 Liu, Hanxiao; Wu, Yuexin; Yang, Yiming (2017-07-06). "Analogical Inference for Multi-Relational Embeddings". arXiv: 1705.02426 [cs.LG].
  22. 1 2 3 Kazemi, Seyed Mehran; Poole, David (2018-10-25). "SimplE Embedding for Link Prediction in Knowledge Graphs". arXiv: 1802.04868 [stat.ML].
  23. 1 2 3 Nickel, Maximilian; Rosasco, Lorenzo; Poggio, Tomaso (2015-12-07). "Holographic Embeddings of Knowledge Graphs". arXiv: 1510.04935 [cs.AI].
  24. 1 2 3 4 Balažević, Ivana; Allen, Carl; Hospedales, Timothy M. (2019). "TuckER: Tensor Factorization for Knowledge Graph Completion". Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). pp. 5184–5193. arXiv: 1901.09590 . doi:10.18653/v1/D19-1522. S2CID   59316623.
  25. 1 2 Ali, Mehdi; Berrendorf, Max; Hoyt, Charles Tapley; Vermue, Laurent; Galkin, Mikhail; Sharifzadeh, Sahand; Fischer, Asja; Tresp, Volker; Lehmann, Jens (2021). "Bringing Light into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models under a Unified Framework". IEEE Transactions on Pattern Analysis and Machine Intelligence. PP (12): 8825–8845. arXiv: 2006.13365 . doi:10.1109/TPAMI.2021.3124805. PMID   34735335. S2CID   220041612.
  26. 1 2 3 Tran, Hung Nghiep; Takasu, Atsuhiro (2020). "Multi-Partition Embedding Interaction with Block Term Format for Knowledge Graph Completion". Proceedings of the European Conference on Artificial Intelligence (ECAI 2020). Frontiers in Artificial Intelligence and Applications. Vol. 325. IOS Press. pp. 833–840. doi:10.3233/FAIA200173. S2CID   220265751.
  27. 1 2 3 Tran, Hung-Nghiep; Takasu, Atsuhiro (2022-07-16). "MEIM: Multi-partition Embedding Interaction Beyond Block Term Format for Efficient and Expressive Link Prediction". Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence. Vol. 3. pp. 2262–2269. doi:10.24963/ijcai.2022/314. ISBN   978-1-956792-00-3. S2CID   250635995.
  28. 1 2 Wang, Zhen (2014). "Knowledge Graph Embedding by Translating on Hyperplanes". Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 28. doi:10.1609/aaai.v28i1.8870. S2CID   15027084.
  29. Lin, Yankai; Liu, Zhiyuan; Sun, Maosong; Liu, Yang; Zhu, Xuan (2015-01-25). Learning entity and relation embeddings for knowledge graph completion. AAAI Press. pp. 2181–2187. ISBN   978-0-262-51129-2.
  30. 1 2 3 4 5 Ji, Guoliang; He, Shizhu; Xu, Liheng; Liu, Kang; Zhao, Jun (July 2015). "Knowledge Graph Embedding via Dynamic Mapping Matrix". Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics. pp. 687–696. doi:10.3115/v1/P15-1067. S2CID   11202498.
  31. 1 2 3 4 Xiao, Han; Huang, Minlie; Hao, Yu; Zhu, Xiaoyan (2015-09-27). "TransA: An Adaptive Approach for Knowledge Graph Embedding". arXiv: 1509.05490 [cs.CL].
  32. 1 2 3 4 5 Nguyen, Dat Quoc; Sirts, Kairit; Qu, Lizhen; Johnson, Mark (June 2016). "STransE: A novel embedding model of entities and relationships in knowledge bases". Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics. pp. 460–466. arXiv: 1606.08140 . doi:10.18653/v1/N16-1054. S2CID   9884935.
  33. 1 2 3 4 5 6 7 Zhang, Wen; Paudel, Bibek; Zhang, Wei; Bernstein, Abraham; Chen, Huajun (2019-01-30). "Interaction Embeddings for Prediction and Explanation in Knowledge Graphs". Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. pp. 96–104. arXiv: 1903.04750 . doi:10.1145/3289600.3291014. ISBN   9781450359405. S2CID   59516071.
  34. 1 2 3 4 Ebisu, Takuma; Ichise, Ryutaro (2017-11-15). "TorusE: Knowledge Graph Embedding on a Lie Group". arXiv: 1711.05435 [cs.AI].
  35. 1 2 3 4 5 Sun, Zhiqing; Deng, Zhi-Hong; Nie, Jian-Yun; Tang, Jian (2019-02-26). "RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space". arXiv: 1902.10197 [cs.LG].
  36. 1 2 3 4 5 6 Dettmers, Tim; Minervini, Pasquale; Stenetorp, Pontus; Riedel, Sebastian (2018-07-04). "Convolutional 2D Knowledge Graph Embeddings". arXiv: 1707.01476 [cs.LG].
  37. 1 2 3 4 Jiang, Xiaotian; Wang, Quan; Wang, Bin (June 2019). "Adaptive Convolution for Multi-Relational Learning". Proceedings of the 2019 Conference of the North. Association for Computational Linguistics. pp. 978–987. doi:10.18653/v1/N19-1103. S2CID   174800352.
  38. 1 2 3 4 Nguyen, Dai Quoc; Nguyen, Tu Dinh; Nguyen, Dat Quoc; Phung, Dinh (2018). "A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network". Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). pp. 327–333. arXiv: 1712.02121 . doi:10.18653/v1/N18-2053. S2CID   3882054.
  39. 1 2 3 4 5 Nguyen, Dai Quoc; Vu, Thanh; Nguyen, Tu Dinh; Nguyen, Dat Quoc; Phung, Dinh (2019-03-06). "A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization". arXiv: 1808.04122 [cs.CL].
  40. 1 2 3 4 5 6 Guo, Lingbing; Sun, Zequn; Hu, Wei (2019-05-13). "Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs". arXiv: 1905.04914 [cs.AI].
  41. 1 2 Toutanova, Kristina; Chen, Danqi (July 2015). "Observed versus latent features for knowledge base and text inference". Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality. Association for Computational Linguistics. pp. 57–66. doi: 10.18653/v1/W15-4007 . S2CID   5378837.
  42. 1 2 Mahdisoltani, F.; Biega, J.; Suchanek, Fabian M. (2015). "YAGO3: A Knowledge Base from Multilingual Wikipedias". CIDR. S2CID   6611164.
  43. Hu, Weihua; Fey, Matthias; Zitnik, Marinka; Dong, Yuxiao; Ren, Hongyu; Liu, Bowen; Catasta, Michele; Leskovec, Jure (2021-02-24). "Open Graph Benchmark: Datasets for Machine Learning on Graphs". arXiv: 2005.00687 [cs.LG].