Latent space

A latent space, also known as a latent feature space or embedding space, is an embedding of a set of items within a manifold in which items resembling each other are positioned closer to one another. Position within the latent space can be viewed as being defined by a set of latent variables that emerge from the resemblances among the items.

In most cases, the dimensionality of the latent space is chosen to be lower than the dimensionality of the feature space from which the data points are drawn, making the construction of a latent space an example of dimensionality reduction, which can also be viewed as a form of data compression. [1] Latent spaces are usually fit via machine learning, and they can then be used as feature spaces in machine learning models, including classifiers and other supervised predictors.
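
As a minimal illustration (not drawn from the cited sources), the sketch below fits a linear latent space with scikit-learn's PCA on synthetic data and then uses the low-dimensional coordinates as features for a classifier; the dimensions, synthetic data, and toy labels are arbitrary placeholders.

    # Fit a linear latent space with PCA and use the latent coordinates as
    # classifier features. Assumes scikit-learn and NumPy are installed.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50))           # 500 items in a 50-dimensional feature space
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy labels

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    pca = PCA(n_components=5)                # latent space of dimension 5
    Z_train = pca.fit_transform(X_train)     # compressed (latent) representation
    Z_test = pca.transform(X_test)

    clf = LogisticRegression().fit(Z_train, y_train)
    print("accuracy on latent features:", clf.score(Z_test, y_test))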

The interpretation of the latent spaces of machine learning models is an active field of study, but latent space interpretation is difficult to achieve. Due to the black-box nature of machine learning models, the latent space may be completely unintuitive. Additionally, the latent space may be high-dimensional, complex, and nonlinear, which may add to the difficulty of interpretation. [2] Some visualization techniques have been developed to connect the latent space to the visual world, but there is often not a direct connection between the latent space interpretation and the model itself. Such techniques include t-distributed stochastic neighbor embedding (t-SNE), where the latent space is mapped to two dimensions for visualization. Latent space distances lack physical units, so the interpretation of these distances may depend on the application. [3]
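
A minimal sketch of such a two-dimensional visualization with scikit-learn's TSNE; the array Z below is a random stand-in for real latent vectors, and the perplexity value is an arbitrary choice.

    # Map latent vectors to two dimensions with t-SNE for visualization.
    # Assumes scikit-learn and matplotlib are installed.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    Z = np.random.default_rng(0).normal(size=(300, 32))  # stand-in latent vectors
    Z_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(Z)

    plt.scatter(Z_2d[:, 0], Z_2d[:, 1], s=10)
    plt.title("t-SNE view of a latent space (axes are unitless)")
    plt.show()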

Embedding models

Several embedding models have been developed to construct latent space embeddings from a set of data items and a similarity function. These models learn the embeddings by leveraging statistical techniques and machine learning algorithms. Commonly used embedding models include:

  1. Word2Vec: [4] Word2Vec is a popular embedding model used in natural language processing (NLP). It learns word embeddings by training a neural network on a large corpus of text. Word2Vec captures semantic and syntactic relationships between words, allowing for meaningful computations like word analogies (see the sketch after this list).
  2. GloVe: [5] GloVe (Global Vectors for Word Representation) is another widely used embedding model for NLP. It combines global statistical information from a corpus with local context information to learn word embeddings. GloVe embeddings are known for capturing both semantic and relational similarities between words.
  3. Siamese Networks: [6] Siamese networks are a type of neural network architecture commonly used for similarity-based embedding. They consist of two identical subnetworks that process two input samples and produce their respective embeddings. Siamese networks are often used for tasks like image similarity, recommendation systems, and face recognition.
  4. Variational Autoencoders (VAEs): [7] VAEs are generative models that simultaneously learn to encode and decode data. The latent space in VAEs acts as an embedding space. By training VAEs on high-dimensional data, such as images or audio, the model learns to encode the data into a compact latent representation. VAEs are known for their ability to generate new data samples from the learned latent space.
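
The sketch below, referenced in item 1, trains Word2Vec embeddings with the gensim library; the toy corpus and parameter values are placeholders, and on such a tiny corpus the nearest-neighbour and analogy queries are only illustrative.

    # Train Word2Vec embeddings with gensim (assumes gensim 4.x is installed).
    from gensim.models import Word2Vec

    corpus = [  # toy corpus; a real corpus would be far larger
        ["the", "king", "rules", "the", "kingdom"],
        ["the", "queen", "rules", "the", "kingdom"],
        ["a", "man", "and", "a", "woman", "walk"],
    ]

    model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=100)

    vector = model.wv["queen"]                    # the learned 50-dimensional embedding
    print(model.wv.most_similar("king", topn=3))  # nearest neighbours in the latent space

    # Analogy-style query: king - man + woman ~ ?
    print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))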

Multimodality

Multimodality refers to the integration and analysis of multiple modes or types of data within a single model or framework. Embedding multimodal data involves capturing relationships and interactions between different data types, such as images, text, audio, and structured data.

Multimodal embedding models aim to learn joint representations that fuse information from multiple modalities, allowing for cross-modal analysis and tasks. These models enable applications like image captioning, visual question answering, and multimodal sentiment analysis.

To embed multimodal data, specialized architectures such as deep multimodal networks or multimodal transformers are employed. These architectures combine different types of neural network modules to process and integrate information from various modalities. The resulting embeddings capture the complex relationships between different data types, facilitating multimodal analysis and understanding.
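
As an illustration only (not a specific published architecture), the sketch below shows a dual-encoder in PyTorch that projects image features and text features into a shared latent space and compares them by cosine similarity; the input dimensions, layer sizes, and random stand-in features are all assumptions made for the example.

    # Toy dual-encoder: two modality-specific projections into one shared
    # embedding space, compared via cosine similarity. Assumes PyTorch.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DualEncoder(nn.Module):
        def __init__(self, image_dim=2048, text_dim=768, shared_dim=256):
            super().__init__()
            self.image_proj = nn.Sequential(nn.Linear(image_dim, 512), nn.ReLU(), nn.Linear(512, shared_dim))
            self.text_proj = nn.Sequential(nn.Linear(text_dim, 512), nn.ReLU(), nn.Linear(512, shared_dim))

        def forward(self, image_feats, text_feats):
            img = F.normalize(self.image_proj(image_feats), dim=-1)  # unit-length embeddings
            txt = F.normalize(self.text_proj(text_feats), dim=-1)
            return img @ txt.T  # pairwise image-text cosine similarities

    model = DualEncoder()
    image_feats = torch.randn(4, 2048)  # stand-in for the output of an image encoder
    text_feats = torch.randn(4, 768)    # stand-in for the output of a text encoder
    print(model(image_feats, text_feats).shape)  # torch.Size([4, 4])

In practice such a model is trained with a contrastive objective that pulls matching image-text pairs together in the shared space and pushes mismatched pairs apart.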

Applications

Latent space embeddings and multimodal embedding models have found numerous applications across various domains. For example, latent space representations of networks have been used to study internal migration flows, [8] trends in citation networks, [9] and the structure of international trade. [10]

Related Research Articles

Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Unsupervised learning is a paradigm in machine learning where, in contrast to supervised learning and semi-supervised learning, algorithms learn patterns exclusively from unlabeled data.

Nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.

Autoencoder

An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data. An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for dimensionality reduction.
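
A minimal autoencoder sketch in PyTorch, assuming 784-dimensional inputs (e.g. flattened 28×28 images); the layer sizes, latent dimension, and random training batch are placeholders.

    # Encoder compresses inputs to a latent code; decoder reconstructs them.
    # Training minimises the reconstruction error on unlabeled data.
    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, input_dim=784, latent_dim=32):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, input_dim))

        def forward(self, x):
            z = self.encoder(x)          # latent code (the learned encoding)
            return self.decoder(z), z

    model = Autoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.rand(64, 784)              # toy batch of unlabeled data
    for _ in range(10):                  # a few gradient steps for illustration
        reconstruction, z = model(x)
        loss = nn.functional.mse_loss(reconstruction, x)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()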

Distributional semantics

Distributional semantics is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. The basic idea of distributional semantics can be summed up in the so-called distributional hypothesis: linguistic items with similar distributions have similar meanings.

Deep learning

Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be supervised, semi-supervised or unsupervised.

Feature learning

In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task.

Deeplearning4j

Eclipse Deeplearning4j is a programming library written in Java for the Java virtual machine (JVM). It is a framework with wide support for deep learning algorithms. Deeplearning4j includes implementations of the restricted Boltzmann machine, deep belief net, deep autoencoder, stacked denoising autoencoder and recursive neural tensor network, word2vec, doc2vec, and GloVe. These algorithms all include distributed parallel versions that integrate with Apache Hadoop and Spark.

Word embedding

In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.
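
As a small illustration of "closer in the vector space", the snippet below computes cosine similarity between toy word vectors; the three-dimensional vectors are invented for the example and are not real embeddings.

    # Cosine similarity between toy word vectors (values near 1 mean similar direction).
    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    cat = np.array([0.8, 0.1, 0.3])  # invented 3-dimensional "embeddings"
    dog = np.array([0.7, 0.2, 0.4])
    car = np.array([-0.5, 0.9, 0.0])

    print(cosine_similarity(cat, dog))  # relatively high: similar meanings
    print(cosine_similarity(cat, car))  # lower: dissimilar meanings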

Word2vec

Word2vec is a technique for natural language processing (NLP) published in 2013. The word2vec algorithm uses a neural network model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. As the name implies, word2vec represents each distinct word with a particular list of numbers called a vector. The vectors are chosen carefully such that they capture the semantic and syntactic qualities of words; as such, a simple mathematical function can indicate the level of semantic similarity between the words represented by those vectors.

GloVe, coined from Global Vectors, is a model for distributed word representation. The model is an unsupervised learning algorithm for obtaining vector representations for words. This is achieved by mapping words into a meaningful space where the distance between words is related to semantic similarity. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. It was developed as an open-source project at Stanford and launched in 2014. As a log-bilinear regression model for unsupervised learning of word representations, it combines the features of two model families, namely global matrix factorization and local context window methods.
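
A minimal sketch of querying pretrained GloVe vectors through gensim's downloader; the dataset name "glove-wiki-gigaword-50" is assumed to be available in the gensim-data catalogue, and the vectors are downloaded on first use.

    # Load pretrained 50-dimensional GloVe vectors via gensim's downloader.
    import gensim.downloader as api

    glove = api.load("glove-wiki-gigaword-50")   # assumed dataset name

    print(glove["ice"].shape)                    # (50,)
    print(glove.similarity("ice", "steam"))      # cosine similarity of two word vectors
    print(glove.most_similar("frog", topn=3))    # nearest neighbours in the GloVe space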

Semantic folding theory describes a procedure for encoding the semantics of natural language text in a semantically grounded binary representation. This approach provides a framework for modelling how language data is processed by the neocortex.

Outline of machine learning

The following outline is provided as an overview of and topical guide to machine learning. Machine learning is a subfield of soft computing within computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from an example training set of input observations in order to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions.

Semantic spaces in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semantic spaces stems from two core challenges of natural language: vocabulary mismatch and the ambiguity of natural language.

Bidirectional Encoder Representations from Transformers (BERT) is a family of language models introduced in 2018 by researchers at Google. A 2020 literature survey concluded that "in a little over a year, BERT has become a ubiquitous baseline in Natural Language Processing (NLP) experiments counting over 150 research publications analyzing and improving the model."
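
A minimal sketch of extracting contextual embeddings from a pretrained BERT model with the Hugging Face transformers library; it assumes transformers and PyTorch are installed, the "bert-base-uncased" checkpoint is downloaded on first use, and mean pooling is just one simple way to obtain a sentence vector.

    # Obtain contextual token embeddings (and a pooled sentence vector) from BERT.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("Latent spaces embed similar items near each other.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    token_embeddings = outputs.last_hidden_state       # shape: (1, num_tokens, 768)
    sentence_embedding = token_embeddings.mean(dim=1)  # simple mean-pooled sentence vector
    print(sentence_embedding.shape)                    # torch.Size([1, 768])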

Variational autoencoder

In machine learning, a variational autoencoder (VAE) is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling. It is part of the families of probabilistic graphical models and variational Bayesian methods.
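
A minimal VAE sketch in PyTorch, assuming 784-dimensional inputs; it shows the encoder producing a mean and log-variance, the reparameterisation trick, and a reconstruction-plus-KL loss. Layer sizes, the latent dimension, and the toy batch are placeholders.

    # Variational autoencoder: encode to a distribution over latent codes,
    # sample with the reparameterisation trick, decode, and train on the ELBO.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VAE(nn.Module):
        def __init__(self, input_dim=784, latent_dim=16):
            super().__init__()
            self.enc = nn.Linear(input_dim, 256)
            self.mu = nn.Linear(256, latent_dim)      # mean of q(z|x)
            self.logvar = nn.Linear(256, latent_dim)  # log-variance of q(z|x)
            self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, input_dim))

        def forward(self, x):
            h = F.relu(self.enc(x))
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterisation
            return self.dec(z), mu, logvar

    model = VAE()
    x = torch.rand(32, 784)                       # toy batch
    reconstruction, mu, logvar = model(x)
    recon_loss = F.mse_loss(reconstruction, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    loss = recon_loss + kl                        # negative evidence lower bound (up to constants)

    # New samples can be generated by decoding z drawn from the prior N(0, I):
    samples = model.dec(torch.randn(8, 16))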

An energy-based model (EBM) is a form of generative model (GM) imported directly from statistical physics to learning. GMs learn an underlying data distribution by analyzing a sample dataset. Once trained, a GM can produce other datasets that also match the data distribution. EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to such learning, particularly for training graphical and other structured models.

struc2vec is a framework to generate node vector representations on a graph that preserve structural identity. In contrast to node2vec, which optimizes node embeddings so that nearby nodes in the graph have similar embeddings, struc2vec captures the roles of nodes in a graph, even if structurally similar nodes are far apart in the graph. It learns low-dimensional representations for nodes in a graph, generating random walks through a constructed multi-layer graph starting at each graph node. It is useful for machine learning applications where the downstream application is more closely related to the structural equivalence of the nodes. struc2vec identifies nodes that play a similar role based solely on the structure of the graph, for example computing the structural identity of individuals in social networks. In particular, struc2vec employs a degree-based method to measure the pairwise structural role similarity, which is then adopted to build the multi-layer graph. Moreover, the distance between the latent representations of nodes is strongly correlated with their structural similarity. The framework contains three optimizations: reducing the length of degree sequences considered, reducing the number of pairwise similarity calculations, and reducing the number of layers in the generated graph.

Self-supervised learning

Self-supervised learning (SSL) is a paradigm in machine learning in which a model is trained on tasks whose supervisory signals are derived from the data itself rather than from external human-provided labels. Self-supervised learning more closely imitates the way humans learn to classify objects.

References

  1. Liu, Yang; Jun, Eunice; Li, Qisheng; Heer, Jeffrey (June 2019). "Latent Space Cartography: Visual Analysis of Vector Space Embeddings". Computer Graphics Forum. 38 (3): 67–78. doi:10.1111/cgf.13672. ISSN 0167-7055. S2CID 189858337.
  2. Li, Ziqiang; Tao, Rentuo; Wang, Jie; Li, Fu; Niu, Hongjing; Yue, Mingdao; Li, Bin (February 2021). "Interpreting the Latent Space of GANs via Measuring Decoupling". IEEE Transactions on Artificial Intelligence. 2 (1): 58–70. doi:10.1109/TAI.2021.3071642. ISSN 2691-4581. S2CID 234847784.
  3. Arvanitidis, Georgios; Hansen, Lars Kai; Hauberg, Søren (13 December 2021). "Latent Space Oddity: on the Curvature of Deep Generative Models". arXiv:1710.11379 [stat.ML].
  4. Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg S; Dean, Jeff (2013). "Distributed Representations of Words and Phrases and their Compositionality". Advances in Neural Information Processing Systems. 26. Curran Associates, Inc. arXiv:1310.4546.
  5. Pennington, Jeffrey; Socher, Richard; Manning, Christopher (October 2014). "GloVe: Global Vectors for Word Representation". Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics. pp. 1532–1543. doi:10.3115/v1/D14-1162.
  6. Chicco, Davide (2021). "Siamese Neural Networks: An Overview". In Cartwright, Hugh (ed.). Artificial Neural Networks. Methods in Molecular Biology. Vol. 2190. New York, NY: Springer US. pp. 73–94. doi:10.1007/978-1-0716-0826-5_3. ISBN 978-1-0716-0826-5. PMID 32804361. S2CID 221144012. Retrieved 2023-06-26.
  7. Kingma, Diederik P.; Welling, Max (2019-11-27). "An Introduction to Variational Autoencoders". Foundations and Trends in Machine Learning. 12 (4): 307–392. arXiv:1906.02691. doi:10.1561/2200000056. ISSN 1935-8237. S2CID 174802445.
  8. Gürsoy, Furkan; Badur, Bertan (2022-10-06). "Investigating internal migration with network analysis and latent space representations: an application to Turkey". Social Network Analysis and Mining. 12 (1): 150. doi:10.1007/s13278-022-00974-w. ISSN 1869-5469. PMC 9540093. PMID 36246429.
  9. Asatani, Kimitaka; Mori, Junichiro; Ochi, Masanao; Sakata, Ichiro (2018-05-21). "Detecting trends in academic research from a citation network using network representation learning". PLOS ONE. 13 (5): e0197260. doi:10.1371/journal.pone.0197260. ISSN 1932-6203. PMC 5962067. PMID 29782521.
  10. García-Pérez, Guillermo; Boguñá, Marián; Allard, Antoine; Serrano, M. Ángeles (2016-09-16). "The hidden hyperbolic geometry of international trade: World Trade Atlas 1870–2013". Scientific Reports. 6 (1): 33441. doi:10.1038/srep33441. ISSN 2045-2322. PMC 5025783. PMID 27633649.
  10. García-Pérez, Guillermo; Boguñá, Marián; Allard, Antoine; Serrano, M. Ángeles (2016-09-16). "The hidden hyperbolic geometry of international trade: World Trade Atlas 1870–2013". Scientific Reports. 6 (1): 33441. doi:10.1038/srep33441. ISSN   2045-2322. PMC   5025783 . PMID   27633649.