Semantic folding

Semantic folding theory describes a procedure for encoding the semantics of natural language text in a semantically grounded binary representation. This approach provides a framework for modelling how language data is processed by the neocortex. [1]

Theory

Semantic folding theory draws inspiration from Douglas R. Hofstadter's Analogy as the Core of Cognition, which suggests that the brain makes sense of the world by identifying and applying analogies. [2] The theory hypothesises that semantic data must therefore be introduced to the neocortex in a form that allows the application of a similarity measure, and it offers, as a solution, a sparse binary vector that uses a two-dimensional topographic semantic space as a distributional reference frame. The theory builds on the computational theory of the human cortex known as hierarchical temporal memory (HTM), and positions itself as a complementary theory for the representation of language semantics.

A particular strength claimed by this approach is that the resulting binary representation enables complex semantic operations to be performed simply and efficiently at the most basic computational level.
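
The theory itself does not spell out these operations in code, but the kind of bit-level computation referred to can be sketched as follows: comparing and combining binary word representations with elementwise AND and OR. The vector size and sparsity below are illustrative assumptions only, not values prescribed by the theory.

import numpy as np

# Two hypothetical binary word representations. Real semantic fingerprints are
# much larger (e.g. ~16,000 bits) and sparse; small sizes keep the sketch readable.
rng = np.random.default_rng(0)
vec_dog = rng.random(128) < 0.05   # roughly 5% of bits active
vec_car = rng.random(128) < 0.05

# Semantic overlap reduces to a bitwise AND followed by a population count.
overlap = np.count_nonzero(vec_dog & vec_car)

# Aggregating the semantics of several items reduces to a bitwise OR (a union).
union = vec_dog | vec_car

print("shared active bits:", overlap)
print("active bits in the union:", np.count_nonzero(union), "of", union.size)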

Two-dimensional semantic space

Analogous to the structure of the neocortex, Semantic Folding theory posits the implementation of a semantic space as a two-dimensional grid. This grid is populated by context-vectors [note 1] in such a way that similar context-vectors are placed closer to each other, for instance by using competitive learning principles. This vector space model is presented in the theory as equivalent to the well-known word space model [3] described in the information retrieval literature.
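
The theory does not prescribe a particular learning algorithm for building the grid. The sketch below uses a simple self-organizing-map-style competitive learning step as one possible way to place context-vectors so that similar ones end up in nearby cells; all function names and parameter values are illustrative assumptions, not part of the theory's specification.

import numpy as np

def place_contexts_on_grid(context_vectors, grid_side=16, epochs=10, lr=0.1, seed=0):
    """Toy competitive-learning placement of context-vectors on a 2D grid.

    context_vectors: (n_contexts, n_features) float array, one row per context.
    Returns an array of shape (grid_side, grid_side, n_features) whose cells act
    as prototypes; each context is associated with its best-matching cell.
    """
    rng = np.random.default_rng(seed)
    n_features = context_vectors.shape[1]
    grid = rng.random((grid_side, grid_side, n_features))

    for _ in range(epochs):
        for ctx in context_vectors:
            # Find the best-matching cell (the "winner") by Euclidean distance.
            dists = np.linalg.norm(grid - ctx, axis=2)
            win = np.unravel_index(np.argmin(dists), dists.shape)
            # Nudge the winner and its 4-neighbourhood towards the context,
            # so that similar contexts gradually cluster in nearby cells.
            for dy, dx in [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]:
                y, x = win[0] + dy, win[1] + dx
                if 0 <= y < grid_side and 0 <= x < grid_side:
                    grid[y, x] += lr * (ctx - grid[y, x])
    return grid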

Given a semantic space (implemented as described above) a word-vector [note 2] can be obtained for any given word Y by employing the following algorithm:

For each position X in the semantic map (where X represents Cartesian coordinates)
    if the word Y is contained in the context-vector at position X
        then add 1 to the corresponding position in the word-vector for Y
    else
        add 0 to the corresponding position in the word-vector for Y

The result of this process is a word-vector containing all the contexts in which the word Y appears, and it is therefore representative of the semantics of that word in the semantic space. It can be seen that the resulting word-vector is also in a sparse distributed representation (SDR) format [Schütze, 1993] & [Sahlgren, 2006]. [3] [4] Several properties of word-SDRs are of particular interest with respect to computational semantics. [5]
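
The procedure above can be written out concretely. The sketch below assumes the semantic map is available as a two-dimensional grid of word sets, one set per context; the toy map and all names are illustrative only.

import numpy as np

def word_vector(word, semantic_map):
    """Build the binary word-vector (word-SDR) for `word`.

    semantic_map: 2D list (rows x cols) where each cell holds the set of words
    belonging to the context stored at that grid position.
    Returns a flat binary numpy array with one bit per grid position.
    """
    rows, cols = len(semantic_map), len(semantic_map[0])
    vec = np.zeros(rows * cols, dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            # Set the bit if the word occurs in the context at position (r, c).
            if word in semantic_map[r][c]:
                vec[r * cols + c] = 1
    return vec

# Tiny illustrative map: each cell is a set of words from one context.
semantic_map = [
    [{"dog", "bark", "leash"}, {"car", "engine", "road"}],
    [{"dog", "cat", "pet"},    {"car", "porsche", "drive"}],
]
print(word_vector("dog", semantic_map))  # -> [1 0 1 0]
print(word_vector("car", semantic_map))  # -> [0 1 0 1]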

Semantic spaces

Semantic spaces [note 3] [6] in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semantic spaces stems from two core challenges of natural language: Vocabulary mismatch (the fact that the same meaning can be expressed in many ways) and ambiguity of natural language (the fact that the same term can have several meanings).

The application of semantic spaces in natural language processing (NLP) aims at overcoming limitations of rule-based or model-based approaches operating on the keyword level. The main drawback with these approaches is their brittleness, and the large manual effort required to create either rule-based NLP systems or training corpora for model learning. [7] [8] Rule-based and machine learning-based models are fixed on the keyword level and break down if the vocabulary differs from that defined in the rules or from the training material used for the statistical models.

Research in semantic spaces dates back more than 20 years. In 1996, two papers were published that attracted a lot of attention around the general idea of creating semantic spaces: latent semantic analysis [9] from Microsoft and Hyperspace Analogue to Language [10] from the University of California. However, their adoption was limited by the large computational effort required to construct and use those semantic spaces. A breakthrough with regard to the accuracy of modelling associative relations between words (e.g. "spider-web", "lighter-cigarette", as opposed to synonymous relations such as "whale-dolphin", "astronaut-driver") was achieved by explicit semantic analysis (ESA) [11] in 2007. ESA was a novel approach, not based on machine learning, that represented words as vectors of 100,000 dimensions, where each dimension corresponds to an article in Wikipedia. However, practical applications of the approach are limited by the large number of dimensions required in the vectors.

More recently, advances in neural network techniques, in combination with other new approaches such as tensor-based methods, led to a host of new developments: Word2vec [12] from Google and GloVe [13] from Stanford University.

Semantic folding represents a novel, biologically inspired approach to semantic spaces where each word is represented as a sparse binary vector with 16,000 dimensions (a semantic fingerprint) in a 2D semantic map (the semantic universe). Sparse binary representations are advantageous in terms of computational efficiency, and allow for the storage of very large numbers of possible patterns. [5]
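
Because only a small fraction of the bits in such a fingerprint are active, it can be stored compactly as the set of its active positions. The sketch below illustrates this storage and comparison pattern; the sparsity level (about 2%) and the randomly chosen bits are assumptions made purely for the example.

import random

# Illustrative sketch: a 16,000-bit fingerprint stored as the set of its
# active bit positions.
N_BITS = 16_000
N_ACTIVE = 320          # assumed ~2% sparsity, for illustration only

random.seed(1)
fp_jaguar = set(random.sample(range(N_BITS), N_ACTIVE))
fp_porsche = set(random.sample(range(N_BITS), N_ACTIVE))

# Storage cost grows with the number of active bits, not with N_BITS, and the
# similarity of two fingerprints is simply the size of their intersection.
overlap = len(fp_jaguar & fp_porsche)
print(overlap, "shared active bits out of", N_ACTIVE, "per fingerprint")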

Visualization

Image 1: Semantic fingerprint comparison of the terms "dog" and "car".
Image 2: Semantic fingerprint comparison of the terms "jaguar" and "Porsche".

The topological distribution over a two-dimensional grid (outlined above) lends itself to a bitmap-type visualization of the semantics of any word or text, where each active semantic feature can be displayed as, for example, a pixel. As can be seen in the images shown here, this representation allows a direct visual comparison of the semantics of two (or more) linguistic items.
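
A bitmap of this kind can be produced directly from the binary vector. The sketch below renders a fingerprint, or the overlap of two fingerprints, as an ASCII grid; the grid size and the example bit positions are illustrative choices, not values prescribed by the theory.

import numpy as np

def render_fingerprint(active_bits, side=32):
    """Render a binary fingerprint as an ASCII bitmap of side x side cells.

    active_bits: iterable of active positions in a side*side semantic grid.
    """
    grid = np.zeros((side, side), dtype=bool)
    for pos in active_bits:
        grid[pos // side, pos % side] = True
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)

# The overlap of two terms can be visualized the same way by rendering only
# the positions their fingerprints share.
fp_a = {0, 33, 66, 99, 132, 500}
fp_b = {33, 66, 200, 501}
print(render_fingerprint(fp_a & fp_b, side=32))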

Image 1 shows that the two disparate terms "dog" and "car" have, as expected, clearly different semantics.

Image 2 shows that only one of the meaning contexts of "jaguar", that of "Jaguar" the car, overlaps with the meaning of "Porsche" (indicating partial similarity). Other meaning contexts of "jaguar", e.g. "jaguar" the animal, clearly do not overlap. The visualization of semantic similarity using Semantic Folding bears a strong resemblance to the fMRI images produced in a research study conducted by A. G. Huth et al., [14] [15] in which it is claimed that words are grouped in the brain by meaning. Voxels, small volume segments of the brain, were found to follow a pattern in which semantic information is represented along the boundary of the visual cortex, with visual and linguistic categories represented on the posterior and anterior sides respectively. [16] [17] [18]

Notes

  1. A context-vector is defined as a vector containing all the words in a particular context.
  2. A word-vector or word-SDR is referred to as a Semantic Fingerprint in Semantic Folding theory.
  3. also referred to as distributed semantic spaces or distributed semantic memory

Related Research Articles

<span class="mw-page-title-main">Semantics</span> Study of meaning in language

Semantics is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and computer science.

<span class="mw-page-title-main">Semantic network</span> Knowledge base that represents semantic relations between concepts in a network

A semantic network, or frame network is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields. A semantic network may be instantiated as, for example, a graph database or a concept map. Typical standardized semantic networks are expressed as semantic triples.

Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.

Natural-language understanding (NLU) or natural-language interpretation (NLI) is a subtopic of natural-language processing in artificial intelligence that deals with machine reading comprehension. Natural-language understanding is considered an AI-hard problem.

Computational semiotics is an interdisciplinary field that applies, conducts, and draws on research in logic, mathematics, the theory and practice of computation, formal and natural language studies, the cognitive sciences generally, and semiotics proper. The term encompasses both the application of semiotics to computer hardware and software design and, conversely, the use of computation for performing semiotic analysis. The former focuses on what semiotics can bring to computation; the latter on what computation can bring to semiotics.

Semantic memory refers to general world knowledge that humans have accumulated throughout their lives. This general knowledge is intertwined in experience and dependent on culture. New concepts are learned by applying knowledge learned from things in the past.

Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text. A matrix containing word counts per document is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents.
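
As a minimal numeric illustration of the procedure just described (a term-document count matrix reduced by SVD, with documents then compared by cosine similarity), using an invented toy matrix:

import numpy as np

# Toy term-document count matrix: rows are terms, columns are documents.
# The counts are invented purely for illustration.
counts = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 3, 1],
], dtype=float)

# Reduce the rank with a truncated singular value decomposition.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2                                        # number of latent concepts to keep
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T    # one k-dimensional vector per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents 0 and 1 compared in the reduced concept space.
print(cosine(doc_vectors[0], doc_vectors[1]))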

Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained according to the comparison of information supporting their meaning or describing their nature. The term semantic similarity is often confused with semantic relatedness. Semantic relatedness includes any relation between two terms, while semantic similarity only includes "is a" relations. For example, "car" is similar to "bus", but is also related to "road" and "driving".

In linguistics, statistical semantics applies the methods of statistics to the problem of determining the meaning of words or phrases, ideally through unsupervised learning, to a degree of precision at least sufficient for the purpose of information retrieval.

<span class="mw-page-title-main">Distributional semantics</span> Field of linguistics

Distributional semantics is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. The basic idea of distributional semantics can be summed up in the so-called distributional hypothesis: linguistic items with similar distributions have similar meanings.

Hierarchical temporal memory (HTM) is a biologically constrained machine intelligence technology developed by Numenta. Originally described in the 2004 book On Intelligence by Jeff Hawkins with Sandra Blakeslee, HTM is primarily used today for anomaly detection in streaming data. The technology is based on neuroscience and the physiology and interaction of pyramidal neurons in the neocortex of the mammalian brain.

In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the senses of a word. Given that the output of word-sense induction is a set of senses for the target word, this task is strictly related to that of word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to solve the ambiguity of words in context.

Pentti Kanerva is an American neuroscientist who is the originator of the sparse distributed memory model. He is responsible for relating the properties of long-term memory to mathematical properties of high-dimensional spaces and compares artificial neural-net associative memory to conventional computer random-access memory and to the neurons in the brain. This theory has been applied to design and implement the random indexing approach to learning semantic relations from linguistic data.

A conceptual space is a geometric structure that represents a number of quality dimensions, which denote basic features by which concepts and objects can be compared, such as weight, color, taste, temperature, pitch, and the three ordinary spatial dimensions. In a conceptual space, points denote objects, and regions denote concepts. The theory of conceptual spaces is a theory about concept learning first proposed by Peter Gärdenfors. It is motivated by notions such as conceptual similarity and prototype theory.

Random indexing is a dimensionality reduction method and computational framework for distributional semantics, based on the insight that very-high-dimensional vector space model implementations are impractical, that models need not grow in dimensionality when new items are encountered, and that a high-dimensional model can be projected into a space of lower dimensionality without compromising L2 distance metrics if the resulting dimensions are chosen appropriately.
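
A minimal sketch of the underlying idea, using a Gaussian random projection rather than the sparse ternary index vectors that random indexing proper employs; all sizes and distributions here are illustrative assumptions.

import numpy as np

# Project high-dimensional count vectors into a much smaller space while
# approximately preserving Euclidean (L2) distances.
rng = np.random.default_rng(0)
high_dim, low_dim = 10_000, 256                       # illustrative sizes

x = rng.poisson(0.01, size=high_dim).astype(float)    # sparse count-like vectors
y = rng.poisson(0.01, size=high_dim).astype(float)

# Random projection matrix, scaled so that distances are preserved in expectation.
R = rng.normal(0.0, 1.0 / np.sqrt(low_dim), size=(low_dim, high_dim))

print(np.linalg.norm(x - y), np.linalg.norm(R @ x - R @ y))  # should be close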


<span class="mw-page-title-main">Word embedding</span> Method in natural language processing

In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.

<span class="mw-page-title-main">Word2vec</span> Models used to produce word embeddings

Word2vec is a technique for natural language processing (NLP) published in 2013. The word2vec algorithm uses a neural network model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. As the name implies, word2vec represents each distinct word with a particular list of numbers called a vector. The vectors are chosen carefully such that they capture the semantic and syntactic qualities of words; as such, a simple mathematical function can indicate the level of semantic similarity between the words represented by those vectors.


<span class="mw-page-title-main">Sentence embedding</span>

In natural language processing, a sentence embedding refers to a numeric representation of a sentence in the form of a vector of real numbers which encodes meaningful semantic information.

References

  1. De Sousa Webber, Francisco (2015). "Semantic Folding theory and its Application in Semantic Fingerprinting". Cornell University Library. arXiv: 1511.08855 . Bibcode:2015arXiv151108855D.
  2. The Analogical Mind. 2 March 2001. ISBN 9780262072069. Retrieved 2016-04-18.
  3. Sahlgren, Magnus (2006). "The Word-Space Model".
  4. Schütze, Hinrich (1993). "Word Space": 895–902. CiteSeerX 10.1.1.41.8856.
  5. Subutai Ahmad; Jeff Hawkins (2015). "Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory". arXiv: 1503.07469 [q-bio.NC].
  6. Baroni, Marco; Lenci, Alessandro (2010). "Distributional Memory: A General Framework for Corpus-Based Semantics". Computational Linguistics. 36 (4): 673–721. CiteSeerX   10.1.1.331.3769 . doi:10.1162/coli_a_00016. S2CID   5584134.
  7. Scott C. Deerwester; Susan T. Dumais; Thomas K. Landauer; George W. Furnas; Richard A. Harshman (1990). "Indexing by Latent Semantic Analysis" (PDF). Journal of the American Society for Information Science.
  8. Xing Wei; W. Bruce Croft (2007). "Investigating retrieval performance with manually-built topic models". Proceedings of RIAO '07: Large Scale Semantic Access to Content (Text, Image, Video, and Sound). pp. 333–349.
  9. "LSA: A Solution to Plato's Problem". lsa.colorado.edu. Retrieved 2016-04-19.
  10. Lund, Kevin; Burgess, Curt (1996-06-01). "Producing high-dimensional semantic spaces from lexical co-occurrence". Behavior Research Methods, Instruments, & Computers. 28 (2): 203–208. doi: 10.3758/BF03204766 . ISSN   0743-3808.
  11. Evgeniy Gabrilovich & Shaul Markovitch (2007). "Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis" (PDF). Proc. 20th Int'l Joint Conf. on Artificial Intelligence (IJCAI). pp. 1606–1611.
  12. Tomas Mikolov; Ilya Sutskever; Kai Chen; Greg Corrado; Jeffrey Dean (2013). "Distributed Representations of Words and Phrases and their Compositionality". arXiv: 1310.4546 [cs.CL].
  13. Jeffrey Pennington; Richard Socher; Christopher D. Manning (2014). "GloVe: Global Vectors for Word Representation" (PDF).
  14. Huth, Alexander (27 April 2016). "Natural speech reveals the semantic maps that tile human cerebral cortex". Nature. 532 (7600): 453–458. Bibcode:2016Natur.532..453H. doi:10.1038/nature17637. PMC   4852309 . PMID   27121839.
  15. "Brain". gallantlab.org. Retrieved 2022-02-16.
  16. Popham, Sara F.; Huth, Alexander G.; Bilenko, Natalia Y.; Deniz, Fatma; Gao, James S.; Nunez-Elizalde, Anwar O.; Gallant, Jack L. (11 August 2021). "Visual and linguistic semantic representations are aligned at the border of human visual cortex". Nature Neuroscience. 24 (11): 1628–1636. doi:10.1038/s41593-021-00921-6. ISSN   1097-6256. PMID   34711960. S2CID   240152854.
  17. Steel, Adam; Billings, Madeleine M.; Silson, Edward H.; Robertson, Caroline E. (2021-05-11). "A network linking scene perception and spatial memory systems in posterior cerebral cortex". Nature Communications. 12 (1): 2632. Bibcode:2021NatCo..12.2632S. doi:10.1038/s41467-021-22848-z. ISSN   2041-1723. PMC   8113503 . PMID   33976141.
  18. Cepelewicz, Jordana (2022-02-08). "New Map of Meaning in the Brain Changes Ideas About Memory". Quanta Magazine. Retrieved 2022-02-16.