Semantic similarity


Semantic similarity is a metric defined over a set of documents or terms, where the distance between items is based on the likeness of their meaning or semantic content, as opposed to lexicographical similarity. Semantic similarity measures are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained by comparing the information that supports their meaning or describes their nature. [1] [2] The term semantic similarity is often confused with semantic relatedness. Semantic relatedness includes any relation between two terms, while semantic similarity only includes "is a" relations. [3] For example, "car" is similar to "bus", but is also related to "road" and "driving".


Computationally, semantic similarity can be estimated by defining a topological similarity, using ontologies to define the distance between terms or concepts. For example, a naive metric for comparing concepts ordered in a partially ordered set and represented as nodes of a directed acyclic graph (e.g., a taxonomy) would be the length of the shortest path linking the two concept nodes. Based on text analyses, semantic relatedness between units of language (e.g., words, sentences) can also be estimated using statistical means such as a vector space model to correlate words and textual contexts from a suitable text corpus.

Proposed semantic similarity and relatedness measures are evaluated in two main ways. The first is based on datasets designed by experts and composed of word pairs with estimated degrees of semantic similarity or relatedness. The second is based on integrating the measures into specific applications such as information retrieval, recommender systems and natural language processing.
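The naive shortest-path metric over a taxonomy can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the toy taxonomy, the function names, and the choice to map a path length to a score with 1 / (1 + length) are all assumptions made for the example.

```python
from collections import deque

# Hypothetical toy taxonomy: each concept maps to its parents ("is a" edges).
TAXONOMY = {
    "cat": ["mammal"],
    "dog": ["mammal"],
    "mammal": ["animal"],
    "sparrow": ["bird"],
    "bird": ["animal"],
    "animal": [],
}


def shortest_path_length(taxonomy, source, target):
    """Number of edges on the shortest path between two concepts.

    "is a" edges are treated as undirected. Returns None if no path exists.
    """
    # Build an undirected adjacency map from the child -> parents map.
    neighbours = {node: set() for node in taxonomy}
    for child, parents in taxonomy.items():
        for parent in parents:
            neighbours[child].add(parent)
            neighbours.setdefault(parent, set()).add(child)

    # Breadth-first search finds the shortest path in an unweighted graph.
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(taxonomy, a, b):
    """Map a path length to (0, 1] with 1 / (1 + length); an assumed convention."""
    length = shortest_path_length(taxonomy, a, b)
    return 0.0 if length is None else 1.0 / (1.0 + length)
```

With this toy data, "cat" and "dog" are two edges apart (through "mammal"), while "cat" and "sparrow" are four edges apart (through "animal"), so the first pair scores higher.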

Terminology

The concept of semantic similarity is more specific than semantic relatedness, as the latter includes relations such as antonymy and meronymy, while similarity does not. [4] However, much of the literature uses these terms interchangeably, along with terms like semantic distance. In essence, semantic similarity, semantic distance, and semantic relatedness all ask, "How much does term A have to do with term B?" The answer to this question is usually a number between −1 and 1, or between 0 and 1, where 1 signifies extremely high similarity.

Visualization

An intuitive way of visualizing the semantic similarity of terms is by grouping together terms which are closely related and spacing wider apart the ones which are distantly related. This is also common in practice for mind maps and concept maps.

A more direct way of visualizing the semantic similarity of two linguistic items can be seen with the Semantic Folding approach. In this approach, a linguistic item such as a term or a text is represented by generating a pixel for each of its active semantic features in a grid of, for example, 128 × 128 cells. This allows for a direct visual comparison of the semantics of two items by comparing image representations of their respective feature sets.
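The idea of comparing two feature-set images can be illustrated with a small sketch. Only the 128 × 128 grid size comes from the text. Representing a fingerprint as a set of active (row, column) cells and scoring overlap with the Jaccard index are assumptions made for this example, not a description of how Semantic Folding itself scores similarity.

```python
GRID_SIDE = 128  # grid size taken from the text; everything else is illustrative


def to_fingerprint(active_cells):
    """Validate (row, col) positions and return them as a frozenset."""
    cells = frozenset(active_cells)
    for row, col in cells:
        if not (0 <= row < GRID_SIDE and 0 <= col < GRID_SIDE):
            raise ValueError(f"cell {(row, col)} is outside the grid")
    return cells


def overlap_similarity(fp_a, fp_b):
    """Jaccard overlap: shared active cells divided by all active cells."""
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)


def render(fp, rows=4, cols=8):
    """Render the top-left corner of a fingerprint as text ('#' = active cell)."""
    return "\n".join(
        "".join("#" if (r, c) in fp else "." for c in range(cols))
        for r in range(rows)
    )
```

Printing `render(...)` for two fingerprints side by side gives a crude text version of the visual comparison described above, and `overlap_similarity` turns the same comparison into a number.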

Applications

In biomedical informatics

Semantic similarity measures have been applied and developed in biomedical ontologies. [5] [6] They are mainly used to compare genes and proteins based on the similarity of their functions [7] rather than on their sequence similarity, but they are also being extended to other bioentities, such as diseases. [8]

These comparisons can be done using tools freely available on the web:

In geoinformatics

Similarity is also applied in geoinformatics to find similar geographic features or feature types: [12]

In computational linguistics

Several metrics use WordNet, a manually constructed lexical database of English words. Despite the advantages of human supervision in constructing the database, its entries are not learned automatically, so it cannot measure relatedness between multi-word terms and its vocabulary does not grow incrementally. [4] [18]

In natural language processing

Natural language processing (NLP) is a field of computer science and linguistics. Sentiment analysis, natural language understanding and machine translation (automatically translating text from one human language to another) are a few of the major areas where it is used. For example, given one information resource on the Internet, it is often of immediate interest to find similar resources. The Semantic Web provides semantic extensions to find similar data by content and not just by arbitrary descriptors. [19] [20] [21] [22] [23] [24] [25] [26] [27] Deep learning methods have become an accurate way to gauge semantic similarity between two text passages, in which each passage is first embedded into a continuous vector representation. [28] [29] [30]

In ontology matching

Semantic similarity plays a crucial role in ontology alignment, which aims to establish correspondences between entities from different ontologies. It involves quantifying the degree of similarity between concepts or terms using the information present in the ontology for each entity, such as labels, descriptions, and hierarchical relations to other entities. Traditional metrics used in ontology matching are based on a lexical similarity between features of the entities, such as using the Levenshtein distance to measure the edit distance between entity labels. [31] However, it is difficult to capture the semantic similarity between entities using these metrics. For example, when comparing two ontologies describing conferences, the entities "Contribution" and "Paper" may have high semantic similarity since they share the same meaning. Nonetheless, due to their lexical differences, lexicographical similarity alone cannot establish this alignment. To capture these semantic similarities, embeddings are being adopted in ontology matching. [32] By encoding semantic relationships and contextual information, embeddings enable the calculation of similarity scores between entities based on the proximity of their vector representations in the embedding space. This approach allows for efficient and accurate matching of ontologies since embeddings can model semantic differences in entity naming, such as homonymy, by assigning different embeddings to the same word based on different contexts. [32]
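To see why purely lexical metrics miss pairs such as "Contribution" and "Paper", here is a minimal sketch of the Levenshtein distance together with a length-normalised label similarity. The function names and the normalisation (one minus distance divided by the longer label's length) are illustrative assumptions, not taken from any particular ontology matcher.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def label_similarity(a, b):
    """Normalise edit distance to [0, 1]; 1 means identical labels (ignoring case)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Because "Contribution" is seven characters longer than "Paper", at least seven edits are needed, so the normalised score stays below 0.5 even though the two labels name the same concept. An embedding-based matcher would instead compare vector representations of the two entities.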

Measures

Topological similarity

There are essentially two types of approaches that calculate topological similarity between ontological concepts:

Other measures calculate the similarity between ontological instances:

Some examples:

Edge-based

  • Pekar et al. [33]
  • Cheng and Cline [34]
  • Wu et al. [35]
  • Del Pozo et al. [36]
  • IntelliGO: Benabderrahmane et al. [6]

Node-based

  • Resnik [37]
    • based on the notion of information content. The information content of a concept (term or word) is the negative logarithm of the probability of finding the concept in a given corpus.
    • only considers the information content of the lowest common subsumer (lcs). A lowest common subsumer is a concept in a lexical taxonomy (e.g. WordNet) that has the shortest distance from the two concepts compared. For example, animal and mammal are both subsumers of cat and dog, but mammal is a lower subsumer than animal for them.
  • Lin [38]
    • based on Resnik's similarity.
    • considers the information content of lowest common subsumer (lcs) and the two compared concepts.
  • Maguitman, Menczer, Roinestad and Vespignani [39]
    • Generalizes Lin's similarity to arbitrary ontologies (graphs).
  • Jiang and Conrath [40]
    • based on Resnik's similarity.
    • considers the information content of lowest common subsumer (lcs) and the two compared concepts to calculate the distance between the two concepts. The distance is later used in computing the similarity measure.
  • Align, Disambiguate, and Walk: Random walks on Semantic Networks [41]
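The three information-content measures in the list above (Resnik, Lin, and Jiang and Conrath) can be sketched on a toy taxonomy. The taxonomy and the probabilities below are invented for illustration, and the sketch assumes a tree, so the lowest common subsumer is simply the first shared ancestor. The formulas follow the commonly cited definitions: Resnik uses IC(lcs), Lin uses 2·IC(lcs) / (IC(a) + IC(b)), and the Jiang and Conrath distance is IC(a) + IC(b) − 2·IC(lcs).

```python
import math

# Hypothetical toy taxonomy (a tree): concept -> parent.
PARENT = {
    "cat": "mammal",
    "dog": "mammal",
    "mammal": "animal",
    "sparrow": "animal",
    "animal": None,
}

# Invented probabilities of meeting each concept (or a descendant) in a corpus.
PROB = {"animal": 1.0, "mammal": 0.5, "cat": 0.2, "dog": 0.25, "sparrow": 0.1}


def information_content(concept):
    """IC(c) = -log p(c): rarer concepts carry more information."""
    return -math.log(PROB[concept])


def ancestors(concept):
    """The concept itself followed by its subsumers, most specific first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def lowest_common_subsumer(a, b):
    subsumers_of_b = set(ancestors(b))
    for candidate in ancestors(a):  # walk upwards from a
        if candidate in subsumers_of_b:
            return candidate
    raise ValueError("concepts share no subsumer")


def resnik(a, b):
    """Similarity = information content of the lowest common subsumer."""
    return information_content(lowest_common_subsumer(a, b))


def lin(a, b):
    """Resnik's score normalised by the information content of both concepts."""
    denom = information_content(a) + information_content(b)
    return 2.0 * resnik(a, b) / denom if denom else 1.0


def jiang_conrath_distance(a, b):
    """A distance (0 for identical concepts), later converted to a similarity."""
    return information_content(a) + information_content(b) - 2.0 * resnik(a, b)
```

In this toy setup, "cat" and "dog" share "mammal" and get a positive Resnik score, while "cat" and "sparrow" only share the root, whose information content is zero.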

Node-and-relation-content-based

  • applicable to ontology
  • consider properties (content) of nodes
  • consider types (content) of relations
  • based on eTVSM [42]
  • based on Resnik's similarity [43]

Pairwise

  • maximum of the pairwise similarities
  • composite average in which only the best-matching pairs are considered (best-match average)
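The two pairwise strategies above can be sketched as follows, for two instances that are each annotated with a set of terms. The function names, the toy similarity table, and the particular best-match-average variant (pooling the best matches from both directions into one mean) are assumptions for illustration. Other variants average the two directions separately.

```python
def max_pairwise(terms_a, terms_b, sim):
    """Largest term-to-term similarity over all cross pairs (both sets non-empty)."""
    return max(sim(a, b) for a in terms_a for b in terms_b)


def best_match_average(terms_a, terms_b, sim):
    """Mean of each term's best match in the other set, pooled over both directions."""
    best_a = [max(sim(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(sim(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Invented term-to-term similarities, standing in for any measure from this section.
TOY_SIM = {
    frozenset(("t1", "t3")): 0.8,
    frozenset(("t1", "t4")): 0.2,
    frozenset(("t2", "t3")): 0.4,
    frozenset(("t2", "t4")): 0.6,
}


def toy_sim(a, b):
    """Symmetric lookup; identical terms score 1.0, unknown pairs 0.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)
```

For the toy sets `["t1", "t2"]` and `["t3", "t4"]`, the maximum strategy keeps only the single strongest pair, whereas the best-match average also reflects that the second term has a weaker best match.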

Groupwise

Statistical similarity

Statistical similarity approaches can be learned from data, or predefined. Similarity learning can often outperform predefined similarity measures. Broadly speaking, these approaches build a statistical model of documents, and use it to estimate similarity.
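As a minimal sketch of a predefined statistical measure, the following builds a term-count vector for each document and compares two documents by cosine similarity. Whitespace tokenisation, raw counts, and the function names are simplifying assumptions. Real systems typically add weighting, dimensionality reduction, or learned representations on top of this.

```python
import math
from collections import Counter


def term_counts(text):
    """Bag-of-words vector: lower-cased, whitespace-separated term counts."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse count vectors."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_similarity(doc_a, doc_b):
    return cosine_similarity(term_counts(doc_a), term_counts(doc_b))
```

Note that raw term counts only capture shared vocabulary, so documents that express the same idea with different words score zero. This is the gap that corpus-based models and learned similarity functions try to close.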

Semantics-based similarity

Semantics similarity networks

Gold standards

Researchers have collected datasets of similarity judgements on pairs of words, which are used to evaluate the cognitive plausibility of computational measures. The long-standing gold standard is a list of 65 word pairs whose similarity was judged by human subjects. [57] [58]
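One common way to use such a dataset, assumed here rather than stated in the text, is to rank-correlate a measure's scores with the human judgements for the same word pairs. The sketch below implements Spearman's rank correlation in plain Python. The scores in the usage example are invented and do not come from any published dataset.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys):
        raise ValueError("both score lists must cover the same word pairs")
    return pearson(ranks(xs), ranks(ys))


# Invented example: human ratings and a measure's scores for five word pairs.
human_scores = [3.9, 3.1, 2.4, 1.0, 0.2]
measure_scores = [0.92, 0.60, 0.71, 0.15, 0.05]
agreement = spearman(human_scores, measure_scores)
```

A value of `agreement` close to 1 would indicate that the measure orders the word pairs much as the human judges did. Rank correlation is used in this sketch because it ignores the differing scales of the two score lists.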


Related Research Articles

Semantic network: Knowledge base that represents semantic relations between concepts in a network

A semantic network, or frame network is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields. A semantic network may be instantiated as, for example, a graph database or a concept map. Typical standardized semantic networks are expressed as semantic triples.

WordNet: Computational lexicon of English

WordNet is a lexical database of semantic relations between words that links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into synsets with short definitions and usage examples. It can thus be seen as a combination and extension of a dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. It was first created in the English language, and the English WordNet database and software tools have been released under a BSD-style license and are freely available for download from the WordNet website. There are now WordNets in more than 200 languages.

In information science, an ontology encompasses a representation, formal naming, and definitions of the categories, properties, and relations between the concepts, data, or entities that pertain to one, many, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of terms and relational expressions that represent the entities in that subject area. The field which studies ontologies so conceived is sometimes referred to as applied ontology.

Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.

Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text. A matrix containing word counts per document is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents.

A sequence of semantically related, ordered words is classified as a lexical chain. A lexical chain is a sequence of related words in writing, spanning a narrow or wide context window. A lexical chain is independent of the grammatical structure of the text, and in effect it is a list of words that captures a portion of the cohesive structure of the text. A lexical chain can provide a context for the resolution of an ambiguous term and enable disambiguation of the concepts that the term represents.

In linguistics, statistical semantics applies the methods of statistics to the problem of determining the meaning of words or phrases, ideally through unsupervised learning, to a degree of precision at least sufficient for the purpose of information retrieval.

Distributional semantics: Field of linguistics

Distributional semantics is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. The basic idea of distributional semantics can be summed up in the so-called distributional hypothesis: linguistic items with similar distributions have similar meanings.

Ontology learning is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.

Semantic analytics, also termed semantic relatedness, is the use of ontologies to analyze content in web resources. This field of research combines text analytics and Semantic Web technologies like RDF. Semantic analytics measures the relatedness of different ontological concepts.

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the senses of a word. Given that the output of word-sense induction is a set of senses for the target word, this task is strictly related to that of word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to solve the ambiguity of words in context.

SemEval is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive.

In natural language processing and information retrieval, explicit semantic analysis (ESA) is a vectoral representation of text that uses a document corpus as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the tf–idf matrix of the text corpus and a document is represented as the centroid of the vectors representing its words. Typically, the text corpus is English Wikipedia, though other corpora including the Open Directory Project have been used.

BabelNet: Multilingual semantic network and encyclopedic dictionary

BabelNet is a multilingual lexicalized semantic network and ontology developed at the NLP group of the Sapienza University of Rome. BabelNet was automatically created by linking Wikipedia to the most popular computational lexicon of the English language, WordNet. The integration is done using an automatic mapping and by filling in lexical gaps in resource-poor languages by using statistical machine translation. The result is an encyclopedic dictionary that provides concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations. Additional lexicalizations and definitions are added by linking to free-license wordnets, OmegaWiki, the English Wiktionary, Wikidata, FrameNet, VerbNet and others. Similarly to WordNet, BabelNet groups words in different languages into sets of synonyms, called Babel synsets. For each Babel synset, BabelNet provides short definitions in many languages harvested from both WordNet and Wikipedia.

Automatic taxonomy construction (ATC) is the use of software programs to generate taxonomical classifications from a body of texts called a corpus. ATC is a branch of natural language processing, which in turn is a branch of artificial intelligence.

In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that the words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.

UBY is a large-scale lexical-semantic resource for natural language processing (NLP) developed at the Ubiquitous Knowledge Processing Lab (UKP) in the department of Computer Science of the Technische Universität Darmstadt. UBY is based on the ISO standard Lexical Markup Framework (LMF) and combines information from several expert-constructed and collaboratively constructed resources for English and German.

Semantic folding theory describes a procedure for encoding the semantics of natural language text in a semantically grounded binary representation. This approach provides a framework for modelling how language data is processed by the neocortex.

Paraphrase or paraphrasing in computational linguistics is the natural language processing task of detecting and generating paraphrases. Applications of paraphrasing are varied including information retrieval, question answering, text summarization, and plagiarism detection. Paraphrasing is also useful in the evaluation of machine translation, as well as semantic parsing and generation of new samples to expand existing corpora.

References

  1. Harispe S.; Ranwez S.; Janaqi S.; Montmain J. (2015). "Semantic Similarity from Natural Language and Ontology Analysis". Synthesis Lectures on Human Language Technologies. 8 (1): 1–254. arXiv: 1704.05295 . doi:10.2200/S00639ED1V01Y201504HLT027. S2CID   17428739.
  2. Feng Y.; Bagheri E.; Ensan F.; Jovanovic J. (2017). "The state of the art in semantic relatedness: a framework for comparison". Knowledge Engineering Review. 32: 1–30. doi:10.1017/S0269888917000029. S2CID   52172371.
  3. A. Ballatore; M. Bertolotto; D.C. Wilson (2014). "An evaluative baseline for geo-semantic relatedness and similarity". GeoInformatica. 18 (4): 747–767. arXiv: 1402.3371 . Bibcode:2014arXiv1402.3371B. doi:10.1007/s10707-013-0197-8. S2CID   17474023.
  4. Budanitsky, Alexander; Hirst, Graeme (2001). "Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures" (PDF). Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics. Pittsburgh.
  5. Guzzi, Pietro Hiram; Mina, Marco; Cannataro, Mario; Guerra, Concettina (2012). "Semantic similarity analysis of protein data: assessment with biological features and issues". Briefings in Bioinformatics. 13 (5): 569–585. doi: 10.1093/bib/bbr066 . PMID   22138322.
  6. Benabderrahmane, Sidahmed; Smail Tabbone, Malika; Poch, Olivier; Napoli, Amedeo; Devignes, Marie-Dominique (2010). "IntelliGO: a new vector-based semantic similarity measure including annotation origin". BMC Bioinformatics. 11: 588. doi:10.1186/1471-2105-11-588. PMC 3098105. PMID 21122125.
  7. Chicco, D; Masseroli, M (2015). "Software suite for gene and protein annotation prediction and similarity search". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 12 (4): 837–843. doi:10.1109/TCBB.2014.2382127. hdl: 11311/959408 . PMID   26357324. S2CID   14714823.
  8. Köhler, S; Schulz, MH; Krawitz, P; Bauer, S; Dolken, S; Ott, CE; Mundlos, C; Horn, D; et al. (2009). "Clinical diagnostics in human genetics with semantic similarity searches in ontologies". American Journal of Human Genetics. 85 (4): 457–64. doi:10.1016/j.ajhg.2009.09.003. PMC   2756558 . PMID   19800049.
  9. "ProteInOn".
  10. "CMPSim".
  11. "CESSM".
  12. Janowicz, K.; Raubal, M.; Kuhn, W. (2011). "The semantics of similarity in geographic information retrieval". Journal of Spatial Information Science. 2 (2): 29–57. doi: 10.5311/josis.2011.2.3 .
  13. Algorithm, implementation and application of the SIM-DL similarity server. Second International Conference on Geospatial Semantics (GEOS 2007). Lecture Notes in Computer Science. 2007. pp. 128–145. CiteSeerX   10.1.1.172.5544 .
  14. "Geo-Net-PT Similarity Calculator".
  15. "Geo-Net-PT".
  16. "OSM Semantic Network". OSM Wiki.
  17. A. Ballatore; D.C. Wilson; M. Bertolotto. "Geographic Knowledge Extraction and Semantic Similarity in OpenStreetMap" (PDF). Knowledge and Information Systems: 61–81.
  18. Kaur, I. & Hornof, A.J. (2005). "A comparison of LSA, wordNet and PMI-IR for predicting user click behavior". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. pp. 51–60. doi:10.1145/1054972.1054980. ISBN   978-1-58113-998-3. S2CID   14347026.
  19. Similarity-based Learning Methods for the Semantic Web (C. d'Amato, PhD Thesis)
  20. Gracia, J. & Mena, E. (2008). "Web-Based Measure of Semantic Relatedness" (PDF). Proceedings of the 9th International Conference on Web Information Systems Engineering (WISE '08): 136–150.
  21. Raveendranathan, P. (2005). Identifying Sets of Related Words from the World Wide Web. Master of Science Thesis, University of Minnesota Duluth.
  22. Wubben, S. (2008). Using free link structure to calculate semantic relatedness. In ILK Research Group Technical Report Series, nr. 08-01, 2008.
  23. Juvina, I., van Oostendorp, H., Karbor, P., & Pauw, B. (2005). Towards modeling contextual information in web navigation. In B. G. Bara & L. Barsalou & M. Bucciarelli (Eds.), 27th Annual Meeting of the Cognitive Science Society, CogSci2005 (pp. 1078–1083). Austin, Tx: The Cognitive Science Society, Inc.
  24. Navigli, R., Lapata, M. (2007). Graph Connectivity Measures for Unsupervised Word Sense Disambiguation, Proc. of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), Hyderabad, India, January 6–12th, 2007, pp. 1683–1688.
  25. Pirolli, P. (2005). "Rational analyses of information foraging on the Web". Cognitive Science. 29 (3): 343–373. doi: 10.1207/s15516709cog0000_20 . PMID   21702778.
  26. Pirolli, P. & Fu, W.-T. (2003). "SNIF-ACT: A model of information foraging on the World Wide Web". Lecture Notes in Computer Science. Vol. 2702. pp. 45–54. CiteSeerX   10.1.1.6.1506 . doi:10.1007/3-540-44963-9_8. ISBN   978-3-540-40381-4.
  27. Turney, P. (2001). Mining the Web for Synonyms: PMI versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491–502). Freiburg, Germany.
  28. Reimers, Nils; Gurevych, Iryna (November 2019). "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks". Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics. pp. 3982–3992. arXiv: 1908.10084 . doi: 10.18653/v1/D19-1410 .
  29. Mueller, Jonas; Thyagarajan, Aditya (2016-03-05). "Siamese Recurrent Architectures for Learning Sentence Similarity". Thirtieth AAAI Conference on Artificial Intelligence. 30. doi: 10.1609/aaai.v30i1.10350 . S2CID   16657628.
  30. Kiros, Ryan; Zhu, Yukun; Salakhutdinov, Russ R; Zemel, Richard; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015), Cortes, C.; Lawrence, N. D.; Lee, D. D.; Sugiyama, M. (eds.), "Skip-Thought Vectors" (PDF), Advances in Neural Information Processing Systems 28, Curran Associates, Inc., pp. 3294–3302, retrieved 2020-03-13
  31. Cheatham, Michelle; Hitzler, Pascal (2013). "String Similarity Metrics for Ontology Alignment". In Alani, Harith; Kagal, Lalana; Fokoue, Achille; Groth, Paul; Biemann, Chris; Parreira, Josiane Xavier; Aroyo, Lora; Noy, Natasha; Welty, Chris (eds.). Advanced Information Systems Engineering. The Semantic Web – ISWC 2013. Lecture Notes in Computer Science. Vol. 7908. Berlin, Heidelberg: Springer. pp. 294–309. doi: 10.1007/978-3-642-41338-4_19 . ISBN   978-3-642-41338-4. S2CID   18372966.
  32. Sousa, G., Lima, R., & Trojahn, C. (2022). An eye on representation learning in ontology matching. OM@ISWC.
  33. Pekar, Viktor; Staab, Steffen (2002). Taxonomy learning. Proceedings of the 19th International Conference on Computational Linguistics. Vol. 1. pp. 1–7. doi:10.3115/1072228.1072318.
  34. Cheng, J; Cline, M; Martin, J; Finkelstein, D; Awad, T; Kulp, D; Siani-Rose, MA (2004). "A knowledge-based clustering algorithm driven by Gene Ontology". Journal of Biopharmaceutical Statistics. 14 (3): 687–700. doi:10.1081/BIP-200025659. PMID   15468759. S2CID   25224811.
  35. Wu, H; Su, Z; Mao, F; Olman, V; Xu, Y (2005). "Prediction of functional modules based on comparative genome analysis and Gene Ontology application". Nucleic Acids Research. 33 (9): 2822–37. doi:10.1093/nar/gki573. PMC   1130488 . PMID   15901854.
  36. Del Pozo, Angela; Pazos, Florencio; Valencia, Alfonso (2008). "Defining functional distances over Gene Ontology". BMC Bioinformatics. 9: 50. doi: 10.1186/1471-2105-9-50 . PMC   2375122 . PMID   18221506.
  37. Philip Resnik (1995). Chris S. Mellish (ed.). "Using information content to evaluate semantic similarity in a taxonomy". Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI'95). 1: 448–453. arXiv: cmp-lg/9511007 . Bibcode:1995cmp.lg...11007R. CiteSeerX   10.1.1.41.6956 .
  38. Dekang Lin. 1998. An Information-Theoretic Definition of Similarity. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98), Jude W. Shavlik (Ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 296–304
  39. Ana Gabriela Maguitman, Filippo Menczer, Heather Roinestad, Alessandro Vespignani: Algorithmic detection of semantic similarity. WWW 2005: 107–116
  40. J. J. Jiang and D. W. Conrath. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. In International Conference on Research on Computational Linguistics (ROCLING X), pages 9008+, September 1997
  41. M. T. Pilehvar, D. Jurgens and R. Navigli. Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity. Proc. of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria, August 4–9, 2013, pp. 1341–1351.
  42. Dong, Hai (2009). "A Hybrid Concept Similarity Measure Model for Ontology Environment". On the Move to Meaningful Internet Systems: OTM 2009 Workshops. Lecture Notes in Computer Science. Vol. 5872. pp. 848–857. Bibcode:2009LNCS.5872..848D. doi:10.1007/978-3-642-05290-3_103. ISBN   978-3-642-05289-7.
  43. Dong, Hai (2011). "A context-aware semantic similarity model for ontology environments". Concurrency and Computation: Practice and Experience. 23 (2): 505–524. doi:10.1002/cpe.1652. S2CID   412845.
  44. Landauer, T. K.; Dumais, S. T. (1997). "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge" (PDF). Psychological Review. 104 (2): 211–240. CiteSeerX   10.1.1.184.4759 . doi:10.1037/0033-295x.104.2.211. S2CID   1144461.
  45. Landauer, T. K.; Foltz, P. W. & Laham, D. (1998). "Introduction to Latent Semantic Analysis" (PDF). Discourse Processes. 25 (2–3): 259–284. CiteSeerX   10.1.1.125.109 . doi:10.1080/01638539809545028. S2CID   16625196.
  46. "Google Similarity Distance".
  47. Carrillo, F.; Cecchi, G. A.; Sigman, M.; Slezak, D. F. (2015). "Fast Distributed Dynamics of Semantic Networks via Social Media" (PDF). Computational Intelligence and Neuroscience. 2015: 712835. doi: 10.1155/2015/712835 . PMC   4449913 . PMID   26074953.
  48. "Samer Hassan" (PDF).
  49. Wilson Wong; Wei Liu; Mohammed Bennamoun (November 2006). Featureless similarities for terms clustering using tree-traversing ants. PCAR '06: Proceedings of the 2006 international symposium on Practical cognitive agents and robots. pp. 177–191. doi:10.1145/1232425.1232448.
  50. "6 Degrees of Wikipedia". The Chronicle of Higher Education. The Wired Campus. May 28, 2008. Archived from the original on May 30, 2008.
  51. V. D. Veksler; Ryan Z. Govostes (2008). "Defining the Dimensions of the Human Semantic Space" (PDF).
  52. J. Camacho-Collados; M. T. Pilehvar; R. Navigli (2015). NASARI: a Novel Approach to a Semantically-Aware Representation of Items (PDF). Proceedings of the North American Chapter of the Association of Computational Linguistics (NAACL 2015). Denver, US. pp. 567–577.
  53. J. Camacho-Collados; M. T. Pilehvar; R. Navigli (July 27–29, 2015). A Unified Multilingual Semantic Representation of Concepts (PDF). Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015). Beijing, China. pp. 741–751.
  54. Fähndrich J.; Weber S.; Ahrndt S. (2016). "Design and Use of a Semantic Similarity Measure for Interoperability Among Agents". In Klusch M.; Unland R.; Shehory O.; Pokahr A.; Ahrndt S. (eds.). Multiagent System Technologies. MATES 2016. Lecture Notes in Computer Science. Vol. 9872. Springer.
  55. C. d'Amato; S. Staab; N. Fanizzi (2008). "On the influence of description logics ontologies on conceptual similarity". Knowledge Engineering: Practice and Patterns. pp. 48–63. doi:10.1007/978-3-540-87696-0_7.
  56. Bendeck, F. (2008). WSM-P Workflow Semantic Matching Platform, PhD dissertation, University of Trier, Germany. Verlag Dr. Hut. ASIN   3899638549.
  57. Rubenstein, Herbert, and John B. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633, 1965.
  58. For a list of datasets, and an overview of the state of the art see https://www.aclweb.org/.
  59. Rubenstein, Herbert; Goodenough, John B. (1965-10-01). "Contextual correlates of synonymy". Communications of the ACM. 8 (10): 627–633. doi: 10.1145/365628.365657 . S2CID   18309234.
  60. Miller, George A.; Charles, Walter G. (1991-01-01). "Contextual correlates of semantic similarity". Language and Cognitive Processes. 6 (1): 1–28. doi:10.1080/01690969108406936. ISSN   0169-0965.
  61. "Placing search in context". ACM Transactions on Information Systems. 20: 116–131. 2002-01-01. CiteSeerX   10.1.1.29.1912 . doi:10.1145/503104.503110. S2CID   12956853.
