Semantic search

Semantic search denotes search with meaning, as distinguished from lexical search, in which the search engine looks for literal matches of the query words or their variants without understanding the overall meaning of the query. [1] It is an approach to information retrieval that seeks to improve search accuracy by understanding the searcher's intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, in order to generate more relevant results. Modern semantic search systems often use vector embeddings to represent words, phrases, or documents as numerical vectors, allowing the retrieval engine to measure similarity based on meaning rather than exact keyword matches. [2] [3]

Some authors regard semantic search as a set of techniques for retrieving knowledge from richly structured data sources like ontologies and XML as found on the Semantic Web. [4] Such technologies enable the formal articulation of domain knowledge at a high level of expressiveness and could enable the user to specify their intent in more detail at query time. [5] The articulation enhances content relevance and depth by including specific places, people, or concepts relevant to the query.

Models and tools

Tools like Google's Knowledge Graph provide structured relationships between entities to enrich query interpretation. [6]

Models like BERT and Sentence-BERT convert words or sentences into dense vectors for similarity comparison. [7]
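The similarity comparison mentioned above is commonly done with cosine similarity. The following is a minimal sketch of that step only: the three-dimensional vectors are invented for illustration and were not produced by BERT, Sentence-BERT, or any other model, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical "embeddings", hand-written for this example.
# A real system would obtain them from a trained encoder.
doc_embeddings = {
    "how to fix a flat tire": [0.9, 0.1, 0.0],
    "repairing a punctured wheel": [0.8, 0.2, 0.1],
    "best pasta recipes": [0.0, 0.1, 0.9],
}

def rank_by_similarity(query_vec, embeddings):
    """Return document texts ordered from most to least similar to the query."""
    return sorted(
        embeddings,
        key=lambda text: cosine_similarity(query_vec, embeddings[text]),
        reverse=True,
    )

# Assumed embedding of a query such as "mend a bicycle puncture".
query_vec = [0.85, 0.15, 0.05]
ranked = rank_by_similarity(query_vec, doc_embeddings)
```

With these made-up vectors, the two tire-related documents rank above the pasta document even though the assumed query shares no words with them, which is the behaviour that distinguishes embedding-based retrieval from keyword matching.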

Ontology languages and vocabularies such as the Web Ontology Language (OWL), the Resource Description Framework (RDF), and Schema.org organize concepts and relationships, allowing systems to infer related terms and deeper meanings. [8]
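As a rough illustration of inferring related terms from organized concepts, the sketch below expands a query term with all of its narrower concepts before matching. It uses a plain Python dictionary as a stand-in taxonomy; it does not use OWL, RDF, or Schema.org syntax or any reasoner, and the taxonomy, documents, and function names are all hypothetical.

```python
# Hypothetical mini-taxonomy: each concept maps to its direct narrower concepts.
NARROWER = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan", "hatchback"],
    "bicycle": [],
}

def expand(term, narrower=NARROWER):
    """Return the term plus every concept reachable through narrower links."""
    seen = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the taxonomy
        seen.append(current)
        stack.extend(narrower.get(current, []))
    return seen

def search(query_term, documents):
    """Keep documents that contain the query term or any narrower concept."""
    terms = set(expand(query_term))
    return [doc for doc in documents if terms & set(doc.lower().split())]

documents = ["Used sedan for sale", "Bicycle repair shop", "Fresh bread daily"]
matches = search("vehicle", documents)
```

Here a search for "vehicle" matches the sedan and bicycle documents although neither contains the word "vehicle", because the transitive narrower-concept links connect them.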

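Hybrid search models combine lexical retrieval (e.g., BM25) with semantic ranking using pretrained transformer models, aiming to pair the efficiency and exact-match precision of keyword retrieval with the meaning-based matching of neural rankers. [9]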
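One common way to arrange such a hybrid is a two-stage pipeline: BM25 retrieves a small candidate set, and a semantic score reorders it. The sketch below assumes that arrangement; other designs, such as weighted score fusion, are equally possible. The BM25 function uses the non-negative IDF variant, and the document vectors are hand-written placeholders rather than the output of a transformer model. All function names and parameter defaults are assumptions made for this example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for a whitespace-tokenized query."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hybrid_search(query, query_vec, docs, doc_vecs, top_k=2):
    """Stage 1: keep the top_k BM25 matches. Stage 2: reorder them by cosine."""
    lexical = bm25_scores(query, docs)
    candidates = sorted(
        (i for i, s in enumerate(lexical) if s > 0),
        key=lambda i: lexical[i],
        reverse=True,
    )[:top_k]
    reranked = sorted(candidates, key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    return [docs[i] for i in reranked]

docs = ["flat tire repair guide", "tire shop opening hours", "pasta recipes for dinner"]
# Hypothetical embeddings; a real system would compute them with an encoder model.
doc_vecs = [[0.9, 0.1, 0.0], [0.5, 0.5, 0.1], [0.0, 0.1, 0.9]]
query, query_vec = "tire repair", [0.85, 0.15, 0.05]

lexical_scores = bm25_scores(query, docs)
results = hybrid_search(query, query_vec, docs, doc_vecs)
```

A limitation of this two-stage design is that a document with no lexical overlap never reaches the semantic stage, which is one reason some systems fuse the two scores instead of filtering first.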

References

  1. Bast, Hannah; Buchhold, Björn; Haussmann, Elmar (2016). "Semantic search on text and knowledge bases". Foundations and Trends in Information Retrieval. 10 (2–3): 119–271. doi:10.1561/1500000032. Retrieved 1 December 2018.
  2. Klampanos, Iraklis A. (2009-06-02). "Manning Christopher, Prabhakar Raghavan, Hinrich Schütze: Introduction to information retrieval". Information Retrieval. 12 (5): 609–612. doi:10.1007/s10791-009-9096-x. ISSN 1386-4564.
  3. Kim, Bosung; Hong, Taesuk; Ko, Youngjoong; Seo, Jungyun (2020). "Multi-Task Learning for Knowledge Graph Completion with Pre-trained Language Models". Proceedings of the 28th International Conference on Computational Linguistics. Stroudsburg, PA, USA: International Committee on Computational Linguistics. doi:10.18653/v1/2020.coling-main.153.
  4. Dong, Hai (2008). A survey in semantic search technologies. IEEE. pp. 403–408. Retrieved 1 May 2009.
  5. Ruotsalo, T. (May 2012). "Domain Specific Data Retrieval on the Semantic Web". The Semantic Web: Research and Applications. Eswc2012. Lecture Notes in Computer Science. Vol. 7295. pp. 422–436. doi:10.1007/978-3-642-30284-8_35. ISBN 978-3-642-30283-1.
  6. Singhal, A. (2012). Introducing the Knowledge Graph: things, not strings. Google Blog. https://blog.google/products/search/introducing-knowledge-graph-things-not/
  7. Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. EMNLP 2019. https://arxiv.org/abs/1908.10084
  8. Bodenreider, O. (2004). The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research, 32(suppl_1), D267–D270.
  9. Lin, J., et al. (2021). Pretrained Transformers for Text Ranking: BERT and Beyond. https://arxiv.org/abs/2010.06467