Geographic information retrieval

Geographic information retrieval (GIR) or geographical information retrieval systems are search tools, used for Web search, enterprise search, and mobile local search, that combine traditional text-based queries with location-based queries, such as a region selected on a map or a set of place names. Like traditional information retrieval systems, GIR systems index text and information from structured and unstructured documents, and also augment those indices with geographic information. The development and engineering of GIR systems aims to build systems that can reliably answer queries that include a geographic dimension, such as "What wars were fought in Greece?" or "restaurants in Beirut". [1] Semantic similarity and word-sense disambiguation are important components of GIR. [2] To identify place names, GIR systems often rely on natural language processing [3] or other metadata to associate text documents with locations. Such georeferencing, geotagging, and geoparsing tools often need databases of location names, known as gazetteers. [4] [5] [6] [7]
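
As an illustration of the gazetteer lookup that geoparsing relies on, the following sketch matches tokens in a query against a tiny in-memory gazetteer. The place names, coordinates, and single-token matching strategy are simplified assumptions for illustration, not the behaviour of any particular GIR system; production systems draw on large gazetteers and handle multi-word names.

```python
# Minimal sketch of gazetteer-based geoparsing (illustrative assumptions only).
import re

# Toy gazetteer: lowercase place name -> (latitude, longitude).
GAZETTEER = {
    "greece": (39.0, 22.0),
    "beirut": (33.9, 35.5),
    "athens": (38.0, 23.7),
}

def geoparse(text):
    """Return (place name, coordinates) pairs for gazetteer entries found in text."""
    matches = []
    for token in re.findall(r"[A-Za-z]+", text):
        coords = GAZETTEER.get(token.lower())
        if coords:
            matches.append((token, coords))
    return matches

print(geoparse("What wars were fought in Greece?"))
# [('Greece', (39.0, 22.0))]
```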

GIR architecture

GIR involves extracting and resolving the meaning of locations in unstructured text. This is known as geoparsing. [5] After identifying mentions of places and locations in text, a GIR system indexes this information for search and retrieval. GIR systems can typically be broken down into the following stages: geoparsing; text and geographic indexing; data storage; ranking of results by geographic relevance with respect to the query; and browsing of results, commonly with a map interface.
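
The geographic relevance ranking stage can be sketched, under assumed weights, as blending a text relevance score with a geographic score that decays with distance from the query location. The exponential decay and the 50 km scale below are illustrative choices, not a standard GIR formula.

```python
# Illustrative sketch of geographic relevance ranking: combine a text score
# with a score that decays with distance from the query location.
import math

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def geo_score(doc_location, query_location, scale_km=50.0):
    """Score in (0, 1] that decays with distance from the query location."""
    return math.exp(-haversine_km(doc_location, query_location) / scale_km)

def rank(docs, query_location, text_weight=0.5):
    """docs: list of (doc_id, text_score, (lat, lon)); returns ids, best first."""
    scored = [
        (doc_id, text_weight * text_score
         + (1 - text_weight) * geo_score(loc, query_location))
        for doc_id, text_score, loc in docs
    ]
    return [doc_id for doc_id, _ in sorted(scored, key=lambda x: x[1], reverse=True)]

docs = [("d1", 0.9, (40.7, -74.0)), ("d2", 0.6, (33.9, 35.5))]
print(rank(docs, query_location=(33.9, 35.5)))  # 'd2' ranks first near Beirut
```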

Some GIR systems separate text indexing from geographic indexing, which enables the use of generic database joins [8] or multi-stage filtering, [9] while others combine the two indices for efficiency. [10]
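
A minimal sketch of the multi-stage filtering approach, assuming toy in-memory stand-ins for the text and spatial indexes: candidate documents are first retrieved from an inverted index over terms and then filtered by a simple bounding-box test.

```python
# Hedged sketch of multi-stage filtering: text filter first, then spatial filter.
from collections import defaultdict

documents = {
    "d1": {"text": "best falafel restaurants", "loc": (33.89, 35.50)},       # Beirut
    "d2": {"text": "restaurants near the harbour", "loc": (37.97, 23.73)},   # Athens
    "d3": {"text": "ancient battle sites", "loc": (39.0, 22.0)},
}

# Stage 1 structure: toy inverted index over terms.
inverted = defaultdict(set)
for doc_id, doc in documents.items():
    for term in doc["text"].split():
        inverted[term].add(doc_id)

def in_bbox(loc, bbox):
    """True if (lat, lon) falls inside (min_lat, min_lon, max_lat, max_lon)."""
    (lat, lon), (min_lat, min_lon, max_lat, max_lon) = loc, bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def search(term, bbox):
    candidates = inverted.get(term, set())                                  # text filter
    return [d for d in candidates if in_bbox(documents[d]["loc"], bbox)]    # spatial filter

beirut_bbox = (33.8, 35.4, 34.0, 35.6)
print(search("restaurants", beirut_bbox))  # ['d1']
```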

GIR must manage several forms of uncertainty, including the semantic ambiguity of place mentions in natural language text and the limited precision of recorded positions. [11]
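
For example, resolving an ambiguous place name such as "Athens" can be sketched as scoring candidate interpretations with a population prior and a crude context signal; the candidate data and scoring scheme below are illustrative assumptions rather than an established disambiguation method.

```python
# Illustrative sketch of toponym disambiguation under ambiguity.
CANDIDATES = {
    "athens": [
        {"country": "greece", "lat": 38.0, "lon": 23.7, "population": 640_000},
        {"country": "united states", "lat": 34.0, "lon": -83.4, "population": 125_000},
    ]
}

def resolve(toponym, context_text):
    """Pick the candidate with the highest population-prior-times-context score."""
    text = context_text.lower()
    best, best_score = None, float("-inf")
    for cand in CANDIDATES.get(toponym.lower(), []):
        prior = cand["population"]                     # larger places are more likely
        boost = 10 if cand["country"] in text else 1   # crude context signal
        score = prior * boost
        if score > best_score:
            best, best_score = cand, score
    return best

print(resolve("Athens", "barbecue festivals in Georgia, United States"))
# picks the US candidate because the country is mentioned in the context
```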

GIR systems

Examples of GIR systems include MetaCarta, [12] [13] [14] [15] [16] [17] Frankenplace, [18] and Web-a-where. [19]

Study and evaluation

The study of GIR systems has a rich history dating back to the 1970s and possibly earlier. See Ray Larson's Geographic information retrieval and spatial browsing [20] for references to much of the pre-Web literature on GIR.

In 2005 the Cross-Language Evaluation Forum added a geographic track, GeoCLEF. GeoCLEF was the first TREC-style evaluation forum for GIR systems and provided participants a chance to compare systems. [21]

Applications

GIR has many applications in the geoweb, neogeography, and mobile local search, and has been a focus of many conferences, including the ESRI Users Conferences and O'Reilly's Where 2.0 conferences. [22] [23]

Related Research Articles

Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

An image retrieval system is a computer system used for browsing, searching and retrieving images from a large database of digital images. Most traditional and common methods of image retrieval utilize some method of adding metadata such as captioning, keywords, title or descriptions to the images so that retrieval can be performed over the annotation words. Manual image annotation is time-consuming, laborious and expensive; to address this, there has been a large amount of research done on automatic image annotation. Additionally, the increase in social web applications and the semantic web have inspired the development of several web-based image annotation tools.

A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. Sufficient data is gathered, ranked, and presented to the users.

A backlink is an incoming link to a web resource from some other website. A web resource may be a website, web page, or web directory.

Content-based image retrieval, also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases. Content-based image retrieval is opposed to traditional concept-based approaches.

Automatic image annotation is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.

Erik Rauch was an American biophysicist and theoretical ecologist who worked at NECSI, MIT, Santa Fe Institute, Yale University, Princeton University, and other institutions. Rauch's most notable paper was published in Nature and concerned the mathematical modeling of the conservation of biodiversity.

The concept of a Geospatial Web may have first been introduced by Dr. Charles Herring in his US DoD paper, An Architecture of Cyberspace: Spatialization of the Internet, 1994, U.S. Army Construction Engineering Research Laboratory.

Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing.

Semantic search denotes search with meaning, as distinguished from lexical search where the search engine looks for literal matches of the query words or variants of them, without understanding the overall meaning of the query. Semantic search seeks to improve search accuracy by understanding the searcher's intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results. Content that ranks well in semantic search is well-written in a natural voice, focuses on the user's intent, and considers related topics that the user may look for in the future.

A web query or web search query is a query that a user enters into a web search engine to satisfy their information needs. Web search queries are distinctive in that they are often plain text and boolean search directives are rarely used. They vary greatly from standard query languages, which are governed by strict syntax rules as command languages with keyword or positional parameters.

GenieKnows Inc. was a privately owned vertical search engine company based in Halifax, Nova Scotia. It was started by Rami Hamodah who also started SwiftlyLabs.com and Salesboom.com. Like many internet search engines, its revenue model centers on an online advertising platform and B2B transactions. It focuses on a set of search markets, or verticals, including health search, video games search, and local business directory search.

MetaCarta is a software company that developed one of the first search engines to use a map to find unstructured documents. The product uses natural language processing to georeference text for customers in defense, intelligence, homeland security, law enforcement, oil and gas companies, and publishing. The company was founded in 1999 and was acquired by Nokia in 2010. Nokia subsequently spun out the enterprise products division and the MetaCarta brand to Qbase, now renamed to Finch.

Amit Sheth is a computer scientist at University of South Carolina in Columbia, South Carolina. He is the founding Director of the Artificial Intelligence Institute, and a Professor of Computer Science and Engineering. From 2007 to June 2019, he was the Lexis Nexis Ohio Eminent Scholar, director of the Ohio Center of Excellence in Knowledge-enabled Computing, and a Professor of Computer Science at Wright State University. Sheth's work has been cited by over 48,800 publications. He has an h-index of 106, which puts him among the top 100 computer scientists with the highest h-index. Prior to founding the Kno.e.sis Center, he served as the director of the Large Scale Distributed Information Systems Lab at the University of Georgia in Athens, Georgia.

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

Lucene Geographic and Temporal (LGTE) is an information retrieval tool developed at the Technical University of Lisbon which can be used as a search engine or as an evaluation system for information retrieval techniques for research purposes. The first implementation powered by LGTE was the search engine of DIGMAP, a project co-funded by the community programme eContentplus between 2006 and 2008, which aimed to provide web services over old digitized maps from a group of partners across Europe, including several national libraries.

Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data may, for example, consist of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data.

GeoSPARQL is a standard for representation and querying of geospatial linked data for the Semantic Web from the Open Geospatial Consortium (OGC). The definition of a small ontology based on well-understood OGC standards is intended to provide a standardized exchange basis for geospatial RDF data which can support both qualitative and quantitative spatial reasoning and querying with the SPARQL database query language.

In geographic information systems, toponym resolution is the process of relating a toponym, i.e. the mention of a place, to an unambiguous spatial footprint of the same place.

Semantic queries allow for queries and analytics of associative and contextual nature. Semantic queries enable the retrieval of both explicitly and implicitly derived information based on syntactic, semantic and structural information contained in data. They are designed to deliver precise results or to answer more fuzzy and wide open questions through pattern matching and digital reasoning.

References

  1. Purves, Ross; Jones, Christopher (2011-07-01). "Geographic Information Retrieval". SIGSPATIAL Special. 3 (2): 2–4. CiteSeerX 10.1.1.130.3521. doi:10.1145/2047296.2047297. ISSN 1946-7729. S2CID 1940653.
  2. Kuhn, Werner; Raubal, Martin; Janowicz, Krzysztof (2011-05-25). "The semantics of similarity in geographic information retrieval". Journal of Spatial Information Science. 2011 (2): 29–57. doi:10.5311/JOSIS.2011.2.26 (inactive 31 January 2024). Retrieved 2015-09-12.
  3. "MetaCarta: Putting Natural Language on the Map". GIS Monitor. 2003-08-21. Archived from the original on 2003-10-03.
  4. Smith, Susan. "The Space Between Maps, Search and Content".
  5. Dinan, Elizabeth (2003-11-10). "Ware-Withal: MIT-rooted MetaCarta stakes its claim with automatic geoparsing software".
  6. "MetaCarta Unveils First Geo-referencing Solution to Support Arabic and Spanish Languages". 2007-06-20.
  7. Frank, John; Warren, Bob. "Locating All Content" (PDF).
  8. "Chapter 15. Query performance tuning". PostGIS In Action (Second ed.). Manning Publications.
  9. "Apache Solr - Lucene Reference Guide - Spatial Search" . Retrieved 2021-01-03.
  10. "CartaTrees Map Search Text Index". Archived from the original on 2003-04-02.
  11. Bordogna, Gloria; Ghisalberti, Giorgio; Psaila, Giuseppe (2012-06-01). "Geographic information retrieval: Modeling uncertainty of user's context". Fuzzy Sets and Systems. 196: 105–124. doi:10.1016/j.fss.2011.04.005. Geographic information retrieval (GIR) is nowadays a hot research issue that involves the management of uncertainty and imprecision and the modeling of user preferences and context. Indexing the geographic content of documents implies dealing with the ambiguity, synonymy and homonymy of geographic names in texts. On the other side, the evaluation of queries specifying both content based conditions and spatial conditions on documents' contents requires representing the vagueness and context dependency of spatial conditions and the personal user's preferences.
  12. Jennifer 8. Lee (2002-01-14). "Federal Agents Look to Adapt Private Technology". New York Times.
  13. "The revenge of geography". The Economist. 2003-03-13. Archived from the original on 2020-12-31.
  14. Levy, Steven (2004-06-07). "Making the Ultimate Map - When digital geography teams up with wireless technology and the Web, the world takes on some new dimensions". Newsweek. Archived from the original on 2004-06-03.
  15. US patent 7,117,199, Frank, John R.; Rauch, Erik M. & Donoghue, Karen, "Spatially coding and displaying information", issued 2006-10-03.
  16. Erik Rauch; Michael Bukatin; Kenneth Baker from MetaCarta. A confidence-based framework for disambiguating geographic terms (Speech). Retrieved 2021-01-03.
  17. András Kornai, MetaCarta (2005). MetaCarta at GeoCLEF 2005. GeoCLEF. In Memoriam Erik Rauch
  18. Adams, Benjamin; McKenzie, Grant; Gahegan, Mark (2015-01-01). "Frankenplace". Proceedings of the 24th International Conference on World Wide Web. WWW '15. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee. pp. 12–22. doi:10.1145/2736277.2741137. ISBN 978-1-4503-3469-3. S2CID 1639723.
  19. Amitay, Einat; Har'El, Nadav; Sivan, Ron; Soffer, Aya (July 2004). Web-a-where: geotagging web content. SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. pp. 273–280. doi:10.1145/1008992.1009040. We describe Web-a-Where, a system for associating geography with Web pages. Web-a-Where locates mentions of places and determines the place each name refers to. In addition, it assigns to each page a geographic focus --- a locality that the page discusses as a whole.
  20. Larson, Ray R. (1996). Geographic information retrieval and spatial browsing. Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. hdl:2142/416. ISBN 0878450971. ISSN 0069-4789.
  21. Gey, Fredric; Larson, Ray; Sanderson, Mark; Joho, Hideo; Clough, Paul; Petras, Vivien (2005-09-21). "GeoCLEF: The CLEF 2005 Cross-Language Geographic Information Retrieval Track Overview". In Peters, Carol; Gey, Fredric C.; Gonzalo, Julio; Müller, Henning; Jones, Gareth J. F.; Kluck, Michael; Magnini, Bernardo; Rijke, Maarten de (eds.). Accessing Multilingual Information Repositories. Lecture Notes in Computer Science. Vol. 4022. Springer Berlin Heidelberg. pp. 908–919. CiteSeerX 10.1.1.156.6368. doi:10.1007/11878773_101. ISBN 978-3-540-45697-1.
  22. Local Search Faces Off - Craig Donato, Perry Evans, John Frank, Jeremy Kreitler, Shailesh Rao (Speech). Where 2.0. 2005-06-29. Archived from the original on 2013-07-29. Retrieved 2021-01-03.
  23. Himmelstein, Marty (2005). "Local Search: The Internet Is the Yellow Pages". Computer. Published by the IEEE Computer Society. 38 (2): 26–34. doi:10.1109/MC.2005.65. Every day, millions of people use their local newspapers, classified ad circulars, Yellow Pages directories, regional magazines, and the Internet to find information pertaining to the activities of daily life…
