GeoNames

Worldwide density of GeoNames entries in 2006

GeoNames (or GeoNames.org) is a user-editable geographical database accessible through various web services under a Creative Commons attribution license. The project was founded in late 2005. [1]

The GeoNames dataset is distinct from the US Government's similarly named GEOnet Names Server, although it includes data from that source. [2]

Database and web services

The GeoNames database contains over 25,000,000 geographical names corresponding to over 11,800,000 unique features. [3] All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes. Beyond names of places in various languages, data stored include latitude, longitude, elevation, population, administrative subdivision and postal codes. All coordinates use the World Geodetic System 1984 (WGS84).
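
The two-level categorization can be illustrated with a small lookup table. The sketch below assumes the nine single-letter class identifiers and short labels as published on the GeoNames site; the `describe` helper and the `class.code` notation are illustrative, not part of any GeoNames API.

```python
# Assumed feature class letters and labels, taken from the GeoNames
# feature code listing; verify against the current listing before relying on them.
FEATURE_CLASSES = {
    "A": "country, state, region",
    "H": "stream, lake",
    "L": "parks, area",
    "P": "city, village",
    "R": "road, railroad",
    "S": "spot, building, farm",
    "T": "mountain, hill, rock",
    "U": "undersea",
    "V": "forest, heath",
}


def describe(feature_class: str, feature_code: str) -> str:
    """Return a readable label for a (class, code) pair (hypothetical helper)."""
    label = FEATURE_CLASSES.get(feature_class)
    if label is None:
        raise ValueError(f"unknown feature class: {feature_class!r}")
    return f"{feature_class}.{feature_code} ({label})"
```

For example, `describe("P", "PPLC")` combines the populated-place class with the code used for capital cities.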

These data are accessible free of charge through a number of web services and a daily database export. [4]
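
A record from the database export might be read as in the following sketch. It assumes the export is tab-separated text with the 19-column layout described in the readme that accompanies the export files; the `Feature` class, `parse_line` function and sample row are illustrative and not part of any GeoNames tooling, and the sample values should not be read as current data.

```python
from dataclasses import dataclass
from typing import Optional

# Column order assumed from the readme shipped with the export files.
COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


@dataclass
class Feature:
    geonameid: int
    name: str
    latitude: float   # WGS84 decimal degrees
    longitude: float  # WGS84 decimal degrees
    feature_class: str
    feature_code: str
    country_code: str
    population: int
    elevation: Optional[int]  # metres; None when the field is empty


def parse_line(line: str) -> Feature:
    """Parse one tab-separated export line into a Feature."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    return Feature(
        geonameid=int(row["geonameid"]),
        name=row["name"],
        latitude=float(row["latitude"]),
        longitude=float(row["longitude"]),
        feature_class=row["feature_class"],
        feature_code=row["feature_code"],
        country_code=row["country_code"],
        population=int(row["population"] or 0),
        elevation=int(row["elevation"]) if row["elevation"] else None,
    )


# Illustrative row only; not guaranteed to match the live database.
SAMPLE = (
    "2643743\tLondon\tLondon\t\t51.50853\t-0.12574\tP\tPPLC\tGB\t\t"
    "ENG\tGLA\t\t\t7556900\t\t25\tEurope/London\t2019-01-01"
)
feature = parse_line(SAMPLE)
```

Empty fields are kept as `None` or `0` rather than guessed, since many export rows leave elevation and population blank.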

Wiki interface

The core of the GeoNames database is provided by official public sources, the quality of which may vary. Through a wiki interface, users are invited to manually edit and improve the database by adding or correcting names, moving existing features, adding new features, and so on. [5]

Semantic Web integration

Each GeoNames feature is represented as a web resource identified by a stable URI. Through content negotiation, this URI provides access either to the HTML wiki page or to an RDF description of the feature that uses elements of the GeoNames ontology. [6] The ontology describes the properties of GeoNames features using the Web Ontology Language (OWL), while the feature classes and codes are described in SKOS. Through the Wikipedia article URLs linked in the RDF descriptions, GeoNames data are linked to DBpedia and other RDF Linked Data.
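
Content negotiation of this kind is driven by the HTTP `Accept` header. The following sketch builds such a request with the Python standard library. The URI pattern in `BASE` and the media types are assumptions made for illustration, and the helper names are hypothetical; how the server responds to a given header should be checked against the GeoNames documentation.

```python
from urllib.request import Request, urlopen

# Assumed URI pattern for a feature; {geonameid} is the feature's numeric id.
BASE = "https://sws.geonames.org/{geonameid}/"


def build_request(geonameid: int, want_rdf: bool = True) -> Request:
    """Build a request asking for RDF/XML or HTML via the Accept header."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(BASE.format(geonameid=geonameid), headers={"Accept": accept})


def fetch(geonameid: int, want_rdf: bool = True, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body (needs network access)."""
    with urlopen(build_request(geonameid, want_rdf), timeout=timeout) as response:
        return response.read()
```

Separating `build_request` from `fetch` lets the negotiated header be inspected without contacting the server.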

Accuracy and improvements

As in other crowdsourcing schemes, the GeoNames edit interface allows anyone to sign in and edit the database. False information can therefore be entered and can remain undetected, especially for places that are not accessed frequently. Ahlers (2013) studies these inaccuracies and classifies them into loss of granularity in coordinates (e.g., due to truncation or low-resolution geocoding), wrong feature codes, near-identical places, and places located outside their designated countries. Manually correcting these inaccuracies is tedious and error-prone because of the database size, and may require experts.
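
Two of these inaccuracy classes lend themselves to simple automated checks. The sketch below is not the method used by Ahlers (2013); it shows hypothetical heuristics for flagging coarse coordinates and near-identical places, with thresholds (`min_places`, `max_km`) chosen arbitrarily for illustration.

```python
import math


def decimal_places(coordinate: str) -> int:
    """Count the digits after the decimal point of a coordinate given as text."""
    _, _, fraction = coordinate.partition(".")
    return len(fraction)


def is_coarse(lat: str, lon: str, min_places: int = 3) -> bool:
    """Flag coordinates whose latitude and longitude both have few decimals.

    Three decimal places of latitude correspond to roughly 111 m. Values
    stored with trailing zeros stripped may be flagged wrongly.
    """
    return max(decimal_places(lat), decimal_places(lon)) < min_places


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def near_identical(a: dict, b: dict, max_km: float = 1.0) -> bool:
    """Flag two records with the same name (case-insensitive) within max_km."""
    same_name = a["name"].casefold() == b["name"].casefold()
    close = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    return same_name and close
```

Flagged records would still need human review, since two distinct places can legitimately share a name and lie close together.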

Few works in the literature address resolving these inaccuracies automatically. Singh & Rafiei (2018) study the problem of automatically detecting the scope of locations in a geographical database, and its applications in identifying inconsistencies and improving the quality of the database. Computing boundary information can help detect inconsistencies such as near-identical places and locations placed under the wrong parent, for example a city under the wrong province or country. Singh & Rafiei show that the boundary information derived in their work can move more than 20% of locations in GeoNames to better positions in the spatial hierarchy, and that the accuracy of those moves is over 90%.
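
The idea of checking a location against the scope of its parent can be sketched with a bounding box. This is a deliberately crude stand-in for boundary information and is not the algorithm of Singh & Rafiei (2018); all names are hypothetical, and the box ignores regions that cross the antimeridian.

```python
from typing import Dict, Iterable, List, NamedTuple


class Point(NamedTuple):
    lat: float
    lon: float


class BBox(NamedTuple):
    south: float
    west: float
    north: float
    east: float

    def contains(self, p: Point) -> bool:
        return self.south <= p.lat <= self.north and self.west <= p.lon <= self.east


def scope_of(points: Iterable[Point]) -> BBox:
    """Approximate a region's scope as the bounding box of trusted points."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    return BBox(
        south=min(p.lat for p in pts),
        west=min(p.lon for p in pts),
        north=max(p.lat for p in pts),
        east=max(p.lon for p in pts),
    )


def misplaced(children: Dict[str, Point], scope: BBox) -> List[str]:
    """Return the names of child locations that fall outside the parent's scope."""
    return sorted(name for name, p in children.items() if not scope.contains(p))
```

A location reported by `misplaced` is only a candidate for reassignment; a real boundary is rarely rectangular, so points inside the box can still belong to a neighbouring region.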

Related Research Articles

Semantic Web: Extension of the Web to facilitate data exchange

The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.

The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard originally designed as a data model for metadata. It has come to be used as a general method for description and exchange of graph data. RDF provides a variety of syntax notations and data serialization formats, with Turtle currently being the most widely used notation.

Geography Markup Language: XML grammar for geographical features

The Geography Markup Language (GML) is the XML grammar defined by the Open Geospatial Consortium (OGC) to express geographical features. GML serves as a modeling language for geographic systems as well as an open interchange format for geographic transactions on the Internet. Key to GML's utility is its ability to integrate all forms of geographic information, including not only conventional "vector" or discrete objects, but coverages and sensor data.

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects.

A geocode is a code that represents a geographic entity. It is a unique identifier of the entity, to distinguish it from others in a finite set of geographic entities. In general the geocode is a human-readable and short identifier.

Geotagging: Act of associating geographic coordinates to digital media

Geotagging, or GeoTagging, is the process of adding geographical identification metadata to various media such as a geotagged photograph or video, websites, SMS messages, QR Codes or RSS feeds, and is a form of geospatial metadata. This data usually consists of latitude and longitude coordinates, though it can also include altitude, bearing, distance, accuracy data, place names, and perhaps a time stamp.

SPARQL is an RDF query language, that is, a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the Semantic Web. On 15 January 2008, SPARQL 1.0 was acknowledged by W3C as an official recommendation, and SPARQL 1.1 followed in March 2013.

A web resource is any identifiable resource present on or connected to the World Wide Web. Resources are identified using Uniform Resource Identifiers (URIs). In the Semantic Web, web resources and their semantic properties are described using the Resource Description Framework (RDF).

A semantic wiki is a wiki that has an underlying model of the knowledge described in its pages. Regular, or syntactic, wikis have structured text and untyped hyperlinks. Semantic wikis, on the other hand, provide the ability to capture or identify information about the data within pages, and the relationships between pages, in ways that can be queried or exported like a database through semantic queries.

Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data.

Oracle Spatial and Graph, formerly Oracle Spatial, is a free option component of the Oracle Database. The spatial features in Oracle Spatial and Graph aid users in managing geographic and location-data in a native type within an Oracle database, potentially supporting a wide range of applications — from automated mapping, facilities management, and geographic information systems (AM/FM/GIS), to wireless location services and location-enabled e-business. The graph features in Oracle Spatial and Graph include Oracle Network Data Model (NDM) graphs used in traditional network applications in major transportation, telcos, utilities and energy organizations and RDF semantic graphs used in social networks and social interactions and in linking disparate data sets to address requirements from the research, health sciences, finance, media and intelligence communities.

Blank node

In RDF, a blank node is a node in an RDF graph representing a resource for which a URI or literal is not given. The resource represented by a blank node is also called an anonymous resource. According to the RDF standard a blank node can only be used as subject or object of an RDF triple.

The Great Britain Historical GIS is a spatially enabled database that documents and visualises the changing human geography of the British Isles, although it is primarily focused on the subdivisions of the United Kingdom, mainly over the 200 years since the first census in 1801. The project is currently based at the University of Portsmouth, and is the provider of the website A Vision of Britain through Time.

Linked data: Structured data and method for its publication

In computing, linked data is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database.

DBpedia: Online database project

DBpedia is a project aiming to extract structured content from the information created in the Wikipedia project. This structured information is made available on the World Wide Web. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets.

The FAO geopolitical ontology is an ontology developed by the Food and Agriculture Organization of the United Nations (FAO) to describe, manage and exchange data related to geopolitical entities such as countries, territories, regions and other similar areas.

GeoSPARQL is a standard for representation and querying of geospatial linked data for the Semantic Web from the Open Geospatial Consortium (OGC). The definition of a small ontology based on well-understood OGC standards is intended to provide a standardized exchange basis for geospatial RDF data which can support both qualitative and quantitative spatial reasoning and querying with the SPARQL database query language.

In geographic information systems, toponym resolution is the process of relating a toponym, i.e. the mention of a place, to an unambiguous spatial footprint of the same place.

Identifiers.org is a project providing stable and perennial identifiers for data records used in the life sciences. The identifiers are provided in the form of Uniform Resource Identifiers (URIs). Identifiers.org is also a resolving system that relies on collections listed in the MIRIAM Registry to provide direct access to different instances of the identified records.

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data Cloud was conceived and is being maintained by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation, and has since been a focal point of activity for several W3C community groups, research projects, and infrastructure efforts.

References

  1. "Marc Wick: Geek of the Week". Simple Talk. 2009-05-06. Retrieved 2020-07-01.
  2. "Datasources used by GeoNames in the GeoNames Gazetteer". Retrieved 2020-08-20.
  3. "GeoNames web site". Geonames.org. Retrieved 2018-09-08.
  4. "GeoNames API". ProgrammableWeb. Archived from the original on 2018-11-26. Retrieved 2018-09-08.
  5. "How can I help ?". GeoNames Forum. GeoNames. Retrieved 2018-08-11.
  6. "GeoNames ontology". Geonames.org. Retrieved 2013-12-15.