GeoNames

Last updated
Worldwide density of GeoNames entries in 2006 Geonames4.png
Worldwide density of GeoNames entries in 2006

GeoNames (or GeoNames.org) is a user editable geographical database available and accessible through various web services, under a Creative Commons attribution license. The project was founded in late 2005. [1]

Contents

The GeoNames dataset differs from, but includes data, [2] from the US Government's similarly named GEOnet Names Server which establishes the official repository of standard spellings of all non-US geographic names.

Database and web services

The GeoNames database contains over 25,000,000 geographical names corresponding to over 11,800,000 unique features. [3] All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes. Beyond names of places in various languages, data stored include latitude, longitude, elevation, population, administrative subdivision and postal codes. All coordinates use the World Geodetic System 1984 (WGS84).

Those data are accessible free of charge through a number of Web services and a daily database export. [4]

Wiki interface

The core of GeoNames database is provided by official public sources, the quality of which may vary. Through a wiki interface, users are invited to manually edit and improve the database by adding or correcting names, move existing features, add new features, etc. [5]

Semantic Web integration

Each GeoNames feature is represented as a web resource identified by a stable URI. This URI provides access, through content negotiation, either to the HTML wiki page, or to a RDF description of the feature, using elements of the GeoNames ontology. [6] This ontology describes the GeoNames features properties using the Web Ontology Language, the feature classes and codes being described in the SKOS language. Through Wikipedia articles URL linked in the RDF descriptions, GeoNames data are linked to DBpedia data and other RDF Linked Data.

Accuracy and improvements

As in other crowdsourcing schemes, GeoNames edit interface allows everyone to sign in and edit the database, hence false information can be entered and such information can remain undetected especially for places that are not accessed frequently. Ahlers (2013) studies these inaccuracies and classifies them into loss in the granularity of coordinates (e.g., due to truncation and low-resolution geocoding in some cases), wrong feature codes, near-identical places, and the placement of places outside their designated countries. Manually correcting these inaccuracies is both tedious and error prone (due to the database size) and may require experts.

The literature provides very few works on automatically resolving them. Singh & Rafiei (2018) study the problem of automatically detecting the scope of locations in a geographical database and its applications in identifying inconsistencies and improving the quality of the database. Computing the boundary information can help detect inconsistencies such as near-identical places and the placement of locations such as cities under wrong parents such as provinces or countries. Singh and Rafiei show that the boundary information derived in their work can move more than 20% of locations in GeoNames to better positions in the spatial hierarchy and the accuracy of those moves is over 90%.

Related Research Articles

The Semantic Web is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable. To enable the encoding of semantics with the data, technologies such as Resource Description Framework (RDF) and Web Ontology Language (OWL) are used. These technologies are used to formally represent metadata. For example, ontology can describe concepts, relationships between entities, and categories of things. These embedded semantics offer significant advantages such as reasoning over data and operating with heterogeneous data sources.

The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax notations and data serialization formats. It is also used in knowledge management applications.

Geography Markup Language used to describe geographical features

The Geography Markup Language (GML) is the XML grammar defined by the Open Geospatial Consortium (OGC) to express geographical features. GML serves as a modeling language for geographic systems as well as an open interchange format for geographic transactions on the Internet. Key to GML's utility is its ability to integrate all forms of geographic information, including not only conventional "vector" or discrete objects, but coverages and sensor data.

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects. Ontologies resemble class hierarchies in object-oriented programming but there are several critical differences. Class hierarchies are meant to represent structures used in source code that evolve fairly slowly whereas ontologies are meant to represent information on the Internet and are expected to be evolving almost constantly. Similarly, ontologies are typically far more flexible as they are meant to represent information on the Internet coming from all sorts of heterogeneous data sources. Class hierarchies on the other hand are meant to be fairly static and rely on far less diverse and more structured sources of data such as corporate databases.

Geotagging Act of associating geographic coordinates to digital media

Geotagging, or GeoTagging, is the process of adding geographical identification metadata to various media such as a geotagged photograph or video, websites, SMS messages, QR Codes or RSS feeds and is a form of geospatial metadata. This data usually consists of latitude and longitude coordinates, though they can also include altitude, bearing, distance, accuracy data, and place names, and perhaps a time stamp.

SPARQL is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web. On 15 January 2008, SPARQL 1.0 was acknowledged by W3C as an official recommendation, and SPARQL 1.1 in March, 2013.

FOAF (ontology) Semantic Web ontology to describe relations between people

FOAF is a machine-readable ontology describing persons, their activities and their relations to other people and objects. Anyone can use FOAF to describe themselves. FOAF allows groups of people to describe social networks without the need for a centralised database.


A web resource, or simply resource, is any identifiable thing, whether digital, physical, or abstract. Resources are identified using Uniform Resource Identifiers. In the Semantic Web, web resources and their semantic properties are described using the Resource Description Framework.

Simple Features is a set of standards that specify a common storage and access model of geographic feature made of mostly two-dimensional geometries used by geographic information systems. Is is formalized by both the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO).

Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data.

AGROVOC is a multilingual controlled vocabulary covering all areas of interest to the Food and Agriculture Organization of the United Nations (FAO), including food, nutrition, agriculture, fisheries, forestry and the environment. The vocabulary consists of over 35,000 concepts with up to 671,000 terms in different languages. It is a collaborative effort, edited by a community of experts and coordinated by FAO.

Oracle Spatial and Graph, formerly Oracle Spatial, is a free option component of the Oracle Database. The spatial features in Oracle Spatial and Graph aid users in managing geographic and location-data in a native type within an Oracle database, potentially supporting a wide range of applications — from automated mapping, facilities management, and geographic information systems (AM/FM/GIS), to wireless location services and location-enabled e-business. The graph features in Oracle Spatial and Graph include Oracle Network Data Model (NDM) graphs used in traditional network applications in major transportation, telcos, utilities and energy organizations and RDF semantic graphs used in social networks and social interactions and in linking disparate data sets to address requirements from the research, health sciences, finance, media and intelligence communities.

An RDF query language is a computer language, specifically a query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.

The Great Britain Historical GIS, is a spatially enabled database that documents and visualises the changing human geography of the British Isles, although is primarily focussed on the subdivisions of the United Kingdom mainly over the 200 years since the first census in 1801. The project is currently based at the University of Portsmouth, and is the provider of the website A Vision of Britain through Time.

In computing, linked data is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database.

Hyperdata are data objects linked to other data objects in other places, as hypertext indicates text linked to other text in other places. Hyperdata enables formation of a web of data, evolving from the "data on the Web" that is not inter-related.

The FAO geopolitical ontology is an ontology developed by the Food and Agriculture Organization of the United Nations (FAO) to describe, manage and exchange data related to geopolitical entities such as countries, territories, regions and other similar areas.

Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL, the main criteria is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.

OpenRefine data wrangling software

OpenRefine, formerly called Google Refine and before that Freebase Gridworks, is a standalone open source desktop application for data cleanup and transformation to other formats, the activity known as data wrangling. It is similar to spreadsheet applications ; however, it behaves more like a database.

GeoSPARQL is a standard for representation and querying of geospatial linked data for the Semantic Web from the Open Geospatial Consortium (OGC). The definition of a small ontology based on well-understood OGC standards is intended to provide a standardized exchange basis for geospatial RDF data which can support both qualitative and quantitative spatial reasoning and querying with the SPARQL database query language.

References

  1. "Marc Wick: Geek of the Week". Simple Talk. 2009-05-06. Retrieved 2020-07-01.
  2. "Datasources used by GeoNames in the GeoNames Gazetteer" . Retrieved 2020-08-20.
  3. "GeoNames web site". Geonames.org. Retrieved 2018-09-08.
  4. "GeoNames API". ProgrammableWeb. Retrieved 2018-09-08.
  5. "How can I help ?". GeoNames Forum. GeoNames. Retrieved 11 August 2018.
  6. "GeoNames ontology". Geonames.org. Retrieved 2013-12-15.