A geocode is a code that represents a geographic entity (location or object). It is a unique identifier of the entity, to distinguish it from others in a finite set of geographic entities. In general the geocode is a human-readable and short identifier.
Typical geocodes and the entities they represent:
- Country codes and subdivision codes: ISO 3166-1 alpha-2 codes (e.g. AF for Afghanistan or BR for Brazil), and their subdivision conventions, such as AF subdivision codes (e.g. AF-GHO for Ghor province) or BR subdivision codes (e.g. BR-AM for Amazonas state).
- Grid cell identifiers: a Geohash code (e.g. 6vjyngd at the center of Brazil) or an OLC code (e.g. ~0.004 km² cell 58PJ642P+4 at the same point).
- Postal codes: a CEP code (e.g. 70040 represents a central area of Brazil for postal distribution).
The ISO 19112:2019 standard (section 3.1.2) adopted the term "geographic identifier" instead of "geocode", to encompass long labels: "spatial reference in the form of a label or code that identifies a location". For example, for ISO, the country name "People's Republic of China" is a label.
Geocodes are mainly used (in general as an atomic data type) for labelling, data integrity, geotagging and spatial indexing.
In theoretical computer science a geocode system is a locality-preserving hashing function.
There are some common aspects of many geocodes (or geocode systems) that can be used as classification criteria:
The set of all geocodes used as unique identifiers of the cells of a full coverage of the geographic surface (or of any well-defined area, like a country or the oceans) is a geocode system (also named geocode scheme). The syntax and semantics of the geocodes are also components of the system definition. The syntax is commonly described formally by a regular expression (e.g. /[A-Z]{2,2}/ for a code of two uppercase letters).
Many syntactic and semantic characteristics are also summarized by classification.
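As an illustration of a syntax rule expressed as a regular expression, the following minimal Python sketch checks a string against the two-letter pattern quoted above. The function name is hypothetical, and the check covers syntax only: it says nothing about whether a code is actually assigned to an entity.

```python
import re

# Assumed syntax rule: exactly two uppercase ASCII letters,
# i.e. the regular expression /[A-Z]{2,2}/ quoted in the text.
TWO_LETTER_SYNTAX = re.compile(r"[A-Z]{2}")


def has_two_letter_syntax(code: str) -> bool:
    """Return True when the whole string matches the two-letter syntax.

    This is a syntactic check only; the semantics (which entity the
    code identifies) would need a lookup table of assigned codes.
    """
    return TWO_LETTER_SYNTAX.fullmatch(code) is not None


print(has_two_letter_syntax("BR"))   # True
print(has_two_letter_syntax("BRA"))  # False
print(has_two_letter_syntax("br"))   # False
```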
Any geocode can be translated from a formal (and expanded) expression of the geographical entity or, vice versa, from the geocode back to the entity. The first process is named encoding, the second decoding. The actors and processes involved, as defined by OGC, [3] are:
In spatial indexing applications the geocode can also be translated between human-readable (e.g. hexadecimal) and internal (e.g. binary 64-bit unsigned integer) representations.
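The translation between a human-readable and an internal representation can be sketched in Python as follows. The fixed width of 16 hexadecimal digits and the function names are assumptions made for this illustration; actual geocode systems define their own textual formats.

```python
def hex_to_cell_id(text: str) -> int:
    """Parse a hexadecimal string into an unsigned 64-bit integer."""
    value = int(text, 16)
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 unsigned bits")
    return value


def cell_id_to_hex(value: int) -> str:
    """Format an unsigned 64-bit integer as 16 lowercase hex digits."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 unsigned bits")
    return format(value, "016x")


cell_id = hex_to_cell_id("00000000000000ff")
print(cell_id)                  # 255
print(cell_id_to_hex(cell_id))  # 00000000000000ff
```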
Geocodes like country codes, city codes, etc. come from a table of official names and the corresponding official codes and geometries (typically polygons of administrative areas). "Official" here is meant in the sense of control and consensus: typically the table is controlled by a standards organization or governmental authority. So, the most general case is a table of standard names and the corresponding standard codes (and their official geometries).
Strictly speaking, the "name" related to a geocode is a toponym, and the table (e.g. toponym to standard code) is the resource for toponym resolution: the relationship process, usually carried out by a software agent, between a toponym and "an unambiguous spatial footprint of the same place". [4] Any standardized system of toponym resolution having codes or encoded abbreviations can be used as a geocode system. The "resolver" agent in this context is also a geocoder.
Sometimes names are translated into numeric codes, to be compact or machine-readable. Since numbers, in this case, are name identifiers, we can consider "numeric names" — so this set of codes will be a kind of "system of standard names".
In the geocode context, space partitioning is the process of dividing a geographical space into two or more disjoint subsets, resulting in a mosaic of subdivisions. Each subdivision can be partitioned again, recursively, resulting in a hierarchical mosaic.
When the subdivisions' names are expressed as codes, and the code syntax can be decomposed into parent-child relations through a well-defined syntactic scheme, the geocode set forms a hierarchical system. A geocode fragment (associated with a subdivision name) can be an abbreviation or a numeric or alphanumeric code.
A popular example is the ISO 3166-2 geocode system, representing country names and the names of their administrative subdivisions, separated by a hyphen. For example, DE is Germany, a simple geocode, and its subdivisions are DE-BW for Baden-Württemberg, DE-BY for Bayern, ..., DE-NW for Nordrhein-Westfalen, etc. The scope is only the first level of the hierarchy. For more levels there are other conventions, like the HASC code. [5] [6] HASC codes are alphabetic and their fragments have constant length (2 letters). Examples:
- DE.NW - North Rhine-Westphalia. A two-level hierarchical geocode.
- DE.NW.CE - Kreis Coesfeld. A three-level hierarchical geocode.
Two geocodes of a hierarchical geocode system with the same prefix represent different parts of the same location. For instance, DE.NW.CE and DE.NW.BN represent geographically interior parts of DE.NW, the common prefix.
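The prefix rule can be illustrated with a short Python sketch that works on the dotted HASC examples above. It treats the codes purely as strings; the function names are hypothetical, and no table of assigned HASC codes is consulted.

```python
def hasc_ancestors(code: str) -> list[str]:
    """List the broader codes of a dotted code, coarsest first."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def is_within(code: str, ancestor: str) -> bool:
    """True when `code` equals `ancestor` or lies below it in the hierarchy.

    The dot is appended before comparing so that only whole fragments
    count as a prefix.
    """
    return code == ancestor or code.startswith(ancestor + ".")


print(hasc_ancestors("DE.NW.CE"))        # ['DE', 'DE.NW']
print(is_within("DE.NW.CE", "DE.NW"))    # True
print(is_within("DE.NW.BN", "DE.NW"))    # True
print(is_within("DE.BY", "DE.NW"))       # False
```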
Changing the subdivision criteria, we can obtain other hierarchical systems. For example, for hydrological criteria there is the United States' hydrologic unit code (HUC), a numeric representation of basin names in a hierarchical syntax schema. For example, HUC 17 is the identifier of the "Pacific Northwest Columbia basin"; HUC 1706 identifies the "Lower Snake basin", a spatial subset of HUC 17 and a superset of 17060102 ("Imnaha River").
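The numeric hierarchy of the HUC examples can be unfolded with a small Python sketch. It assumes that each level adds two digits, which is consistent with the codes 17, 1706 and 17060102 above; the function names are hypothetical and basin names are not looked up.

```python
def huc_levels(code: str) -> list[str]:
    """Return the chain of codes from the broadest level down to `code`.

    Assumption: every hierarchy level contributes exactly two digits.
    """
    if not (code.isascii() and code.isdigit()) or len(code) % 2 != 0:
        raise ValueError("expected an even number of decimal digits")
    return [code[:n] for n in range(2, len(code) + 1, 2)]


def is_sub_unit(code: str, ancestor: str) -> bool:
    """True when `code` is a strictly finer unit inside `ancestor`."""
    return len(code) > len(ancestor) and code.startswith(ancestor)


print(huc_levels("17060102"))          # ['17', '1706', '170601', '17060102']
print(is_sub_unit("17060102", "1706"))  # True
print(is_sub_unit("1706", "17"))        # True
```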
Inspired by the classic alphanumeric grids, a discrete global grid (DGG) is a regular mosaic which covers the entire Earth's surface (the globe). The regularity of the mosaic is defined by the use of cells of the same shape throughout the grid, or of "nearly the same shape and nearly the same area" in a region of interest, like a country.
All cells of the grid have an identifier (the DGG's cell ID), and the center of the cell can be used as the reference for converting a cell ID into a geographical point. When a compact human-readable expression of the cell ID is standardized, it becomes a geocode.
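As a concrete case of a cell identifier and its center point, the sketch below implements the commonly described Geohash scheme in Python: longitude and latitude intervals are halved alternately, and every five bits become one base32 digit. Geohash is chosen here only as an example grid, and the function names are hypothetical; this is an illustration under those assumptions, not a reference implementation.

```python
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Encode a latitude/longitude pair as a Geohash of `precision` digits."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    digits, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(digits) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base32 digit
            digits.append(GEOHASH_ALPHABET[bits])
            bits, bit_count = 0, 0
    return "".join(digits)


def geohash_cell(code: str):
    """Return ((lat_min, lat_max), (lon_min, lon_max)) of a Geohash cell."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for ch in code:
        index = GEOHASH_ALPHABET.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (index >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return tuple(lat_range), tuple(lon_range)


def geohash_center(code: str):
    """Return the (lat, lon) center point of a Geohash cell."""
    (lat_lo, lat_hi), (lon_lo, lon_hi) = geohash_cell(code)
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


print(geohash_encode(48.85, 2.35, 3))  # u09
print(geohash_center("u09"))           # (48.515625, 2.109375)
```

Encoding the center of a cell at the same precision returns the original cell identifier, which is what makes the center usable as a reference point.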
Geocodes of different geocode systems can represent the same position on the globe, with the same shape and precision, but differ in string length, digit alphabet, separators, etc. Non-global grids also differ by scope, and in general are geometrically optimized (avoiding overlaps, gaps or loss of uniformity) for local use.
Each cell of a grid can be transformed into a new local grid, in a recursive process. For example, the cell TQ 2980 is a sub-cell of TQ 29, which is a sub-cell of TQ. A system of regular geographic grid references is the base of a hierarchical geocode system.
Two geocodes of a hierarchical geocode grid system can use the prefix rule: geocodes with the same prefix represent different parts of the same broader location. Using the same grid again, TQ 28 and TQ 61 represent geographically interior parts of TQ, the common prefix.
A hierarchical geocode can be split into keys. The Geohash 6vd23gq is the key q of the cell 6vd23g, which is a cell of 6vd23 (key g), and so on, with per-digit keys. The OLC 58PJ642P is the key 2P of the cell 58PJ64, which is a cell of 58PJ (key 64), and so on, with two-digit keys. In the case of OLC there is a second key schema, after the + separator: 58PJ642P+48 is the key 8 of the cell 58PJ642P+4. It uses two key schemas. Some geocode systems (e.g. S2 geometry) also use an initial prefix with a non-hierarchical key schema.
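The decomposition into keys can be sketched in Python as plain string slicing. The sketch covers only fixed-width keys (one digit for the Geohash example, two digits for the part of the OLC example before the + separator); the function names are hypothetical.

```python
def split_keys(code: str, key_len: int) -> list[str]:
    """Split a hierarchical code into consecutive keys of fixed width."""
    if key_len < 1 or len(code) % key_len != 0:
        raise ValueError("code length must be a multiple of the key width")
    return [code[i:i + key_len] for i in range(0, len(code), key_len)]


def parent_chain(code: str, key_len: int) -> list[tuple[str, str]]:
    """Return (parent cell, key) pairs, from the finest level upwards."""
    chain = []
    while len(code) > key_len:
        chain.append((code[:-key_len], code[-key_len:]))
        code = code[:-key_len]
    return chain


print(split_keys("6vd23gq", 1))
# ['6', 'v', 'd', '2', '3', 'g', 'q']
print(parent_chain("6vd23gq", 1)[:2])
# [('6vd23g', 'q'), ('6vd23', 'g')]
print(parent_chain("58PJ642P", 2))
# [('58PJ64', '2P'), ('58PJ', '64'), ('58', 'PJ')]
```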
In general, as a technical and non-compact optional representation, geocode systems based on hierarchical grids also offer the possibility of expressing their cell identifier with a fine-grained schema, by a longer path of keys. For example, the Geohash 6vd2, which is a base32 code, can be expanded to base4 0312312002, which is also a schema with per-digit keys. Geometrically, each Geohash cell is a rectangle that subdivides space recursively into 32 new rectangles, so base4, subdividing into 4, is the encoding-expansion limit. [7]
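The expansion from base32 to base4 can be reproduced with a short Python sketch: each Geohash digit is written as five bits, and the bit string is then regrouped into pairs. The function names are hypothetical, and the sketch assumes the standard Geohash base32 alphabet. Only codes with an even number of digits regroup cleanly into pairs, so other lengths are rejected.

```python
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_to_base4(code: str) -> str:
    """Expand a base32 Geohash into its base4 representation."""
    bits = "".join(format(GEOHASH_ALPHABET.index(ch), "05b") for ch in code)
    if len(bits) % 2 != 0:
        raise ValueError("bit length is odd; use an even number of digits")
    return "".join(str(int(bits[i:i + 2], 2)) for i in range(0, len(bits), 2))


def base4_to_geohash(digits: str) -> str:
    """Compact a base4 string back into a base32 Geohash."""
    bits = "".join(format(int(d, 4), "02b") for d in digits)
    if len(bits) % 5 != 0:
        raise ValueError("bit length is not a multiple of five")
    return "".join(
        GEOHASH_ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5)
    )


print(geohash_to_base4("6vd2"))        # 0312312002
print(base4_to_geohash("0312312002"))  # 6vd2
```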
The uniformity of shape and area of cells in a grid can be important for other uses, like spatial statistics. There are standard ways to build a grid covering the entire globe with cells of equal area, regular shape and other properties: a Discrete Global Grid System (DGGS) is a series of discrete global grids satisfying all standardized requirements defined in 2017 by the OGC. [8] When the human-readable codes obtained from the cell identifiers of a DGGS are also standardized, the result can be classified as a DGGS-based geocode system.
There are also mixed systems, using a syntactical partition, where for example the first part (code prefix) is a name-code and the other part (code suffix) is a grid-code. Example: FR-4J.Q2, where FR is the name-code [9] and 4J.Q2 is the grid-code. Semantically, France is the context used to obtain its local grid.
For mnemonically coherent semantics in fine-grained geocode applications, mixed solutions are the most suitable.
Any geocode system based on a regular grid is, in general, also a shorter way to express a latitude/longitude coordinate. But a geocode with more than 6 characters is difficult to remember. On the other hand, a geocode based on a standard name (an abbreviation or the complete name) is easier to remember.
This suggests that a "mixed code" can solve the problem, reducing the number of characters when a name can be used as the "context" for the grid-based geocode. For example, suppose the author of a book says "all geocodes here are contextualized by the chapter's city". In the chapter about Paris, where all places have a Geohash with the prefix u09, that prefix can be removed. For instance, Geohash u09tut can be reduced to tut or, with an explicit code for the context, "FR-Paris tut". This is only possible when the context resolution (e.g. translation from "FR-Paris" to the prefix u09) is well known.
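The shortening by context can be sketched in Python as a lookup from a context name to a grid prefix. The table below holds only the Paris example from the text; the table and the function names are hypothetical, and a real system would need an agreed, published context resolution.

```python
# Hypothetical context table: context name -> Geohash prefix.
CONTEXT_PREFIX = {"FR-Paris": "u09"}


def shorten(geohash: str, context: str) -> str:
    """Drop the context's prefix from a full Geohash."""
    prefix = CONTEXT_PREFIX[context]
    if not geohash.startswith(prefix):
        raise ValueError("geohash is outside the given context")
    return geohash[len(prefix):]


def expand(context: str, short_code: str) -> str:
    """Rebuild the full Geohash from a context and a shortened code."""
    return CONTEXT_PREFIX[context] + short_code


print(shorten("u09tut", "FR-Paris"))  # tut
print(expand("FR-Paris", "tut"))      # u09tut
```

Because `expand` undoes `shorten` exactly, the shortened form loses no information as long as both sides share the same context table.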
In fact, a methodology exists for hierarchical grid-based geocodes with non-variable size, where the code prefix describes a broader area, which can be associated with a name. So, it is possible to shorten the code by replacing the prefix with the associated context. The most usual context is an official name. Examples:
Standards mixed | Grid-based | Mixed reference |
---|---|---|
Grid OLC and country's official names | 796RWF8Q+WF | Cape Verde, Praia, WF8Q+WF |
Grid Geohash and ISO 3166-2 hierarchical abbreviations | e6xkbgxed | CV-PR , bgxed |
The examples in the Mixed reference column are significantly easier to remember than those in the Grid-based column. The methods vary; for example, OLC can be shortened by eliminating its first four digits and attaching a suitable, sufficiently close locality. [10]
When the mixed reference is also short (9 characters in the second example) and there is a syntax convention to express it (suppose CV-PR~bgxed), this convention generates a new name-and-grid geocode system. This is not the case for the first example because, strictly speaking, "Cape Verde, Praia" is not a code.
To be both a name-and-grid system and a mixed reference convention, the system must be reversible. Pure name-and-grid systems, like Mapcode, with no way to transform their codes into a global code, are not mixed references, because there is no algorithm to transform the mixed geocode into a grid-based geocode.
Geocodes in use and with general scope:
Geocode | Inception | Coverage | Formation | Ownership | Rep. entity | Context and description |
---|---|---|---|---|---|---|
ISO 3166 (alpha-2 and alpha-3) | 1974 | globe/only nations | Name abbreviation | free | polygon | Administrative divisions. Country codes and codes of their subdivisions. Two letters (alpha-2) or three letters (alpha-3). |
ISO 3166-1 numeric | 1970 | globe/only nations | Serial number | free | polygon | Administrative divisions. Country codes expressed by serial numbers. |
UN M.49 | ~1970 | globe/only nations | Serial number | free | polygon | Administrative divisions. Region codes, area codes, continents, countries (re-using ISO 3166-1 numeric codes). |
Geohash | 2008 | globe | encode(latLon,precision) | free | grid cell | Hash notation for locations. See also Geohash and its variants, like OpenStreetMap's short-link. [11] |
Open Location Code (OLC) | 2014 | globe | encode(latLon,precision) | free | grid cell | See also PlusCodes. [12] |
What3words | 2013 | globe | encode(latLon) | patented | grid cell | Patent-restricted system; converts 3×3 meter squares into 3 words. [13] It is in use at Mongol Post. [14] |
Mapcode | 2001 | globe | encode(latLon) | patented | point | A mapcode is a code consisting of two groups of letters and digits, separated by a dot. |
Geopeg | 2020 | globe/only nations | encode(latLon) | open standard | grid cell | Geopeg is a word-based GPS address, using simple words like London.RedFish. It is a combination of a city and two simple words. It is an open standard geocoding of Earth, currently in development. |
Dymaxion Geographic Encoding | 2024 | globe | encode(x,y,z) | open standard | triangle | Based on the Dymaxion map projection, uses a 64-bit unsigned integer to represent locations. Provides global coverage with high precision, achieving nearly 100% utilization of the available bit space. Uses an icosahedron-based triangular grid. |
Geocodes can be used in place of official street names and/or house numbers, particularly when a given location has not been assigned an address by authorities. They can also be used as an "alternative address" if they can be converted to a Geo URI. Even if the geocode is not the official designation for a location, it can be used as a "local standard" to allow homes to receive deliveries, access emergency services, register to vote, etc.
Geocode | Inception | Coverage | Formation | Ownership | Rep. entity | Context and description |
---|---|---|---|---|---|---|
Local OLC (Cape Verde) | 2016 | globe | encode(latLon,precision) | free | grid cell | OLC is used to provide postal services. [15] |
Eircode (Ireland) | 2014 [16] | Ireland | encode(latLon,precision) | copyrighted [17] | grid cell | It is used officially as alternative address and as postal code. Limited database and algorithm access. It is a kind of fine-grained postal code. |
Geocodes in use, as postal codes. A geocode recognized by Universal Postal Union and adopted as "official postal code" by a country, is also a valid postal code. Not all postal codes are geographic, and for some postal code systems, there are codes that are not geocodes (e.g. in UK system). Samples, not a complete list:
Geocode | Inception | Coverage | Formation | Ownership | Rep. entity | Context and description |
---|---|---|---|---|---|---|
CEP (Brazil) | 1970? | cities or streets | Hierarchical serial number | proprietary | (variable) | ... The CEP5 is geographic and CEP8 can be a city (polygon), a street (also street side or a fragment of street side) or a point (specific address). |
Postal Index Number (India) | ? | postal regions | Hierarchical serial number? | proprietary? | (undefined?) | ... |
ZIP Code (United States) | ? | postal regions | Hierarchical serial number? | proprietary? | (undefined?) | ... |
Geocodes in use for telephony or radio broadcasting scope:
Geocodes in use and with specific scope:
Geocode | Inception | Scope | Coverage | Formation | Ownership | Rep. entity | Context and description |
---|---|---|---|---|---|---|---|
ONS code | 2001 | UK only | UK/themes | Serial number | free | polygon | Administrative divisions. Geographical areas of the UK, for use in tabulating census. |
NUTS area code | 2003 | EU only | Europe | Hierarchical | free | polygon | Administrative divisions. Partially administrative, worldwide (countries) and Europe (country to community) |
MARC country codes | 1971 | USA only? | globe/only nations | Name abbreviation | free | polygon | Administrative divisions. Country codes. |
SGC codes | ? | Canada only | ? | Serial number | free | polygon | Administrative divisions, numeric codes. ... Statistical, like ONS. |
UN/LOCODE | ? | trade and transport | globe | Serial number | free | polygon | Administrative divisions. UN codes for trade and transport locations. |
IATA airport codes | 1930s | airport | globe | ? | free | polygon | Administrative divisions. Area/point codes, airports and 3-letter city codes. |
ICAO airport codes | 1950s | airport | globe | ? | free | polygon | Administrative divisions. Area/point codes, airports. |
IANA country codes | 1994 | Internet | globe | ? | free | polygon | Administrative divisions. Similar to ISO 3166-1 alpha-2, see Country code top-level domain, List and Internationalized country codes. |
IOC country codes | ~1960 | Sport | globe | abbreviation | free | polygon | Administrative divisions. Codes of IOC members; uses three-letter abbreviation country codes, like ISO 3166-1 alpha-3. |
Longhurst code | ? | Environment | globe | ? | free | polygon | Administrative divisions. A set of four-letter codes used in ecological/geographic regions in oceanography. |
FIFA country code | ? | sport/football | global | ? | free | polygon | Administrative divisions. |
FIPS country codes | 1994? | scope | U.S. | ? | free | polygon | Administrative divisions. (FIPS 10-4) area code. |
FIPS place codes | ? | U.S. | place | ? | free | polygon | (FIPS 55). Administrative divisions. |
FIPS country codes | ? | U.S. | globe/nations | ? | free | polygon | (FIPS 6-4). Administrative divisions |
FIPS state codes | ? | U.S. | ? | ? | free | polygon | (FIPS 5-2). Administrative divisions |
Geocode | Inception | Scope | Coverage | Formation | Ownership | Rep. entity | Context and description |
---|---|---|---|---|---|---|---|
HASC | ? | general | nations and subdivs. | Name abbreviation | free | polygon | Administrative divisions. HASC stands for "Hierarchical Administrative Subdivision Codes". |
UTM Zone | ? | general | ? | ? | free | grid cell | ? |
UTM Grid Zones | ? | general | ? | ? | free | grid cell | Based on UTM zones and the latitude bands of MGRS. |
WMO squares | ~2005? | Meteorology | globe | grid | free | grid cell | ... replaced by modern DGGS's ... |
C-squares | 2002 | general | globe | ? | free | grid cell | compact encoding of geographic coordinate bounds (latitude-longitude). Uses WMO squares as starting point for hierarchical subdivision. |
GEOREF | ? | general | ? | ? | free | polygon | World Geographic Reference System, a military / air navigation coordinate system for point and area identification |
GARS | ~2007? | general | ? | ? | free | polygon | reference system developed by the National Geospatial-Intelligence Agency (NGA) |
MGRS | ~1960s | general | ? | ? | free | grid cell | Military Grid Reference System. Derived from UTM and UPS grids by NATO with a unique naming convention. |
Other geocodes:
Some standards and name servers include: ISO 3166, FIPS, INSEE, Geonames, IATA and ICAO.
A number of commercial solutions have also been proposed:
ISO 3166 is a standard published by the International Organization for Standardization (ISO) that defines codes for the names of countries, dependent territories, special areas of geographical interest, and their principal subdivisions. The official name of the standard is Codes for the representation of names of countries and their subdivisions.
A postal code is a series of letters or digits or both, sometimes including spaces or punctuation, included in a postal address for the purpose of sorting mail.
There are many different numbering schemes for assigning nominal numbers to entities. These generally require an agreed set of rules, or a central coordinator. The schemes can be considered to be examples of a primary key of a database management system table, whose table definitions require a database design.
ISO 3166-2 is part of the ISO 3166 standard published by the International Organization for Standardization (ISO), and defines codes for identifying the principal subdivisions of all countries coded in ISO 3166-1. The official name of the standard is Codes for the representation of names of countries and their subdivisions – Part 2: Country subdivision code. It was first published in 1998.
ISO 3166-1 alpha-2 codes are two-letter country codes defined in ISO 3166-1, part of the ISO 3166 standard published by the International Organization for Standardization (ISO), to represent countries, dependent territories, and special areas of geographical interest. They are the most widely used of the country codes published by ISO, and are used most prominently for the Internet's country code top-level domains. They are also used as country identifiers extending the postal code when appropriate within the international postal system for paper mail, where they have replaced the previous one-letter codes. They were first included as part of the ISO 3166 standard in its first edition in 1974.
The Natural Area Code, or Universal Address, is a proprietary geocode system for identifying an area anywhere on the Earth, or a volume of space anywhere around the Earth. The use of thirty alphanumeric characters instead of only ten digits makes a NAC shorter than its numerical latitude/longitude equivalent.
An address is a collection of information, presented in a mostly fixed format, used to give the location of a building, apartment, or other structure or a plot of land, generally using political boundaries and street names as references, along with other identifiers such as house or apartment numbers and organization name. Some addresses also contain special codes, such as a postal code, to make identification easier and aid in the routing of mail.
In geometry, space partitioning is the process of dividing an entire space into two or more disjoint subsets. In other words, space partitioning divides a space into non-overlapping regions. Any point in the space can then be identified to lie in exactly one of the regions.
Geotagging, or GeoTagging, is the process of adding geographical identification metadata to various media such as a geotagged photograph or video, websites, SMS messages, QR Codes or RSS feeds and is a form of geospatial metadata. This data usually consists of latitude and longitude coordinates, though they can also include altitude, bearing, distance, accuracy data, and place names, and perhaps a time stamp.
Address geocoding, or simply geocoding, is the process of taking a text-based description of a location, such as an address or the name of a place, and returning geographic coordinates, frequently latitude/longitude pair, to identify a location on the Earth's surface. Reverse geocoding, on the other hand, converts geographic coordinates to a description of a location, usually the name of a place or an addressable location. Geocoding relies on a computer representation of address points, the street / road network, together with postal and administrative boundaries.
A spatial reference system (SRS) or coordinate reference system (CRS) is a framework used to precisely measure locations on the surface of Earth as coordinates. It is thus the application of the abstract mathematics of coordinate systems and analytic geometry to geographic space. A particular SRS specification comprises a choice of Earth ellipsoid, horizontal datum, map projection, origin point, and unit of measure. Thousands of coordinate systems have been specified for use around the world or in specific regions and for various purposes, necessitating transformations between different SRS.
UN M49 or the Standard Country or Area Codes for Statistical Use is a standard for area codes used by the United Nations for statistical purposes, developed and maintained by the United Nations Statistics Division. Each area code is a 3-digit number which can refer to a wide variety of geographical and political regions, like a continent and a country. Codes assigned in the system generally do not change when the country or area's name changes, but instead change when the territorial extent of the country or area changes significantly, although there have been exceptions to this rule.
Digital Earth is the name given to a concept by former US vice president Al Gore in 1998, describing a virtual representation of the Earth that is georeferenced and connected to the world's digital knowledge archives.
C-squares is a system of spatially unique, location-based identifiers (geocodes) for areas on the surface of the earth, represented as cells from a latitude- and longitude-based Discrete Global Grid at a hierarchical set of resolution steps, obtained by progressively subdividing 10×10 degree World Meteorological Organization squares; the term "c-square" is also available for use to designate any component cell of the grid. Individual cell identifiers incorporate literal values of latitude and longitude in an interleaved notation, together with additional digits that support intermediate grid resolutions of 5, 0.5, 0.05 degrees, etc.
In the context of a spatial index, a grid or mesh is a regular tessellation of a manifold or 2-D surface that divides it into a series of contiguous cells, which can then be assigned unique identifiers and used for spatial indexing purposes. A wide variety of such grids have been proposed or are currently in use, including grids based on "square" or "rectangular" cells, triangular grids or meshes, hexagonal grids, and grids based on diamond-shaped cells. A "global grid" is a kind of grid that covers the entire surface of the globe.
Geohash is a public domain geocode system invented in 2008 by Gustavo Niemeyer which encodes a geographic location into a short string of letters and digits. Similar ideas were introduced by G.M. Morton in 1966. It is a hierarchical spatial data structure which subdivides space into buckets of grid shape, which is one of the many applications of what is known as a Z-order curve, and generally space-filling curves.
The Geohash-36 geocode is an open-source compression algorithm for world coordinate data. It was developed as a variation of the OpenPostcode format developed as a candidate geolocation postcode for the Republic of Ireland. It is calculated differently and uses a more concise base 36 representation rather than other geocodes that adopted base 32.
In geographic information systems, toponym resolution is the relationship process between a toponym, i.e. the mention of a place, and an unambiguous spatial footprint of the same place.
The mapcode system is an open-source geocode system consisting of two groups of letters and digits, separated by a dot. It represents a location on the surface of the Earth, within the context of a separately specified country or territory. For example, the entrance to the elevator of the Eiffel Tower in Paris is “France 4J.Q2”. As with postal addresses, it is often unnecessary to explicitly mention the country.
A discrete global grid (DGG) is a mosaic that covers the entire Earth's surface. Mathematically it is a space partitioning: it consists of a set of non-empty regions that form a partition of the Earth's surface. In a usual grid-modeling strategy, to simplify position calculations, each region is represented by a point, abstracting the grid as a set of region-points. Each region or region-point in the grid is called a cell.