Address geocoding, or simply geocoding, is the process of taking a text-based description of a location, such as an address or the name of a place, and returning geographic coordinates, frequently a latitude/longitude pair, to identify a location on the Earth's surface. [1] Reverse geocoding, on the other hand, converts geographic coordinates to a description of a location, usually the name of a place or an addressable location. Geocoding relies on a computer representation of address points and the street/road network, together with postal and administrative boundaries.
The geographic coordinates representing locations often vary greatly in positional accuracy. Examples include building centroids, land parcel centroids, interpolated locations based on thoroughfare ranges, street segment centroids, postal code centroids (e.g. ZIP codes, CEDEX), and administrative division centroids.
Geocoding – a subset of Geographic Information System (GIS) spatial analysis – has been a subject of interest since the early 1960s.
In 1960, the first operational GIS – named the Canada Geographic Information System (CGIS) – was invented by Dr. Roger Tomlinson, who has since been acknowledged as the father of GIS. The CGIS was used to store and analyze data collected for the Canada Land Inventory, which mapped information about agriculture, wildlife, and forestry at a scale of 1:50,000, in order to regulate land capability for rural Canada. The CGIS remained in use until the 1990s but was never available commercially.
On 1 July 1963, five-digit ZIP codes were introduced nationwide by the United States Post Office Department (USPOD). In 1983, nine-digit ZIP+4 codes were introduced as an extra identifier for locating addresses more accurately.
In 1964, the Harvard Laboratory for Computer Graphics and Spatial Analysis developed groundbreaking software, such as GRID and SYMAP, which became sources for the commercial development of GIS.
In 1967, a team at the Census Bureau, including the mathematician James Corbett [3] and Donald Cooke [4], invented Dual Independent Map Encoding (DIME), the first modern vector mapping model, which encoded address ranges into street network files and incorporated the "percent along" geocoding algorithm. [5] Still in use by platforms such as Google Maps and MapQuest, the "percent along" algorithm denotes where a matched address is located along a reference feature as a percentage of the reference feature's total length. DIME was intended for the use of the United States Census Bureau, and it involved accurately mapping block faces, digitizing nodes representing street intersections, and forming spatial relationships. New Haven, Connecticut, was the first city on Earth with a geocodable street network database.
In the late 1970s, two main public domain geocoding platforms were in development: GRASS GIS and MOSS. The early 1980s saw the rise of many more commercial vendors of geocoding software, namely Intergraph, ESRI, CARIS, ERDAS, and MapInfo Corporation. These platforms merged the 1960s approach of separating spatial information with the approach of organizing this spatial information into database structures.
In 1986, Mapping Display and Analysis System (MIDAS) became the first desktop geocoding software, designed for the DOS operating system. Geocoding was elevated from the research department into the business world with the acquisition of MIDAS by MapInfo. MapInfo has since been acquired by Pitney Bowes and has pioneered the merging of geocoding with business intelligence, allowing location intelligence to provide solutions for the public and private sectors.
The end of the 20th century saw geocoding become more user-oriented, especially via open-source GIS software. Mapping applications and geospatial data became more accessible over the Internet.
Because the mail-out/mail-back technique was so successful in the 1980 Census, the U.S. Bureau of Census was able to put together a large geospatial database, using interpolated street geocoding. [6] This database – along with the Census' nationwide coverage of households – allowed for the birth of TIGER (Topologically Integrated Geographic Encoding and Referencing).
Containing address ranges instead of individual addresses, TIGER has since been implemented in nearly all geocoding software platforms used today. By the end of the 1990 Census, TIGER "contained a latitude/longitude-coordinate for more than 30 million feature intersections and endpoints and nearly 145 million feature 'shape' points that defined the more than 42 million feature segments that outlined more than 12 million polygons." [7]
TIGER was the breakthrough for "big data" geospatial solutions.
The early 2000s saw the rise of Coding Accuracy Support System (CASS) address standardization. CASS certification is offered to all software vendors and advertising mailers who want the United States Postal Service (USPS) to assess the quality of their address-standardizing software. The annually renewed CASS certification is based on delivery point codes, ZIP codes, and ZIP+4 codes. Adopting CASS-certified software allows software vendors to receive discounts on bulk mailing and shipping costs. With a certified database, they can also benefit from increased accuracy and efficiency in those bulk mailings. In the early 2000s, geocoding platforms were also able to support multiple datasets.
In 2003, geocoding platforms were capable of merging postal codes with street data, updated monthly. This process became known as "conflation".
Beginning in 2005, geocoding platforms included parcel-centroid geocoding, which allowed an address to be geocoded with high precision. For example, parcel-centroid geocoding allowed a geocoder to determine the centroid of a specific building or lot of land. Platforms were now also able to determine the elevation of specific parcels.
2005 also saw the introduction of the Assessor's Parcel Number (APN). A jurisdiction's tax assessor was able to assign this number to parcels of real estate. This allowed for proper identification and record-keeping. An APN is important for geocoding an area which is covered by a gas or oil lease, and indexing property tax information provided to the public.
In 2006, reverse geocoding and reverse APN lookup were introduced to geocoding platforms. This involved geocoding a numerical point location, given as a longitude and latitude, to a textual, readable address.
2008 and 2009 saw the growth of interactive, user-oriented geocoding platforms – namely MapQuest, Google Maps, Bing Maps, and Global Positioning Systems (GPS). These platforms were made even more accessible to the public with the simultaneous growth of the mobile industry, specifically smartphones.
The 2010s saw vendors fully support geocoding and reverse geocoding globally. Cloud-based geocoding application programming interfaces (APIs) and on-premises geocoding have allowed for a greater match rate, greater precision, and greater speed. Geocoding is also increasingly used to inform business decisions, through the integration of the geocoding process with business intelligence.
The future of geocoding also involves three-dimensional geocoding, indoor geocoding, and multiple language returns for the geocoding platforms.
Geocoding is a task which involves multiple datasets and processes, all of which work together. Some of the components are provided by the user, while others are built into the geocoding software.
Input data are the descriptive, textual information (address or building name) which the user wants to turn into numerical, spatial data (latitude and longitude) through the process of geocoding. These are often included in a table with other attributes of the locations. Input data is classified into two categories:
To achieve the greatest accuracy, the geocodes in the input dataset need to be as correct as possible, and formatted in standard ways. Thus, it is common to first go through a process of data cleansing, often called "address scrubbing," to find and correct any errors. This is especially important for databases in which participants enter their own location geocodes, frequently resulting in a variety of forms (e.g., "Pennsylvania," "PA," "Penn.") and misspellings.
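The scrubbing step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a production cleanser: the `STATE_VARIANTS` table and the `scrub_state` function are hypothetical names introduced here, and a real lookup table would cover every state and many more spellings.

```python
import re
from typing import Optional

# Hypothetical lookup table mapping cleaned-up variants to one standard code.
STATE_VARIANTS = {
    "pennsylvania": "PA",
    "penn": "PA",
    "pa": "PA",
}

def scrub_state(raw: str) -> Optional[str]:
    """Return the standard two-letter code, or None if the value is not recognized."""
    # Ignore case, periods, and whitespace so "Penn." and "penn" compare equal.
    key = re.sub(r"[^a-z]", "", raw.lower())
    return STATE_VARIANTS.get(key)

for value in ["Pennsylvania", "PA", "Penn.", "Pensylvania"]:
    print(repr(value), "->", scrub_state(value))
```

Values that the table does not recognize, such as the misspelling in the last input, come back as `None`, so they can be set aside for correction rather than silently passed on to the geocoder.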
The second necessary dataset specifies the locations of geographic features in a common spatial reference system, usually stored in a GIS file format or spatial database. Examples include a point dataset of buildings, a line dataset of streets, or a polygon dataset of counties. The attributes of these features must include information that will match the geocodes in the input dataset, such as a name, unique id, or standard geocode such as the United States FIPS codes for geographic features. It is common for the reference dataset to include multiple attribute columns of geocodes for flexibility or handling of complex geocodes. For example, a street dataset intended to be used for street address geocoding must include not only the street name, but any directional suffixes or prefixes and the range of address numbers found on each segment.
The third component is software that matches each geocode in the input dataset to the attributes of a corresponding feature in the reference dataset. Once a match is made, the location of the reference feature can be attached to the input row. These algorithms are of two types:
The algorithm is rarely able to perfectly locate all of the input data; mismatches can occur due to misspelled or incomplete input data, imperfect (usually outdated) reference data, or unique regional geocoding systems that the algorithm does not recognize. Many geocoders provide a follow-up stage to manually review and correct suspect matches.
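One way to picture the matching step is an exact lookup with an approximate string comparison as a fallback. The sketch below is an assumption made for illustration, not the method of any particular geocoder: the `REFERENCE` table, its placeholder coordinates, the `match` function, and the 0.8 similarity cutoff are all hypothetical. The approximate comparison uses `get_close_matches` from Python's standard `difflib` module.

```python
from difflib import get_close_matches

# Hypothetical reference dataset: street name -> placeholder (latitude, longitude).
REFERENCE = {
    "EVERGREEN TERRACE": (10.0, 20.0),
    "WASHINGTON STREET": (30.0, 40.0),
}

def match(name, cutoff=0.8):
    """Return (matched_name, location, match_type) for an input street name."""
    # Normalize case and collapse repeated whitespace before comparing.
    key = " ".join(name.upper().split())
    if key in REFERENCE:
        return key, REFERENCE[key], "exact"
    # Fall back to the single most similar reference name, if it is similar enough.
    close = get_close_matches(key, list(REFERENCE), n=1, cutoff=cutoff)
    if close:
        return close[0], REFERENCE[close[0]], "fuzzy"
    return None, None, "unmatched"

for street in ["evergreen  terrace", "Evergren Terrace", "Elm Road"]:
    print(street, "->", match(street))
```

Rows that come back as "fuzzy" or "unmatched" are natural candidates for the manual review stage mentioned above.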
A simple method of geocoding is address interpolation. This method makes use of data from a street geographic information system in which the street network is already mapped within the geographic coordinate space. Each street segment is attributed with address ranges (e.g. the house numbers from one end of the segment to the other). The geocoder takes an address and matches it to a street and a specific segment (such as a block, in towns that use the "block" convention). It then interpolates the position of the address within the range along the segment.
Take for example: 742 Evergreen Terrace
Let's say that this segment (for instance, a block) of Evergreen Terrace runs from 700 to 799. Even-numbered addresses fall on the east side of Evergreen Terrace, with odd-numbered addresses on the west side of the street. 742 Evergreen Terrace would (probably) be located slightly less than halfway up the block, on the east side of the street. A point would be mapped at that location along the street, perhaps offset a distance to the east of the street centerline.
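The Evergreen Terrace example can be worked through in a short Python sketch. Everything below is an illustrative assumption: the `Segment` record and its field names, the planar coordinates (a block running 100 units due north), and the 5-unit offset from the centerline are hypothetical and do not come from any real street dataset.

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    """Hypothetical street-segment record; field names are illustrative only."""
    name: str
    low: int             # lowest house number on the segment
    high: int            # highest house number on the segment
    start: tuple         # (x, y) at the low-number end, in planar units
    end: tuple           # (x, y) at the high-number end
    even_on_right: bool  # True if even numbers lie on the right going start -> end

def interpolate(house_number, seg, offset=5.0):
    """Return (fraction_along, (x, y)) for a house number on a segment."""
    if not seg.low <= house_number <= seg.high:
        raise ValueError("house number outside the segment's address range")
    span = seg.high - seg.low
    fraction = (house_number - seg.low) / span if span else 0.5
    dx = seg.end[0] - seg.start[0]
    dy = seg.end[1] - seg.start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("segment has zero length")
    # Point on the street centerline.
    cx = seg.start[0] + fraction * dx
    cy = seg.start[1] + fraction * dy
    # Unit normal pointing to the right of the direction of travel.
    nx, ny = dy / length, -dx / length
    # Shift to the right for addresses on the right-hand side, otherwise to the left.
    on_right = (house_number % 2 == 0) == seg.even_on_right
    sign = 1.0 if on_right else -1.0
    return fraction, (cx + sign * offset * nx, cy + sign * offset * ny)

# A block running due north, so the right-hand side is the east side.
block = Segment("Evergreen Terrace", 700, 799, (0.0, 0.0), (0.0, 100.0), even_on_right=True)
fraction, point = interpolate(742, block)
print(round(fraction, 3), point)
```

For house number 742 the fraction is 42/99, or about 0.42, which corresponds to "slightly less than halfway up the block"; the positive x offset places the point on the east side of the centerline.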
However, this process is not always as straightforward as in this example. Difficulties arise when
While there might be a 742 Evergreen Terrace in Springfield, there might also be a 742 Evergreen Terrace in Shelbyville. Asking for the city name (and state, province, country, etc. as needed) can solve this problem. Boston, Massachusetts [8] has multiple "100 Washington Street" locations because several cities have been annexed without changing street names, thus requiring use of unique postal codes or district names for disambiguation. Geocoding accuracy can be greatly improved by first utilizing good address verification practices. Address verification will confirm the existence of the address and will eliminate ambiguities. Once the valid address is determined, it is very easy to geocode and determine the latitude/longitude coordinates. Finally, several caveats on using interpolation:
A very common error is to believe the accuracy ratings of a given map's geocodable attributes. Such accuracy as quoted by vendors has no bearing on an address being attributed to the correct segment or to the correct side of the segment, nor resulting in an accurate position along that correct segment. With the geocoding process used for U.S. Census TIGER datasets, 5–7.5% of the addresses may be allocated to a different census tract, while a study of Australia's TIGER-like system found that 50% of the geocoded points were mapped to the wrong property parcel. [9] The accuracy of geocoded data can also have a bearing on the quality of research that uses this data. One study [10] by a group of Iowa researchers found that the common method of geocoding using TIGER datasets as described above, can cause a loss of as much as 40% of the power of a statistical analysis. An alternative is to use orthophoto or image coded data such as the Address Point data from Ordnance Survey in the UK, but such datasets are generally expensive.
Because of this, it is quite important to avoid using interpolated results except for non-critical applications. Interpolated geocoding is usually not appropriate for making authoritative decisions, for example if life safety will be affected by that decision. Emergency services, for example, do not make an authoritative decision based on their interpolations; an ambulance or fire truck will always be dispatched regardless of what the map says.[ citation needed ]
In rural areas or other places lacking high quality street network data and addressing, GPS is useful for mapping a location. For traffic accidents, geocoding to a street intersection or midpoint along a street centerline is a suitable technique. Most highways in developed countries have mile markers to aid in emergency response, maintenance, and navigation. It is also possible to combine these geocoding techniques, using one technique for certain cases and situations and other techniques for the rest. In contrast to geocoding of structured postal address records, toponym resolution maps place names in unstructured document collections to their corresponding spatial footprints.
Research has introduced a new approach to the control and knowledge aspects of geocoding, by using an agent-based paradigm. [12] In addition to the new paradigm for geocoding, additional correction techniques and control algorithms have been developed. [13] The approach represents the geographic elements commonly found in addresses as individual agents. This provides a commonality and duality to control and geographic representation. In addition to scientific publication, the new approach and subsequent prototype gained national media coverage in Australia. [14] The research was conducted at Curtin University in Perth, Western Australia. [15]
With the recent advance in Deep Learning and Computer Vision, a new geocoding workflow, which leverages Object Detection techniques to directly extract the centroid of the building rooftops as geocoding output, has been proposed. [16]
Geocoded locations are useful in GIS analysis, cartography, decision-making workflows, and transaction mash-ups, and can be injected into larger business processes. On the web, geocoding is used in services like routing and local search. Geocoding, along with GPS, provides location data for geotagging media, such as photographs or RSS items.
The proliferation and ease of access to geocoding (and reverse geocoding) services raises privacy concerns. For example, in mapping crime incidents, law enforcement agencies aim to balance the privacy rights of victims and offenders, with the public's right to know. Law enforcement agencies have experimented with alternative geocoding techniques that allow them to mask a portion of the locational detail (e.g., address specifics that would lead to identifying a victim or offender). As well, in providing online crime mapping to the public, they also place disclaimers regarding the locational accuracy of points on the map, acknowledging these location masking techniques, and impose terms of use for the information.
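As an illustration of masking address specifics, one possible approach is to publish only the block an incident falls in rather than the exact house number. The sketch below is an assumption made for illustration and is not drawn from any particular agency's practice; the `mask_to_block` function and the convention of 100 numbers per block are hypothetical.

```python
import re

def mask_to_block(address: str) -> str:
    """Replace the house number with its hundred-block, e.g. 742 -> 700 block."""
    found = re.match(r"\s*(\d+)\s+(.*\S)\s*$", address)
    if not found:
        # Refuse to pass an address through unmasked if it cannot be parsed.
        raise ValueError("address does not start with a house number")
    number, street = int(found.group(1)), found.group(2)
    block = number // 100 * 100
    return f"{block} block of {street}"

print(mask_to_block("742 Evergreen Terrace"))  # prints: 700 block of Evergreen Terrace
```

Raising an error for unparseable input, rather than returning the address unchanged, is a deliberate choice here: an unmasked address should not reach a public map by accident.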
A geographic information system (GIS) consists of integrated computer hardware and software that store, manage, analyze, edit, output, and visualize geographic data. Much of this often happens within a spatial database; however, this is not essential to meet the definition of a GIS. In a broader sense, one may consider such a system also to include human users and support staff, procedures and workflows, the body of knowledge of relevant concepts and methods, and institutional organizations.
Computer-aided dispatch (CAD), also called computer-assisted dispatch, is a method of dispatching taxicabs, couriers, field service technicians, mass transit vehicles or emergency services assisted by computer. It can either be used to send messages to the dispatchee via a mobile data terminal (MDT) and/or used to store and retrieve data. A dispatcher may announce the call details to field units over a two-way radio. Some systems communicate using a two-way radio system's selective calling features. CAD systems may send text messages with call-for-service details to alphanumeric pagers or wireless telephony text services like SMS. The central idea is that persons in a dispatch center are able to easily view and understand the status of all units being dispatched. CAD provides displays and tools so that the dispatcher has an opportunity to handle calls-for-service as efficiently as possible.
A geocode is a code that represents a geographic entity. It is a unique identifier of the entity, to distinguish it from others in a finite set of geographic entities. In general the geocode is a human-readable and short identifier.
Linear referencing, also called linear reference system or linear referencing system (LRS), is a method of spatial referencing in engineering and construction, in which the locations of physical features along a linear element are described in terms of measurements from a fixed point, such as a milestone along a road. Each feature is located by either a point or a line. If a segment of the linear element or route is changed, only those locations on the changed segment need to be updated. Linear referencing is suitable for management of data related to linear features like roads, railways, oil and gas transmission pipelines, power and data transmission lines, and rivers.
A GIS software program is a computer program to support the use of a geographic information system, providing the ability to create, store, manage, query, analyze, and visualize geographic data, that is, data representing phenomena for which location is important. The GIS software industry encompasses a broad range of commercial and open-source products that provide some or all of these capabilities within various information technology architectures.
The shapefile format is a geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products. The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.
ArcGIS is a family of client, server and online geographic information system (GIS) software developed and maintained by Esri.
Spatial analysis is any of the formal techniques which study entities using their topological, geometric, or geographic properties. Spatial analysis includes a variety of techniques using different analytic approaches, especially spatial statistics. It may be applied in fields as diverse as astronomy, with its studies of the placement of galaxies in the cosmos, or to chip fabrication engineering, with its use of "place and route" algorithms to build complex wiring structures. In a more restricted sense, spatial analysis is geospatial analysis, the technique applied to structures at the human scale, most notably in the analysis of geographic data. It may also be applied to genomics, as in transcriptomics data.
Georeferencing or georegistration is a type of coordinate transformation that binds a digital raster image or vector database that represents a geographic space to a spatial reference system, thus locating the digital data in the real world. It is thus the geographic form of image registration. The term can refer to the mathematical formulas used to perform the transformation, the metadata stored alongside or within the image file to specify the transformation, or the process of manually or automatically aligning the image to the real world to create such metadata. The most common result is that the image can be visually and analytically integrated with other geographic data in geographic information systems and remote sensing software.
JTS Topology Suite is an open-source Java software library that provides an object model for Euclidean planar linear geometry together with a set of fundamental geometric functions. JTS is primarily intended to be used as a core component of vector-based geomatics software such as geographical information systems. It can also be used as a general-purpose library providing algorithms in computational geometry.
Spatial extract, transform, load, also known as geospatial transformation and load (GTL), is a process for managing and manipulating geospatial data, for example map data. It is a type of extract, transform, load (ETL) process, with software tools and libraries specialised for geographical information.
The concept of a Geospatial Web may have first been introduced by Dr. Charles Herring in his US DoD paper, An Architecture of Cyberspace: Spatialization of the Internet, 1994, U.S. Army Construction Engineering Research Laboratory.
A geographic data model, geospatial data model, or simply data model in the context of geographic information systems, is a mathematical and digital structure for representing phenomena over the Earth. Generally, such data models represent various aspects of these phenomena by means of geographic data, including spatial locations, attributes, change over time, and identity. For example, the vector data model represents geography as collections of points, lines, and polygons, and the raster data model represents geography as cell matrices that store numeric values. Data models are implemented throughout the GIS ecosystem, including the software tools for data management and spatial analysis, data stored in a variety of GIS file formats, specifications and standards, and specific designs for GIS installations.
Reverse geocoding is the process of converting a location as described by geographic coordinates to a human-readable address or place name. It is the opposite of forward geocoding, hence the term reverse. Reverse geocoding permits the identification of nearby street addresses, places, and/or areal subdivisions such as neighbourhoods, county, state, or country. Combined with geocoding and routing services, reverse geocoding is a critical component of mobile location-based services and Enhanced 911 to convert a coordinate obtained by GPS to a readable street address which is easier to understand by the end user, but not necessarily with a better accuracy.
CrimeStat is a crime mapping software program. CrimeStat is a Windows-based program that conducts spatial and statistical analysis and is designed to interface with a geographic information system (GIS). The program is developed by Ned Levine & Associates under the direction of Ned Levine, with funding by the National Institute of Justice (NIJ), an agency of the United States Department of Justice. The program and manual are distributed for free by NIJ.
Geospatial topology is the study and application of qualitative spatial relationships between geographic features, or between representations of such features in geographic information, such as in geographic information systems (GIS). For example, the fact that two regions overlap or that one contains the other are examples of topological relationships. It is thus the application of the mathematics of topology to GIS, and is distinct from, but complementary to the many aspects of geographic information that are based on quantitative spatial measurements through coordinate geometry. Topology appears in many aspects of geographic information science and GIS practice, including the discovery of inherent relationships through spatial query, vector overlay and map algebra; the enforcement of expected relationships as validation rules stored in geospatial data; and the use of stored topological relationships in applications such as network analysis. Spatial topology is the generalization of geospatial topology for non-geographic domains, e.g., CAD software.
In geographic information systems, toponym resolution is the relationship process between a toponym, i.e. the mention of a place, and an unambiguous spatial footprint of the same place.
The mapcode system is an open-source geocode system consisting of two groups of letters and digits, separated by a dot. It represents a location on the surface of the Earth, within the context of a separately specified country or territory. For example, the entrance to the elevator of the Eiffel Tower in Paris is “France 4J.Q2”. As with postal addresses, it is often unnecessary to explicitly mention the country.
CARTO is a software as a service (SaaS) spatial analysis platform that provides GIS, web mapping, data visualization, spatial analytics, and spatial data science features. The company is positioned as a Location Intelligence platform due to its tools for geospatial data analysis and visualization that do not require advanced GIS or development experience. As a cloud-native platform, CARTO runs natively on cloud data warehouse platforms overcoming any previous limits on data scale for spatial workloads.
A Geodatabase is a proprietary GIS file format developed in the late 1990s by Esri to represent, store, and organize spatial datasets within a geographic information system. A geodatabase is both a logical data model and the physical implementation of that logical model in several proprietary file formats released during the 2000s. The geodatabase design is based on the spatial database model for storing spatial data in relational and object-relational databases. Given the dominance of Esri in the GIS industry, the term "geodatabase" is used by some as a generic trademark for any spatial database, regardless of platform or design.