Geo-imputation

In data analysis involving geographical locations, geo-imputation (or geographical imputation) methods replace missing values for exact locations with approximate locations derived from associated data. They assign a reasonable location or geographically based attribute (e.g., census tract) to a person by using both the demographic characteristics of the person and the population characteristics of the larger geographic aggregate area in which the person was geocoded (e.g., postal delivery area or county). For example, if a person's census tract is known and no other address information is available, geo-imputation methods can be used to probabilistically assign that person to a smaller geographic area, such as a census block group. [1]
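The probabilistic assignment described above can be illustrated with a minimal sketch. It assumes the simplest possible weighting, in which the chance of assigning a person to a block group is proportional to the number of people with matching demographic characteristics who live there. The function name `impute_block_group`, the block group labels, and the counts are hypothetical and are not taken from the cited study.

```python
import random

def impute_block_group(block_group_populations, rng=None):
    """Pick one block group at random, weighted by its population count.

    block_group_populations maps a block group label to the number of
    residents who share the person's demographic characteristics.
    """
    rng = rng or random.Random()
    names = list(block_group_populations)
    weights = [block_group_populations[name] for name in names]
    # random.choices draws with probability proportional to each weight
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical counts for the block groups inside the person's known tract
tract_block_groups = {"BG-1": 1200, "BG-2": 300, "BG-3": 500}

rng = random.Random(42)  # fixed seed so the draw is repeatable
assigned = impute_block_group(tract_block_groups, rng)
print(assigned)
```

Because the assignment is random, a single draw says little about one person; the approach is meant to give reasonable results in aggregate over many records.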

Notes and references

  1. Henry, Kevin A.; Boscoe, Francis P. (2008). "Estimating the accuracy of geographical imputation". International Journal of Health Geographics. 7 (3): 3. doi:10.1186/1476-072X-7-3. PMC 2266732. PMID 18215308.

Related Research Articles

Geographic information system: System to capture, manage and present geographic data

A geographic information system (GIS) consists of integrated computer hardware and software that store, manage, analyze, edit, output, and visualize geographic data. Much of this often happens within a spatial database; however, this is not essential to meet the definition of a GIS. In a broader sense, one may consider such a system also to include human users and support staff, procedures and workflows, the body of knowledge of relevant concepts and methods, and institutional organizations.

A census tract, census area, census district or meshblock is a geographic region defined for the purpose of taking a census. Sometimes these coincide with the limits of cities, towns or other administrative areas and several tracts commonly exist within a county. In unincorporated areas of the United States these are often arbitrary, except for coinciding with political lines.

Exurb: Area of less population and density than suburbs

An exurb is an area outside the typically denser inner suburban area, at the edge of a metropolitan area, which has some economic and commuting connection to the metro area, low housing density, and growth. It shapes an interface between urban and rural landscapes holding a limited urban nature for its functional, economic, and social interaction with the urban center, due to its dominant residential character. Exurbs consist of "agglomerations of housing and jobs outside the municipal boundaries of a primary city" and beyond the surrounding suburbs.

A geocode is a code that represents a geographic entity. It is a unique identifier of the entity, to distinguish it from others in a finite set of geographic entities. In general the geocode is a human-readable and short identifier.

Address geocoding, or simply geocoding, is the process of taking a text-based description of a location, such as an address or the name of a place, and returning geographic coordinates, frequently a latitude/longitude pair, to identify a location on the Earth's surface. Reverse geocoding, on the other hand, converts geographic coordinates to a description of a location, usually the name of a place or an addressable location. Geocoding relies on a computer representation of address points and the street or road network, together with postal and administrative boundaries.

A GIS software program is a computer program to support the use of a geographic information system, providing the ability to create, store, manage, query, analyze, and visualize geographic data, that is, data representing phenomena for which location is important. The GIS software industry encompasses a broad range of commercial and open-source products that provide some or all of these capabilities within various information technology architectures.

Geodemography is the study of people based on where they live; it links the sciences of demography, the study of human population dynamics, and geography, the study of the locational and spatial variation of both physical and human phenomena on Earth, along with sociology. It includes the application of geodemographic classifications for business, social research and public policy but has a parallel history in academic research seeking to understand the processes by which settlements evolve and neighborhoods are formed. Geodemographic systems estimate the most probable characteristics of people based on the pooled profile of all people living in a small area near a particular address.

Health geography

Health geography is the application of geographical information, perspectives, and methods to the study of health, disease, and health care. Medical geography, a sub-discipline or sister field of health geography, focuses on understanding spatial patterns of health and disease as related to the natural and social environment. Conventionally, there are two primary areas of research within medical geography: the first deals with the spatial distribution and determinants of morbidity and mortality, while the second deals with health planning, help-seeking behavior, and the provision of health services.

Confounding: Variable or factor in causal inference

In causal inference, a confounder is a variable that influences both the dependent variable and independent variable, causing a spurious association. Confounding is a causal concept, and as such, cannot be described in terms of correlations or associations. The existence of confounders is an important quantitative explanation of why correlation does not imply causation. Some notations are explicitly designed to identify the existence, possible existence, or non-existence of confounders in causal relationships between elements of a system.

Spatial analysis: Formal techniques which study entities using their topological, geometric, or geographic properties

Spatial analysis is any of the formal techniques which study entities using their topological, geometric, or geographic properties. Spatial analysis includes a variety of techniques using different analytic approaches, especially spatial statistics. It may be applied in fields as diverse as astronomy, with its studies of the placement of galaxies in the cosmos, or chip fabrication engineering, with its use of "place and route" algorithms to build complex wiring structures. In a more restricted sense, spatial analysis is geospatial analysis, the technique applied to structures at the human scale, most notably in the analysis of geographic data. It may also be applied to genomics, as in transcriptomics data.

Modifiable areal unit problem: Source of statistical bias

The modifiable areal unit problem (MAUP) is a source of statistical bias that can significantly impact the results of statistical hypothesis tests. MAUP affects results when point-based measures of spatial phenomena are aggregated into spatial partitions or areal units as in, for example, population density or illness rates. The resulting summary values are influenced by both the shape and scale of the aggregation unit.
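The scale effect can be seen in a small numerical sketch. It assumes a hypothetical row of eight equally populated cells with made-up case counts, and aggregates them into zones of different sizes; the function name `zone_rates` and all the numbers are illustrative only.

```python
def zone_rates(cases, population, zone_size):
    """Aggregate consecutive cells into zones and return each zone's rate."""
    rates = []
    for start in range(0, len(cases), zone_size):
        zone_cases = sum(cases[start:start + zone_size])
        zone_pop = sum(population[start:start + zone_size])
        rates.append(zone_cases / zone_pop)
    return rates

# Hypothetical illness counts in eight cells of 100 people each
cases = [1, 9, 2, 8, 5, 5, 0, 10]
population = [100] * 8

fine = zone_rates(cases, population, 1)    # one cell per zone
coarse = zone_rates(cases, population, 2)  # two cells per zone

print("cell-level rates:", fine)
print("paired-zone rates:", coarse)
```

In this made-up data the cell-level rates range from 0 to 0.10, while every two-cell zone has the same rate of 0.05, so the apparent spatial variation depends entirely on the aggregation unit chosen.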

Georeferencing or georegistration is a type of coordinate transformation that binds a digital raster image or vector database that represents a geographic space to a spatial reference system, thus locating the digital data in the real world. It is thus the geographic form of image registration. The term can refer to the mathematical formulas used to perform the transformation, the metadata stored alongside or within the image file to specify the transformation, or the process of manually or automatically aligning the image to the real world to create such metadata. The most common result is that the image can be visually and analytically integrated with other geographic data in geographic information systems and remote sensing software.

The concept of a Geospatial Web may have first been introduced by Dr. Charles Herring in his US DoD paper, An Architecture of Cyberspace: Spatialization of the Internet, 1994, U.S. Army Construction Engineering Research Laboratory.

Reverse geocoding is the process of converting a location as described by geographic coordinates to a human-readable address or place name. It is the opposite of forward geocoding, hence the term reverse. Reverse geocoding permits the identification of nearby street addresses, places, and/or areal subdivisions such as neighbourhoods, county, state, or country. Combined with geocoding and routing services, reverse geocoding is a critical component of mobile location-based services and Enhanced 911, converting a coordinate obtained by GPS to a readable street address, which is easier for the end user to understand but not necessarily more accurate.
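One simple way to picture reverse geocoding is a nearest-neighbour lookup against a list of known places. The sketch below assumes a tiny hypothetical gazetteer with approximate coordinates and uses the haversine great-circle distance; production services rely on far richer address and boundary data, and the names `haversine_km` and `reverse_geocode` are illustrative.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def reverse_geocode(lat, lon, gazetteer):
    """Return the name of the gazetteer entry closest to (lat, lon)."""
    nearest = min(gazetteer,
                  key=lambda place: haversine_km(lat, lon, place[1], place[2]))
    return nearest[0]

# Hypothetical gazetteer: (name, latitude, longitude), coordinates approximate
gazetteer = [
    ("Albany, NY", 42.6526, -73.7562),
    ("Buffalo, NY", 42.8864, -78.8784),
    ("New York, NY", 40.7128, -74.0060),
]

print(reverse_geocode(42.70, -73.80, gazetteer))
```

The returned name is only as precise as the gazetteer: a coordinate far from every entry still gets the closest label, which is one reason a readable result is not necessarily an accurate one.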

Crime hotspots are areas that have high crime intensity. These are usually visualized using a map. They are developed for researchers and analysts to examine geographic areas in relation to crime. Researchers and theorists examine the occurrence of hotspots in certain areas and why they happen, and analysts examine the techniques used to perform the research. Developing maps that contain hotspots is becoming a critical and influential tool for policing; such maps help develop knowledge and understanding of different areas in a city and possibly why crime occurs there.

A crime concentration is a spatial area to which high levels of crime incidents are attributed. A crime concentration can be the result of homogeneous or heterogeneous crime incidents. Hotspots are the result of various crimes occurring in relative proximity to each other within predefined human geopolitical or social boundaries. Crime concentrations are smaller units or sets of crime targets within a hotspot. A single crime concentration, or several in combination, within a study area can make up a crime hotspot.

In geographic information systems, toponym resolution is the process of relating a toponym, i.e. the mention of a place, to an unambiguous spatial footprint of the same place.

Imputation in genetics refers to the statistical inference of unobserved genotypes. It is achieved by using known haplotypes in a population, for instance from the HapMap or the 1000 Genomes Project in humans, which makes it possible to test for association between a trait of interest and genetic variants that were not typed experimentally but whose genotypes have been statistically inferred ("imputed"). Genotype imputation is usually performed on SNPs, the most common kind of genetic variation.

Discrete global grid: Partition of Earth's surface into subdivided cells

A discrete global grid (DGG) is a mosaic that covers the entire Earth's surface. Mathematically it is a space partitioning: it consists of a set of non-empty regions that form a partition of the Earth's surface. In a usual grid-modeling strategy, to simplify position calculations, each region is represented by a point, abstracting the grid as a set of region-points. Each region or region-point in the grid is called a cell.
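As a minimal sketch of the cell and region-point idea, the code below assumes the simplest possible grid, an equal-angle latitude/longitude grid with a configurable cell size in degrees. Cells in such a grid do not have equal area, and practical DGG systems often use other tessellations; the function names `cell_id` and `cell_center` are hypothetical.

```python
def cell_id(lat, lon, cell_deg=1.0):
    """Return the (row, col) of the grid cell containing a lat/lon point."""
    n_rows = int(180.0 / cell_deg)
    n_cols = int(360.0 / cell_deg)
    # Clamp the north pole (lat = 90) into the top row
    row = min(int((lat + 90.0) // cell_deg), n_rows - 1)
    # Wrap lon = 180 around to the first column
    col = int((lon + 180.0) // cell_deg) % n_cols
    return row, col

def cell_center(row, col, cell_deg=1.0):
    """Return the region-point (cell centre) as (lat, lon)."""
    lat = -90.0 + (row + 0.5) * cell_deg
    lon = -180.0 + (col + 0.5) * cell_deg
    return lat, lon

row, col = cell_id(42.65, -73.76)
print((row, col), cell_center(row, col))
```

Every point on the sphere falls in exactly one cell, so the cells form a partition, and each cell can be summarised by its centre point.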

Uncertain geographic context problem: Source of statistical bias

The uncertain geographic context problem (UGCoP) is a source of statistical bias that can significantly impact the results of spatial analysis when dealing with aggregate data. The UGCoP is very closely related to the modifiable areal unit problem (MAUP) and, like the MAUP, arises from how the land is divided into areal units. It is caused by the difficulty, or impossibility, of understanding how phenomena under investigation in different enumeration units interact between enumeration units and outside of a study area over time. It is particularly important to consider the UGCoP within the discipline of time geography, where phenomena under investigation can move between spatial enumeration units during the study period. Examples of research that needs to consider the UGCoP include food access and human mobility.