Modifiable areal unit problem

An example of the modifiable areal unit problem and the distortion of rate calculations

The modifiable areal unit problem (MAUP) is a source of statistical bias that can significantly impact the results of statistical hypothesis tests. MAUP affects results when point-based measures of spatial phenomena are aggregated into spatial partitions or areal units (such as regions or districts) as in, for example, population density or illness rates. [1] [2] The resulting summary values (e.g., totals, rates, proportions, densities) are influenced by both the shape and scale of the aggregation unit. [3]


For example, census data may be aggregated into county districts, census tracts, postcode areas, police precincts, or any other arbitrary spatial partition. Thus the results of data aggregation are dependent on the mapmaker's choice of which "modifiable areal unit" to use in their analysis. A census choropleth map calculating population density using state boundaries will yield radically different results than a map that calculates density based on county boundaries. Furthermore, census district boundaries are also subject to change over time, [4] meaning the MAUP must be considered when comparing past data to current data.
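The density contrast can be made concrete with a small worked example. The sketch below uses invented figures for one hypothetical "state" made of three "counties"; the names and numbers are assumptions chosen for illustration, not census data.

```python
# Hypothetical figures: one "state" made up of three "counties".
counties = {
    "A": {"population": 900_000, "area_km2": 100.0},
    "B": {"population": 50_000, "area_km2": 400.0},
    "C": {"population": 50_000, "area_km2": 500.0},
}


def density(population, area_km2):
    """People per square kilometre."""
    return population / area_km2


# County-level view: one density value per county.
county_density = {name: density(**c) for name, c in counties.items()}

# State-level view: pool population and area first, then divide.
state_density = density(
    sum(c["population"] for c in counties.values()),
    sum(c["area_km2"] for c in counties.values()),
)

for name, value in county_density.items():
    print(f"county {name}: {value:.0f} people/km2")
print(f"state as a whole: {state_density:.0f} people/km2")
```

Under these assumed numbers the single state-level value sits far from every county-level value, so a map shaded by state would hide both the dense county and the two sparse ones.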

Background

The issue was first recognized by Gehlke and Biehl in 1934 [5] and later described in detail in an entry in the Concepts and Techniques in Modern Geography (CATMOG) series by Stan Openshaw (1984) and in the book by Giuseppe Arbia (1988). In particular, Openshaw (1984) observed that "the areal units (zonal objects) used in many geographical studies are arbitrary, modifiable, and subject to the whims and fancies of whoever is doing, or did, the aggregating". [6] The problem is especially apparent when aggregate data are used for cluster analysis in spatial epidemiology, spatial statistics, or choropleth mapping, where misinterpretations can easily go unnoticed. Many fields of science, especially human geography, are prone to disregard the MAUP when drawing inferences from statistics based on aggregated data. [2] MAUP is closely related to the topics of ecological fallacy and ecological bias (Arbia, 1988). Stan Openshaw's work on this topic led Michael F. Goodchild to suggest that it be referred to as the "Openshaw effect." [7]

Ecological bias caused by MAUP has been documented as two separate effects that usually occur simultaneously during the analysis of aggregated data. First, the scale effect causes variation in statistical results between different levels of aggregation (radial distance). Therefore, the association between variables depends on the size of areal units for which data are reported. Generally, correlation increases as areal unit size increases. The zoning effect describes variation in correlation statistics caused by the regrouping of data into different configurations at the same scale (areal shape). [8]
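Both effects can be reproduced with simulated data. The following is a minimal sketch, not a method from the cited literature: it assumes synthetic points on a unit square, two variables that share a weak spatial trend plus strong independent noise, and square grid cells as the areal units. The function name `aggregated_correlation`, the noise levels, and the grid sizes are all illustrative assumptions.

```python
import numpy as np


def aggregated_correlation(px, py, a, b, cell_size, offset=(0.0, 0.0)):
    """Pearson r between the cell means of a and b on a square grid."""
    col = np.floor((px + offset[0]) / cell_size).astype(int)
    row = np.floor((py + offset[1]) / cell_size).astype(int)
    # Map each occupied (row, col) pair to a consecutive integer label.
    _, cell = np.unique(row * 100_000 + col, return_inverse=True)
    counts = np.bincount(cell)
    mean_a = np.bincount(cell, weights=a) / counts
    mean_b = np.bincount(cell, weights=b) / counts
    return np.corrcoef(mean_a, mean_b)[0, 1]


rng = np.random.default_rng(0)
n = 5000
px, py = rng.random(n), rng.random(n)   # point locations in the unit square
trend = px + py                         # shared spatial gradient (assumed)
a = trend + rng.normal(0.0, 2.0, n)     # individual-level noise dominates
b = trend + rng.normal(0.0, 2.0, n)

r_individual = np.corrcoef(a, b)[0, 1]

# Scale effect: same data, k x k grids with progressively larger cells.
scale = {k: aggregated_correlation(px, py, a, b, 1.0 / k) for k in (20, 10, 4)}

# Zoning effect: same cell size (0.1), grid origin shifted by half a cell.
zoning = [
    aggregated_correlation(px, py, a, b, 0.1, (dx, dy))
    for dx in (0.0, 0.05)
    for dy in (0.0, 0.05)
]

print(f"individual-level r: {r_individual:.3f}")
for k, r in scale.items():
    print(f"{k}x{k} grid r: {r:.3f}")
print(f"zoning range at cell size 0.1: {min(zoning):.3f} to {max(zoning):.3f}")
```

In this setup, averaging within cells cancels much of the independent noise while the shared trend survives, which is one way the correlation can grow as cells get larger. Shifting the grid origin regroups the same points at the same scale, so any spread in the last line is attributable to zoning alone.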

Since the 1930s, research has found extra variation in statistical results because of the MAUP. The standard methods of calculating within-group and between-group variance do not account for the extra variance seen in MAUP studies as the groupings change. MAUP can be used as a methodology to calculate upper and lower limits as well as average regression parameters for multiple sets of spatial groupings. The MAUP is a critical source of error in spatial studies, whether observational or experimental. As such, unit consistency, particularly in a time-series cross-sectional (TSCS) context, is essential. Further, robustness checks of unit sensitivity to alternative spatial aggregation should be routinely performed to mitigate associated biases on resulting statistical estimates.

A hand map with different spatial patterns. Note: p is the probability of the q-statistic; * denotes statistically significant at level 0.05, ** for 0.001, *** for smaller than 10; (D) subscripts 1, 2, 3 of q and p denote the strata Z1+Z2 with Z3, Z1 with Z2+Z3, and Z1, Z2 and Z3 individually, respectively; (E) subscripts 1 and 2 of q and p denote the strata Z1+Z2 with Z3+Z4, and Z1+Z3 with Z2+Z4, respectively.

Suggested solutions

Several suggestions have been made in the literature to reduce aggregation bias during regression analysis. A researcher might correct the variance-covariance matrix using samples from individual-level data. [9] Alternatively, one might focus on local spatial regression rather than global regression. A researcher might also attempt to design areal units to maximize a particular statistical result. [6] Others have argued that it may be difficult to construct a single set of optimal aggregation units for multiple variables, each of which may exhibit non-stationarity and spatial autocorrelation in different ways. Still others have suggested developing statistics that change across scales in a predictable way, perhaps using fractal dimension as a scale-independent measure of spatial relationships, or using Bayesian hierarchical models as a general methodology for combining aggregated and individual-level data for ecological inference.

Studies of the MAUP based on empirical data can only provide limited insight due to an inability to control relationships between multiple spatial variables. Data simulation is necessary to have control over various properties of individual-level data. Simulation studies have demonstrated that the spatial support of variables can affect the magnitude of ecological bias caused by spatial data aggregation. [10]

MAUP sensitivity analysis

Using simulations for univariate data, Larsen advocated the use of a Variance Ratio to investigate the effect of spatial configuration, spatial association, and data aggregation. [11] A detailed description of the variation of statistics due to MAUP is presented by Reynolds, who demonstrates the importance of the spatial arrangement and spatial autocorrelation of data values. [12] Reynolds's simulation experiments were expanded by Swift in a series of nine exercises that began with simulated regression analysis and spatial trend, then focused on MAUP in the context of spatial epidemiology. The resulting method of MAUP sensitivity analysis demonstrates that the MAUP is not entirely a problem: [10] it can also be used as an analytical tool to help understand spatial heterogeneity and spatial autocorrelation.

This topic is of particular importance because in some cases data aggregation can obscure a strong correlation between variables, making the relationship appear weak or even negative. Conversely, MAUP can cause random variables to appear as if there is a significant association where there is not. Multivariate regression parameters are more sensitive to MAUP than correlation coefficients. Until a more analytical solution to MAUP is discovered, spatial sensitivity analysis using a variety of areal units is recommended as a methodology to estimate the uncertainty of correlation and regression coefficients due to ecological bias. An example of data simulation and re-aggregation using the ArcPy library is available. [13] [14]
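A sensitivity analysis of this kind can be sketched in a few lines. The example below uses NumPy rather than the ArcPy example cited above, and everything in it is an assumption made for illustration: the simulated points, the spatial trend that confounds the two variables, the helper `cell_means`, and the choice of square grids at several sizes and origins as the alternative areal units.

```python
import numpy as np


def cell_means(px, py, values, cell_size, offset):
    """Mean of each array in `values` within square grid cells."""
    col = np.floor((px + offset[0]) / cell_size).astype(int)
    row = np.floor((py + offset[1]) / cell_size).astype(int)
    # Consecutive integer label for every occupied cell.
    _, cell = np.unique(row * 100_000 + col, return_inverse=True)
    counts = np.bincount(cell)
    return [np.bincount(cell, weights=v) / counts for v in values]


rng = np.random.default_rng(1)
n = 5000
px, py = rng.random(n), rng.random(n)
trend = px + py                                    # assumed spatial confounder
x = trend + rng.normal(0.0, 1.0, n)
y = 0.5 * x + 2.0 * trend + rng.normal(0.0, 1.0, n)

# Re-aggregate the same points under many alternative zonings and refit.
slopes = []
for k in (4, 6, 8, 10, 15, 20):                    # k x k grids (scale)
    size = 1.0 / k
    for frac in (0.0, 0.25, 0.5, 0.75):            # shifted origins (zoning)
        mx, my = cell_means(px, py, (x, y), size, (frac * size, frac * size))
        slope, _intercept = np.polyfit(mx, my, 1)  # OLS fit on cell means
        slopes.append(slope)

summary = {
    "min": float(min(slopes)),
    "mean": float(np.mean(slopes)),
    "max": float(max(slopes)),
}
print(summary)
```

The minimum and maximum give rough lower and upper limits on the regression slope across the zonings tried, and the mean gives an average parameter. A wide interval signals that conclusions drawn from any single set of areal units should be treated with caution.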

In transport planning, MAUP is associated with traffic analysis zones (TAZs). A major point of departure in understanding problems in transportation analysis is the recognition that spatial analysis has limitations associated with the discretization of space. Among them, modifiable areal units and boundary problems are directly or indirectly related to transportation planning and analysis through the design of traffic analysis zones, since most transport studies require, directly or indirectly, the definition of TAZs. The modifiable boundary and scale issues should be given specific attention during the specification of a TAZ because of the effects these factors exert on the statistical and mathematical properties of spatial patterns (i.e., the modifiable areal unit problem). Viegas, Martinez and Silva (2009, 2009b) [14] propose a method that recognizes that the results obtained from the study of spatial data are not independent of scale and that aggregation effects are implicit in the choice of zonal boundaries. The delineation of TAZ boundaries has a direct impact on the realism and accuracy of the results obtained from transportation forecasting models. In that work, the MAUP effects on TAZ definition and on transportation demand models are measured and analyzed using grids that differ in size and in origin location. The analysis was carried out by building an application integrated into commercial GIS software and by using a case study (the Lisbon Metropolitan Area) to test its implementability and performance. The results reveal the conflict between statistical and geographic precision, and their relationship with the loss of information in the traffic assignment step of transportation planning models. [14]


Related Research Articles

An ecological fallacy is a formal fallacy in the interpretation of statistical data that occurs when inferences about the nature of individuals are deduced from inferences about the group to which those individuals belong. From the conceptual standpoint of mereology, four common ecological fallacies are:

Spatial analysis: Formal techniques which study entities using their topological, geometric, or geographic properties

Spatial analysis is any of the formal techniques which studies entities using their topological, geometric, or geographic properties. Spatial analysis includes a variety of techniques using different analytic approaches, especially spatial statistics. It may be applied in fields as diverse as astronomy, with its studies of the placement of galaxies in the cosmos, or to chip fabrication engineering, with its use of "place and route" algorithms to build complex wiring structures. In a more restricted sense, spatial analysis is geospatial analysis, the technique applied to structures at the human scale, most notably in the analysis of geographic data. It may also be applied to genomics, as in transcriptomics data.

In statistics, an ecological correlation is a correlation between two variables that are group means, in contrast to a correlation between two variables that describe individuals. For example, one might study the correlation between physical activity and weight among sixth-grade children. A study at the individual level might make use of 100 children, then measure both physical activity and weight; the correlation between the two variables would be at the individual level. By contrast, another study might make use of 100 classes of sixth-grade students, then measure the mean physical activity and the mean weight of each of the 100 classes. A correlation between these group means would be an example of an ecological correlation.

Spatial epidemiology is a subfield of epidemiology focused on the study of the spatial distribution of health outcomes; it is closely related to health geography.

Thematic map: Type of map that visualizes data

A thematic map is a type of map that portrays the geographic pattern of a particular subject matter (theme) in a geographic area. This usually involves the use of map symbols to visualize selected properties of geographic features that are not naturally visible, such as temperature, language, or population. In this, they contrast with general reference maps, which focus on the location of a diverse set of physical features, such as rivers, roads, and buildings. Alternative names have been suggested for this class, such as special-subject or special-purpose maps, statistical maps, or distribution maps, but these have generally fallen out of common usage. Thematic mapping is closely allied with the field of Geovisualization.

Gap analysis is a tool used in wildlife conservation to identify gaps in conservation lands or other wildlands where significant plant and animal species and their habitat or important ecological features occur.

Statistical geography is the study and practice of collecting, analysing and presenting data that has a geographic or areal dimension, such as census or demographics data. It uses techniques from spatial analysis, but also encompasses geographical activities such as the defining and naming of geographical regions for statistical purposes. For example, for the purposes of statistical geography, the Australian Bureau of Statistics uses the Australian Standard Geographical Classification, a hierarchical regionalisation that divides Australia up into states and territories, then statistical divisions, statistical subdivisions, statistical local areas, and finally census collection districts.

Geographic information systems (GISs) and geographic information science (GIScience) combine computer-mapping capabilities with additional database management and data analysis tools. Commercial GIS systems are very powerful and have touched many applications and industries, including environmental science, urban planning, agricultural applications, and others.

A boundary problem in analysis is a phenomenon in which geographical patterns are differentiated by the shape and arrangement of boundaries that are drawn for administrative or measurement purposes. The boundary problem occurs because of the loss of neighbors in analyses that depend on the values of the neighbors. While geographic phenomena are measured and analyzed within a specific unit, identical spatial data can appear either dispersed or clustered depending on the boundary placed around the data. In analysis with point data, dispersion is evaluated as dependent on the boundary. In analysis with areal data, statistics should be interpreted based upon the boundary.

Spatial econometrics is the field where spatial analysis and econometrics intersect. The term “spatial econometrics” was introduced by the Belgian economist Jean Paelinck in the general address he delivered to the annual meeting of the Dutch Statistical Association in May 1974. In general, econometrics differs from other branches of statistics in focusing on theoretical models, whose parameters are estimated using regression analysis. Spatial econometrics is a refinement of this, where either the theoretical model involves interactions between different entities, or the data observations are not truly independent. Thus, models incorporating spatial auto-correlation or neighborhood effects can be estimated using spatial econometric methods. Such models are common in regional science, real estate economics, education economics, housing markets, and many other fields. Adopting a more general view, the by-law of the Spatial Econometrics Association defines the discipline as the set of “models and theoretical instruments of spatial statistics and spatial data analysis to analyse various economic effects such as externalities, interactions, spatial concentration and many others”. Recent developments also tend to include methods and models from social network econometrics.

Quantitative geography is a subfield and methodological approach to geography that develops, tests, and uses scientific, mathematical, and statistical methods to analyze and model geographic phenomena and patterns. It aims to explain and predict the distribution and dynamics of human and physical geography through the collection and analysis of quantifiable data. The approach quantitative geographers take is generally in line with the scientific method, where a falsifiable hypothesis is generated, and then tested through observational studies. This has received criticism, and in recent years, quantitative geography has moved to include systematic model creation and understanding the limits of their models. This approach is used to study a wide range of topics, including population demographics, urbanization, environmental patterns, and the spatial distribution of economic activity. The methods of quantitative geography are often contrasted by those employed by qualitative geography, which is more focused on observing and recording characteristics of geographic place. However, there is increasing interest in using combinations of both qualitative and quantitative methods through mixed-methods research to better understand and contextualize geographic phenomena.

In geography, scale is the level at which a geographical phenomenon occurs or is described. This concept is derived from the map scale in cartography. Geographers describe geographical phenomena and differences using different scales. From an epistemological perspective, scale is used to describe how detailed an observation is, while ontologically, scale is inherent in the complex interaction between society and nature.

Giuseppe Arbia is an Italian statistician. He is known for his contributions to the field of spatial statistics and spatial econometrics. In 2006 together with Jean Paelinck he founded the Spatial Econometrics Association, which he has been chairing ever since.

Tobler's second law of geography: One of several proposed laws of geography

The second law of geography, according to Waldo Tobler, is "the phenomenon external to a geographic area of interest affects what goes on inside." This is an extension of his first law. He first published it in 1999 in reply to a paper titled "Linear pycnophylactic reallocation comment on a paper by D. Martin" and then again in response to criticism of his first law of geography, in a reply titled "On the First Law of Geography: A Reply." Much of this criticism centered on the question of whether laws are meaningful in geography or any of the social sciences. In this document, Tobler proposed his second law while recognizing that others have proposed other concepts to fill the role of second law. Tobler asserted that this phenomenon is common enough to warrant the title of second law of geography. Unlike Tobler's first law of geography, which is relatively well accepted among geographers, there are a few contenders for the title of the second law of geography. Tobler's second law of geography is less well known but still has profound implications for geography and spatial analysis.

Arbia's law of geography: One of several proposed laws of geography

Arbia’s law of geography states, "Everything is related to everything else, but things observed at a coarse spatial resolution are more related than things observed at a finer resolution." Originally proposed as the 2nd law of geography, this is one of several laws competing for that title. Because of this, Arbia's law is sometimes referred to as the second law of geography, or Arbia's second law of geography.

Concepts and Techniques in Modern Geography, abbreviated CATMOG, is a series of 59 short publications, each focused on an individual method or theory in geography.

Uncertain geographic context problem: Source of statistical bias

The uncertain geographic context problem or UGCoP is a source of statistical bias that can significantly impact the results of spatial analysis when dealing with aggregate data. The UGCoP is very closely related to the Modifiable areal unit problem (MAUP), and like the MAUP, arises from how we divide the land into areal units. It is caused by the difficulty, or impossibility, of understanding how phenomena under investigation in different enumeration units interact between enumeration units, and outside of a study area over time. It is particularly important to consider the UGCoP within the discipline of time geography, where phenomena under investigation can move between spatial enumeration units during the study period. Examples of research that needs to consider the UGCoP include food access and human mobility.

Modifiable temporal unit problem: Source of statistical bias

The modifiable temporal unit problem (MTUP) is a source of statistical bias that occurs in time series and spatial analysis when using temporal data that has been aggregated into temporal units. In such cases, the choice of temporal unit can affect the analysis results and lead to inconsistencies or errors in statistical hypothesis testing.

The neighborhood effect averaging problem (NEAP) concerns the challenges of understanding how aggregated neighborhood-level phenomena influence individuals when those phenomena are shaped by mobility-dependent exposures. The problem confounds the neighborhood effect, which suggests that a person's neighborhood impacts their individual characteristics, such as health. It relates to the boundary problem, in that delineated neighborhoods used for analysis may not fully account for an individual's activity space if the borders are permeable and individual mobility crosses the boundaries. The term was coined by Mei-Po Kwan in the peer-reviewed journal "International Journal of Environmental Research and Public Health" in 2018.

References

  1. Openshaw, Stan (1983). The Modifiable Areal Unit Problem (PDF). ISBN 0-86094-134-5.
  2. Chen, Xiang; Ye, Xinyue; Widener, Michael J.; Delmelle, Eric; Kwan, Mei-Po; Shannon, Jerry; Racine, Racine F.; Adams, Aaron; Liang, Lu; Peng, Jia (27 December 2022). "A systematic review of the modifiable areal unit problem (MAUP) in community food environmental research". Urban Informatics. 1. doi:10.1007/s44212-022-00021-1. S2CID 255206315.
  3. "MAUP | Definition – Esri Support GIS Dictionary". support.esri.com. Retrieved 2017-03-09.
  4. Geography, US Census Bureau. "Geographic Boundary Change Notes". www.census.gov. Retrieved 2017-02-24.
  5. Gehlke & Biehl 1934
  6. Openshaw 1984, p. 3
  7. Goodchild, Michael F. (2022). "The Openshaw effect". International Journal of Geographical Information Science. 36: 1697–1698. doi:10.1080/13658816.2022.2102637. Retrieved 24 January 2024.
  8. Fotheringham, A. S.; Rogerson, P. A. (2008). "The Modifiable Areal Unit Problem (MAUP)". The SAGE handbook of spatial analysis. Sage. pp. 105–124. ISBN 978-1-4129-1082-8.
  9. Holt, D.; Steel, D.; Tranmer, M.; Wrigley, N. (1996). "Aggregation and ecological effects in geographically based data". Geographical Analysis. 28: 244–261.
  10. Swift, A.; Liu, L.; Uber, J. (2008). "Reducing MAUP bias of correlation statistics between water quality and GI illness". Computers, Environment and Urban Systems. 32: 134–148.
  11. Larsen, J. (2000). "The Modifiable Areal Unit Problem: A problem or a source of spatial information?" PhD thesis, Ohio State University.
  12. Reynolds, H. (1998). "The Modifiable Area Unit Problem: Empirical Analysis By Statistical Simulation". PhD thesis, Department of Geography, University of Toronto. http://www.badpets.net/Thesis
  13. Swift, A. (2017). "Crime mapping data simulation". https://app.box.com/s/a84w16x7hffljjvkhtlr72eisj4qiene
  14. Viegas, José Manuel; Martinez, L. Miguel; Silva, Elisabete A. (January 2009). "Effects of the Modifiable Areal Unit Problem on the Delineation of Traffic Analysis Zones". Environment and Planning B: Planning and Design. 36 (4): 625–643. doi:10.1068/b34033. S2CID 54840846.
