A boundary problem in spatial analysis is a phenomenon in which geographical patterns are differentiated by the shape and arrangement of boundaries that are drawn for administrative or measurement purposes.
In spatial analysis, four major problems interfere with an accurate estimation of statistical parameters: the boundary problem, the scale problem, the pattern problem (or spatial autocorrelation), and the modifiable areal unit problem. [1] The boundary problem occurs because of the loss of neighbours in analyses that depend on the values of the neighbours. While geographic phenomena are measured and analyzed within a specific unit, identical spatial data can appear either dispersed or clustered depending on the boundary placed around the data. In analysis with point data, the evaluation of dispersion depends on the boundary. In analysis with areal data, statistics should be interpreted with reference to the boundary.
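The dependence of point-pattern dispersion on the boundary can be illustrated with a small sketch. It uses the Clark-Evans nearest-neighbour ratio, a standard index that the text above does not name and that is chosen here only for illustration; the point layout, the window sizes, and the function names are assumptions. Values above 1 suggest dispersion and values below 1 suggest clustering.

```python
import math

def mean_nn_distance(points):
    """Average distance from each point to its nearest other point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)

def clark_evans_ratio(points, window_area):
    """Observed mean nearest-neighbour distance divided by the value
    expected under complete spatial randomness for the given area."""
    density = len(points) / window_area
    expected = 0.5 / math.sqrt(density)
    return mean_nn_distance(points) / expected

# Hypothetical data: 25 points on a regular lattice with unit spacing.
points = [(x + 0.5, y + 0.5) for x in range(5) for y in range(5)]

# Same points, two different boundaries drawn around them.
r_tight = clark_evans_ratio(points, window_area=5 * 5)      # 5 x 5 window
r_wide = clark_evans_ratio(points, window_area=100 * 100)   # 100 x 100 window

print(r_tight, r_wide)
```

With the tight window the ratio is well above 1 (the pattern reads as dispersed), while with the wide window the identical points yield a ratio well below 1 (the pattern reads as clustered).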
In geographical research, two types of areas are taken into consideration in relation to the boundary: an area surrounded by fixed natural boundaries (e.g., coastlines or streams), outside of which neighbours do not exist, [2] or an area included in a larger region defined by arbitrary artificial boundaries (e.g., an air pollution boundary in modeling studies or an urban boundary in population migration). [3] In an area isolated by the natural boundaries, the spatial process discontinues at the boundaries. In contrast, if a study area is delineated by the artificial boundaries, the process continues beyond the area.
If a spatial process extends beyond a study area or interacts with neighbours outside artificial boundaries, the most common approach is to neglect the influence of the boundaries and assume that the process occurs only within the internal area. However, such an approach leads to a significant model misspecification problem. [4]
That is, geographic boundaries are drawn for measurement or administrative purposes, but the boundaries per se can bring about different spatial patterns in geographic phenomena. [5] It has been reported that differences in the way the boundary is drawn significantly affect the identification of the spatial distribution and the estimation of the statistical parameters of the spatial process. [6] [7] [8] [9] The difference is largely based on the fact that spatial processes are generally unbounded or fuzzy-bounded, [10] but the processes are expressed in data imposed within boundaries for analysis purposes. [11] Although the boundary problem has mostly been discussed in relation to artificial and arbitrary boundaries, natural boundaries produce the same effect whenever an analysis ignores that properties at sites on the natural boundary, such as streams, are likely to differ from those at sites within the boundary. [12]
The boundary problem occurs with regard not only to horizontal boundaries but also to vertically drawn boundaries according to delineations of heights or depths (Pineda 1993). For example, biodiversity, such as the density of plant and animal species, is high near the surface, so if equally divided intervals of height or depth are used as spatial units, fewer plant and animal species are likely to be found as the height or depth increases.
Drawing a boundary around a study area gives rise to two types of problems in measurement and analysis. [7] The first is an edge effect. [13] This effect originates from ignoring interdependences that occur outside the bounded region. [13] Griffith [14] [8] and Griffith and Amrhein [15] highlighted problems arising from the edge effect. A typical example is a cross-boundary influence such as cross-border jobs, services and other resources located in a neighbouring municipality. [16]
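The loss of neighbours behind the edge effect can be made concrete with a minimal sketch. It assumes a hypothetical 10 x 10 lattice on which the process really operates, a 4 x 4 study window cut out of it, and rook (shared-edge) contiguity; none of these specifics come from the sources cited above.

```python
def rook_neighbours(cell, cells):
    """Cells sharing an edge with `cell` that belong to the set `cells`."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [n for n in candidates if n in cells]

# Hypothetical full lattice and a study window bounded inside it.
lattice = {(r, c) for r in range(10) for c in range(10)}
window = {(r, c) for r in range(3, 7) for c in range(3, 7)}

# Neighbour links of the window cells, with and without the outside world.
links_full = sum(len(rook_neighbours(cell, lattice)) for cell in window)
links_cut = sum(len(rook_neighbours(cell, window)) for cell in window)
lost_share = 1 - links_cut / links_full

# A surface that increases eastwards: the value of a cell is its column index.
value = {(r, c): float(c) for (r, c) in lattice}

def spatial_lag(cell, cells):
    """Mean value of the rook neighbours available within `cells`."""
    nbrs = rook_neighbours(cell, cells)
    return sum(value[n] for n in nbrs) / len(nbrs)

edge_cell = (4, 6)                           # on the eastern edge of the window
lag_true = spatial_lag(edge_cell, lattice)   # uses the neighbour outside
lag_cut = spatial_lag(edge_cell, window)     # neighbour outside is dropped

print(links_full, links_cut, lost_share, lag_true, lag_cut)
```

In this setup a quarter of the neighbour links disappear once the boundary is imposed, and the neighbour average of the edge cell is pulled below its true value because the eastern neighbour is no longer observed.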
The second is a shape effect that results from the artificial shape delineated by the boundary. As an illustration of the effect of the artificial shape, point pattern analysis tends to provide higher levels of clustering for the identical point pattern within a unit that is more elongated. [7] Similarly, the shape can influence interaction and flow among spatial entities. [17] [18] [19] For example, the shape can affect the measurement of origin-destination flows since these are often recorded when they cross an artificial boundary. Because of the effect set by the boundary, the shape and area information is used to estimate travel distances from surveys, [20] or to locate traffic counters, travel survey stations, or traffic monitoring systems. [21] From the same perspective, Theobald (2001; retrieved from [5] ) argued that measures of urban sprawl should consider interdependences and interactions with nearby rural areas.
In spatial analysis, the boundary problem has been discussed along with the modifiable areal unit problem (MAUP) inasmuch as MAUP is associated with the arbitrary geographic unit and the unit is defined by the boundary. [22] For administrative purposes, data for policy indicators are usually aggregated within larger units (or enumeration units) such as census tracts, school districts, municipalities and counties. [23] [24] The artificial units serve the purposes of taxation and service provision. For example, municipalities can effectively respond to the needs of the public in their jurisdictions. However, in such spatially aggregated units, detailed spatial variations of social variables cannot be identified. The problem becomes apparent when both the average level of a variable and its unequal distribution over space are measured. [5]
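How aggregation hides spatial variation can be shown with a toy one-dimensional example. The values, the unit boundaries, and the helper name below are hypothetical and serve only to illustrate the point made above.

```python
import statistics

def aggregate(values, breaks):
    """Mean of `values` within each unit delimited by consecutive `breaks`."""
    return [
        statistics.mean(values[start:end])
        for start, end in zip(breaks[:-1], breaks[1:])
    ]

# Hypothetical variable measured at eight adjacent locations.
values = [1, 1, 9, 9, 1, 1, 9, 9]

small_units = aggregate(values, breaks=[0, 2, 4, 6, 8])   # four small units
large_units = aggregate(values, breaks=[0, 4, 8])         # two large units

var_small = statistics.pvariance(small_units)
var_large = statistics.pvariance(large_units)

print(small_units, var_small)
print(large_units, var_large)
```

Both sets of units report the same overall average, but the larger units show no variation at all, so the unequal distribution visible in the smaller units cannot be recovered from them.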
Several strategies for resolving geographic boundary problems in measurement and analysis have been proposed. [25] [26] To assess the effectiveness of the strategies, Griffith reviewed traditional techniques that were developed to mitigate the edge effects: [8] ignoring the effects, undertaking a torus mapping, constructing an empirical buffer zone, constructing an artificial buffer zone, extrapolating into a buffer zone, and utilizing a correction factor. The first method (i.e., ignoring the edge effects) assumes an infinite surface on which the edge effects do not occur. In fact, this approach has been used by traditional geographical theories (e.g., central place theory). Its main shortcoming is that empirical phenomena occur within a finite area, so an infinite and homogeneous surface is unrealistic. [15] The remaining five approaches are similar in that they attempt to produce unbiased parameter estimates, that is, to provide a medium by which the edge effects are removed. [8] (He called these operational solutions, as opposed to the statistical solutions discussed below.) Specifically, the techniques aim to collect data beyond the boundary of the study area and fit a larger model, that is, to map over the area or over-bound the study area. [27] [26] Through simulation analysis, however, Griffith and Amrhein identified the inadequacy of such an overbounding technique. [15] Moreover, this technique can bring about issues related to large-area statistics, that is, the ecological fallacy: when the boundary of the study area is expanded, micro-scale variations within the boundary can be ignored.
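Of the operational solutions listed above, torus mapping is the simplest to sketch: the study area is treated as if its opposite edges were joined, so that points near one edge gain neighbours near the other. The following minimal example assumes a rectangular unit-square study area and four hypothetical points; it is an illustration of the idea, not Griffith's procedure.

```python
import math

def planar_distance(p, q):
    """Ordinary Euclidean distance inside the bounded study area."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def torus_distance(p, q, width=1.0, height=1.0):
    """Euclidean distance when opposite edges of the rectangle are joined."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, width - dx)    # going across the left/right edge may be shorter
    dy = min(dy, height - dy)   # going across the bottom/top edge may be shorter
    return math.hypot(dx, dy)

def mean_nn(points, dist):
    """Mean nearest-neighbour distance under the distance function `dist`."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

# Hypothetical points, each close to a different edge of the unit square.
points = [(0.05, 0.5), (0.95, 0.5), (0.5, 0.05), (0.5, 0.95)]

mean_planar = mean_nn(points, planar_distance)
mean_torus = mean_nn(points, torus_distance)

print(mean_planar, mean_torus)
```

The wrapped distances are much shorter here because every point sits next to an edge. Whether such wrapping is appropriate depends on the process: it presumes that the pattern outside the boundary resembles the pattern inside it.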
As alternatives to operational solutions, Griffith examined three correction techniques (i.e., statistical solutions) for removing boundary-induced bias from inference. [8] They are (1) based on generalized least squares theory, (2) using dummy variables and a regression structure (as a way of creating a buffer zone), and (3) regarding the boundary problem as a missing values problem. However, these techniques require rather strict assumptions about the process of interest. [28] For example, the solution based on generalized least squares theory utilizes time-series modeling that needs an arbitrary transformation matrix to fit the multidirectional dependencies and multiple boundary units found in geographical data. [14] Martin also argued that some of the underlying assumptions of the statistical techniques are unrealistic or unreasonably strict. [29] Moreover, Griffith (1985) himself identified the inferiority of the techniques through simulation analysis. [30]
A possible solution for addressing both edge and shape effects, one that is particularly practical with GIS technologies, [31] [32] is to re-estimate the spatial process under repeated random realizations of the boundary. This solution provides an experimental distribution that can be subjected to statistical tests. [7] As such, this strategy examines the sensitivity of the estimation result to changes in the boundary assumptions. With GIS tools, boundaries can be systematically manipulated. The tools then conduct the measurement and analysis of the spatial process within the differentiated boundaries. Accordingly, such a sensitivity analysis allows the evaluation of the reliability and robustness of place-based measures that are defined within artificial boundaries. [33] Changes in the boundary assumptions refer not only to altering or tilting the angles of the boundary, but also to differentiating between the boundary and interior areas in the examination and to considering the possibility that isolated data collection points close to the boundary may show large variances.
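The re-estimation strategy can be sketched without GIS software. The example below is a simplification under stated assumptions: the boundary is varied only by shifting a fixed-size square window at random (not by tilting or reshaping it), the point data are simulated, and the Clark-Evans nearest-neighbour ratio stands in for whatever statistic is of interest. All names and parameter values are hypothetical.

```python
import math
import random
import statistics

def clark_evans_ratio(points, window_area):
    """Observed mean nearest-neighbour distance over its expectation
    under complete spatial randomness for the given window area."""
    nn_total = 0.0
    for i, (xi, yi) in enumerate(points):
        nn_total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = nn_total / len(points)
    expected = 0.5 / math.sqrt(len(points) / window_area)
    return observed / expected

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simulated point data in the unit square (an assumption, not real data).
points = [(rng.random(), rng.random()) for _ in range(200)]

side = 0.6        # side length of each random square window
ratios = []
for _ in range(200):
    # One random realization of the boundary: a shifted square window.
    x0 = rng.uniform(0.0, 1.0 - side)
    y0 = rng.uniform(0.0, 1.0 - side)
    inside = [
        (x, y) for x, y in points
        if x0 <= x <= x0 + side and y0 <= y <= y0 + side
    ]
    if len(inside) >= 2:  # the ratio needs at least two points
        ratios.append(clark_evans_ratio(inside, side * side))

# Experimental distribution of the statistic across boundary realizations.
centre = statistics.mean(ratios)
spread = statistics.pstdev(ratios)
low, high = min(ratios), max(ratios)

print(centre, spread, low, high)
```

The spread of the resulting values indicates how sensitive the statistic is to the placement of the boundary; a narrow spread suggests a result that is robust to the boundary assumption.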
A choropleth map is a type of statistical thematic map that uses pseudocolor, meaning color corresponding with an aggregate summary of a geographic characteristic within spatial enumeration units, such as population density or per-capita income.
Robert M. Haralick is Distinguished Professor in Computer Science at Graduate Center of the City University of New York (CUNY). Haralick is one of the leading figures in computer vision, pattern recognition, and image analysis. He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and a Fellow and past president of the International Association for Pattern Recognition. Prof. Haralick is the King-Sun Fu Prize winner of 2016, "for contributions in image analysis, including remote sensing, texture analysis, mathematical morphology, consistent labeling, and system performance evaluation".
Spatial analysis is any of the formal techniques which studies entities using their topological, geometric, or geographic properties. Spatial analysis includes a variety of techniques using different analytic approaches, especially spatial statistics. It may be applied in fields as diverse as astronomy, with its studies of the placement of galaxies in the cosmos, or to chip fabrication engineering, with its use of "place and route" algorithms to build complex wiring structures. In a more restricted sense, spatial analysis is geospatial analysis, the technique applied to structures at the human scale, most notably in the analysis of geographic data. It may also be applied to genomics, as in transcriptomics data.
A dasymetric map is a type of thematic map that uses areal symbols to visualize a geographic field by refining a choropleth map with ancillary information about the distribution of the variable. The name refers to the fact that the most common variable mapped using this technique has generally been population density. The dasymetric map is a hybrid product combining the strengths and weaknesses of choropleth and isarithmic maps.
The modifiable areal unit problem (MAUP) is a source of statistical bias that can significantly impact the results of statistical hypothesis tests. MAUP affects results when point-based measures of spatial phenomena are aggregated into spatial partitions or areal units as in, for example, population density or illness rates. The resulting summary values are influenced by both the shape and scale of the aggregation unit.
Spatial epidemiology is a subfield of epidemiology focused on the study of the spatial distribution of health outcomes; it is closely related to health geography.
A thematic map is a type of map that portrays the geographic pattern of a particular subject matter (theme) in a geographic area. This usually involves the use of map symbols to visualize selected properties of geographic features that are not naturally visible, such as temperature, language, or population. In this, they contrast with general reference maps, which focus on the location of a diverse set of physical features, such as rivers, roads, and buildings. Alternative names have been suggested for this class, such as special-subject or special-purpose maps, statistical maps, or distribution maps, but these have generally fallen out of common usage. Thematic mapping is closely allied with the field of Geovisualization.
In the context of spatial analysis, geographic information systems, and geographic information science, a field is a property that fills space, and varies over space, such as temperature or density. This use of the term has been adopted from physics and mathematics, due to their similarity to physical fields (vector or scalar) such as the electromagnetic field or gravitational field. Synonymous terms include spatially dependent variable (geostatistics), statistical surface (thematic mapping), and intensive property (physics and chemistry), and crossbreeding between these disciplines is common. The simplest formal model for a field is the function, which yields a single value given a point in space (i.e., t = f(x, y, z)).
Indicators of spatial association are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable. For instance, if we are studying cancer rates among census tracts in a given city, local clusters in the rates mean that there are areas that have higher or lower rates than is to be expected by chance alone; that is, the values occurring are above or below those of a random distribution in space.
Spatial statistics is a field of applied statistics dealing with spatial data. It involves stochastic processes, sampling, smoothing and interpolation, regional and lattice (gridded) data, point patterns, as well as image analysis and stereology.
Statistical geography is the study and practice of collecting, analysing and presenting data that has a geographic or areal dimension, such as census or demographics data. It uses techniques from spatial analysis, but also encompasses geographical activities such as the defining and naming of geographical regions for statistical purposes. For example, for the purposes of statistical geography, the Australian Bureau of Statistics uses the Australian Standard Geographical Classification, a hierarchical regionalisation that divides Australia up into states and territories, then statistical divisions, statistical subdivisions, statistical local areas, and finally census collection districts.
In statistics, Moran's I is a measure of spatial autocorrelation developed by Patrick Alfred Pierce Moran. Spatial autocorrelation is characterized by a correlation in a signal among nearby locations in space. Spatial autocorrelation is more complex than one-dimensional autocorrelation because spatial correlation is multi-dimensional and multi-directional.
Geographic information systems (GISs) and geographic information science (GIScience) combine computer-mapping capabilities with additional database management and data analysis tools. Commercial GIS systems are very powerful and have touched many applications and industries, including environmental science, urban planning, agricultural applications, and others.
Spatial econometrics is the field where spatial analysis and econometrics intersect. The term "spatial econometrics" was introduced for the first time by the Belgian economist Jean Paelinck in the general address he delivered to the annual meeting of the Dutch Statistical Association in May 1974. In general, econometrics differs from other branches of statistics in focusing on theoretical models, whose parameters are estimated using regression analysis. Spatial econometrics is a refinement of this, where either the theoretical model involves interactions between different entities, or the data observations are not truly independent. Thus, models incorporating spatial auto-correlation or neighborhood effects can be estimated using spatial econometric methods. Such models are common in regional science, real estate economics, education economics, housing market and many others. Adopting a more general view, in the by-law of the Spatial Econometrics Association, the discipline is defined as the set of "models and theoretical instruments of spatial statistics and spatial data analysis to analyse various economic effects such as externalities, interactions, spatial concentration and many others". Recent developments tend to include also methods and models from social network econometrics.
Quantitative geography is a subfield and methodological approach to geography that develops, tests, and uses mathematical and statistical methods to analyze and model geographic phenomena and patterns. It aims to explain and predict the distribution and dynamics of human and physical geography through the collection and analysis of quantifiable data. The approach quantitative geographers take is generally in line with the scientific method, where a falsifiable hypothesis is generated, and then tested through observational studies. This has received criticism, and in recent years, quantitative geography has moved to include systematic model creation and understanding the limits of their models. This approach is used to study a wide range of topics, including population demographics, urbanization, environmental patterns, and the spatial distribution of economic activity. The methods of quantitative geography are often contrasted with those employed by qualitative geography, which is more focused on observing and recording characteristics of geographic place. However, there is increasing interest in using combinations of both qualitative and quantitative methods through mixed-methods research to better understand and contextualize geographic phenomena.
In geography, scale is the level at which a geographical phenomenon occurs or is described. This concept is derived from the map scale in cartography. Geographers describe geographical phenomena and differences using different scales. From an epistemological perspective, scale is used to describe how detailed an observation is, while ontologically, scale is inherent in the complex interaction between society and nature.
The second law of geography, according to Waldo Tobler, is "the phenomenon external to a geographic area of interest affects what goes on inside." This is an extension of his first law. He first published it in 1999 in reply to a paper titled "Linear pycnophylactic reallocation comment on a paper by D. Martin" and then again in response to criticism of his first law of geography titled "On the First Law of Geography: A Reply." Much of this criticism centered on the question of whether laws are meaningful in geography or any of the social sciences. In this document, Tobler proposed his second law while recognizing that others have proposed other concepts to fill the role of the second law. Tobler asserted that this phenomenon is common enough to warrant the title of second law of geography. Unlike Tobler's first law of geography, which is relatively well accepted among geographers, there are a few contenders for the title of the second law of geography. Tobler's second law of geography is less well known but still has profound implications for geography and spatial analysis.
Arbia’s law of geography states, "Everything is related to everything else, but things observed at a coarse spatial resolution are more related than things observed at a finer resolution." Originally proposed as the 2nd law of geography, this is one of several laws competing for that title. Because of this, Arbia's law is sometimes referred to as the second law of geography, or Arbia's second law of geography.
The uncertain geographic context problem (UGCoP) is a source of statistical bias that can significantly impact the results of spatial analysis when dealing with aggregate data. The UGCoP is very closely related to the Modifiable areal unit problem (MAUP), and like the MAUP, arises from how we divide the land into areal units. It is caused by the difficulty, or impossibility, of understanding how phenomena under investigation in different enumeration units interact between enumeration units, and outside of a study area over time. It is particularly important to consider the UGCoP within the discipline of time geography, where phenomena under investigation can move between spatial enumeration units during the study period. Examples of research that needs to consider the UGCoP include food access and human mobility.
The modifiable temporal unit problem (MTUP) is a source of statistical bias that occurs in time series and spatial analysis when using temporal data that has been aggregated into temporal units. In such cases, choosing a temporal unit can affect the analysis results and lead to inconsistencies or errors in statistical hypothesis testing.