Vector overlay

Vector overlay is an operation (or class of operations) in a geographic information system (GIS) for integrating two or more vector spatial data sets. Terms such as polygon overlay, map overlay, and topological overlay are often used synonymously, although they are not identical in the range of operations they include. Overlay has been one of the core elements of spatial analysis in GIS since its early development. Some overlay operations, especially Intersect and Union, are implemented in all GIS software and are used in a wide variety of analytical applications, while others are less common.

Overlay is based on the fundamental principle of geography known as areal integration, in which different topics (say, climate, topography, and agriculture) can be directly compared based on a common location. It is also based on the mathematics of set theory and point-set topology.

The basic approach of a vector overlay operation is to take in two or more layers composed of vector shapes, and output a layer consisting of new shapes created from the topological relationships discovered between the input shapes. A range of specific operators allows for different types of input, and different choices in what to include in the output.

History

Prior to the advent of GIS, the overlay principle had developed as a method of literally superimposing different thematic maps (typically an isarithmic map or a chorochromatic map) drawn on transparent film (e.g., cellulose acetate) to see the interactions and find locations with specific combinations of characteristics. [1] The technique was largely developed by landscape architects. Warren Manning appears to have used this approach to compare aspects of Billerica, Massachusetts, although his published accounts only reproduce the maps without explaining the technique. [2] Jacqueline Tyrwhitt published instructions for the technique in an English textbook in 1950, including: [3]

As far as possible maps should be drawn on transparent paper, so that when completed the maps to the same scale can be ‘sieved’—i.e., placed one on top of another in turn so that correlations or their absence can be noted. (p.157)

Ian McHarg was perhaps most responsible for widely publicizing this approach to planning in Design with Nature (1969), in which he gave several examples of projects on which he had consulted, such as transportation planning and land conservation. [4]

The first true GIS, the Canada Geographic Information System (CGIS), developed during the 1960s and completed in 1971, was based on a rudimentary vector data model, and one of its earliest functions was polygon overlay. [5] Another early vector GIS, the Polygon Information Overlay System (PIOS), developed by ESRI for San Diego County, California in 1971, also supported polygon overlay. [6] It used the point-in-polygon algorithm to find intersections quickly. Unfortunately, the results of overlay in these early systems were often prone to error. [7]

Carl Steinitz, a landscape architect, helped found the Harvard Laboratory for Computer Graphics and Spatial Analysis, in part to develop GIS as a digital tool to implement McHarg's methods. In 1975, Thomas Peucker and Nicholas Chrisman of the Harvard Lab introduced the POLYVRT data model, one of the first to explicitly represent topological relationships and attributes in vector data. [8] They envisioned a system that could handle multiple "polygon networks" (layers) that overlapped by computing Least Common Geographic Units (LCGU), the area where a pair of polygons overlapped, with attributes inherited from the original polygons. Chrisman and James Dougenik implemented this strategy in the WHIRLPOOL program, released in 1979 as part of the Odyssey project to develop a general-purpose GIS. [9] This system implemented several improvements over the earlier approaches in CGIS and PIOS, and its algorithm became part of the core of GIS software for decades to come.

Algorithm

Illustration of the steps in computing a polygon overlay in a geographic information system

The goal of all overlay operations is to take in vector layers, and create a layer that integrates both the geometry and the attributes of the inputs. [10] Usually, both inputs are polygon layers, but lines and points are allowed in many operations, with simpler processing.

Since the original implementation, the basic strategy of the polygon overlay algorithm has remained the same, although the vector data structures that are used have evolved. [11]

  1. Given the two input polygon layers, extract the boundary lines.
  2. Cracking part A: In each layer, identify edges shared between polygons. Break each line at the junction of shared edges and remove duplicates to create a set of topologically planar connected lines. In early topological data structures such as POLYVRT and the ARC/INFO coverage, the data was natively stored this way, so this step was unnecessary.
  3. Cracking part B: Find any intersections between lines from the two inputs. At each intersection, split both lines. Then merge the two line layers into a single set of topologically planar connected lines.
  4. Assembling part A: Find each minimal closed ring of lines, and use it to create a polygon. Each of these will be a least common geographic unit (LCGU), with at most one "parent" polygon from each of the two inputs.
  5. Assembling part B: Create an attribute table that includes the columns from both inputs. For each LCGU, determine its parent polygon from each input layer, and copy that polygon's attributes into the LCGU's row of the new table; if the LCGU was not in any of the polygons of one of the input layers, leave the values for that layer's columns as null.
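
Two of the steps above can be sketched in a few lines of Python. This is a simplified illustration, not the WHIRLPOOL implementation: the function names (`intersection_point`, `crack`, `lcgu_row`) are hypothetical, only a single pair of straight segments is handled, and collinear or endpoint-touching cases are deliberately ignored.

```python
from typing import Optional

Point = tuple[float, float]
Segment = tuple[Point, Point]


def intersection_point(s1: Segment, s2: Segment) -> Optional[Point]:
    """Return the point where two segments properly cross, or None.

    Parallel/collinear segments and contacts exactly at an endpoint are
    skipped for brevity; a production algorithm must handle them.
    """
    (x1, y1), (x2, y2) = s1
    (x3, y3), (x4, y4) = s2
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:  # parallel or collinear
        return None
    # Parametric position along s1 (t) and along s2 (u).
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0 < t < 1 and 0 < u < 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def crack(s1: Segment, s2: Segment) -> tuple[list[Segment], list[Segment]]:
    """Cracking part B for one pair: split both segments at their crossing."""
    p = intersection_point(s1, s2)
    if p is None:
        return [s1], [s2]
    return [(s1[0], p), (p, s1[1])], [(s2[0], p), (p, s2[1])]


def lcgu_row(parent_a, parent_b, columns_a, columns_b):
    """Assembling part B: build one LCGU attribute row from its parents.

    A parent of None means the LCGU lies outside every polygon of that
    layer, so that layer's columns are left as None (null).
    Assumes the two layers use distinct column names.
    """
    row = {}
    for col in columns_a:
        row[col] = parent_a[col] if parent_a is not None else None
    for col in columns_b:
        row[col] = parent_b[col] if parent_b is not None else None
    return row
```

For example, `crack(((0.0, 0.0), (2.0, 2.0)), ((0.0, 2.0), (2.0, 0.0)))` splits each diagonal into two pieces that meet at the shared crossing point, and `lcgu_row({"soil": "clay"}, None, ["soil"], ["zone"])` yields a row whose `zone` value is null.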

Parameters are usually available to allow the user to calibrate the algorithm for a particular situation. One of the earliest was the snapping or fuzzy tolerance, a threshold distance. Any pair of lines that stay within this distance of each other are collapsed into a single line, avoiding unwanted narrow sliver polygons that can occur when lines that should be coincident (for example, a river and a boundary that should follow it de jure) are digitized separately with slightly different vertices. [12]
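
The effect of such a tolerance can be illustrated with a deliberately simplified sketch. It snaps only vertices (not whole line segments) of one line onto the nearest vertex of a reference line, which is an assumption made for brevity rather than how any particular GIS package implements fuzzy tolerance; the function name `snap_vertices` and the sample coordinates are hypothetical.

```python
import math


def snap_vertices(line, reference, tolerance):
    """Move each vertex of `line` onto the nearest vertex of `reference`
    when that vertex lies within `tolerance`; otherwise keep it unchanged.

    `reference` must contain at least one vertex.
    """
    snapped = []
    for vertex in line:
        nearest = min(reference, key=lambda r: math.dist(vertex, r))
        if math.dist(vertex, nearest) <= tolerance:
            snapped.append(nearest)
        else:
            snapped.append(vertex)
    return snapped


# A river and a boundary digitized separately along the same course.
river = [(0.0, 0.0), (5.0, 0.1), (10.0, 0.0)]
boundary = [(0.0, 0.05), (5.0, 0.0), (10.0, 0.08), (10.0, 5.0)]

snapped_boundary = snap_vertices(boundary, river, tolerance=0.2)
```

After snapping, the first three boundary vertices coincide with the river's vertices, so no sliver is created between the two lines, while the last vertex, which is far from the river, is left where it was. Choosing the tolerance is a trade-off: too small leaves slivers, too large collapses features that are genuinely distinct.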

Operators

A visualization of the polygon overlay operations available in most GIS software

The basic algorithm can be modified in a number of ways to return different forms of integration between the two input layers. These different overlay operators are used to answer a variety of questions, although some are far more commonly implemented and used than others. The most common are closely analogous to operators in set theory and Boolean logic, and have adopted their terms. As in these algebraic systems, the overlay operators may be commutative (giving the same result regardless of order) and/or associative (more than two inputs giving the same result regardless of the order in which they are paired).

Boolean overlay algebra

One of the most common uses of polygon overlay is to perform a suitability analysis, also known as a suitability model or multi-criteria evaluation. The task is to find the region that meets a set of criteria, each of which can be represented by a region. For example, the habitat of a species of wildlife might need to be A) within certain vegetation cover types, B) within a threshold distance of a water source (computed using a buffer), and C) not within a threshold distance of significant roads. Each of the criteria can be considered boolean in the sense of Boolean logic, because for any point in space, each criterion is either present or not present, and the point is either in the final habitat area or it is not (acknowledging that the criteria may be vague, but this requires more complex fuzzy suitability analysis methods). That is, which vegetation polygon the point is in is not important, only whether it is suitable or not suitable. This means that the criteria can be expressed as a Boolean logic expression, in this case, H = A and B and not C.

In a task such as this, the overlay procedure can be simplified because the individual polygons within each layer are not important, and can be dissolved into a single boolean region (consisting of one or more disjoint polygons but no adjacent polygons) representing the region that meets the criterion. With these inputs, each of the operators of Boolean logic corresponds exactly to one of the polygon overlay operators: intersect = AND, union = OR, subtract = AND NOT, symmetric difference = XOR. Thus, the above habitat region would be generated by computing the intersection of A and B, and subtracting C from the result.

Thus, this particular use of polygon overlay can be treated as an algebra that is homomorphic to Boolean logic. This enables the use of GIS to solve many spatial tasks that can be reduced to simple logic.
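
A minimal sketch of this algebra, assuming each dissolved criterion region is approximated by the set of unit grid cells it covers. This is a stand-in chosen so the example needs no GIS library; real vector overlay operates on polygon geometry. The `cells` helper and the rectangular extents below are hypothetical. Python's set operators then play the role of the overlay operators: `&` for intersect, `|` for union, `-` for subtract, and `^` for the XOR case.

```python
def cells(x0, y0, x1, y1):
    """Region covering the unit grid cells with x0 <= x < x1 and y0 <= y < y1."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Hypothetical dissolved criterion regions.
vegetation = cells(0, 0, 6, 6)     # A: suitable vegetation cover
near_water = cells(3, 0, 10, 10)   # B: within the water-source buffer
near_roads = cells(0, 4, 10, 6)    # C: within the road buffer

# H = A and B and not C: intersect A with B, then subtract C.
habitat = (vegetation & near_water) - near_roads

# The same region, evaluated cell by cell as a Boolean expression.
all_cells = vegetation | near_water | near_roads
habitat_by_logic = {
    c for c in all_cells
    if c in vegetation and c in near_water and c not in near_roads
}
```

Both constructions give the same region, which is the point of the homomorphism: the overlay can be reasoned about as a logic expression before any geometry is computed.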

Lines and points

Vector overlay is most commonly performed using two polygon layers as input and creating a third polygon layer. However, it is possible to perform the same algorithm (parts of it at least) on points and lines. [13] The following operations are typically supported in GIS software:

Implementations

Vector overlay is included in some form in virtually every GIS software package that supports vector analysis, although the interface and underlying algorithms vary significantly.

References

  1. Steinitz, Carl; Parker, Paul; Jordan, Lawrie (1976). "Hand-Drawn Overlays: Their History and Prospective Uses". Landscape Architecture. 66 (5 (September)): 444–455.
  2. Manning, Warren (1913). "The Billerica Town Plan". Landscape Architecture. 3: 108–118.
  3. Tyrwhitt, Jacqueline (1950). "Surveys for Planning". In APRR (ed.). Town and Country Planning Textbook. Architectural Press.
  4. McHarg, Ian (1969). Design with Nature. p. 34. ISBN 0-471-11460-X.
  5. Tomlinson, Roger (1968). "A Geographic Information System for Regional Planning". In Stewart, G.A. (ed.). Land Evaluation: Papers of a CSIRO Symposium. Macmillan of Australia. pp. 200–210.
  6. Tomlinson, Roger F.; Calkins, Hugh W.; Marble, Duane F. (1976). Computer handling of geographical data. UNESCO Press.
  7. Goodchild, Michael F. (1978). "Statistical aspects of the polygon overlay problem". Harvard papers on geographic information systems. 6.
  8. Peucker, Thomas K.; Chrisman, Nicholas (1975). "Cartographic Data Structures". The American Cartographer. 2 (1): 55–69. doi:10.1559/152304075784447289.
  9. Dougenik, James (1979). "WHIRLPOOL: A geometric processor for polygon coverage data" (PDF). Proceedings of the International Symposium on Cartography and Computing (Auto-Carto IV). 2: 304–311.
  10. Bolstad, Paul (2008). GIS Fundamentals: A First Text on Geographic Information Systems (3rd ed.). Eider Press. p. 352.
  11. Chrisman, Nicholas R. (2002). Exploring Geographic Information Systems (2nd ed.). Wiley. pp. 125–137.
  12. Lo, C.P.; Yeung, Albert K.W. (2002). Concepts and Techniques of Geographic Information Systems. Prentice Hall. p. 211. ISBN 0-13-080427-4.
  13. Esri. "Intersect (Analysis)". ArcGIS Pro Documentation. Retrieved 29 October 2021.
  14. QGIS. "Line intersections". QGIS 3.16 documentation.
  15. Morehouse, Scott (1985). "ARC/INFO: A geo-relational model for spatial information" (PDF). Proceedings of the International Symposium on Cartography and Computing (Auto-Carto VII): 388.
  16. Westervelt, James (2004). "GRASS Roots" (PDF). Proceedings of the FOSS/GRASS Users Conference. Retrieved 26 October 2021.