Spatial ETL


Spatial extract, transform, load (spatial ETL), also known as geospatial transformation and load (GTL), is a process for managing and manipulating geospatial data, for example map data. It is a type of extract, transform, load (ETL) process, with software tools and libraries specialised for geographical information. [1]


A common use of spatial ETL is to convert geographical information from a data source into another format that can be more easily used, for example by importing it into GIS software. [1] A tool may translate data directly from one format to another, or via an intermediate format. Intermediate formats are often used when the data must also be transformed, rather than simply converted.
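To illustrate translation via an intermediate format, here is a minimal Python sketch. It assumes a hypothetical intermediate model in which each feature is a dict holding a GeoJSON-style geometry and a dict of attributes; the reader parses a CSV file with longitude/latitude columns, and the writer emits a GeoJSON FeatureCollection. Function and field names are illustrative only and do not come from any particular spatial ETL product.

```python
import csv
import io
import json

# Hypothetical intermediate representation: each feature is a dict with a
# GeoJSON-style "geometry" and a plain dict of "properties" (attributes).

def read_point_csv(text, x_field="lon", y_field="lat"):
    """Reader: parse CSV rows with coordinate columns into intermediate features."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        x = float(row.pop(x_field))
        y = float(row.pop(y_field))
        features.append({
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": row,  # the remaining columns become attributes
        })
    return features

def write_geojson(features):
    """Writer: serialize intermediate features as a GeoJSON FeatureCollection."""
    collection = {
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature", "geometry": f["geometry"], "properties": f["properties"]}
            for f in features
        ],
    }
    return json.dumps(collection)

def translate(text, reader, writer):
    """Translate via the intermediate model: any reader can pair with any writer."""
    return writer(reader(text))
```

For example, `translate("name,lon,lat\nWell A,-123.1,49.2\n", read_point_csv, write_geojson)` yields a FeatureCollection with one point feature whose only attribute is `name`. One reason for this design is that N input formats and M output formats then need only N readers and M writers, rather than N × M direct converters.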

Origins and history

Although ETL tools for processing non-spatial data have existed for some time, ETL tools that can manage the unique characteristics of spatial data only emerged in the early 1990s.

Spatial ETL tools emerged in the GIS industry to enable interoperability (the exchange of information) between the industry's diverse array of mapping applications and their proprietary formats. Spatial ETL tools are also becoming increasingly important in management information systems, where they help organizations integrate spatial data with their existing non-spatial databases and draw on their spatial data assets to develop more competitive business strategies.

Traditionally, GIS applications could read or import only a limited number of spatial data formats and offered few specialist ETL transformation tools; the approach was to import the data and then carry out step-by-step transformation or analysis within the GIS application itself. Spatial ETL, by contrast, does not require the user to import or view the data, and generally carries out its tasks in a single predefined process.
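The idea of a single predefined process can be sketched as a list of transformation steps that is defined once and then run end to end, without the user inspecting intermediate results. The sketch below is a minimal Python illustration under stated assumptions: features use a hypothetical dict layout, and the two steps (`drop_empty_geometry`, `uppercase_names`) are invented examples rather than operations from any specific tool.

```python
from typing import Callable, Dict, Iterable, List

Feature = Dict  # hypothetical layout: {"geometry": ..., "properties": {...}}
Step = Callable[[Iterable[Feature]], Iterable[Feature]]

def drop_empty_geometry(features):
    """Filter step: discard features that have no geometry."""
    return (f for f in features if f.get("geometry") is not None)

def uppercase_names(features):
    """Attribute step: normalize the "name" attribute without mutating the input."""
    for f in features:
        props = dict(f["properties"])
        props["name"] = props.get("name", "").upper()
        yield {**f, "properties": props}

def run_pipeline(source: Iterable[Feature], steps: List[Step]) -> List[Feature]:
    """Run every step in order as one process; nothing is shown to the user in between."""
    stream = source
    for step in steps:
        stream = step(stream)
    return list(stream)  # materialize the final result for loading
```

Because each step consumes and yields a stream of features, the same step list can be saved and rerun against new data, which is the practical difference from performing each operation interactively inside a GIS application.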

With the push for greater interoperability within the GIS industry, many existing GIS applications now incorporate spatial ETL tools; the ArcGIS Data Interoperability Extension is one example. [2]

Transformation

The transformation phase of a spatial ETL process can perform a variety of functions; some are similar to those of standard ETL, but others are unique to spatial data. [3] Spatial data commonly consists of a geographic element and related attribute data, so spatial ETL transformations are often described as either geometric transformations (transformations of the geographic element) or attribute transformations (transformations of the related attribute data).
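The distinction between geometric and attribute transformations can be illustrated with one example of each. In the Python sketch below, the geometric transformation is a reprojection of WGS 84 longitude/latitude into spherical Web Mercator (EPSG:3857) using the standard closed-form formulas, and the attribute transformation renames fields. The feature layout is a hypothetical one chosen for illustration; production reprojection between arbitrary coordinate reference systems would normally use a dedicated library such as PROJ.

```python
import math

# Sphere radius used by the spherical Web Mercator (EPSG:3857) formulas.
EARTH_RADIUS_M = 6378137.0

def to_web_mercator(lon_deg, lat_deg):
    """Geometric transformation: project WGS 84 lon/lat (degrees) to Web Mercator metres.

    The y formula diverges at the poles, so latitudes near +/-90 are not usable.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def transform_geometry(feature):
    """Apply the projection to a point feature; attributes are left untouched."""
    lon, lat = feature["geometry"]["coordinates"]
    x, y = to_web_mercator(lon, lat)
    return {**feature, "geometry": {"type": "Point", "coordinates": [x, y]}}

def transform_attributes(feature, renames):
    """Attribute transformation: rename fields; the geometry is left untouched."""
    props = {renames.get(k, k): v for k, v in feature["properties"].items()}
    return {**feature, "properties": props}
```

Each function changes only one half of the feature, which mirrors the geometric/attribute split described above; a real spatial ETL workflow would typically chain several of each kind.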

Common geospatial transformations

Additional features

Desirable features of a spatial ETL application are:

Uses

Spatial ETL has a number of distinct uses:

Examples of spatial ETL tools

See also

Related Research Articles

Geographic information system: System to capture, manage, and present geographic data

A geographic information system (GIS) consists of integrated computer hardware and software that store, manage, analyze, edit, output, and visualize geographic data. Much of this often happens within a spatial database; however, this is not essential to meet the definition of a GIS. In a broader sense, one may consider such a system also to include human users and support staff, procedures and workflows, the body of knowledge of relevant concepts and methods, and institutional organizations.

A coverage is the digital representation of some spatio-temporal phenomenon. ISO 19123 provides the definition:

Extract, transform, load: Procedure in computing

In computing, extract, transform, load (ETL) is a three-phase process where data is extracted from an input source, transformed, and loaded into an output data container. The data can be collated from one or more sources and it can also be output to one or more destinations. ETL processing is typically executed using software applications, but it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on recurring schedules, either as single jobs or aggregated into a batch of jobs.
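The three phases, including collation from several sources and output to several destinations, can be sketched as three small Python functions. This is a minimal illustration under assumptions: the sources are in-memory CSV strings with hypothetical `id` and `name` columns, and each destination is simply a callable that receives the transformed rows.

```python
import csv
import io
import json

def extract(sources):
    """Extract: collate records from several in-memory CSV sources."""
    for text in sources:
        yield from csv.DictReader(io.StringIO(text))

def transform(records):
    """Transform: cast types, trim whitespace, and drop rows without an id."""
    for r in records:
        if r.get("id"):
            yield {"id": int(r["id"]), "name": r["name"].strip()}

def load(records, destinations):
    """Load: write the same transformed records to every destination."""
    rows = list(records)
    for dest in destinations:
        dest(rows)
    return len(rows)
```

A scheduler or batch runner would call `load(transform(extract(sources)), destinations)` as a single job; here, for instance, one destination could append `json.dumps(rows)` to a list while another writes the rows with `csv.DictWriter`.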

A GIS file format is a standard for encoding geographical information into a computer file, as a specialized type of file format for use in geographic information systems (GIS) and other geospatial applications. Since the 1970s, dozens of formats have been created based on various data models for various purposes. They have been created by government mapping agencies, GIS software vendors, standards bodies such as the Open Geospatial Consortium, informal user communities, and even individual developers.

A GIS software program is a computer program to support the use of a geographic information system, providing the ability to create, store, manage, query, analyze, and visualize geographic data, that is, data representing phenomena for which location is important. The GIS software industry encompasses a broad range of commercial and open-source products that provide some or all of these capabilities within various information technology architectures.

Shapefile: Geospatial vector data format

The shapefile format is a geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products. The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.
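As a concrete look at the format, the main `.shp` file of a shapefile begins with a fixed 100-byte header, as laid out in Esri's published shapefile technical description: a big-endian file code (9994) and file length (counted in 16-bit words), followed by a little-endian version (1000), shape type, and bounding box. The Python sketch below parses only that header; it does not read the records, the `.shx` index, or the `.dbf` attribute table, and the shape-type table covers only the common 2D types.

```python
import struct

# Common 2D shape type codes from the shapefile specification.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}

def read_shp_header(data: bytes) -> dict:
    """Parse the fixed 100-byte header of a shapefile main (.shp) file."""
    if len(data) < 100:
        raise ValueError("header must be at least 100 bytes")
    # Bytes 0-27: file code and file length are big-endian integers.
    (file_code,) = struct.unpack(">i", data[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile (file code != 9994)")
    (length_words,) = struct.unpack(">i", data[24:28])
    # Bytes 28-99: version, shape type and bounding box are little-endian.
    version, shape_type = struct.unpack("<ii", data[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    return {
        "file_length_bytes": length_words * 2,  # length is stored in 16-bit words
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, shape_type),
        "bbox": (xmin, ymin, xmax, ymax),
    }
```

Usage: `read_shp_header(open("rivers.shp", "rb").read(100))`, where `rivers.shp` is a hypothetical file name.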

JTS Topology Suite is an open-source Java software library that provides an object model for Euclidean planar linear geometry together with a set of fundamental geometric functions. JTS is primarily intended to be used as a core component of vector-based geomatics software such as geographical information systems. It can also be used as a general-purpose library providing algorithms in computational geometry.
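JTS itself is a Java library, and its API is not reproduced here; its `Geometry` class exposes operations such as `getArea()` and `getLength()`. As an illustration of the kind of fundamental planar geometric function such a library provides, the Python sketch below computes the area of a simple polygon ring with the shoelace formula and the Euclidean length of a polyline. It assumes coordinates in a projected, planar system; it is not how JTS is implemented internally.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def ring_area(ring: List[Point]) -> float:
    """Planar area of a simple polygon ring via the shoelace formula.

    Works whether or not the ring repeats its first point at the end,
    since a repeated closing point contributes a zero term.
    """
    n = len(ring)
    total = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def line_length(coords: List[Point]) -> float:
    """Euclidean length of a polyline (sum of segment lengths)."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))
```

For example, `ring_area([(0, 0), (4, 0), (4, 3), (0, 3)])` gives `12.0`, and `line_length([(0, 0), (3, 4)])` gives `5.0`.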

Geospatial metadata is a type of metadata applicable to geographic data and information. Such objects may be stored in a geographic information system (GIS) or may simply be documents, data-sets, images or other objects, services, or related items that exist in some other native environment but whose features may be appropriate to describe in a (geographic) metadata catalog.

A Spatial Data Infrastructure (SDI), also called geospatial data infrastructure, is a data infrastructure implementing a framework of geographic data, metadata, users and tools that are interactively connected in order to use spatial data in an efficient and flexible way. Another definition is "the technology, policies, standards, human resources, and related activities necessary to acquire, process, distribute, use, maintain, and preserve spatial data". Most commonly, institutions with large repositories of geographic data create SDIs to facilitate the sharing of their data with a broader audience.

ArcMap is the former main component of Esri's ArcGIS suite of geospatial processing programs, used primarily to view, edit, create, and analyze geospatial data. ArcMap allows the user to explore data within a data set, symbolize features accordingly, and create maps. This is done through two distinct sections of the program, the table of contents and the data frame. In October 2020, it was announced that there were no plans to release ArcMap 10.9 in 2021, and that ArcMap would no longer be supported after March 1, 2026. Esri is encouraging its users to transition to ArcGIS Pro.

A geoportal is a type of web portal used to find and access geographic information and associated geographic services via the Internet. Geoportals are important for effective use of geographic information systems (GIS) and a key element of a spatial data infrastructure (SDI).

A geographic data model, geospatial data model, or simply data model in the context of geographic information systems, is a mathematical and digital structure for representing phenomena over the Earth. Generally, such data models represent various aspects of these phenomena by means of geographic data, including spatial locations, attributes, change over time, and identity. For example, the vector data model represents geography as collections of points, lines, and polygons, and the raster data model represents geography as cell matrices that store numeric values. Data models are implemented throughout the GIS ecosystem, including the software tools for data management and spatial analysis, data stored in a variety of GIS file formats, specifications and standards, and specific designs for GIS installations.
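The difference between the vector and raster data models shows up clearly when converting from one to the other. The Python sketch below rasterizes vector points into a cell matrix that stores a numeric value (here, the count of points per cell). The grid origin, cell size, and the convention that row 0 is the northernmost row are assumptions chosen for illustration; points outside the grid are ignored.

```python
from typing import Iterable, List, Tuple

def rasterize_points(
    points: Iterable[Tuple[float, float]],
    xmin: float,
    ymin: float,
    cell_size: float,
    ncols: int,
    nrows: int,
) -> List[List[int]]:
    """Convert vector points into a raster (cell matrix) of point counts.

    Row 0 is assumed to be the northernmost row, as is common in raster storage.
    """
    grid = [[0] * ncols for _ in range(nrows)]
    for x, y in points:
        col = int((x - xmin) // cell_size)
        row_from_bottom = int((y - ymin) // cell_size)
        if 0 <= col < ncols and 0 <= row_from_bottom < nrows:
            grid[nrows - 1 - row_from_bottom][col] += 1
    return grid
```

The conversion is lossy in one direction: the raster keeps only how many points fall in each cell, not their exact coordinates or any per-point attributes.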

The Spatial Archive and Interchange Format was defined in the early 1990s as a self-describing, extensible format designed to support interoperability and storage of geospatial data.

Geographic information systems (GIS) play a constantly evolving role in geospatial intelligence (GEOINT) and United States national security. These technologies allow a user to efficiently manage, analyze, and produce geospatial data, to combine GEOINT with other forms of intelligence collection, and to perform highly developed analysis and visual production of geospatial data. As a result, GIS can produce more up-to-date and reliable GEOINT, reducing uncertainty for decision makers. Because many GIS programs are Web-enabled, a user can work with a decision maker on GEOINT and national security problems from anywhere in the world. Many types of GIS software are used in GEOINT and national security, such as Google Earth, ERDAS IMAGINE, GeoNetwork opensource, and Esri ArcGIS.

Geospatial topology: Type of spatial relationship

Geospatial topology is the study and application of qualitative spatial relationships between geographic features, or between representations of such features in geographic information, such as in geographic information systems (GIS). For example, two regions overlapping, or one region containing another, are topological relationships. It is thus the application of the mathematics of topology to GIS, and is distinct from, but complementary to, the many aspects of geographic information that are based on quantitative spatial measurements through coordinate geometry. Topology appears in many aspects of geographic information science and GIS practice, including the discovery of inherent relationships through spatial query, vector overlay and map algebra; the enforcement of expected relationships as validation rules stored in geospatial data; and the use of stored topological relationships in applications such as network analysis. Spatial topology is the generalization of geospatial topology for non-geographic domains, e.g., CAD software.
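Qualitative relationships such as "overlaps" or "contains" can be computed from coordinates. The Python sketch below classifies the relationship between two axis-aligned, non-degenerate rectangles given as `(xmin, ymin, xmax, ymax)`. The relation names are borrowed loosely from the spatial predicates of the OGC Simple Features model; this rectangle-only simplification is an assumption made for brevity, and general polygons require a full intersection-matrix (DE-9IM) evaluation as performed by geometry libraries.

```python
def rect_relation(a, b):
    """Classify the topological relation of rectangle a to rectangle b.

    Rectangles are (xmin, ymin, xmax, ymax) tuples with positive width and height.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Separated along either axis: no shared points at all.
    if ax2 < bx1 or bx2 < ax1 or ay2 < by1 or by2 < ay1:
        return "disjoint"
    # Boundaries meet along an edge or corner, but interiors do not intersect.
    if ax2 == bx1 or bx2 == ax1 or ay2 == by1 or by2 == ay1:
        return "touches"
    if a == b:
        return "equals"
    if bx1 <= ax1 and ax2 <= bx2 and by1 <= ay1 and ay2 <= by2:
        return "within"
    if ax1 <= bx1 and bx2 <= ax2 and ay1 <= by1 and by2 <= ay2:
        return "contains"
    # Interiors intersect and neither rectangle contains the other.
    return "overlaps"
```

The result depends only on the relative arrangement of the shapes, not on distances, which is what makes the relationship topological rather than metric.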

Carto (company): Cloud computing platform

CARTO is a software as a service (SaaS) spatial analysis platform that provides GIS, web mapping, data visualization, spatial analytics, and spatial data science features. The company positions itself as a location intelligence platform because its tools for geospatial data analysis and visualization do not require advanced GIS or development experience. As a cloud-native platform, CARTO runs natively on cloud data warehouse platforms, overcoming earlier limits on data scale for spatial workloads.

FME (software): Geospatial ETL Software

FME, also known as Feature Manipulation Engine, is a geospatial extract, transform and load software platform developed and maintained by Safe Software of British Columbia, Canada. FME was first released in 1996 and grew out of a successful bid by Safe Software's founders, Don Murray and Dale Lutz, for a Canadian government contract to monitor logging activities.

Web GIS: Technologies employing the World Wide Web to manage spatial data

Web GIS, or Web Geographic Information Systems, are GIS that employ the World Wide Web to facilitate the storage, visualization, analysis, and distribution of spatial information over the Internet. The World Wide Web, or the Web, is an information system that uses the Internet to host, share, and distribute documents, images, and other data. Web GIS involves using the Web to carry out GIS tasks traditionally done on a desktop computer, as well as to share maps and spatial data. Although Web GIS and Internet GIS are sometimes used interchangeably, they are different concepts: Web GIS is a subset of Internet GIS, which is itself a subset of distributed GIS, which is in turn a subset of geographic information systems more broadly. The most common application of Web GIS is web mapping, so much so that the two terms are often used interchangeably, in much the same way as digital mapping and GIS. However, Web GIS and web mapping are distinct concepts, and web mapping does not necessarily require a Web GIS.

A spatial join is an operation in a geographic information system (GIS) or spatial database that combines the attribute tables of two spatial layers based on a desired spatial relation between their geometries. It is similar to the table join operation in relational databases in merging two tables, but each pair of rows is correlated based on some form of matching location rather than a common key value. It is also similar to vector overlay operations common in GIS software such as Intersect and Union in merging two spatial datasets, but the output does not contain a composite geometry, only merged attributes.
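A common case is joining point features to the polygons that contain them. The Python sketch below does this with a ray-casting point-in-polygon test and returns merged attributes only, with no composite geometry, as described above. Assumptions: polygons are simple rings without holes, the first containing polygon wins, polygon attributes overwrite point attributes on name collisions, and points outside every polygon keep just their own attributes (a left join). Real systems typically add a spatial index such as an R-tree rather than testing every pair.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the simple polygon ring?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def spatial_join(points, polygons):
    """Attach the attributes of the polygon containing each point.

    points: list of ((x, y), attrs); polygons: list of (ring, attrs).
    Returns one merged attribute dict per point; no geometry is produced.
    """
    joined = []
    for (x, y), p_attrs in points:
        merged = dict(p_attrs)
        for ring, poly_attrs in polygons:
            if point_in_polygon(x, y, ring):
                merged.update(poly_attrs)
                break  # first matching polygon wins
        joined.append(merged)
    return joined
```

For example, joining wells to district polygons produces each well's attributes plus a `district` value, much as a table join would add columns matched on a shared key.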

A Geodatabase is a proprietary GIS file format developed in the late 1990s by Esri to represent, store, and organize spatial datasets within a geographic information system. A geodatabase is both a logical data model and the physical implementation of that logical model in several proprietary file formats released during the 2000s. The geodatabase design is based on the spatial database model for storing spatial data in relational and object-relational databases. Given the dominance of Esri in the GIS industry, the term "geodatabase" is used by some as a generic trademark for any spatial database, regardless of platform or design.

References

  1. "What is ETL… and How Can it Turn You into a Geospatial Rock Star?". XYHt. 4 June 2020. Archived from the original on 6 November 2022. Retrieved 5 November 2022.
  2. "Spatial ETL tools". Esri. Archived from the original on 11 April 2023. Retrieved 11 April 2023.
  3. Miller, Harvey; Han, Jiawei, eds. (27 May 2009). Geographic Data Mining and Knowledge Discovery. CRC Press. p. 63. ISBN 9781420073980.