A GIS file format is a standard for encoding geographical information into a computer file, as a specialized type of file format for use in geographic information systems (GIS) and other geospatial applications. Since the 1970s, dozens of formats have been created based on various data models for various purposes. They have been created by government mapping agencies (such as the USGS or National Geospatial-Intelligence Agency), GIS software vendors, standards bodies such as the Open Geospatial Consortium, informal user communities, and even individual developers.
The first GIS installations of the 1960s, such as the Canada Geographic Information System, were based on bespoke software and stored data in bespoke file structures designed for the needs of the particular project. As more of these appeared, they could be compared to find best practices and common structures. [1] When general-purpose GIS software was developed in the 1970s and early 1980s, including programs from academic labs such as the Harvard Laboratory for Computer Graphics and Spatial Analysis, government agencies (e.g., the Map Overlay and Statistical System (MOSS) developed by the U.S. Fish & Wildlife Service and Bureau of Land Management), and new GIS software companies such as Esri and Intergraph, each program was built around its own proprietary (and often secret) file format. [2] Since each GIS installation was effectively isolated from all others, interchange between them was not a major consideration.
By the early 1990s, the proliferation of GIS worldwide and an increasing need for sharing data, soon accelerated by the emergence of the World Wide Web and spatial data infrastructures, led to the need for interoperable data and standard formats. An early attempt at standardization was the U.S. Spatial Data Transfer Standard, released in 1994 and designed to encode the wide variety of federal government data. [3] Although this particular format failed to garner widespread support, it led to other standardization efforts, especially the Open Geospatial Consortium (OGC), which has developed or adopted several vendor-neutral standards, some of which have been adopted by the International Organization for Standardization (ISO). [4]
Another development in the 1990s was the public release of proprietary file formats by GIS software vendors, enabling them to be used by other software. The most notable example of this was the publication of the Esri Shapefile format, [5] which by the late 1990s had become the most popular de facto standard for data sharing by the entire geospatial industry. [6] When proprietary formats were not shared (for example, the ESRI ARC/INFO coverage), software developers frequently reverse-engineered them to enable import and export in other software, further facilitating data exchange. One result of this was the emergence of free and open-source software libraries, such as the Geospatial Data Abstraction Library (GDAL), which have greatly facilitated the integration of spatial data in any format into a variety of software. [7]
During the 2000s, the need for specialized spatial files was reduced somewhat by the emergence of spatial databases, which incorporated spatial data into general-purpose relational databases. However, new file formats have continued to appear, especially with the proliferation of web mapping; formats such as the Keyhole Markup Language (KML) and GeoJSON can be more easily integrated into web development languages than traditional GIS files.
Over a hundred distinct formats have been created for the storage of spatial data, of which 20-30 are currently in common usage for different purposes. These can be distinguished in a number of ways:
Like any digital image, raster GIS data is based on a regular tessellation of space into a rectangular grid of rows and columns of cells (also known as pixels), with each cell having a measured value stored. The major difference from a photograph is that the grid is registered to geographic space rather than a field of view. The resolution of the raster data set is its cell width in ground units.
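The registration of a grid to geographic space can be illustrated with a minimal sketch. It assumes a north-up raster with square cells and no rotation; the class name GridRegistration and the sample origin and cell size are hypothetical, not taken from any particular file format.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GridRegistration:
    """Hypothetical registration of a north-up raster (square cells, no rotation)."""
    origin_x: float   # x coordinate of the grid's upper-left corner, in ground units
    origin_y: float   # y coordinate of the grid's upper-left corner, in ground units
    cell_size: float  # resolution: cell width in ground units

    def cell_center(self, row: int, col: int) -> Tuple[float, float]:
        """Ground coordinates of the center of the cell at (row, col)."""
        x = self.origin_x + (col + 0.5) * self.cell_size
        # Rows count downward from the top edge, so y decreases as row increases.
        y = self.origin_y - (row + 0.5) * self.cell_size
        return (x, y)

    def cell_at(self, x: float, y: float) -> Tuple[int, int]:
        """(row, col) of the cell that contains the ground coordinate (x, y)."""
        col = math.floor((x - self.origin_x) / self.cell_size)
        row = math.floor((self.origin_y - y) / self.cell_size)
        return (row, col)


# Assumed example: a 30-unit grid whose upper-left corner is at (500000, 4200000).
registration = GridRegistration(origin_x=500000.0, origin_y=4200000.0, cell_size=30.0)
center = registration.cell_center(2, 3)
```

Real raster formats usually store a more general six-parameter affine transform that also allows rotation and non-square cells; the three-parameter form here is only the simplest case.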
Because a grid is a sample of a continuous space, raster data is most commonly used to represent geographic fields, in which a property varies continuously or discretely over space. Common examples include remote sensing imagery, terrain/elevation, population density, weather and climate, soil properties, and many others. Raster data can be images with each pixel (or cell) containing a color value. The value recorded for each cell may be of any level of measurement, including a discrete qualitative value, such as land use type, or a continuous quantitative value, such as temperature, or a null value if no data is available. While a raster cell stores a single value, it can be extended by using raster bands to represent RGB (red, green, blue) colors, colormaps (a mapping between a thematic code and RGB value), or an extended attribute table with one row for each unique cell value. It can also be used to represent discrete geographic features, but usually only in exigent circumstances.
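The relationship between a thematic raster, a colormap, and RGB bands can be sketched as follows. The land-use codes, the colors, and the -9999 null sentinel are all assumptions made for illustration; actual formats define their own conventions for each.

```python
NODATA = -9999  # assumed sentinel for "no data available"

# A tiny single-band thematic raster of hypothetical land-use codes.
land_use = [
    [1, 1, 2],
    [1, 3, 2],
    [NODATA, 3, 3],
]

# Colormap: thematic code -> (red, green, blue). Codes and colors are made up.
COLORMAP = {
    1: (34, 139, 34),    # forest
    2: (255, 215, 0),    # cropland
    3: (30, 144, 255),   # water
}


def to_rgb_bands(grid, colormap, nodata):
    """Expand a single-band coded grid into three bands (red, green, blue).

    Cells equal to `nodata` become None in every band.
    """
    bands = ([], [], [])
    for row in grid:
        pixels = [None if value == nodata else colormap[value] for value in row]
        for band_index, band in enumerate(bands):
            band.append([None if p is None else p[band_index] for p in pixels])
    return bands


red, green, blue = to_rgb_bands(land_use, COLORMAP, NODATA)
```

Storing the codes plus one small colormap is more compact than storing three full bands, which is one reason thematic rasters are often kept in coded form.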
Raster data is stored in various formats, from standard file-based structures such as TIFF and JPEG to binary large object (BLOB) data stored directly in a relational database management system (RDBMS), in the same way that vector-based feature classes are stored. Database storage, when properly indexed, typically allows for quicker retrieval of the raster data but can require storage of millions of significantly sized records.
A vector dataset (sometimes called a feature dataset) stores information about discrete objects, using an encoding of the vector logical data model to represent the location or geometry of each object, and an encoding of its other properties that is usually based on relational database technology. Typically, a single dataset collects information about a set of closely related or similar objects, such as all of the roads in a city.
The vector data model uses coordinate geometry to represent each shape as one of several geometric primitives, most commonly points (a single coordinate of zero dimension), lines (a one-dimensional ordered list of coordinates connected by straight lines), and polygons (a self-closing boundary line enclosing a two-dimensional region). Many data structures have been developed to encode these primitives as digital data, but most modern vector file formats are based on the Open Geospatial Consortium (OGC) Simple Features specification, often directly incorporating its well-known text (WKT) or well-known binary (WKB) encodings.
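As an illustration of the three primitives, the following sketch writes them out as two-dimensional WKT strings. The helper function names are hypothetical, and only the simplest case is covered (no holes, no multi-part geometries, no Z or M values).

```python
def _coords(points):
    """Format a sequence of (x, y) pairs as a WKT coordinate list."""
    return ", ".join(f"{x} {y}" for x, y in points)


def point_wkt(x, y):
    """A point: a single coordinate."""
    return f"POINT ({x} {y})"


def linestring_wkt(points):
    """A line: an ordered list of at least two coordinates."""
    points = list(points)
    if len(points) < 2:
        raise ValueError("a line needs at least two coordinates")
    return f"LINESTRING ({_coords(points)})"


def polygon_wkt(ring):
    """A polygon: one self-closing boundary ring (holes are not handled here)."""
    ring = list(ring)
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])  # repeat the first vertex to close the boundary
    if len(ring) < 4:
        raise ValueError("a polygon ring needs at least three distinct vertices")
    return f"POLYGON (({_coords(ring)}))"


well = point_wkt(30, 10)
river = linestring_wkt([(30, 10), (10, 30), (40, 40)])
lake = polygon_wkt([(30, 10), (40, 40), (20, 40), (10, 20)])
```

The doubled parentheses in the polygon text leave room for additional rings, which WKT uses for interior holes.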
In addition to the geometry of each object, a vector dataset must also be able to store its attributes. For example, a database that describes lakes may contain each lake's depth, water quality, and pollution level. Since the 1970s, almost all vector file formats have adopted the relational database model, either in principle or directly incorporating RDBMS software. Thus, the entire dataset is stored in a table, with each row representing a single object that contains columns for each attribute. [12] : 256
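The table-of-rows idea can be sketched with the lake example, here converted to a GeoJSON FeatureCollection in which each row becomes one feature carrying its geometry and its attribute columns. The lake names, coordinates, and attribute values are invented for illustration, and the function names are hypothetical.

```python
import json

# Each dict is one row of a hypothetical lakes table: attribute columns plus
# a longitude/latitude pair locating the lake. All values are made up.
lake_rows = [
    {"name": "Lake A", "depth_m": 12.5, "water_quality": "good",
     "pollution_level": 2, "lon": -111.9, "lat": 40.7},
    {"name": "Lake B", "depth_m": 48.0, "water_quality": "fair",
     "pollution_level": 5, "lon": -112.3, "lat": 41.1},
]

GEOMETRY_COLUMNS = ("lon", "lat")


def row_to_feature(row):
    """Turn one table row into a GeoJSON Feature with a Point geometry."""
    properties = {k: v for k, v in row.items() if k not in GEOMETRY_COLUMNS}
    return {
        "type": "Feature",
        # GeoJSON positions are ordered [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
        "properties": properties,
    }


def rows_to_feature_collection(rows):
    """Wrap all rows as a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection",
            "features": [row_to_feature(row) for row in rows]}


collection = rows_to_feature_collection(lake_rows)
geojson_text = json.dumps(collection, indent=2)
```

A real lake would normally be stored as a polygon rather than a point; points are used here only to keep the rows short.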
Two strategies have been used to integrate the geometry and attributes into a single vector file format structure: [13]
Geospatial topology is often an important part of vector data, representing the inherent spatial relationships (especially adjacency) between objects. Topology has been managed in vector file formats in four ways. In a topological data structure, most notably Harvard's POLYVRT and its successor the ARC/INFO coverage, topological connections between points, lines, and polygons are an inherent part of the encoding of those features. [8] : 46–49 Conversely, non-topological or spaghetti data (such as the Esri Shapefile and most spatial databases) includes no topology information, with each geometry being completely independent of all others. A topology dataset (often used in network analysis) augments spaghetti data with a separate file encoding the topological connections. [12] : 218 A topology rulebase is a list of desired topology rules used to enforce spatial integrity in spaghetti data, such as "county polygons must not overlap" and "state polygons must share boundaries with county polygons." [13]
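Because spaghetti data stores no relationships, adjacency has to be computed when it is needed. The sketch below derives it from independent polygon rings by finding edges that two polygons have in common. It assumes that shared boundaries use exactly identical vertex coordinates, which real data often violates; the function names and the sample squares are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def ring_edges(ring):
    """Return the undirected edges of a polygon ring as a set of vertex pairs."""
    closed = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])  # close the ring if the input left it open
    # frozenset makes edge (a, b) equal to edge (b, a), so direction is ignored.
    return {frozenset((a, b)) for a, b in zip(closed, closed[1:]) if a != b}


def shared_edge_adjacency(polygons):
    """Map each polygon name to the set of polygons sharing an edge with it."""
    edge_owners = defaultdict(set)
    for name, ring in polygons.items():
        for edge in ring_edges(ring):
            edge_owners[edge].add(name)

    adjacency = {name: set() for name in polygons}
    for owners in edge_owners.values():
        for a, b in combinations(sorted(owners), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Three unit squares stored independently: A and B share a side, C is isolated.
squares = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(3, 0), (4, 0), (4, 1), (3, 1)],
}
adjacency = shared_edge_adjacency(squares)
```

A topological data structure stores the equivalent of the edge_owners table permanently, so that each shared boundary is recorded once rather than duplicated in every polygon that uses it.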
Vector datasets usually represent discrete geographical features, such as buildings, trees, and counties. However, they may also be used to represent geographical fields by storing locations where the spatially continuous field has been sampled. Sample points (e.g., weather stations and sensor networks), contour lines, and triangulated irregular networks (TINs) are used to represent elevation or other values that change continuously over space. TINs record values at point locations, which are connected by lines to form an irregular mesh of triangles. The faces of the triangles represent the terrain surface.
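How a triangle face represents the surface between sample points can be sketched with linear (barycentric) interpolation inside a single triangle. This is one common way to read a value from a TIN, shown under the assumption that each face is treated as a flat plane; the function name and the sample elevations are hypothetical.

```python
def interpolate_in_triangle(p, a, b, c):
    """Linearly interpolate a value at point p inside triangle a-b-c.

    p is (x, y); a, b, c are (x, y, z) sample points of a TIN face.
    Returns the interpolated z, or None if p lies outside the triangle.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c

    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: the vertices are collinear")

    # Barycentric weights: how strongly each vertex influences p.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2

    if min(w1, w2, w3) < 0:
        return None  # p is outside this face
    return w1 * z1 + w2 * z2 + w3 * z3


# Assumed sample points: (x, y, elevation).
a, b, c = (0, 0, 100.0), (10, 0, 200.0), (0, 10, 300.0)
elevation = interpolate_in_triangle((2, 2), a, b, c)
```

A full TIN lookup would first have to find which triangle contains the query point; that search is omitted here.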
Formats commonly in current usage:
Historical formats seldom used today:
There are some important advantages and disadvantages to using a raster or vector data model to represent reality:
Modern object–relational databases can now store a variety of complex data using the binary large object datatype, including both raster grids and vector geometries. This enables some spatial database systems to store data of both models in the same database.
A geographic information system (GIS) consists of integrated computer hardware and software that store, manage, analyze, edit, output, and visualize geographic data. Much of this often happens within a spatial database; however, this is not essential to meet the definition of a GIS. In a broader sense, one may consider such a system also to include human users and support staff, procedures and workflows, the body of knowledge of relevant concepts and methods, and institutional organizations.
Vector graphics are a form of computer graphics in which visual images are created directly from geometric shapes defined on a Cartesian plane, such as points, lines, curves and polygons. The associated mechanisms may include vector display and printing hardware, vector data models and file formats, as well as the software based on these data models. Vector graphics is an alternative to raster or bitmap graphics, with each having advantages and disadvantages in specific situations.
PostGIS is an open source software program that adds support for geographic objects to the PostgreSQL object-relational database. PostGIS follows the Simple Features for SQL specification from the Open Geospatial Consortium (OGC).
Environmental Systems Research Institute, Inc., doing business as Esri, is an American multinational geographic information system (GIS) software company headquartered in Redlands, California. It is best known for its ArcGIS products. With a 40% market share, Esri is the world's leading supplier of GIS software, web GIS and geodatabase management applications.
A coverage is the digital representation of some spatio-temporal phenomenon. ISO 19123 provides the definition:
TerraLib is an open-source geographic information system (GIS) software library. It extends object-relational database management systems (DBMS) to handle spatiotemporal data types.
A GIS software program is a computer program to support the use of a geographic information system, providing the ability to create, store, manage, query, analyze, and visualize geographic data, that is, data representing phenomena for which location is important. The GIS software industry encompasses a broad range of commercial and open-source products that provide some or all of these capabilities within various information technology architectures.
ArcSDE is a server-software sub-system that aims to enable the usage of Relational Database Management Systems for spatial data. The spatial data may then be used as part of a geodatabase.
The shapefile format is a geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products. The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.
ArcGIS is a family of client, server and online geographic information system (GIS) software developed and maintained by Esri.
A spatial database is a general-purpose database that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data.
CityGML is an open standardised data model and exchange format to store digital 3D models of cities and landscapes. It defines ways to describe most of the common 3D features and objects found in cities and the relationships between them. It also defines different standard levels of detail (LoDs) for the 3D objects, which allows the representation of objects for different applications and purposes, such as simulations, urban data mining, facility management, and thematic inquiries.
A georelational data model is a geographic data model that represents geographic features as an interrelated set of spatial and attribute data. The georelational model was the dominant form of vector file format during the 1980s and 1990s, including the Esri coverage and Shapefile.
A geographic data model, geospatial data model, or simply data model in the context of geographic information systems, is a mathematical and digital structure for representing phenomena over the Earth. Generally, such data models represent various aspects of these phenomena by means of geographic data, including spatial locations, attributes, change over time, and identity. For example, the vector data model represents geography as collections of points, lines, and polygons, and the raster data model represents geography as cell matrices that store numeric values. Data models are implemented throughout the GIS ecosystem, including the software tools for data management and spatial analysis, data stored in a variety of GIS file formats, specifications and standards, and specific designs for GIS installations.
The Spatial Data File (SDF) is a single-user geodatabase file format developed by Autodesk. The file format is the native spatial data storage format for Autodesk GIS programs MapGuide and AutoCAD Map 3D. As of 2014 SDF format version SDF3 uses a single file. Prior versions of the format required a spatial index file (SIF), with an optional key index file (KIF) to speed access to the file.
The following tables compare general and technical information for a number of GIS vector file formats. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.
Geospatial PDF is a set of geospatial extensions to the Portable Document Format (PDF) 1.7 specification to include information that relates a region in the document page to a region in physical space, a process called georeferencing. A geospatial PDF can contain geometry such as points, lines, and polygons. These, for example, could represent building locations, road networks and city boundaries, respectively. The georeferencing metadata for geospatial PDF is most commonly encoded in one of two ways: following the OGC best practice, or using Adobe's proposed geospatial extensions to ISO 32000. The specifications also allow geometry to have attributes, such as a name or identifying type.
SpatiaLite is a spatial extension to SQLite, providing vector geodatabase functionality. It is similar to PostGIS, Oracle Spatial, and SQL Server with spatial extensions, although SQLite and SpatiaLite are not based on a client-server architecture: they adopt a simpler personal architecture in which the whole SQL engine is directly embedded within the application itself. A complete database is simply an ordinary file that can be freely copied and transferred from one computer or operating system to another without any special precaution.
Geospatial topology is the study and application of qualitative spatial relationships between geographic features, or between representations of such features in geographic information, such as in geographic information systems (GIS). For example, the fact that two regions overlap or that one contains the other are examples of topological relationships. It is thus the application of the mathematics of topology to GIS, and is distinct from, but complementary to the many aspects of geographic information that are based on quantitative spatial measurements through coordinate geometry. Topology appears in many aspects of geographic information science and GIS practice, including the discovery of inherent relationships through spatial query, vector overlay and map algebra; the enforcement of expected relationships as validation rules stored in geospatial data; and the use of stored topological relationships in applications such as network analysis. Spatial topology is the generalization of geospatial topology for non-geographic domains, e.g., CAD software.
GeoPackage (GPKG) is an open, non-proprietary, platform-independent and standards-based data format for geographic information systems built as a set of conventions over a SQLite database. Defined by the Open Geospatial Consortium (OGC) with the backing of the US military and published in 2014, GeoPackage has seen widespread support from various government, commercial, and open source organizations.