Full name | Concise spatial query and representation system |
---|---|
Organisation | CSIRO |
Introduced | 13 May 2002 |
No. issued | resolution dependent |
No. of digits | resolution dependent |
Website | csquares www |
C-squares (acronym for the Concise Spatial QUery And REpresentation System) is a system of spatially unique, location-based identifiers (geocodes) for areas on the surface of the Earth. Areas are represented as cells from a latitude- and longitude-based Discrete Global Grid at a hierarchical set of resolution steps, obtained by progressively subdividing 10×10 degree World Meteorological Organization squares; the term "c-square" can also be used to designate any component cell of the grid. Individual cell identifiers incorporate literal values of latitude and longitude in an interleaved notation (producing grid resolutions of 10, 1, 0.1 degrees, etc.), together with additional digits that support intermediate grid resolutions of 5, 0.5, 0.05 degrees, etc.
The system was initially designed to represent data "footprints" or spatial extents in a more flexible manner than a standard minimum bounding rectangle, and to support "lightweight", text-based spatial querying; it can also provide a set of identifiers for grid cells used for assembly, storage and analysis of spatially organised data, in a unified notation that transcends national or jurisdictional boundaries. Dataset extents expressed in c-squares notation can be visualised using a web-based utility, the c-squares mapper, an online instance of which is currently provided by CSIRO Oceans and Atmosphere in Australia. C-squares codes and associated published software are free to use and the software is released under version 2 of the GNU General Public License (GPL), a licence of the Free Software Foundation.
The c-squares method was developed by Tony Rees at CSIRO Oceans and Atmosphere in Australia (then "CSIRO Marine Research") in 2001–2, initially as a method for spatial indexing, rapid query, and compact storage and visualization of dataset spatial "footprints" in an agency-specific metadata directory (data catalogue); [1] it was first publicly announced at the 2002 "EOGEO" Technical Workshop held at Ispra, Italy in May 2002. [2] A more complete description was published in the scientific literature in 2003, together with a web-accessible mapping utility entitled the "c-squares mapper" for visualisation of data extents expressed in the c-squares notation. [3] Since that time, a number of projects and international collaborations have employed c-squares to support spatial indexing and/or map production, including FishBase (to map stored data points for any species), the Ocean Biogeographic Information System (OBIS), [4] [5] AquaMaps, [6] data analysis to support the designation of marine biogeographic realms, [7] multi-national fisheries data collation by the Scientific, Technical and Economic Committee for Fisheries (STECF) of the European Commission, [8] and data reporting by ICES. [9] [10] For its application in displaying and modelling global biodiversity data, c-squares was one of four components cited in the award of the Ebbe Nielsen Prize to Rees by the Global Biodiversity Information Facility (GBIF) in 2014. [11]
The concept of representing dataset "footprints" as cells of spatial data of this nature and alignment was stated to have been inspired by two sources: the data addressing method in the U.S. National Oceanographic Data Center (NODC) "World Ocean Database" product, [12] [1] which uses 10 degree World Meteorological Organization squares (the starting point for c-squares hierarchical subdivision) for organising its data content, and the set of 1:100,000 topographic maps issued by the national mapping agency for Australia; each map covers a 0.5 degree square and, with its associated mapsheet labels, can notionally be used as a unit of spatial identification. [1] The method has been discussed further in texts on georeferencing, including those by Hill, 2006 [13] and Guo et al., 2020. [14] [lower-alpha 1]
The system name "c-squares" was chosen because it can be represented as an acronym (for "concise spatial query and representation system") and also because it signals that this method belongs to a notional group of similarly named, latitude-longitude gridded subdivisions of the Globe that includes World Meteorological Organization squares and Marsden squares, and contrasts with other tessellations of the Globe that use differently shaped basic units such as rectangles, triangles, diamonds, and hexagons (for examples, see Sahr et al., 2003 [16]). It is also intended that any individual component cell of the grid can be referred to as a "c-square" (no initial capitalization required).
Spatial data are inherently (at least) 2-dimensional; without additional indexing, a numeric range query in 2 dimensions (e.g. x and y, or latitude and longitude) is required to retrieve data items within a particular area. Such queries are computationally expensive, so it can be beneficial to pre-process (index) the data in a manner that reduces the inherent dimensionality from two to one, for example as labelled cells of a grid. The grid labels can then be indexed by standard, one-dimensional methods for rapid search and retrieval, [17] and/or searched by simple alphanumeric text searches. C-squares is an example of such a grid where the cell identifiers are designed to be human- as well as machine-readable, and to be concordant with recognizable and commonly used intervals of latitude and longitude.
A grid-based approach to spatial indexing can also be beneficial for the representation of data "footprints" in support of spatial search, [13] for data binning to reduce complex and potentially voluminous data into "blocks" which can then be more easily compared and summarised, and for hierarchical schemes in which finer resolutions of the grid are nested into coarser ones with a shared notation (common identifiers for the larger portions of the relevant grid cells). A jurisdiction-independent (global) grid such as c-squares can also be used to integrate data across national boundaries, in contrast to (for example) the national grids of countries such as the United Kingdom and Ireland, which differ in their approach and may be inconsistent where they overlap, or leave gaps where they fail to meet (for example in marine regions around the areas concerned).
A potential disadvantage of "equal angle" grids (the class that includes c-squares), which are based on standardised units of latitude and longitude, is that the length of the "sides" and the shape (and area) of the grid cells is not constant on the ground (the height remains approximately constant but the width varies with latitude), and some particular effects are noticeable at the poles, where the cells become 3- rather than 4-sided in practice (refer illustration). These disadvantages can be offset by the advantages that data transformation in and out of grid notation can be accomplished by relatively straightforward steps, the results are congruent with conventional maps that show intervals of latitude and longitude, and the concepts of (for example) "1-degree squares" and "0.5 degree squares" may have familiarity and meaning to human users, in a way that non-square, purely mathematically derived shapes and sizes (based upon some form of spherical trigonometry) may not.
10-degree c-squares are specified as being identical to the equivalent World Meteorological Organization (WMO) square codes (refer illustration at right). These squares are aligned with 10-degree subdivisions of the global latitude–longitude grid, which for c-squares use is specified as employing the WGS84 datum. WMO (10 degree) squares are encoded with four digits, in the series 1xxx, 3xxx, 5xxx and 7xxx. [12] The leading digit indicates the "global quadrant" with 1 for north-east (latitude and longitude are both positive), 3 for south-east (latitude is negative and longitude positive), 5 for south-west (latitude and longitude are both negative) and 7 for north-west (latitude is positive and longitude negative). The next digit, 0 through 8, corresponds to the tens of latitude degrees either north or south, while the remaining 2 digits, 00 through 17, correspond to the tens of longitude degrees either east or west (by specification, 0 is treated as positive). Thus the 10 degree cell with its lower left corner at 0,0 (latitude,longitude) is encoded 1000, and acts as a bin to contain all spatial data between 0 and 10 degrees north (actually, 0 and 9.999...) and 0 and 9.999... degrees east; the 10 degree cell with its lower left corner at 80 N, 170 E is encoded 1817, and acts as a bin to contain all spatial data between 80 and 90 degrees north and 170 and 179.999... degrees east.
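The digit rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the published c-squares software: the function name `wmo_square` is hypothetical, and clamping latitude 90 and longitude 180 into the outermost cells is an assumption made here so that every input falls in a valid square.

```python
def wmo_square(lat: float, lon: float) -> str:
    """Return the 4-digit WMO 10-degree square code for a position.

    Hypothetical helper for illustration; zero is treated as positive,
    as stated in the text.
    """
    # Global quadrant: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5
    # Tens of degrees; the min() calls clamp lat = 90 and lon = 180
    # into the last cell (an assumption of this sketch).
    lat_tens = min(int(abs(lat) // 10), 8)
    lon_tens = min(int(abs(lon) // 10), 17)
    return f"{quadrant}{lat_tens}{lon_tens:02d}"


print(wmo_square(0, 0))       # cell with lower left corner at 0, 0
print(wmo_square(85, 175))    # cell with lower left corner at 80 N, 170 E
```

The two calls correspond to the two worked examples in the paragraph above (codes 1000 and 1817).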
C-squares extends the initial WMO 10×10 square notation via a recursive series of "cycles", each 3 digits long (the final one may be 1 digit), separated by the colon character, the number of characters (and cycles) indicating the resolution encoded, as per these examples:
(etc.)
Cell size is typically selected to suit the nature (granularity and volume) of the data to be encoded, the overall spatial extent of the area in question (e.g. global to local), the desired spatial resolution of the resulting grid (smallest features/areas that can be differentiated from each other), and the computing resources available (numbers of cells to cover the same area increase by either ×4 or ×25 with each decrease in square size, either requiring an equivalent increase in computing resources or possibly slower addressing times). For example, relatively generalised, global compilations may be best suited to aggregate (index) data by 10- or 5- degree cells, while more local gridded areas may favour 1-, 0.5- or 0.1- degree cells, as appropriate.
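The ×4 and ×25 growth factors mentioned above follow directly from the cell size. As a rough illustration (the function name `global_cell_count` is an assumption, and the count is simply the number of cells needed to tile 360 × 180 degrees), the total number of cells at each resolution step can be computed as:

```python
from fractions import Fraction


def global_cell_count(cell_size_deg: str) -> int:
    """Number of equal-angle cells covering the globe at a given cell size.

    The size is passed as a string so that values such as "0.1" are
    handled exactly rather than as binary floating point.
    """
    size = Fraction(cell_size_deg)
    return int((360 / size) * (180 / size))


for step in ["10", "5", "1", "0.5", "0.1"]:
    print(step, global_cell_count(step))
```

Moving from 10 to 5 degrees multiplies the count by 4, and from 5 to 1 degree by 25, with the same alternation repeating at finer steps.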
The nominal sizes given above reflect the fact that at the equator, 1 degree of both latitude and longitude corresponds to around 110 km, with the actual value for longitude declining between there and the poles, where it becomes zero (latitude actual: 110.567 km at the equator, 111.699 km at the poles; longitude actual: 111.320 km at the equator, 78.847 km at latitude ±45 degrees, 0 km at the poles); at a sample northern hemisphere latitude, e.g. that of London (51.5 degrees north), a 1×1 degree square measures approximately 111×69 km. [18]
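The variation in cell width with latitude can be estimated with the standard formula for the length of one degree of longitude on an ellipsoid. The sketch below assumes the WGS84 ellipsoid parameters; the function name `lon_degree_km` is hypothetical and the code is not part of the c-squares specification.

```python
import math

# WGS84 ellipsoid parameters (assumed here for illustration).
WGS84_A = 6378137.0          # semi-major axis in metres
WGS84_E2 = 0.00669437999014  # first eccentricity squared


def lon_degree_km(lat_deg: float) -> float:
    """Approximate east-west length in km of 1 degree of longitude."""
    phi = math.radians(lat_deg)
    # Radius of curvature in the prime vertical at this latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(phi) ** 2)
    return math.radians(1) * n * math.cos(phi) / 1000


for lat in (0, 45, 51.5, 90):
    print(lat, round(lon_degree_km(lat), 3))
```

Under these assumptions the function gives about 111.32 km at the equator and about 78.85 km at 45 degrees, in line with the longitude figures quoted above, and a value close to 69 km at the latitude of London.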
To produce the 1 or 3 digits in any cycle following the initial 4-digit, 10-degree square identifier, first an "intermediate quadrant", 1 through 4, is designated (refer diagram at right), where 1 indicates low absolute values of both latitude and longitude (regardless of sign), 2 indicates low latitude and high longitude, 3 indicates high latitude and low longitude, and 4 indicates high values for both; "low" and "high" being taken from the relevant portion of the data to be gridded (for example within the 10 degree cell extending from 10 to 20 degrees, 10 is treated as low and 19 as high). This leading digit in a cycle is then followed simply by the next applicable digit for first latitude and then longitude: thus an input value of latitude +11.0, longitude +12.0 degrees will be encoded as the 5 degree c-square code 1101:1 and the 1 degree code 1101:112. Inspection of the code 1101:112 will show that the input latitude value (11) can be recovered directly from the second digit of the 10-degree identifier and the second digit of the following cycle, while the longitude (012) is held in the third and fourth digits of the identifier and the third digit of the cycle; the sign of both is positive, as indicated by the first digit of the leading 4 (1 in this case, indicating the north-east global quadrant).
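The encoding steps described above can be sketched as follows. This is a minimal illustration written from the description in this article, not the official conversion code: the names `encode_csquare` and `RESOLUTIONS` are hypothetical, the intermediate quadrant is computed as 1, plus 2 when the latitude digit is 5 or more, plus 1 when the longitude digit is 5 or more, and clamping latitude 90 and longitude 180 into the last cell is an assumption of the sketch.

```python
from decimal import Decimal

# Resolution label -> (number of full 3-digit cycles, trailing 1-digit cycle?)
RESOLUTIONS = {
    "10": (0, False), "5": (0, True),
    "1": (1, False), "0.5": (1, True),
    "0.1": (2, False), "0.05": (2, True),
    "0.01": (3, False),
}


def encode_csquare(lat: float, lon: float, resolution: str = "1") -> str:
    """Encode a position as a c-square code (illustrative sketch)."""
    full_cycles, half_step = RESOLUTIONS[resolution]
    # Global quadrant: 1 = NE, 3 = SE, 5 = SW, 7 = NW (zero counts as positive).
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5
    # Work in integer hundredths of a degree to avoid floating-point drift;
    # min() clamps lat = 90 and lon = 180 into the outermost cell (assumption).
    lat_i = min(int(abs(Decimal(str(lat))) * 100), 90 * 100 - 1)
    lon_i = min(int(abs(Decimal(str(lon))) * 100), 180 * 100 - 1)
    code = f"{quadrant}{lat_i // 1000}{lon_i // 1000:02d}"
    for k in range(full_cycles + (1 if half_step else 0)):
        divisor = 10 ** (2 - k)          # units, tenths, hundredths
        lat_digit = (lat_i // divisor) % 10
        lon_digit = (lon_i // divisor) % 10
        # Intermediate quadrant: 1 low/low, 2 low lat/high lon,
        # 3 high lat/low lon, 4 high/high.
        iq = 1 + 2 * (lat_digit >= 5) + (lon_digit >= 5)
        if k < full_cycles:
            code += f":{iq}{lat_digit}{lon_digit}"
        else:
            code += f":{iq}"             # final single-digit cycle
    return code


print(encode_csquare(11.0, 12.0, "5"))   # 5 degree code from the text
print(encode_csquare(11.0, 12.0, "1"))   # 1 degree code from the text
```

For the worked example in the text (latitude +11.0, longitude +12.0), the sketch reproduces the codes 1101:1 and 1101:112.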
From 2002 onwards (still current at 2020), an online "latlong to c-squares conversion page" has been available at the website of CSIRO Marine Research (now CSIRO Oceans and Atmosphere), which will convert input values of latitude and longitude to the equivalent c-square code at user-selectable resolutions from 10 to 0.1 degree cell size. Alternatively, the conversion is comparatively simple to program from first principles (or to construct as, for example, a Microsoft Excel worksheet) according to the c-squares specification. [19]
A set of c-squares (contiguous or non contiguous) can be represented as a concatenated list of individual square codes, separated by the "pipe" (|) character, thus: 7500:110:3|7500:110:1|1500:110:3|1500:110:1 (etc.). This set of squares can then serve as an indication of a dataset extent, similar in function (but simpler to specify) to a MultiPolygon in the Well-known text representation of geometry, the functional difference being that defined points forming the boundary of a polygon can be continuously variable, while those for the c-square boundaries are constrained to fixed intervals from the grid square resolution in use. If these strings are stored, for example as "long text" within a field of a conventional text storage system (e.g. spreadsheet, database, etc.) they can be used for the operation of spatial searches (see following section/s).
C-squares strings can also be used directly as input to an instance of the "c-squares mapper", a web-based utility in operation since 2002 at CSIRO in Australia (under the domain obis.org.au) and also at other global locations. To visualize the position of any set of squares on a map, the current syntax to address an installation of the "c-squares mapper" is (e.g.):
The above call to the c-squares mapper is a simple one, with only a single parameter (a single c-squares string), which produces a simple "default map"; the mapper is in fact highly customizable, capable of accepting up to seven c-squares strings concurrently, plotting them in user-specified colours, with a choice of empty or filled squares, user-selectable base map, etc.; a full list of available input parameters is provided on the mapper "technical information" page. [20] A more sophisticated map produced using a larger number of available parameters is the colour-coded example at right (AquaMap, i.e. modelled distribution, for the ocean sunfish). Commencing in 2006, an upgrade of the mapper incorporating the independently written Xplanet software also allows the plots of supplied c-squares to be displayed on a user-rotatable and zoomable globe, which can offer a more realistic view for either Pacific Ocean- or polar-centred data than is possible with a flat map (e.g. equirectangular) projection. [21]
The c-squares mapper is one of several options currently (2006–present) available for real time mapping of fish point data records in FishBase, for example on the page for the species Salmo trutta (sea trout); similar options are also available for other (non-fish) marine species via SeaLifeBase. Since 2006, the mapper has also produced in excess of 100,000 species maps for the AquaMaps project (33,500 species × 4 "standard maps" per species as at 2021, additional user-generated maps available on demand).
In a system that uses c-squares codes as units of spatial indexing, a text-based search on any of these square identifiers will retrieve data associated with the relevant square. If a wildcard search is supported (for example in the case that the wildcard character is a percent sign), a search on "7500%" will retrieve all data items in that ten degree square, a search on "7500:1%" will retrieve all data items in that five degree square, etc.
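A text-based wildcard search of this kind can be illustrated with an in-memory SQLite table and the SQL `LIKE` operator, whose wildcard is the percent sign. This is a sketch under stated assumptions: the table layout (one row per dataset and square), the dataset names and the helper names `build_index` and `datasets_in` are all hypothetical, and the pipe-separated input strings follow the list notation described earlier in the article.

```python
import sqlite3


def build_index(footprints: dict) -> sqlite3.Connection:
    """Store one row per (dataset, c-square) from pipe-separated strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE footprint (dataset TEXT, csquare TEXT)")
    rows = [(name, code)
            for name, codes in footprints.items()
            for code in codes.split("|")]
    conn.executemany("INSERT INTO footprint VALUES (?, ?)", rows)
    # A standard one-dimensional index on the text codes.
    conn.execute("CREATE INDEX idx_csquare ON footprint (csquare)")
    return conn


def datasets_in(conn: sqlite3.Connection, prefix: str) -> list:
    """Datasets with at least one square whose code starts with `prefix`."""
    cur = conn.execute(
        "SELECT DISTINCT dataset FROM footprint "
        "WHERE csquare LIKE ? ORDER BY dataset",
        (prefix + "%",),
    )
    return [row[0] for row in cur]


# Hypothetical example footprints.
footprints = {
    "survey_a": "7500:110:3|7500:110:1|1500:110:3|1500:110:1",
    "survey_b": "1101:112",
}
conn = build_index(footprints)
print(datasets_in(conn, "7500"))     # everything in ten degree square 7500
print(datasets_in(conn, "7500:1"))   # everything in five degree square 7500:1
```

Storing one square per row, rather than the whole pipe-separated string in a single field, is a design choice of this sketch that keeps the prefix match anchored to the start of each code.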
The asterisk character "*" has a special (reserved) meaning in c-squares notation, being a "compact" notation indicating that all finer cells within a higher level cell are included, to the level of resolution indicated by the number of asterisks. In the example above, "7500:*" would indicate that all 4 five-degree cells within parent ten-degree cell "7500" are filled, "7500:***" would indicate that all 100 one-degree cells within parent ten-degree cell "7500" are filled, etc. This approach enables the filling of contiguous blocks of cells with an economy of characters in many cases (a form of data compression), that is useful for efficient storage and transfer of c-squares codes as required.
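The compact notation can be expanded back into explicit cell codes. The sketch below handles only the two cases given in the text (a final cycle of "*" or "***"); the function name `expand_compact` is hypothetical, other wildcard patterns are deliberately left untouched, and the intermediate quadrant digit is derived from the latitude and longitude digits in the same way as in the encoding rules described earlier.

```python
def expand_compact(code: str) -> list:
    """Expand a trailing '*' or '***' cycle into explicit c-square codes."""
    parent, _, last = code.rpartition(":")
    if last == "*":
        # All four intermediate quadrants of the parent cell.
        return [f"{parent}:{q}" for q in range(1, 5)]
    if last == "***":
        # All 100 combinations of the next latitude and longitude digits,
        # each prefixed with its intermediate quadrant digit.
        return [
            f"{parent}:{1 + 2 * (la >= 5) + (lo >= 5)}{la}{lo}"
            for la in range(10)
            for lo in range(10)
        ]
    # No trailing wildcard cycle recognised by this sketch.
    return [code]


print(expand_compact("7500:*"))
print(len(expand_compact("7500:***")))
```

The first call lists the four five-degree cells of square 7500 and the second confirms that 100 one-degree cells are produced.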
C-squares has been employed at a range of resolutions for data reporting, assembly and analysis on scales ranging from global to local, also incorporating multi-national data compilations where a gridded data system is required that is not tied to the boundaries of any single jurisdiction. Examples include:
C-squares labelled cells were adopted as the underlying grid for analysis by the European Union-funded MINOUW project (MINimisation Of UnWanted catches in European Waters), via their web application (MINOUWApp), in support of spatial data (notably fishing effort and density patches of potential unwanted catches) supplied by project researchers across different European countries in a range of formats, in combination with layers of spatial information from external sources. [42]
According to its design principles, the principal target audience for c-squares is data custodians who wish to organise spatial data by latitude-longitude grid squares at any of the resolutions supported by the system, namely any decimal subdivision of either 10×10 or 5×5 degree squares, to support associated data query, retrieval, analysis, representation (mapping), and potential external data exchange and aggregation. Fine resolution c-squares may also be used as a general "location encoder", selected desirable attributes of which are discussed further by the developers of the Google Open Location Code method, [43] since the c-squares method satisfies the majority of the criteria set out in that discussion document. As evidenced by the references cited in this article, principal adopters of the method to date have been concerned with marine data in particular; this most likely stems from the fact that the oceans are trans-national in their governance, therefore otherwise established local or national grids are unsuitable for analysis of ocean or fisheries data on anything other than a local scale. Although initially deployed in marine-related systems (as per its description in the journal "Oceanography"), in essence the system is terrain-agnostic (as is the latitude-longitude grid upon which it is based) and is applicable equally to both marine and terrestrial data.
An additional aspect of c-squares noted by Larsen et al., 2009 and either implicit or explicit in other equivalent "data aggregation methods" is the use of such frameworks to "allow general level analyses without exposing the precise coordinates of potentially sensitive information". [44] For example, real time data on the exact location of fishing vessels is frequently considered "commercial in confidence" to avoid release to competitors of the best fishing localities according to the nature of the resource, which may be continually moving, while for biodiversity data, the exact location of individuals or (for example) nests of rare species may again not be desirable to release to the public. The use of grid cells or similar methods to accurately represent the general location of data points without revealing their more exact location, while still rendering the data available for statistical analysis, is a recognised useful approach in such situations, refer e.g. Chapman, 2020. [45]
At its maximum scale, 10 degree c-squares are congruent with both World Meteorological Organization squares (whose identifiers are re-used within the c-squares notation) and Marsden squares, which share the same boundaries but use a different notation. Both 1 degree and 0.5 degree c-squares are partially congruent with "standard resolution" ICES Statistical Rectangles, which utilize a grid cell area of 1×0.5 degrees over a restricted portion of the Globe (north Atlantic region): 2 vertically adjacent ICES rectangles are exactly equivalent to a single 1 degree c-square, while if needed, the content of a single ICES rectangle can be apportioned between 2 horizontally adjacent 0.5 degree c-squares for data interchange at that resolution (refer note).
A separate system, QDGC or Quarter Degree Grid Cells, has been developed for interchange of some biodiversity data in Africa, and later extended to cope with data across the Equator and Prime Meridian. [44] QDGC cells, at 0.25×0.25 degrees, lie between the 0.5×0.5 and 0.1×0.1 degree resolution steps of the c-squares system, and are thus not exactly compatible with it, although the "parent" squares of the QDGC grid from which they are derived, at 1×1 and 0.5×0.5 degrees, are congruent with equivalent c-squares grid cells, however using a different notation. In their proposal for an "extended" QDGC system, Larsen et al. additionally describe the potential subdivision of 0.25×0.25 degree QDGC cells by a recursive factor of 2, giving cell sizes of 0.125, 0.0625, 0.03125 degrees, etc., which progressively depart further from the "decimal degrees" concept incorporated into c-squares.
There is no licence required to use the c-squares method, which has been openly published in the scientific literature since 2003. Source code for the mapper, etc., available via the SourceForge website, is released under the GNU General Public License version 2.0 (GPLv2), which provides free use and redistribution, and subsequent modification for any purpose so long as that licence is retained with the product and any subsequent modifications, in other words, that all the released improved versions will also be free software. [46]