Rasdaman

rasdaman
Original author(s): Peter Baumann
Developer(s): rasdaman GmbH
Stable release: rasdaman v9.8.1 / July 26, 2019
Written in: C++ [1]
Operating system: most Unix-like operating systems
Type: Array DBMS
License: GPL v3 (server) / LGPL v3 (client) or proprietary [2]
Website: rasdaman.org, rasdaman.com

rasdaman ("raster data manager") is an Array DBMS, that is, a database management system which adds capabilities for storage and retrieval of massive multi-dimensional arrays, such as sensor, image, simulation, and statistics data. A frequently used synonym for arrays is raster data, as in 2-D raster graphics; this motivated the name rasdaman. However, rasdaman places no limit on the number of dimensions: it can serve, for example, 1-D measurement data, 2-D satellite imagery, 3-D x/y/t image time series and x/y/z exploration data, 4-D ocean and climate data, and data beyond the spatio-temporal dimensions.

History

In 1989, Peter Baumann started research on database support for images, then at the Fraunhofer Computer Graphics Institute. Following an in-depth investigation of raster data formalizations in imaging, in particular the AFATL Image Algebra, he established a database model for multi-dimensional arrays, including a data model and a declarative query language, [3] pioneering the field of Array Databases. Today, multi-dimensional arrays are also known as Data Cubes.

At TU Munich, in the EU-funded basic research project RasDaMan, a first prototype was built on top of the O2 object-oriented DBMS and tested in Earth and Life science applications. [4] Over further EU-funded projects, this system was completed and extended to support relational DBMSs. A dedicated research spin-off, rasdaman GmbH, [5] was established to give commercial support in addition to the research, which has subsequently been continued at Jacobs University. [6] Since then, both entities have collaborated on the further development and use of the rasdaman technology.

Concepts

Data model

Based on an array algebra [7] specifically developed for database purposes, rasdaman adds a new attribute type, array, to the relational model. As this array definition is parametrized, it constitutes a second-order construct or template; this fact is reflected by the second-order functionals in the algebra and query language.

For historical reasons, tables are called collections, as the initial design emphasized an embedding into the object-oriented database standard, ODMG. Anticipating a full integration with SQL, rasdaman collections represent a binary relation, with the first attribute being an object identifier and the second being the array. This allows foreign key references to be established between arrays and regular relational tuples.

Raster Query Language

The rasdaman query language, rasql, embeds itself into standard SQL and its set-oriented processing. On the new attribute type, multi-dimensional arrays, a set of extra operations is provided, all of which are based on a minimal set of algebraically defined core operators: an array constructor (which establishes a new array and fills it with values) and an array condenser (which, similarly to SQL aggregates, derives scalar summary information from an array). The query language is declarative (and, hence, optimizable) and safe in evaluation, that is, every query is guaranteed to return after a finite number of processing steps.
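The two core operators can be illustrated with a minimal Python sketch. This is not rasdaman code: the function names `marray` and `condense` and the dict-based array representation are assumptions chosen for illustration. The constructor fills an array by evaluating an expression at every coordinate of a finite domain, and the condenser folds all cells into one scalar; both loop over a finite domain, which is the intuition behind safe evaluation.

```python
from itertools import product


def marray(domain, cell_fn):
    """Array constructor: evaluate cell_fn at every coordinate of the domain.

    domain is a list of inclusive (lo, hi) bounds, one pair per dimension.
    The array is represented as a dict from coordinate tuples to cell values.
    """
    axes = [range(lo, hi + 1) for lo, hi in domain]
    return {pt: cell_fn(*pt) for pt in product(*axes)}


def condense(op, array, init):
    """Array condenser: fold all cell values into one scalar using op."""
    result = init
    for value in array.values():
        result = op(result, value)
    return result


# A 3 x 4 array whose cell value is x + 10 * y.
a = marray([(0, 2), (0, 3)], lambda x, y: x + 10 * y)

total = condense(lambda acc, v: acc + v, a, 0)   # analogous to a sum aggregate
maximum = condense(max, a, float("-inf"))        # analogous to a max aggregate
```

Other array operations, such as cell-wise arithmetic or aggregates, can be phrased in terms of these two building blocks.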

The rasql query guide [8] provides details; the following examples illustrate its use:

select c[ *:*, 100:200, *:*, 42 ] from ClimateSimulations as c
select img * (img.green > 130) from LandsatArchive as img

Note: this is a very naive phrasing of vegetation search; in practice one would use the NDVI formula, use null values for cloud masking, and apply several further techniques.

select img from MRI as img, Masks as m where some_cells( img > 250 and m )
select png( c[ *:*, *:*, 100, 42 ] ) from ClimateSimulations as c
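The subsetting and cell-wise operations in these queries can be mimicked in a few lines of Python. The sketch below is illustrative only and uses hypothetical helper names (`make_array`, `subset`, `mask_by_green`). It assumes that `*:*` keeps a whole axis, that a range such as `100:200` trims an axis while keeping the dimension, that a single position such as `42` slices and removes the dimension, and that multiplying by a comparison result zeroes the cells where the comparison is false.

```python
from itertools import product


def make_array(shape, cell_fn):
    """Build an array as a dict from coordinate tuples to cell values."""
    return {pt: cell_fn(*pt) for pt in product(*(range(n) for n in shape))}


def subset(array, spec):
    """Apply one subsetting entry per axis.

    None      keeps the whole axis (like *:*),
    (lo, hi)  trims to an inclusive range and keeps the axis,
    int       slices at one position and drops the axis.
    """
    result = {}
    for pt, value in array.items():
        kept = []
        inside = True
        for coord, entry in zip(pt, spec):
            if entry is None:
                kept.append(coord)
            elif isinstance(entry, tuple):
                lo, hi = entry
                if lo <= coord <= hi:
                    kept.append(coord)
                else:
                    inside = False
                    break
            elif coord != entry:
                inside = False
                break
        if inside:
            result[tuple(kept)] = value
    return result


def mask_by_green(img, threshold):
    """Cell-wise operation: multiply every band by the test green > threshold."""
    return {pt: tuple(band * (rgb[1] > threshold) for band in rgb)
            for pt, rgb in img.items()}


# A small 4-D cube; trimming axis 1 and slicing axis 3 leaves a 3-D result.
cube = make_array((2, 5, 2, 3), lambda a, b, c, d: a + b + c + d)
cutout = subset(cube, (None, (1, 3), None, 2))

# Two RGB pixels; only the first has a green value above the threshold.
img = {(0, 0): (10, 200, 30), (0, 1): (90, 100, 80)}
masked = mask_by_green(img, 130)
```

Here `cutout` has three dimensions because the sliced axis disappears, while the trimmed axis keeps its original coordinates 1 to 3.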

Architecture

Storage management

Figure: Sample array tiling in rasdaman

Raster objects are maintained in a standard relational database, based on the partitioning of a raster object into tiles. [9] Aside from a regular subdivision, any user- or system-generated partitioning is possible. As tiles form the unit of disk access, it is of critical importance that the tiling pattern is adjusted to the query access patterns; several tiling strategies assist in establishing a well-performing tiling. A geo index is employed to quickly determine the tiles affected by a query. Optionally, tiles are compressed using one of various choices, including lossless and lossy (wavelet) algorithms; independently of that, query results can be compressed for transfer to the client. Both tiling strategy and compression are database tuning parameters.

Tiles and tile index are stored as BLOBs in a relational database which also holds the data dictionary needed by rasdaman's dynamic type system. Adapters are available for several relational systems, among them open-source PostgreSQL. For arrays larger than disk space, hierarchical storage management (HSM) support has been developed.
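To make the role of tiling concrete, the following Python sketch computes which tiles a query box touches. It covers only the simplest case, a regular tiling anchored at the origin with non-negative coordinates, and replaces the index lookup by integer arithmetic; the function name `tiles_hit` is hypothetical, and arbitrary tilings would need a real spatial index instead.

```python
from itertools import product


def tiles_hit(query_box, tile_shape):
    """Return the index tuples of all regular tiles intersecting query_box.

    query_box:  inclusive (lo, hi) bounds per axis, non-negative coordinates.
    tile_shape: tile edge length per axis; tile i covers [i * t, (i + 1) * t - 1].
    """
    per_axis = [range(lo // t, hi // t + 1)
                for (lo, hi), t in zip(query_box, tile_shape)]
    return list(product(*per_axis))


# A 2-D query covering 100..200 on the first axis and 0..49 on the second,
# against 64 x 64 tiles.
hit = tiles_hit([(100, 200), (0, 49)], (64, 64))
```

Only the tiles returned need to be read from storage, which is why a tiling that matches the typical query shape reduces disk access.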

Query processing

Queries are parsed, optimised, and executed in the rasdaman server. The parser receives the query string and generates the operation tree. Further, it applies algebraic optimisation rules to the query tree where applicable; of the 150 algebraic rewriting rules, 110 are actually optimising while the other 40 serve to transform the query into canonical form. Parsing and optimization together take less than a millisecond on a laptop.
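As an illustration of what an algebraic rewriting rule on an operation tree can look like, the Python sketch below pushes a subsetting operation down through a cell-wise addition, so that each operand is cut before it is combined. The rule and the node classes (`Var`, `Add`, `Subset`) are assumptions made for this example; they are not taken from rasdaman's actual rule set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str          # a stored array


@dataclass(frozen=True)
class Add:
    left: object       # cell-wise addition of two array expressions
    right: object


@dataclass(frozen=True)
class Subset:
    source: object     # array expression to cut
    box: tuple         # inclusive (lo, hi) bounds per axis


def push_down_subset(expr):
    """Rewrite Subset(Add(a, b), box) into Add(Subset(a, box), Subset(b, box))."""
    if isinstance(expr, Add):
        return Add(push_down_subset(expr.left), push_down_subset(expr.right))
    if isinstance(expr, Subset):
        source = push_down_subset(expr.source)
        if isinstance(source, Add):
            return Add(push_down_subset(Subset(source.left, expr.box)),
                       push_down_subset(Subset(source.right, expr.box)))
        return Subset(source, expr.box)
    return expr


box = ((0, 9), (0, 9))
tree = Subset(Add(Add(Var("a"), Var("b")), Var("c")), box)
optimised = push_down_subset(tree)
```

The rewrite is valid because cell-wise addition and subsetting commute, and it pays off because the addition then touches only the cells inside the box.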

Execution follows a tile streaming paradigm: whenever possible, array tiles addressed by a query are fetched sequentially, and each tile is discarded after processing. This leads to an architecture scalable to data volumes exceeding server main memory by orders of magnitude.
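The tile streaming idea can be sketched in Python with a generator: tiles arrive one at a time, a running result is updated, and the tile is then dropped. The names `tile_stream` and `streaming_max_and_count` are hypothetical, and the generator merely stands in for fetching tiles from storage.

```python
def tile_stream(n_tiles, tile_size):
    """Yield tiles one at a time; stands in for reading tiles from storage."""
    for t in range(n_tiles):
        start = t * tile_size
        yield [start + i for i in range(tile_size)]


def streaming_max_and_count(tiles, threshold):
    """Compute the maximum and the number of cells above threshold per tile."""
    running_max = None
    count = 0
    for tile in tiles:
        tile_max = max(tile)
        running_max = tile_max if running_max is None else max(running_max, tile_max)
        count += sum(1 for v in tile if v > threshold)
        # The tile is not referenced after this point, so only one tile
        # needs to be held in memory at any time.
    return running_max, count


result = streaming_max_and_count(tile_stream(4, 5), threshold=15)
```

Memory use depends on the tile size rather than on the array size, which is what lets this style of execution handle arrays far larger than main memory.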

Query execution is parallelised in two ways. First, rasdaman offers inter-query parallelism: a dispatcher schedules requests into a pool of server processes on a per-transaction basis. Second, intra-query parallelism transparently distributes query subtrees across available cores, GPUs, or cloud nodes.
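Inter-query parallelism can be pictured as a dispatcher handing independent requests to a fixed pool of workers. The Python sketch below uses a thread pool as a stand-in for the pool of server processes; `run_query` and the request tuples are hypothetical placeholders, not part of any rasdaman interface.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(request):
    """Stand-in for one server worker evaluating one independent request."""
    name, cells = request
    return name, max(cells)


incoming = [("q1", [3, 9, 4]), ("q2", [7, 1]), ("q3", [5, 5, 8])]

# The executor plays the dispatcher role: each request goes to a free worker,
# and map() returns the results in the order the requests were submitted.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_query, incoming))
```

With two workers and three requests, the third request waits until a worker becomes free, while the requests themselves remain independent of one another.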

Client APIs

The primary interface to rasdaman is the query language. Embeddings into C++ and Java APIs allow invocation of queries, as well as client-side convenience functions for array handling. Arrays per se are delivered in the main memory format of the client language and processor architecture, ready for further processing. Data format codecs allow arrays to be retrieved in common formats, such as CSV, PNG, and NetCDF.

A Web design toolkit, raswct, is provided which makes the creation of Web query frontends easy, including graphical widgets for parametrized query handling, such as sliders for thresholds in queries.

Geo Web Services

A Java servlet, petascope, running as a rasdaman client, offers Web service interfaces specifically for geo data access, processing, and filtering. The following OGC standards are supported: WMS, WCS, WCPS, and WPS.

For WCS and WCPS, rasdaman is the reference implementation.
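A WCPS query is typically sent to such a service as an HTTP request. The Python sketch below only builds a request URL and does not contact any server. The endpoint address and the coverage name `ExampleCoverage` are hypothetical, and the key-value parameters (`service`, `version`, `request=ProcessCoverages`, `query`) follow the author's understanding of the WCS processing extension; they should be checked against the documentation of the service actually used.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wcps_request_url(endpoint, wcps_query):
    """Build a GET URL carrying a WCPS query (parameter names are assumptions)."""
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": wcps_query,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and coverage name, for illustration only.
query = 'for $c in (ExampleCoverage) return encode($c, "image/png")'
url = wcps_request_url("https://example.org/rasdaman/ows", query)

# Decoding the URL again recovers the original query text unchanged.
decoded = parse_qs(urlsplit(url).query)
```

Percent-encoding the query with `urlencode` matters here because WCPS text contains characters such as `$`, spaces, and quotes that are not safe in a URL.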

Status and license model

Today, rasdaman is a fully-fledged implementation offering select / insert / update / delete array query functionality. It is being used in both research and commercial installations.

In a collaboration between the original code owner, rasdaman GmbH, [5] and Jacobs University, a code split was performed in 2008-2009, resulting in rasdaman community, [10] an open-source branch, and rasdaman enterprise, the commercial branch. Since then, rasdaman community has been maintained by Jacobs University, whereas rasdaman enterprise remains proprietary to rasdaman GmbH. The difference between the two variants mainly consists of performance boosters (such as specific optimization techniques) intended to support particularly large databases, user numbers, and complex queries; details are available on the rasdaman community website. [11]

The rasdaman community license releases the server under the GPL and all client parts under the LGPL, thereby allowing the use of the system in any kind of license environment.

Impact and use

Being the first Array DBMS shipped (first prototype available in 1996), rasdaman has shaped this recent database research domain. Concepts of its data and query model (declarativeness, and sometimes the choice of operators) reappear in more recent approaches.

In 2008, the Open Geospatial Consortium released the Web Coverage Processing Service standard which defines a raster query language based on the concept of a coverage. Operator semantics [12] is influenced by the rasdaman array algebra.

EarthLook [13] is a showcase for OGC coverage standards in action, offering 1-D through 4-D use cases of raster data access and ad-hoc processing. EarthLook is built on rasdaman.

A sample large project in which rasdaman is being used for large-scale services in all Earth sciences is EarthServer: [14] six services, each with a volume of at least 100 terabytes, have been set up for integrated data/metadata retrieval and distributed query processing.

References

  1. "The rasdaman Open Source Project on Open Hub". Open Hub. Black Duck Software. Retrieved 2020-01-14.
  2. "Rasdaman License". rasdaman.org. Retrieved 2016-08-01.
  3. Baumann, P.: On the Management of Multidimensional Discrete Data. VLDB Journal 4(3)1994, Special Issue on Spatial Database Systems, pp. 401 - 444
  4. "Raster data management in databases". Community Research and Development Information Service (CORDIS).
  5. "rasdaman, the Big Data Analytics Server". Rasdaman.com. Retrieved 2022-09-11.
  6. "Rasdaman - the Agile Array Analytics Engine". www.rasdaman.com. Archived from the original on 24 September 2015. Retrieved 15 January 2022.
  7. Baumann, P.: A Database Array Algebra for Spatio-Temporal Data and Beyond. Proc. NGITS’99, LNCS 1649, Springer 1999, pp.76-93
  8. n.n.: Rasdaman Query Language Guide
  9. Furtado, P., Baumann, P.: Storage of Multidimensional Arrays based on Arbitrary Tiling. Proc. ICDE'99, March 23–26, 1999, Sydney, Australia, pp. 328-336
  10. "rasdaman". rasdaman. 2022-02-28. Retrieved 2022-09-11.
  11. rasdaman license model
  12. Baumann, P.: The OGC Web Coverage Processing Service (WCPS) Standard. Geoinformatica, 14(4)2010, pp. 447-479
  13. "Big Earth Datacube Standards: Coverages, WCS, WCPS on rasdaman". Standards.rasdaman.com. Retrieved 2022-09-11.
  14. "EarthServer". Earthserver.eu. Retrieved 2022-09-11.