Australian Geoscience Data Cube

Last updated

The Australian Geoscience Data Cube (AGDC) is an approach to storing, processing and analyzing large collections of Earth observation data. The technology is designed to meet challenges of national interest by being agile and flexible with vast amounts of layered grid data.

Contents

The AGDC reduces processing time of traditional image analysis by calibrating, pre-computing known extents, pixel alignment and storing metadata in a cell lattice structure. The temporal-pixel aligned data can often be analysed faster across space and time dimensions than previous scene based techniques. This allows the AGDC to be flexible in tackling future challenges and improve analysis times on every-increasing data repositories of earth observation.

The AGDC has also been used internationally [1] to allow countries to maintain ecologically sustainable programs and reduce the difficulty curve of utilizing Remote Sensing data. [2]

Background

The AGDC was originally conceived by Geoscience Australia but is now maintained in a partnership between Geoscience Australia, Commonwealth Scientific and Industrial Research Organisation (CSIRO) and National Computational Infrastructure National Facility (Australia) (NCI). This is made possible by the funding from the partnership and a number of organisations such as National Collaborative Research Infrastructure Strategy (NCRIS). [3]

Analysis ready data, ingestion and indexing

The data processed in the cube is made analysis ready [4] before being ingested and indexed into the AGDC. Analysis ready data is pre-processed data that has applied corrections for instrument calibration (gains and offsets), geolocation (spatial alignment) and radiometry (solar illumination, incidence angle, topography, atmospheric interference). The ingestion process manages the translation of datasets into the storage units while maintaining a database index. The data within the storage and index can be accessed via API calls often compiled within code such as Python (programming language).

Example:

s2a_l1c = dc.load(product='s2a_level1c_granule',x=(147.36, 147.41), y=(-35.1, -35.15), measurements=['04','03','02'], output_crs='EPSG:4326', resolution=(-0.00025,0.00025))

Datasets currently stored

Datasets that have been piloted

Open source

The AGDC code base is situated in GitHub as an open repository. [5] The core code base moved to the Open Data Cube in early 2017 as part of an international collaboration. Whilst the code base is the Open Data Cube, individual cubes exist as their own right such as the AGDC on the National Computational Infrastructure National Facility (Australia) (NCI) using the High-Performance Computing Cluster HPCC. The core code can be installed on personal computers or public computers (using git) and has many unit tests.

Documentation for the code base exists on Read the Docs. [6]

Challenges of the AGDC

The AGDC is designed to meet nationally significant challenges such as the following.

International awards

The AGDC won the 2016 Content Platform of the Year award from Geospatial World Forum. [7] [8]

Related Research Articles

<span class="mw-page-title-main">Wolfram Mathematica</span> Computational software program

Wolfram Mathematica is a software system with built-in libraries for several areas of technical computing that allow machine learning, statistics, symbolic computation, data manipulation, network analysis, time series analysis, NLP, optimization, plotting functions and various types of data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other programming languages. It was conceived by Stephen Wolfram, and is developed by Wolfram Research of Champaign, Illinois. The Wolfram Language is the programming language used in Mathematica. Mathematica 1.0 was released on June 23, 1988 in Champaign, Illinois and Santa Clara, California.

<span class="mw-page-title-main">BioRuby</span>

BioRuby is a collection of open-source Ruby code, comprising classes for computational molecular biology and bioinformatics. It contains classes for DNA and protein sequence analysis, sequence alignment, biological database parsing, structural biology and other bioinformatics tasks. BioRuby is released under the GNU GPL version 2 or Ruby licence and is one of a number of Bio* projects, designed to reduce code duplication.

<span class="mw-page-title-main">NASA WorldWind</span> Open-source virtual globe

NASA WorldWind is an open-source virtual globe. According to the website, "WorldWind is an open source virtual globe API. WorldWind allows developers to quickly and easily create interactive visualizations of 3D globe, map and geographical information. Organizations around the world use WorldWind to monitor weather patterns, visualize cities and terrain, track vehicle movement, analyze geospatial data and educate humanity about the Earth." It was first developed by NASA in 2003 for use on personal computers and then further developed in concert with the open source community since 2004. As of 2017, a web-based version of WorldWind is available online. An Android version is also available.

In computer programming contexts, a data cube is a multi-dimensional ("n-D") array of values. Typically, the term data cube is applied in contexts where these arrays are massively larger than the hosting computer's main memory; examples include multi-terabyte/petabyte data warehouses and time series of image data.

<span class="mw-page-title-main">Orange (software)</span>

Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for explorative qualitative data analysis and interactive data visualization.

<span class="mw-page-title-main">Geoscience Australia</span> Agency of the Australian Government

Geoscience Australia is an agency of the Australian Government. It carries out geoscientific research. The agency is the government's technical adviser on all aspects of geoscience, and custodian of the geographic and geological data and knowledge of the nation.

The cancer Biomedical Informatics Grid (caBIG) was a US government program to develop an open-source, open access information network called caGrid for secure data exchange on cancer research. The initiative was developed by the National Cancer Institute and was maintained by the Center for Biomedical Informatics and Information Technology (CBIIT) and program managed by Booz Allen Hamilton. In 2011 a report on caBIG raised significant questions about effectiveness and oversight, and its budget and scope were significantly trimmed. In May 2012, the National Cancer Informatics Program (NCIP) was created as caBIG's successor program.

<span class="mw-page-title-main">QGIS</span> Open-source desktop GIS software

QGIS, also known as Quantum GIS, is a geographic information system (GIS) software that is free and open-source. QGIS supports Windows, macOS, and Linux. It supports viewing, editing, printing, and analysis of geospatial data.

Azure DevOps Server is a Microsoft product that provides version control, reporting, requirements management, project management, automated builds, testing and release management capabilities. It covers the entire application lifecycle and enables DevOps capabilities. Azure DevOps can be used as a back-end to numerous integrated development environments (IDEs) but is tailored for Microsoft Visual Studio and Eclipse on all platforms.

<span class="mw-page-title-main">National Computational Infrastructure</span> HPC facility in Canberra, Australia

The National Computational Infrastructure is a high-performance computing and data services facility, located at the Australian National University (ANU) in Canberra, Australian Capital Territory. The NCI is supported by the Australian Government's National Collaborative Research Infrastructure Strategy (NCRIS), with operational funding provided through a formal collaboration incorporating CSIRO, the Bureau of Meteorology, the Australian National University, Geoscience Australia, the Australian Research Council, and a number of research intensive universities and medical research institutes.

<span class="mw-page-title-main">ANUGA Hydro</span>

ANUGA Hydro is a free and open source software tool for hydrodynamic modelling, suitable for predicting the consequences of hydrological disasters such as riverine flooding, storm surges and tsunamis. For example, ANUGA can be used to create predicted inundation maps based on hypothetical tsunami or flood scenarios. The ANUGA name without qualification is used informally to mean the ANUGA Hydro tool.

LabKey Server is a software suite available for scientists to integrate, analyze, and share biomedical research data. The platform provides a secure data repository that allows web-based querying, reporting, and collaborating across a range of data sources. Specific scientific applications and workflows can be added on top of the basic platform and leverage a data processing pipeline.

rasdaman Database management system

rasdaman is an Array DBMS, that is: a Database Management System which adds capabilities for storage and retrieval of massive multi-dimensional arrays, such as sensor, image, simulation, and statistics data. A frequently used synonym to arrays is raster data, such as in 2-D raster graphics; this actually has motivated the name rasdaman. However, rasdaman has no limitation in the number of dimensions - it can serve, for example, 1-D measurement data, 2-D satellite imagery, 3-D x/y/t image time series and x/y/z exploration data, 4-D ocean and climate data, and even beyond spatio-temporal dimensions.

<span class="mw-page-title-main">Carto (company)</span> Cloud computing platform

CARTO is a software as a service (SaaS) cloud computing platform that provides GIS, web mapping, data visualization, spatial analytics, and spatial data science features. The company is positioned as a Location Intelligence platform due to its tools for geospatial data analysis and visualization that do not require advanced GIS or development experience. As a cloud-native platform, CARTO overcomes any previous limits on data scale for spatial workloads.

<span class="mw-page-title-main">Enterprise Architect (software)</span> Visual modeling and design tool

Sparx Systems Enterprise Architect is a visual modeling and design tool based on the OMG UML. The platform supports: the design and construction of software systems; modeling business processes; and modeling industry based domains. It is used by businesses and organizations to not only model the architecture of their systems, but to process the implementation of these models across the full application development life-cycle.

<span class="mw-page-title-main">Discrete global grid</span> Partition of Earths surface into subdivided cells

A discrete global grid (DGG) is a mosaic that covers the entire Earth's surface. Mathematically it is a space partitioning: it consists of a set of non-empty regions that form a partition of the Earth's surface. In a usual grid-modeling strategy, to simplify position calculations, each region is represented by a point, abstracting the grid as a set of region-points. Each region or region-point in the grid is called a cell.

The High-performance Integrated Virtual Environment (HIVE) is a distributed computing environment used for healthcare-IT and biological research, including analysis of Next Generation Sequencing (NGS) data, preclinical, clinical and post market data, adverse events, metagenomic data, etc. Currently it is supported and continuously developed by US Food and Drug Administration, George Washington University, and by DNA-HIVE, WHISE-Global and Embleema. HIVE currently operates fully functionally within the US FDA supporting wide variety (+60) of regulatory research and regulatory review projects as well as for supporting MDEpiNet medical device postmarket registries. Academic deployments of HIVE are used for research activities and publications in NGS analytics, cancer research, microbiome research and in educational programs for students at GWU. Commercial enterprises use HIVE for oncology, microbiology, vaccine manufacturing, gene editing, healthcare-IT, harmonization of real-world data, in preclinical research and clinical studies.

Perforce Software, Inc. is an American developer of software used for developing and running applications, including version control software, web-based repository management, developer collaboration, application lifecycle management, web application servers, debugging tools and agile planning software.

<span class="mw-page-title-main">Notebook interface</span> Programming tool blending code and documents

A notebook interface or computational notebook is a virtual notebook environment used for literate programming, a method of writing computer programs. Some notebooks are WYSIWYG environments including executable calculations embedded in formatted documents; others separate calculations and text into separate sections. Notebooks share some goals and features with spreadsheets and word processors but go beyond their limited data models.

GeoTrellis is an open source, geographic data processing library designed to work with large geospatial raster data sets. It is written in Scala and has an open-source Apache 2.0 license.

References

  1. "CEOS". Archived from the original on 2019-11-14. Retrieved 2017-03-14.
  2. "CEOS Data Cube Platform version 2 (CEOS2)". software.nasa.gov. Archived from the original on 2017-03-15.
  3. "National Collaborative Research Infrastructure Strategy (NCRIS)". 5 October 2021.
  4. "Open Data Cube | Open Source". Open Data Cube.
  5. "Opendatacube/Datacube-core". GitHub . 20 January 2022.
  6. "Open Data Cube Manual — Open Data Cube 1.8 documentation".
  7. "Australian Geoscience Data Cube innovation recognised on world stage". 23 January 2017.
  8. "Geospatial World Forum 2024 : 13-16 May, Rotterdam, The Netherlands | A Geospatial Conference". geospatialworldforum.org.