METAFOR

The Common Metadata for Climate Modelling Digital Repositories (METAFOR) project is creating a Common Information Model (CIM) for climate data and the models that produce it. [1]

The CIM aims to describe climate data and the models that produce it in a standard way, and to address problems found in existing repositories: fragmented and incomplete metadata (data describing data), duplicated effort in collecting that information, and difficulties in identifying, accessing, or using climate data. A further aim of the METAFOR project is to ensure wide adoption of the CIM.

METAFOR is optimizing the way climate data infrastructures are used to store knowledge, thereby adding value to primary research data and information, and providing an essential asset for the numerous stakeholders actively engaged in climate change issues (policy, research, impacts, mitigation, private sector).

METAFOR has created tools for practical use of the CIM, e.g., the CMIP5 questionnaire for entering and creating CIM documents. External groups, e.g., the Earth System Grid, are also writing tools that work with CIM content.

METAFOR and the CMIP5 metadata questionnaire

METAFOR was tasked by the World Climate Research Programme to produce the metadata for the fifth phase of the Coupled Model Intercomparison Project (CMIP5), an international experiment involving multiple general circulation models that will serve as a basis for the IPCC Fifth Assessment Report.

The CMIP5 questionnaire is an ambitious metadata collection tool and will help scientists to provide the most comprehensive metadata of any climate model inter-comparison project. It aims to collect enough detail to allow users to easily find, assess, and correctly interpret the resulting model data.

The questionnaire also allows users to enter descriptions of components that are not already covered by the questionnaire's controlled vocabulary. It produces XML output that complies with the METAFOR Common Information Model (CIM), allowing tools and services developed against the CIM to be applied to the questionnaire outputs.
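As a rough illustration of what CIM-compliant output involves, here is a minimal sketch in Python that assembles a small CIM-style XML document. The element names (modelComponent, shortName, responsibleParty) are loosely modelled on CIM terminology but are illustrative assumptions, not the normative CIM schema.

    # Sketch: build a small CIM-style XML metadata document.
    # Element names are illustrative assumptions, not the normative CIM schema.
    import xml.etree.ElementTree as ET

    doc = ET.Element("modelComponent")  # description of one model component
    ET.SubElement(doc, "shortName").text = "ExampleGCM"
    ET.SubElement(doc, "longName").text = "Example coupled general circulation model"
    party = ET.SubElement(doc, "responsibleParty")
    ET.SubElement(party, "name").text = "Example Modelling Centre"

    ET.indent(doc)  # pretty-print (Python 3.9+)
    print(ET.tostring(doc, encoding="unicode"))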


Related Research Articles

A sequence profiling tool in bioinformatics is a type of software that presents information related to a genetic sequence, gene name, or keyword input. Such tools generally take a query such as a DNA, RNA, or protein sequence or 'keyword' and search one or more databases for information related to that sequence. Summaries and aggregate results are provided in a standardized format describing the information that would otherwise have required visits to many smaller sites or direct literature searches to compile. Many sequence profiling tools are software portals or gateways that simplify the process of finding information about a query in the large and growing number of bioinformatics databases. Access to these tools is either web-based or through locally downloadable executables.
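As a concrete example of such a query, the sketch below submits a DNA sequence to NCBI BLAST using the Biopython package; the sequence itself is a made-up placeholder, and a real search can take minutes to return.

    # Sketch: programmatic sequence query against a remote database,
    # using Biopython's NCBI BLAST interface (assumes `pip install biopython`).
    from Bio.Blast import NCBIWWW, NCBIXML

    query = "AGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCT"   # placeholder DNA sequence
    handle = NCBIWWW.qblast("blastn", "nt", query)  # nucleotide BLAST against nt
    record = NCBIXML.read(handle)
    for alignment in record.alignments[:5]:      # report the top hits
        print(alignment.title)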

In climatology, the Coupled Model Intercomparison Project (CMIP) is a collaborative framework designed to improve knowledge of climate change, being the analog of the Atmospheric Model Intercomparison Project (AMIP) for global coupled ocean-atmosphere general circulation models (GCMs). It was organized in 1995 by the Working Group on Coupled Modelling (WGCM) of the World Climate Research Programme (WCRP). It is developed in phases, both to foster climate model improvements and to support national and international assessments of climate change.

The Earth System Modeling Framework (ESMF) is open-source software for building climate, numerical weather prediction, data assimilation, and other Earth science software applications. These applications are computationally demanding and usually run on supercomputers. The ESMF is considered a technical layer, integrated into a sophisticated common modeling infrastructure for interoperability. Other aspects of interoperability and shared infrastructure include: common experimental protocols, common analytic methods, common documentation standards for data and data provenance, shared workflow, and shared model components.

Fedora Commons

Fedora is a digital asset management (DAM) architecture upon which institutional repositories, digital archives, and digital library systems might be built. Fedora is the underlying architecture for a digital repository, and is not a complete management, indexing, discovery, and delivery application. It is a modular architecture built on the principle that interoperability and extensibility are best achieved by the integration of data, interfaces, and mechanisms as clearly defined modules.

Agricultural Information Management Standards, abbreviated to AIMS, is a space for accessing and discussing agricultural information management standards, tools, and methodologies, connecting information workers worldwide to build a global community of practice. Information management standards, tools, and good practices can be found on AIMS.

Established in 2002, Carleton Immersive Media Studio (CIMS) is a Carleton University Research Centre within the School of Architecture. The CIMS research agenda is based on the intertwining of content creation and applied research, allowing each to affect and inform the other. With the twofold aim of building upon Canada's existing and burgeoning digital media and technology and of engaging Canada's social and cultural commitments, CIMS research projects privilege content- and user-driven research enabled by technology.

Downscaling is any procedure to infer high-resolution information from low-resolution variables. This technique is based on dynamical or statistical approaches commonly used in several disciplines, especially meteorology, climatology and remote sensing. The term downscaling usually refers to an increase in spatial resolution, but it is often also used for temporal resolution.
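A minimal sketch of the statistical approach, assuming paired training data: fit a regression from a coarse-resolution model value to a co-located local observation, then apply it to new coarse values. Real statistical downscaling uses far richer predictors and careful validation.

    # Minimal statistical downscaling sketch: linear regression from a
    # coarse grid-cell temperature to a co-located station observation.
    import numpy as np

    coarse = np.array([280.1, 281.4, 279.8, 282.0, 283.1])   # model grid-cell temps (K)
    station = np.array([278.9, 280.5, 278.2, 281.1, 282.4])  # observed station temps (K)

    slope, intercept = np.polyfit(coarse, station, 1)  # fit station = a*coarse + b

    new_coarse = np.array([281.0, 282.5])                # new model output
    downscaled = slope * new_coarse + intercept          # local-scale estimate
    print(downscaled)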

Kepler is a free software system for designing, executing, reusing, evolving, archiving, and sharing scientific workflows. Kepler's facilities provide process and data monitoring, provenance information, and high-speed data movement. Workflows in general, and scientific workflows in particular, are directed graphs where the nodes represent discrete computational components, and the edges represent paths along which data and results can flow between components. In Kepler, the nodes are called 'Actors' and the edges are called 'channels'. Kepler includes a graphical user interface for composing workflows in a desktop environment, a runtime engine for executing workflows within the GUI and independently from a command-line, and a distributed computing option that allows workflow tasks to be distributed among compute nodes in a computer cluster or computing grid. The Kepler system principally targets the use of a workflow metaphor for organizing computational tasks that are directed towards particular scientific analysis and modeling goals. Thus, Kepler scientific workflows generally model the flow of data from one step to another in a series of computations that achieve some scientific goal.
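The workflow-as-directed-graph idea is easy to sketch in code. The example below is not Kepler itself (Kepler is a Java application with a GUI); it is a hypothetical Python illustration in which actors are computation steps and channels carry each actor's output downstream.

    # Hypothetical illustration of the actor/channel workflow model:
    # nodes (actors) do the computing, edges (channels) carry the data.
    from collections import deque

    def load(_):
        return [1.0, 2.0, 3.0]         # pretend data source

    def scale(data):
        return [x * 10 for x in data]  # pretend processing step

    def report(data):
        print("result:", data)         # pretend sink

    actors = {"load": load, "scale": scale, "report": report}
    channels = {"load": ["scale"], "scale": ["report"], "report": []}

    # Execute actors in dataflow order, passing outputs along channels.
    queue = deque([("load", None)])
    while queue:
        name, payload = queue.popleft()
        output = actors[name](payload)
        for downstream in channels[name]:
            queue.append((downstream, output))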

Metadata

Metadata is "data that provides information about other data". In other words, it is "data about data". Many distinct types of metadata exist, including descriptive metadata, structural metadata, administrative metadata, reference metadata and statistical metadata.
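To make the distinction concrete, here is a sketch of the same digital object carrying several metadata types, written as a plain Python record; the field names are illustrative, not drawn from any particular standard.

    # Illustrative (non-standard) record showing distinct metadata types
    # attached to one digital object.
    record = {
        "descriptive": {"title": "Surface temperature, 1950-2000",
                        "creator": "Example Modelling Centre"},
        "structural": {"format": "netCDF",
                       "variables": ["tas"],
                       "dimensions": ["time", "lat", "lon"]},
        "administrative": {"license": "CC-BY-4.0",
                           "created": "2009-06-01"},
    }
    print(record["descriptive"]["title"])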

NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. The project homepage is hosted by the Unidata program at the University Corporation for Atmospheric Research (UCAR), which is also the chief source of netCDF software, standards development, and updates. The format is an open standard, and the netCDF Classic and 64-bit Offset formats are an international standard of the Open Geospatial Consortium.
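A brief sketch of the "self-describing" property, using the netCDF4 Python package (one of the netCDF bindings): dimensions, variables, and attributes travel inside the file itself.

    # Sketch: create and reopen a small self-describing netCDF file
    # (assumes `pip install netCDF4`).
    from netCDF4 import Dataset

    with Dataset("example.nc", "w") as ds:
        ds.createDimension("time", None)              # unlimited dimension
        temp = ds.createVariable("temp", "f4", ("time",))
        temp.units = "K"                              # attribute stored in the file
        temp[:] = [280.1, 281.4, 279.8]

    with Dataset("example.nc") as ds:                 # reopen: structure is self-evident
        print(ds.variables["temp"].units, ds.variables["temp"][:])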

German National Library of Economics

The German National Library of Economics (ZBW) is the world's largest research infrastructure for economic literature, online as well as offline. The ZBW is a member of the Leibniz Association and has been a foundation under public law since 2007. The ZBW has several times received the international LIBER Award for its innovative work in librarianship. The ZBW provides access to millions of documents and supports research in economics, partnering with over 40 research institutions to create a connective Open Access portal and social web of research. Through its EconStor and EconBiz services, researchers and students have accessed millions of datasets and thousands of articles. The ZBW also edits two journals: Wirtschaftsdienst and Intereconomics.

AGRIS is a global public domain database with more than 12 million structured bibliographical records on agricultural science and technology. It became operational in 1975; the database is maintained by Coherence in Information for Agricultural Research for Development, and its content is provided by more than 150 participating institutions from 65 countries. The AGRIS Search system allows scientists, researchers, and students to perform sophisticated searches using keywords from the AGROVOC thesaurus, specific journal titles, or names of countries, institutions, and authors.

Database preservation usually involves converting the information stored in a database to a form likely to be accessible in the long term as technology changes, without losing the initial characteristics of the data.
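A minimal sketch of the idea: export a table from a live database (SQLite here) to a plain, widely readable format such as CSV, so the content remains usable after the original database software is gone. A real preservation workflow would also capture the schema and provenance.

    # Sketch: dump a SQLite table to CSV, a format likely to stay readable.
    import csv
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE obs (station TEXT, temp_k REAL)")
    con.executemany("INSERT INTO obs VALUES (?, ?)", [("A", 280.1), ("B", 281.4)])

    cur = con.execute("SELECT station, temp_k FROM obs")
    with open("obs.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # keep column names
        writer.writerows(cur)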

The Climate and Forecast (CF) metadata conventions are conventions for the description of Earth sciences data, intended to promote the processing and sharing of data files. The metadata defined by the CF conventions are generally included in the same file as the data, thus making the file "self-describing". The conventions provide a definitive description of what the data values found in each netCDF variable represent, and of the spatial and temporal properties of the data, including information about grids, such as grid cell bounds and cell averaging methods. This enables users of files from different sources to decide which variables are comparable, and is a basis for building software applications with powerful data extraction, grid remapping, data analysis, and data visualization capabilities.
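Building on the netCDF sketch above, the example below attaches CF-style attributes to a variable; standard_name, units, cell_methods, and the global Conventions attribute are genuine CF attribute names, while the data values are illustrative.

    # Sketch: CF-convention attributes on a netCDF variable
    # (assumes `pip install netCDF4`).
    from netCDF4 import Dataset

    with Dataset("cf_example.nc", "w") as ds:
        ds.Conventions = "CF-1.8"                # global attribute naming the convention
        ds.createDimension("time", None)
        tas = ds.createVariable("tas", "f4", ("time",))
        tas.standard_name = "air_temperature"    # controlled CF standard name
        tas.units = "K"
        tas.cell_methods = "time: mean"          # how the values were averaged
        tas[:] = [280.1, 281.4]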

Geographic information systems (GIS) play a constantly evolving role in geospatial intelligence (GEOINT) and United States national security. These technologies allow a user to efficiently manage, analyze, and produce geospatial data, to combine GEOINT with other forms of intelligence collection, and to perform highly developed analysis and visual production of geospatial data. GIS therefore produces up-to-date and more reliable GEOINT to reduce uncertainty for a decision maker. Since GIS programs are Web-enabled, a user can constantly work with a decision maker to solve their GEOINT and national security related problems from anywhere in the world. There are many types of GIS software used in GEOINT and national security, such as Google Earth, ERDAS IMAGINE, GeoNetwork opensource, and Esri ArcGIS.

The NOAA National Operational Model Archive and Distribution System (NOMADS) is a Web-services-based project providing both real-time and retrospective, format-independent access to climate and weather model data.

Data grid

A data grid is an architecture or set of services that gives individuals or groups of users the ability to access, modify, and transfer extremely large amounts of geographically distributed data for research purposes. Data grids make this possible through a host of middleware applications and services that pull together data and resources from multiple administrative domains and then present them to users upon request. The data in a data grid can be located at a single site or at multiple sites, where each site can be its own administrative domain governed by a set of security restrictions as to who may access the data. Likewise, multiple replicas of the data may be distributed throughout the grid outside their original administrative domain, and the security restrictions placed on the original data must apply equally to the replicas. Specifically developed data grid middleware handles the integration between users and the data they request by controlling access while making the data available as efficiently as possible.
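A toy sketch of one core piece of such middleware, a replica catalogue: a logical file name maps to several physical copies across sites, and the middleware selects a replica the user is allowed to reach. The structure and names are a hypothetical illustration, not any specific grid product.

    # Hypothetical replica-catalogue sketch: map logical file names to
    # replicas across administrative domains and pick an accessible copy.
    replicas = {
        "lfn://cmip5/tas_example.nc": [
            {"site": "site-a", "url": "gsiftp://site-a.example.org/data/tas.nc"},
            {"site": "site-b", "url": "gsiftp://site-b.example.org/mirror/tas.nc"},
        ],
    }

    def resolve(lfn, allowed_sites):
        """Return the first replica at a site this user may access."""
        for rep in replicas.get(lfn, []):
            if rep["site"] in allowed_sites:
                return rep["url"]
        raise PermissionError("no accessible replica for " + lfn)

    print(resolve("lfn://cmip5/tas_example.nc", {"site-b"}))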

The High-performance Integrated Virtual Environment (HIVE) is a distributed computing environment used for healthcare-IT and biological research, including analysis of Next Generation Sequencing (NGS) data, preclinical, clinical, and post-market data, adverse events, metagenomic data, etc. It is currently supported and continuously developed by the US Food and Drug Administration, George Washington University, and by DNA-HIVE, WHISE-Global, and Embleema. HIVE currently operates fully functionally within the US FDA, supporting a wide variety (60+) of regulatory research and regulatory review projects as well as MDEpiNet medical device postmarket registries. Academic deployments of HIVE are used for research activities and publications in NGS analytics, cancer research, microbiome research, and in educational programs for students at GWU. Commercial enterprises use HIVE for oncology, microbiology, vaccine manufacturing, gene editing, healthcare-IT, harmonization of real-world data, preclinical research, and clinical studies.

COnnecting REpositories

CORE is a service provided by the Knowledge Media Institute, based at The Open University, United Kingdom. The goal of the project is to aggregate all open access content distributed across different systems, such as repositories and open access journals, enrich this content using text mining and data mining, and provide free access to it through a set of services. The CORE project also aims to promote open access to scholarly outputs. CORE works closely with digital libraries and institutional repositories.

Open energy system database projects employ open data methods to collect, clean, and republish energy-related datasets for open use. The resulting information is then available, given a suitable open license, for statistical analysis and for building numerical energy system models, including open energy system models. Permissive licenses like Creative Commons CC0 and CC BY are preferred, but some projects will house data made public under market transparency regulations and carrying unqualified copyright.

References

  1. A. Treshansky (2009). "Common Metadata for Climate Modelling Digital Repositories". Global Interoperability Program Kickoff Meeting. Geophysical Fluid Dynamics Laboratory, Princeton, NJ. Archived from the original on 2011-07-20.