Nordic Data Grid Facility


The Nordic Data Grid Facility, or NDGF, is a common e-Science infrastructure provided by the Nordic countries (Denmark, Finland, Norway, Sweden and Iceland) for scientific computing and data storage. It is the first and so far only internationally distributed WLCG Tier1 center, providing computing and storage services to experiments at CERN. [1]


History

The Nordic Data Grid Facility traces its history back to the end of 2001 and is intrinsically related to the NorduGrid project. The success of the latter indicated the need for a larger pan-Nordic facility, with storage resources a high priority. This need was addressed by establishing a pilot NDGF infrastructure, operational from 2002 to 2005, which provided distributed storage in addition to the NorduGrid computing resources. During this phase, NDGF committed to providing a Nordic Tier1 (regional computing center) for the Worldwide LHC Computing Grid project at CERN, with the particular feature that this Tier1 would be internationally distributed. The Nordic Data Grid Facility in its present function as a provider of the Nordic Grid infrastructure was established in April 2006 by the Nordic Research Councils. [2] It came into operation on June 1, 2006, and its initial priority was to live up to the original commitment of establishing the Nordic Tier1, with the traditional focus on storage facilities. The NDGF team includes software experts who take part in the development of various Grid middleware.

In 2012 NDGF became a part of a wider initiative, the Nordic e-Infrastructure Collaboration. [3]

Users and operations

NDGF Tier1 is a production Grid facility that leverages existing, national computational resources and Grid infrastructures.

To qualify for support, research groups form a Virtual Organization (VO). The VO provides compute resources for sharing, and the NDGF Tier1 operates a Grid interface through which these resources are shared.

Currently, most computational resources of the NDGF Tier1 are accessible through the ARC middleware. Some resources are also available via the AliEn software. The distributed storage facility is realised through the dCache storage management solution.
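As an illustration of how jobs reach ARC-accessible resources, a computational task is described in a job description language such as xRSL (extended Resource Specification Language). The fragment below is a minimal, hypothetical sketch; real NDGF job descriptions additionally specify runtime environments, data staging and resource requirements.

```
&( executable = "/bin/echo" )
 ( arguments = "Hello from NDGF" )
 ( jobname = "ndgf-test" )
 ( stdout = "out.txt" )
```

With the ARC client tools installed, such a description would typically be submitted with arcsub, monitored with arcstat, and its output retrieved with arcget, after a valid Grid proxy certificate has been created with arcproxy.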

Today, the dominant user community of the NDGF Tier1 is high-energy physics: the ALICE, ATLAS and CMS Virtual Organizations, which together with the Tier0 at CERN and the other 12 Tier1s collect, store and process the data produced by the Large Hadron Collider. [4] [5]

Since 2010, the NDGF Tier1 has been part of the European Grid Infrastructure. [6] [7]

The NDGF Tier1 was hosted by NORDUnet from 2006 to 2011 and has been hosted by NordForsk since 2012. [8]

NDGF vs NorduGrid

NDGF and NorduGrid are often confused, which is not surprising, especially since in its second phase NDGF was proposed to assume the name "NorduGrid". It was decided, however, to distinguish between the mostly development-oriented project, NorduGrid, and the mostly operations-oriented one, NDGF. As a rule of thumb, NDGF provides mostly services, while NorduGrid provides mostly the ARC middleware.


Related Research Articles

CERN: European research centre in Switzerland

The European Organization for Nuclear Research, known as CERN, is an intergovernmental organization that operates the largest particle physics laboratory in the world. Established in 1954, it is based in Meyrin, a western suburb of Geneva, on the France–Switzerland border. It comprises 23 member states. Israel, admitted in 2013, is the only non-European full member. CERN is an official United Nations General Assembly observer.

Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.

DESY: German national research center

DESY, short for Deutsches Elektronen-Synchrotron, is a national research centre for fundamental science located in Hamburg and Zeuthen near Berlin in Germany. It operates particle accelerators used to investigate the structure, dynamics and function of matter, and conducts a broad spectrum of interdisciplinary scientific research in four main areas: particle and high energy physics; photon science; astroparticle physics; and the development, construction and operation of particle accelerators. Its name refers to its first project, an electron synchrotron. DESY is publicly financed by the Federal Republic of Germany and the Federal States of Hamburg and Brandenburg and is a member of the Helmholtz Association.

Large Hadron Collider: Particle accelerator at CERN, Switzerland

The Large Hadron Collider (LHC) is the world's largest and highest-energy particle collider. It was built by the European Organization for Nuclear Research (CERN) between 1998 and 2008 in collaboration with over 10,000 scientists and hundreds of universities and laboratories across more than 100 countries. It lies in a tunnel 27 kilometres (17 mi) in circumference and as deep as 175 metres (574 ft) beneath the France–Switzerland border near Geneva.

E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology in 1999 and was used to describe a large funding initiative starting in November 2000. E-science has been more broadly interpreted since then, as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In 2014, IEEE eScience Conference Series condensed the definition to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle" in one of the working definitions used by the organizers. E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.

NorduGrid: Grid computing project

NorduGrid is a collaboration aiming at development, maintenance and support of the free Grid middleware, known as the Advanced Resource Connector (ARC).

Advanced Resource Connector: Grid computing software

Advanced Resource Connector (ARC) is a grid computing middleware introduced by NorduGrid. It provides a common interface for submission of computational tasks to different distributed computing systems and thus can enable grid infrastructures of varying size and complexity. The set of services and utilities providing the interface is known as ARC Computing Element (ARC-CE). ARC-CE functionality includes data staging and caching, developed in order to support data-intensive distributed computing. ARC is an open source software distributed under the Apache License 2.0.

European Grid Infrastructure: Effort to provide access to high-throughput computing resources across Europe

European Grid Infrastructure (EGI) is a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The EGI links centres in different European countries to support international research in many scientific disciplines. Following a series of research projects such as DataGrid and Enabling Grids for E-sciencE, the EGI Foundation was formed in 2010 to sustain the services of EGI.

The D-Grid Initiative was a government project to fund computer infrastructure for education and research (e-Science) in Germany. It uses the term grid computing. D-Grid started September 1, 2005 with six community projects and an integration project (DGI) as well as several partner projects.

The INFN Grid project was an initiative of the Istituto Nazionale di Fisica Nucleare (INFN) —Italy's National Institute for Nuclear Physics—for grid computing. It was intended to develop and deploy grid middleware services to allow INFN's users to transparently and securely share the computing and storage resources together with applications and technical facilities for scientific collaborations.

Worldwide LHC Computing Grid: Grid computing project

The Worldwide LHC Computing Grid (WLCG), formerly the LHC Computing Grid (LCG), is an international collaborative project that consists of a grid-based computer network infrastructure incorporating over 170 computing centers in 42 countries, as of 2017. It was designed by CERN to handle the prodigious volume of data produced by Large Hadron Collider (LHC) experiments.

The Open Science Grid Consortium is an organization that administers a worldwide grid of technological resources called the Open Science Grid, which facilitates distributed computing for scientific research. Founded in 2004, the consortium is composed of service and resource providers, researchers from universities and national laboratories, as well as computing centers across the United States. Members independently own and manage the resources which make up the distributed facility, and consortium agreements provide the framework for technological and organizational integration.

gLite: Grid computing software

gLite is a middleware computer software project for grid computing used by the CERN LHC experiments and other scientific domains. It was implemented by collaborative efforts of more than 80 people in 12 different academic and industrial research centers in Europe. gLite provides a framework for building applications tapping into distributed computing and storage resources across the Internet. The gLite services were adopted by more than 250 computing centres, and used by more than 15000 researchers in Europe and around the world.

GridPP is a collaboration of particle physicists and computer scientists from the United Kingdom and CERN. They manage and maintain a distributed computing grid across the UK with the primary aim of providing resources to particle physicists working on the Large Hadron Collider (LHC) experiments at CERN. They are funded by the UK's Science and Technology Facilities Council. The collaboration oversees a major computing facility called the Tier1 at the Rutherford Appleton Laboratory (RAL) along with the four Tier 2 organisations of ScotGrid, NorthGrid, SouthGrid and LondonGrid. The Tier 2s are geographically distributed and are composed of computing clusters at multiple institutes.

European Middleware Initiative

The European Middleware Initiative (EMI) is a computer software platform for high performance distributed computing. It is developed and distributed directly by the EMI project. It is the base for other grid middleware distributions used by scientific research communities and distributed computing infrastructures all over the world especially in Europe, South America and Asia. EMI supports broad scientific experiments and initiatives, such as the Worldwide LHC Computing Grid.

Data grid: Set of services used to access, modify and transfer geographically distributed data

A data grid is an architecture or set of services that gives individuals or groups of users the ability to access, modify and transfer extremely large amounts of geographically distributed data for research purposes. Data grids make this possible through a host of middleware applications and services that pull together data and resources from multiple administrative domains and then present it to users upon request. The data in a data grid can be located at a single site or multiple sites where each site can be its own administrative domain governed by a set of security restrictions as to who may access the data. Likewise, multiple replicas of the data may be distributed throughout the grid outside their original administrative domain and the security restrictions placed on the original data for who may access it must be equally applied to the replicas. Specifically developed data grid middleware is what handles the integration between users and the data they request by controlling access while making it available as efficiently as possible. The adjacent diagram depicts a high level view of a data grid.

DaviX is an open-source client for WebDAV and Amazon S3, available for Microsoft Windows, Apple macOS and Linux. DaviX is written in C++ and provides several command-line tools and a C++ shared library.

Edward Karavakis: Greek computer scientist (born 1983)

Edward Karavakis is a Greek computer scientist working as a Senior Applications Engineer at Brookhaven National Laboratory (BNL) stationed at CERN, the European Organization for Nuclear Research in Geneva, Switzerland.

References

  1. Bird, Ian (2011). "Computing for the Large Hadron Collider". Annual Review of Nuclear and Particle Science. 61 (1): 99–118. Bibcode:2011ARNPS..61...99B. doi:10.1146/annurev-nucl-102010-130059.
  2. Buch, Rene; Fischer, Lars; Grønager, Michael; Smirnova, Oxana (2006). "The Nordic Data Grid Facility". META. UNINEtt Sigma AS (1): 14–17.
  3. "NordForsk hosts new organisation for Nordic eInfrastructure cooperation". NordForsk.
  4. "Worldwide LHC Computing Grid sites". CERN. 2002. Archived from the original on 24 December 2011. Retrieved 20 December 2011.
  5. Fischer, Lars; Grønager, Michael; Kleist, Josva; Smirnova, Oxana (2008). "A Distributed Tier-1". Journal of Physics: Conference Series. IOP Publishing Ltd. 119 (5): 052016. Bibcode:2008JPhCS.119e2016F. doi:10.1088/1742-6596/119/5/052016.
  6. "European Grid Infrastructure resource providers". 2010. Retrieved 20 December 2011.
  7. Field, Laurence; et al. (2010). "Towards sustainability: An interoperability outline for a Regional ARC based infrastructure in the WLCG and EGEE infrastructures". Journal of Physics: Conference Series. IOP Publishing Ltd. 219 (6): 062051. Bibcode:2010JPhCS.219f2051F. doi:10.1088/1742-6596/219/6/062051.
  8. "NordForsk".