The Nordic Data Grid Facility, or NDGF, is a common e-Science infrastructure provided by the Nordic countries (Denmark, Finland, Norway, Sweden and Iceland) for scientific computing and data storage. It is the first and so far only internationally distributed WLCG Tier1 center, providing computing and storage services to experiments at CERN. [1]
The Nordic Data Grid Facility traces its history back to the end of 2001 and is intrinsically related to the NorduGrid project. The success of the latter indicated the need for a larger pan-Nordic facility, with storage resources being a high priority. This need was addressed by establishing a pilot NDGF infrastructure, operational from 2002 to 2005, which provided distributed storage in addition to the NorduGrid computing resources. During this phase, NDGF committed to providing a Nordic Tier1 (regional computing center) for the Worldwide LHC Computing Grid project at CERN. A distinguishing feature of this Tier1 is that it has to be an internationally distributed facility. The Nordic Data Grid Facility in its present function, as a provider of the Nordic Grid Infrastructure, was established in April 2006 by the Nordic Research Councils. [2] It came into operation on June 1, 2006, and its initial priority was to live up to the original commitment of establishing the Nordic Tier1, with the traditional focus on storage facilities. The NDGF team includes software experts who take part in the development of various Grid middleware.
In 2012, NDGF became part of a wider initiative, the Nordic e-Infrastructure Collaboration. [3]
The NDGF Tier1 is a production Grid facility that leverages existing national computational resources and Grid infrastructures.
To qualify for support, research groups should form a Virtual Organization (VO). The VO provides compute resources for sharing, and the NDGF Tier1 operates a Grid interface through which these resources are shared.
Currently, most computational resources of the NDGF Tier1 are accessible through the ARC middleware; some resources are also available via the AliEn software. The distributed storage facility is realised through the dCache storage management solution.
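As an illustration only, a job from a supported VO might be submitted through the standard ARC client tools. The sketch below drives arcsub, arcstat and arcget from Python with a minimal xRSL job description; the computing-element URL, executable name and xRSL attribute values are placeholders rather than actual NDGF endpoints, and a valid VO proxy certificate (created with arcproxy) is assumed to be in place.

```python
import os
import subprocess
import tempfile

# Minimal xRSL job description; the executable and attribute values are illustrative only.
XRSL = """&
 (executable = "run.sh")
 (jobName = "ndgf-example")
 (stdout = "stdout.txt")
 (stderr = "stderr.txt")
 (cpuTime = "60 minutes")
"""

def submit_job(ce_url: str) -> None:
    """Submit one job to an ARC Computing Element using the arcsub command-line client."""
    with tempfile.NamedTemporaryFile("w", suffix=".xrsl", delete=False) as f:
        f.write(XRSL)
        xrsl_path = f.name
    try:
        # A valid VO proxy certificate is assumed to exist already (created with arcproxy).
        subprocess.run(["arcsub", "-c", ce_url, xrsl_path], check=True)
        # Later, job status and outputs can be handled with the companion tools:
        #   arcstat -a   # show the status of all known jobs
        #   arcget -a    # download the outputs of finished jobs
    finally:
        os.unlink(xrsl_path)

if __name__ == "__main__":
    # Hypothetical computing-element URL; real endpoints are published by each NDGF site.
    submit_job("https://ce.example.org/arex")
```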
Today, the dominant user community of the NDGF Tier1 is High Energy Physics, represented by the ALICE, ATLAS and CMS Virtual Organizations. Together with the Tier0 at CERN and the other 12 Tier1s, the Nordic Tier1 collects, stores and processes the data produced by the Large Hadron Collider. [4] [5]
Since 2010, the NDGF Tier1 has been part of the European Grid Infrastructure. [6] [7]
The NDGF Tier1 was hosted by NORDUnet from 2006 to 2011 and has been hosted by NordForsk since 2012. [8]
NDGF is often confused with NorduGrid, which is not surprising, especially since in its second phase NDGF was proposed to assume the name "NorduGrid". It was decided, however, to distinguish between the mostly development-oriented project, NorduGrid, and the mostly operations-oriented one, NDGF. As a rule of thumb, NDGF primarily provides services, while NorduGrid primarily provides the ARC middleware.
The European Organization for Nuclear Research, known as CERN, is an intergovernmental organization that operates the largest particle physics laboratory in the world. Established in 1954, it is based in Meyrin, a western suburb of Geneva, on the France–Switzerland border. It comprises 24 member states. Israel, admitted in 2013, is the only non-European full member. CERN is an official United Nations General Assembly observer.
Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.
The Large Hadron Collider (LHC) is the world's largest and highest-energy particle accelerator. It was built by the European Organization for Nuclear Research (CERN) between 1998 and 2008 in collaboration with over 10,000 scientists and hundreds of universities and laboratories across more than 100 countries. It lies in a tunnel 27 kilometres (17 mi) in circumference and as deep as 175 metres (574 ft) beneath the France–Switzerland border near Geneva.
E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology, in 1999 and was used to describe a large funding initiative starting in November 2000. E-science has since been interpreted more broadly as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In 2014, the IEEE eScience Conference Series condensed the definition, in one of the working definitions used by its organizers, to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle". E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.
LHC@home is a volunteer computing project researching particle physics that uses the Berkeley Open Infrastructure for Network Computing (BOINC) platform. The project's computing power is utilized by physicists at CERN in support of the Large Hadron Collider and other experimental particle accelerators.
NorduGrid is a collaboration aiming at the development, maintenance and support of the free Grid middleware known as the Advanced Resource Connector (ARC).
Advanced Resource Connector (ARC) is a grid computing middleware introduced by NorduGrid. It provides a common interface for submission of computational tasks to different distributed computing systems and thus can enable grid infrastructures of varying size and complexity. The set of services and utilities providing the interface is known as the ARC Computing Element (ARC-CE). ARC-CE functionality includes data staging and caching, developed in order to support data-intensive distributed computing. ARC is open source software distributed under the Apache License 2.0.
European Grid Infrastructure (EGI) is a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The EGI links centres in different European countries to support international research in many scientific disciplines. Following a series of research projects such as DataGrid and Enabling Grids for E-sciencE, the EGI Foundation was formed in 2010 to sustain the services of EGI.
The D-Grid Initiative was a government project to fund computer infrastructure for education and research (e-Science) in Germany, built around grid computing. D-Grid started on September 1, 2005, with six community projects and an integration project (DGI), as well as several partner projects.
The INFN Grid project was an initiative of the Istituto Nazionale di Fisica Nucleare (INFN) —Italy's National Institute for Nuclear Physics—for grid computing. It was intended to develop and deploy grid middleware services to allow INFN's users to transparently and securely share the computing and storage resources together with applications and technical facilities for scientific collaborations.
The Worldwide LHC Computing Grid (WLCG), formerly the LHC Computing Grid (LCG), is an international collaborative project that consists of a grid-based computer network infrastructure incorporating over 170 computing centers in 42 countries, as of 2017. It was designed by CERN to handle the prodigious volume of data produced by Large Hadron Collider (LHC) experiments.
The Open Science Grid Consortium is an organization that administers a worldwide grid of technological resources called the Open Science Grid, which facilitates distributed computing for scientific research. Founded in 2004, the consortium is composed of service and resource providers, researchers from universities and national laboratories, as well as computing centers across the United States. Members independently own and manage the resources which make up the distributed facility, and consortium agreements provide the framework for technological and organizational integration.
gLite is a middleware computer software project for grid computing used by the CERN LHC experiments and other scientific domains. It was implemented by collaborative efforts of more than 80 people in 12 different academic and industrial research centers in Europe. gLite provides a framework for building applications tapping into distributed computing and storage resources across the Internet. The gLite services were adopted by more than 250 computing centres and used by more than 15,000 researchers in Europe and around the world.
GridPP is a collaboration of particle physicists and computer scientists from the United Kingdom and CERN. They manage and maintain a distributed computing grid across the UK with the primary aim of providing resources to particle physicists working on the Large Hadron Collider (LHC) experiments at CERN. They are funded by the UK's Science and Technology Facilities Council. The collaboration oversees a major computing facility called the Tier1 at the Rutherford Appleton Laboratory (RAL) along with the four Tier 2 organisations of ScotGrid, NorthGrid, SouthGrid and LondonGrid. The Tier 2s are geographically distributed and are composed of computing clusters at multiple institutes.
The European Middleware Initiative (EMI) was a computer software platform for high performance distributed computing. It was developed and distributed directly by the EMI project. It was the base for other grid middleware distributions used by scientific research communities and distributed computing infrastructures all over the world especially in Europe, South America and Asia. EMI supported broad scientific experiments and initiatives, such as the Worldwide LHC Computing Grid.
A data grid is an architecture or set of services that allows users to access, modify and transfer extremely large amounts of geographically distributed data for research purposes. Data grids make this possible through a host of middleware applications and services that pull together data and resources from multiple administrative domains and then present them to users upon request.
dCache is a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods. dCache is open source software built in Java and is used by, among others, ten out of fourteen Tier1 sites serving CERN to store data from the Large Hadron Collider.
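As a minimal sketch of what such a standard access method can look like, the snippet below reads a file from a dCache WebDAV door with Python's requests library. The door URL, port, file path and bearer token are hypothetical; real deployments use site-specific endpoints and require proper credentials, such as an X.509 proxy or a token issued for the VO.

```python
import requests

# Hypothetical dCache WebDAV door, file path and credential; real values are site-specific.
DOOR = "https://dcache-door.example.org:2880"
PATH = "/pnfs/example.org/data/myvo/dataset/file.root"
TOKEN = "replace-with-a-real-bearer-token"

def download(door: str, path: str, out: str) -> None:
    """Read one file from a dCache WebDAV door over HTTPS and save it locally."""
    resp = requests.get(
        door + path,
        headers={"Authorization": f"Bearer {TOKEN}"},
        stream=True,      # stream the body so large files are not held in memory
        timeout=300,
    )
    resp.raise_for_status()
    with open(out, "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)

if __name__ == "__main__":
    download(DOOR, PATH, "file.root")
```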
DaviX is an open-source client for WebDAV and Amazon S3, available for Microsoft Windows, Apple macOS and Linux. DaviX is written in C++ and provides several command-line tools and a C++ shared library.
Edward Karavakis is a Greek computer scientist working as a Senior Applications Engineer at Brookhaven National Laboratory (BNL) stationed at CERN, the European Organization for Nuclear Research in Geneva, Switzerland.