Open Science Grid Consortium

The Open Science Grid Consortium is an organization that administers a worldwide grid of technological resources called the Open Science Grid, which facilitates distributed computing for scientific research. Founded in 2004, the consortium is composed of service and resource providers, researchers from universities and national laboratories, as well as computing centers across the United States. Members independently own and manage the resources which make up the distributed facility, and consortium agreements provide the framework for technological and organizational integration.

Use

The OSG is used by scientists and researchers for data analysis tasks that are too computationally intensive for a single data center or supercomputer. While most of the grid's resources are used for particle physics, research teams from disciplines such as biology, chemistry, astronomy, and geographic information systems also use the grid to analyze data. Research using the grid's resources has been published in The Journal of Physical Chemistry C and the Journal of Chemical Information and Modeling. [1] [2]

Large Hadron Collider

The Open Science Grid was created to facilitate data analysis from the Large Hadron Collider (LHC), and about 70% of its 300,000 computing hours per day are dedicated to the analysis of data from particle colliders. [3] Once data has been collected and distributed by the LHC Computing Grid, the Open Science Grid assists physicists from institutions around the world in analyzing it. The grid has been designed so that resources and data are shared automatically:

It's really driven not so much by where the physicists come from, but what their interests are. Physicists will be able to submit jobs to this distributed network of centers and not worry about which center that their job is actually going to run on, because the data for their task will already be there. [4]

Robert Gardner, Senior Research Associate at The University of Chicago
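
To illustrate this workflow, the following is a minimal sketch of how a single analysis job might be handed to an HTCondor-style batch scheduler of the kind used for high-throughput computing on OSG sites. It is not OSG's actual submission interface; the analysis script (analyze_events.sh), the input file name, and the resource requests are hypothetical placeholders.

    # Minimal illustrative sketch (hypothetical file names): write an HTCondor
    # submit description and hand the job to the local batch system, which
    # matches it to an available worker node; the submitter does not pick one.
    import subprocess
    import textwrap

    submit_description = textwrap.dedent("""\
        universe             = vanilla
        executable           = analyze_events.sh
        arguments            = collision_data_0001.root
        transfer_input_files = collision_data_0001.root
        output               = job.out
        error                = job.err
        log                  = job.log
        request_cpus         = 1
        request_memory       = 2GB
        queue
    """)

    with open("analysis.sub", "w") as handle:
        handle.write(submit_description)

    # condor_submit places the job in the queue; matchmaking then decides
    # which machine in the pool eventually runs it.
    subprocess.run(["condor_submit", "analysis.sub"], check=True)

In practice, submissions on the Open Science Grid pass through site gateways and pilot-job infrastructure rather than a single local scheduler, but the principle in the quotation above is the same: the scheduler, not the researcher, chooses where the job runs.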

Architecture

As of 2008, the OSG comprises over 25,000 computers with more than 43,000 processors, most of which run a distribution of Linux. [5] Seventy-two institutions, including 42 universities, are consortium members that contribute resources to the grid. [6] The grid's 90 distinct computational and storage nodes are distributed across the United States and Brazil. [7]

Peering

The grid is peered with other grids, including TeraGrid, the LHC Computing Grid, the European Grid Infrastructure, and the Extreme Science and Engineering Discovery Environment (XSEDE), [8] allowing data and resources to be shared among them.

Study

The grid's architecture has been studied by researchers in the fields of computer science and information systems. Research about the OSG has been published in Science [9] and Lecture Notes in Computer Science. [10]

Funding

The consortium is funded by the U.S. Department of Energy and the National Science Foundation, which jointly awarded it a $30 million grant in 2006. [11]

Related Research Articles

<span class="mw-page-title-main">CERN</span> European research centre in Switzerland

The European Organization for Nuclear Research, known as CERN, is an intergovernmental organization that operates the largest particle physics laboratory in the world. Established in 1954, it is based in Meyrin, a western suburb of Geneva, on the France–Switzerland border. It comprises 23 member states. Israel, admitted in 2013, is the only non-European full member. CERN is an official United Nations General Assembly observer.

Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that each node in a grid is set to perform a different task or application. Grid computers also tend to be more heterogeneous and geographically dispersed than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.

<span class="mw-page-title-main">Large Hadron Collider</span> Particle accelerator at CERN, Switzerland

The Large Hadron Collider (LHC) is the world's largest and highest-energy particle collider. It was built by the European Organization for Nuclear Research (CERN) between 1998 and 2008 in collaboration with over 10,000 scientists and hundreds of universities and laboratories across more than 100 countries. It lies in a tunnel 27 kilometres (17 mi) in circumference and as deep as 175 metres (574 ft) beneath the France–Switzerland border near Geneva.

E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology, in 1999 and was used to describe a large funding initiative starting in November 2000. E-science has been more broadly interpreted since then, as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In 2014, the IEEE eScience Conference Series condensed the definition to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle" in one of the working definitions used by the organizers. E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.

In silico – Latin phrase referring to computer simulations

In biology and other experimental sciences, an in silico experiment is one performed on a computer or via computer simulation software. The phrase is pseudo-Latin for 'in silicon', referring to silicon in computer chips. It was coined in 1987 as an allusion to the Latin phrases in vivo, in vitro, and in situ, which are commonly used in biology. The latter phrases refer, respectively, to experiments done in living organisms, outside living organisms, and where they are found in nature.

Computational science, also known as scientific computing, technical computing or scientific computation (SC), is a division of science that uses advanced computing capabilities to understand and solve complex physical problems.

<span class="mw-page-title-main">NorduGrid</span> Grid computing project

NorduGrid is a collaboration that develops, maintains and supports the free grid middleware known as the Advanced Resource Connector (ARC).

<span class="mw-page-title-main">Advanced Resource Connector</span> Grid computing software

Advanced Resource Connector (ARC) is a grid computing middleware introduced by NorduGrid. It provides a common interface for submission of computational tasks to different distributed computing systems and thus can enable grid infrastructures of varying size and complexity. The set of services and utilities providing the interface is known as ARC Computing Element (ARC-CE). ARC-CE functionality includes data staging and caching, developed in order to support data-intensive distributed computing. ARC is an open source software distributed under the Apache License 2.0.

<span class="mw-page-title-main">Charlie Catlett</span> American computer scientist

Charlie Catlett is a senior computer scientist at Argonne National Laboratory and a visiting senior fellow at the Mansueto Institute for Urban Innovation at the University of Chicago. From 2020 to 2022 he was a senior research scientist at the University of Illinois Discovery Partners Institute. He was previously a senior computer scientist at Argonne National Laboratory and a senior fellow in the Computation Institute, a joint institute of Argonne National Laboratory and The University of Chicago, and a senior fellow at the University of Chicago's Harris School of Public Policy.

<span class="mw-page-title-main">European Grid Infrastructure</span> Effort to provide access to high-throughput computing resources across Europe

European Grid Infrastructure (EGI) is a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The EGI links centres in different European countries to support international research in many scientific disciplines. Following a series of research projects such as DataGrid and Enabling Grids for E-sciencE, the EGI Foundation was formed in 2010 to sustain the services of EGI.

The D-Grid Initiative was a government project to fund computer infrastructure for education and research (e-Science) in Germany, based on grid computing. D-Grid started on September 1, 2005, with six community projects and an integration project (DGI), as well as several partner projects.

The Nordic Data Grid Facility, or NDGF, is a common e-Science infrastructure provided by the Nordic countries for scientific computing and data storage. It is the first and so far only internationally distributed WLCG Tier1 center, providing computing and storage services to experiments at CERN.

<span class="mw-page-title-main">Worldwide LHC Computing Grid</span> Grid computing project

The Worldwide LHC Computing Grid (WLCG), formerly the LHC Computing Grid (LCG), is an international collaborative project that consists of a grid-based computer network infrastructure incorporating over 170 computing centers in 42 countries, as of 2017. It was designed by CERN to handle the prodigious volume of data produced by Large Hadron Collider (LHC) experiments.

Computational particle physics refers to the methods and computing tools developed in and used by particle physics research. Like computational chemistry or computational biology, it is, for particle physics, both a specific branch and an interdisciplinary field relying on computer science, theoretical and experimental particle physics, and mathematics. The main fields of computational particle physics are: lattice field theory, automatic calculation of particle interaction or decay, and event generators.

The Polish Grid Infrastructure (PL-Grid) is a nationwide computing infrastructure built in 2009–2011 under the scientific project PL-Grid – Polish Infrastructure for Supporting Computational Science in the European Research Space. Its purpose was to enable scientific research based on advanced computer simulations and large-scale computations using computer clusters, and to provide convenient access to computing resources for research teams, including those outside the communities in which the high performance computing centers operate.

GridPP is a collaboration of particle physicists and computer scientists from the United Kingdom and CERN. They manage and maintain a distributed computing grid across the UK with the primary aim of providing resources to particle physicists working on the Large Hadron Collider (LHC) experiments at CERN. They are funded by the UK's Science and Technology Facilities Council. The collaboration oversees a major computing facility called the Tier1 at the Rutherford Appleton Laboratory (RAL) along with the four Tier 2 organisations of ScotGrid, NorthGrid, SouthGrid and LondonGrid. The Tier 2s are geographically distributed and are composed of computing clusters at multiple institutes.

<span class="mw-page-title-main">European Middleware Initiative</span>

The European Middleware Initiative (EMI) is a computer software platform for high performance distributed computing. It is developed and distributed directly by the EMI project. It is the base for other grid middleware distributions used by scientific research communities and distributed computing infrastructures all over the world, especially in Europe, South America and Asia. EMI supports broad scientific experiments and initiatives, such as the Worldwide LHC Computing Grid.

<span class="mw-page-title-main">Francine Berman</span> American computer scientist

Francine Berman is an American computer scientist, and a leader in digital data preservation and cyber-infrastructure. In 2009, she was the inaugural recipient of the IEEE/ACM-CS Ken Kennedy Award "for her influential leadership in the design, development and deployment of national-scale cyberinfrastructure, her inspiring work as a teacher and mentor, and her exemplary service to the high performance community". In 2004, Business Week called her the "reigning teraflop queen".

The SBGrid Consortium is a research computing group financially supported by participating research laboratories and operated out of Harvard Medical School. SBGrid provides the global structural biology community with support for research computing. Members of the SBGrid Consortium fund SBGrid’s ongoing operations through an annual membership fee. The resulting organization is a user-supported and user-directed community resource.

References

  1. Benjamin, Kenneth M.; Andrew J. Schultz; David A. Kofke (2007-11-01). "Virial Coefficients of Polarizable Water: Applications to Thermodynamic Properties and Molecular Clustering". The Journal of Physical Chemistry C. 111 (43): 16021–16027. doi:10.1021/jp0743166.
  2. Damjanović, Ana; Benjamin T. Miller; Torre J. Wenaus; Petar Maksimović; Bertrand García-Moreno E.; Bernard R. Brooks (2008-10-27). "Open Science Grid Study of the Coupling between Conformation and Water Content in the Interior of a Protein". Journal of Chemical Information and Modeling. 48 (10): 2021–2029. doi:10.1021/ci800263c. PMID 18834189.
  3. Gaudin, Sharon (2008-11-09). "Collider probes universe's mysteries at the speed of light". ComputerWorld. Retrieved 2009-03-02.
  4. Shread, Paul (2004-11-21). "Open Science Grid Consortium Declares Grid3 A Success". GridComputingPlanet. Archived from the original on July 15, 2007. Retrieved 2009-03-02.
  5. Gaudin, Sharon (2008-11-15). "Worldwide grid evaluating collider test results". InfoWorld. Retrieved 2009-03-02.
  6. "Members and Partners". Archived from the original on March 13, 2009. Retrieved 2009-03-02.
  7. "VO Resource Selector". Open Science Grid. Retrieved 2009-03-02.[ permanent dead link ]
  8. "Open Science Grid User Guide". Archived from the original on August 8, 2014.
  9. Foster, I. (2005). "Service-oriented science". Science. 308 (5723): 814–817. Bibcode:2005Sci...308..814F. CiteSeerX 10.1.1.455.2392. doi:10.1126/science.1110411. PMID 15879208. S2CID 23938543.
  10. Gannon, D.; B. Plale; M. Christie; L. Fang; Y. Huang; S. Jensen; G. Kandaswamy; S. Marru; S. L. Pallickara; S. Shirasuna (2005). "Service oriented architectures for science gateways on grid systems". Lecture Notes in Computer Science. Vol. 3826. p. 21. doi:10.1007/11596141_3. ISBN 978-3-540-30817-1.
  11. "Open Science Grid Receives 30 Million Dollar Award to Empower Scientific Collaboration and Computation". Open Science Grid. 2006-11-25. Archived from the original on July 5, 2010. Retrieved 2009-03-02.