Storage Resource Broker

Storage Resource Broker (SRB) is data grid management software used in computational science research projects. SRB is a logical distributed file system based on a client–server architecture that presents users with a single global logical namespace, or file hierarchy. [1] In essence, the software gives users a single mechanism for working with multiple data sources.

Description

SRB provides a uniform interface to heterogeneous computer data storage resources over a network. As part of this, it implements a logical namespace (distinct from physical file names) and maintains metadata on data objects (files), users, groups, resources, collections, and other items in an SRB metadata catalog (MCAT) stored in a relational database management system. [2] Both system and user-defined metadata can be queried to locate files by attribute as well as by name. SRB runs on various versions of Unix, Linux, and Microsoft Windows.
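The catalog-driven design can be illustrated with a small, self-contained sketch. This is a conceptual model only; the class names, fields, and query method below are hypothetical and do not reflect the real MCAT schema or SRB client API. It shows the key idea: logical paths are decoupled from physical replicas on named resources, and user-defined attribute–value pairs can be queried independently of file name or location.

```python
# Illustrative toy model of an MCAT-style catalog: logical names decoupled
# from physical replicas, with queryable user-defined metadata.
# Hypothetical structure, not the real SRB/MCAT schema or client API.
from dataclasses import dataclass, field

@dataclass
class Replica:
    resource: str        # logical resource name, e.g. an archive or disk pool
    physical_path: str   # location understood only by that resource's driver

@dataclass
class DataObject:
    logical_path: str                      # what users see and address
    replicas: list[Replica] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)  # user-defined attributes

class Catalog:
    def __init__(self) -> None:
        self.objects: dict[str, DataObject] = {}

    def register(self, obj: DataObject) -> None:
        self.objects[obj.logical_path] = obj

    def query(self, **attrs: str) -> list[DataObject]:
        """Return data objects whose metadata matches all given attribute=value pairs."""
        return [o for o in self.objects.values()
                if all(o.metadata.get(k) == v for k, v in attrs.items())]

catalog = Catalog()
catalog.register(DataObject(
    "/home/alice/run42/output.dat",
    replicas=[Replica("hpss-archive", "/hpss/a1/0042"),
              Replica("unix-disk", "/data/cache/0042.dat")],
    metadata={"experiment": "run42", "instrument": "spectrometer"}))

# Locate files by attribute rather than by name or location.
for obj in catalog.query(experiment="run42"):
    print(obj.logical_path, [r.resource for r in obj.replicas])
```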

The SRB system is middleware in the sense that it is built on top of other major software packages (various storage systems, real-time data sources, a relational database management system, etc.) and provides callable library functions that can be used by higher-level software. However, it is more complete than many middleware systems in that it implements a comprehensive distributed data management environment, including various end-user client applications. It has features to support the management, collaborative (and controlled) sharing, publication, replication, transfer, and preservation of distributed data collections. [3]

SRB is sometimes used in conjunction with computational grid systems such as the Globus Toolkit, and can use the Globus Grid Security Infrastructure (GSI) authentication system.

SRB can store and retrieve data in archival storage systems such as the High Performance Storage System and SAM-FS, on disk file systems (Unix, Linux, or Windows), as binary large objects or tabular data in relational database management systems, and on tape libraries.
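This "uniform interface over heterogeneous storage" idea can be sketched as a common driver interface that each backend implements, so callers only deal with logical resources and paths. The interface below is a hypothetical illustration, not SRB's actual storage driver API, and the in-memory class merely stands in for an archival system such as HPSS.

```python
# Conceptual sketch: heterogeneous backends behind one small driver protocol.
# Hypothetical interface for illustration; not SRB's real driver API.
from abc import ABC, abstractmethod

class StorageDriver(ABC):
    @abstractmethod
    def put(self, physical_path: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, physical_path: str) -> bytes: ...

class UnixFileDriver(StorageDriver):
    """Stores objects as ordinary files on a local (Unix/Linux/Windows) file system."""
    def put(self, physical_path: str, data: bytes) -> None:
        with open(physical_path, "wb") as f:
            f.write(data)
    def get(self, physical_path: str) -> bytes:
        with open(physical_path, "rb") as f:
            return f.read()

class InMemoryArchiveDriver(StorageDriver):
    """Stand-in for an archival backend such as HPSS; keeps bytes in memory."""
    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}
    def put(self, physical_path: str, data: bytes) -> None:
        self._store[physical_path] = data
    def get(self, physical_path: str) -> bytes:
        return self._store[physical_path]

# A broker would select a driver based on the resource recorded in the catalog.
drivers = {"unix-disk": UnixFileDriver(), "hpss-archive": InMemoryArchiveDriver()}
drivers["hpss-archive"].put("/hpss/a1/0042", b"example payload")
print(drivers["hpss-archive"].get("/hpss/a1/0042"))
```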

SRB has been in use since 1997. In 2008, SRB was estimated to be managing over two petabytes of data.

Although SRB is licensed software, source distributions are freely available to academic and non-profit organizations. Nirvana SRB, a commercial version of SRB, offered capabilities specifically adapted to government and commercial use. [4]

History

SRB development began in 1995, through the cooperative efforts of General Atomics, the Data Intensive Cyber Environments Group (DICE), and the San Diego Supercomputer Center (SDSC) at the University of California, San Diego (UCSD) with the support of the National Science Foundation (NSF).

SRB builds on the work of Reagan Moore. Moore, who holds a doctorate in plasma physics from UCSD and was formerly a computational plasma physicist at General Atomics, joined the San Diego Supercomputer Center at its inception. [5] A project for a distributed object computation testbed was funded by DARPA and the US Patent and Trademark Office in 1998 and 1999. [6]

In 2003, General Atomics was granted an exclusive license from UCSD to develop SRB for commercial applications. [7] New versions were announced in 2008 and 2012. [8] [9]

The integrated Rule-Oriented Data management System (iRODS) is a follow-on project of the SDSC SRB team (which became the Data Intensive Cyber Environments group) and has largely replaced the use of SRB. iRODS is based on SRB concepts but was completely rewritten, includes a highly configurable rule engine at its core, and is fully open source. Conferences in 2011 included demonstrations of iRODS. [10]

References

  1. "What is the SRB". SRB wiki. 16 May 2006. Archived from the original on 2 July 2013. Retrieved 17 July 2013.
  2. Baru, Chaitanya; Moore, Reagan; Rajasekar, Arcot; Wan, Michael (2010). "The SDSC storage resource broker". CASCON First Decade High Impact Papers: 189–200. CiteSeerX   10.1.1.203.4142 . doi:10.1145/1925805.1925816. S2CID   15937740. (Reprint from November 30 – December 3, 1998)
  3. Baru, Chaitanya; Moore, Reagan; Rajasekar, Arcot; Wan, Michael (2010). "The SDSC storage resource broker". CASCON First Decade High Impact Papers: 189–200. CiteSeerX   10.1.1.203.4142 . doi:10.1145/1925805.1925816. S2CID   15937740. (Reprint from November 30 – December 3, 1998)
  4. "Nirvana Storage - Home of the Storage Resource Broker (SRB)". web site. Archived from the original on 2008-07-24. Retrieved 17 July 2013.
  5. "San Diego Business Journal", 29 September 2003 [ dead link ]
  6. "Digging Into Data: Q&A with Reagan Moore". SDSC web site. Archived from the original on 2 July 2013. Retrieved 17 July 2013.
  7. "General Atomics Acquires Exclusive License from UCSD for Commercialization of Unique Data Management Software". Press release. General Atomics. 22 September 2003. Retrieved 17 July 2013.
  8. "General Atomics, Nirvana Division releases SRB 2008". Press release. General Atomics. 29 July 2008. Retrieved 17 July 2013.
  9. "Nirvana SRB 2012 R3® Is Enhanced With Significant Caching Performance, Synchronization and Database Migration Improvements". Press release. General Atomics. 5 November 2012. Archived from the original on 12 June 2015. Retrieved 17 July 2013.
  10. Conway, Mike; Moore, Reagan; Rajasekar, Arcot; Nief, Jean-Yves (2011). "Demonstration of Policy-Guided Data Preservation Using iRODS". 2011 IEEE International Symposium on Policies for Distributed Systems and Networks. pp. 173–174. doi:10.1109/POLICY.2011.17. ISBN   978-0-7695-4330-7. S2CID   8684444.