Storage@home

Developer(s): Stanford University / Adam Beberg
Initial release: 2009-09-15
Stable release: 1.05 / 2009-12-02
Operating systems: Microsoft Windows, Mac OS X, Linux [1]
Platform: x86
Available in: English
Type: Distributed storage
License: Proprietary
Website: en.fah-addict.net/articles/articles-6.php

Storage@home was a distributed data store project designed to store massive amounts of scientific data across a large number of volunteer machines. [2] The project was developed by some of the Folding@home team at Stanford University, from about 2007 through 2011. [3]

Function

Scientists such as those running Folding@home deal with massive amounts of data, which must be stored and backed up at great expense. [2] Traditionally this is done with methods such as RAID servers, but these become impractical for research budgets at this scale. [3] Pande's research group already managed hundreds of terabytes of scientific data. [2] Professor Vijay Pande and student Adam Beberg drew on their experience with Folding@home to begin work on Storage@home. [3] The project's design was based on the Cosm distributed computing framework and on the workload and analysis needs of Folding@home. [3]

While Folding@home volunteers could easily participate in Storage@home, creating a robust network demanded much more disk space per user than Folding@home. Volunteers each donated 10 GB of storage space, which held encrypted files, [3] and earned points as a reward for reliable storage. Each file saved on the system was replicated four times, with the copies spread across ten geographically distant hosts. [3] [4] Redundancy also spanned different operating systems and time zones. If the servers detected the disappearance of an individual contributor, the data blocks held by that user were automatically duplicated to other hosts. Ideally, users would participate for a minimum of six months and would alert the Storage@home servers before certain changes on their end, such as a planned move of a machine or a bandwidth downgrade.

Data stored on Storage@home was maintained through redundancy and monitoring, with repairs made as needed. [3] Through careful application of redundancy, encryption, digital signatures, and automated monitoring and correction, large quantities of data could be reliably and easily retrieved. [2] [3] This ensured a robust network that would lose as little data as possible. [4]
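The placement and repair behavior described above can be sketched in a few lines. This is a minimal, hypothetical model, not the actual Storage@home client: the host list, the region labels, and the `place_replicas` and `repair` helpers are all illustrative assumptions standing in for the project's geographic and time-zone diversity requirements.

```python
import hashlib

# Hypothetical volunteer hosts: (host_id, region). Regions stand in for
# the geographic / time-zone diversity required between replicas.
HOSTS = [
    ("h1", "us-west"), ("h2", "us-east"), ("h3", "eu"),
    ("h4", "asia"), ("h5", "us-west"), ("h6", "eu"),
]

REPLICAS = 4  # each file was stored four times


def place_replicas(file_name, hosts, n=REPLICAS):
    """Choose up to n hosts in distinct regions, ordered by a stable
    hash so placement is deterministic for a given file name."""
    ranked = sorted(
        hosts,
        key=lambda h: hashlib.sha256((file_name + h[0]).encode()).hexdigest(),
    )
    chosen, used_regions = [], set()
    for host_id, region in ranked:
        if region not in used_regions:
            chosen.append(host_id)
            used_regions.add(region)
        if len(chosen) == n:
            break
    return chosen


def repair(placement, lost_host, hosts):
    """When a contributor disappears, re-place every file it held
    onto the surviving hosts, mirroring the automatic duplication
    the Storage@home servers performed."""
    survivors = [h for h in hosts if h[0] != lost_host]
    return {
        name: place_replicas(name, survivors)
        for name, holders in placement.items()
        if lost_host in holders
    }
```

For example, `place_replicas("data.bin", HOSTS)` returns four hosts in four distinct regions, and `repair` recomputes placements that exclude a vanished host.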

The Storage Resource Broker was the existing storage project most similar to Storage@home. [3]

Status

Storage@home first became available on September 15, 2009, in a testing phase. This phase monitored availability and other basic statistics on volunteers' machines, data intended to guide the design of a robust and capable system for storing massive amounts of scientific data. [5] However, the project went inactive later that year, despite initial plans for further development. [6] On April 11, 2011, Pande stated that his group had no active plans for Storage@home. [7]

Related Research Articles

Database

In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spans formal techniques and practical considerations, including data modeling, efficient data representation and storage, query languages, security and privacy of sensitive data, and distributed computing issues, including supporting concurrent access and fault tolerance.

Folding@home

Folding@home is a distributed computing project that helps scientists develop new therapeutics for a variety of diseases by simulating protein dynamics, including protein folding and the movements of proteins. It relies on simulations run on volunteers' personal computers. Folding@home is currently based at the University of Pennsylvania and led by Greg Bowman, a former student of Vijay Pande.

Genome@home was a volunteer computing project run by Stefan Larson of Stanford University, and a sister project to Folding@home. Its goal was protein design and its applications, which had implications in many fields including medicine. Genome@home was run by the Pande Lab.

Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster. Lustre file system software is available under the GNU General Public License and provides high performance file systems for computer clusters ranging in size from small workgroup clusters to large-scale, multi-site systems. Since June 2005, Lustre has consistently been used by at least half of the top ten, and more than 60 of the top 100 fastest supercomputers in the world, including the world's No. 1 ranked TOP500 supercomputer in November 2022, Frontier, as well as previous top supercomputers such as Fugaku, Titan and Sequoia.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

Windows Home Server

Windows Home Server is a home server operating system from Microsoft. It was announced on 7 January 2007 at the Consumer Electronics Show by Bill Gates, released to manufacturing on 16 July 2007 and officially released on 4 November 2007.

BOINC client–server technology

BOINC client–server technology refers to the model under which BOINC works. The BOINC framework consists of two layers that operate under the client–server architecture. Once the BOINC software is installed on a machine, the server begins sending tasks to the client. Operations are performed client-side and the results are uploaded to the server.

Cloud storage is a model of computer data storage in which the digital data is stored in logical pools, said to be on "the cloud". The physical storage spans multiple servers, and the physical environment is typically owned and managed by a hosting company. These cloud storage providers are responsible for keeping the data available and accessible, and the physical environment secured, protected, and running. People and organizations buy or lease storage capacity from the providers to store user, organization, or application data.

In computing, data deduplication is a technique for eliminating duplicate copies of repeating data. Successful implementation of the technique can improve storage utilization, which may in turn lower capital expenditure by reducing the overall amount of storage media required to meet storage capacity needs. It can also be applied to network data transfers to reduce the number of bytes that must be sent.
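The deduplication technique described above is commonly implemented by keying each chunk of data on a cryptographic digest of its content, so identical chunks are stored only once. The following is a minimal sketch under that assumption; the `dedup_store` and `rebuild` helpers are illustrative names, not part of any particular product.

```python
import hashlib


def dedup_store(chunks):
    """Store each distinct chunk once, keyed by its SHA-256 digest,
    and record the sequence of keys needed to rebuild the stream."""
    store, recipe = {}, []
    for chunk in chunks:
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # a repeated chunk costs no extra space
        recipe.append(key)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original byte stream from the chunk store."""
    return b"".join(store[k] for k in recipe)
```

Storing the chunks `[b"aaaa", b"bbbb", b"aaaa"]` this way keeps only two distinct chunks while the recipe still reconstructs all twelve bytes, which is exactly the storage saving deduplication aims for.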

Life with PlayStation

Life with PlayStation was an online multimedia application for the PlayStation 3 video game console on the PlayStation Network. The application had four channels, all of which revolved around a virtual globe that displayed information according to the channel. The application also included a client for Folding@home, a distributed computing project aimed at disease research. The service was discontinued in November 2012.

Rackspace Cloud

The Rackspace Cloud is a set of cloud computing products and services billed on a utility computing basis from the US-based company Rackspace. Offerings include Cloud Storage, virtual private server, load balancers, databases, backup, and monitoring.

A centralized database is a database that is located, stored, and maintained in a single location, most often a central computer or database system such as a desktop, server, or mainframe computer. In most cases, a centralized database is used by an organization or institution. Users access a centralized database through a computer network that gives them access to the central CPU, which in turn maintains the database itself.

The most widespread standard for configuring multiple hard disk drives is RAID, which comes in a number of standard and non-standard configurations. Non-RAID drive architectures also exist and are referred to by acronyms with tongue-in-cheek similarity to RAID.

BeeGFS

BeeGFS is a parallel file system, developed and optimized for high-performance computing. BeeGFS includes a distributed metadata architecture for scalability and flexibility reasons. Its most used and widely known aspect is data throughput.

A cooperative storage cloud is a decentralized model of networked online storage where data is stored on multiple computers (nodes), hosted by the participants cooperating in the cloud. For the cooperative scheme to be viable, the total storage contributed in aggregate must be at least equal to the amount of storage needed by end users. However, some nodes may contribute less storage and some may contribute more. There may be reward models to compensate the nodes contributing more.
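The viability condition stated above is simple enough to express directly: the aggregate donated capacity must cover demand, scaled by whatever replication factor the scheme uses. This small check is an illustrative sketch; the function name and parameters are hypothetical.

```python
def viable(contributions_gb, demand_gb, replication=1):
    """A cooperative storage cloud is viable only if the total space
    contributed by all nodes covers end-user demand, multiplied by
    the replication factor (each replica consumes real capacity)."""
    return sum(contributions_gb) >= demand_gb * replication
```

For example, three nodes donating 10 GB each can satisfy 25 GB of demand stored once, but not the same demand stored with two replicas, since 30 GB of contributed space is less than the 50 GB required.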

RozoFS is a free software distributed file system, licensed under the GNU GPL v2. RozoFS uses erasure coding for redundancy.

Cosm is a family of open distributed computing software and protocols whose development began in 1995, led by Adam L. Beberg, and was later continued by Mithral Inc. Cosm is a registered trademark of Mithral Inc.

In computing, a distributed file system (DFS) or network file system is any file system that allows access to files from multiple hosts sharing via a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources.

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system.

Distributed block storage is a computer data storage architecture in which data is stored in volumes across multiple physical servers, in contrast to file systems, which manage data as a file hierarchy, and object storage, which manages data as objects. A common example of distributed block storage is a storage area network (SAN).

References

  1. "Storage@home Installation". Folding@Home web site. September 12, 2009. Retrieved October 28, 2016.
  2. "General Information about Storage@home". 2009. Retrieved 2011-09-17.
  3. Adam L. Beberg and Vijay S. Pande (2007). "Storage@home: Petascale Distributed Storage" (PDF). 2007 IEEE International Parallel and Distributed Processing Symposium. pp. 1–6. CiteSeerX 10.1.1.421.567. doi:10.1109/IPDPS.2007.370672. ISBN 978-1-4244-0909-9. S2CID 12487615.
  4. "The plan for splitting up data in Storage@home". 2009. Retrieved 2011-09-17.
  5. Vijay Pande (2009-09-15). "First stage of Storage@home roll out". Retrieved 2011-12-14.
  6. "Storage@home FAQ". September 12, 2009. Archived from the original on August 15, 2011. Retrieved October 29, 2016.
  7. Vijay Pande (April 11, 2011). "Re: Storage@Home". Retrieved October 29, 2016.