Tahoe-LAFS

Initial release: May 2, 2007 [1]
Stable release: 1.17.0 [2] / 6 December 2021
Written in: Python [3]
Operating system: Cross-platform
Available in: English
Type: Cloud computing
License: Choice of GNU GPL 2+ and a custom open-source licence with a grace period [4]
Website: tahoe-lafs.org

Tahoe-LAFS (Tahoe Least-Authority File Store [5]) is a free and open-source, secure, decentralized, fault-tolerant, distributed data store and distributed file system. [6] [7] It can be used as an online backup system or as a file or Web host similar to Freenet,[citation needed] depending on the front-end used to insert and access files in the Tahoe system. Tahoe can also be used in a RAID-like fashion, using multiple disks to make a single large Redundant Array of Inexpensive Nodes (RAIN) pool of reliable data storage.


The system is designed and implemented around the "principle of least authority" (POLA), described by Brian Warner (one of the project's original founders) as the idea "that any component of the system should have as little power of authority as it needs to get its job done". [8] Strict adherence to this principle is made possible by cryptographic capabilities, which grant a requesting agent only the minimum set of privileges needed to perform a given task. A RAIN array acts as a storage volume; its servers do not need to be trusted with the confidentiality or integrity of the stored data.
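Capabilities of different strength can be derived from one another with one-way functions, so that handing out a weaker capability never leaks a stronger one. The minimal sketch below illustrates that attenuation idea with SHA-256. The tags, the 16-byte sizes, and the function names are hypothetical; they do not reproduce Tahoe-LAFS's actual capability formats or derivation rules.

```python
import hashlib
import os

def tagged_hash(tag: bytes, data: bytes) -> bytes:
    """SHA-256 over a length-prefixed tag plus data, keeping each derivation in its own domain."""
    return hashlib.sha256(len(tag).to_bytes(4, "big") + tag + data).digest()

def make_write_cap() -> bytes:
    """Hypothetical write capability: 16 random bytes that only the file's writer holds."""
    return os.urandom(16)

def attenuate_to_read_cap(write_cap: bytes) -> bytes:
    """One-way step: a write-cap holder can compute the read cap, but not the reverse."""
    return tagged_hash(b"example-readcap-v1", write_cap)[:16]

def attenuate_to_verify_cap(read_cap: bytes) -> bytes:
    """Further one-way step: derivable from the read cap, which cannot be recovered from it."""
    return tagged_hash(b"example-verifycap-v1", read_cap)[:16]

if __name__ == "__main__":
    write_cap = make_write_cap()
    read_cap = attenuate_to_read_cap(write_cap)
    verify_cap = attenuate_to_verify_cap(read_cap)
    # Each party is handed only the weakest capability its task requires.
    print("writer:", write_cap.hex())
    print("reader:", read_cap.hex())
    print("storage server:", verify_cap.hex())
```

Because every step is a hash, authority can only be narrowed, never widened: a reader who is given `read_cap` has no way to compute `write_cap`.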

History

Tahoe-LAFS was started in 2006 at the online backup company All My Data [8] and has been actively developed since 2007. [9] In 2008, Brian Warner and Zooko Wilcox-O'Hearn published a paper on Tahoe at the 4th ACM International Workshop on Storage Security and Survivability. [10]

When All My Data closed in 2009, Tahoe-LAFS became a free software project under the GNU General Public License or the Transitive Grace Period Public Licence, which gives those who modify the code up to twelve months to profit from their changes before they must release them. In 2010, the Electronic Frontier Foundation cited Tahoe-LAFS as a tool against censorship. [11] In 2013, it was one of the hackathon projects at the GNU 30th anniversary celebration. [12]

Functionality

Figure: Overview of Tahoe-LAFS

The Tahoe-LAFS Client sends an unencrypted file via a web API to the HTTPS Server, which passes the file to the Tahoe-LAFS Storage client. The storage client encrypts the file and then uses erasure coding to store fragments of it on multiple storage drives. [13]
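The client side of this flow can be sketched with Python's standard `urllib`. The sketch assumes a Tahoe-LAFS gateway reachable at `http://127.0.0.1:3456` (a commonly used local setting; the scheme and address are assumptions to adjust to the actual gateway) and the web API's `PUT /uri` operation, which uploads an unlinked immutable file and returns its read capability, and `GET /uri/<cap>`, which downloads it. It only builds the requests and does not send them; the helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a local gateway; change this to match the node's configured web port.
GATEWAY = "http://127.0.0.1:3456"

def upload_request(data: bytes, gateway: str = GATEWAY) -> Request:
    """Build (but do not send) a PUT /uri request carrying the plaintext file.
    The gateway, not this client, performs encryption and erasure coding."""
    return Request(f"{gateway}/uri", data=data, method="PUT")

def download_request(cap: str, gateway: str = GATEWAY) -> Request:
    """Build a GET /uri/<cap> request; the gateway fetches shares, decodes, and decrypts."""
    return Request(f"{gateway}/uri/{quote(cap, safe=':')}", method="GET")

if __name__ == "__main__":
    req = upload_request(b"hello, grid")
    print(req.get_method(), req.full_url)
```

To actually talk to a running gateway, the request would be passed to `urllib.request.urlopen`, and the body of the upload response would be the capability string to use in a later download request.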

Tahoe-LAFS features "provider-independent security", in that the integrity and confidentiality of the files are guaranteed by the algorithms used on the client, independent of the storage servers, which may fail or may be operated by untrusted entities. Files are encrypted using AES, then split up using erasure coding, such that only K of the N servers storing the file's chunks need to be available in order to recreate the original file. [14] [15] The default parameters are K=3 and N=10, so each file is spread across 10 different servers and can be recovered from any 3 of them that are functioning correctly. [10]
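The K-of-N property can be illustrated with a small Reed-Solomon-style erasure code. Each group of K bytes is treated as the values of a polynomial of degree below K at the points 1..K, and the N shares are that polynomial's values at 1..N, so any K shares determine the polynomial and therefore the data. With K=3 and N=10, each share is one third the size of the file, total storage is N/K = 10/3 ≈ 3.3 times the file size, and up to N − K = 7 servers can be lost. This is a minimal sketch with hypothetical function names: it uses arithmetic modulo the prime 257 for readability (so a share value can be 256), whereas production codecs typically work in GF(2^8), and it omits the encryption step.

```python
from typing import Dict, List

P = 257  # smallest prime above 255, so every byte value is a field element

def _interpolate_at(points: Dict[int, int], x: int) -> int:
    """Lagrange interpolation mod P: value at x of the lowest-degree polynomial through points."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(block: List[int], n: int) -> Dict[int, int]:
    """Spread k data symbols over n shares numbered 1..n; shares 1..k equal the data."""
    if not len(block) <= n < P:
        raise ValueError("need k <= n < 257")
    data_points = {i + 1: sym for i, sym in enumerate(block)}
    return {x: _interpolate_at(data_points, x) for x in range(1, n + 1)}

def decode(shares: Dict[int, int], k: int) -> List[int]:
    """Rebuild the k data symbols from any k surviving shares."""
    if len(shares) < k:
        raise ValueError(f"need {k} shares, have {len(shares)}")
    chosen = dict(list(shares.items())[:k])
    return [_interpolate_at(chosen, x) for x in range(1, k + 1)]

if __name__ == "__main__":
    K, N = 3, 10  # the defaults given in the text
    data = b"tahoe!"  # 6 bytes, i.e. two blocks of K symbols
    blocks = [list(data[i:i + K]) for i in range(0, len(data), K)]
    shares = [encode(b, N) for b in blocks]
    survivors = [{x: s[x] for x in (2, 7, 9)} for s in shares]  # 7 of 10 servers lost
    print(bytes(v for s in survivors for v in decode(s, K)))
```

Real data would also need padding when its length is not a multiple of K, plus a record of the original length so the padding can be stripped after decoding.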

Tahoe provides very little control over which nodes data is stored on. [16]

Fork

A patched version of Tahoe-LAFS, dating from 2011, runs on anonymous networks such as I2P and supports multiple introducers. There is also a version for Microsoft Windows. [17] It is distributed from a site within the I2P network.[citation needed] Unlike in normal Tahoe-LAFS operation, when I2P and Tahoe-LAFS are used together the locations of the nodes are hidden, which allows anonymous distributed grids to be formed.

See also

Related Research Articles

Freenet: Peer-to-peer Internet platform for censorship-resistant communication

Freenet is a peer-to-peer platform for censorship-resistant, anonymous communication. It uses a decentralized distributed data store to keep and deliver information, and has a suite of free software for publishing and communicating on the Web without fear of censorship. Both Freenet and some of its associated tools were originally designed by Ian Clarke, who defined Freenet's goal as providing freedom of speech on the Internet with strong anonymity protection.

Distributed hash table: Decentralized distributed system with lookup service

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table. Key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimal work in redistributing keys. Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
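The key-to-node assignment that lets a DHT absorb arrivals and departures with little disruption is often done with consistent hashing. The following minimal sketch shows only that assignment step; real DHTs such as Chord or Kademlia also add routing tables and a network protocol, which are omitted here. The class and node names are hypothetical.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple

def _position(name: str) -> int:
    """Map a node name or key onto the ring: the first 8 bytes of its SHA-256 digest."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")

class HashRing:
    """Consistent hashing: a key belongs to the first node at or after its position (wrapping)."""

    def __init__(self, nodes: Iterable[str] = ()) -> None:
        self._ring: List[Tuple[int, str]] = []  # kept sorted by ring position
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        bisect.insort(self._ring, (_position(node), node))

    def remove(self, node: str) -> None:
        self._ring.remove((_position(node), node))

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("no nodes in the ring")
        i = bisect.bisect_left(self._ring, (_position(key), ""))
        return self._ring[i % len(self._ring)][1]  # wrap past the highest position

if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"file-{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add("node-d")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    # Only keys now owned by the newcomer change hands; everything else stays put.
    print(len(moved), "of", len(keys), "keys moved, all to node-d:",
          all(ring.lookup(k) == "node-d" for k in moved))
```

Adding a node moves only the keys in the arc of the ring that the new node takes over, instead of rehashing every key as a plain `hash(key) % n` scheme would.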

An anonymous P2P communication system is a peer-to-peer distributed application in which the nodes or participants that share resources are anonymous or pseudonymous. Anonymity of participants is usually achieved by special routing overlay networks that hide the physical location of each node from other participants.

Mnet is software for running a peer-to-peer distributed data store for file-sharing purposes. It aimed to replace the free sharing of digital resources, typical of P2P networks, with a market regulated by its own currency. Mnet is a fork of the software MojoNation.

NetApp, Inc. is an American hybrid cloud data services and data management company headquartered in San Jose, California. It has ranked in the Fortune 500 from 2012 to 2021. Founded in 1992 with an IPO in 1995, NetApp offers cloud data services for management of applications and data both online and physically.

A distributed data store is a computer network where information is stored on more than one node, often in a replicated fashion. It is usually specifically used to refer to either a distributed database where users store information on a number of nodes, or a computer network in which users store information on a number of peer network nodes.

Filesystem in Userspace (FUSE) is a software interface for Unix and Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a bridge to the actual kernel interfaces.

Google File System is a proprietary distributed file system developed by Google to provide efficient, reliable access to data using large clusters of commodity hardware. Google File System was replaced by Colossus in 2010.

GPFS is high-performance clustered file system software developed by IBM. It can be deployed in shared-disk or shared-nothing distributed parallel modes, or a combination of these. It is used by many of the world's largest commercial companies, as well as some of the supercomputers on the TOP500 list. For example, it is the file system of Summit at Oak Ridge National Laboratory, which was the fastest supercomputer in the world in the November 2019 TOP500 list. Summit is a 200-petaflop system composed of more than 9,000 POWER9 processors and 27,000 NVIDIA Volta GPUs. Its storage file system, called Alpine, has 250 PB of storage using Spectrum Scale on IBM ESS storage hardware and is capable of approximately 2.5 TB/s of sequential I/O and 2.2 TB/s of random I/O.

Gluster Inc. was a software company that provided an open source platform for scale-out public and private cloud storage. The company was privately funded and headquartered in Sunnyvale, California, with an engineering center in Bangalore, India. Gluster was funded by Nexus Venture Partners and Index Ventures. Gluster was acquired by Red Hat on October 7, 2011.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network. Amazon S3 can store any type of object, which allows uses like storage for Internet applications, backups, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage. AWS launched Amazon S3 in the United States on March 14, 2006, then in Europe in November 2007.

A clustered file system is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system. Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.

Ceph is an open-source software-defined storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block- and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and to be freely available. Since version 12, Ceph does not rely on other file systems: it can directly manage HDDs and SSDs with its own storage backend, BlueStore, and can expose a POSIX file system entirely on its own.

A cooperative storage cloud is a decentralized model of networked online storage where data is stored on multiple computers (nodes), hosted by the participants cooperating in the cloud. For the cooperative scheme to be viable, the total storage contributed in aggregate must be at least equal to the amount of storage needed by end users. However, some nodes may contribute less storage and some may contribute more. There may be reward models to compensate the nodes contributing more.

In computing, a distributed file system (DFS) or network file system is any file system that allows files to be accessed from multiple hosts over a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources.

Convergent encryption, also known as content hash keying, is a cryptosystem that produces identical ciphertext from identical plaintext files. This has applications in cloud computing to remove duplicate files from storage without the provider having access to the encryption keys. The combination of deduplication and convergent encryption was described in a backup system patent filed by Stac Electronics in 1995. This combination has been used by Farsite, Permabit, Freenet, MojoNation, GNUnet, flud, and the Tahoe Least-Authority File Store.
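The deduplication effect can be shown with a minimal sketch: the key is derived from the file's own contents, so identical plaintexts yield identical ciphertexts that the provider can store once without ever seeing a key. The optional per-user secret mixed into the key mirrors what Tahoe-LAFS calls a convergence secret; it restricts deduplication to users who share that secret. The hash-based keystream is an illustrative stand-in for AES (which Python's standard library lacks) and is not meant for real use; `DedupStore` and the other names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Tuple

def convergent_key(plaintext: bytes, secret: bytes = b"") -> bytes:
    """Derive the key from the content itself, optionally mixed with a per-user secret."""
    return hmac.new(secret, plaintext, hashlib.sha256).digest()[:16]

def _keystream(key: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream; an illustrative stand-in for AES, not for real use."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; the same call both encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

class DedupStore:
    """Hypothetical provider: sees only ciphertexts, indexed by their hash."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, plaintext: bytes, secret: bytes = b"") -> Tuple[bytes, str]:
        key = convergent_key(plaintext, secret)
        ciphertext = xor_crypt(key, plaintext)
        index = hashlib.sha256(ciphertext).hexdigest()
        self.blobs.setdefault(index, ciphertext)  # identical ciphertext is stored once
        return key, index  # the uploader keeps the key; the provider never sees it

    def get(self, key: bytes, index: str) -> bytes:
        return xor_crypt(key, self.blobs[index])

if __name__ == "__main__":
    store = DedupStore()
    k1, i1 = store.put(b"quarterly-report.pdf contents")
    k2, i2 = store.put(b"quarterly-report.pdf contents")  # a second user, same file
    k3, i3 = store.put(b"quarterly-report.pdf contents", secret=b"alice")
    # Same index for the first two uploads; the secret-keyed copy is stored separately.
    print(i1 == i2, len(store.blobs))
```

The same determinism that enables deduplication also lets anyone who can guess a file's contents confirm that it is stored, which is one reason to mix a secret into the key.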

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system.

LizardFS is an open-source distributed file system that is POSIX-compliant and licensed under GPLv3. It was released in 2013 as a fork of MooseFS. LizardFS also offers paid technical support, including cluster configuration and setup and active cluster monitoring.

References

  1. "Tahoe-LAFS Documentation". tahoe-lafs.org. Archived from the original on 2012-08-21. Retrieved 2013-05-01.
  2. "Release tahoe-lafs-1.17.0 · tahoe-lafs/tahoe-lafs". GitHub. Archived from the original on 2023-04-15. Retrieved 2023-04-15.
  3. Willis, Nathan (17 February 2012). "Weekend Project: Get Started with Tahoe-LAFS Storage Grids". Linux.com. Archived from the original on 27 October 2021. Retrieved 5 March 2021.
  4. "About.RST in trunk/Docs – Tahoe-LAFS". Archived from the original on 2020-06-07. Retrieved 2013-01-07.
  5. "Tahoe-LAFS wiki". tahoe-lafs.org. Archived from the original on 2014-12-05. Retrieved 2014-12-01.
  6. Paul, Ryan (4 August 2009). "P2P-like Tahoe filesystem offers secure storage in the cloud". Ars Technica. Archived from the original on 11 January 2021. Retrieved 3 March 2021.
  7. Monteiro, Julian Geraldes (16 November 2010). "Modeling and Analysis of Reliable Peer-to-Peer Storage Systems" (PDF). Sophia Antipolis: Université de Nice. p. 17. Archived (PDF) from the original on 2 July 2013. Retrieved 15 December 2012.
  8. Byfield, Bruce (20 May 2014). "Hide Cloud Data from the Cloud Vendor". Linux Magazine. Archived from the original on 27 February 2021. Retrieved 3 March 2021.
  9. O'Brien, Danny (6 September 2013). "Tahoe and Tor: Building Privacy on Strong Foundations". Electronic Frontier Foundation. Archived from the original on 25 January 2021. Retrieved 3 March 2021.
  10. Wilcox-O'Hearn, Zooko; Warner, Brian (31 October 2008). "Tahoe: the least-authority filesystem" (PDF). Proceedings of the 4th ACM International Workshop on Storage Security and Survivability. Association for Computing Machinery: 21–26. doi:10.1145/1456469.1456474. S2CID 12056440. Archived (PDF) from the original on 26 January 2021. Retrieved 5 March 2021.
  11. Palmer, Chris (14 December 2010). "Constructive Direct Action Against Censorship". Electronic Frontier Foundation. Archived from the original on 20 January 2021. Retrieved 3 March 2021.
  12. "GNU 30th anniversary celebration and hackathon". 28 September 2013. Archived from the original on 4 April 2021. Retrieved 3 March 2021.
  13. Huchton, Scott (March 2011). "Secure mobile distributed file system (MDFS)" (PDF). Monterey, California. Naval Postgraduate School. pp. 8–9. Archived (PDF) from the original on 21 October 2021. Retrieved 15 December 2012.
  14. Haver, Eirik; Melvold, Eivind; Ruud, Pål (2011). "Cloud Storage Vault". Institutt for telematikk. pp. 20–21. Archived from the original on 27 October 2021. Retrieved 3 March 2021.
  15. Lee, Changhoon (2011). Secure and trust computing, data management and applications: STA 2011 workshops: IWCS 2011 and STAVE 2011, Loutraki, Greece, June 28–30, 2011. Berlin: Springer. pp. 192–193. ISBN 978-3642223648.
  16. Peddemors, Arjan; Kuun, Christiaan; Spoor, Rogier; Dekkers, Paul; den Besten, Christiaan (29 June 2011). "Survey of Technologies for Wide Area Distributed Storage" (PDF). p. 17. Archived (PDF) from the original on 3 March 2021. Retrieved 3 March 2021.
  17. "OSPackages – Tahoe-LAFS". tahoe-lafs.org. Archived from the original on 2013-11-12. Retrieved 2014-01-27.