Tahoe-LAFS

Initial release: May 2, 2007 [1]
Stable release: 1.17.0 [2] / 6 December 2021
Written in: Python [3]
Operating system: Cross-platform
Available in: English
Type: Cloud computing
License: Choice of GNU GPL 2+ and a custom open source licence with a grace period [4]
Website: tahoe-lafs.org

Tahoe-LAFS (Tahoe Least-Authority File Store [5]) is a free and open, secure, decentralized, fault-tolerant distributed data store and distributed file system. [6] [7] It can be used as an online backup system, or as a file or Web host similar to Freenet, [citation needed] depending on the front-end used to insert and access files in the Tahoe system. Tahoe can also be used in a RAID-like fashion, combining multiple disks into a single large Redundant Array of Inexpensive Nodes (RAIN) pool of reliable data storage.

The system is designed and implemented around the "principle of least authority" (POLA), described by Brian Warner (one of the project's original founders) as the idea "that any component of the system should have as little power of authority as it needs to get its job done". [8] Strict adherence to this convention is enabled by the use of cryptographic capabilities, which grant the agents requesting a task only the minimum set of privileges necessary to perform it. A RAIN array acts as a storage volume; its servers do not need to be trusted for the confidentiality or integrity of the stored data.
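The idea of a capability that carries only the authority needed for a task can be illustrated with a one-way derivation chain. The following is a minimal sketch, not Tahoe-LAFS's actual capability format: the `derive` helper, the purpose labels and the three-level write/read/verify chain are assumptions made for illustration. It shows only that a holder of a stronger secret can compute a weaker one to hand out, while the reverse direction would require inverting a hash.

```python
import hashlib
import secrets


def derive(parent: bytes, purpose: str) -> bytes:
    """Derive a weaker capability from a stronger one with a one-way hash.

    Hypothetical helper: the purpose label keeps the derived values for
    different roles distinct from one another.
    """
    return hashlib.sha256(purpose.encode() + b":" + parent).digest()


# Hypothetical chain: the write capability is the root secret.
write_cap = secrets.token_bytes(32)
# Anyone holding the write capability can compute the read capability...
read_cap = derive(write_cap, "read")
# ...and anyone holding the read capability can compute the verify capability.
verify_cap = derive(read_cap, "verify")
```

An agent that only needs to check stored data would be given `verify_cap` alone; it cannot recover `read_cap` or `write_cap` from it without breaking SHA-256.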

History

Tahoe-LAFS was started in 2006 at online backup services company All My Data [8] and has been actively developed since 2007. [9] In 2008, Brian Warner and Zooko Wilcox-O'Hearn published a paper on Tahoe at the 4th ACM International Workshop on Storage Security and Survivability. [10]

When All My Data closed in 2009, Tahoe-LAFS became a free software project under the GNU General Public License or The Transitive Grace License, which allows owners of the code twelve months to profit from their work before releasing it. In 2010, Tahoe-LAFS was mentioned as a tool against censorship by the Electronic Frontier Foundation. [11] In 2013, it was one of the hackathon projects at the GNU 30th anniversary. [12]

Functionality

Figure: Overview of Tahoe-LAFS.

The Tahoe-LAFS Client sends an unencrypted file via a web API to the HTTPS Server. The HTTPS Server passes the file off to the Tahoe-LAFS Storage client which encrypts the file and then uses erasure coding to store fragments of the file on multiple storage drives. [13]

Tahoe-LAFS features "provider-independent security", in that the integrity and confidentiality of the files are guaranteed by the algorithms used on the client, independent of the storage servers, which may fail or may be operated by untrusted entities. Files are encrypted using AES, then split up using erasure coding, so that only a subset K of the N servers storing the file chunks needs to be available to recreate the original file. [14] [15] The default parameters are K=3, N=10: each file is spread across 10 different servers, and accessing it requires any 3 of those servers to function correctly. [10]
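The K-of-N property can be demonstrated with a small Reed-Solomon-style sketch. This is not the encoder Tahoe-LAFS uses, and it omits the AES encryption step; the function names, the prime field GF(257) and the share layout are assumptions chosen to keep the example short. Each group of K input bytes is treated as the coefficients of a polynomial, every server stores that polynomial's value at its own point, and any K values determine the polynomial again.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def encode(data: bytes, k: int, n: int) -> list[tuple[int, list[int]]]:
    """Split data into n shares such that any k of them recover it."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    shares = [(x, []) for x in range(1, n + 1)]  # one evaluation point per server
    for i in range(0, len(padded), k):
        coeffs = padded[i:i + k]
        for x, values in shares:
            y = 0
            for c in reversed(coeffs):  # Horner evaluation of the polynomial
                y = (y * x + c) % P
            values.append(y)
    return shares


def solve_mod(matrix: list[list[int]], rhs: list[int]) -> list[int]:
    """Solve matrix * c = rhs over GF(P) by Gauss-Jordan elimination."""
    k = len(rhs)
    rows = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] % P)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[k] for row in rows]


def decode(shares: list[tuple[int, list[int]]], k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k distinct shares."""
    if len(shares) < k:
        raise ValueError("not enough shares to reconstruct the data")
    chosen = shares[:k]
    # Vandermonde matrix: row j holds the powers of server j's evaluation point.
    vander = [[pow(x, j, P) for j in range(k)] for x, _ in chosen]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        rhs = [values[pos] for _, values in chosen]
        out.extend(solve_mod(vander, rhs))  # coefficients are the original bytes
    return bytes(out[:length])


if __name__ == "__main__":
    message = b"least authority"
    shares = encode(message, k=3, n=10)
    # Any 3 of the 10 shares are enough; here only servers 2, 5 and 9 answer.
    survivors = [shares[1], shares[4], shares[8]]
    print(decode(survivors, k=3, length=len(message)))
```

In this sketch each share holds about one third of the input, so with K=3 and N=10 the ten shares together take roughly 10/3 ≈ 3.3 times the original size, and up to N − K = 7 servers can be unavailable without losing the file.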

Tahoe provides very little control over which nodes store the data. [16]

Fork

A patched version of Tahoe-LAFS has existed since 2011; it was made to run on anonymous networks such as I2P and supports multiple introducers. There is also a version for Microsoft Windows. [17] It is distributed from a site within the I2P network. [citation needed] In contrast to normal Tahoe-LAFS operation, when I2P and Tahoe-LAFS are used together the locations of the nodes are disguised, which allows anonymous distributed grids to be formed.

References

  1. "Tahoe-LAFS Documentation". tahoe-lafs.org. Archived from the original on 2012-08-21. Retrieved 2013-05-01.
  2. "Release tahoe-lafs-1.17.0 · tahoe-lafs/tahoe-lafs". GitHub. Archived from the original on 2023-04-15. Retrieved 2023-04-15.
  3. Willis, Nathan (17 February 2012). "Weekend Project: Get Started with Tahoe-LAFS Storage Grids". Linux.com. Archived from the original on 27 October 2021. Retrieved 5 March 2021.
  4. "About.RST in trunk/Docs – Tahoe-LAFS". Archived from the original on 2020-06-07. Retrieved 2013-01-07.
  5. "Tahoe-LAFS wiki". tahoe-lafs.org. Archived from the original on 2014-12-05. Retrieved 2014-12-01.
  6. Paul, Ryan (4 August 2009). "P2P-like Tahoe filesystem offers secure storage in the cloud". Ars Technica. Archived from the original on 11 January 2021. Retrieved 3 March 2021.
  7. Monteiro, Julian Geraldes (16 November 2010). "Modeling and Analysis of Reliable Peer-to-Peer Storage Systems" (PDF). Sophia Antipolis: Université de Nice. p. 17. Archived (PDF) from the original on 2 July 2013. Retrieved 15 December 2012.
  8. Byfield, Bruce (20 May 2014). "Hide Cloud Data from the Cloud Vendor". Linux Magazine. Archived from the original on 27 February 2021. Retrieved 3 March 2021.
  9. O'Brien, Danny (6 September 2013). "Tahoe and Tor: Building Privacy on Strong Foundations". Electronic Frontier Foundation. Archived from the original on 25 January 2021. Retrieved 3 March 2021.
  10. Wilcox-O'Hearn, Zooko; Warner, Brian (31 October 2008). "Tahoe: the least-authority filesystem" (PDF). Proceedings of the 4th ACM International Workshop on Storage Security and Survivability. Association for Computing Machinery: 21–26. doi:10.1145/1456469.1456474. S2CID 12056440. Archived (PDF) from the original on 26 January 2021. Retrieved 5 March 2021.
  11. Palmer, Chris (14 December 2010). "Constructive Direct Action Against Censorship". Electronic Frontier Foundation. Archived from the original on 20 January 2021. Retrieved 3 March 2021.
  12. "GNU 30th anniversary celebration and hackathon". 28 September 2013. Archived from the original on 4 April 2021. Retrieved 3 March 2021.
  13. Huchton, Scott (March 2011). "Secure mobile distributed file system (MDFS)" (PDF). Monterey, California. Naval Postgraduate School. pp. 8–9. Archived (PDF) from the original on 21 October 2021. Retrieved 15 December 2012.
  14. Haver, Eirik; Melvold, Eivind; Ruud, Pål (2011). "Cloud Storage Vault". Institutt for telematikk. pp. 20–21. Archived from the original on 27 October 2021. Retrieved 3 March 2021.
  15. Lee, Changhoon (2011). Secure and trust computing, data management and applications: STA 2011 workshops: IWCS 2011 and STAVE 2011, Loutraki, Greece, June 28–30, 2011. Berlin: Springer. pp. 192–193. ISBN 978-3642223648.
  16. Peddemors, Arjan; Kuun, Christiaan; Spoor, Rogier; Dekkers, Paul; den Besten, Christiaan (29 June 2011). "Survey of Technologies for Wide Area Distributed Storage" (PDF). p. 17. Archived (PDF) from the original on 3 March 2021. Retrieved 3 March 2021.
  17. "OSPackages – Tahoe-LAFS". tahoe-lafs.org. Archived from the original on 2013-11-12. Retrieved 2014-01-27.