Comparison of distributed file systems

In computing, a distributed file system (DFS) or network file system is any file system that allows files to be accessed from multiple hosts via a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources.

Distributed file systems differ in their performance, mutability of content, handling of concurrent writes, handling of permanent or temporary loss of nodes or storage, and their policy of storing content.

Locally managed

FOSS

Name | Written in | License | Access API | High availability | Shards | Efficient redundancy | Redundancy granularity | Initial release year | Memory requirements (GB)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Alluxio (Virtual Distributed File System) | Java | Apache License 2.0 | HDFS, FUSE, HTTP/REST, S3 | hot standby | No | Replication [1] | File [2] | 2013 |
Ceph | C++ | LGPL | librados (C, C++, Python, Ruby), S3, Swift, FUSE | Yes | Yes | Pluggable erasure codes [3] | Pool [4] | 2010 | 1 per TB of storage
Coda | C | GPL | C | Yes | Yes | Replication | Volume [5] | 1987 |
GlusterFS | C | GPLv3 | libglusterfs, FUSE, NFS, SMB, Swift, libgfapi | mirror | Yes | Reed-Solomon [6] | Volume [7] | 2005 |
HDFS | Java | Apache License 2.0 | Java and C client, HTTP, FUSE [8] | transparent master failover | No | Reed-Solomon [9] | File [10] | 2005 |
IPFS | Go | Apache 2.0 or MIT | HTTP gateway, FUSE, Go client, JavaScript client, command-line tool | Yes | with IPFS Cluster | Replication [11] | Block [12] | 2015 [13] |
JuiceFS | Go | Apache License 2.0 | POSIX, FUSE, HDFS, S3 | Yes | Yes | Reed-Solomon | Object | 2021 |
Kertish-DFS | Go | GPLv3 | HTTP (REST), CLI, C# client, Go client | Yes | | Replication | | 2020 |
LizardFS | C++ | GPLv3 | POSIX, FUSE, NFS-Ganesha, Ceph FSAL (via libcephfs) | master | No | Reed-Solomon [14] | File [15] | 2013 |
Lustre | C | GPLv2 | POSIX, NFS-Ganesha, NFS, SMB | Yes | Yes | No redundancy [16] [17] | No redundancy [18] [19] | 2003 |
MinIO | Go | AGPL-3.0 | AWS S3 API, FTP, SFTP | Yes | Yes | Reed-Solomon [20] | Object [21] | 2014 |
MooseFS | C | GPLv2 | POSIX, FUSE | master | No | Replication [22] | File [23] | 2008 |
OpenAFS | C | IBM Public License | Virtual file system, Installable File System | | | Replication | Volume [24] | 2000 [25] |
OpenIO [26] | C | AGPLv3 / LGPLv3 | Native (Python, C, Java), HTTP/REST, S3, Swift, FUSE (POSIX, NFS, SMB, FTP) | Yes | | Pluggable erasure codes [27] | Object [28] | 2015 | 0.5
Ori [29] | C, C++ | MIT | libori, FUSE | | | Replication | Filesystem [30] | 2012 |
Quantcast File System | C | Apache License 2.0 | C++ client, FUSE (C++ server: MetaServer and ChunkServer are both in C++) | master | No | Reed-Solomon [31] | File [32] | 2012 |
RozoFS | C, Python | GPLv2 | FUSE, SMB, NFS, key/value | Yes | | Mojette [33] | Volume [34] | 2011 [35] |
SeaweedFS | Go, Java | Apache License 2.0 | HTTP (REST), POSIX, FUSE, S3, HDFS | requires CockroachDB, undocumented config | | Reed-Solomon [36] | Volume [37] | 2015 |
Storj | Go | Apache License 2.0, Affero General Public License v3 | HTTP (REST), S3, Native (Go, C, Python, Java) | Yes | | Reed-Solomon [38] | Object [38] | 2018 |
Tahoe-LAFS | Python | GNU GPL [39] | HTTP (browser or CLI), SFTP, FTP, FUSE via SSHFS, pyfilesystem | | | Reed-Solomon [40] | File [41] | 2007 |
XtreemFS | Java, C++ | BSD License | libxtreemfs (Java, C++), FUSE | | | Replication [42] | File [43] | 2009 |

Proprietary

Name | Written in | License | Access API
--- | --- | --- | ---
BeeGFS | C / C++ | FRAUNHOFER FS (FhGFS) EULA [44] (GPLv2 client) | POSIX
ObjectiveFS [45] | C | Proprietary | POSIX, FUSE
Spectrum Scale (GPFS) | C, C++ | Proprietary | POSIX, NFS, SMB, Swift, S3, HDFS
MapR-FS | C, C++ | Proprietary | POSIX, NFS, FUSE, S3, HDFS, CLI
PanFS | C, C++ | Proprietary | DirectFlow, POSIX, NFS, SMB/CIFS, HTTP, CLI
Infinit [46] | C++ | Proprietary (to be open sourced) [47] | FUSE, Installable File System, NFS/SMB, POSIX, CLI, SDK (libinfinit)
Isilon OneFS | C/C++ | Proprietary | POSIX, NFS, SMB/CIFS, HDFS, HTTP, FTP, SWIFT Object, CLI, REST API
Qumulo | C/C++ | Proprietary | POSIX, NFS, SMB/CIFS, CLI, S3, REST API
Scality | C | Proprietary | FUSE, NFS, REST, AWS S3
Quobyte | Java, C++ | Proprietary | POSIX, FUSE, NFS, SMB/CIFS, HDFS, AWS S3, TensorFlow Plugin, CLI, REST API

Remote access

Name | Run by | Access API
--- | --- | ---
Amazon S3 | Amazon.com | HTTP (REST/SOAP)
Google Cloud Storage | Google | HTTP (REST)
Swift (part of OpenStack) | Rackspace, Hewlett-Packard, others | HTTP (REST)
Microsoft Azure | Microsoft | HTTP (REST)
IBM Cloud Object Storage | IBM (formerly Cleversafe) [48] | HTTP (REST)

Comparison

Some researchers have published a functional and experimental analysis of several distributed file systems, including HDFS, Ceph, Gluster, Lustre, and an old (1.6.x) version of MooseFS; however, the study dates from 2013 and much of its information is outdated (e.g., MooseFS had no high availability for its Metadata Server at that time). [49]

Cloud-based remote distributed storage products from major vendors have different APIs and different consistency models. [50]

Related Research Articles

In coding theory, an erasure code is a forward error correction (FEC) code under the assumption of bit erasures, which transforms a message of k symbols into a longer message with n symbols such that the original message can be recovered from a subset of the n symbols. The fraction r = k/n is called the code rate. The fraction k’/k, where k’ denotes the number of symbols required for recovery, is called reception efficiency. The recovery algorithm expects that it is known which of the n symbols are lost.
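To make the arithmetic concrete: with k = 10 data symbols and n = 14 total (code rate r = 10/14 ≈ 0.71), any 10 of the 14 stored symbols recover the message, so up to 4 losses are tolerated at a storage overhead of 1.4x. The sketch below illustrates this with the third-party klauspost/reedsolomon Go library; that library choice is an assumption for illustration, as the article does not tie erasure coding to any particular implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon" // assumed third-party library
)

func main() {
	// k = 10 data shards, n - k = 4 parity shards, so n = 14 and the
	// code rate is r = k/n = 10/14; any 10 of the 14 shards suffice.
	enc, err := reedsolomon.New(10, 4)
	if err != nil {
		log.Fatal(err)
	}

	data := bytes.Repeat([]byte("distributed-file-system-data"), 1000)

	// Split the message into 10 data shards, then compute 4 parity shards.
	shards, err := enc.Split(data)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate the loss of n - k = 4 shards (e.g. four failed nodes).
	shards[0], shards[5], shards[11], shards[13] = nil, nil, nil, nil

	// Reconstruct the missing shards from the 10 survivors.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}
	ok, err := enc.Verify(shards)
	fmt.Println("all shards consistent after reconstruction:", ok, err)
}
```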

Filesystem in Userspace (FUSE) is a software interface for Unix and Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a bridge to the actual kernel interfaces.
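As a minimal sketch of that split, the Go program below serves a read-only filesystem containing a single file entirely from user space. It assumes the third-party bazil.org/fuse binding and a hypothetical mountpoint /tmp/hellofs; a real deployment would add error handling, unmounting, and write support.

```go
package main

import (
	"context"
	"log"
	"os"

	"bazil.org/fuse" // assumed third-party FUSE binding for Go
	"bazil.org/fuse/fs"
)

const content = "hello, world\n"

// FS is the filesystem; its root is a single directory.
type FS struct{}

func (FS) Root() (fs.Node, error) { return Dir{}, nil }

// Dir is the root directory, holding one file named "hello".
type Dir struct{}

func (Dir) Attr(ctx context.Context, a *fuse.Attr) error {
	a.Inode = 1
	a.Mode = os.ModeDir | 0o555
	return nil
}

func (Dir) Lookup(ctx context.Context, name string) (fs.Node, error) {
	if name == "hello" {
		return File{}, nil
	}
	return nil, fuse.ENOENT
}

func (Dir) ReadDirAll(ctx context.Context) ([]fuse.Dirent, error) {
	return []fuse.Dirent{{Inode: 2, Name: "hello", Type: fuse.DT_File}}, nil
}

// File is the read-only file; its data never touches the kernel's code.
type File struct{}

func (File) Attr(ctx context.Context, a *fuse.Attr) error {
	a.Inode = 2
	a.Mode = 0o444
	a.Size = uint64(len(content))
	return nil
}

func (File) ReadAll(ctx context.Context) ([]byte, error) {
	return []byte(content), nil
}

func main() {
	// The kernel FUSE module forwards operations on this mountpoint
	// to the callbacks above, which all run in user space.
	c, err := fuse.Mount("/tmp/hellofs")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	if err := fs.Serve(c, FS{}); err != nil {
		log.Fatal(err)
	}
}
```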

GPFS is high-performance clustered file system software developed by IBM. It can be deployed in shared-disk or shared-nothing distributed parallel modes, or a combination of these. It is used by many of the world's largest commercial companies, as well as some of the supercomputers on the Top 500 list. For example, it is the filesystem of Summit at Oak Ridge National Laboratory, which was the #1 fastest supercomputer in the world on the November 2019 Top 500 list. Summit is a 200-petaflops system composed of more than 9,000 POWER9 processors and 27,000 NVIDIA Volta GPUs. Its storage filesystem is called Alpine.

Gluster Inc. was a software company that provided an open source platform for scale-out public and private cloud storage. The company was privately funded and headquartered in Sunnyvale, California, with an engineering center in Bangalore, India. Gluster was funded by Nexus Venture Partners and Index Ventures. Gluster was acquired by Red Hat on October 7, 2011.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.
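For example, a client reading a file from HDFS never needs to know which machines hold its blocks. A sketch in Go, assuming the community colinmarc/hdfs client (an assumption; the table above lists only Java and C clients, HTTP, and FUSE) and a hypothetical NameNode address:

```go
package main

import (
	"fmt"
	"log"

	"github.com/colinmarc/hdfs/v2" // assumed community Go client for HDFS
)

func main() {
	// "namenode:8020" is a hypothetical NameNode address.
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// ReadFile fetches the file's blocks from whichever DataNodes hold
	// them; block-level replication lets the read succeed despite failures.
	data, err := client.ReadFile("/tmp/example.txt")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read %d bytes\n", len(data))
}
```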

A clustered file system (CFS) is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system. Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.

Ceph is a free and open-source software-defined storage platform that provides object storage, block storage, and file storage built on a common distributed cluster foundation. Ceph provides distributed operation without a single point of failure and scalability to the exabyte level. Since version 12 (Luminous), Ceph does not rely on any other conventional filesystem; it manages HDDs and SSDs directly with its own storage backend, BlueStore, and can expose a POSIX filesystem.
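The librados object API listed in the table above can be driven from several languages; the sketch below assumes the go-ceph bindings (an assumption, since the article names C, C++, Python, and Ruby), a reachable cluster configured in /etc/ceph/ceph.conf, and a hypothetical pool named mypool.

```go
package main

import (
	"fmt"
	"log"

	"github.com/ceph/go-ceph/rados" // assumed Go bindings for librados
)

func main() {
	conn, err := rados.NewConn()
	if err != nil {
		log.Fatal(err)
	}
	// Reads /etc/ceph/ceph.conf for monitor addresses and keys.
	if err := conn.ReadDefaultConfigFile(); err != nil {
		log.Fatal(err)
	}
	if err := conn.Connect(); err != nil {
		log.Fatal(err)
	}
	defer conn.Shutdown()

	// "mypool" is a hypothetical RADOS pool; pools are also the
	// granularity at which replication or erasure-code profiles apply.
	ioctx, err := conn.OpenIOContext("mypool")
	if err != nil {
		log.Fatal(err)
	}
	defer ioctx.Destroy()

	// Write and read back a single named object.
	if err := ioctx.Write("greeting", []byte("hello ceph"), 0); err != nil {
		log.Fatal(err)
	}
	buf := make([]byte, 64)
	n, err := ioctx.Read("greeting", buf, 0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(buf[:n]))
}
```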

CloudStore was Kosmix's C++ implementation of the Google File System. It parallels the Hadoop project, which is implemented in the Java programming language. CloudStore supports incremental scalability, replication, checksumming for data integrity, client side fail-over and access from C++, Java and Python. There is a FUSE module so that the file system can be mounted on Linux.

Moose File System (MooseFS) is an open-source, POSIX-compliant distributed file system developed by Core Technology. MooseFS aims to be a fault-tolerant, highly available, high-performance, scalable, general-purpose network distributed file system for data centers. Initially proprietary software, it was released to the public as open source on May 30, 2008.

Tahoe-LAFS is a free and open, secure, decentralized, fault-tolerant, distributed data store and distributed file system. It can be used as an online backup system, or to serve as a file or Web host similar to Freenet, depending on the front-end used to insert and access files in the Tahoe system. Tahoe can also be used in a RAID-like fashion using multiple disks to make a single large Redundant Array of Inexpensive Nodes (RAIN) pool of reliable data storage.
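A worked example of the storage arithmetic, assuming Tahoe-LAFS's commonly documented default 3-of-10 encoding (these parameters are configurable):

```latex
% Assuming the default erasure-coding parameters k = 3, n = 10:
% every file is cut into n = 10 shares, any k = 3 of which reconstruct it.
\[
  \text{expansion factor} = \frac{n}{k} = \frac{10}{3} \approx 3.3,
  \qquad
  \text{shares that may be lost} = n - k = 10 - 3 = 7.
\]
```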

A cooperative storage cloud is a decentralized model of networked online storage where data is stored on multiple computers (nodes), hosted by the participants cooperating in the cloud. For the cooperative scheme to be viable, the total storage contributed in aggregate must be at least equal to the amount of storage needed by end users. However, some nodes may contribute less storage and some may contribute more. There may be reward models to compensate the nodes contributing more.

RozoFS is a free-software distributed file system, licensed under the GNU GPL v2, that uses erasure coding for redundancy.

Red Hat Gluster Storage, formerly Red Hat Storage Server, is a computer storage product from Red Hat. It is based on open source technologies such as GlusterFS and Red Hat Enterprise Linux.

A distributed file system for cloud is a file system that allows many clients to access data and supports operations on that data. Each data file may be partitioned into several parts called chunks, and each chunk may be stored on a different remote machine, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture, and each solution must suit a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured: confidentiality, availability, and integrity are the main requirements for a secure system.
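A minimal sketch of that chunking step in Go; the fixed chunk size, the helper name, and the use of SHA-256 digests as chunk identifiers are illustrative assumptions rather than any particular system's design.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"strings"
)

// chunkSize is a hypothetical fixed chunk size (real systems range from
// small blocks to 64 MB HDFS-style chunks); kept tiny here for the demo.
const chunkSize = 8

// splitIntoChunks cuts a file-like stream into fixed-size chunks and
// returns each chunk's SHA-256 digest. In a distributed design, each
// (digest, chunk) pair could then be placed on a different remote machine.
func splitIntoChunks(r io.Reader) ([][sha256.Size]byte, error) {
	var ids [][sha256.Size]byte
	buf := make([]byte, chunkSize)
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			ids = append(ids, sha256.Sum256(buf[:n]))
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return ids, nil // final (possibly short) chunk reached
		}
		if err != nil {
			return nil, err
		}
	}
}

func main() {
	ids, err := splitIntoChunks(strings.NewReader("imagine this is a much larger file"))
	if err != nil {
		panic(err)
	}
	for i, id := range ids {
		fmt.Printf("chunk %d -> %x...\n", i, id[:4])
	}
}
```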

ObjectiveFS is a distributed file system developed by Objective Security Corp. It is a POSIX-compliant file system built with an object store backend. It was initially released with an AWS S3 backend, and later added support for Google Cloud Storage and object store devices. It was released for beta in early 2013, and the first version was officially released on August 11, 2013.

MinIO is an object storage system released under GNU Affero General Public License v3.0. It is API compatible with the Amazon S3 cloud storage service. It is capable of working with unstructured data such as photos, videos, log files, backups, and container images with the maximum supported object size being 50TB.
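Because MinIO is S3-compatible, any S3 client can talk to it. The sketch below uses MinIO's own minio-go SDK; the endpoint, credentials, bucket, and file path are placeholders, not real values.

```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7" // assumed: MinIO's Go SDK
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	ctx := context.Background()

	// Endpoint and credentials are placeholders for a running MinIO server.
	client, err := minio.New("minio.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS-KEY", "SECRET-KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Create a bucket and upload a local file, exactly as one would
	// against Amazon S3 itself.
	if err := client.MakeBucket(ctx, "backups", minio.MakeBucketOptions{}); err != nil {
		log.Fatal(err)
	}
	info, err := client.FPutObject(ctx, "backups", "db.dump", "/tmp/db.dump",
		minio.PutObjectOptions{})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("uploaded %s (%d bytes)", info.Key, info.Size)
}
```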

LizardFS is an open-source distributed file system that is POSIX-compliant and licensed under GPLv3. It was released in 2013 as a fork of MooseFS. LizardFS also offers paid technical support, covering cluster configuration and setup as well as active cluster monitoring.

The MapR File System is a clustered file system that supports both very large-scale and high-performance uses. MapR FS supports a variety of interfaces including conventional read/write file access via NFS and a FUSE interface, as well as via the HDFS interface used by many systems such as Apache Hadoop and Apache Spark. In addition to file-oriented access, MapR FS supports access to tables and message streams using the Apache HBase and Apache Kafka APIs, as well as via a document database interface.

Alluxio is an open-source virtual distributed file system (VDFS). Initially developed as the research project Tachyon, Alluxio was created at the University of California, Berkeley's AMPLab as Haoyuan Li's Ph.D. thesis, advised by Professor Scott Shenker and Professor Ion Stoica. Alluxio sits between computation and storage in the big data analytics stack. It provides a data abstraction layer for computation frameworks, enabling applications to connect to numerous storage systems through a common interface. The software is published under the Apache License.

References

  1. "Caching: Managing Data Replication in Alluxio".
  2. "Caching: Managing Data Replication in Alluxio".
  3. "Erasure Code Profiles".
  4. "Pools".
  5. Satyanarayanan, Mahadev; Kistler, James J.; Kumar, Puneet; Okasaki, Maria E.; Siegel, Ellen H.; Steere, David C. "Coda: A Highly Available File System for a Distributed Workstation Environment" (PDF).
  6. "Erasure coding implementation". GitHub . 2 November 2021.
  7. "Setting up GlusterFS Volumes".
  8. "MountableHDFS".
  9. "HDFS-7285 Erasure Coding Support inside HDFS".
  10. "Apache Hadoop: setrep".
  11. Erasure coding plan: "Reed-Solomon layer over IPFS #196". GitHub; "Erasure Coding Layer #6". GitHub.
  12. "CLI Commands: ipfs bitswap wantlist".
  13. "Why The Internet Needs IPFS Before It's Too Late". 4 October 2015.
  14. "Configuring Replication Modes".
  15. "Configuring Replication Modes: Set and show the goal of a file/directory".
  16. "Lustre Operations Manual: What a Lustre File System Is (and What It Isn't)".
  17. Reed-Solomon in progress: "LU-10911 FLR2: Erasure coding".
  18. "Lustre Operations Manual: Lustre Features".
  19. File-level redundancy plan: "File Level Redundancy Solution Architecture".
  20. "MinIO Erasure Code Quickstart Guide".
  21. "MinIO Storage Class Quickstart Guide". GitHub .
  22. Only available in the proprietary version 4.x "[feature] erasure-coding #8". GitHub .
  23. "mfsgoal(1)".
  24. "Replicating Volumes (Creating Read-only Volumes)".
  25. "OpenAFS".
  26. "OpenIO SDS Documentation". docs.openio.io.
  27. "Erasure Coding".
  28. "Declare Storage Policies".
  29. "Ori: A Secure Distributed File System".
  30. Mashtizadeh, Ali Jose; Bittau, Andrea; Huang, Yifeng Frank; Mazières, David. "Replication, History, and Grafting in the Ori File System" (PDF).
  31. "The Quantcast File System" (PDF).
  32. "qfs/src/cc/tools/cptoqfs_main.cc". GitHub . 8 December 2021.
  33. "About RozoFS: Mojette Transform".
  34. "Setting up RozoFS: Exportd Configuration File".
  35. "Initial commit". GitHub .
  36. "Erasure Coding for warm storage". GitHub .
  37. "Replication". GitHub .
  38. 1 2 "Storj: A Decentralized Cloud Storage Network Framework v3.0" (PDF). 30 October 2018.
  39. "About Tahoe-LAFS". GitHub . 24 February 2022.
  40. "zfec -- a fast C implementation of Reed-Solomon erasure coding". GitHub . 24 February 2022.
  41. "Tahoe-LAFS Architecture: File Encoding".
  42. "Under the Hood: File Replication".
  43. "Quickstart: Replicate A File".
  44. "FRAUNHOFER FS (FhGFS) END USER LICENSE AGREEMENT". Fraunhofer Society. 2012-02-22.
  45. "ObjectiveFS official website".
  46. "The Infinit Storage Platform".
  47. "Infinit's Open Source Projects". 13 August 2019.
  48. "IBM Plans to Acquire Cleversafe for Object Storage in Cloud". www-03.ibm.com. 2015-10-05. Archived from the original on October 8, 2015. Retrieved 2019-05-06.
  49. Séguin, Cyril; Depardon, Benjamin; Le Mahec, Gaël. "Analysis of Six Distributed File Systems" (PDF). HAL.
  50. "Data Consistency Models of Public Cloud Storage Services: Amazon S3, Google Cloud Storage and Windows Azure Storage". SysTutorials. 4 February 2014. Retrieved 19 June 2017.