XtreemFS

Last updated
XtreemFS
XtreemFS Logo.jpg
Stable release
1.5.1 / March 12, 2015 (2015-03-12)
Repository
Written in Java
Operating system Linux, Windows, Mac OS X
Type Distributed file system
License New BSD
Website www.xtreemfs.org

XtreemFS is an object-based, distributed file system for wide area networks. [1] XtreemFS' outstanding feature is full (all components) and real (all failure scenarios, including network partitions) fault tolerance, while maintaining POSIX file system semantics. Fault-tolerance is achieved by using Paxos-based lease negotiation algorithms and is used to replicate files and metadata. SSL and X.509 certificates support make XtreemFS usable over public networks.

Contents

XtreemFS has been under development since early 2007. A first public release was made in August 2008. XtreemFS 1.0 was released in August 2009. The 1.0 release includes support for read-only replication with failover, data center replica maps, parallel reads and writes, and a native Windows client. The 1.1 added automatic on-close replication and POSIX advisory locks. In mid-2011, release 1.3 added read/write replication for files. Version 1.4 underwent extensive testing and is considered production-quality. An improved Hadoop integration and support for SSDs was added in version 1.5.

XtreemFS is funded by the European Commission's IST programme.

The original XtreemFS team founded Quobyte Inc. in 2013. Quobyte offers a professional storage system as a commercial product.

Features

Use cases

See also

Related Research Articles

Google File System is a proprietary distributed file system developed by Google to provide efficient, reliable access to data using large clusters of commodity hardware. The last version of Google File System codenamed Colossus was released in 2010.

Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster. Lustre file system software is available under the GNU General Public License and provides high performance file systems for computer clusters ranging in size from small workgroup clusters to large-scale, multi-site systems. Since June 2005, Lustre has consistently been used by at least half of the top ten, and more than 60 of the top 100 fastest supercomputers in the world, including the world's No. 1 ranked TOP500 supercomputer in June 2020, Fugaku, as well as previous top supercomputers such as Titan and Sequoia.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

GPFS is high-performance clustered file system software developed by IBM. It can be deployed in shared-disk or shared-nothing distributed parallel modes, or a combination of these. It is used by many of the world's largest commercial companies, as well as some of the supercomputers on the Top 500 List. For example, it is the filesystem of the Summit at Oak Ridge National Laboratory which was the #1 fastest supercomputer in the world in the November 2019 TOP500 list of supercomputers. Summit is a 200 Petaflops system composed of more than 9,000 POWER9 processors and 27,000 NVIDIA Volta GPUs. The storage filesystem called Alpine has 250 PB of storage using Spectrum Scale on IBM ESS storage hardware, capable of approximately 2.5TB/s of sequential I/O and 2.2TB/s of random I/O.

Gluster Inc. was a software company that provided an open source platform for scale-out public and private cloud storage. The company was privately funded and headquartered in Sunnyvale, California, with an engineering center in Bangalore, India. Gluster was funded by Nexus Venture Partners and Index Ventures. Gluster was acquired by Red Hat on October 7, 2011.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

A clustered file system is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system. Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.

Ceph is an open-source software-defined storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block- and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and to be freely available. Since version 12, Ceph does not rely on other filesystems and can directly manage HDDs and SSDs with its own storage backend BlueStore and can completely self reliantly expose a POSIX filesystem.

Sector/Sphere is an open source software suite for high-performance distributed data storage and processing. It can be broadly compared to Google's GFS and MapReduce technology. Sector is a distributed file system targeting data storage over a large number of commodity computers. Sphere is the programming architecture framework that supports in-storage parallel data processing for data stored in Sector. Sector/Sphere operates in a wide area network (WAN) setting.

Moose File System (MooseFS) is an open-source, POSIX-compliant distributed file system developed by Core Technology. MooseFS aims to be fault-tolerant, highly available, highly performing, scalable general-purpose network distributed file system for data centers. Initially proprietary software, it was released to the public as open source on May 30, 2008.

BeeGFS Distributed file system

BeeGFS is a parallel file system, developed and optimized for high-performance computing. BeeGFS includes a distributed metadata architecture for scalability and flexibility reasons. Its most used and widely known aspect is data throughput.

RozoFS is a free software distributed file system. It comes as a free software, licensed under the GNU GPL v2. RozoFS uses erasure coding for redundancy.

Data grid Set of services used to access, modify and transfer geographical data

A data grid is an architecture or set of services that gives individuals or groups of users the ability to access, modify and transfer extremely large amounts of geographically distributed data for research purposes. Data grids make this possible through a host of middleware applications and services that pull together data and resources from multiple administrative domains and then present it to users upon request. The data in a data grid can be located at a single site or multiple sites where each site can be its own administrative domain governed by a set of security restrictions as to who may access the data. Likewise, multiple replicas of the data may be distributed throughout the grid outside their original administrative domain and the security restrictions placed on the original data for who may access it must be equally applied to the replicas. Specifically developed data grid middleware is what handles the integration between users and the data they request by controlling access while making it available as efficiently as possible. The adjacent diagram depicts a high level view of a data grid.

OrangeFS is an open-source parallel file system, the next generation of Parallel Virtual File System (PVFS). A parallel file system is a type of distributed file system that distributes file data across multiple servers and provides for concurrent access by multiple tasks of a parallel application. OrangeFS was designed for use in large-scale cluster computing and is used by companies, universities, national laboratories and similar sites worldwide.

Contrail was a cloud federation computing project that ran from 1 October 2010 until 31 January 2014. Contrail produced open-source cloud stack software including Security, PaaS components, Distributed file system, Application Lifecycle management middleware, and SLA Management. Contrail supports OVF standard and runs on OpenStack and OpenNebula. Contrail software is a full IaaS + PaaS Cloud stack ready to implement Cloud Federations.

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system.

LizardFS is an open source distributed file system that is POSIX-compliant and licensed under GPLv3. It was released in 2013 as fork of MooseFS. LizardFS is also offering a paid Technical Support with possibility of configurating and setting up the cluster and active cluster monitoring.

The MapR File System is a clustered file system that supports both very large-scale and high-performance uses. MapR FS supports a variety of interfaces including conventional read/write file access via NFS and a FUSE interface, as well as via the HDFS interface used by many systems such as Apache Hadoop and Apache Spark. In addition to file-oriented access, MapR FS supports access to tables and message streams using the Apache HBase and Apache Kafka APIs as well as via a document database interface.

OpenIO

OpenIO offers object storage for a wide range of high-performance applications. OpenIO was founded in 2015 by Laurent Denel (CEO), Jean-François Smigielski (CTO) and five other co-founders; it has leveraged open source software, developed since 2006, based on a grid technology that enables dynamic behaviour and supports heterogenous hardware. In October 2017 OpenIO completed a $5 million funding round. In July 2020 OpenIO has been acquired by OVH.

References

  1. F. Hupfeld, T. Cortes, B. Kolbeck, E. Focht, M. Hess, J. Malo, J. Marti, J. Stender, E. Cesario. "XtreemFS - a case for object-based storage in Grid data management". VLDB Workshop on Data Management in Grids. In: Proceedings of 33rd International Conference on Very Large Data Bases (VLDB) Workshops, 2007.
  2. Versweyveld, Leslie (30 October 2012). "The Contrail project is proud to present its first complete set of interoperable Cloud federation tools". www.isgtw.org. Archived from the original on September 11, 2015. Retrieved 17 October 2013. Alt URL
  3. J. Stender, B. Kolbeck, F. Hupfeld, E. Cesario, E. Focht, M. Hess, J. Malo, J. Marti. "Striping without Sacrifices: Maintaining POSIX Semantics in a Parallel File System". 1st USENIX Workshop on Large-Scale Computing (LASCO '08), Boston, 2008