Developer(s) | ThinkParQ, Fraunhofer ITWM
---|---
Stable release | 7.4.5 [1] / September 2024
Repository | github
Operating system | Linux
Type | Distributed file system
License | Server: proprietary; client: GPL v2
Website | beegfs
BeeGFS (formerly FhGFS) is a parallel file system developed for high-performance computing. It has a distributed metadata architecture, chosen for scalability and flexibility, and is optimized for high data throughput.
BeeGFS was originally developed at the Fraunhofer Center for High Performance Computing in Germany by a team led by Sven Breuner. [2] Breuner later became the CEO of ThinkParQ (2014–2018), the spin-off company that was founded in 2014 to maintain BeeGFS and offer professional services.
While the Community Edition of BeeGFS can be downloaded and used free of charge, the Enterprise Edition must be used under a professional support subscription contract. [3]
BeeGFS started in 2005 as an in-house development at the Fraunhofer Center for HPC to replace the existing file system on the institute's new compute cluster, intended from the outset for use in a production environment.
In 2007, the first beta version of the software was announced at ISC07 in Dresden, Germany, and introduced to the public at SC07 in Reno, Nevada. One year later, the first stable major release became available.
In 2014, Fraunhofer started its spin-off, the new company called ThinkParQ, [4] for BeeGFS. In this process, FhGFS was renamed and became BeeGFS. [5] While ThinkParQ maintains the software and offers professional services, further feature development continues as a cooperation between ThinkParQ and Fraunhofer.
Because BeeGFS can be used free of charge, the number of active installations is unknown. However, by 2014 there were already around 100 customers worldwide using BeeGFS with commercial support from ThinkParQ and Fraunhofer. Among them are academic users such as universities and research facilities [6] as well as commercial companies in fields such as finance and the oil and gas industry.
Notable installations include several TOP500 computers, such as the Loewe-CSC [7] cluster at the Goethe University Frankfurt, Germany (No. 22 at the time of installation), the Vienna Scientific Cluster [8] at the University of Vienna, Austria (No. 56 at the time of installation), and the Abel [9] cluster at the University of Oslo, Norway (No. 96 at the time of installation).
When developing BeeGFS, Fraunhofer aimed to create software focused on scalability, flexibility and usability.
BeeGFS runs on any Linux machine and consists of several components that include services for clients, metadata servers and storage servers. In addition, there is a service for the management host as well as one for a graphical administration and monitoring system.
To run BeeGFS, at least one instance each of the metadata server and the storage server is required, but BeeGFS allows multiple instances of each service to distribute the load from a large number of clients. Because every component can be scaled out independently, the system as a whole remains scalable.
File contents are distributed over several storage servers using striping, i.e. each file is split into chunks of a given size and these chunks are distributed over the existing storage servers. The size of these chunks can be defined by the file system administrator. In addition, the metadata is distributed over several metadata servers on a directory level, with each server storing a part of the complete file system tree. This approach allows fast access to the data.
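To make the striping scheme concrete, here is a minimal illustrative sketch in Python; the round-robin placement, the chunk size, and the target names are simplifying assumptions for illustration, not BeeGFS's actual placement algorithm:

```python
# Illustrative sketch of striped file placement (simplified round-robin;
# not BeeGFS's actual algorithm). A file is split into fixed-size chunks,
# and consecutive chunks go to different storage targets, so large
# sequential reads and writes can proceed in parallel across servers.
CHUNK_SIZE = 512 * 1024                      # assumed chunk size: 512 KiB
TARGETS = ["storage01", "storage02", "storage03", "storage04"]  # hypothetical

def target_for_offset(offset: int) -> str:
    """Return the storage target that holds the byte at `offset`."""
    chunk_index = offset // CHUNK_SIZE
    return TARGETS[chunk_index % len(TARGETS)]

# The first four chunks land on four different servers:
for i in range(4):
    print(f"chunk {i} -> {target_for_offset(i * CHUNK_SIZE)}")
```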
Clients, as well as metadata or storage servers, can be added to an existing system without downtime. The client itself is a lightweight kernel module that does not require any kernel patches. The servers run on top of an existing local file system. There are no restrictions on the type of underlying file system as long as it supports POSIX; the recommendation is ext4 for the metadata servers and XFS for the storage servers. Both server types run in userspace.
Also, there is no strict requirement for dedicated hardware for individual services. The design allows a file system administrator to start the services in any combination on a given set of machines and expand in the future. A common way among BeeGFS users to take advantage of this is by combining metadata servers and storage servers on the same machines.
BeeGFS supports various network interconnects with dynamic failover, such as Ethernet or InfiniBand, as well as many different Linux distributions and kernels (from 2.6.16 to the latest vanilla). The software has a simple setup and startup mechanism using init scripts. For users who prefer a graphical interface over the command line, a Java-based GUI (AdMon) is available; besides managing and administrating the BeeGFS installation, it provides monitoring of the BeeGFS state and offers options to help identify performance issues within the system.
BeeOND (BeeGFS on-demand) allows the creation of BeeGFS file system instances on a set of nodes with one single command line. Possible use cases for the tool are manifold; a few include setting up a dedicated parallel file system for a cluster job (often referred to as burst-buffering), cloud computing or fast and easy temporary setups for testing purposes.
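As a rough sketch of how BeeOND is typically wrapped into a batch job's prologue and epilogue; the exact beeond flags and paths below are assumptions and should be checked against the documentation of the installed version:

```python
# Sketch: start a per-job BeeGFS instance with BeeOND and tear it down
# afterwards. Flag names (-n nodefile, -d storage dir, -c mountpoint)
# follow the pattern in the BeeGFS documentation but are assumptions
# here; verify against your installed version.
import subprocess

def beeond_start(nodefile: str, storage_dir: str, mountpoint: str) -> None:
    """Create a temporary BeeGFS spanning the nodes listed in `nodefile`."""
    subprocess.run(
        ["beeond", "start", "-n", nodefile, "-d", storage_dir, "-c", mountpoint],
        check=True,
    )

def beeond_stop(nodefile: str) -> None:
    """Tear the temporary instance down again (e.g. in the job epilogue)."""
    subprocess.run(["beeond", "stop", "-n", nodefile], check=True)

# Hypothetical paths: a node list from the scheduler, local SSDs as
# backing storage, and a shared mountpoint for the job.
beeond_start("/tmp/job_nodefile", "/local/ssd/beeond", "/mnt/beeond")
```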
An open-source Container Storage Interface (CSI) driver enables BeeGFS to be used with container orchestrators like Kubernetes. [11] The driver is designed to support environments where containers running in Kubernetes and jobs running in traditional HPC workload managers need to share access to the same BeeGFS file system. The driver enables two main workflows: static provisioning, which exposes existing BeeGFS directories to containers, and dynamic provisioning, which creates new directories on demand.
Container access to and visibility into the file system are restricted to the intended directory. Dynamic provisioning takes BeeGFS features such as storage pools and striping into account when creating the corresponding directory in BeeGFS. General features of a POSIX file system, such as the ability to set permissions on new directories, are also exposed, easing integration of globally shared storage and containers. This notably simplifies tracking and limiting container consumption of the shared file system using BeeGFS quotas. [12]
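For illustration, a minimal sketch of requesting a dynamically provisioned BeeGFS volume through the official Kubernetes Python client; the StorageClass name beegfs-dyn is a hypothetical placeholder for whatever class the cluster administrator has bound to the BeeGFS CSI driver:

```python
# Sketch: create a PersistentVolumeClaim that the BeeGFS CSI driver
# would satisfy by provisioning a new directory in the file system.
# "beegfs-dyn" is a hypothetical StorageClass name, not the driver's
# documented default.
from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig for cluster access

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="beegfs-scratch"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],    # shared access across pods
        storage_class_name="beegfs-dyn",   # assumed to map to the CSI driver
        resources=client.V1ResourceRequirements(requests={"storage": "100Gi"}),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```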
The following benchmarks were performed on Fraunhofer Seislab, [13] a test and experimental cluster at Fraunhofer ITWM with 25 nodes (20 compute plus 5 storage) and a three-tier storage hierarchy: 1 TB RAM, 20 TB SSD, 120 TB HDD. Single-node performance on the local file system without BeeGFS is 1,332 MB/s (write) and 1,317 MB/s (read).
The nodes are equipped with 2x Intel Xeon X5660, 48 GB RAM, 4x Intel 510 Series SSD (RAID 0), ext4, QDR InfiniBand, and run Scientific Linux 6.3, kernel 2.6.32-279, and FhGFS 2012.10-beta1.
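As a plausibility check on any aggregate figures, the single-node local rates bound what striping across the five storage nodes can deliver; a minimal back-of-the-envelope sketch (network and protocol overhead reduce the achievable numbers in practice):

```python
# Upper bound on aggregate streaming throughput for Fraunhofer Seislab:
# striping across N storage nodes cannot exceed N times the local rate.
storage_nodes = 5
local_write_mb_s = 1332   # measured single-node local write rate
local_read_mb_s = 1317    # measured single-node local read rate

print(f"write upper bound: {storage_nodes * local_write_mb_s} MB/s")  # 6660
print(f"read upper bound:  {storage_nodes * local_read_mb_s} MB/s")   # 6585
```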
Fraunhofer ITWM is participating in the Dynamic-Exascale Entry Platform – Extended Reach (DEEP-ER) project of the European Union, [14] which addresses the problems of the growing gap between compute speed and I/O bandwidth, and system resiliency for large-scale systems.
Within the scope of this project, the BeeGFS developers are working on several extensions. The plan is to keep the POSIX interface for backward compatibility while also giving applications more control over how the file system handles things like data placement and coherency through API extensions.
A supercomputer is a type of computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Since 2022, supercomputers have existed which can perform over 10¹⁸ FLOPS, so-called exascale supercomputers. For comparison, a desktop computer has performance in the range of hundreds of gigaFLOPS (10¹¹) to tens of teraFLOPS (10¹³). Since November 2017, all of the world's fastest 500 supercomputers have run on Linux-based operating systems. Additional research is being conducted in the United States, the European Union, Taiwan, Japan, and China to build faster, more powerful and technologically superior exascale supercomputers.
In computing, the Global File System 2 or GFS2 is a shared-disk file system for Linux computer clusters. GFS2 allows all members of a cluster to have direct concurrent access to the same shared block storage, in contrast to distributed file systems which distribute data throughout the cluster. GFS2 can also be used as a local file system on a single computer.
Google File System is a proprietary distributed file system developed by Google to provide efficient, reliable access to data using large clusters of commodity hardware. Google File System was replaced by Colossus in 2010.
Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster. Lustre file system software is available under the GNU General Public License and provides high performance file systems for computer clusters ranging in size from small workgroup clusters to large-scale, multi-site systems. Since June 2005, Lustre has consistently been used by at least half of the top ten, and more than 60 of the top 100 fastest supercomputers in the world, including the world's No. 1 ranked TOP500 supercomputer in November 2022, Frontier, as well as previous top supercomputers such as Fugaku, Titan and Sequoia.
GPFS is high-performance clustered file system software developed by IBM. It can be deployed in shared-disk or shared-nothing distributed parallel modes, or a combination of these. It is used by many of the world's largest commercial companies, as well as some of the supercomputers on the TOP500 list. For example, it is the file system of Summit at Oak Ridge National Laboratory, which was the No. 1 fastest supercomputer in the world on the November 2019 TOP500 list. Summit is a 200-petaFLOPS system composed of more than 9,000 POWER9 processors and 27,000 NVIDIA Volta GPUs. Its storage file system is called Alpine.
The Parallel Virtual File System (PVFS) is an open-source parallel file system. A parallel file system is a type of distributed file system that distributes file data across multiple servers and provides for concurrent access by multiple tasks of a parallel application. PVFS was designed for use in large-scale cluster computing. PVFS focuses on high-performance access to large data sets. It consists of a server process and a client library, both of which are written entirely in user-level code. A Linux kernel module and pvfs-client process allow the file system to be mounted and used with standard utilities. The client library provides for high-performance access via the message passing interface (MPI). PVFS is being jointly developed by the Parallel Architecture Research Laboratory at Clemson University, the Mathematics and Computer Science Division at Argonne National Laboratory, and the Ohio Supercomputer Center. PVFS development has been funded by NASA Goddard Space Flight Center, the DOE Office of Science Advanced Scientific Computing Research program, NSF PACI and HECURA programs, and other government and private agencies. PVFS is now known as OrangeFS in its newest development branch.
Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.
The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is an advanced computing research center that provides comprehensive advanced computing resources and support services to researchers in Texas and across the U.S. The mission of TACC is to enable discoveries that advance science and society through the application of advanced computing technologies. Specializing in high-performance computing, scientific visualization, data analysis and storage systems, software, research and development, and portal interfaces, TACC deploys and operates advanced computational infrastructure to enable the research activities of faculty, staff, and students of UT Austin. TACC also provides consulting, technical documentation, and training to support researchers who use these resources. TACC staff members conduct research and development in applications and algorithms, computing systems design/architecture, and programming tools and environments.
Ceph is a free and open-source software-defined storage platform that provides object storage, block storage, and file storage built on a common distributed cluster foundation. Ceph provides distributed operation without a single point of failure and scalability to the exabyte level. Since version 12 (Luminous), Ceph does not rely on any other conventional filesystem and directly manages HDDs and SSDs with its own storage backend BlueStore and can expose a POSIX filesystem.
Moose File System (MooseFS) is an open-source, POSIX-compliant distributed file system developed by Core Technology. MooseFS aims to be a fault-tolerant, highly available, high-performance, scalable, general-purpose network distributed file system for data centers. Initially proprietary software, it was released to the public as open source on May 30, 2008.
XtreemFS is an object-based, distributed file system for wide area networks. Its most notable feature is full fault tolerance while maintaining POSIX file system semantics; fault tolerance is achieved using Paxos-based lease negotiation algorithms, which are used to replicate files and metadata. Support for SSL and X.509 certificates makes XtreemFS usable over public networks.
In computing, a distributed file system (DFS) or network file system is any file system that allows access from multiple hosts to files shared via a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources.
OrangeFS is an open-source parallel file system, the next generation of Parallel Virtual File System (PVFS). A parallel file system is a type of distributed file system that distributes file data across multiple servers and provides for concurrent access by multiple tasks of a parallel application. OrangeFS was designed for use in large-scale cluster computing and is used by companies, universities, national laboratories and similar sites worldwide.
A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on a different remote machine, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. At the same time, the security of the system must be ensured. Confidentiality, availability and integrity are the main requirements for a secure system.
GPI-Space is parallel programming development software, developed by the Fraunhofer Institute for Industrial Mathematics (ITWM). The main concept behind the software is the separation of domain knowledge from HPC knowledge, leaving each part to the respective experts, while GPI-Space as a framework integrates the two.
Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by a worldwide community of contributors, and the trademark is held by the Cloud Native Computing Foundation.
Trinity is a United States supercomputer built by the National Nuclear Security Administration (NNSA) for the Advanced Simulation and Computing Program (ASC). The aim of the ASC program is to simulate, test, and maintain the United States nuclear stockpile.
Summit or OLCF-4 is a supercomputer developed by IBM for use at Oak Ridge Leadership Computing Facility (OLCF), a facility at the Oak Ridge National Laboratory, United States of America. As of June 2024, it is the 9th fastest supercomputer in the world on the TOP500 list. It held the number 1 position on this list from November 2018 to June 2020. Its current LINPACK benchmark is clocked at 148.6 petaFLOPS.
LizardFS is an open-source distributed file system that is POSIX-compliant and licensed under GPLv3. It was released in 2013 as a fork of MooseFS. LizardFS also offers paid technical support, including cluster setup and configuration as well as active cluster monitoring.