GPFS

Developer(s): IBM
Full name: IBM Spectrum Scale
Introduced: 1998, with AIX

Limits
Max volume size: 8 YB
Max file size: 8 EB
Max no. of files: 2^64 per file system

Features
File system permissions: POSIX
Transparent encryption: yes

Other
Supported operating systems: AIX, Linux, Windows Server

GPFS (General Parallel File System, brand name IBM Storage Scale and previously IBM Spectrum Scale) [1] is high-performance clustered file system software developed by IBM. It can be deployed in shared-disk or shared-nothing distributed parallel modes, or a combination of these. It is used by many of the world's largest commercial companies, as well as some of the supercomputers on the TOP500 list. [2] For example, it is the filesystem of Summit [3] at Oak Ridge National Laboratory, which was the fastest supercomputer in the world in the November 2019 TOP500 list. [4] Summit is a 200-petaflops system composed of more than 9,000 POWER9 processors and 27,000 NVIDIA Volta GPUs. Its storage filesystem, called Alpine, [5] has 250 PB of storage using Spectrum Scale on IBM ESS storage hardware, and is capable of approximately 2.5 TB/s of sequential I/O and 2.2 TB/s of random I/O.

Like typical cluster filesystems, GPFS provides concurrent high-speed file access to applications executing on multiple nodes of clusters. It can be used with AIX clusters, Linux clusters, [6] Microsoft Windows Server, or a heterogeneous cluster of AIX, Linux and Windows nodes running on x86, Power or IBM Z processor architectures. In addition to providing filesystem storage capabilities, it provides tools for management and administration of the GPFS cluster and allows for shared access to file systems from remote clusters.

History

GPFS began as the Tiger Shark file system, a research project at IBM's Almaden Research Center as early as 1993. Tiger Shark was initially designed to support high throughput multimedia applications. This design turned out to be well suited to scientific computing. [7]

Another ancestor is IBM's Vesta filesystem, developed as a research project at IBM's Thomas J. Watson Research Center between 1992 and 1995. [8] Vesta introduced the concept of file partitioning to accommodate the needs of parallel applications that run on high-performance multicomputers with parallel I/O subsystems. With partitioning, a file is not a sequence of bytes, but rather multiple disjoint sequences that may be accessed in parallel. The partitioning is such that it abstracts away the number and type of I/O nodes hosting the filesystem, and it allows a variety of logically partitioned views of files, regardless of the physical distribution of data within the I/O nodes. The disjoint sequences are arranged to correspond to individual processes of a parallel application, allowing for improved scalability. [9] [10]
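
A minimal sketch of this partitioning idea is shown below in Python; the function name and the round-robin layout are illustrative assumptions, not Vesta's actual interface. Each process of a parallel application is given a disjoint, interleaved view of the file, so the views can be accessed in parallel without overlap.

```python
# Illustrative sketch (not Vesta's actual interface): a file viewed as disjoint,
# interleaved byte sequences, one per process of a parallel application.

def partition_view(data: bytes, num_procs: int, stride: int):
    """Split `data` into `num_procs` disjoint views using round-robin strides.

    Process p owns strides p, p + num_procs, p + 2*num_procs, ... so every
    byte belongs to exactly one view and the views can be read or written
    in parallel without overlapping.
    """
    views = [bytearray() for _ in range(num_procs)]
    for start in range(0, len(data), stride):
        owner = (start // stride) % num_procs
        views[owner].extend(data[start:start + stride])
    return [bytes(v) for v in views]

if __name__ == "__main__":
    payload = bytes(range(16))
    for p, view in enumerate(partition_view(payload, num_procs=4, stride=2)):
        print(f"process {p}: {view.hex(' ')}")
```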

Vesta was commercialized as the PIOFS filesystem around 1994, [11] and was succeeded by GPFS around 1998. [12] [13] The main difference between the older and newer filesystems was that GPFS replaced the specialized interface offered by Vesta/PIOFS with the standard Unix API: all the features to support high performance parallel I/O were hidden from users and implemented under the hood. [7] [13] GPFS also shared many components with the related products IBM Multi-Media Server and IBM Video Charger, which is why many GPFS utilities start with the prefix mm, short for multi-media. [14]: xi

GPFS has been available on IBM's AIX since 1998, on Linux since 2001, and on Windows Server since 2008.

Today it is used by many of the supercomputers on the TOP500 list. Since its inception, it has been successfully deployed for many commercial applications including digital media, grid analytics, and scalable file services.

In 2010, IBM previewed a version of GPFS that included a capability known as GPFS-SNC, where SNC stands for Shared Nothing Cluster. This was officially released with GPFS 3.5 in December 2012, and is now known as FPO [15] (File Placement Optimizer). FPO allows GPFS to use locally attached disks on a cluster of network-connected servers rather than requiring dedicated servers with shared disks (e.g. using a SAN). It is suited to workloads with high data locality, such as shared-nothing database clusters like SAP HANA and DB2 DPF, and can be used as an HDFS-compatible filesystem.

Architecture

GPFS is a clustered file system. It breaks a file into blocks of a configured size, less than 1 megabyte each, which are distributed across multiple cluster nodes.
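
The arithmetic behind this striping can be sketched as follows; the block size, server names and round-robin placement below are assumptions for illustration, not GPFS's actual allocation policy. A byte offset maps to a block index, and blocks are spread across the storage servers.

```python
# Illustrative sketch of block striping with round-robin placement; the block
# size, server names and layout are assumptions, not GPFS's actual allocator.

BLOCK_SIZE = 256 * 1024                            # e.g. a 256 KiB block size
SERVERS = ["nsd01", "nsd02", "nsd03", "nsd04"]     # hypothetical storage servers

def block_of(offset: int, block_size: int = BLOCK_SIZE) -> int:
    """Index of the file block that contains the given byte offset."""
    return offset // block_size

def server_of(block_index: int, servers=SERVERS) -> str:
    """Round-robin mapping of a block index to a storage server."""
    return servers[block_index % len(servers)]

if __name__ == "__main__":
    for offset in (0, 300_000, 1_500_000):
        b = block_of(offset)
        print(f"offset {offset:>9} -> block {b} on {server_of(b)}")
```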

The system stores data on standard block storage volumes, but includes an internal RAID layer that can virtualize those volumes for redundancy and parallel access much like a RAID block storage system. It also has the ability to replicate across volumes at the higher file level.
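
The file-level replication idea can be sketched as placing each copy of a block in a different failure group; the layout below is a hypothetical Python illustration, not GPFS's actual allocator.

```python
# Illustrative sketch of file-level replication: keep `copies` replicas of each
# block, each replica in a different failure group (hypothetical layout).
FAILURE_GROUPS = {
    1: ["disk1a", "disk1b"],
    2: ["disk2a", "disk2b"],
    3: ["disk3a", "disk3b"],
}

def place_block(block_index: int, copies: int = 2):
    """Choose `copies` disks from distinct failure groups for one block."""
    groups = sorted(FAILURE_GROUPS)
    placement = []
    for i in range(copies):
        group = groups[(block_index + i) % len(groups)]
        disks = FAILURE_GROUPS[group]
        placement.append(disks[block_index % len(disks)])
    return placement

if __name__ == "__main__":
    for b in range(4):
        print(f"block {b}: replicas on {place_block(b)}")
```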

Features of the architecture include high availability, the ability to be used in a heterogeneous cluster, disaster recovery, security, DMAPI, HSM and ILM.

Compared to Hadoop Distributed File System (HDFS)

Hadoop's HDFS filesystem is designed to store similar or greater quantities of data on commodity hardware, that is, in datacenters without RAID disks or a storage area network (SAN).

Information lifecycle management

Storage pools allow for the grouping of disks within a file system. An administrator can create tiers of storage by grouping disks based on performance, locality or reliability characteristics. For example, one pool could be high-performance Fibre Channel disks and another more economical SATA storage.

A fileset is a sub-tree of the file system namespace and provides a way to partition the namespace into smaller, more manageable units. Filesets provide an administrative boundary that can be used to set quotas and be specified in a policy to control initial data placement or data migration. Data in a single fileset can reside in one or more storage pools. Where the file data resides and how it is migrated is based on a set of rules in a user-defined policy.

There are two types of user-defined policies: file placement and file management. File placement policies direct file data to the appropriate storage pool as files are created. File placement rules are selected by attributes such as file name, the user name or the fileset. File management policies allow the file's data to be moved or replicated, or files to be deleted. File management policies can be used to move data from one pool to another without changing the file's location in the directory structure. File management policies are determined by file attributes such as last access time, path name or size of the file.
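
The sketch below models the two policy types as a conceptual illustration; the rule logic, pool names and thresholds are assumptions for the example, and GPFS itself expresses such rules in its own SQL-like policy language rather than in code like this.

```python
# Conceptual model of placement and management policies. The rule logic, pool
# names and thresholds are hypothetical; GPFS itself uses an SQL-like policy
# language, not Python code.
from dataclasses import dataclass
from typing import Optional
import time

@dataclass
class FileInfo:
    name: str
    fileset: str
    size: int            # bytes
    atime: float         # last access time, seconds since the epoch
    pool: str = "system"

def placement_pool(f: FileInfo) -> str:
    """Placement policy: pick the storage pool for a newly created file."""
    if f.fileset == "scratch":
        return "sata"                    # economical pool for scratch data
    if f.name.endswith((".mp4", ".mkv")):
        return "media"                   # hypothetical pool for media files
    return "system"                      # default high-performance pool

def management_action(f: FileInfo, now: float) -> Optional[str]:
    """Management policy: migrate cold files, delete very old scratch files."""
    days_idle = (now - f.atime) / 86400
    if f.fileset == "scratch" and days_idle > 90:
        return "DELETE"
    if f.pool == "system" and days_idle > 30:
        return "MIGRATE to pool 'sata'"
    return None

if __name__ == "__main__":
    now = time.time()
    files = [
        FileInfo("results.csv", "projects", 4_096, now - 40 * 86400),
        FileInfo("tmp.dat", "scratch", 1 << 20, now - 120 * 86400),
    ]
    for f in files:
        print(f.name, "->", placement_pool(f), "|", management_action(f, now))
```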

The policy processing engine is scalable and can be run on many nodes at once. This allows management policies to be applied to a single file system with billions of files and complete in a few hours. [citation needed]

Related Research Articles

In computing, the Global File System 2 or GFS2 is a shared-disk file system for Linux computer clusters. GFS2 allows all members of a cluster to have direct concurrent access to the same shared block storage, in contrast to distributed file systems which distribute data throughout the cluster. GFS2 can also be used as a local file system on a single computer.

Google File System is a proprietary distributed file system developed by Google to provide efficient, reliable access to data using large clusters of commodity hardware. It was replaced by Colossus in 2010.

The VERITAS File System is an extent-based file system. It was originally developed by VERITAS Software. Through an OEM agreement, VxFS is used as the primary filesystem of the HP-UX operating system. With on-line defragmentation and resize support turned on via license, it is known as OnlineJFS. It is also supported on AIX, Linux, Solaris, OpenSolaris, SINIX/Reliant UNIX, UnixWare and SCO OpenServer. VxFS was originally developed for AT&T's Unix System Laboratories. VxFS is packaged as a part of the Veritas Storage Foundation.

Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster. Lustre file system software is available under the GNU General Public License and provides high performance file systems for computer clusters ranging in size from small workgroup clusters to large-scale, multi-site systems. Since June 2005, Lustre has consistently been used by at least half of the top ten, and more than 60 of the top 100 fastest supercomputers in the world, including the world's No. 1 ranked TOP500 supercomputer in November 2022, Frontier, as well as previous top supercomputers such as Fugaku, Titan and Sequoia.

In computing, a fileset is a set of computer files linked by a defining property or common characteristic. There are different types of fileset, though the context will usually give the defining characteristic. Sometimes it is necessary to explicitly state the fileset type to avoid ambiguity; for example, the Emacs editor explicitly mentions its Version Control (VC) fileset type to distinguish it from its "named files" fileset type.

The Parallel Virtual File System (PVFS) is an open-source parallel file system. A parallel file system is a type of distributed file system that distributes file data across multiple servers and provides for concurrent access by multiple tasks of a parallel application. PVFS was designed for use in large scale cluster computing. PVFS focuses on high performance access to large data sets. It consists of a server process and a client library, both of which are written entirely in user-level code. A Linux kernel module and pvfs-client process allow the file system to be mounted and used with standard utilities. The client library provides for high performance access via the message passing interface (MPI). PVFS is jointly developed by the Parallel Architecture Research Laboratory at Clemson University, the Mathematics and Computer Science Division at Argonne National Laboratory, and the Ohio Supercomputer Center. PVFS development has been funded by NASA Goddard Space Flight Center, the DOE Office of Science Advanced Scientific Computing Research program, NSF PACI and HECURA programs, and other government and private agencies. PVFS is now known as OrangeFS in its newest development branch.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is an advanced computing research center that is based on comprehensive advanced computing resources and supports services to researchers in Texas and across the U.S. The mission of TACC is to enable discoveries that advance science and society through the application of advanced computing technologies. Specializing in high performance computing, scientific visualization, data analysis & storage systems, software, research & development and portal interfaces, TACC deploys and operates advanced computational infrastructure to enable the research activities of faculty, staff, and students of UT Austin. TACC also provides consulting, technical documentation, and training to support researchers who use these resources. TACC staff members conduct research and development in applications and algorithms, computing systems design/architecture, and programming tools and environments.

<span class="mw-page-title-main">Dell EMC Isilon</span> Network-attached storage

Dell EMC Isilon is a scale out network-attached storage platform offered by Dell EMC for high-volume storage, backup and archiving of unstructured data. It provides a cluster-based storage array based on industry standard hardware, and is scalable to 50 petabytes in a single filesystem using its FreeBSD-derived OneFS file system.

A clustered file system (CFS) is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system. Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.

Ceph is a free and open-source software-defined storage platform that provides object storage, block storage, and file storage built on a common distributed cluster foundation. Ceph provides completely distributed operation without a single point of failure and scalability to the exabyte level, and is freely available. Since version 12 (Luminous), Ceph does not rely on any other conventional filesystem and directly manages HDDs and SSDs with its own storage backend BlueStore and can expose a POSIX filesystem.

<span class="mw-page-title-main">Computer cluster</span> Set of computers configured in a distributed computing system

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The newest manifestation of cluster computing is cloud computing.

The National Center for Computational Sciences (NCCS) is a United States Department of Energy (DOE) Leadership Computing Facility that houses the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility charged with helping researchers solve challenging scientific problems of global interest with a combination of leading high-performance computing (HPC) resources and international expertise in scientific computing.

Data-intensive computing is a class of parallel computing applications which use a data-parallel approach to process large volumes of data, typically terabytes or petabytes in size and commonly referred to as big data. Computing applications that devote most of their execution time to computational requirements are deemed compute-intensive, whereas applications deemed data-intensive require large volumes of data and devote most of their processing time to I/O and manipulation of data.

HPCC (High-Performance Computing Cluster), also known as DAS (Data Analytics Supercomputer), is an open-source, data-intensive computing system platform developed by LexisNexis Risk Solutions. The HPCC platform incorporates a software architecture implemented on commodity computing clusters to provide high-performance, data-parallel processing for applications utilizing big data. The HPCC platform includes system configurations to support both parallel batch data processing (Thor) and high-performance online query applications using indexed data files (Roxie). The HPCC platform also includes a data-centric declarative programming language for parallel data processing called ECL.

<span class="mw-page-title-main">National Computer Center for Higher Education (France)</span>

The National Computer Center for Higher Education (CINES), based in Montpellier, is a public institution under the supervision of the Ministry of Higher Education and Research (MESR), created by a decree issued in 1999. CINES offers IT services for public research in France and is one of the major national suppliers of computing power for research in France.

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system.

<span class="mw-page-title-main">Summit (supercomputer)</span> Supercomputer developed by IBM

Summit or OLCF-4 is a supercomputer developed by IBM for use at the Oak Ridge Leadership Computing Facility (OLCF), a facility at the Oak Ridge National Laboratory. Capable of 200 petaFLOPS, it is the 5th fastest supercomputer in the world after Frontier (OLCF-5), Fugaku, LUMI, and Leonardo, with Frontier being the fastest. It held the number 1 position from November 2018 to June 2020. Its current LINPACK benchmark is clocked at 148.6 petaFLOPS.

The MapR File System is a clustered file system that supports both very large-scale and high-performance uses. MapR FS supports a variety of interfaces including conventional read/write file access via NFS and a FUSE interface, as well as via the HDFS interface used by many systems such as Apache Hadoop and Apache Spark. In addition to file-oriented access, MapR FS supports access to tables and message streams using the Apache HBase and Apache Kafka APIs, as well as via a document database interface.

References

  1. "GPFS (General Parallel File System)". IBM. Retrieved 2020-04-07.
  2. Schmuck, Frank; Roger Haskin (January 2002). "GPFS: A Shared-Disk File System for Large Computing Clusters" (PDF). Proceedings of the FAST'02 Conference on File and Storage Technologies. Monterey, California, US: USENIX. pp. 231–244. ISBN   1-880446-03-0 . Retrieved 2008-01-18.
  3. "Summit compute systems". Oak Ridge National Laboratory. Retrieved 2020-04-07.
  4. "November 2019 top500 list". top500.org. Archived from the original on 2020-01-02. Retrieved 2020-04-07.
  5. "Summit FAQ". Oak Ridge National Laboratory. Retrieved 2020-04-07.
  6. Wang, Teng; Vasko, Kevin; Liu, Zhuo; Chen, Hui; Yu, Weikuan (Nov 2014). "BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution". 2014 International Workshop on Data Intensive Scalable Computing Systems. IEEE. pp. 25–32. doi:10.1109/DISCS.2014.6. ISBN   978-1-4673-6750-9. S2CID   2402391.
  7. 1 2 May, John M. (2000). Parallel I/O for High Performance Computing. Morgan Kaufmann. p. 92. ISBN   978-1-55860-664-7 . Retrieved 2008-06-18.
  8. Corbett, Peter F.; Feitelson, Dror G.; Prost, J.-P.; Baylor, S. J. (1993). "Parallel access to files in the Vesta file system". Proceedings of the 1993 ACM/IEEE conference on Supercomputing - Supercomputing '93. Portland, Oregon, United States: ACM/IEEE. pp. 472–481. doi:10.1145/169627.169786. ISBN   978-0818643408. S2CID   46409100.
  9. Corbett, Peter F.; Feitelson, Dror G. (August 1996). "The Vesta parallel file system" (PDF). Transactions on Computer Systems. 14 (3): 225–264. doi:10.1145/233557.233558. S2CID   11975458. Archived from the original on 2012-02-12. Retrieved 2008-06-18.{{cite journal}}: CS1 maint: bot: original URL status unknown (link)
  10. Teng Wang; Kevin Vasko; Zhuo Liu; Hui Chen; Weikuan Yu (2016). "Enhance parallel input/output with cross-bundle aggregation". The International Journal of High Performance Computing Applications. 30 (2): 241–256. doi:10.1177/1094342015618017. S2CID   12067366.
  11. Corbett, P. F.; D. G. Feitelson; J.-P. Prost; G. S. Almasi; S. J. Baylor; A. S. Bolmarcich; Y. Hsu; J. Satran; M. Snir; R. Colao; B. D. Herr; J. Kavaky; T. R. Morgan; A. Zlotek (1995). "Parallel file systems for the IBM SP computers" (PDF). IBM Systems Journal. 34 (2): 222–248. CiteSeerX   10.1.1.381.2988 . doi:10.1147/sj.342.0222. Archived from the original on 2004-04-19. Retrieved 2008-06-18.{{cite journal}}: CS1 maint: bot: original URL status unknown (link)
  12. Barris, Marcelo; Terry Jones; Scott Kinnane; Mathis Landzettel Safran Al-Safran; Jerry Stevens; Christopher Stone; Chris Thomas; Ulf Troppens (September 1999). Sizing and Tuning GPFS (PDF). IBM Redbooks, International Technical Support Organization. see page 1 ("GPFS is the successor to the PIOFS file system"). Archived from the original on 2010-12-14. Retrieved 2022-12-06.{{cite book}}: CS1 maint: bot: original URL status unknown (link)
  13. 1 2 Snir, Marc (June 2001). "Scalable parallel systems: Contributions 1990-2000" (PDF). HPC seminar, Computer Architecture Department, Universitat Politècnica de Catalunya. Retrieved 2008-06-18.
  14. General Parallel File System Administration and Programming Reference Version 3.1 (PDF). IBM. April 2006.
  15. "IBM GPFS FPO (DCS03038-USEN-00)" (PDF). IBM Corporation. 2013. Retrieved 2012-08-12.[ permanent dead link ]