Global file system

In computer storage, a global file system is a distributed file system that can be accessed from multiple locations, typically across a wide-area network, and provides concurrent access to a global namespace from all locations. In order for a file system to be considered global, it must allow for files to be created, modified, and deleted from any location. This access is typically provided by a cloud storage gateway at each edge location, which provides access using the NFS or SMB network file sharing protocols. [1]

Global file systems offer several benefits. First, they can improve the availability of data by allowing multiple copies to be stored in different locations and by allowing lost data to be restored rapidly from a remote location. This can be helpful in the event of a disaster, such as a power outage or a natural disaster. Second, they can improve performance by caching data closer to the users who access it, which is especially beneficial when users are spread across different parts of the world. Finally, in contrast to traditional network-attached storage, global file systems can make it easier for users to collaborate across multiple sites, in a manner similar to enterprise file synchronization and sharing. [1]

History

The term global file system has historically referred to a distributed virtual namespace built on a set of local file systems to provide transparent access to multiple, potentially distributed, systems. [2] These global file systems had the same properties as the underlying local file systems, such as a blocking interface and no buffering, but guaranteed that the same path name corresponds to the same object on all computers deploying the file system. Also called distributed file systems, these file systems rely on redirection to distributed systems, so latency and scalability can affect file access depending on where the target systems reside.
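The redirection idea can be illustrated with a minimal sketch, assuming a simple table that maps path prefixes to the server holding each subtree. The host names, export directories, and the `resolve` function are hypothetical and are not taken from any particular system; the point is only that every client consulting the same table resolves a given path name to the same object.

```python
from typing import NamedTuple, Tuple


class Mount(NamedTuple):
    host: str        # hypothetical server name
    local_root: str  # directory on that server's local file system


# Hypothetical global namespace: path prefix -> where that subtree lives.
NAMESPACE = {
    "/projects": Mount("fs-amsterdam", "/export/projects"),
    "/projects/archive": Mount("fs-boston", "/data/archive"),
    "/home": Mount("fs-boston", "/export/home"),
}


def resolve(global_path: str) -> Tuple[str, str]:
    """Redirect a global path to (host, local path) by longest-prefix match."""
    best = None
    for prefix in NAMESPACE:
        # Match whole path components only, so "/projectsX" does not match "/projects".
        if global_path == prefix or global_path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise FileNotFoundError(global_path)
    mount = NAMESPACE[best]
    return mount.host, mount.local_root + global_path[len(best):]
```

Because each access is redirected to whichever host owns the subtree, the cost of opening a file in this model depends on how far away that host is.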


The Andrew File System attempted to solve this for a campus environment using caching and a weak consistency model to achieve local access to remote files.

In the 2000s, global file systems found a use case in providing hybrid cloud storage. These systems combine cloud or other object storage, versioning, and local caching to create a single, unified, globally accessible file system that does not rely on redirection to a storage device [3] but serves files from the local cache while maintaining the single file system and all metadata in the object storage. [4] As described in Nasuni's patents, advantages of these global file systems include the ability to scale with the object storage, to use snapshots stored in the object storage for versioning in place of backup, and to create a centrally managed, consolidated storage repository in the object storage.
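How snapshots kept in object storage can stand in for backup can be shown with a minimal sketch. It assumes a content-addressed object store (modelled here as an in-memory dictionary) and immutable snapshot manifests that map each path to an object key. The `VersionedStore` class and its methods are hypothetical illustrations and do not reproduce the method of the cited patents.

```python
import hashlib
from typing import Dict, List, Optional


class VersionedStore:
    """Toy versioned file system whose data and snapshots live in an object store."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}         # stands in for an object storage bucket
        self.snapshots: List[Dict[str, str]] = []   # each manifest maps path -> object key
        self.working: Dict[str, str] = {}           # current, unsnapshotted state

    def write(self, path: str, data: bytes) -> None:
        key = hashlib.sha256(data).hexdigest()      # content-addressed: objects are never overwritten
        self.objects[key] = data
        self.working[path] = key

    def delete(self, path: str) -> None:
        self.working.pop(path, None)                # older snapshots still reference the object

    def snapshot(self) -> int:
        self.snapshots.append(dict(self.working))   # freeze a copy of the manifest
        return len(self.snapshots) - 1              # version number of the new snapshot

    def read(self, path: str, version: Optional[int] = None) -> bytes:
        manifest = self.working if version is None else self.snapshots[version]
        return self.objects[manifest[path]]
```

In this sketch a file that is later overwritten or deleted remains readable through any earlier snapshot, for example `store.read("/a.txt", version=0)`, which is the property that lets snapshots replace a separate backup copy.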

Comparison with Network Attached Storage

When it comes to hybrid file storage, there are two main approaches: network attached storage (NAS) with cloud connectivity and global file system (GFS). The two solutions are fundamentally different. [5]

NAS with cloud connectivity is typically used to supplement on-premises storage. Public clouds may be combined with on-premises NAS for tasks such as backup, tiering, or disaster recovery. This type of setup uses the cloud for specific use cases to complement on-premises storage. On-premises NAS is sold by well-established IT vendors including Dell, IBM, NetApp, and others, and most build in support for some type of cloud connectivity. [5]

A global file system uses a fundamentally different architecture. In these solutions, cloud storage, typically object storage, serves as the core storage element, while caching devices are deployed on-premises to provide data access. These devices can be physical but are increasingly available as virtual appliances that can be deployed in a hypervisor. The use of caching devices reduces the amount of on-premises storage capacity required, and the associated capital expense. [5]
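The relationship between the central object store and an on-premises caching device can be sketched as a read-through, write-through cache with least-recently-used eviction. This is an illustrative assumption only: the `EdgeCache` class is hypothetical, the object store is modelled as a plain dictionary, and capacity is counted in files rather than bytes to keep the example short. Real products may use different caching and eviction policies.

```python
from collections import OrderedDict
from typing import Dict


class EdgeCache:
    """Toy on-premises cache in front of a central object store."""

    def __init__(self, object_store: Dict[str, bytes], capacity: int) -> None:
        self.store = object_store      # authoritative copy, held in the cloud
        self.capacity = capacity       # maximum number of files kept locally
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self.cache:
            self.cache.move_to_end(path)   # mark as most recently used
            self.hits += 1
            return self.cache[path]
        self.misses += 1
        data = self.store[path]            # would be a slow WAN fetch in a real deployment
        self._admit(path, data)
        return data

    def write(self, path: str, data: bytes) -> None:
        self.store[path] = data            # write-through to the central copy
        self._admit(path, data)

    def _admit(self, path: str, data: bytes) -> None:
        self.cache[path] = data
        self.cache.move_to_end(path)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used file
```

Because only the working set is held locally, the cache can be much smaller than the full data set; the cost is that a read of a file that has been evicted, or never cached, has to go back to the object store.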

Global file systems are better suited for remote collaboration, as they make it easier to manage access to files across dispersed geographic areas. Utilizing the cloud as a central storage location enables users to access the same data regardless of their location. [5]

There are some trade-offs to consider when choosing a GFS solution, however. One trade-off is that, because the gold (authoritative) copy of the data is stored off-site, there may be latency when retrieving infrequently accessed files that are not held in the local cache. [5]

Vendors

Notable vendors in the global filesystem area include: [1]

References

  1. Pritchard, Stephen (23 June 2022). "Global file systems: Hybrid cloud and follow-the-sun access". Computer Weekly. Retrieved 23 June 2022.
  2. Parallel Database Systems: PRISMA Workshop, Netherlands, September 24–26, 1990. Edited by Pierre America (17 July 1991). ISBN 3-540-54132-2. p. 410.
  3. "Method and system for versioned file system using structured data representations".
  4. "Versioned file system with sharing".
  5. Lewis, Mitch (13 September 2022). "Technical Insight: File Storage Selection – NAS vs Global File Systems". Evaluator Group.
  6. "PeerGFS File Management and Orchestration across Edge, Data Center and Cloud Storage". November 2022.