Fossil (file system)

Last updated
Fossil
Developer(s) Bell Labs
Introduced2002;22 years ago (2002)
Preceded byKfs
Features
Transparent
compression
Yes
Transparent
encryption
No
Data deduplication Yes
Other
Supported
operating systems
Plan 9 from Bell Labs

Fossil is the default file system in Plan 9 from Bell Labs. It serves the network protocol 9P and runs as a user space daemon, like most Plan 9 file servers. Fossil is different from most other file systems due to its snapshot/archival feature. It can take snapshots of the entire file system on command or automatically (at a user-set interval). These snapshots can be kept on the Fossil partition as long as disk space allows; if the partition fills up then old snapshots will be removed to free up disk space. A snapshot can also be saved permanently to Venti. Fossil and Venti are typically installed together. [1]

Contents

Features

Important features include:

To access a snapshot, one would connect to a running fossil instance (“mount” it) and change directory to the desired snapshot, e.g. /snapshot/yyyy/mmdd/hhmm (with yyyy, mm, dd, hh, mm meaning year, month, day, hour, minute). To access an archive (permanent snapshot), a directory of the form /archive/yyyy/mmdds (with yyyy, mm, dd, s meaning year, month, day, sequence number) would be used. Plan 9 allows modifying the namespace in advanced ways, like redirecting one path to another path (e.g. /bin/ls to /archive/2005/1012/bin/ls). This significantly eases working with old versions of files.

Fossil is available on several other platforms via Plan 9 from User Space.

History

Fossil was designed and implemented by Sean Quinlan, Jim McKie and Russ Cox at Bell Labs and added to the Plan 9 distribution at the end of 2002. It became the default file system in 2003, replacing Kfs and the previous Plan 9 archival file system, dubbed The Plan 9 File Server, or "fs". fs is also an archival file system which originally was designed to store data on a WORM optical disc system. The permanent storage for fossil is provided by Venti, which typically stores data on hard drives, which have much lower access times than optical discs.

Related Research Articles

Universal Disk Format (UDF) is an open, vendor-neutral file system for computer data storage for a broad range of media. In practice, it has been most widely used for DVDs and newer optical disc formats, supplanting ISO 9660. Due to its design, it is very well suited to incremental updates on both write-once and re-writable optical media. UDF was developed and maintained by the Optical Storage Technology Association (OSTA).

In computer storage, logical volume management or LVM provides a method of allocating space on mass-storage devices that is more flexible than conventional partitioning schemes to store volumes. In particular, a volume manager can concatenate, stripe together or otherwise combine partitions into larger virtual partitions that administrators can re-size or move, potentially without interrupting system use.

In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server.

<span class="mw-page-title-main">File system</span> Format or program for storing files and directories

In computing, a file system or filesystem is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one large body of data with no way to tell where one piece of data stopped and the next began, or where any piece of data was located when it was time to retrieve it. By separating the data into pieces and giving each piece a name, the data are easily isolated and identified. Taking its name from the way a paper-based data management system is named, each group of data is called a "file". The structure and logic rules used to manage the groups of data and their names is called a "file system."

Filesystem in Userspace (FUSE) is a software interface for Unix and Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a bridge to the actual kernel interfaces.

The Write Anywhere File Layout (WAFL) is a proprietary file system that supports large, high-performance RAID arrays, quick restarts without lengthy consistency checks in the event of a crash or power failure, and growing the filesystems size quickly. It was designed by NetApp for use in its storage appliances like NetApp FAS, AFF, Cloud Volumes ONTAP and ONTAP Select.

A versioning file system is any computer file system which allows a computer file to exist in several versions at the same time. Thus it is a form of revision control. Most common versioning file systems keep a number of old copies of the file. Some limit the number of changes per minute or per hour to avoid storing large numbers of trivial changes. Others instead take periodic snapshots whose contents can be accessed using methods similar as those for normal file access.

<span class="mw-page-title-main">Snapshot (computer storage)</span> Recorded state of a computer storage system at a particular point in time

In computer systems, a snapshot is the state of a system at a particular point in time. The term was coined as an analogy to that in photography.

In computing, data recovery is a process of retrieving deleted, inaccessible, lost, corrupted, damaged, or formatted data from secondary storage, removable media or files, when the data stored in them cannot be accessed in a usual way. The data is most often salvaged from storage media such as internal or external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices. Recovery may be required due to physical damage to the storage devices or logical damage to the file system that prevents it from being mounted by the host operating system (OS).

Venti is a network storage system that permanently stores data blocks. A 160-bit SHA-1 hash of the data acts as the address of the data. This enforces a write-once policy since no other data block can be found with the same address: the addresses of multiple writes of the same data are identical, so duplicate data is easily identified and the data block is stored only once. Data blocks cannot be removed, making it ideal for permanent or backup storage. Venti is typically used with Fossil to provide a file system with permanent snapshots.

GPFS is high-performance clustered file system software developed by IBM. It can be deployed in shared-disk or shared-nothing distributed parallel modes, or a combination of these. It is used by many of the world's largest commercial companies, as well as some of the supercomputers on the Top 500 List. For example, it is the filesystem of the Summit at Oak Ridge National Laboratory which was the #1 fastest supercomputer in the world in the November 2019 TOP500 list of supercomputers. Summit is a 200 Petaflops system composed of more than 9,000 POWER9 processors and 27,000 NVIDIA Volta GPUs. The storage filesystem called Alpine has 250 PB of storage using Spectrum Scale on IBM ESS storage hardware, capable of approximately 2.5TB/s of sequential I/O and 2.2TB/s of random I/O.

Content-addressable storage (CAS), also referred to as content-addressed storage or fixed-content storage, is a way to store information so it can be retrieved based on its content, not its name or location. It has been used for high-speed storage and retrieval of fixed content, such as documents stored for compliance with government regulations. Content-addressable storage is similar to content-addressable memory.

Btrfs is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager, developed together. It was founded by Chris Mason in 2007 for use in Linux, and since November 2013, the file system's on-disk format has been declared stable in the Linux kernel.

In computer operating systems, mkfs is a command used to format a block storage device with a specific file system. The command is part of Unix and Unix-like operating systems. In Unix, a block storage device must be formatted with a file system before it can be mounted and accessed through the operating system's filesystem hierarchy.

In computing, virtualization or virtualisation in British English is the act of creating a virtual version of something at the same abstraction level, including virtual computer hardware platforms, storage devices, and computer network resources.

Moose File System (MooseFS) is an open-source, POSIX-compliant distributed file system developed by Core Technology. MooseFS aims to be fault-tolerant, highly available, highly performing, scalable general-purpose network distributed file system for data centers. Initially proprietary software, it was released to the public as open source on May 30, 2008.

Resilient File System (ReFS), codenamed "Protogon", is a Microsoft proprietary file system introduced with Windows Server 2012 with the intent of becoming the "next generation" file system after NTFS.

LizardFS is an open source distributed file system that is POSIX-compliant and licensed under GPLv3. It was released in 2013 as fork of MooseFS. LizardFS is also offering a paid Technical Support with possibility of configurating and setting up the cluster and active cluster monitoring.

References

  1. "Fossil, an Archival File Server". cat-v.org Documentation archive.

See also