Andrew File System


The Andrew File System (AFS) is a distributed file system that uses a set of trusted servers to present a homogeneous, location-transparent file name space to all client workstations. It was developed by Carnegie Mellon University as part of the Andrew Project. [1] The system was originally named "Vice"; [2] "Andrew" refers to Andrew Carnegie and Andrew Mellon. Its primary use is in distributed computing.


Features

AFS [3] has several benefits over traditional networked file systems, particularly in the areas of security and scalability. One enterprise AFS deployment at Morgan Stanley exceeds 25,000 clients. [4] AFS uses Kerberos for authentication, and implements access control lists on directories for users and groups. Each client caches files on the local filesystem for increased speed on subsequent requests for the same file. This also allows limited filesystem access in the event of a server crash or a network outage.

AFS uses a weak consistency model. [5] Read and write operations on an open file are directed only to the locally cached copy. When a modified file is closed, the changed portions are copied back to the file server. Cache consistency is maintained by a callback mechanism: when a file is cached, the server makes a note of this and promises to inform the client if the file is updated by someone else. Callbacks are discarded and must be re-established after any client, server, or network failure, including a timeout. Re-establishing a callback involves a status check and does not require re-reading the file itself.
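The open, write, close and callback cycle described above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: the class and method names (FileServer, Client, break_callback, demo) and the example path are hypothetical and are not part of any AFS implementation, and the toy copies the whole file back on close rather than only the changed portions.

```python
class FileServer:
    """Toy file server that remembers which clients cache each file."""

    def __init__(self):
        self.files = {}       # path -> bytes
        self.callbacks = {}   # path -> set of clients holding a callback promise

    def fetch(self, client, path):
        # Hand out the file and record a callback promise for this client.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        # Accept new contents, then notify every other client caching the file.
        self.files[path] = data
        for client in self.callbacks.get(path, set()) - {writer}:
            client.break_callback(path)
        self.callbacks[path] = {writer}


class Client:
    """Toy client: reads and writes touch only the local cache until close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}      # path -> bytes (locally cached copy)
        self.valid = set()   # paths whose callback is still intact
        self.dirty = set()   # paths modified locally since the last close

    def open(self, path):
        # Contact the server only if there is no valid cached copy.
        if path not in self.valid:
            self.cache[path] = self.server.fetch(self, path)
            self.valid.add(path)

    def read(self, path):
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        self.dirty.add(path)

    def close(self, path):
        # Modified files are copied back to the server on close.
        if path in self.dirty:
            self.server.store(self, path, self.cache[path])
            self.dirty.discard(path)

    def break_callback(self, path):
        # Called by the server: the cached copy can no longer be trusted.
        self.valid.discard(path)


def demo():
    path = "/afs/example.org/notes.txt"   # hypothetical path
    server = FileServer()
    server.files[path] = b"v1"
    a, b = Client(server), Client(server)
    a.open(path)
    b.open(path)
    b.write(path, b"v2")
    stale = a.read(path)   # b's write is still local, so a sees the old data
    b.close(path)          # store on close breaks a's callback
    a.open(path)           # a refetches because its callback was broken
    fresh = a.read(path)
    return stale, fresh
```

In this sketch a client that still holds a valid callback never contacts the server on open, which is the property that lets callbacks reduce server load compared with checking the file on every open.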

A consequence of the file locking strategy is that AFS does not support large shared databases or record updating within files shared between client systems. This was a deliberate design decision based on the perceived needs of the university computing environment. For example, in the original email system for the Andrew Project, the Andrew Message System, a single file per message is used, like maildir, rather than a single file per mailbox, like mbox. See AFS and buffered I/O Problems for handling shared databases.

A significant feature of AFS is the volume, a tree of files, sub-directories and AFS mountpoints (links to other AFS volumes). Volumes are created by administrators and linked at a specific named path in an AFS cell. Once a volume is created, users of the filesystem may create directories and files as usual without concern for the physical location of the volume. A volume may have a quota assigned to it in order to limit the amount of space consumed. As needed, AFS administrators can move a volume to another server and disk location without notifying users; the operation can even occur while files in that volume are being used.

AFS volumes can be replicated to read-only cloned copies. When accessing files in a read-only volume, a client system retrieves data from a particular read-only copy. If that copy becomes unavailable, the client looks for any of the remaining copies. Again, users of that data are unaware of the location of the read-only copy; administrators can create and relocate such copies as needed. The AFS command suite guarantees that all read-only volumes contain exact copies of the original read-write volume at the time the read-only copy was created.
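The failover behaviour for read-only copies can be sketched as a loop over candidate sites. The names below (ReplicaUnavailable, read_from_replicas, the site labels and the fetch callables) are hypothetical and only illustrate the idea; the way a real AFS client locates and ranks read-only sites is not described here.

```python
class ReplicaUnavailable(Exception):
    """Raised when a read-only site cannot be reached."""


def read_from_replicas(replicas, path):
    """Try each read-only site in order and return (site_name, data)
    from the first one that answers."""
    last_error = None
    for name, fetch in replicas:
        try:
            return name, fetch(path)
        except ReplicaUnavailable as err:
            last_error = err   # remember the failure and try the next copy
    raise ReplicaUnavailable(f"no read-only copy of {path} reachable") from last_error


def offline_site(path):
    # Stand-in for a site that is down.
    raise ReplicaUnavailable("site offline")


def healthy_site(path):
    # Stand-in for a site that serves the file.
    return b"contents of " + path.encode()
```

Because every read-only copy is an exact clone of the read-write volume at release time, the caller does not need to care which site answered.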

The file name space on an Andrew workstation is partitioned into a shared and local name space. The shared name space (usually mounted as /afs on the Unix filesystem) is identical on all workstations. The local name space is unique to each workstation. It only contains temporary files needed for workstation initialization and symbolic links to files in the shared name space.

The Andrew File System heavily influenced Version 4 of Sun Microsystems' popular Network File System (NFS). Additionally, a variant of AFS, the DCE Distributed File System (DFS), was adopted by the Open Software Foundation in 1989 as part of its Distributed Computing Environment. Finally, AFS (version two) was the predecessor of the Coda file system.

Implementations

Besides the original, a few other implementations were developed. OpenAFS was built from source released by Transarc (IBM) in 2000. [6] The Transarc software was subsequently deprecated and lost support. Arla was an independent implementation of AFS developed at the Royal Institute of Technology in Stockholm in the late 1990s and early 2000s. [7] [8]

A fourth implementation, an AFS client, has been part of the Linux kernel source code since at least version 2.6.10. [9] Committed by Red Hat, it is a fairly simple implementation that was still incomplete as of January 2013. [10]

Available permissions

The following Access Control List (ACL) permissions can be granted:

Lookup (l)
allows a user to list the contents of the AFS directory, examine the ACL associated with the directory and access subdirectories.
Insert (i)
allows a user to add new files or subdirectories to the directory.
Delete (d)
allows a user to remove files and subdirectories from the directory.
Administer (a)
allows a user to change the ACL for the directory. Users always have this right on their home directory, even if they accidentally remove themselves from the ACL.

Permissions that affect files and subdirectories include:

Read (r)
allows a user to look at the contents of files in a directory and list files in subdirectories. Files that are to be granted read access to any user, including the owner, need to have the standard UNIX "owner read" permission set.
Write (w)
allows a user to modify files in a directory. Files that are to be granted write access to any user, including the owner, need to have the standard UNIX "owner write" permission set.
Lock (k)
allows a user to run programs that need to lock ("flock") files in the directory.

Additionally, AFS includes Application ACLs (A)-(H) which have no effect on access to files.
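As an illustration of how the single-letter rights combine, the following sketch parses rights strings and checks whether a user is granted a given right through their own ACL entry or through a group entry. It is a simplified model under stated assumptions: the function names, the dictionary representation of an ACL, and the user and group names are hypothetical, and only the positive rights listed above are modelled.

```python
DIRECTORY_RIGHTS = {"l": "lookup", "i": "insert", "d": "delete", "a": "administer"}
FILE_RIGHTS = {"r": "read", "w": "write", "k": "lock"}
APPLICATION_RIGHTS = set("ABCDEFGH")   # stored on the ACL, no effect on access


def parse_rights(letters):
    """Return the set of access-affecting right letters in a rights string."""
    rights = set()
    for ch in letters:
        if ch in DIRECTORY_RIGHTS or ch in FILE_RIGHTS:
            rights.add(ch)
        elif ch in APPLICATION_RIGHTS:
            continue   # application bits are ignored for access decisions
        else:
            raise ValueError(f"unknown ACL right: {ch!r}")
    return rights


def is_allowed(acl, principals, right):
    """Check a right against a directory ACL.

    acl        -- dict mapping a user or group name to a rights string
    principals -- the user's own name plus the groups the user belongs to
    right      -- a single letter such as "r" or "w"
    """
    granted = set()
    for who in principals:
        granted |= parse_rights(acl.get(who, ""))
    return right in granted


# Hypothetical ACL for one directory.
example_acl = {"alice": "rlidwka", "staff": "rl", "bob": "lA"}
```

With this ACL, a user named carol who belongs to the staff group can read and look up entries in the directory but cannot write, because the rights are attached to the directory rather than to individual files.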

References

  1. What is Andrew Archived September 9, 2011, at the Wayback Machine - part of CMU's official site chronicling the history of the Andrew Project.
  2. Garfinkel, Simson L. (May–June 1989). "Ripples Across the Academic Market" (PDF). Technology Review. pp. 9–13. Retrieved 25 January 2016.
  3. Howard, J.H.; Kazar, M.L.; Menees, S.G.; Nichols, D.A.; Satyanarayanan, M.; Sidebotham, R.N. & West, M.J. (February 1988). "Scale and Performance in a Distributed File System". ACM Transactions on Computer Systems. 6 (1): 51–81. CiteSeerX 10.1.1.71.5072. doi:10.1145/35037.35059. S2CID 52848606.
  4. Moore, Phillip (2004). "When Your Business Depends On It — The Evolution of a Global File System for a Global Enterprise" (PDF).
  5. Yaniv Pessach (2013), Distributed Storage (Distributed Storage: Concepts, Algorithms, and Implementations ed.), Amazon, OL 25423189M
  6. Opening Up AFS
  7. Assar Westerlund and Johan Danielsson (1998). "Arla-a free AFS client". Proceedings of the 1998 USENIX, Freenix Track. CiteSeerX 10.1.1.16.1360.
  8. Magnus Ahltorp, Love Hörnquist-Åstrand and Assar Westerlund (2000). "Porting the Arla file system to Windows NT". Workshop on Management and Administration of Distributed Environments. CiteSeerX 10.1.1.512.9570.
  9. Linux kernel AFS documentation for 2.6.10
  10. "LXR linux/Documentation/filesystems/afs.txt". linux.no. 1 August 2012. Archived from the original on 1 August 2012. Retrieved 23 April 2018.
