Coda (file system)

Coda
Developer(s): Carnegie Mellon University
Introduced: 1987
Supported operating systems: Linux kernel, NetBSD, FreeBSD
Initial release: 1987
Stable release: 8.0.2 [1] / May 29, 2020
Repository: github.com/cmusatyalab/coda
Written in: C
Type: Distributed file system
License: GPL v2
Website: coda.cs.cmu.edu

Coda is a distributed file system that has been developed as a research project at Carnegie Mellon University since 1987 under the direction of Mahadev Satyanarayanan. It descended directly from an older version of the Andrew File System (AFS-2) and offers many similar features. The InterMezzo file system was inspired by Coda.

Features

Coda has many features that are desirable for network file systems, and several features not found elsewhere.

  1. Disconnected operation for mobile computing
  2. Free availability under the GPL [2]
  3. High performance through client-side persistent caching
  4. Server replication
  5. Security model for authentication, encryption and access control
  6. Continued operation during partial network failures in the server network
  7. Network bandwidth adaptation
  8. Good scalability
  9. Well-defined semantics of sharing, even in the presence of network failure

Coda uses a local cache to provide access to server data when the network connection is lost. During normal operation, a user reads from and writes to the file system as usual, while the client fetches, or "hoards", all of the data the user has listed as important, so that it is available if the network is disconnected. If the network connection is lost, the Coda client serves data from its local cache and logs all updates. This operating state is called disconnected operation. Upon network reconnection, the client moves to the reintegration state, in which it sends the logged updates to the servers. It then transitions back to normal connected-mode operation.
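The connected, disconnected, and reintegration states described above can be sketched as a small state machine. This is a toy model under stated assumptions: the class `ToyClient`, its method names, the state names in `State`, and the dict standing in for the servers are all hypothetical and are not taken from Coda's implementation. Conflict detection during reintegration is deliberately left out.

```python
from enum import Enum, auto


class State(Enum):
    HOARDING = auto()       # connected: normal operation, cache kept filled
    DISCONNECTED = auto()   # network lost: serve from cache, log updates
    REINTEGRATION = auto()  # network back: send logged updates to servers


class ToyClient:
    """Hypothetical model of the client states; not Coda's actual code."""

    def __init__(self, server, hoard_list):
        self.server = server          # dict standing in for the file servers
        self.hoard_list = hoard_list  # paths the user listed as important
        self.cache = {}               # local cache: path -> contents
        self.log = []                 # updates made while disconnected
        self.state = State.HOARDING
        self.hoard()

    def hoard(self):
        # Fetch every hoarded path that exists on the server into the cache.
        for path in self.hoard_list:
            if path in self.server:
                self.cache[path] = self.server[path]

    def read(self, path):
        if self.state is State.HOARDING:
            # Connected: fetch from the server and refresh the cache.
            self.cache[path] = self.server[path]
        # Disconnected: only cached data is available (KeyError on a miss).
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        if self.state is State.HOARDING:
            self.server[path] = data       # connected: write through
        else:
            self.log.append((path, data))  # disconnected: log the update

    def disconnect(self):
        self.state = State.DISCONNECTED

    def reconnect(self):
        # Reintegration: replay the log in order, then resume normal operation.
        # A real client would have to detect conflicts here; this sketch does not.
        self.state = State.REINTEGRATION
        for path, data in self.log:
            self.server[path] = data
        self.log.clear()
        self.state = State.HOARDING
        self.hoard()
```

In this sketch, a write made after `disconnect()` changes only the cache and the log; the server dict sees it only once `reconnect()` replays the log.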

Coda's data replication method also differs from that of AFS. AFS uses a pessimistic replication strategy for its files: only one read/write server receives updates, and all other servers act as read-only replicas. Coda instead uses optimistic replication, in which all servers can receive updates. This allows greater availability of server data in the event of network partitions, a case which AFS cannot handle.

These unique features introduce the possibility of semantically diverging copies of the same files or directories, known as "conflicts". Disconnected operation's local updates can potentially clash with other connected users' updates on the same objects, preventing reintegration. Optimistic replication can potentially cause concurrent updates to different servers on the same object, preventing replication. The former case is called a "local/global" conflict, and the latter case a "server/server" conflict. Coda has extensive repair tools, both manual and automated, to handle and repair both types of conflicts.
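One common way to detect that two replicas of an object have diverged is to compare version vectors. The sketch below assumes that technique for illustration only; the function `compare`, the dict layout (server name mapped to an update count), and the result strings are hypothetical and are not drawn from Coda's source or repair tools.

```python
def compare(a, b):
    """Compare two version vectors (dicts: server name -> update count).

    Returns "equal", "a_newer", "b_newer", or "conflict".
    """
    servers = set(a) | set(b)
    # A replica is "ahead" if it has seen more updates on at least one server.
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in servers)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in servers)
    if a_ahead and b_ahead:
        return "conflict"  # concurrent updates: neither copy dominates
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"
```

Under these assumptions, two copies updated independently on different servers during a partition, such as `{"s1": 2, "s2": 1}` and `{"s1": 1, "s2": 2}`, compare as a conflict, whereas a copy that is strictly behind can simply be brought up to date.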

Supported platforms

Coda has been developed on Linux, and support for it appeared in the 2.1 Linux kernel series. [3] It has also been ported to FreeBSD; the port subsequently became obsolete there, and an effort is under way to bring it back. [4] Efforts have been made to port Coda to Microsoft Windows, from the Windows 95/Windows 98 era and Windows NT [5] through Windows XP, [6] by means of open-source projects such as the DJGPP DOS C compiler and Cygwin. [5]

References

  1. "Coda progress". July 5, 2020. Retrieved August 5, 2020.
  2. "New release: 5.0.pre1". 1999-01-06. Retrieved 2015-09-11.
  3. "Linux Kernel mailing list, [PATCH] Coda". 1998-01-06.
  4. "GitHub - trasz/Freebsd at coda". GitHub.
  5. Braam, P. J.; et al. (1999). "Porting the coda file system to windows". Proc. USENIX Annual Technical Conference. USENIX Association: 30. Retrieved 2009-04-15.
  6. "Coda Support for Windows XP". Retrieved 2009-04-15.