Developer(s) | Jian (Andiry) Xu, Lu Zhang, Steven Swanson, University of California, San Diego, Computer Science and Engineering Department |
---|---|
Preview release | Linux kernel 4.13 [1] |
Written in | C |
Operating system | Linux |
Type | File system driver |
License | GPL |
Website | github |
The NOVA ( non-volatile memory accelerated) file system is an open-source, log-structured file system for byte-addressable persistent memory (for example non-volatile dual in-line memory module (NVDIMM) and 3D XPoint DIMMs) for Linux.
NOVA is designed specifically for byte-addressable persistent memories and aims to provide high-performance, atomic file and metadata operations, and fault tolerance. To meet these goals NOVA combines several techniques found in other file systems. NOVA uses log structure, copy-on-write (COW), journaling, and log-structured metadata updates to provide strong atomicity guarantees, and it uses a combination replication, metadata checksums, and RAID 4 parity to protect data and metadata from media errors and software bugs. It also supports checkpoints to facilitate backups.
NOVA was developed at the University of California, San Diego, in the Non-Volatile Systems Laboratory of the Computer Science and Engineering Department. [2] [3] Patches were initially made available for version 4.12 of the Linux kernel. [4] As of 2017 [update] it is limited to x86-64 Linux, and not ready for merging with the upstream kernel. [3]
This section needs additional citations for verification .(September 2017) |
NOVA is primarily a log-structured file system, but it differs from other log-structured file systems in several respects. First, rather than using a single log for the entire file system, each inode has its own, dedicated log that records the updates to the inode. [3] This allows for increased concurrency in file operations, since different threads can operate on inodes in parallel. Second, the logs do not contain file data, but only metadata updates, resulting in smaller logs. Third, the logs are not stored in physically contiguous memory. Instead, NOVA stores the logs in a linked list of 4 KB memory pages.
NOVA uses the logs to provide atomicity for operations that affect a single file (e.g., writing to a file or modifying its metadata). To do this, NOVA writes a log entry to empty space past the end of the log and then atomically updates the inode's pointer to the log tail.
NOVA uses copy-on-write (COW) to update file data. When a program writes data to a file, NOVA allocates some unused memory pages to hold the data and writes the data into them. Then, it appends a log entry to the inode's log that points to the new pages and describes their logical location in the file. Since appending the log entry is atomic, the write is also atomic.
Some file operations (e.g., moving a file from one directory to another) require modifying multiple inodes. To make these operations atomic, NOVA uses a simple journaling mechanisms. First, it writes the new log entries to ends of the inodes that the operation will affect, then it uses the journal to record the necessary updates to the inodes' log tail pointers. Next, it marks the journal as committed and applies the updates to the tail pointers.
NOVA uses replication and checksums to provide protection against metadata corruption due to media errors and software bugs. Every metadata structure (e.g., inodes, superblocks, and log entries) contains a CRC32 checksum that allows NOVA to detect if structures contents have changed with its knowledge. NOVA also stores two copies of each data structure – the "primary" and the "replica" – and stores them far from one another in memory.
Whenever NOVA accesses a metadata structure, it first recomputes the checksum on both the primary and the replica. If either check results in a mismatch, NOVA repairs the damage using the other copy. If neither checksum matches, then the structure is lost and NOVA returns an error.
NOVA uses RAID 4 to protect file data. It divides each 4 KB page into 512-byte strips and stores a parity strip in a dedicated region of persistent memory. It also computes (and stores a replica of) a CRC32 checksum for the eight data strips and the parity strip.
When NOVA reads a page, it confirms the checksum on each strip. If one of the strips is corrupt, it tries to recover the strip using the parity bits. If no other strips have experienced data corruption, recovery will succeed. Otherwise, recovery fails, the contents of the page are lost, and NOVA returns an error.
XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as of June 2014, XFS is supported by most Linux distributions; Red Hat Enterprise Linux uses it as its default file system.
ext2, or second extended file system, is a file system for the Linux kernel. It was initially designed by French software developer Rémy Card as a replacement for the extended file system (ext). Having been designed according to the same principles as the Berkeley Fast File System from BSD, it was the first commercial-grade filesystem for Linux.
ext3, or third extended filesystem, is a journaled file system that is commonly used by the Linux kernel. It used to be the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on extending ext2 in Journaling the Linux ext2fs Filesystem in a 1998 paper, and later in a February 1999 kernel mailing list posting. The filesystem was merged with the mainline Linux kernel in November 2001 from 2.4.15 onward. Its main advantage over ext2 is journaling, which improves reliability and eliminates the need to check the file system after an unclean shutdown. Its successor is ext4.
The Unix file system (UFS) is a family of file systems supported by many Unix and Unix-like operating systems. It is a distant descendant of the original filesystem used by Version 7 Unix.
The inode is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. Each inode stores the attributes and disk block locations of the object's data. File-system object attributes may include metadata, as well as owner and permission data.
The Write Anywhere File Layout (WAFL) is a proprietary file system that supports large, high-performance RAID arrays, quick restarts without lengthy consistency checks in the event of a crash or power failure, and growing the filesystems size quickly. It was designed by NetApp for use in its storage appliances like NetApp FAS, AFF, Cloud Volumes ONTAP and ONTAP Select.
Journalling Flash File System version 2 or JFFS2 is a log-structured file system for use with flash memory devices. It is the successor to JFFS. JFFS2 has been included into the Linux kernel since September 23, 2001, when it was merged into the Linux kernel mainline as part of the kernel version 2.4.10 release. JFFS2 is also available for a few bootloaders, like Das U-Boot, Open Firmware, the eCos RTOS, the RTEMS RTOS, and the RedBoot. Most prominent usage of the JFFS2 comes from OpenWrt.
The Journaling Flash File System is a log-structured file system for use on NOR flash memory devices on the Linux operating system. It has been superseded by JFFS2.
sync is a standard system call in the Unix operating system, which commits all data from the kernel filesystem buffers to non-volatile storage, i.e., data which has been scheduled for writing via low-level I/O system calls. Higher-level I/O layers such as stdio may maintain separate buffers of their own.
The following tables compare general and technical information for a number of file systems.
exFAT is a file system introduced by Microsoft in 2006 and optimized for flash memory such as USB flash drives and SD cards. exFAT was proprietary until 28 August 2019, when Microsoft published its specification. Microsoft owns patents on several elements of its design.
LogFS is a Linux log-structured and scalable flash file system, intended for use on large devices of flash memory. It is written by Jörn Engel and in part sponsored by the CE Linux Forum.
Btrfs is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager, developed together. It was founded by Chris Mason in 2007 for use in Linux, and since November 2013, the file system's on-disk format has been declared stable in the Linux kernel.
A journaling file system is a file system that keeps track of changes not yet committed to the file system's main part by recording the goal of such changes in a data structure known as a "journal", which is usually a circular log. In the event of a system crash or power failure, such file systems can be brought back online more quickly with a lower likelihood of becoming corrupted.
F2FS is a flash file system initially developed by Samsung Electronics for the Linux kernel.
In computer science, persistent memory is any method or apparatus for efficiently storing data structures such that they can continue to be accessed using memory instructions or memory APIs even after the end of the process that created or last modified them.
Apple File System (APFS) is a proprietary file system developed and deployed by Apple Inc. for macOS Sierra (10.12.4) and later, iOS 10.3 and later, tvOS 10.2 and later, watchOS 3.2 and later, and all versions of iPadOS. It aims to fix core problems of HFS+, APFS's predecessor on these operating systems. APFS is optimized for solid-state drive storage and supports encryption, snapshots, and increased data integrity, among other capabilities.
ZFS is a file system with volume management capabilities. It began as part of the Sun Microsystems Solaris operating system in 2001. Large parts of Solaris – including ZFS – were published under an open source license as OpenSolaris for around 5 years from 2005, before being placed under a closed source license when Oracle Corporation acquired Sun in 2009–2010. During 2005 to 2010, the open source version of ZFS was ported to Linux, Mac OS X and FreeBSD. In 2010, the illumos project forked a recent version of OpenSolaris, to continue its development as an open source project, including ZFS. In 2013, OpenZFS was founded to coordinate the development of open source ZFS. OpenZFS maintains and manages the core ZFS code, while organizations using ZFS maintain the specific code and validation processes required for ZFS to integrate within their systems. OpenZFS is widely used in Unix-like systems.