Block suballocation

Last updated

Block suballocation is a feature of some computer file systems which allows large blocks or allocation units to be used while making efficient use of empty space at the end of large files, space which would otherwise be lost for other use to internal fragmentation. [1]

Contents

In file systems that don't support fragments, this feature is also called tail merging or tail packing because it is commonly done by packing the "tail", or last partial block, of multiple files into a single block.

Rationale

File systems have traditionally divided the disk into equally sized blocks to simplify their design and limit the worst-case fragmentation. Block sizes are typically multiples of 512 bytes due to the size of hard disk sectors. When files are allocated by some traditional file systems, only whole blocks can be allocated to individual files. But as file sizes are often not multiples of the file system block size, this design inherently results in the last blocks of files (called tails) occupying only a part of the block, resulting in what is called internal fragmentation (not to be confused with external fragmentation). This waste of space can be significant if the file system stores many small files and can become critical when attempting to use higher block sizes to improve performance. UFS and other derived UNIX file systems support fragments [ citation needed ] which greatly mitigate this effect.

Suballocation schemes

Block suballocation addresses this problem by dividing up a tail block in some way to allow it to store fragments from other files.

Some block suballocation schemes can perform allocation at the byte level; most, however, simply divide up the block into smaller ones (the divisor usually being some power of 2). For example, if a 38 KiB file is to be stored in a file system using 32 KiB blocks, the file would normally span two blocks, or 64 KiB, for storage; the remaining 26 KiB of the second block becomes unused slack space. With an 8 KiB block suballocation, however, the file would occupy just 6 KiB of the second block, leave 2 KiB (of the 8 KiB suballocation block) slack and free the other 24 KiB of the block for other files.

Tail packing

Some file systems have since been designed to take advantage of this unused space, and can pack the tails of several files in a single shared tail block. While this may, at first, seem like it would significantly increase file system fragmentation, the negative effect can be mitigated with readahead features on modern operating systems when dealing with short files, several tails may be close enough to each another to be read together, and thus a disk seek is not introduced. Such file systems often employ heuristics in order to determine whether tail packing is worthwhile in a given situation, and defragmentation software may use a more evolved heuristic.

Efficiency

In some scenarios where the majority of files are shorter than half the block size, such as in a folder of small source code files or small bitmap images, tail packing can increase storage efficiency even more than twofold, compared to file systems without tail packing. [2]

This not only translates into conservation of disk space, but may also introduce performance increases, as due to higher locality of reference, less data has to be read, also translating into higher page cache efficiency. However, these advantages can be negated by the increased complexity of implementation. [3]

As of 2015, the most widely used read-write file systems with support for block suballocation are Btrfs and FreeBSD UFS2 [4] (where it is called "block level fragmentation"). ReiserFS and Reiser4 also support tail packing.

Several read-only file systems do not use blocks at all and are thus implicitly using space as efficiently as suballocating file systems; such file systems double as archive formats.

See also

Related Research Articles

ReiserFS is a general-purpose, journaling file system initially designed and implemented by a team at Namesys led by Hans Reiser and licensed under GPLv2. Introduced in version 2.4.1 of the Linux kernel, it was the first journaling file system to be included in the standard kernel. ReiserFS was the default file system in Novell's SUSE Linux Enterprise until Novell decided to move to ext3 for future releases on October 12, 2006.

New Technology File System (NTFS) is a proprietary journaling file system developed by Microsoft. Starting with Windows NT 3.1, it is the default file system of the Windows NT family. It superseded File Allocation Table (FAT) as the preferred filesystem on Windows and is supported in Linux and BSD as well. NTFS reading and writing support is provided using a free and open-source kernel implementation known as NTFS3 in Linux and the NTFS-3G driver in BSD. By using the convert command, Windows can convert FAT32/16/12 into NTFS without the need to rewrite all files. NTFS uses several files typically hidden from the user to store metadata about other files stored on the drive which can help improve speed and performance when reading data. Unlike FAT and High Performance File System (HPFS), NTFS supports access control lists (ACLs), filesystem encryption, transparent compression, sparse files and file system journaling. NTFS also supports shadow copy to allow backups of a system while it is running, but the functionality of the shadow copies varies between different versions of Windows.

ext2, or second extended file system, is a file system for the Linux kernel. It was initially designed by French software developer Rémy Card as a replacement for the extended file system (ext). Having been designed according to the same principles as the Berkeley Fast File System from BSD, it was the first commercial-grade filesystem for Linux.

ext3, or third extended filesystem, is a journaled file system that is commonly used by the Linux kernel. It used to be the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on extending ext2 in Journaling the Linux ext2fs Filesystem in a 1998 paper, and later in a February 1999 kernel mailing list posting. The filesystem was merged with the mainline Linux kernel in November 2001 from 2.4.15 onward. Its main advantage over ext2 is journaling, which improves reliability and eliminates the need to check the file system after an unclean shutdown. Its successor is ext4.

Journaled File System (JFS) is a 64-bit journaling file system created by IBM. There are versions for AIX, OS/2, eComStation, ArcaOS and Linux operating systems. The latter is available as free software under the terms of the GNU General Public License (GPL). HP-UX has another, different filesystem named JFS that is actually an OEM version of Veritas Software's VxFS.

The Unix file system (UFS) is a family of file systems supported by many Unix and Unix-like operating systems. It is a distant descendant of the original filesystem used by Version 7 Unix.

In computing, a block, sometimes called a physical record, is a sequence of bytes or bits, usually containing some whole number of records, having a maximum length; a block size. Data thus structured are said to be blocked. The process of putting data into blocks is called blocking, while deblocking is the process of extracting data from blocks. Blocked data is normally stored in a data buffer, and read or written a whole block at a time. Blocking reduces the overhead and speeds up the handling of the data stream. For some devices, such as magnetic tape and CKD disk devices, blocking reduces the amount of external storage required for the data. Blocking is almost universally employed when storing data to 9-track magnetic tape, NAND flash memory, and rotating media such as floppy disks, hard disks, and optical discs.

<span class="mw-page-title-main">Defragmentation</span> Rearrangement of sectors on a hard disk into contiguous units

In the maintenance of file systems, defragmentation is a process that reduces the degree of fragmentation. It does this by physically organizing the contents of the mass storage device used to store files into the smallest number of contiguous regions. It also attempts to create larger regions of free space using compaction to impede the return of fragmentation. Some defragmentation utilities try to keep smaller files within a single directory together, as they are often accessed in sequence.

Reiser4 is a computer file system, successor to the ReiserFS file system, developed from scratch by Namesys and sponsored by DARPA as well as Linspire. Reiser4 was named after its former lead developer Hans Reiser. As of 2021, the Reiser4 patch set is still being maintained, but according to Phoronix, it is unlikely to be merged into mainline Linux without corporate backing.

df (Unix) Standard Unix command

df is a standard Unix command used to display the amount of available disk space for file systems on which the invoking user has appropriate read access. df is typically implemented using the statfs or statvfs system calls.

<span class="mw-page-title-main">File system</span> Computer filing system

In computing, a file system or filesystem governs file organization and access. A local file system is a capability of an operating system that services the applications running on the same computer. A distributed file system is a protocol that provides file access between networked computers.

File size is a measure of how much data a computer file contains or, alternately, how much storage it consumes. Typically, file size is expressed in units of measurement based on the byte. By convention, file size units use either a metric prefix or a binary prefix.

Extended file attributes are file system features that enable users to associate computer files with metadata not interpreted by the filesystem, whereas regular attributes have a purpose strictly defined by the filesystem. Unlike forks, which can usually be as large as the maximum file size, extended attributes are usually limited in size to a value significantly smaller than the maximum file size. Typical uses include storing the author of a document, the character encoding of a plain-text document, or a checksum, cryptographic hash or digital certificate, and discretionary access control information.

In computer storage, fragmentation is a phenomenon in which storage space, main storage or secondary storage, is used inefficiently, reducing capacity or performance and often both. The exact consequences of fragmentation depend on the specific system of storage allocation in use and the particular form of fragmentation. In many cases, fragmentation leads to storage space being "wasted", and in that case the term also refers to the wasted space itself.

The following tables compare general and technical information for a number of file systems.

<span class="mw-page-title-main">Disk sector</span> Logical or physical division of storage media

In computer disk storage, a sector is a subdivision of a track on a magnetic disk or optical disc. For most disks, each sector stores a fixed amount of user-accessible data, traditionally 512 bytes for hard disk drives (HDDs) and 2048 bytes for CD-ROMs and DVD-ROMs. Newer HDDs and SSDs use 4096-byte (4 KiB) sectors, which are known as the Advanced Format (AF).

A page, memory page, or virtual page is a fixed-length contiguous block of virtual memory, described by a single entry in a page table. It is the smallest unit of data for memory management in an operating system that uses virtual memory. Similarly, a page frame is the smallest fixed-length contiguous block of physical memory into which memory pages are mapped by the operating system.

Free-space bitmaps are one method used to track allocated sectors by some file systems. While the most simplistic design is highly inefficient, advanced or hybrid implementations of free-space bitmaps are used by some modern file systems.

The FAT file system is a file system used on MS-DOS and Windows 9x family of operating systems. It continues to be used on mobile devices and embedded systems, and thus is a well suited file system for data exchange between computers and devices of almost any type and age from 1981 through the present.

EROFS is a lightweight read-only file system initially developed by Huawei, originally for the Linux kernel and now maintained by an open-source community from all over the world.

References

  1. U.S. patent 6,041,407 (Fundamental patent.)
  2. Hans Reiser (2001). "Hard Disk usage, ReiserFS and Ext2fs". Archived from the original on 13 November 2006. Retrieved 14 December 2006.
  3. Hans Reiser (2001). "ReiserFS file system design". Archived from the original on 13 November 2006. Retrieved 14 December 2006.
  4. Hervey, Allen (20 June 2005). "Introduction to FreeBSD, PacNOG I Workshop, Additional Topics, UFS2 and Soft Updates make for a powerful combination" (PDF). PacNOG I. p. 23. Retrieved 22 July 2012.