Ext3

ext3
Developer(s): Stephen Tweedie
Full name: Third extended file system
Introduced: November 2001 with Linux 2.4.15
Preceded by: ext2
Succeeded by: ext4
Partition IDs: 0x83 (MBR); EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 (GPT)
Structures
Directory contents: Table, hashed B-tree with dir_index enabled
File allocation: bitmap (free space), table (metadata)
Bad blocks: Table
Limits
Max volume size: 4 TiB – 32 TiB
Max file size: 16 GiB – 2 TiB
Max no. of files: Variable, allocated at creation time [1]
Max filename length: 255 bytes
Allowed filename characters: All bytes except NUL ('\0') and '/'
Features
Dates recorded: modification (mtime), attribute modification (ctime), access (atime)
Date range: December 14, 1901 – January 18, 2038
Date resolution: 1 s
Attributes: allow-undelete, append-only, h-tree (directory), immutable, journal, no-atime, no-dump, secure-delete, synchronous-write, top (directory)
File system permissions: Unix permissions, POSIX ACLs and arbitrary security attributes (Linux 2.6 and later)
Transparent compression: No
Transparent encryption: No (provided at the block device level)
Data deduplication: No
Other
Supported operating systems: Linux, BSD, ReactOS, [2] Windows (through an IFS)

ext3, or third extended filesystem, is a journaled file system that is commonly used with the Linux kernel. It used to be the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on extending ext2 in a 1998 paper, "Journaling the Linux ext2fs Filesystem", and later in a February 1999 kernel mailing list posting. The filesystem was merged into the mainline Linux kernel in November 2001, starting with version 2.4.15. [3] [4] [5] Its main advantage over ext2 is journaling, which improves reliability and removes the need for a full file system check after an unclean shutdown. Its successor is ext4. [6]


Advantages

In terms of performance (speed), ext3 compares unfavorably with competing Linux filesystems such as ext4, JFS, ReiserFS, and XFS, but it has a significant advantage: it allows in-place upgrades from ext2 without having to back up and restore data. Benchmarks suggest that ext3 also uses less CPU than ReiserFS and XFS. [7] [8] It is also considered safer than the other Linux file systems, due to its relative simplicity and wider testing base. [9] [10]

ext3 adds the following features to ext2: a journal, online file system growth, and HTree indexing for larger directories. [11]

If these features are ignored, any ext3 file system is also a valid ext2 file system. As a result, the well-tested and mature utilities for maintaining and repairing ext2 file systems can also be used with ext3 without major changes. The ext2 and ext3 file systems share the same standard set of utilities, e2fsprogs, which includes an fsck tool. The close relationship also makes conversion between the two file systems (both forward to ext3 and backward to ext2) straightforward.

ext3 lacks "modern" filesystem features, such as dynamic inode allocation and extents. This can sometimes be a disadvantage, but for recoverability it is a significant advantage: the file system metadata is all in fixed, well-known locations, and the data structures have some redundancy. In cases of significant data corruption, ext2 or ext3 may be recoverable, while a tree-based file system may not.

Size limits

The maximum number of blocks for ext3 is 2^32. The block size can vary, which affects both the maximum file size and the maximum file-system size: [12]

Block size          Maximum file size    Maximum file-system size
1 KiB               16 GiB               2 TiB
2 KiB               256 GiB              8 TiB
4 KiB               2 TiB                16 TiB
8 KiB [limits 1]    2 TiB                32 TiB
  1. In Linux, 8 KiB block size is only available on architectures which allow 8 KiB pages, such as Alpha.
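As a rough illustration of where the file-size column comes from, the following Python sketch assumes the classic ext2/ext3 inode block map (12 direct pointers plus one single, one double, and one triple indirect block, with 4-byte block numbers) and a 32-bit i_blocks field counted in 512-byte sectors. It ignores the indirect blocks that also count against i_blocks, so the 2 TiB cap it reports is slightly generous, and it models only the file-size column, not the file-system size column.

```python
# Sketch: maximum file size under the classic ext2/ext3 block map.
# Assumptions: 12 direct pointers, 4-byte block numbers, and a 32-bit
# i_blocks counter in 512-byte sectors (no ext4 "huge_file" feature).
DIRECT_POINTERS = 12
POINTER_BYTES = 4
SECTOR_BYTES = 512
I_BLOCKS_MAX = 2**32


def max_file_size(block_size):
    """Return the largest file size in bytes for a given block size."""
    ptrs = block_size // POINTER_BYTES  # block numbers per indirect block
    # Direct blocks + single + double + triple indirection.
    addressable_blocks = DIRECT_POINTERS + ptrs + ptrs**2 + ptrs**3
    map_limit = addressable_blocks * block_size
    # i_blocks counts 512-byte sectors, which caps any file at 2 TiB.
    sector_limit = I_BLOCKS_MAX * SECTOR_BYTES
    return min(map_limit, sector_limit)


for kib in (1, 2, 4, 8):
    size = max_file_size(kib * 1024)
    print(f"{kib} KiB blocks: {size / 2**30:,.1f} GiB")
```

For 1 KiB and 2 KiB blocks the block map is the binding limit (about 16 GiB and 256 GiB); from 4 KiB upward the sector counter is, which is why the last two rows both show 2 TiB.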

Journaling levels

There are three levels of journaling available in the Linux implementation of ext3:

Journal (lowest risk)
Both metadata and file contents are written to the journal before being committed to the main file system. Because the journal is stored contiguously on disk, this can improve performance if the journal has enough space. In other cases, performance gets worse, because the data must be written twice: once to the journal, and once to the main part of the filesystem. [13]
Ordered (medium risk)
Only metadata is journaled; file contents are not, but file contents are guaranteed to be written to disk before the associated metadata is marked as committed in the journal. This is the default on many Linux distributions. If there is a power outage or kernel panic while a file is being written or appended to, the journal will indicate that the new file or appended data has not been "committed", so it will be purged by the cleanup process. (Thus appends and new files have the same level of integrity protection as the "journaled" level.) However, files being overwritten can be corrupted because the original version of the file is not stored. It is therefore possible to end up with a file in an intermediate state between new and old, without enough information to restore either one (the new data never made it to disk completely, and the old data is not stored anywhere). Even worse, the intermediate state might intersperse old and new data, because the order of the writes is left up to the disk's hardware. [13] [14]
Writeback (highest risk)
Only metadata is journaled; file contents are not. The contents might be written before or after the journal is updated. As a result, files modified right before a crash can become corrupted. For example, a file being appended to may be marked in the journal as being larger than it actually is, causing garbage at the end. Older versions of files could also appear unexpectedly after a journal recovery. Because data and journal writes are not synchronized, this mode is faster in many cases. JFS uses this level of journaling, but ensures that any "garbage" due to unwritten data is zeroed out on reboot. XFS also uses this form of journaling.

In all three modes, the internal structure of the file system is guaranteed to be consistent even after a crash. In any case, only the data content of files or directories that were being modified when the system crashed will be affected; the rest will be intact after recovery.
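The difference between the three modes can be shown with a deliberately simplified Python model of a single append. Each mode is reduced to one possible order in which writes reach the disk, and a crash is simulated after every step. The write names and the model itself are illustrative assumptions, not ext3's actual code paths, and the model covers only appends, so it does not capture the in-place overwrite problem described for ordered mode.

```python
# Toy model (illustrative only): the order in which the writes for one
# append reach the disk under each ext3 data mode.
ORDERINGS = {
    # data=journal: data and metadata go into the journal, then the commit
    # block; checkpointing to the final location only happens afterwards.
    "journal": ["journal_blocks", "commit", "checkpoint"],
    # data=ordered: data reaches its final location before the commit block.
    "ordered": ["data", "commit"],
    # data=writeback: no ordering is enforced; this is one permitted order.
    "writeback": ["commit", "data"],
}


def state_after_crash(mode, steps_done):
    """Return (metadata_visible, data_present) after journal recovery."""
    done = set(ORDERINGS[mode][:steps_done])
    # Recovery replays only transactions whose commit block was written.
    metadata_visible = "commit" in done
    if mode == "journal":
        # Committed data is in the journal, so replay can always restore it.
        data_present = metadata_visible
    else:
        data_present = "data" in done
    return metadata_visible, data_present


def can_expose_garbage(mode):
    """True if some crash point leaves committed metadata pointing at unwritten data."""
    steps = len(ORDERINGS[mode])
    return any(
        meta and not data
        for meta, data in (state_after_crash(mode, k) for k in range(steps + 1))
    )


for mode in ORDERINGS:
    print(mode, can_expose_garbage(mode))
```

Only the writeback ordering has a crash point (after the commit, before the data) where the recovered file claims more data than actually reached the disk, which is the "garbage at the end" case described above.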

Disadvantages

Functionality

Because ext3 aims to be backward-compatible with the earlier ext2, many of its on-disk structures are similar to those of ext2. Consequently, ext3 lacks recent features, such as extents, dynamic allocation of inodes, and block sub-allocation. [15] A directory can have at most 31,998 subdirectories, because an inode can have at most 32,000 links: a directory starts with two links (its entry in its parent and its own "." entry), and the ".." entry of each subdirectory adds one more. [16]
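The link arithmetic behind the 31,998 figure can be written out in a few lines of Python. The constant name follows the one used in the Linux ext3 sources, but the snippet only illustrates the counting rule; it is not kernel code.

```python
# Per-inode hard-link limit for ext3 (32,000).
EXT3_LINK_MAX = 32000


def directory_link_count(subdirectories):
    """Link count of a directory with the given number of subdirectories.

    1 for the directory's entry in its parent, 1 for its own "." entry,
    and 1 for the ".." entry inside each subdirectory.
    """
    return 2 + subdirectories


def max_subdirectories(link_max=EXT3_LINK_MAX):
    """Largest number of subdirectories before the link count hits link_max."""
    return link_max - directory_link_count(0)


print(max_subdirectories())
```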

As with most current Linux filesystems, the fsck tool should not be run on an ext3 filesystem while it is mounted for writing. [6] A check of a filesystem that is already mounted in read/write mode will very likely detect inconsistencies in the filesystem metadata, simply because that metadata is changing while fsck reads it. If fsck then applies changes in an attempt to bring the "inconsistent" metadata into a "consistent" state, the attempted "fix" will corrupt the filesystem.
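A wrapper script might refuse to start a check when the target device is mounted read/write. The Python sketch below parses text in the /proc/mounts layout (source, mount point, file system type, options, dump, pass). The function name is hypothetical, and the comparison of device paths is literal: it does not resolve symlinks such as /dev/disk/by-uuid links, nor decode the escaped characters that /proc/mounts uses for spaces in mount points.

```python
def mounted_read_write(device, mounts_text):
    """Return True if `device` appears in mounts_text with the "rw" option.

    mounts_text is assumed to follow the /proc/mounts layout:
    source mountpoint fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        source, _mountpoint, _fstype, options = fields[:4]
        if source == device and "rw" in options.split(","):
            return True
    return False
```

On a Linux system it could be called as `mounted_read_write("/dev/sdb1", open("/proc/mounts").read())` before invoking fsck; a device mounted read-only (option "ro") is not reported, which matches the guidance above about filesystems mounted for writing.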

Defragmentation

There is no online ext3 defragmentation tool that works on the filesystem level. There is an offline ext2 defragmenter, e2defrag. However, e2defrag may destroy data, depending on the feature bits turned on in the filesystem; it does not know how to handle many of the newer ext3 features. [17]

There are userspace defragmentation tools, like Shake [18] and defrag. [19] [20] Shake works by allocating space for the whole file as one operation, which will generally cause the allocator to find contiguous disk space. If there are files which are used at the same time, Shake will try to write them next to one another. Defrag works by copying each file over itself. However, this strategy works only if the file system has enough free space. A true defragmentation tool does not exist for ext3. [21]
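The "copy each file over itself" strategy can be sketched in Python: write a fresh copy in the same directory in one pass, then atomically rename it over the original, giving the allocator a chance to place the new copy in contiguous free space. This is an illustration under stated assumptions, not the code of Shake or defrag: the helper name is hypothetical, the copy receives a new inode (so hard links to the old file are broken and ownership is not preserved), and, as noted above, it only helps when enough free space is available.

```python
import os
import shutil
import tempfile


def rewrite_file(path):
    """Rewrite `path` as a brand-new file in the same directory (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory means same file system, so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            shutil.copyfileobj(src, dst)  # write the whole file in one pass
            dst.flush()
            os.fsync(dst.fileno())  # make sure the new copy is on disk first
        shutil.copystat(path, tmp_path)  # copy permission bits and timestamps
        # Swap in the copy; the old inode and its blocks are released once
        # nothing else references them.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```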

However, as the Linux System Administrator Guide states, "Modern Linux filesystem(s) keep fragmentation at a minimum by keeping all blocks in a file close together, even if they can't be stored in consecutive sectors. Some filesystems, like ext3, effectively allocate the free block that is nearest to other blocks in a file. Therefore it is not necessary to worry about fragmentation in a Linux system." [22]

Although ext3 resists file fragmentation, it can become fragmented over time or under specific usage patterns, such as slowly writing large files. [23] [24] Consequently, ext4 (the successor to ext3) provides an online filesystem defragmentation utility, e4defrag, [25] and supports extents (contiguous file regions).

Undelete

ext3 does not support the recovery of deleted files. The ext3 driver actively deletes files by wiping file inodes [26] for crash safety reasons.

There are still several techniques [27] and some free [28] and proprietary [29] software for recovery of deleted or lost files using file system journal analysis; however, they do not guarantee any specific file recovery.

Compression

e3compr [30] is an unofficial patch for ext3 that does transparent compression. It is a direct port of e2compr and still needs further development. It compiles and boots well with upstream kernels, [citation needed] but journaling is not implemented yet.

Lack of snapshots support

Unlike a number of modern file systems, ext3 does not have native support for snapshots, the ability to quickly capture the state of the filesystem at arbitrary times. Instead, it relies on less space-efficient, volume-level snapshots provided by the Linux LVM. The Next3 file system is a modified version of ext3 that offers snapshot support, yet retains compatibility with the ext3 on-disk format. [31]

No checksumming in journal

ext3 does not checksum the journal when writing to it. On a storage device with a volatile write cache, if barrier=1 is not enabled as a mount option (in /etc/fstab), and if the hardware performs out-of-order write caching, one runs the risk of severe filesystem corruption during a crash. [32] [33] [34] This is because storage devices with write caches report to the system that data has been completely written even if it has only reached the (volatile) cache.

If disk writes are performed out of order (because modern hard disks cache writes to improve write performance), the commit block of a transaction may be written before the other blocks of that transaction. If a power failure or unrecoverable crash occurs before those other blocks are written, the system will have to be rebooted. Upon reboot, the file system replays the log as normal, replaying the "winners" (transactions with a commit block), including the incomplete transaction above, which happens to be tagged with a valid commit block. The replay therefore proceeds using corrupt journal data, and the file system mistakenly overwrites normal data with corrupt data. If checksums had been used, with the blocks of the "fake winner" transaction tagged with a mutual checksum, the file system could have detected the mismatch and not replayed the corrupt data onto the disk. Journal checksumming has been added to ext4. [35]
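To illustrate why a checksum helps, the Python sketch below models only recovery's replay decision. It uses zlib.crc32 over the transaction's blocks as a stand-in checksum; the function names and the single-checksum commit record are assumptions for illustration and do not reproduce the on-disk format of ext4's journal checksum feature.

```python
import zlib


def transaction_checksum(blocks):
    """CRC32 over every block of a transaction, in order (stand-in checksum)."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc


def should_replay(blocks_in_journal, commit_checksum, verify=True):
    """Decide whether recovery should replay a transaction.

    commit_checksum is None when no commit block was found (a "loser").
    With verify=False this mimics ext3: any transaction with a commit
    block is a "winner", even if its other blocks never reached the disk.
    """
    if commit_checksum is None:
        return False
    if not verify:
        return True
    return transaction_checksum(blocks_in_journal) == commit_checksum


# The commit block was written with the checksum of the intended blocks,
# but the data block never made it to disk (it still reads as zeros).
intended = [b"meta", b"data"]
commit = transaction_checksum(intended)
on_disk = [b"meta", b"\x00\x00\x00\x00"]
print(should_replay(on_disk, commit, verify=False), should_replay(on_disk, commit))
```

Without verification the torn transaction is replayed as a "fake winner"; with verification the mismatch is detected and the transaction is skipped.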

Filesystems going through the device mapper interface (including software RAID and LVM implementations) may not support barriers, and will issue a warning if that mount option is used. [36] [37] There are also some disks that do not properly implement the write cache flushing extension necessary for barriers to work, which causes a similar warning. [38] In these situations, where barriers are not supported or practical, reliable write ordering is possible by turning off the disk's write cache and using the data=journal mount option. [32] Turning off the disk's write cache may be required even when barriers are available.

Applications such as databases expect a call to fsync() to flush pending writes to disk, and the barrier implementation does not always clear the drive's write cache in response to that call. [39] There is also a potential issue with the barrier implementation related to error handling during events such as a drive failure. [40] Some virtualization technologies are also known not to forward fsync or flush commands properly from a guest operating system to the underlying devices (files, volumes, disks). [41] Similarly, some hard disks or controllers implement cache flushing incorrectly or not at all, yet still advertise support for it and return no error when it is used. [42] Because there are so many ways for fsync and write-cache handling to go wrong, it is safer to assume that cache flushing does not work unless it has been explicitly tested, regardless of how reliable the individual components are believed to be.
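From the application side, the usual pattern on Linux is to fsync() the file and then the directory that contains it, so that both the data and the new directory entry are flushed. The Python sketch below (function name hypothetical, POSIX systems assumed) shows that pattern; as the paragraph above explains, a successful return only means the kernel passed the flush request on, not that every cache in the storage stack honored it.

```python
import os


def durable_write(path, data):
    """Write bytes to `path` and request that they reach stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()  # move Python's buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to flush the file's data and metadata
    # Also fsync the containing directory so the directory entry is flushed.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```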

Near-time extinction due to date-stamp limitation

ext3 stores dates as Unix time using four bytes in the inode. A 32-bit value does not have enough range to represent times beyond January 18, 2038: the Year 2038 problem. [43]
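The range in the infobox follows from treating the four bytes as a signed 32-bit count of seconds since 1970-01-01 UTC. The Python sketch below (helper name hypothetical) decodes a raw on-disk value that way. The exact limits are 1901-12-13 20:45:52 UTC and 2038-01-19 03:14:07 UTC, so the first and last complete days inside the range are December 14, 1901 and January 18, 2038.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_ext3_seconds(raw):
    """Decode a 32-bit on-disk timestamp as signed seconds since the Unix epoch."""
    if not 0 <= raw < 2**32:
        raise ValueError("on-disk timestamp must fit in 32 bits")
    if raw >= 2**31:  # two's complement: top bit set means "before 1970"
        raw -= 2**32
    return EPOCH + timedelta(seconds=raw)


latest = from_ext3_seconds(0x7FFFFFFF)
earliest = from_ext3_seconds(0x80000000)
print(earliest.isoformat(), "to", latest.isoformat())
```

Adding one second to 0x7FFFFFFF gives 0x80000000, which this decoding maps back to 1901; that wraparound is the Year 2038 problem.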

ext4

fsck time dependence on inode count (ext3 vs. ext4)

On June 28, 2006, Theodore Ts'o, the principal developer of ext3, [44] announced an enhanced version, called ext4. On October 11, 2008, the patches that mark ext4 as stable code were merged in the Linux 2.6.28 source code repositories, marking the end of the development phase and recommending its adoption. In 2008, Ts'o stated that although ext4 has improved features such as being much faster than ext3, it is not a major advance, it uses old technology, and is a stop-gap; Ts'o believes that Btrfs is the better direction, because "it offers improvements in scalability, reliability, and ease of management". [45] Btrfs also has "a number of the same design ideas that reiser3/4 had". [46]


References

  1. The maximum number of inodes (and hence the maximum number of files and directories) is set when the file system is created. If V is the volume size in bytes, then the default number of inodes is given by V/2^13 (or the number of blocks, whichever is less), and the minimum by V/2^23. The default was deemed sufficient for most applications. The number of subdirectories in one directory is limited by the fixed maximum of 32,000 links per inode.
  2. "ReactOS 0.4.2 Released". reactos.org. Retrieved 17 August 2016.
  3. Stephen C. Tweedie (May 1998). "Journaling the Linux ext2fs Filesystem" (PDF). Proceedings of the 4th Annual LinuxExpo, Durham, NC. Retrieved 2007-06-23.
  4. Stephen C. Tweedie (February 17, 1999). "Re: fsync on large files". Linux kernel mailing list.
  5. Rob Radez (November 23, 2001). "2.4.15-final". Linux kernel mailing list.
  6. "Chapter 6. The Ext4 File System". Red Hat Enterprise Linux 6.
  7. Piszcz, Justin. "Benchmarking Filesystems Part II". Linux Gazette (122).
  8. Ivers, Hans. "Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch". Archived from the original on 2008-09-13. Retrieved 2010-11-03.
  9. Smith, Roderick W. (2003-10-09). "Introduction to Linux filesystems and files". Linux.com. Archived from the original on August 30, 2011.
  10. Trageser, James (2010-04-23). "Which Linux filesystem to choose for your PC? Ext2, Ext3, Ext4, ReiserFS (Reiser3), Reiser4, XFS, Btrfs".
  11. Cao, Mingming. "Directory indexing". Features found in Linux 2.6. Archived from the original on 2019-07-18. Retrieved 2009-04-01.
  12. Matthew Wilcox. "Documentation/filesystems/ext2.txt". Linux kernel source documentation.
  13. Daniel Robbins (2001-12-01). "Common threads: Advanced filesystem implementor's guide, Part 8". IBM developerWorks. Archived from the original on 2007-10-13.
  14. curious onloooker: Speeding up ext3 filesystems. Evuraan.blogspot.com (2007-01-09). Retrieved on 2013-06-22.
  15. Radez, Rob (2005). "Extents, Delayed Allocation". future of ext3. Archived from the original on 2008-07-08. Retrieved 2008-07-30.
  16. Robert Nichols (2007-04-03) Re: How many sub-directories ? Archived 2008-10-06 at the Wayback Machine linux.derkeiler.com
  17. Andreas Dilger. "Post to the ext3-users mailing list". ext3-users mailing list post.
  18. Shake. Vleu.net. Retrieved on 2013-06-22.
  19. Defrag written in shell. Ck.kolivas.org (2012-08-19). Retrieved on 2013-06-22.
  20. Defrag written in Python. Bazaar.launchpad.net. Retrieved on 2013-06-22.
  21. RE: searching for ext3 defrag/file move program. Redhat.com (2005-03-04). Retrieved on 2013-06-22.
  22. 5.10. Filesystems. Tldp.org (2002-11-09). Retrieved on 2013-06-22.
  23. "#849 closed Enhancement (fixed) - preallocation to prevent fragmentation". trac.transmissionbt.com. The default Ubuntu filesystem ("ext3") will fragment large (>1GB), slowly growing files (<1 MB/s)
  24. Oliver Diedrich (27 October 2008). "Tuning the Linux file system Ext3". We found heavily fragmented free areas on an intensively used IMAP server which stores all its emails in individual files – although more than 900 GB of the total disk space of 1.4 TB were still available
  25. Ext4 – Linux Kernel Newbies. Kernelnewbies.org (2011-05-19). Retrieved on 2013-06-22.
  26. Linux ext3 FAQ. Batleth.sapienti-sat.org. Retrieved on 2013-06-22.
  27. HOWTO recover deleted files on an ext3 file system. Archived 2010-09-19 at the Wayback Machine. Xs4all.nl (2008-02-07). Retrieved on 2013-06-22.
  28. PhotoRec – GPL'd File Recovery. Cgsecurity.org. Retrieved on 2013-06-22.
  29. UFS Explorer Standard Recovery version 4. Ufsexplorer.com. Retrieved on 2013-06-22.
  30. e3compr – ext3 compression. Sourceforge.net. Retrieved on 2013-06-22.
  31. Jonathan Corbet. "The Next3 filesystem". LWN.
  32. Re: Frequent metadata corruption with ext3 + hard power-off. Archived 2007-09-28 at the Wayback Machine. Archives.free.net.ph. Retrieved on 2013-06-22.
  33. Re: Frequent metadata corruption with ext3 + hard power-off. Archived 2007-09-28 at the Wayback Machine. Archives.free.net.ph. Retrieved on 2013-06-22.
  34. Red Hat Enterprise Linux, Chapter 20. Write Barriers
  35. ext4: Add the journal checksum feature. Article.gmane.org (2008-02-26). Retrieved on 2013-06-22.
  36. Re: write barrier over device mapper supported or not? Archived 2009-05-04 at the Wayback Machine. Oss.sgi.com. Retrieved on 2013-06-22.
  37. XFS and zeroed files. Archived 2008-04-30 at the Wayback Machine. Madduck.net (2008-07-11). Retrieved on 2013-06-22.
  38. Barrier Sync. forums.opensuse.org (March 2007)
  39. Re: Proposal for "proper" durable fsync() and fdatasync(). Mail-archive.com (2008-02-26). Retrieved on 2013-06-22.
  40. I/O Barriers, as of kernel version 2.6.31. Mjmwired.net. Retrieved on 2013-06-22.
  41. Virtualization and IO Modes = Extra Complexity. Mysqlperformanceblog.com (2011-03-21). Retrieved on 2013-06-22.
  42. SSD, XFS, LVM, fsync, write cache, barrier and lost transactions. Mysqlperformanceblog.com (2009-03-02). Retrieved on 2013-06-22.
  43. Clark, Libby (19 February 2015). "10 Highlights of Jon Corbet's Linux Kernel Report". Retrieved 2019-01-26.
  44. "Theodore Ts'o": Proposal and plan for ext2/3 future development work. LKML. Retrieved on 2013-06-22.
  45. Ryan Paul (2009-04-13). "Panelists ponder the kernel at Linux Collaboration Summit". Ars Technica. Retrieved 2009-08-22.
  46. Theodore Ts'o (2008-08-01). "Re: reiser4 for 2.6.27-rc1". linux-kernel (Mailing list). Retrieved 2010-12-31.