Defragmentation

Visualization of fragmentation and then of defragmentation

In the maintenance of file systems, defragmentation is a process that reduces the degree of fragmentation. It does this by physically organizing the contents of the mass storage device used to store files into the smallest number of contiguous regions (fragments, extents). It also attempts to create larger regions of free space using compaction to impede the return of fragmentation. Some defragmentation utilities try to keep smaller files within a single directory together, as they are often accessed in sequence.


Defragmentation is advantageous and relevant to file systems on electromechanical disk drives (hard disk drives, floppy disk drives and optical disc media). Accessing a fragmented file requires the drive's read/write heads to move between different areas of the disk to reach each fragment, which is slower than reading the entire contents of a non-fragmented file sequentially without additional head movement.

Causes of fragmentation

Fragmentation occurs when the file system cannot or will not allocate enough contiguous space to store a complete file as a unit, but instead puts parts of it in gaps between existing files (usually those gaps exist because they formerly held a file that the file system has since deleted, or because the file system allocated excess space for the file in the first place). Frequently appended files (such as log files), the frequent creation and deletion of files (as with email and web browser caches), larger files (such as videos) and greater numbers of files all contribute to fragmentation and the consequent performance loss. Defragmentation attempts to alleviate these problems.

Example

Examples of five states of fragmentation File system fragmentation.svg
Examples of five states of fragmentation

An otherwise blank disk has five files, A through E, each using 10 blocks of space (for this section, a block is an allocation unit of the filesystem; the block size is set when the disk is formatted and can be any size supported by the filesystem). On a blank disk, all of these files would be allocated one after the other (see example 1 in the image). If file B were to be deleted, there would be two options: mark the space for file B as empty to be used again later, or move all the files after B so that the empty space is at the end. Since moving the files could be time-consuming if there were many files which needed to be moved, usually the empty space is simply left there, marked in a table as available for new files (see example 2 in the image). [nb 1] When a new file, F, is allocated requiring 6 blocks of space, it could be placed into the first 6 blocks of the space that formerly held file B, and the 4 blocks following it will remain available (see example 3 in the image). If another new file, G, is added and needs only 4 blocks, it could then occupy the space after F and before C (example 4 in the image).

However, if file F then needs to be expanded, there are three options, since the space immediately following it is no longer available (a short simulation of the whole scenario follows this list):

  1. Move the file F to where it can be created as one contiguous file of the new, larger size. This would not be possible if the file is larger than the largest contiguous space available. The file could also be so large that the operation would take an undesirably long period of time.
  2. Move all the files that come after F until enough space opens up immediately after it to make it contiguous again. This presents the same problem as the first option: if only a few files or little data must be moved it is not a big problem, but with thousands or even tens of thousands of files the move would take an unacceptably long time.
  3. Add a new block somewhere else, and indicate that F has a second extent (see example 5 in the image). Repeat this hundreds of times and the filesystem will have a number of small free segments scattered in many places, and some files will have multiple extents. When a file has many extents like this, access time for that file may become excessively long because of all the random seeking the disk will have to do when reading it.
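To make the walkthrough above concrete, here is a minimal, purely illustrative sketch of the same scenario, written as a toy first-fit allocator. The block counts and file names follow the example; nothing in it models a real filesystem's on-disk structures.

```python
def allocate(disk, name, nblocks):
    """First fit: fill the earliest free blocks, splitting into extents if needed."""
    extents, start, placed, i = [], None, 0, 0
    while i < len(disk) and placed < nblocks:
        if disk[i] is None:          # free block: claim it for this file
            disk[i] = name
            placed += 1
            if start is None:
                start = i
        elif start is not None:      # hit a used block: close the current extent
            extents.append((start, i - start))
            start = None
        i += 1
    if start is not None:
        extents.append((start, i - start))
    return extents                   # list of (first block, length) runs

disk = [None] * 64                   # state 1: a blank disk
for f in "ABCDE":
    allocate(disk, f, 10)            # A..E, 10 blocks each, laid out back to back

for i, owner in enumerate(disk):     # state 2: delete B, leaving a 10-block gap
    if owner == "B":
        disk[i] = None

print(allocate(disk, "F", 6))        # state 3: F fills part of B's old space -> [(10, 6)]
print(allocate(disk, "G", 4))        # state 4: G takes the remaining 4 blocks -> [(16, 4)]
print(allocate(disk, "F", 4))        # state 5: F grows; the new blocks land after E
                                     # -> [(50, 4)], giving F a second extent
```

The last call shows the situation described in option 3: F now consists of two extents, one starting at block 10 and one at block 50.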

Additionally, the concept of “fragmentation” is not limited to individual files that have multiple extents on the disk. For instance, a group of files normally read in a particular sequence (such as the files a program accesses while loading, which can include certain DLLs, various resource files, or the audio/visual media files of a game) can be considered fragmented if they are not in sequential load order on the disk, even if the individual files are not fragmented; the read/write heads will have to seek these (non-fragmented) files randomly to access them in sequence. Some groups of files may have been originally installed in the correct sequence but drift apart with time as certain files within the group are deleted. Updates are a common cause of this: to update a file, most updaters delete the old file first and then write a new, updated one in its place. However, most filesystems do not write the new file in the same physical place on the disk, which allows unrelated files to fill in the empty spaces left behind.
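A short, hypothetical sketch of this effect (the file names, block positions and the simple "non-adjacent jump" metric are all invented for illustration):

```python
def count_seeks(layout, load_order):
    """Count reads that force the head to jump to a non-adjacent location.
    layout maps file name -> (start_block, length); load_order is the read sequence."""
    seeks, prev_end = 0, None
    for name in load_order:
        start, length = layout[name]
        if prev_end is not None and start != prev_end:
            seeks += 1               # the next file does not begin where the last ended
        prev_end = start + length
    return seeks

order = ["app.exe", "core.dll", "gfx.dll"]          # the program's load order
in_order  = {"app.exe": (0, 50), "core.dll": (50, 20), "gfx.dll": (70, 30)}
scattered = {"app.exe": (0, 50), "core.dll": (400, 20), "gfx.dll": (120, 30)}

print(count_seeks(in_order, order))    # 0: files stored in load order, no extra seeks
print(count_seeks(scattered, order))   # 2: same (unfragmented) files, out of order
```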

Mitigation

Defragmentation is the operation of moving file extents (physical allocation blocks) so they eventually merge, preferably into a single extent. Doing so usually requires at least two copy operations: one to move the blocks into some free scratch space on the disk so more movement can happen, and another to finally move the blocks into their intended place. In such a paradigm, no data is ever removed from the disk, so the operation can be safely stopped even in the event of a power loss. The visualization at the top of the article depicts an example.
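A minimal sketch of that copy-then-release sequence, continuing the toy block list from the allocation example above (again purely illustrative; a real defragmenter must also update the filesystem's extent maps, which is not modeled here):

```python
def copy_run(disk, src, dst, length):
    """Write the new copy first, then release the old blocks, so every block is
    still readable from its original location if the process is interrupted."""
    for k in range(length):
        disk[dst + k] = disk[src + k]
    for k in range(length):
        disk[src + k] = None

# Rebuild the layout left by the allocation sketch: A at 0-9, F at 10-15,
# G at 16-19, C/D/E at 20-49, F's second extent at 50-53, free space after.
disk = (["A"] * 10 + ["F"] * 6 + ["G"] * 4 + ["C"] * 10 + ["D"] * 10
        + ["E"] * 10 + ["F"] * 4 + [None] * 10)

copy_run(disk, 16, 54, 4)   # park G in free scratch space further out
copy_run(disk, 50, 16, 4)   # pull F's second extent next to its first
copy_run(disk, 54, 50, 4)   # drop G back into the freed gap
# F now occupies blocks 10-19 as a single contiguous run.
```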

To defragment a disk, defragmentation software (also known as a "defragmenter") can only move files around within the free space available. This is an intensive operation and cannot be performed on a filesystem with little or no free space. During defragmentation, system performance will be degraded, and it is best to leave the computer idle during the process so that the defragmenter is not disrupted by unexpected changes to the filesystem. Depending on the algorithm used, it may or may not be advantageous to perform multiple passes. The reorganization involved in defragmentation does not change the logical location of the files (defined as their location within the directory structure).

Besides consolidating individual program files, a defragmenting tool can also reduce the time it takes to load programs and open files. For example, the Windows 9x defragmenter included the Intel Application Launch Accelerator, which optimized programs on the disk by placing the defragmented program files and their dependencies next to each other, in the order in which the program loads them, so that these programs load faster. [1] In Windows, a good defragmenter will read the Prefetch files to identify as many of these file groups as possible and place the files within them in access sequence.

The outer tracks of a hard drive, which correspond to the beginning of the drive's logical address space, have a higher data transfer rate than the inner tracks, so placing frequently accessed files on the outer tracks increases performance. [2] Third-party defragmenters, such as MyDefrag, will move frequently accessed files onto the outer tracks and defragment these files. [3]

Improvements in modern hard drives such as RAM cache, faster platter rotation speed, command queuing (SCSI/ATA TCQ or SATA NCQ), and greater data density reduce the negative impact of fragmentation on system performance to some degree, though increases in commonly used data quantities offset those benefits. However, modern systems profit enormously from the huge disk capacities currently available, since partially filled disks fragment much less than full disks, [4] and on a high-capacity HDD, the same partition occupies a smaller range of cylinders, resulting in faster seeks. However, the average access time can never be lower than a half rotation of the platters, and platter rotation (measured in rpm) is the speed characteristic of HDDs which has experienced the slowest growth over the decades (compared to data transfer rate and seek time), so minimizing the number of seeks remains beneficial in most storage-heavy applications. Defragmentation is just that: ensuring that there is at most one seek per file, counting only the seeks to non-adjacent tracks.
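As a rough, back-of-the-envelope illustration of that half-rotation bound (the spindle speeds below are common examples, not figures from the cited sources):

```python
# Average rotational latency is the time for half a revolution of the platters,
# regardless of how fast seek times or transfer rates become.
for rpm in (5400, 7200, 10000, 15000):
    half_rotation_ms = 0.5 * 60_000 / rpm   # 60,000 ms per minute
    print(f"{rpm:>6} rpm: average rotational latency = {half_rotation_ms:.2f} ms")
# At 7200 rpm, a common desktop speed, each random access still costs about
# 4.17 ms of rotational latency, which is why keeping the number of seeks low
# remains worthwhile.
```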

Partitioning

A common strategy to optimize defragmentation and to reduce the impact of fragmentation is to partition the hard disk(s) in a way that separates partitions of the file system that experience many more reads than writes from the more volatile zones where files are created and deleted frequently. The directories that contain the users' profiles are modified constantly (especially the Temp directory and web browser cache, which create thousands of files that are deleted within a few days). If files from user profiles are held on a dedicated partition (as is commonly done on UNIX-like systems, where frequently changing data is typically stored in a separate /var partition), the defragmenter runs more efficiently since it does not need to deal with all the static files from other directories. (Alternatively, a defragmenter can be told to simply exclude certain file paths.) For partitions with relatively little write activity, defragmentation time greatly improves after the first defragmentation, since the defragmenter will need to defragment only a small number of new files in the future.

Offline defragmentation

The presence of immovable system files, especially a swap file, can impede defragmentation. These files can be safely moved when the operating system is not in use. For example, ntfsresize moves these files to resize an NTFS partition. The tool PageDefrag could defragment Windows system files such as the swap file and the files that store the Windows registry by running at boot time, before the GUI is loaded. Since Windows Vista, PageDefrag is no longer fully supported and has not been updated.

In NTFS, as files are added to the disk, the Master File Table (MFT) must grow to store the information for the new files. Every time the MFT cannot be extended contiguously because another file is in the way, the MFT gains a fragment. In early versions of Windows, the MFT could not be safely defragmented while the partition was mounted, so Microsoft blocked such moves in the defragmentation API. Since Windows XP, however, an increasing number of defragmenters are able to defragment the MFT, because the Windows defragmentation API has been improved and now supports that move operation. [5] Even with the improvements, the first four clusters of the MFT remain unmovable by the Windows defragmentation API, so some defragmenters will store the MFT in two fragments: the first four clusters wherever they were placed when the disk was formatted, and the rest of the MFT at the beginning of the disk (or wherever the defragmenter's strategy deems to be the best place).

Solid-state disks

When reading data from a conventional electromechanical hard disk drive, the disk controller must first position the head, relatively slowly, to the track where a given fragment resides, and then wait while the disk platter rotates until the fragment reaches the head. A solid-state drive (SSD) is based on flash memory with no moving parts, so random access of a file fragment on flash memory does not suffer this delay, making defragmentation to optimize access speed unnecessary. Furthermore, since flash memory can be written to only a limited number of times before it fails, defragmentation is actually detrimental (except in the mitigation of catastrophic failure). However, Windows still defragments an SSD automatically (albeit less vigorously) to prevent the file system from reaching its maximum fragmentation tolerance (when the metadata cannot represent any more file fragments). Once the maximum fragmentation limit is reached, subsequent attempts to write to disk fail. [6]

SMR hard disks

Although many SMR (shingled magnetic recording) hard disks will accept the TRIM command, they still need defragmentation for improved performance. [citation needed]

Approach and defragmenters by file-system type

A Windows defragmentation utility


Notes

  1. The practice of marking the now unused space of a deleted file in a table as available for later use (without erasing its contents) is why undelete programs are able to work; they recover files whose names have been deleted from the directory, but whose space has not yet been reused. [citation needed]


References

  1. Cwdixon.com Archived 2010-10-06 at the Wayback Machine. Cwdixon.com. Retrieved on 2013-07-28.
  2. The Ultimate Defragger - LaRud's Place. Larud.net (2012-01-19). Retrieved on 2013-07-28.
  3. "MyDefrag v4.2.8". Archived from the original on 2010-02-16. Retrieved 2014-08-14. On most harddisks the beginning of the harddisk is considerably faster than the end, sometimes by as much as 200 percent! You can measure this yourself with utilities such as * HD Tune. MyDefrag is therefore geared towards moving all files to the beginning of the disk.
  4. Serdar Yegulalp (20 September 2005). "New hard disk drives reduce need for disk defragmentation". SearchWindowsServer.com: Disk Defragmentation Fast Guide. Archived from the original on 3 June 2008. Retrieved 2008-12-27.
  5. "Windows XP: Kernel Improvements Create a More Robust, Powerful, and Scalable OS -- MSDN Magazine, December 2001". Archived from the original on 2003-04-24. Retrieved 2006-12-19. msdn.microsoft.com: "The other big enhancement [in windows XP] is support for online defragmentation of the MFT and most directory and file metadata"
  6. Hanselman, Scott (3 December 2014). "The real and complete story - Does Windows defragment your SSD?". Scott Hanselman's blog. Microsoft. Archived from the original on 22 December 2014.
  7. Norton, Peter (October 1994). Peter Norton's Complete Guide to DOS 6.22. Sams. p. 521.
  8. Kozierok, Charles M. (2001-04-17). "NTFS Versions". PC Guide. Archived from the original on 2015-09-24. Retrieved 2015-02-20.
  9. Third-party disk defragmenter tools for Windows Archived 2011-11-28 at the Wayback Machine. Support.microsoft.com (2011-08-23). Retrieved on 2013-07-28.
  10. "Disk Defragmentation – Background and Engineering the Windows 7 Improvements". Archived from the original on 2014-06-13. Retrieved 2014-06-15.
  11. "New Defrag options in Windows 8". 13 November 2011. Archived from the original on 2015-02-20. Retrieved 2014-06-15.
  12. "FreeBSD Man Pages". The FreeBSD Project. Archived from the original on 21 February 2015. Retrieved 21 February 2015.
  13. "Linux kernel 3.0, Section 1.1. Btrfs: Automatic defragmentation, scrubbing, performance improvements". kernelnewbies.org. 2011-07-21. Archived from the original on 2016-03-30. Retrieved 2016-04-05.
  14. "HTG Explains: Why Linux Doesn't Need Defragmenting". How-To Geek. Archived from the original on 2013-07-19. Retrieved 2013-08-01.
  15. 5.10. Filesystems Archived 2013-05-27 at the Wayback Machine. Tldp.org (2002-11-09). Retrieved on 2013-06-22.
  16. Erik Bärwaldt: Optimizing data organization on disk Archived 2014-09-06 at the Wayback Machine
  17. "Journaling File System Support". eComStation. Archived from the original on 2008-12-08. Retrieved 2008-12-27.
  18. "Fragmentation in HFS Plus Volumes". Archived from the original on 18 November 2012. Retrieved 2 September 2020. As we have seen, an HFS+ volume seems to resist fragmentation rather well on Mac OS X 10.3.x, and I don't envision fragmentation to be a problem bad enough to require proactive remedies (such as a defragmenting tool).
  19. "Detecting a file fragmentation point for reconstructing fragmented files using sequential hypothesis testing". US8407192 B2. Archived from the original on 21 February 2015. Retrieved 21 February 2015.
  20. Reeves, Nick (26 October 1990). "E format design document". Archived from the original on 7 April 2013. Retrieved 24 May 2013.
