Disk formatting is the process of preparing a data storage device such as a hard disk drive, solid-state drive, floppy disk, memory card or USB flash drive for initial use. In some cases, the formatting operation may also create one or more new file systems. The first part of the formatting process that performs basic medium preparation is often referred to as "low-level formatting". [1] Partitioning is the common term for the second part of the process, dividing the device into several sub-devices and, in some cases, writing information to the device allowing an operating system to be booted from it. [1] [2] The third part of the process, usually termed "high-level formatting", most often refers to the process of generating a new file system. [1] In some operating systems all or parts of these three processes can be combined or repeated at different levels [lower-alpha 1] and the term "format" is understood to mean an operation in which a new disk medium is fully prepared to store files. Some formatting utilities distinguish between a quick format, which does not erase all existing data, and a long format, which does.
As a general rule, [lower-alpha 2] formatting a disk by default leaves most if not all existing data on the disk medium, some or most of which might be recoverable with privileged [lower-alpha 3] or special tools. [6] Special tools can remove user data by a single overwrite of all files and free space. [7]
A block, a contiguous number of bytes, is the minimum unit of storage that is read from and written to a disk by a disk driver. The earliest disk drives had fixed block sizes (e.g. the block size of the IBM 350 disk storage unit of the late 1950s was 100 six-bit characters), but starting with the 1301 [8] IBM marketed subsystems that featured variable block sizes: a particular track could have blocks of different sizes. The disk subsystems and other direct access storage devices on the IBM System/360 expanded this concept in the form of Count Key Data (CKD) and later Extended Count Key Data (ECKD). However, variable block sizes in HDDs fell out of use in the 1990s; one of the last HDDs to support variable block size was the IBM 3390 Model 9, announced May 1993. [9]
Modern hard disk drives, such as Serial Attached SCSI (SAS) [lower-alpha 4] and Serial ATA (SATA) [10] drives, appear at their interfaces as a contiguous set of fixed-size blocks. For many years these blocks were 512 bytes long, but beginning in 2009 and accelerating through 2011, all major hard disk drive manufacturers began releasing hard disk drive platforms using the Advanced Format of 4096-byte logical blocks. [11] [12]
Floppy disks generally only used fixed block sizes but these sizes were a function of the host's OS and its interaction with its controller so that a particular type of media (e.g., 5¼-inch DSDD) would have different block sizes depending upon the host OS and controller.
Optical discs generally only use fixed block sizes.
Formatting a disk for use by an operating system and its applications typically involves three different processes. [lower-alpha 5]
The low-level format of floppy disks (and early hard disks) is performed by the disk drive's controller.
For a standard 1.44 MB floppy disk, low-level formatting normally writes 18 sectors of 512 bytes to each of 160 tracks (80 on each side) of the floppy disk, providing 1,474,560 bytes of storage on the disk.
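The arithmetic behind the quoted capacity can be checked with a minimal Python sketch. The function name `floppy_capacity` and its parameter names are illustrative only; the geometry figures are the ones given above.

```python
def floppy_capacity(sectors_per_track: int, bytes_per_sector: int,
                    tracks_per_side: int, sides: int) -> int:
    """Return the user-data capacity in bytes for a given floppy geometry."""
    return sectors_per_track * bytes_per_sector * tracks_per_side * sides


# Standard "1.44 MB" 3.5-inch geometry described in the text:
# 18 sectors of 512 bytes on each of 80 tracks per side, two sides.
capacity = floppy_capacity(18, 512, 80, 2)
print(capacity)          # 1474560
print(capacity / 1024)   # 1440.0 (KiB)
```

The "1.44 MB" label therefore corresponds to 1440 KiB, that is 1.44 × 1000 × 1024 bytes.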
Physical sectors are actually larger than 512 bytes, as in addition to the 512 byte data field they include a sector identifier field, CRC bytes (in some cases error correction bytes) and gaps between the fields. These additional bytes are not normally included in the quoted figure for overall storage capacity of the disk.
Different low-level formats can be used on the same media; for example, large records can be used to cut down on inter-record gap size.
Several freeware, shareware and free software programs (e.g. GParted, FDFORMAT, NFORMAT, VGA-Copy and 2M) allowed considerably more control over formatting, allowing the formatting of high-density 3.5" disks with a capacity up to 2 MB.
Techniques used include:
Linux supports a variety of sector sizes, [13] and DOS and Windows support a large-record-size DMF-formatted floppy format. [14]
After establishing the structure of tracks, a formatter also needs to fill the entire floppy and look for bad sectors. Traditionally, the physical sectors were initialized with a fill value of 0xF6 as per the INT 1Eh's Disk Parameter Table (DPT) during format on IBM compatible machines. This value is also used on the Atari Portfolio. CP/M 8-inch floppies typically came pre-formatted with a value of 0xE5, [15] and by way of Digital Research this value was also used on Atari ST and some Amstrad formatted floppies. [lower-alpha 6] Amstrad otherwise used 0xF4 as a fill value.
Hard disk drives prior to the 1990s typically had a separate disk controller that defined how data was encoded on the media. With the media, the drive and/or the controller possibly procured from separate vendors, users were often able to perform low-level formatting. Separate procurement also had the potential of incompatibility between the separate components such that the subsystem would not reliably store data. [lower-alpha 7]
User-instigated low-level formatting (LLF) of hard disk drives was common for minicomputer and personal computer systems until the 1990s. IBM and other mainframe system vendors typically supplied their hard disk drives (or media in the case of removable media HDDs) with a low-level format. Typically this involved subdividing each track on the disk into one or more blocks which would contain the user data and associated control information. Different computers used different block sizes and IBM notably used variable block sizes but the popularity of the IBM PC caused the industry to adopt a standard of 512 user data bytes per block by the middle 1980s.
Depending upon the system, low-level formatting was generally done by an operating system utility. IBM compatible PCs used the BIOS, which is invoked using the MS-DOS debug program, to transfer control to a routine hidden at different addresses in different BIOSes. [16]
Starting in the late 1980s, driven by the volume of IBM compatible PCs, HDDs became routinely available pre-formatted with a compatible low-level format. At the same time, the industry moved from historical (dumb) bit serial interfaces to modern (intelligent) bit serial interfaces and word serial interfaces wherein the low-level format was performed at the factory. [17] [18] Accordingly, it is not possible for an end user to low-level format a modern hard disk drive.
Modern hard drives can no longer perform post-production LLF, i.e. to re-establish the basic layout of "tracks" and "blocks" on the recording surface. Reinitialization refers to processes that return a disk to a factory-like configuration: no data, no partitioning, all blocks available to use.
SCSI provides a Format Unit command. This command performs the needed certification step to weed out bad sectors and has the ability to change sector size. The command-line sg_format program may be used to issue the command. [19] A variety of sector sizes may be chosen, but are not available on all devices: 512, 520, 524, 528, 4096, 4112, 4160, and 4224-byte sectors. [20] Although the SCSI command provides many options, even resizing, it does not touch on the track layer where low-level format happens. [21]
ATA does not expose low-level format functionality, but it allows the sector size to be changed via SET SECTOR CONFIGURATION (--set-sector-size in hdparm). (Consumer drives usually only support 512 and 4096-byte sectors.) Although a sector-size change may scramble data, it is not a safe way of erasing data, nor is any certification done. ATA offers a separate SECURITY ERASE (--security-erase in hdparm) command for erasure. [22]
NVMe drives have a standard method of formatting, available in, for example, the Linux command-line program nvme format. Sector size change and secure erase options are available. [23] Note that NVMe drives are generally solid-state, making this "track" distinction useless.
Seagate Technology drives offer a TTL serial debugging console. [24] Among other things, the console can format the "system" and "user" partitions while performing defect checks (re-initialization over pre-established logical blocks) and modify track parameters (managing the real low-level format). [25]
When the hard drive's built-in reinitialization function (see above) is unavailable due to driver or system limitations, it is possible to fill the entire disk instead. On older hard drives without bad sector management, [26] a program will also need to check for any damaged sectors and try to spare them out. On newer drives with defect management, reallocated sectors may be left unerased, whereas the built-in re-initialization function will erase them. [27]
In modern times, it is most common to fill hard drives with a value of 0x00. One popular method for performing this zero-fill operation on a hard disk is by writing zero-value bytes to the drive using the Unix dd utility with the /dev/zero stream as the input file and the drive itself (or a specific partition) as the output file. [28] This command may take many hours to complete, and will erase all files and file systems.
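The zero-fill operation described above can be sketched in Python as follows. This is a minimal illustration, not a replacement for dd: the helper name `zero_fill` and the 1 MiB chunk size are assumptions, and the sketch is meant to be tried on a scratch file or disk image. Pointing it at a real device would require appropriate privileges and would irreversibly destroy the data on it.

```python
import os


def zero_fill(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Overwrite every byte of an existing file (or block device) with 0x00.

    Returns the number of bytes written. Hypothetical helper for
    illustration; it irreversibly destroys the contents of `path`.
    """
    zeros = bytes(chunk_size)
    written = 0
    with open(path, "r+b") as target:
        # Seeking to the end reports the size for regular files and,
        # on Linux, for block devices as well.
        total = target.seek(0, os.SEEK_END)
        target.seek(0)
        while written < total:
            step = min(chunk_size, total - written)
            target.write(zeros[:step])
            written += step
        # Push the data out of Python's and the OS's buffers.
        target.flush()
        os.fsync(target.fileno())
    return written
```

Writing in large chunks rather than byte by byte keeps the number of system calls low, which is the same reason dd is usually given a block size larger than its default.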
A value of 0xFF is used on flash disks to reduce wear. The latter value is typically also the default value used on ROM disks (which cannot be reformatted). Some advanced tools allow configuring the fill value. [lower-alpha 8]
Zero-filling a drive is not a secure method of preparing a drive for use with an encrypted filesystem. Doing so voids the plausible deniability of the process, as the encrypted areas (indistinguishable from random without a key, unless the cipher is compromised) will stand out among zero blocks. The correct technique is to zero-fill inside a temporary encrypted layer then discard the key and layer setup. (/dev/urandom provides similar safety, but tends to be slow.) [29]
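Why zero-filled regions give the layout away can be illustrated with a small Python sketch that scans an image in fixed-size blocks and reports which ones are not entirely zero. The function name `nonzero_blocks`, the 4096-byte block size and the stand-in "ciphertext" bytes are assumptions made for the example.

```python
def nonzero_blocks(image: bytes, block_size: int = 4096) -> list[int]:
    """Return the indices of blocks that contain at least one non-zero byte."""
    zero_block = bytes(block_size)
    found = []
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        # The final block may be shorter than block_size.
        if block != zero_block[:len(block)]:
            found.append(offset // block_size)
    return found


# A zero-filled "disk" of four blocks where only block 2 was later written
# with encrypted data (represented here by an arbitrary non-zero pattern).
image = bytes(4096 * 2) + b"\x5a" * 4096 + bytes(4096)
print(nonzero_blocks(image))   # [2]
```

An observer running such a scan learns exactly which blocks hold encrypted data, which is the information that filling the device through a temporary encrypted layer is meant to hide.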
The present ambiguity in the term low-level format seems to be due to both inconsistent documentation on web sites and the belief by many users that any process below a high-level (file system) format must be called a low-level format. Since much of the low-level formatting process can today only be performed at the factory, various drive manufacturers describe reinitialization software as LLF utilities on their web sites. Since users generally have no way to determine the difference between a complete LLF and reinitialization (they simply observe running the software results in a hard disk that must be high-level formatted), both the misinformed user and mixed signals from various drive manufacturers have perpetuated this error.
Note: whatever possible misuse of such terms may exist, many sites do make such reinitialization utilities available (possibly as bootable floppy diskette or CD image files), to both overwrite every byte and check for damaged sectors on the hard disk.
Partitioning is the process of writing information into blocks of a storage device or medium to divide the device into several sub-devices, each of which is treated by the operating system as a separate device and, in some cases, to allow an operating system to be booted from the device.
On MS-DOS, Microsoft Windows, and UNIX-based operating systems (such as BSD, Linux and macOS) this is normally done with a partition editor, such as fdisk, GNU Parted, or Disk Utility. These operating systems support multiple partitions.
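As one concrete illustration of partitioning information written into a block of the device, the following Python sketch decodes the four primary entries of a classic PC master boot record. It assumes the MBR scheme and a 512-byte first sector, with the partition table at offset 446, four 16-byte entries, and the signature bytes 0x55 0xAA at offset 510. The names `PartitionEntry` and `parse_mbr` are illustrative, and other partitioning schemes store this information differently.

```python
import struct
from typing import NamedTuple


class PartitionEntry(NamedTuple):
    bootable: bool      # status byte 0x80 marks the active partition
    type_code: int      # partition type identifier
    first_lba: int      # first sector of the partition
    sector_count: int   # length of the partition in sectors


def parse_mbr(sector: bytes) -> list[PartitionEntry]:
    """Decode the non-empty primary partition entries of an MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entries = []
    for slot in range(4):
        raw = sector[446 + 16 * slot:446 + 16 * (slot + 1)]
        # Layout: status, CHS start (3 bytes), type, CHS end (3 bytes),
        # first LBA and sector count as little-endian 32-bit integers.
        status, _chs_start, type_code, _chs_end, first_lba, count = \
            struct.unpack("<B3sB3sII", raw)
        if type_code != 0:  # type 0x00 marks an unused slot
            entries.append(
                PartitionEntry(status == 0x80, type_code, first_lba, count))
    return entries
```

A partition editor performs the reverse operation: it packs the chosen start and length values into these entries and writes the sector back to the device.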
Floppy disks are not partitioned; however depending upon the OS they may require volume information in order to be accessed by the OS.
Partition editors and ICKDSF today do not handle low-level functions for HDDs and optical disc drives such as writing timing marks, and they cannot reinitialize a modern disk that has been degaussed or otherwise lost the factory formatting.
IBM operating systems derived from CP-67, e.g., z/VM, maintain partitioning information for minidisks externally to the drive.
High-level formatting is the process of setting up an empty file system on a disk partition or a logical volume and for PCs, installing a boot sector. [1] This is often a fast operation, and is sometimes referred to as quick formatting.
An entire logical drive or partition may optionally be scanned for defects during formatting, which may take considerable time.
In the case of floppy disks, both high- and low-level formatting are customarily performed in one pass by the disk formatting software. Eight-inch floppies typically came low-level formatted and were filled with a format filler value of 0xE5. [15] [lower-alpha 6] Since the 1990s, most 5.25-inch and 3.5-inch floppies have been shipped pre-formatted from the factory as DOS FAT12 floppies.
In current IBM mainframe operating systems derived from OS/360 and DOS/360, such as z/OS and z/VSE, formatting of drives is done by the INIT command of the ICKDSF utility. [30] These OSs support only a single partition per device, called a volume. The ICKDSF functions include writing a Record 0 on every track, writing IPL text, creating a volume label, creating a Volume Table of Contents (VTOC) and, optionally, creating a VTOC index (VTOCIX); high level formatting may also be done as part of allocating a file, by a utility specific to a file system or, in some older access methods, on the fly as new data are written. In z/OS Unix System Services, there are three distinct levels of high-level formatting:
In IBM operating systems derived from CP-67, formatting a volume initializes track 0 and a dummy VTOC. Guest operating systems are responsible for formatting minidisks; the CMS FORMAT command formats a CMS file system on a CMS minidisk.
The host protected area, sometimes referred to as hidden protected area, is an area of a hard drive that is high-level formatted such that the area is not normally visible to its operating system (OS).
Reformatting is a high-level formatting performed on a functioning disk drive to free the medium of its contents. Reformatting is unique to each operating system because what actually is done to existing data varies by OS. The most important aspect of the process is that it frees disk space for use by other data. To actually "erase" everything requires overwriting each block of data on the medium; something that is not done by many high-level formatting utilities.
Reformatting often carries the implication that the operating system and all other software will be reinstalled after the format is complete. Rather than fixing an installation suffering from malfunction or security compromise, it may be necessary to simply reformat everything and start from scratch. Various colloquialisms exist for this process, such as "wipe and reload", "nuke and pave", "reimage", etc. However, reformatting a drive containing only user data does not require reinstallation of the OS.
format command: Under MS-DOS, PC DOS, OS/2 and Microsoft Windows, disk formatting can be performed by the format command. The format program usually asks for confirmation beforehand to prevent accidental removal of data, but some versions of DOS have an undocumented /AUTOTEST option; if used, the usual confirmation is skipped and the format begins right away. The WM/FormatC macro virus uses this command to format drive C: as soon as a document is opened.
Unconditional format: There is also the /U parameter that performs an unconditional format which under most circumstances overwrites the entire partition, [31] preventing the recovery of data through software. Note however that the /U switch only works reliably with floppy diskettes. This is because, unless /Q is used, floppies are always low-level formatted in addition to high-level formatted. Under certain circumstances with hard drive partitions, however, the /U switch merely prevents the creation of unformat information in the partition to be formatted while otherwise leaving the partition's contents entirely intact (still on disk but marked deleted). In such cases, the user's data remain ripe for recovery with specialist tools such as EnCase or disk editors. Reliance upon /U for secure overwriting of hard drive partitions is therefore inadvisable, and purpose-built tools such as DBAN should be considered instead.
Overwriting: In Windows Vista and later, a non-quick format overwrites the data as it goes. This is not the case in Windows XP and earlier. [32]
OS/2: Under OS/2, format will overwrite the entire partition or logical drive if the /L parameter is used, which specifies a long format. Doing so enhances the ability of CHKDSK to recover files.
On Unix and Unix-like systems, high-level formatting of disks is traditionally done using the mkfs command. On Linux (and potentially other systems as well) mkfs is typically a wrapper around filesystem-specific commands which have the name mkfs.fsname, where fsname is the name of the filesystem with which to format the disk. [33] Some filesystems which are not supported by certain implementations of mkfs have their own manipulation tools; for example Ntfsprogs provides a format utility for the NTFS filesystem.
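The mkfs.fsname naming convention can be sketched in Python as follows. The helper names `mkfs_helper_name` and `find_mkfs_helper` are hypothetical, and the sketch only looks the helper up on the search path; it does not run it or format anything.

```python
import shutil
from typing import Optional


def mkfs_helper_name(fsname: str) -> str:
    """Build the conventional helper name for a filesystem type."""
    return f"mkfs.{fsname}"


def find_mkfs_helper(fsname: str) -> Optional[str]:
    """Return the full path of the helper if it is installed, else None."""
    return shutil.which(mkfs_helper_name(fsname))


print(mkfs_helper_name("ext4"))   # mkfs.ext4
```

Whether a given helper is found depends on which filesystem tools are installed on the machine, so `find_mkfs_helper` may return None for a filesystem that another system supports.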
Some Unix and Unix-like operating systems have higher-level formatting tools, usually for the purpose of making disk formatting easier and/or allowing the user to partition the disk with the same tool. Examples include GNU Parted (and its various GUI frontends such as GParted and the KDE Partition Manager) and the Disk Utility application on Mac OS X.
As in file deletion by the operating system, data on a disk are not fully erased during every high-level format. Instead, the area on the disk containing the data is merely marked as available, and retains the old data until it is overwritten. If the disk is formatted with a different file system than the one which previously existed on the partition, some data may be overwritten that wouldn't be if the same file system had been used. However, under some file systems (e.g., NTFS, but not FAT), the file indices (such as $MFTs under NTFS, inodes under ext2/3, etc.) may not be written to the same exact locations. And if the partition size is increased, even FAT file systems will overwrite more data at the beginning of that new partition.
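The difference between marking space as available and actually overwriting it can be shown with a toy model in Python. This is a sketch under strong simplifying assumptions: the "disk" is a small in-memory byte array, block 0 stands in for the file system's index structures, and the names `quick_format` and `full_format` do not correspond to any real file system's behaviour.

```python
BLOCK = 16  # toy block size in bytes, chosen only for readability


def quick_format(disk: bytearray) -> None:
    """Reset only the index block (block 0); data blocks are left untouched."""
    disk[0:BLOCK] = bytes(BLOCK)


def full_format(disk: bytearray) -> None:
    """Reset the index block and overwrite every data block with zeros."""
    disk[:] = bytes(len(disk))


disk = bytearray(BLOCK * 4)
disk[0:5] = b"INDEX"                  # stand-in for file system metadata
disk[BLOCK:BLOCK + 6] = b"secret"     # user data in the first data block

quick_format(disk)
print(b"secret" in disk)   # True: the old data is still on the medium

full_format(disk)
print(b"secret" in disk)   # False: every block has been overwritten
```

After the quick format the data can no longer be found through the index, but a tool that reads the raw blocks still sees it, which is what recovery utilities rely on.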
From the perspective of preventing the recovery of sensitive data through recovery tools, the data must be completely overwritten (every sector), either by a separate tool, or during formatting. Data are destroyed in DOS, OS/2, and Windows when the /L (long) option is used on format and always for a Partitioned Data Set (PDS) in MVS and for newer file systems on IBM mainframes.
It is disputed whether one pass of zero-fill is enough to destroy sensitive data on older (until the 1990s) magnetic storage: Gutmann (known for his 35-pass Gutmann method) claims that magnetic force microscopy may be able to "see" old bits on a floppy, [34] but the sources he cited do not prove this. Random fill is believed to be stronger than a fixed pattern fill. [35] One pass of zero fill is sufficient to prevent data remanence, according to NIST (2014) and Wright et al. (2008). [36] [37] The Secure Erase option built into hard drives is considered trustworthy, [27] [38] with the caveat that early solid state drives are known to mis-implement the function. [39]
Degaussing is effective without controversy; however, this may render the drive unusable. [27]
The use of 0xE5 as the fill value on pre-formatted 8-inch CP/M floppies is the reason why the value of 0xE5 has a special meaning in directory entries in FAT12, FAT16 and FAT32 file systems. This allowed 86-DOS to use 8-inch floppies out of the box or with only the FAT initialized.
[...] /W:246 (for a fill value of 0xF6). In contrast to other FDISK utilities, DR-DOS FDISK is not only a partitioning tool, but can also format freshly created partitions as FAT12, FAT16 or FAT32. This reduces the risk of accidentally formatting the wrong volume.
When you do not specify either the RECOMP or LABEL option, the disk area is initialized by writing a device-dependent number of records (containing binary zeros) on each track. Any previous data on the disk is erased.
The direct access volumes, on which TSS/360 virtual organization data sets are stored, have fixed-length, page size data blocks. No key field is required. The record overflow feature is utilized to allow data blocks to span tracks, as required. The entire volume, with the current exception of part of the first cylinder, which is used for identification, is formatted into page size blocks.
The format command behavior has changed in Windows Vista. By default in Windows Vista, the format command writes zeros to the whole disk when a full format is performed. In Windows XP and in earlier versions of the Windows operating system, the format command does not write zeros to the whole disk when a full format is performed.