Original author(s) | Ken Thompson (AT&T Bell Laboratories) |
---|---|
Developer(s) | Various open-source and commercial developers |
Initial release | June 1974 |
Repository | coreutils: git |
Written in | Plan 9: C |
Operating system | Unix, Unix-like, Plan 9, Inferno, Windows |
Platform | Cross-platform |
Type | Command |
License | coreutils: GPLv3+; Plan 9: MIT License |
dd is a command-line utility for Unix, Plan 9, Inferno, and Unix-like operating systems (and has been ported beyond them), the primary purpose of which is to convert and copy files. [1] On Unix, device drivers for hardware (such as hard disk drives) and special device files (such as /dev/zero and /dev/random) appear in the file system just like normal files; dd can also read from and/or write to these files, provided that function is implemented in their respective driver. As a result, dd can be used for tasks such as backing up the boot sector of a hard drive and obtaining a fixed amount of random data. The dd program can also perform conversions on the data as it is copied, including byte-order swapping and conversion to and from the ASCII and EBCDIC text encodings. [2]
In 1974, the dd command appeared as part of Version 5 Unix. According to Dennis Ritchie, the name is an allusion to the DD statement found in IBM's Job Control Language (JCL), [3] [4] in which it is an abbreviation for "Data Definition". [5] [6] According to Douglas McIlroy, dd was "originally intended for converting files between the ASCII, little-endian, byte-stream world of DEC computers and the EBCDIC, big-endian, blocked world of IBM"; this explains the cultural context of its syntax. [7] Eric S. Raymond believes "the interface design was clearly a prank", since the command's syntax resembles a JCL statement more than other Unix commands do. [4]
In 1987, the dd command was specified in the X/Open Portability Guide issue 2. This specification was inherited by IEEE Std 1003.1-2008 (POSIX), which is part of the Single UNIX Specification. [8]
In 1990, David MacKenzie announced GNU fileutils (now part of coreutils), which includes the dd command; [9] it was written by Paul Rubin, David MacKenzie, and Stuart Kemp. [10] Since 1991, Jim Meyering has been its maintainer. [11]
In 1995, Plan 9 2nd edition was released; its dd command interface was redesigned to use a traditional command-line option style instead of a JCL statement style. [12]
Since at least 1999, [13] a native Win32 port of dd for Microsoft Windows has been available as part of UnxUtils. [14]
dd is sometimes humorously called "Disk Destroyer", since a single mistyped character in its arguments can erase a drive. [15]
The command-line syntax of dd differs from many other Unix programs. It uses the syntax option=value for its command-line options rather than the more standard -option value or --option=value formats. By default, dd reads from stdin and writes to stdout, but these can be changed by using the if (input file) and of (output file) options. [8]
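As a minimal sketch of this syntax (file names here are illustrative), copying one file to another looks like:

```shell
# dd uses option=value pairs; if= names the input file, of= the output file.
printf 'hello world\n' > input.txt

# Explicit input and output files:
dd if=input.txt of=output.txt

# With of= omitted, dd writes to stdout, which the shell redirects here:
dd if=input.txt > output2.txt
```

Both copies are byte-identical to the original; the transfer statistics appear on stderr.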
Certain features of dd depend on the computer system's capabilities, such as dd's ability to implement an option for direct memory access. Sending a SIGINFO signal (or a USR1 signal on Linux) to a running dd process makes it print I/O statistics to standard error once and then continue copying. dd can read standard input from the keyboard; when end-of-file (EOF) is reached, dd will exit. Which signal and which EOF character apply is determined by the software platform. For example, Unix tools ported to Windows vary as to the EOF character: Cygwin uses Ctrl+D (the usual Unix EOF) and MKS Toolkit uses Ctrl+Z (the usual Windows EOF).
The non-standardized parts of dd invocation vary among implementations.
On completion, dd prints statistics about the data transfer to the stderr stream. The format is standardized in POSIX. [8] : STDERR The manual page for GNU dd does not describe this format, but the BSD manuals do.
Each of the "Records in" and "Records out" lines shows the number of complete blocks transferred + the number of partial blocks, e.g. because the physical medium ended before a complete block was read, or a physical error prevented reading the complete block.
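For example (an illustration; implementations may append extra summary lines), copying four full 512-byte blocks produces statistics in this complete+partial format:

```shell
# Copy exactly four 512-byte blocks; the transfer statistics go to stderr.
dd if=/dev/zero of=/dev/null bs=512 count=4 2>stats.txt

# The first two lines report complete+partial blocks:
#   4+0 records in
#   4+0 records out
cat stats.txt
```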
A block is a unit measuring the number of bytes that are read, written, or converted at one time. Command-line options can specify a different block size for input/reading (ibs) compared to output/writing (obs), though the block size (bs) option will override both ibs and obs. The default value for both input and output block sizes is 512 bytes (the traditional block size of disks, and POSIX-mandated size of "a block"). The count option for copying is measured in blocks, as are both the skip count for reading and seek count for writing. Conversion operations are also affected by the "conversion block size" (cbs). [8] : OPERANDS
The value provided for block size options is interpreted as a decimal (base 10) integer number of bytes. It can also contain suffixes to indicate that the block size is an integer number of larger units than bytes. POSIX only specifies the suffixes b (blocks) for 512 and k (kibibytes) for 1024. [8] : OPERANDS Implementations differ on the additional suffixes they support: (Free) BSD uses lowercase m (mebibytes), g (gibibytes), and so on for tebibytes, exbibytes, pebibytes, zebibytes, and yobibytes, [16] while GNU uses M and G for the same units, with kB, MB, and GB used for their SI unit counterparts (kilobytes). [10] For example, for GNU dd, bs=16M indicates a blocksize of 16 mebibytes (16777216 bytes) and bs=3kB specifies 3000 bytes.
Additionally, some implementations understand the x character as a multiplication operator for both block size and count parameters. For example, bs=2x80x18b is interpreted as 2 × 80 × 18 × 512 = 1474560 bytes, the exact size of a 1440 KiB floppy disk. This is required in POSIX. [8] : OPERANDS For implementations that do not support this feature, the POSIX shell arithmetic syntax of bs=$((2*80*18))b may be used.
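The arithmetic can be checked in the shell itself:

```shell
# 2x80x18b means 2 * 80 * 18 blocks of 512 bytes each:
echo $((2*80*18*512))    # prints 1474560, the size of a 1440 KiB floppy

# The portable spelling expands the multiplication before dd ever sees it:
echo "bs=$((2*80*18))b"  # prints bs=2880b
```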
Block size has an effect on the performance of copying dd commands. Doing many small reads or writes is often slower than doing fewer large ones. Using large blocks requires more RAM and can complicate error recovery. When dd is used with variable-block-size devices such as tape drives or networks, the block size may determine the tape record size or packet size, depending on the network protocol used.
The dd command can be used for a variety of purposes. For plain-copying commands it tends to be slower than the domain-specific alternatives, but it excels at its unique ability to "overwrite or truncate a file at any point or seek in a file", a fairly low-level interface to the Unix file API. [17]
The examples below assume the use of GNU dd, mainly in the block-size argument. To make them portable, replace e.g. bs=64M with the shell arithmetic expression bs=$((64*1024*1024)) or, written equivalently with a bit shift, bs=$((64<<20)).
dd can duplicate data across files, devices, partitions and volumes. The data may be input or output to and from any of these; but there are important differences concerning the output when going to a partition. Also, during the transfer, the data can be modified using the conv options to suit the medium. (For this purpose, however, dd is slower than cat.) [17]
Command | Purpose |
---|---|
blocks=$(isosize -d 2048 /dev/sr0); dd if=/dev/sr0 of=isoimage.iso bs=2048 count=$blocks status=progress | Creates an ISO disk image from a CD-ROM, DVD or Blu-ray disc. [18] |
dd if=system.img of=/dev/sdc bs=64M conv=noerror | Restores a hard disk drive (or an SD card, for example) from a previously created image. |
dd if=/dev/sdb2 of=partition.image bs=64M conv=noerror | Creates an image of the partition sdb2, using a 64 MiB block size. |
dd if=/dev/sda2 of=/dev/sdb2 bs=64M conv=noerror | Clones one partition to another. |
dd if=/dev/ad0 of=/dev/ad1 bs=64M conv=noerror | Clones a hard disk drive "ad0" to "ad1". |
The noerror option means to keep going if there is an error, while the sync option causes each short input block to be padded with null bytes up to the input block size.
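The padding effect of sync can be observed on an ordinary short file (file names here are illustrative):

```shell
# A 3-byte input file:
printf 'abc' > short.bin

# conv=sync pads the final (partial) input block with null bytes up to ibs:
dd if=short.bin of=padded.bin ibs=512 conv=sync 2>/dev/null

wc -c < padded.bin   # prints 512
```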
dd can modify data in place. For example, this overwrites the first 512 bytes of a file with null bytes:
dd if=/dev/zero of=path/to/file bs=512 count=1 conv=notrunc
The notrunc conversion option means do not truncate the output file — that is, if the output file already exists, just replace the specified bytes and leave the rest of the output file alone. Without this option, dd would create an output file 512 bytes long.
The example above can also be used to back up and restore any region of a device to a file, such as a master boot record.
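A sketch of that pattern, using a scratch file in place of a real device (substitute a device path such as /dev/sda in practice, and take great care when targeting real hardware):

```shell
# Create a 4-sector scratch "device":
dd if=/dev/urandom of=disk.img bs=512 count=4 2>/dev/null

# Back up the first 512-byte sector (where a master boot record would live):
dd if=disk.img of=sector0.bak bs=512 count=1 2>/dev/null

# Restore it later without truncating the rest of the "device":
dd if=sector0.bak of=disk.img bs=512 count=1 conv=notrunc 2>/dev/null
```

Without conv=notrunc on the restore step, a regular output file would be cut down to 512 bytes.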
To duplicate the first two sectors of a floppy disk:
dd if=/dev/fd0 of=MBRboot.img bs=512 count=2
For security reasons, it is sometimes necessary to wipe the disk of a discarded device. This can be achieved by a "data transfer" from one of the Unix special files:
dd if=/dev/zero of=/dev/sda bs=16M
dd if=/dev/urandom of=/dev/sda bs=16M
Compared to the data-modification example above, the notrunc conversion option is not required, as it has no effect when dd's output file is a block device. [19]
The bs=16M option makes dd read and write 16 mebibytes at a time. For modern systems, an even greater block size may be faster. Note that filling the drive with random data may take longer than zeroing the drive, because the random data must be created by the CPU, while creating zeroes is very fast. On modern hard-disk drives, zeroing the drive will render most data it contains permanently irrecoverable. [20] However, with other kinds of drives such as flash memories, much data may still be recoverable due to data remanence.
Modern hard disk drives contain a Secure Erase command designed to permanently and securely erase every accessible and inaccessible portion of a drive. It may also work for some solid-state drives (flash drives). As of 2017, it does not work on USB flash drives nor on Secure Digital flash memories.[ citation needed ] When available, this is both faster than using dd, and more secure.[ citation needed ] On Linux machines it is accessible via the hdparm command's --security-erase-enhanced option.
The shred program offers multiple overwrites, as well as more secure deletion of individual files.
Data recovery involves reading from a drive with some parts potentially inaccessible. dd is a good fit for this job with its flexible skipping (seek) and other low-level settings. The vanilla dd, however, is clumsy to use, as the user has to read the error messages and manually calculate the regions that can be read. The single block size also limits the granularity of the recovery: a trade-off has to be made between a small block size, which recovers more data, and a large one, which is faster.
A C program called dd_rescue [21] was written in October 1999. It did away with the conversion functionality of dd, and supports two block sizes to deal with the dilemma: if a read using the large size fails, it falls back to the smaller size to gather as much data as possible. It can also run backwards. In 2003, a dd_rhelp script was written to automate the process of using dd_rescue, keeping track of which areas have been read on its own. [22]
In 2004, GNU wrote a separate utility, unrelated to dd, called ddrescue. It has a more sophisticated dynamic block-size algorithm and keeps track of what has been read internally. The authors of both dd_rescue and dd_rhelp consider it superior to their own implementations. [23] To help distinguish the newer GNU program from the older script, alternate names are sometimes used for GNU's ddrescue, including addrescue (the name on freecode.com and freshmeat.net), gddrescue (Debian package name), and gnu_ddrescue (openSUSE package name).
Another open-source program called savehd7 uses a sophisticated algorithm, but it also requires the installation of its own programming-language interpreter.
To benchmark a drive, analyzing the sequential (and usually single-threaded) read and write performance of the system for 1024-byte blocks:
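A minimal sketch of such a test (the file name, count, and the conv=fsync flush are illustrative GNU-style choices; the throughput reported on the final stderr line is the figure of interest):

```shell
# Sequentially write 1 MiB in 1024-byte blocks, flushing to disk at the end:
dd if=/dev/zero of=bench.tmp bs=1024 count=1024 conv=fsync 2>bench.txt
cat bench.txt        # GNU dd's final line includes bytes copied, time, and rate
rm -f bench.tmp      # remove the scratch file
```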
To make a file of 100 random bytes using the kernel random driver:
dd if=/dev/urandom of=myrandom bs=100 count=1
To convert a file to uppercase:
dd if=filename of=filename1 conv=ucase,notrunc
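The ucase conversion is specified by POSIX and also works in a pipeline, for example:

```shell
# conv=ucase maps the input to upper case as it is copied:
printf 'mixed Case text\n' | dd conv=ucase 2>/dev/null
# prints: MIXED CASE TEXT
```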
Being a program mainly designed as a filter, dd normally does not provide any progress indication. This can be overcome by sending a USR1 signal to the running GNU dd process (INFO on BSD systems), causing dd to print the current number of transferred blocks.
The following one-liner results in continuous output of progress every 10 seconds until the transfer is finished, where dd-pid is replaced by the process ID of dd:
while kill -USR1 dd-pid ; do sleep 10 ; done
Newer versions of GNU dd support the status=progress option, which enables periodic printing of transfer statistics to stderr. [24]
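With a sufficiently new GNU dd, no signal handling is needed (a sketch; the source and sink here are illustrative):

```shell
# status=progress makes GNU dd print periodic transfer statistics to stderr:
dd if=/dev/zero of=/dev/null bs=1M count=256 status=progress 2>progress.txt
tail -n 1 progress.txt   # the final summary line
```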
dcfldd is an enhanced fork of GNU dd developed by Nick Harbour, who at the time was working for the United States Department of Defense Computer Forensics Lab. [25] [26] [27] Compared to dd, dcfldd allows more than one output file, supports simultaneous multiple checksum calculations, provides a verification mode for file matching, and can display the percentage progress of an operation. As of February 2024, the last release was 1.9.1 from April 2023. [28]
dc3dd is another fork of GNU dd from the United States Department of Defense Cyber Crime Center (DC3). It can be seen as a continuation of dcfldd, with a stated aim of updating whenever the GNU upstream is updated. As of June 2023, the last release was 7.3.1 from April 2023. [29]
In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. The name is derived from "tape archive", as it was originally developed to write data to sequential I/O devices with no file system of their own, such as devices that use magnetic tape. The archive data sets created by tar contain various file system parameters, such as name, timestamps, ownership, file-access permissions, and directory organization. POSIX abandoned tar in favor of pax, yet tar sees continued widespread use.
In computing, ls is a command to list computer files and directories in Unix and Unix-like operating systems. It is specified by POSIX and the Single UNIX Specification.
Disk formatting is the process of preparing a data storage device such as a hard disk drive, solid-state drive, floppy disk, memory card or USB flash drive for initial use. In some cases, the formatting operation may also create one or more new file systems. The first part of the formatting process that performs basic medium preparation is often referred to as "low-level formatting". Partitioning is the common term for the second part of the process, dividing the device into several sub-devices and, in some cases, writing information to the device allowing an operating system to be booted from it. The third part of the process, usually termed "high-level formatting" most often refers to the process of generating a new file system. In some operating systems all or parts of these three processes can be combined or repeated at different levels and the term "format" is understood to mean an operation in which a new disk medium is fully prepared to store files. Some formatting utilities allow distinguishing between a quick format, which does not erase all existing data and a long option that does erase all existing data.
/dev/zero is a special file in Unix-like operating systems that provides as many null characters as are read from it. One of the typical uses is to provide a character stream for initializing data storage.
wc is a command in Unix, Plan 9, Inferno, and Unix-like operating systems. The program reads either standard input or a list of computer files and generates one or more of the following statistics: newline count, word count, and byte count. If a list of files is provided, both individual file and total statistics follow.
cksum is a command in Unix and Unix-like operating systems that generates a checksum value for a file or stream of data. The cksum command reads each file given in its arguments, or standard input if no arguments are provided, and outputs the file's 32-bit cyclic redundancy check (CRC) checksum and byte count. The CRC output by cksum is different from the CRC-32 used in zip, PNG and zlib.
df is a standard Unix command used to display the amount of available disk space for file systems on which the invoking user has appropriate read access. df is typically implemented using the statfs or statvfs system calls.
du is a standard Unix program used to estimate file space usage: space used under a particular directory or files on a file system. A Windows command-line version of this program is part of the Sysinternals suite by Mark Russinovich.
In computing, cmp is a command-line utility on Unix and Unix-like operating systems that compares two files of any type and writes the results to the standard output. By default, cmp is silent if the files are the same; if they differ, the byte and line number at which the first difference occurred is reported. The command is also available in the OS-9 shell.
The seven standard Unix file types are regular, directory, symbolic link, FIFO special, block special, character special, and socket as defined by POSIX. Different OS-specific implementations allow more types than what POSIX requires. A file's type can be identified by the ls -l command, which displays the type in the first character of the file-system permissions field.
In Unix-like operating systems, find is a command-line utility that locates files based on some user-specified criteria and either prints the pathname of each matched object or, if another action is requested, performs that action on each matched object.
tail is a program available on Unix, Unix-like systems, FreeDOS and MSX-DOS used to display the tail end of a text file or piped data.
In computer science, a sparse file is a type of computer file that attempts to use file system space more efficiently when the file itself is partially empty. This is achieved by writing brief information (metadata) representing the empty blocks to the data storage media instead of the actual "empty" space which makes up the block, thus consuming less storage space. The full block is written to the media as the actual size only when the block contains "real" (non-empty) data.
cpio is a general file archiver utility and its associated file format. It is primarily installed on Unix-like computer operating systems. The software utility was originally intended as a tape archiving program as part of the Programmer's Workbench (PWB/UNIX), and has been a component of virtually every Unix operating system released thereafter. Its name is derived from the phrase copy in and out, in close description of the program's use of standard input and standard output in its operation.
In computer disk storage, a sector is a subdivision of a track on a magnetic disk or optical disc. For most disks, each sector stores a fixed amount of user-accessible data, traditionally 512 bytes for hard disk drives (HDDs) and 2048 bytes for CD-ROMs and DVD-ROMs. Newer HDDs and SSDs use 4096-byte (4 KiB) sectors, which are known as the Advanced Format (AF).
sum is a legacy utility available on some Unix and Unix-like operating systems. This utility outputs a 16-bit checksum of each argument file, as well as the number of blocks they take on disk. Two different checksum algorithms are in use. POSIX abandoned sum in favor of cksum.
In computer operating systems, mkfs is a command used to format a block storage device with a specific file system. The command is part of Unix and Unix-like operating systems. In Unix, a block storage device must be formatted with a file system before it can be mounted and accessed through the operating system's filesystem hierarchy.
GNU ddrescue is a data recovery tool for disk drives, DVDs, CDs, and other digital storage media. It copies raw blocks of storage, such as disk sectors, from one device or file to another, while handling read errors in an intelligent manner to minimize data loss by scraping good sectors from partially read blocks.
In Unix-like operating systems, a device file, device node, or special file is an interface to a device driver that appears in a file system as if it were an ordinary file. There are also special files in DOS, OS/2, and Windows. These special files allow an application program to interact with a device by using its device driver via standard input/output system calls. Using standard system calls simplifies many programming tasks, and leads to consistent user-space I/O mechanisms regardless of device features and functions.
cat is a standard Unix utility that reads files sequentially, writing them to standard output. The name is derived from its function to (con)catenate files. It has been ported to a number of operating systems.