Disk image


A disk image is a snapshot of a storage device's structure and data, typically stored in one or more computer files on another storage device. [1] [2] Traditionally, disk images were bit-by-bit copies of every sector on a hard disk, often created for digital forensic purposes, but it is now common to copy only allocated data to reduce storage space. [3] [4] Compression and deduplication are commonly used to reduce the size of the image file set. [3] [5]

Disk imaging is done for a variety of purposes, including digital forensics, [6] [2] cloud computing, [7] system administration, [8] as part of a backup strategy, [1] and legacy emulation as part of a digital preservation strategy. [9] Disk images can be made in a variety of formats depending on the purpose. Virtual disk images (such as VHD and VMDK) are intended for cloud computing, [10] [11] ISO images are intended to emulate optical media, [12] and raw disk images are used for forensic purposes. [2] Proprietary formats are typically used by disk imaging software. Despite the benefits of disk imaging, storage costs can be high, [3] management can be difficult, [6] and images can be time-consuming to create. [13] [9]


Background

Disk images were first used in the late 1960s for backup and disk cloning of mainframe disk media. Early ones were as small as 5 megabytes and as large as 330 megabytes, and the copy medium was magnetic tape, which ran as large as 200 megabytes per reel. [14] Disk images became much more popular with the rise of floppy disk media, for which replicating or storing an exact structure was necessary and efficient, especially in the case of copy-protected floppy disks.

Disk image creation is called disk imaging and is often time-consuming, even with a fast computer, because the entire disk must be copied. [13] Typically, disk imaging requires a third-party disk imaging program or backup software. The software required varies according to the type of disk image that needs to be created. For example, RawWrite and WinImage create floppy disk image files for MS-DOS and Microsoft Windows. [15] [16] On Unix and similar systems, the dd program can be used to create raw disk images. [2] Apple Disk Copy can be used on Classic Mac OS and macOS systems to create and write disk image files.
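
The idea behind a raw image can be illustrated with a short Python sketch that copies a source, block by block, into an image file. It is only an illustration, not a description of how dd or any other tool is implemented. The function name `create_raw_image` and the 1 MiB default block size are assumptions made for this example, and reading a real block device usually requires elevated privileges.

```python
def create_raw_image(source_path, image_path, block_size=1024 * 1024):
    """Copy every byte of source_path into image_path, one block at a time.

    source_path may be a regular file or, on Unix-like systems, a block
    device node. Returns the number of bytes copied.
    """
    copied = 0
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            block = source.read(block_size)
            if not block:  # an empty read means the end of the source
                break
            image.write(block)
            copied += len(block)
    return copied
```

Because the whole source is read regardless of how much of it holds allocated data, the running time of such a copy grows with the size of the device rather than with the amount of data stored on it.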

Authoring software for CDs/DVDs such as Nero Burning ROM can generate and load disk images for optical media. A virtual disk writer or virtual burner is a computer program that emulates an actual disc authoring device such as a CD writer or DVD writer. Instead of writing data to an actual disc, it creates a virtual disk image. [17] [18] A virtual burner, by definition, appears as a disc drive in the system with writing capabilities (as opposed to conventional disc authoring programs that can create virtual disk images), thus allowing software that can burn discs to create virtual discs. [19]

Uses

Digital forensics

Forensic imaging is the process of creating a bit-by-bit copy of the data on the drive, including files, metadata, volume information, filesystems and their structure. [2] Often, these images are also hashed to verify their integrity and to show that they have not been altered since being created. Unlike disk imaging for other purposes, digital forensic applications take a bit-by-bit copy to ensure forensic soundness. The purpose of imaging the disk is not only to discover evidence preserved in digital information but also to examine the drive for clues about how a crime was committed.
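
Hash-based verification can be sketched in a few lines of Python. The choice of SHA-256 and the function names `sha256_of_file` and `image_is_unaltered` are assumptions for this example. The source does not name a particular hash algorithm, and forensic tools may use others.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_unaltered(path, recorded_digest):
    """Compare the image's current digest with the one recorded at creation."""
    return sha256_of_file(path) == recorded_digest.lower()
```

In such a workflow the digest would be recorded when the image is created and recomputed before each later examination. Any change to the image file, even of a single bit, produces a different digest.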

Virtual disk image

In cloud computing, creating a virtual disk image of optical media or a hard disk drive is typically done to make the content available to one or more virtual machines. Virtual machines emulate a CD/DVD drive by reading an ISO image. This can also be faster than reading from the physical optical medium. [20] Further, there are fewer issues with wear and tear. A hard disk drive or solid-state drive in a virtual machine is implemented as a disk image (for example, the VHD format used by Microsoft's Hyper-V, the VDI format used by Oracle Corporation's VirtualBox, the VMDK format used for VMware virtual machines, or the QCOW format used by QEMU). Virtual hard disk images tend to be stored either as a collection of files (where each one is typically 2 GB in size) or as a single file. Virtual machines treat the image set as a physical drive.
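
When an image is split across several files, a read at a given position on the virtual drive has to be directed to the right file. The Python sketch below shows only that arithmetic. It assumes equally sized pieces of 2 GiB with no headers or descriptor data, which real formats such as VMDK do not guarantee, and the function name `locate_in_split_image` is hypothetical.

```python
def locate_in_split_image(offset, extent_size=2 * 1024**3):
    """Map a byte offset on the virtual drive to (file index, offset in file).

    Assumes the image is a plain concatenation of files that are each
    exactly extent_size bytes long (an assumption made for illustration).
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, extent_size)
```

Under these assumptions, byte 5 GiB of the virtual drive falls 1 GiB into the third file, so `locate_in_split_image(5 * 1024**3)` returns `(2, 1073741824)`.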

System administration

Rapid deployment of clone systems

Educational institutions and businesses often need to buy or replace computer systems in large numbers. Disk imaging is commonly used to deploy the same configuration across workstations. [8] Typically, disk imaging software (such as Ghost or Clonezilla) is used to make an image of a completely configured system. [21] This image is then written to a computer's hard disk, which is sometimes described as restoring an image. [22] This restoration is sometimes done over a computer network, using multicasting or BitTorrent, to devices that need to have their configuration restored. [23] [22] This reduces the need to maintain and update individual systems manually. Imaging is also easier than automated setup methods because an administrator does not need to have knowledge of the prior configuration to copy it. [22] However, disk imaging requires all devices to be identical and provides no flexibility in adjusting the configuration.

Network-based image deployment typically uses a PXE server to boot a minimal operating system over the network that contains the components needed to image or restore storage media in a computer. [23] This is usually used in conjunction with a DHCP server to automate the configuration of network parameters, including IP addresses. Typically, multicasting, broadcasting or unicasting is used to restore an image to many computers at a time, but these approaches do not work well if one or more computers experience a problem such as UDP packet loss. [22] As a result, some imaging solutions instead use the BitTorrent protocol to transfer the data.

Backup strategy

A disk image contains all files, faithfully replicating all data, including file attributes and the file fragmentation state. For this reason, disk imaging is also used for backing up optical media (such as CDs and DVDs), and it allows exact and efficient recovery after experimenting with modifications to a system or virtual machine. Typically, disk imaging can be used to quickly restore an entire system to an operational state after a disaster. [24]

Digital preservation

Libraries and museums are typically required to archive and digitally preserve information without altering it in any manner. [9] [25] Emulators frequently use disk images to emulate floppy disks that have been preserved. This is usually simpler to program than accessing a real floppy drive (particularly if the disks are in a format not supported by the host operating system), and allows a large library of software to be managed. Emulation also allows existing disk images to be put into a usable form even though the data contained in the image is no longer readable without emulation. [12]

Limitations

Reading from a disk image can sometimes be slower than reading from the disk directly because of performance overhead. [3] Another limitation can be the lack of access to the software required to read the contents of the image. For example, prior to Windows 8, third-party software was required to mount disk images. [26] [27] Disk imaging is time-consuming and the space requirements are high. When imaging multiple computers with only minor differences, much data is duplicated unnecessarily, wasting space. [3]
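
The duplication between near-identical images can be reduced by storing each distinct block only once. The following Python sketch shows one simple way to do this, using fixed-size blocks keyed by their SHA-256 digest. It is an illustration under assumptions, not the method of the cited work: the function names, the 4096-byte block size, and the in-memory dictionaries are all choices made for this example.

```python
import hashlib


def deduplicate_blocks(images, block_size=4096):
    """Split each image into fixed-size blocks and store each unique block once.

    images maps an image name to its raw bytes. Returns (store, recipes):
    store maps a block digest to the block's bytes, and recipes maps each
    image name to the ordered list of digests needed to rebuild it.
    """
    store = {}
    recipes = {}
    for name, data in images.items():
        recipe = []
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            key = hashlib.sha256(block).hexdigest()
            store.setdefault(key, block)  # keep only the first copy of a block
            recipe.append(key)
        recipes[name] = recipe
    return store, recipes


def rebuild_image(store, recipe):
    """Reassemble an image from its ordered list of block digests."""
    return b"".join(store[key] for key in recipe)
```

For two four-block images that differ in a single block, this scheme stores five blocks instead of eight, and each image can still be rebuilt exactly from its recipe.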

Speed and failure

Disk imaging can be slow, especially for older storage devices. A typical 4.7 GB DVD can take an average of 18 minutes to duplicate. [9] Floppy disks read and write much more slowly than hard disks, so despite their small size, a single disk can take several minutes to copy. In some cases, disk imaging can fail due to bad sectors or physical wear and tear on the source device. [12] Utilities such as dd are not designed to recognize or cope with such failures in their default mode of operation, so a read failure can make it impossible to create an image of the drive. [25]
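
One way to avoid losing a whole image to a few unreadable sectors is to record the failures and continue. The Python sketch below illustrates that idea only and does not describe the behavior of dd or any particular tool. The `read_sector` callable is a hypothetical stand-in for whatever reads one sector from the source device, and filling unreadable sectors with zeros is an assumption made for this example.

```python
def image_with_bad_sectors(read_sector, sector_count, sector_size=512):
    """Build an image sector by sector, tolerating read errors.

    read_sector(index) is a caller-supplied function (hypothetical here)
    that returns sector_size bytes or raises OSError when the sector
    cannot be read. Unreadable sectors are zero-filled so that later
    sectors keep their correct offsets in the image.
    Returns (image_bytes, list_of_bad_sector_indexes).
    """
    image = bytearray()
    bad_sectors = []
    for index in range(sector_count):
        try:
            data = read_sector(index)
        except OSError:
            data = bytes(sector_size)  # placeholder for the unreadable sector
            bad_sectors.append(index)
        image += data
    return bytes(image), bad_sectors
```

Keeping the list of bad sectors alongside the image lets a later reader tell zero-filled gaps apart from sectors that really contained zeros.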

References

  1. Colloton, Eddy; Farbowitz, Jonathan; Rodríguez, Caroline Gil (2022-11-02). "Disk Imaging as a Backup Tool for Digital Objects". Conservation of Time-Based Media Art. pp. 204–222. doi:10.4324/9781003034865-17. ISBN 9781003034865.
  2. Woods, Kam; Lee, Christopher A.; Garfinkel, Simson (2011-06-13). Extending digital repository architectures to support disk image preservation and access. Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries. New York, NY, USA: Association for Computing Machinery. pp. 57–66. doi:10.1145/1998076.1998088. ISBN 978-1-4503-0744-4. S2CID 2628912.
  3. Pullakandam, R.; Lin, X.; Hibler, M.; Eide, E.; Ricci, R. (October 23–26, 2011). High-performance Disk Imaging With Deduplicated Storage (PDF). 23rd ACM Symposium on Operating Systems Principles. Cascais, Portugal.
  4. Kävrestad, Joakim (2017), Kävrestad, Joakim (ed.), "Vocabulary", Guide to Digital Forensics: A Concise and Practical Introduction, SpringerBriefs in Computer Science, Cham: Springer International Publishing, pp. 125–126, doi:10.1007/978-3-319-67450-6_12, ISBN 978-3-319-67450-6, retrieved 2023-01-12
  5. Lee, Sang Su; Kyong, Un Sung; Hong, Do Won (2008). A high speed disk imaging system. 2008 IEEE International Symposium on Consumer Electronics. pp. 1–3. doi:10.1109/ISCE.2008.4559553. S2CID 5932241.
  6. Garfinkel, Simson L. (2009). Automating Disk Forensic Processing with SleuthKit, XML and Python. 2009 Fourth International IEEE Workshop on Systematic Approaches to Digital Forensic Engineering. pp. 73–84. doi:10.1109/SADFE.2009.12. ISBN 978-0-7695-3792-4. S2CID 1624033.
  7. Kazim, Muhammad; Masood, Rahat; Shibli, Muhammad Awais (2013-11-26). Securing the virtual machine images in cloud computing. Proceedings of the 6th International Conference on Security of Information and Networks. New York, NY, USA: Association for Computing Machinery. pp. 425–428. doi:10.1145/2523514.2523576. ISBN 978-1-4503-2498-4. S2CID 2474546.
  8. Blackham, N.; Higby, C.; Bailey, M. (June 2004). Re-Imaging Computers For Multipurpose Labs. 2004 American Society for Engineering Education Annual Conference. Salt Lake City, Utah. doi:10.18260/1-2--14125.
  9. Day, Michael; Pennock, Maureen; May, Peter; Davies, Kevin; Whibley, Simon; Kimura, Akiko; Halvarsson, Edith (2016). "The preservation of disk-based content at the British Library: Lessons from the Flashback project". Alexandria: The Journal of National and International Library and Information Issues. 26 (3): 216–234. doi:10.1177/0955749016669775. ISSN 0955-7490. S2CID 63617004.
  10. Arunkumar, G.; Venkataraman, Neelanarayanan (2015-01-01). "A Novel Approach to Address Interoperability Concern in Cloud Computing". Procedia Computer Science. Big Data, Cloud and Computing Challenges. 50: 554–559. doi:10.1016/j.procs.2015.04.083. ISSN 1877-0509.
  11. Barrowclough, John Patrick; Asif, Rameez (2018-06-11). "Securing Cloud Hypervisors: A Survey of the Threats, Vulnerabilities, and Countermeasures". Security and Communication Networks. 2018: e1681908. doi:10.1155/2018/1681908. ISSN 1939-0114.
  12. Colloton, E.; Farbowitz, J.; Fortunato, F.; Gil, C. (2019). "Towards Best Practices In Disk Imaging: A Cross-Institutional Approach". Electronic Media Review. 6.
  13. Stewart, Dawid; Arvidsson, Alex (2022). Need for speed : A study of the speed of forensic disk imaging tools.
  14. "IBM Mainframe Operating Systems" (PDF). Archived from the original (PDF) on 2014-07-01. Retrieved 2014-06-17.
  15. McCune, Mike (2000). Integrating Linux and Windows. Prentice Hall Professional. ISBN 978-0-13-030670-8.
  16. Li, Hongwei; Yin, Changhong; Xu, Yaping; Guo, Qingjun (2010). Construction of the Practical Teaching System on Operating Systems Course. 2010 Second International Workshop on Education Technology and Computer Science. Vol. 1. pp. 405–408. doi:10.1109/ETCS.2010.184. ISBN 978-1-4244-6388-6. S2CID 15706012.
  17. "Phantom Burner Overview". Phantombility, Inc. Archived from the original on 19 August 2011. Retrieved 19 July 2011.
  18. "Virtual CD - The original for your PC". Virtual CD website. H+H Software GmbH. Archived from the original on 24 September 2011. Retrieved 19 July 2011.
  19. "Virtual CD/DVD-Writer Device". SourceForge. Geeknet, Inc. Archived from the original on 17 February 2011. Retrieved 19 July 2011.
  20. "pcguide.com - Access Time". Archived from the original on 10 January 2019.
  21. Bowling, Jeramiah (2011-01-01). "Clonezilla: build, clone, repeat". Linux Journal. 2011 (201): 6:6. ISSN 1075-3583.
  22. Shiau, Steven J. H.; Huang, Yu-Chiang; Tsai, Yu-Chin; Sun, Chen-Kai; Yen, Ching-Hsuan; Huang, Chi-Yo (2021). "A BitTorrent Mechanism-Based Solution for Massive System Deployment". IEEE Access. 9: 21043–21058. Bibcode:2021IEEEA...921043S. doi:10.1109/ACCESS.2021.3052525. ISSN 2169-3536. S2CID 231851821.
  23. Shiau, Steven J. H.; Sun, Chen-Kai; Tsai, Yu-Chin; Juang, Jer-Nan; Huang, Chi-Yo (2018). "The Design and Implementation of a Novel Open Source Massive Deployment System". Applied Sciences. 8 (6): 965. doi:10.3390/app8060965. ISSN 2076-3417.
  24. "Fast, Scalable Disk Imaging with Frisbee". www.cs.utah.edu. Retrieved 2023-01-12.
  25. Durno, John; Trofimchuk, Jerry (2015-01-21). "Digital forensics on a shoestring: a case study from the University of Victoria". The Code4Lib Journal (27). ISSN 1940-5758.
  26. "Accessing data in ISO and VHD files". Building Windows 8 (TechNet Blogs). Microsoft. 30 August 2011. Archived from the original on 19 April 2012. Retrieved 27 April 2012.
  27. "Mount-DiskImage". Microsoft.