File deletion

File deletion is the removal of a file from a computer's file system.

All operating systems include commands for deleting files (rm on Unix and Linux, [1] era in CP/M and DR-DOS, del/erase in MS-DOS/PC DOS, DR-DOS, Microsoft Windows, etc.). File managers also provide a convenient way of deleting files. Files may be deleted one by one, or a whole directory tree may be deleted.
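The same two operations, removing one file and removing a whole directory tree, are also exposed to programs. The following is a minimal sketch using Python's standard library; the helper names delete_file, delete_tree and demo, and the file names used, are illustrative assumptions rather than anything defined by the commands above.

```python
import os
import shutil
import tempfile
from pathlib import Path


def delete_file(path: Path) -> None:
    """Delete a single file, roughly what `rm path` does."""
    os.remove(path)


def delete_tree(path: Path) -> None:
    """Delete a directory and everything below it, roughly what `rm -r path` does."""
    shutil.rmtree(path)


def demo() -> tuple:
    # Work inside a throwaway directory so that nothing real is touched.
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)

        single = root / "note.txt"
        single.write_text("temporary")

        nested = root / "project" / "src"
        nested.mkdir(parents=True)
        (nested / "main.txt").write_text("temporary")

        delete_file(single)
        delete_tree(root / "project")

        # Report whether each path still exists after deletion.
        return single.exists(), (root / "project").exists()


if __name__ == "__main__":
    print(demo())
```

Neither call asks for confirmation or uses a trash folder, so a program that deletes on a user's behalf usually adds its own safeguards.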

Purpose

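Files are deleted for a variety of reasons, such as freeing disk space, removing duplicate or unnecessary data, or making sensitive information unavailable to others.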

Accidental removal

A common problem with deleting files is the accidental removal of information that later proves to be important. A common method to prevent this is to back up files regularly. Erroneously deleted files may then be found in archives.

Another technique often used is not to delete files instantly, but to move them to a temporary directory whose contents can then be deleted at will. This is how the "recycle bin" or "trash can" works. Microsoft Windows and Apple's macOS, as well as some Linux distributions, all employ this strategy.
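The idea of moving a file aside instead of deleting it can be sketched in a few lines. This is only an illustration of the strategy, not how any particular operating system implements its recycle bin or trash; the function names and the ".trash" directory are hypothetical.

```python
import shutil
import tempfile
import time
from pathlib import Path


def move_to_trash(path: Path, trash_dir: Path) -> Path:
    """Move `path` into `trash_dir` instead of deleting it; return its new location."""
    trash_dir.mkdir(parents=True, exist_ok=True)
    # Prefix a timestamp so that two trashed files with the same name do not collide.
    target = trash_dir / f"{time.time_ns()}_{path.name}"
    shutil.move(str(path), str(target))
    return target


def restore(trashed: Path, original: Path) -> None:
    """Put a trashed file back where it came from."""
    shutil.move(str(trashed), str(original))


def empty_trash(trash_dir: Path) -> None:
    """Permanently delete everything that was moved to the trash."""
    shutil.rmtree(trash_dir, ignore_errors=True)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)
        doc = root / "report.txt"
        doc.write_text("draft")
        trash = root / ".trash"

        trashed = move_to_trash(doc, trash)
        gone = not doc.exists()          # the file has left its original place

        restore(trashed, doc)
        restored = doc.read_text()       # contents survive the round trip

        empty_trash(trash)
        return gone, restored, trash.exists()


if __name__ == "__main__":
    print(demo())
```

Until empty_trash is called, the data still occupies disk space, which is the price paid for being able to undo the deletion.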

In MS-DOS, one can use the undelete command. In MS-DOS, "deleted" files are not really deleted but only marked as deleted, so they can be undeleted for some time, until the disk blocks they used are taken up by other files. Data recovery programs work in the same way, by scanning for files that have been marked as deleted. Because freed space is reused block by block rather than a whole file at a time, a deleted file may be only partly overwritten, so its data can sometimes be recovered only incompletely. Defragmenting a drive may prevent undeletion, as the blocks used by a deleted file are marked as "empty" and might be overwritten.
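The mark-as-deleted behaviour can be illustrated with a toy model. The sketch below is not the MS-DOS on-disk format; ToyDisk and its methods are hypothetical names, and the model only assumes what the paragraph above describes: deletion flags the entry and frees its blocks without erasing them, and later writes may reuse those blocks.

```python
class ToyDisk:
    """A toy block store: deleting a file only flags its entry and frees its blocks."""

    def __init__(self, n_blocks: int) -> None:
        self.blocks = [b""] * n_blocks      # raw contents of every block
        self.in_use = [False] * n_blocks    # allocation map
        self.entries = {}                   # name -> {"blocks": [...], "deleted": bool}

    def write(self, name: str, chunks: list) -> None:
        free = [i for i, used in enumerate(self.in_use) if not used]
        if len(free) < len(chunks):
            raise OSError("disk full")
        chosen = free[: len(chunks)]
        for index, chunk in zip(chosen, chunks):
            self.blocks[index] = chunk      # may overwrite a deleted file's data
            self.in_use[index] = True
        self.entries[name] = {"blocks": chosen, "deleted": False}

    def delete(self, name: str) -> None:
        entry = self.entries[name]
        entry["deleted"] = True             # the directory entry itself is kept
        for index in entry["blocks"]:
            self.in_use[index] = False      # block contents are left untouched

    def undelete(self, name: str):
        """Return the full contents, or None if any block has been reused."""
        entry = self.entries[name]
        if not entry["deleted"] or any(self.in_use[i] for i in entry["blocks"]):
            return None
        for index in entry["blocks"]:
            self.in_use[index] = True
        entry["deleted"] = False
        return b"".join(self.blocks[i] for i in entry["blocks"])

    def salvage(self, name: str) -> list:
        """Return only the chunks whose blocks have not been reused."""
        entry = self.entries[name]
        return [self.blocks[i] for i in entry["blocks"] if not self.in_use[i]]


if __name__ == "__main__":
    disk = ToyDisk(4)
    disk.write("a.txt", [b"he", b"llo"])
    disk.delete("a.txt")
    disk.write("b.txt", [b"XX"])         # reuses the first freed block
    print(disk.undelete("a.txt"))        # full recovery is no longer possible
    print(disk.salvage("a.txt"))         # only the untouched block survives
```

In this model a file can be fully undeleted only while none of its blocks have been handed to another file; afterwards salvage returns whatever fragments remain, which mirrors the incomplete recovery mentioned above.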

Another precautionary measure is to mark important files as read-only. Many operating systems will warn the user trying to delete such files. Where file-system permissions exist, users who lack the necessary permissions are only able to delete their own files, preventing the erasure of other people's work or critical system files.
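A deletion routine that respects the read-only mark might look like the sketch below. The function names are hypothetical, and the check is a convention enforced by this code rather than by the operating system: on Unix-like systems the right to delete a file is governed by write permission on the containing directory, not by the file's own mode bits.

```python
import os
import stat
import tempfile
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def mark_read_only(path: Path) -> None:
    """Clear every write bit on the file."""
    path.chmod(path.stat().st_mode & ~WRITE_BITS)


def is_read_only(path: Path) -> bool:
    """Treat a file as read-only when its owner write bit is cleared."""
    return not (path.stat().st_mode & stat.S_IWUSR)


def guarded_remove(path: Path, force: bool = False) -> bool:
    """Delete `path` unless it is read-only; return True if it was deleted."""
    if is_read_only(path) and not force:
        return False
    # On Windows a read-only file cannot be removed until the flag is cleared.
    path.chmod(path.stat().st_mode | stat.S_IWUSR)
    os.remove(path)
    return True


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as scratch:
        important = Path(scratch) / "thesis.txt"
        important.write_text("do not lose")
        mark_read_only(important)

        refused = not guarded_remove(important)        # blocked by the read-only mark
        still_there = important.exists()
        forced = guarded_remove(important, force=True)  # explicit override
        return refused, still_there, forced, important.exists()


if __name__ == "__main__":
    print(demo())
```

The force parameter plays the role of the confirmation prompt that many systems show before deleting a protected file.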

Sensitive data

A common problem with sensitive data is that deleted files are not really erased and so may be recovered by interested parties. Most file systems only remove the link to the data. Even overwriting parts of the disk with something else, or formatting it, may not guarantee that the sensitive data is completely unrecoverable. Special software is available that overwrites data, and modern (post-2001) ATA drives include a secure erase command in firmware. However, high-security applications and high-security enterprises can sometimes require that a disk drive be physically destroyed to ensure data is not recoverable, as microscopic changes in head alignment and other effects can mean even such measures are not guaranteed. When the data is encrypted, only the encryption key has to be made unavailable. Crypto-shredding is the practice of 'deleting' data by (only) deleting or overwriting the encryption keys.
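The overwrite-before-delete approach used by such software can be sketched as follows. This is an illustrative outline with hypothetical function names, not a secure erasure tool: as noted above, overwriting is not guaranteed to make data unrecoverable, and on solid-state drives or on journaling and copy-on-write file systems the new bytes may not even land on the blocks that held the old ones.

```python
import os
import tempfile
from pathlib import Path

CHUNK = 64 * 1024  # write in modest chunks so large files do not exhaust memory


def overwrite_in_place(path: Path, passes: int = 1) -> None:
    """Replace the file's contents with random bytes, keeping its length."""
    size = path.stat().st_size
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                step = min(remaining, CHUNK)
                handle.write(os.urandom(step))
                remaining -= step
            # Ask the operating system to push the data to the device.
            handle.flush()
            os.fsync(handle.fileno())


def overwrite_and_delete(path: Path, passes: int = 1) -> None:
    """Overwrite the file's contents, then remove its directory entry."""
    overwrite_in_place(path, passes)
    os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        secret = Path(scratch) / "secret.txt"
        secret.write_bytes(b"account numbers" * 100)
        overwrite_and_delete(secret)
        print(secret.exists())
```

Where stronger guarantees are needed, the drive's own secure erase command, physical destruction, or crypto-shredding are the options the text describes.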

Related Research Articles

Disk partitioning: creation of separate accessible storage areas on a secondary computer storage device

Disk partitioning or disk slicing is the creation of one or more regions on secondary storage, so that each region can be managed separately. These regions are called partitions. It is typically the first step of preparing a newly installed disk, after a partitioning scheme is chosen for the new disk and before any file system is created. The disk stores the information about the partitions' locations and sizes in an area known as the partition table, which the operating system reads before any other part of the disk. Each partition then appears to the operating system as a distinct "logical" disk that uses part of the actual disk. System administrators use a program called a partition editor to create, resize, delete, and manipulate the partitions. Partitioning allows different file systems to be used for different kinds of files. Separating user data from system data can prevent the system partition from becoming full and rendering the system unusable. Partitioning can also make backing up easier. A disadvantage is that it can be difficult to size partitions properly, which can leave one partition with too much free space and another almost fully allocated.

In cryptography, plaintext usually means unencrypted information pending input into cryptographic algorithms, usually encryption algorithms. This usually refers to data that is transmitted or stored unencrypted.

Disk formatting is the process of preparing a data storage device such as a hard disk drive, solid-state drive, floppy disk, memory card or USB flash drive for initial use. In some cases, the formatting operation may also create one or more new file systems. The first part of the formatting process, which performs basic medium preparation, is often referred to as "low-level formatting". Partitioning is the common term for the second part of the process, dividing the device into several sub-devices and, in some cases, writing information to the device allowing an operating system to be booted from it. The third part of the process, usually termed "high-level formatting", most often refers to the process of generating a new file system. In some operating systems all or parts of these three processes can be combined or repeated at different levels, and the term "format" is understood to mean an operation in which a new disk medium is fully prepared to store files. Some formatting utilities distinguish between a quick format, which does not erase all existing data, and a long option, which does erase all existing data.

Utility software is a program specifically designed to help manage and tune system or application software. It is used to support the computer infrastructure - in contrast to application software, which is aimed at directly performing tasks that benefit ordinary users. However, utilities often form part of the application systems. For example, a batch job may run user-written code to update a database and may then include a step that runs a utility to back up the database, or a job may run a utility to compress a disk before copying files.

Data security means protecting digital data, such as those in a database, from destructive forces and from the unwanted actions of unauthorized users, such as a cyberattack or a data breach.

Data remanence is the residual representation of digital data that remains even after attempts have been made to remove or erase the data. This residue may result from data being left intact by a nominal file deletion operation, by reformatting of storage media that does not remove data previously written to the media, or through physical properties of the storage media that allow previously written data to be recovered. Data remanence may make inadvertent disclosure of sensitive information possible should the storage media be released into an uncontrolled environment.

Data loss is an error condition in information systems in which information is destroyed by failures or neglect in storage, transmission, or processing. Information systems implement backup and disaster recovery equipment and processes to prevent data loss or restore lost data. Data loss can also occur if the physical medium containing the data is lost or stolen.

Disk encryption software is computer security software that protects the confidentiality of data stored on computer media by using disk encryption.

Undeletion is a feature for restoring computer files which have been removed from a file system by file deletion. Deleted data can be recovered on many file systems, but not all file systems provide an undeletion feature. Recovering data without an undeletion facility is usually called data recovery, rather than undeletion. Undeletion can help prevent users from accidentally losing data, but it can also pose a computer security risk, since users may not be aware that deleted files remain accessible.

In computing, data recovery is a process of retrieving deleted, inaccessible, lost, corrupted, damaged, or formatted data from secondary storage, removable media or files, when the data stored in them cannot be accessed in a usual way. The data is most often salvaged from storage media such as internal or external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices. Recovery may be required due to physical damage to the storage devices or logical damage to the file system that prevents it from being mounted by the host operating system (OS).

Disk encryption is a technology which protects information by converting it into code that cannot be deciphered easily by unauthorized people or processes. Disk encryption uses disk encryption software or hardware to encrypt every bit of data that goes on a disk or disk volume. It is used to prevent unauthorized access to data storage.

Anti–computer forensics or counter-forensics are techniques used to obstruct forensic analysis.

Trash (computing): temporary storage for deleted files

In computing, the trash, also known by other names such as dustbin and wastebasket, is a graphical user interface desktop metaphor for temporary storage for files set aside by the user for deletion, but not yet permanently erased. The concept and name are part of Mac operating systems; a similar implementation is called the Recycle Bin in Microsoft Windows, and other operating systems use other names.

Hardware-based full disk encryption (FDE) is available from many hard disk drive (HDD/SSD) vendors, including: Hitachi, Integral Memory, iStorage Limited, Micron, Seagate Technology, Samsung, Toshiba, Viasat UK, Western Digital. The symmetric encryption key is maintained independently from the computer's CPU, thus allowing the complete data store to be encrypted and removing computer memory as a potential attack vector.

Data erasure is a software-based method of data sanitization that aims to completely destroy all electronic data residing on a hard disk drive or other digital media by overwriting data onto all sectors of the device in an irreversible process. By overwriting the data on the storage device, the data is rendered irrecoverable.

shred is a command on Unix-like operating systems that can be used to securely delete files and devices so that it is extremely difficult to recover them, even with specialized hardware and technology, assuming recovery is possible at all, which is not always the case. It is a part of GNU Core Utilities. Being based on the Gutmann method paper, it suffers from the same criticisms and possible shortcomings.

A trim command allows an operating system to inform a solid-state drive (SSD) which blocks of data are no longer considered to be "in use" and therefore can be erased internally.

Crypto-shredding is the practice of 'deleting' data by deliberately deleting or overwriting the encryption keys. This requires that the data have been encrypted. Data may be considered to exist in three states: data at rest, data in transit and data in use. General data security principles, such as in the CIA triad of confidentiality, integrity, and availability, require that all three states must be adequately protected.

Eraser is an open-source secure file erasure tool available for the Windows operating system. It supports both file and volume wiping.

Data sanitization involves the secure and permanent erasure of sensitive data from datasets and media to guarantee that no residual data can be recovered even through extensive forensic analysis. Data sanitization has a wide range of applications but is mainly used for clearing out end-of-life electronic devices or for the sharing and use of large datasets that contain sensitive information. The main strategies for erasing personal data from devices are physical destruction, cryptographic erasure, and data erasure. While the term data sanitization may lead some to believe that it only includes data on electronic media, the term also broadly covers physical media, such as paper copies. These data types are termed soft for electronic files and hard for physical media paper copies. Data sanitization methods are also applied for the cleaning of sensitive data, such as through heuristic-based methods, machine-learning based methods, and k-source anonymity.

References

  1. "rm(1) — Linux manual page". The man-pages project. August 2023. Retrieved February 3, 2024.