Redaction


Redaction or sanitization is the process of removing sensitive information from a document so that it may be distributed to a broader audience. It is intended to allow the selective disclosure of information. Typically, the result is a document that is suitable for publication or for dissemination to people other than the intended audience of the original document.


When the intent is secrecy protection, such as in dealing with classified information, redaction attempts to reduce the document's classification level, possibly yielding an unclassified document. When the intent is privacy protection, it is often called data anonymization. Originally, the term sanitization was applied to printed documents; it has since been extended to apply to computer files and the problem of data remanence.

Government secrecy

In the context of government documents, redaction (also called sanitization) refers more specifically to removing sensitive or classified information from a document before its publication, during declassification.

Secure document redaction techniques

A 1953 US government document that has been redacted prior to release.
A heavily redacted page from a 2004 lawsuit filed by the ACLU: American Civil Liberties Union v. Ashcroft

Redacting confidential material from a paper document before its public release involves overwriting portions of text with a wide black pen and then photocopying the result, because the obscured text may still be recoverable from the marked-up original. Alternatively, opaque, removable adhesive tape ("cover up tape" or "redaction tape"), available in various widths, may be applied before photocopying.

This is a simple process with only minor security risks. For example, if the black pen or tape is not wide enough, careful examination of the resulting photocopy may still reveal partial information about the text, such as the difference between short and tall letters. The exact length of the removed text also remains recognizable, which may help in guessing plausible wordings for shorter redacted sections. Where computer-generated proportional fonts were used, even more information can leak out of the redacted section in the form of the exact position of nearby visible characters.
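The length leak can be illustrated with a minimal sketch. It assumes a monospaced font, so that the width of the black bar maps directly to a number of characters; the helper `candidates_matching_width` and the list of names are hypothetical and serve only as an illustration.

```python
def candidates_matching_width(candidates, redacted_chars, tolerance=0):
    """Return the candidates whose length fits the redacted span.

    Assumes a monospaced font, so bar width maps to a character count.
    """
    return [
        name for name in candidates
        if abs(len(name) - redacted_chars) <= tolerance
    ]


# Hypothetical list of names an adversary considers plausible.
suspects = ["Ann Lee", "Robert Fitzgerald", "Maria Gonzalez", "Li Wei"]

# Suppose the black bar in the photocopy covers 14 character cells.
plausible = candidates_matching_width(suspects, redacted_chars=14)
print(plausible)
```

With a proportional font the same idea applies, but the comparison would use rendered widths rather than character counts.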

The UK National Archives published a document, Redaction Toolkit, Guidelines for the Editing of Exempt Information from Documents Prior to Release, [1] "to provide guidance on the editing of exempt material from information held by public bodies."

Secure redacting is more complicated with computer files. Word processing formats may save a revision history of the edited text that still contains the redacted text. In some file formats, unused portions of memory are saved that may still contain fragments of previous versions of the text. Where text is redacted, in Portable Document Format (PDF) or word processor formats, by overlaying graphical elements (usually black rectangles) over text, the original text remains in the file and can be uncovered by simply deleting the overlaying graphics. Effective redaction of electronic documents requires the removal of all relevant text and image data from the document file. Although internally complex, this process can be carried out easily by a user with the "redaction" functions of software for editing PDF or other files.
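The difference between covering text and removing it can be sketched with a toy model. The `Page` class below is a hypothetical stand-in, not a real PDF structure or library API: it only assumes that text and drawn rectangles are stored as separate objects, and that text extraction ignores graphics.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy page model: text runs plus drawn rectangles (not a real PDF)."""
    text_runs: list = field(default_factory=list)   # (x, y, string)
    rectangles: list = field(default_factory=list)  # (x, y, width, height)

    def extract_text(self):
        # Text extraction ignores graphics, much as copy and paste does.
        return " ".join(s for _, _, s in self.text_runs)


def overlay_redact(page, index):
    """Ineffective: draw a black box over the run but keep its text."""
    x, y, s = page.text_runs[index]
    page.rectangles.append((x, y, len(s), 1))


def remove_redact(page, index, marker="[REDACTED]"):
    """Effective in this model: replace the run's text, then draw the box."""
    x, y, s = page.text_runs[index]
    page.text_runs[index] = (x, y, marker)
    page.rectangles.append((x, y, len(s), 1))


def make_page():
    return Page(text_runs=[(0, 0, "Agent name:"), (12, 0, "J. Doe")])


overlaid = make_page()
overlay_redact(overlaid, 1)
print(overlaid.extract_text())  # the name is still extractable

removed = make_page()
remove_redact(removed, 1)
print(removed.extract_text())   # only the marker remains
```

Both pages would look identical when rendered, which is why the overlay mistake is easy to miss by visual inspection alone.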

Redaction may administratively require marking of the redacted area with the reason that the content is being restricted. US government documents released under the Freedom of Information Act are marked with exemption codes that denote the reason why the content has been withheld.

The US National Security Agency (NSA) published a guidance document which provides instructions for redacting PDF files. [2]

Printed matter

A page of a classified document that has been sanitized for public release. This is page 13 of a U.S. National Security Agency report on the USS Liberty incident, which was declassified and released to the public in July 2003. Classified information has been blocked out so that only the unclassified information is visible. Notations with leader lines at top and bottom cite statutory authority for not declassifying certain sections.

Printed documents which contain classified or sensitive information frequently contain a great deal of information which is less sensitive. There may be a need to release the less sensitive portions to uncleared personnel. The printed document will consequently be sanitized to obscure or remove the sensitive information. Maps have also been redacted for the same reason, with highly sensitive areas covered with a slip of white paper.

In some cases, sanitizing a classified document removes enough information to reduce the classification from a higher level to a lower one. For example, raw intelligence reports may contain highly classified information, such as the identities of spies, that is removed before the reports are distributed outside the intelligence agency: the initial report may be classified as Top Secret while the sanitized report may be classified as Secret.

In other cases, such as the NSA report on the USS Liberty incident, the report may be sanitized to remove all sensitive data so that it may be released to the general public.

As is seen in the USS Liberty report, paper documents are usually sanitized by covering the classified and sensitive portions before photocopying the document.

Computer media and files

Computer (electronic or digital) documents are more difficult to sanitize. In many cases, when information in an information system is modified or erased, some or all of the data remains in storage. This may be an accident of design, where the underlying storage mechanism (disk, RAM, etc.) still allows information to be read, despite its nominal erasure. The general term for this problem is data remanence. In some contexts (notably the US NSA, DoD, and related organizations), "sanitization" typically refers to countering the data remanence problem.

However, the retention may be a deliberate feature, in the form of an undo buffer, revision history, "trash can", backups, or the like. For example, word processing programs like Microsoft Word will sometimes be used to edit out the sensitive information. Unfortunately, these products do not always show the user all of the information stored in a file, so it is possible that a file may still contain sensitive information. In other cases, inexperienced users use ineffective methods which fail to sanitize the document. Metadata removal tools are designed to effectively sanitize documents by removing potentially sensitive information.
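As one illustration of retained revision history, the sketch below looks for tracked deletions in a word processor file. It assumes the Office Open XML (.docx) format, in which the file is a ZIP archive whose main part is `word/document.xml` and tracked deletions are stored in `w:delText` elements. The sample archive is built in memory and holds only that one part, so it is a stand-in for a real file rather than a valid document; the function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_deletions(docx_bytes):
    """Return text held in tracked deletions (w:delText) of a .docx file."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [el.text or "" for el in root.iter(f"{{{W_NS}}}delText")]


def build_sample_docx():
    """Build a minimal stand-in for a .docx: only word/document.xml."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        '<w:r><w:t>The source was </w:t></w:r>'
        '<w:del w:id="1" w:author="editor">'
        '<w:r><w:delText>J. Doe</w:delText></w:r>'
        '</w:del>'
        '<w:r><w:t>[REDACTED].</w:t></w:r>'
        '</w:p></w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


leftovers = tracked_deletions(build_sample_docx())
print(leftovers)
```

A check like this covers only one kind of hidden content; comments, embedded objects, and document properties would need separate inspection.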

In May 2005 the US military published a report on the death of Nicola Calipari, an Italian secret agent, at a US military checkpoint in Iraq. The published version of the report was in PDF format, and had been incorrectly redacted by covering sensitive parts with opaque blocks in software. Shortly thereafter, readers discovered that the blocked-out portions could be retrieved by copying and pasting them into a word processor. [3]

On May 24, 2006, lawyers for the communications service provider AT&T filed a legal brief [4] regarding their cooperation with domestic wiretapping by the NSA. Text on pages 12 to 14 of the PDF document was incorrectly redacted, and the covered text could be retrieved. [5]

At the end of 2005, the NSA released a report giving recommendations on how to safely sanitize a Microsoft Word document. [6]

Issues such as these make it difficult to reliably implement multilevel security systems, in which computer users of differing security clearances may share documents. The Challenge of Multilevel Security gives an example of a sanitization failure caused by unexpected behavior in Microsoft Word's change tracking feature. [7]

The two most common redaction mistakes are adding an image layer over the sensitive text to obscure it without removing the underlying text, and setting the background color to match the text color. In both cases, the redacted material still exists in the document beneath its visible appearance and can be found by searching or extracted by simple copy and paste. Proper redaction tools and procedures must be used to permanently remove the sensitive information. This is often accomplished in a multi-user workflow in which one group of people marks sections of the document as proposed redactions, another group verifies that the proposals are correct, and a final group operates the redaction tool to permanently remove the approved items.
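The three-stage workflow can be sketched on plain text as follows. This is a minimal illustration under stated assumptions, not a description of any particular redaction tool: the `Proposal` class, the function names, and the reason labels are hypothetical, and proposals are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    start: int           # inclusive character offset
    end: int             # exclusive character offset
    reason: str          # hypothetical reason label
    approved: bool = False


def propose(text, phrase, reason):
    """Stage 1: a reviewer marks a phrase as a redaction proposal."""
    start = text.index(phrase)
    return Proposal(start, start + len(phrase), reason)


def verify(proposal, approver_ok):
    """Stage 2: a second reviewer accepts or rejects the proposal."""
    proposal.approved = approver_ok
    return proposal


def apply_redactions(text, proposals):
    """Stage 3: permanently replace approved spans with a marker."""
    # Work right to left so earlier offsets stay valid.
    for p in sorted(proposals, key=lambda p: p.start, reverse=True):
        if p.approved:
            text = text[:p.start] + f"[REDACTED: {p.reason}]" + text[p.end:]
    return text


original = "Contact J. Doe at 555-0100 about the shipment."
proposals = [
    verify(propose(original, "J. Doe", "name"), approver_ok=True),
    verify(propose(original, "555-0100", "phone"), approver_ok=True),
    verify(propose(original, "shipment", "topic"), approver_ok=False),
]
released = apply_redactions(original, proposals)
print(released)
```

Because the approved spans are replaced in the string itself rather than covered, the released text no longer contains the original characters, and the rejected proposal is left untouched.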

References

  1. "Redaction Toolkit, Guidelines for the Editing of Exempt Information from Documents Prior to Release". The National Archives (UK).
  2. "Redaction of PDF Files Using Adobe Acrobat Professional X" (PDF). Security Configuration Guide. National Security Agency Information Assurance Directorate.
  3. BBC Report (May 2, 2005). "Readers 'declassify' US document". BBC.
  4. "Archived copy" (PDF). www.politechbot.com. Archived from the original (PDF) on 2 July 2006. Retrieved 14 January 2022.
  5. Declan McCullagh (May 26, 2006). "AT&T leaks sensitive info in NSA suit". CNet News. Archived from the original on July 17, 2012.
  6. NSA SNAC (December 13, 2005). "Redacting with Confidence: How to Safely Publish Sanitized Reports Converted From Word to PDF" (PDF). Report# I333-015R-2005. Information Assurance Directorate, National Security Agency, via Federation of American Scientists. Retrieved 2006-05-29.
  7. Rick Smith (2003). The Challenge of Multilevel Security (PDF). Black Hat Federal Conference. Archived from the original (PDF) on 2009-01-06.