Metadata removal tool

A metadata removal tool, or metadata scrubber, is a type of privacy software that protects its users' privacy by removing potentially compromising metadata from files before they are shared with others, e.g., by sending them as e-mail attachments or by posting them on the Web. [1] [2]

Overview

Metadata can be found in many types of files such as documents, spreadsheets, presentations, images, and audio files. It can include information such as details on the file authors, file creation and modification dates, geographical location, document revision history, thumbnail images, and comments. [3] Metadata may be added to files by users, but some metadata is often added automatically, without user intervention, by authoring applications or by the devices used to produce the files.

Since metadata is sometimes not clearly visible in authoring applications (depending on the application and its settings), there is a risk that the user will be unaware of its existence or will forget about it and, if the file is shared, private or confidential information will inadvertently be exposed. The purpose of metadata removal tools is to minimize the risk of such data leakage. [4]

The metadata removal tools that exist today can be divided into four groups:

To securely delete the metadata of a PDF file, it is important to linearize the file afterwards; otherwise the changes are reversible and the metadata can be recovered. [5] [6]
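As a minimal sketch of this two-step procedure, the Python snippet below removes metadata with ExifTool and then rewrites the file with qpdf, using the `qpdf --linearize` command quoted in reference 5. It assumes that both command-line tools are installed and on the PATH; the function names `pdf_scrub_commands` and `scrub_pdf` are hypothetical, chosen only for illustration.

```python
import subprocess


def pdf_scrub_commands(src, dst):
    """Return the two command lines of the sketch, in the order they should run."""
    return [
        # Step 1: ask ExifTool to delete all writable metadata, editing src in place.
        # The edit is reversible: the old data is still physically present in the file.
        ["exiftool", "-all=", "-overwrite_original", src],
        # Step 2: let qpdf rewrite the file in linearized form as dst,
        # which is the step the ExifTool documentation recommends for permanent removal.
        ["qpdf", "--linearize", src, dst],
    ]


def scrub_pdf(src, dst):
    """Run both steps; raises CalledProcessError if either tool fails."""
    for command in pdf_scrub_commands(src, dst):
        subprocess.run(command, check=True)
```

Calling `scrub_pdf("in.pdf", "out.pdf")` would leave the cleaned, linearized copy in `out.pdf`. Keeping the command construction separate from execution makes it possible to inspect the commands before running them.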

Metadata removal tools are also commonly used to reduce file sizes, particularly of image files posted on the Web. For example, a small image on a website, whose metadata may include a thumbnail image, can easily contain as much metadata as image data, so removing that metadata can halve the file size.
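The effect on file size can be illustrated with a simplified Python sketch that drops metadata segments from a JPEG byte stream. It is not how any particular tool works: the choice of segments to drop (APP1 for Exif and XMP, APP13 for Photoshop/IPTC data, and COM comments) and the function name `strip_jpeg_metadata` are assumptions made for illustration, and the parser ignores rarer cases such as fill bytes between segments.

```python
import struct

# Marker codes this sketch treats as metadata (an assumption, not a standard list):
# 0xE1 = APP1 (Exif, XMP), 0xED = APP13 (Photoshop IRB / IPTC), 0xFE = COM (comment).
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG byte stream without its metadata segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    out = bytearray(data[:2])  # keep the start-of-image marker
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: the compressed image data follows, copy it verbatim.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        end = pos + 2 + length
        if marker not in METADATA_MARKERS:
            out += data[pos:end]  # keep non-metadata segments such as APP0 (JFIF)
        pos = end
    raise ValueError("no start-of-scan marker found")
```

Comparing `len(data)` with `len(strip_jpeg_metadata(data))` shows how many bytes the metadata occupied; for a small image with an embedded thumbnail the difference can be a large share of the file.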

References

  1. Hassan, Nihad; Hijazi, Rami (2 July 2017). Digital Privacy and Security Using Windows: A Practical Guide. Apress. ISBN 978-1-4842-2799-2. Retrieved 12 December 2022.
  2. "The Many Faces of Fraud" (PDF). LAWPRO Magazine. Lawyers’ Professional Indemnity Company. June 2004. Archived from the original (PDF) on 12 December 2022. Retrieved 12 December 2022.
  3. "A Guardian guide to your metadata – Interactive Graphic". elearningexamples.com. Elearning Examples. Archived from the original on 5 March 2014.
  4. O'Reilly, Dennis. "Remove metadata from Office files, PDFs, and images". CNET. Archived from the original on 1 August 2022. Retrieved 12 December 2022.
  5. "PDF Tags". All metadata edits are reversible. While this would normally be considered an advantage, it is a potential security problem because old information is never actually deleted from the file. (However, after running ExifTool the old information may be removed permanently using the "qpdf" utility with this command: "qpdf --linearize in.pdf out.pdf".)
  6. "exiftool Application Documentation".