Set redundancy compression

In computer science and information theory, set redundancy compression refers to a family of data compression methods that exploit the redundancy between the individual data groups of a set, usually a set of similar images. It is widely used on medical and satellite images. [1] [2] [3] [4] The main methods are the min-max differential, the min-max predictive, and the centroid method.

Methods

Min-max differential

In the min-max differential (or MMD) method, for each position (pixel) either the highest or the lowest value across the set is selected, forming a reference image. Each image is then stored as the difference between each of its pixel values and the previously selected reference value at that position.
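A minimal sketch of the idea, assuming a set of equally sized grayscale images held as NumPy arrays and using the per-pixel minimum as the reference (the per-pixel maximum works analogously); the function names are illustrative and not taken from any published implementation:

```python
import numpy as np

def mmd_encode(images):
    """Min-max differential (MMD) sketch: build a per-pixel reference
    (here the minimum across the set) and store each image as its
    difference from that reference."""
    stack = np.asarray(images, dtype=np.int16)  # shape: (n_images, height, width)
    reference = stack.min(axis=0)               # per-pixel minimum over the whole set
    residuals = stack - reference               # small, non-negative differences
    return reference, residuals

def mmd_decode(reference, residuals):
    """Lossless reconstruction: add the stored differences back to the reference."""
    return reference + residuals
```

Because the images in the set are similar, the residuals are dominated by small values, so passing the reference image and the residuals to an ordinary lossless coder (for example an entropy coder) typically compresses better than coding each original image on its own.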

References

  1. Karadimitriou, Kosmas (August 1996), Set redundancy, the enhanced compression model, and methods for compressing sets of similar images, Ph.D. thesis, Department of Computer Science, Louisiana State University, Baton Rouge, LA, USA, CiteSeerX 10.1.1.35.7146: "This statistical correlation among similar images is a result of inter-image redundancy. In this study, the term 'set redundancy' is introduced to describe this type of redundant information, and is defined as follows: Definition: Set redundancy is the inter-image redundancy that exists in a set of similar images, and refers to the common information found in more than one image in the set. Set redundancy can be used to improve compression. A limit to compression is imposed by the image entropy. In the next section it is shown how set redundancy can be used to decrease the average image entropy in a set of similar images."
  2. Ait-Aoudia, Samy; Gabis, Abdelhalim (2005-02-27), "A Comparison of Set Redundancy Compression Techniques" (PDF), EURASIP Journal on Applied Signal Processing, 2006: 092734, Bibcode:2006EJASP2006..234A, doi:10.1155/ASP/2006/92734, retrieved 2012-09-28: "Medical imaging applications produce a huge amount of similar images. Storing such amount of data needs gigantic disk space. Thus a compression technique is necessary to reduce space storage. In addition, medical images must be stored without any loss of information since the fidelity of images is critical in diagnosis. This requires lossless compression techniques. Lossless compression is an error-free compression. The decompressed image is the same as the original image. Classical image compression techniques (see [1–5]) concentrate on how to reduce the redundancies presented in an individual image. These compression techniques use the same model of compression as shown in Figure 1. This model ignores an additional type of redundancy that exists in sets of similar images, the 'set redundancy.' The term 'set redundancy' was introduced for the first time by Karadimitriou [6] and defined as follows: 'Set redundancy is the interimage redundancy that exists in a set of similar images, and refers to the common information found in more than one image in the set.'"
  3. Ait-Aoudia, Samy; Gabis, Abdelhalim; Naimi, Amina, Compressing Sets of Similar Images (PDF): "Applications using these types of data, produce a large amount of similar images. Thus a compression technique is useful to reduce transmission time and space storage. Lossless compression methods are necessary in such critical applications. Set Redundancy Compression (SRC) methods exploit the interimage redundancy and achieve better results than individual image compression techniques when applied to sets of similar images."
  4. Karadimitriou, Kosmas; Tyler, John M. (1998), "The Centroid method for compressing sets of similar images", Pattern Recognition Letters, 19 (7): 585–593, Bibcode:1998PaReL..19..585K, CiteSeerX 10.1.1.39.3248, doi:10.1016/S0167-8655(98)00033-6: "Karadimitriou (1996) proposed the Enhanced Compression Model as a more appropriate model for compressing sets of similar images. […] Methods that achieve set redundancy reduction are referred to as SRC (Set Redundancy Compression) methods. Two SRC methods are the Min-Max Differential method (Karadimitriou and Tyler, 1996) and the Min-Max Predictive method (Karadimitriou and Tyler, 1997). […] One of the best application areas for SRC methods is medical imaging. Medical image databases usually store similar images; therefore, they contain large amounts of set redundancy."