Block-matching and 3D filtering

Last updated June 08, 2023

Block-matching and 3D filtering (BM3D) is a 3-D block-matching algorithm used primarily for noise reduction in images.^[1] It is one of the extensions of the non-local means methodology.^[2] BM3D runs two cascaded stages, a hard-thresholding stage and a Wiener filter stage, each consisting of the same three steps: grouping, collaborative filtering, and aggregation. The algorithm relies on an enhanced sparse representation in the transform domain.^[3]

Method

Grouping

Image fragments are grouped together based on similarity, but unlike standard k-means clustering and similar cluster analysis methods, the image fragments are not necessarily disjoint (they may overlap). This block-matching approach is less computationally demanding and is useful later, in the aggregation step. All fragments do, however, have the same size. A fragment is added to a group if its dissimilarity with a reference fragment falls below a specified threshold. This grouping technique is called block-matching. It is typically used to group similar blocks across different frames of a digital video; BM3D, on the other hand, may group macroblocks within a single frame. All image fragments in a group are then stacked to form 3D cylinder-like shapes.
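The grouping step described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the reference implementation: the function names, the mean-squared-difference dissimilarity, the exhaustive search over the whole image, and the `max_blocks` cap are all assumptions chosen for brevity.

```python
def extract_block(image, row, col, size):
    """Return the size x size block whose top-left corner is (row, col)."""
    return [r[col:col + size] for r in image[row:row + size]]


def block_distance(a, b):
    """Mean squared difference between two equally sized blocks (assumed measure)."""
    count = len(a) * len(a[0])
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b)) / count


def group_similar_blocks(image, ref_pos, size=2, threshold=1.0, max_blocks=8):
    """Stack the blocks whose dissimilarity to the reference block is below threshold.

    Candidate blocks may overlap each other and the reference block.
    Returns the matched positions and the stacked blocks (the 3D group).
    """
    height, width = len(image), len(image[0])
    reference = extract_block(image, ref_pos[0], ref_pos[1], size)
    matches = []
    for row in range(height - size + 1):
        for col in range(width - size + 1):
            d = block_distance(reference, extract_block(image, row, col, size))
            if d < threshold:
                matches.append((d, (row, col)))
    matches.sort()  # most similar blocks first
    positions = [pos for _, pos in matches[:max_blocks]]
    stack = [extract_block(image, r, c, size) for r, c in positions]
    return positions, stack
```

Calling `group_similar_blocks(image, (0, 0))` returns the positions of all blocks resembling the one at the top-left corner, together with the stacked group that the next step would filter. Practical implementations usually restrict the search to a window around the reference block rather than scanning the whole image.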

Collaborative filtering

Filtering is performed on every group of fragments. A $(d+1)$-dimensional linear transform is applied, where $d$ is the dimension of the fragments (so the transform is three-dimensional for two-dimensional image fragments), followed by a transform-domain shrinkage such as hard thresholding or Wiener filtering. The linear transform is then inverted to reproduce all (filtered) fragments.
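A minimal sketch of this transform, shrink, and invert sequence follows, using only the Python standard library. The separable orthonormal DCT is an assumed choice of linear transform, and all function names are hypothetical. `hard_threshold` and `wiener_shrink` illustrate the two kinds of shrinkage; in `wiener_shrink`, `pilot` stands for the coefficients of an earlier estimate of the same group and `sigma` for the noise standard deviation.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is its inverse."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def apply_along_axis(matrix, data, axis):
    """Multiply every 1D line of data[z][y][x] along `axis` by `matrix`."""
    dims = (len(data), len(data[0]), len(data[0][0]))
    out = [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    for z in range(dims[0]):
        for y in range(dims[1]):
            for x in range(dims[2]):
                idx = [z, y, x]
                total = 0.0
                for j in range(dims[axis]):
                    src = idx.copy()
                    src[axis] = j
                    total += matrix[idx[axis]][j] * data[src[0]][src[1]][src[2]]
                out[z][y][x] = total
    return out


def transform_3d(group, inverse=False):
    """Separable 3D DCT (or its inverse) of a stack of equally sized blocks."""
    out = group
    for axis, n in enumerate((len(group), len(group[0]), len(group[0][0]))):
        matrix = dct_matrix(n)
        if inverse:
            matrix = [list(col) for col in zip(*matrix)]  # transpose
        out = apply_along_axis(matrix, out, axis)
    return out


def hard_threshold(coeffs, threshold):
    """Zero every coefficient whose magnitude is below `threshold`."""
    return [[[c if abs(c) >= threshold else 0.0 for c in row]
             for row in block] for block in coeffs]


def wiener_shrink(noisy, pilot, sigma):
    """Empirical Wiener shrinkage: scale each noisy coefficient by p^2 / (p^2 + sigma^2)."""
    return [[[n * p * p / (p * p + sigma * sigma) for n, p in zip(n_row, p_row)]
             for n_row, p_row in zip(n_block, p_block)]
            for n_block, p_block in zip(noisy, pilot)]


def collaborative_filter(group, threshold):
    """Transform the group, shrink by hard thresholding, and invert."""
    return transform_3d(hard_threshold(transform_3d(group), threshold), inverse=True)
```

Because the fragments in a group are similar, most of the group's energy concentrates in a few transform coefficients, so shrinking the small ones removes mostly noise. The triple loop is written for readability, not speed.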

Aggregation

The filtered fragments are returned to their original positions to form the two-dimensional image. All overlapping image fragments are combined by weighted averaging, so that noise is suppressed while each fragment retains its distinct signal.
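The weighted averaging of overlapping fragments can be sketched as follows. The function name and the `(row, col, block, weight)` tuple layout are assumptions made for illustration, and the weights are simply taken as inputs here; how they are derived from the filtering result is outside the scope of this sketch.

```python
def aggregate(shape, estimates):
    """Weighted average of overlapping block estimates.

    `estimates` holds (row, col, block, weight) tuples, where (row, col)
    is the top-left corner of the block in the output image.
    """
    height, width = shape
    numerator = [[0.0] * width for _ in range(height)]
    denominator = [[0.0] * width for _ in range(height)]
    for row, col, block, weight in estimates:
        for dr, block_row in enumerate(block):
            for dc, value in enumerate(block_row):
                numerator[row + dr][col + dc] += weight * value
                denominator[row + dr][col + dc] += weight
    # Pixels covered by no block are left at 0.0 in this sketch.
    return [[n / d if d > 0 else 0.0 for n, d in zip(n_row, d_row)]
            for n_row, d_row in zip(numerator, denominator)]
```

Each output pixel is the sum of the weighted block values covering it divided by the sum of the corresponding weights, so a pixel covered by a single block keeps that block's value unchanged.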

Extensions

Color images

RGB images can be processed much like grayscale ones. A luminance-chrominance transformation is first applied to the RGB image. The grouping is then performed only on the luminance channel, which contains most of the useful information and has a higher SNR, and the resulting groups are reused for the chrominance channels. This approach works because the noise in the chrominance channels is strongly correlated with that of the luminance channel, and it saves approximately one-third of the computing time because grouping takes up approximately half of the required computing time.
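The one-third figure follows from simple arithmetic. As a rough illustration, assume that processing one channel independently costs $T$, of which grouping accounts for $T/2$ and the remaining steps for $T/2$ (the symbol $T$ and the exact half-and-half split are assumptions made for this example):

$$
\begin{aligned}
\text{three independent channels:}\quad & 3T \\
\text{grouping once, filtering three times:}\quad & \tfrac{T}{2} + 3\cdot\tfrac{T}{2} = 2T \\
\text{relative saving:}\quad & \frac{3T - 2T}{3T} = \frac{1}{3}
\end{aligned}
$$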

Deblurring

The BM3D algorithm has been extended (IDD-BM3D) to perform decoupled deblurring and denoising using the Nash equilibrium balance of the two objective functions.^[4]

Convolutional neural network

An approach that integrates a convolutional neural network has been proposed and shows better results (albeit with a slower runtime).^[5] MATLAB code has been released for research purposes.^[6]

Implementations

Reference implementation in MATLAB and Python released under a proprietary, non-open-source license:^[7] BM3D
Well-documented^[8] C-based implementation released under the GPLv3: bm3d
CUDA and C++ based implementation released under the GPLv3: bm3d-gpu

References

[1] Dabov, Kostadin; Foi, Alessandro; Katkovnik, Vladimir; Egiazarian, Karen (16 July 2007). "Image denoising by sparse 3D transform-domain collaborative filtering". IEEE Transactions on Image Processing. 16 (8): 2080–2095. Bibcode:2007ITIP...16.2080D. CiteSeerX 10.1.1.219.5398. doi:10.1109/TIP.2007.901238. PMID 17688213. S2CID 1475121.
[2] Manjón, José V.; Carbonell-Caballero, José; Lull, Juan J.; García-Martí, Gracián; Martí-Bonmatí, Luís; Robles, Montserrat (2008-08-01). "MRI denoising using Non-Local Means". Medical Image Analysis. 12 (4): 514–523. doi:10.1016/j.media.2008.02.004. ISSN 1361-8415. PMID 18381247.
[3] Maggioni, M.; Katkovnik, V.; Egiazarian, K.; Foi, A. (January 2013). "Nonlocal Transform-Domain Filter for Volumetric Data Denoising and Reconstruction". IEEE Transactions on Image Processing. 22 (1): 119–133. doi:10.1109/TIP.2012.2210725. ISSN 1057-7149. PMID 22868570. S2CID 1295558.
[4] Danielyan, Aram; Katkovnik, Vladimir; Egiazarian, Karen (30 June 2011). "BM3D Frames and Variational Image Deblurring". IEEE Transactions on Image Processing. 21 (4): 1715–1728. arXiv:1106.6180. Bibcode:2012ITIP...21.1715D. doi:10.1109/TIP.2011.2176954. PMID 22128008. S2CID 11204616.
[5] Ahn, Byeongyong; Ik Cho, Nam (3 April 2017). "Block-Matching Convolutional Neural Network for Image Denoising". arXiv:1704.00524 [cs.CV].
[6] "BMCNN-ISPL". Seoul National University. Retrieved 3 January 2018.
[7] "LASIP - Legal Notice". Tampere University of Technology (TUT). Retrieved 2 January 2018.
[8] Lebrun, Marc (8 August 2012). "An Analysis and Implementation of the BM3D Image Denoising Method". Image Processing on Line. 2: 175–213. doi:10.5201/ipol.2012.l-bm3d.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
