Thresholding (image processing)

Last updated April 26, 2024

In digital image processing, thresholding is the simplest method of segmenting images. From a grayscale image, thresholding can be used to create binary images.^[1]

Definition

The simplest thresholding methods replace each pixel in an image with a black pixel if the image intensity $I_{i,j}$ is less than a fixed value called the threshold $T$ , or a white pixel if the pixel intensity is greater than that threshold. In the example image on the right, this results in the dark tree becoming completely black, and the bright snow becoming completely white.

Automatic thresholding

While in some cases, the threshold $T$ can be selected manually by the user, there are many cases where the user wants the threshold to be automatically set by an algorithm. In those cases, the threshold should be the "best" threshold in the sense that the partition of the pixels above and below the threshold should match as closely as possible the actual partition between the two classes of objects represented by those pixels (e.g., pixels below the threshold should correspond to the background and those above to some objects of interest in the image).

Many types of automatic thresholding methods exist, the most famous and widely used being Otsu's method. Sezgin et al 2004 categorized thresholding methods into broad groups based on the information the algorithm manipulates.^[2] Note however that such a categorization is necessarily fuzzy as some methods can fall in several categories (for example, Otsu's method can be both considered a histogram-shape and a clustering algorithm)

Histogram shape-based methods, where, for example, the peaks, valleys and curvatures of the smoothed histogram are analyzed.^[3] Note that these methods, more than others, make certain assumptions about the image intensity probability distribution (i.e., the shape of the histogram),
Clustering-based methods, where the gray-level samples are clustered in two parts as background and foreground,^[4]^[5]
Entropy -based methods result in algorithms that use the entropy of the foreground and background regions, the cross-entropy between the original and binarized image, etc.,^[6]
Object Attribute-based methods search a measure of similarity between the gray-level and the binarized images, such as fuzzy shape similarity, edge coincidence, etc.,
Spatial methods use higher-order probability distribution and/or correlation between pixels.

Global vs local thresholding

In most methods, the same threshold is applied to all pixels of an image. However, in some cases, it can be advantageous to apply a different threshold to different parts of the image, based on the local value of the pixels. This category of methods is called local or adaptive thresholding. They are particularly adapted to cases where images have inhomogeneous lighting, such as in the sudoku image on the right. In those cases, a neighborhood is defined and a threshold is computed for each pixel and its neighborhood. Many global thresholding methods can be adapted to work in a local way, but there are also methods developed specifically for local thresholding, such as the Niblack [7] or the Bernsen algorithms.

Software such as ImageJ propose a wide range of automatic threshold methods, both global and local.

Benefits of Local Thresholding Over Global Thresholding^[8]

Adaptability to Local Image Characteristics: Local thresholding can adapt to variations in illumination, contrast, and texture within different parts of the image. This adaptability helps in handling images with non-uniform lighting conditions or complex textures.
Preservation of Local Details: By applying tailored thresholds to different regions, local thresholding can preserve fine details and edges that might be lost in global thresholding, especially in areas with varying intensities or gradients.
Reduced Sensitivity to Noise: Local thresholding can be less sensitive to noise compared to global thresholding, as the thresholding decision is based on local statistics rather than the entire image.

Examples of Algorithms for Local Thresholding

Niblack's Method:^[9] Niblack's algorithm computes a local threshold for each pixel based on the mean and standard deviation of the pixel's neighborhood. It adjusts the threshold based on the local characteristics of the image, making it suitable for handling variations in illumination.
Bernsen's Method:^[10] Bernsen's algorithm calculates the threshold for each pixel by considering the local contrast within a neighborhood. It uses a fixed window size and is robust to noise and variations in background intensity.
Sauvola's Method:^[11] Sauvola's algorithm extends Niblack's method by incorporating a dynamic factor that adapts the threshold based on the local contrast and mean intensity. This adaptive factor improves the binarization results, particularly in regions with varying contrasts.

Extensions of binary thresholding

Multi-band images

Color images can also be thresholded. One approach is to designate a separate threshold for each of the RGB components of the image and then combine them with an AND operation. This reflects the way the camera works and how the data is stored in the computer, but it does not correspond to the way that people recognize color. Therefore, the HSL and HSV color models are more often used; note that since hue is a circular quantity it requires circular thresholding. It is also possible to use the CMYK color model.^[12]

Multiple thresholds

Instead of a single threshold resulting in a binary image, it is also possible to introduce multiple increasing thresholds $T_{n}$ . In that case, implementing $N$ thresholds will result in an image with $N$ classes, where pixels with intensity $I_{ij}$ such that $T_{n}<I_{ij}<T_{n+1}$ will be assigned to class $n$ . Most of the binary automatic thresholding methods have a natural extension for multi-thresholding.

Limitations

Thresholding will work best under certain conditions :

low level of noise
higher intra-class variance than inter-class variance, i.e., pixels from a same group have closer intensities to each other than to pixels of another group,
homogeneous lighting, etc.

In difficult cases, thresholding will likely be imperfect and yield a binary image with false positives and false negatives.

Related Research Articles

Digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and distortion during processing. Since images are defined over two dimensions digital image processing may be modeled in the form of multidimensional systems. The generation and development of digital image processing are mainly affected by three factors: first, the development of computers; second, the development of mathematics ; third, the demand for a wide range of applications in environment, agriculture, military, industry and medical science has increased.

Edge detection includes a variety of mathematical methods that aim at identifying edges, defined as curves in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The same problem of finding discontinuities in one-dimensional signals is known as step detection and the problem of finding signal discontinuities over time is known as change detection. Edge detection is a fundamental tool in image processing, machine vision and computer vision, particularly in the areas of feature detection and feature extraction.

The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986. Canny also produced a computational theory of edge detection explaining why the technique works.

<span class="mw-page-title-main">Fractal flame</span>

Fractal flames are a member of the iterated function system class of fractals created by Scott Draves in 1992. Draves' open-source code was later ported into Adobe After Effects graphics software and translated into the Apophysis fractal flame editor.

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

The scale-invariant feature transform (SIFT) is a computer vision algorithm to detect, describe, and match local features in images, invented by David Lowe in 1999. Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife and match moving.

In computer vision and image processing, Otsu's method, named after Nobuyuki Otsu, is used to perform automatic image thresholding. In the simplest form, the algorithm returns a single intensity threshold that separate pixels into two classes, foreground and background. This threshold is determined by minimizing intra-class intensity variance, or equivalently, by maximizing inter-class variance. Otsu's method is a one-dimensional discrete analogue of Fisher's Discriminant Analysis, is related to Jenks optimization method, and is equivalent to a globally optimal k-means performed on the intensity histogram. The extension to multi-level thresholding was described in the original paper, and computationally efficient implementations have since been proposed.

Shot transition detection also called cut detection is a field of research of video processing. Its subject is the automated detection of transitions between shots in digital video with the purpose of temporal segmentation of videos.

<span class="mw-page-title-main">Histogram equalization</span> Method in image processing of contrast adjustment using the images histogram

Histogram equalization is a method in image processing of contrast adjustment using the image's histogram.

<span class="mw-page-title-main">Error diffusion</span> Type of halftoning

Error diffusion is a type of halftoning in which the quantization residual is distributed to neighboring pixels that have not yet been processed. Its main use is to convert a multi-level image into a binary image, though it has other applications.

As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems, such as image smoothing, the stereo correspondence problem, image segmentation, object co-segmentation, and many other computer vision problems that can be formulated in terms of energy minimization. Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph. Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution. Although many computer vision algorithms involve cutting a graph, the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization.

Context-adaptive binary arithmetic coding (CABAC) is a form of entropy encoding used in the H.264/MPEG-4 AVC and High Efficiency Video Coding (HEVC) standards. It is a lossless compression technique, although the video coding standards in which it is used are typically for lossy compression applications. CABAC is notable for providing much better compression than most other entropy encoding algorithms used in video encoding, and it is one of the key elements that provides the H.264/AVC encoding scheme with better compression capability than its predecessors.

In computer vision, maximally stable extremal regions (MSER) are used as a method of blob detection in images. This technique was proposed by Matas et al. to find correspondences between image elements from two images with different viewpoints. This method of extracting a comprehensive number of corresponding image elements contributes to the wide-baseline matching, and it has led to better stereo matching and object recognition algorithms.

Region growing is a simple region-based image segmentation method. It is also classified as a pixel-based image segmentation method since it involves the selection of initial seed points.

In various science/engineering applications, such as independent component analysis, image analysis, genetic analysis, speech recognition, manifold learning, and time delay estimation it is useful to estimate the differential entropy of a system or process, given some observations.

In image processing, the balanced histogram thresholding method (BHT), is a very simple method used for automatic image thresholding. Like Otsu's Method and the Iterative Selection Thresholding Method, this is a histogram based thresholding method. This approach assumes that the image is divided in two main classes: The background and the foreground. The BHT method tries to find the optimum threshold level that divides the histogram in two classes.

Color normalization is a topic in computer vision concerned with artificial color vision and object recognition. In general, the distribution of color values in an image depends on the illumination, which may vary depending on lighting conditions, cameras, and other factors. Color normalization allows for object recognition techniques based on color to compensate for these variations.

Features from accelerated segment test (FAST) is a corner detection method, which could be used to extract feature points and later used to track and map objects in many computer vision tasks. The FAST corner detector was originally developed by Edward Rosten and Tom Drummond, and was published in 2006. The most promising advantage of the FAST corner detector is its computational efficiency. Referring to its name, it is indeed faster than many other well-known feature extraction methods, such as difference of Gaussians (DoG) used by the SIFT, SUSAN and Harris detectors. Moreover, when machine learning techniques are applied, superior performance in terms of computation time and resources can be realised. The FAST corner detector is very suitable for real-time video processing application because of this high-speed performance.

An oversampled binary image sensor is an image sensor with non-linear response capabilities reminiscent of traditional photographic film. Each pixel in the sensor has a binary response, giving only a one-bit quantized measurement of the local light intensity. The response function of the image sensor is non-linear and similar to a logarithmic function, which makes the sensor suitable for high dynamic range imaging.

Foreground detection is one of the major tasks in the field of computer vision and image processing whose aim is to detect changes in image sequences. Background subtraction is any technique which allows an image's foreground to be extracted for further processing.

References

↑ Shapiro, Linda G.; Stockman, George C. (2001). Computer Vision. Prentice Hall. p. 83. ISBN 978-0-13-030796-5.
↑ Sankur, Bülent (2004). "Survey over image thresholding techniques and quantitative performance evaluation". Journal of Electronic Imaging. 13 (1): 146. Bibcode:2004JEI....13..146S. doi:10.1117/1.1631315.
↑ Zack, G W; Rogers, W E; Latt, S A (July 1977). "Automatic measurement of sister chromatid exchange frequency". Journal of Histochemistry & Cytochemistry. 25 (7): 741–753. doi: 10.1177/25.7.70454 . PMID 70454. S2CID 15339151.
↑ "Picture Thresholding Using an Iterative Selection Method". IEEE Transactions on Systems, Man, and Cybernetics. 8 (8): 630–632. 1978. doi:10.1109/TSMC.1978.4310039.
↑ Barghout, L.; Sheynin, J. (2013-07-25). "Real-world scene perception and perceptual organization: Lessons from Computer Vision". Journal of Vision. 13 (9): 709. doi: 10.1167/13.9.709 .
↑ Kapur, J. N.; Sahoo, P. K.; Wong, A. K. C. (1985-03-01). "A new method for gray-level picture thresholding using the entropy of the histogram". Computer Vision, Graphics, and Image Processing. 29 (3): 273–285. doi:10.1016/0734-189X(85)90125-2.
↑ An introduction to digital image processing. Prentice-Hall International. 1986. ISBN 0-13-480600-X. OCLC 1244113797.^{[ page needed ]}
↑ Zhou, Huiyu., Wu, Jiahua., Zhang, Jianguo. Digital Image Processing: Part II. United States: Ventus Publishing, 2010.^{[ page needed ]}
↑ Niblack, Wayne (1986). An introduction to digital image processing. Prentice-Hall International. ISBN 0-13-480600-X. OCLC 1244113797.^{[ page needed ]}
↑ Chaki, Nabendu., Shaikh, Soharab Hossain., Saeed, Khalid. Exploring Image Binarization Techniques. Germany: Springer India, 2014.^{[ page needed ]}
↑ Sauvola, J.; Pietikäinen, M. (February 2000). "Adaptive document image binarization". Pattern Recognition. 33 (2): 225–236. Bibcode:2000PatRe..33..225S. doi:10.1016/S0031-3203(99)00055-2.
↑ Pham, Nhu-An; Morrison, Andrew; Schwock, Joerg; Aviel-Ronen, Sarit; Iakovlev, Vladimir; Tsao, Ming-Sound; Ho, James; Hedley, David W. (2007-02-27). "Quantitative image analysis of immunohistochemical stains using a CMYK color model". Diagnostic Pathology. 2 (1): 8. doi: 10.1186/1746-1596-2-8 . PMC 1810239 . PMID 17326824.

Thresholding (image processing)

Contents

Definition