Split and merge segmentation

Last updated February 26, 2020

Split and merge segmentation is an image processing technique that partitions an image into homogeneous regions. The image is successively split into quadrants wherever a homogeneity criterion fails, and similar adjacent regions are then merged to create the segmented result. The technique uses a quadtree data structure, in which nodes have a parent-child relationship: the region being split is the parent, and each of its four quadrants is a child.

Algorithm

Define the criterion to be used for homogeneity.
Split the image into equal size regions.
Calculate homogeneity for each region.
If a region is not homogeneous, split it further; if it is homogeneous, merge it with its neighbors.
Repeat the process until all regions pass the homogeneity test.^[1]
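
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from the cited text: the image is a square list of lists whose side is a power of two, the homogeneity test is any function over a flat list of gray levels, splitting is completed before merging begins, and two regions are merged when they share an edge and their combined pixels still pass the test. The function names and the sample image are hypothetical.

```python
def pixels(image, r, c, size):
    """Flat list of gray levels in the square block with top-left corner (r, c)."""
    return [image[i][j] for i in range(r, r + size) for j in range(c, c + size)]


def split(image, r, c, size, is_homogeneous):
    """Recursively split a block into quadrants until each block passes the test."""
    if size == 1 or is_homogeneous(pixels(image, r, c, size)):
        return [(r, c, size)]
    half = size // 2
    blocks = []
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        blocks += split(image, r + dr, c + dc, half, is_homogeneous)
    return blocks


def adjacent(a, b):
    """True if two blocks share part of an edge (touching corners do not count)."""
    (r1, c1, s1), (r2, c2, s2) = a, b
    rows_overlap = r1 < r2 + s2 and r2 < r1 + s1
    cols_overlap = c1 < c2 + s2 and c2 < c1 + s1
    side_by_side = (c1 + s1 == c2 or c2 + s2 == c1) and rows_overlap
    stacked = (r1 + s1 == r2 or r2 + s2 == r1) and cols_overlap
    return side_by_side or stacked


def split_and_merge(image, is_homogeneous):
    """Return a label map in which pixels of the same region share a label."""
    blocks = split(image, 0, 0, len(image), is_homogeneous)
    region = list(range(len(blocks)))  # region[i] is the region id of block i
    values = {i: pixels(image, *b) for i, b in enumerate(blocks)}
    changed = True
    while changed:  # repeat until no pair of neighboring regions can be merged
        changed = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                a, b = region[i], region[j]
                if (a != b and adjacent(blocks[i], blocks[j])
                        and is_homogeneous(values[a] + values[b])):
                    values[a] += values.pop(b)
                    region = [a if x == b else x for x in region]
                    changed = True
    # Renumber the surviving region ids as 0, 1, 2, ... and paint the label map.
    ids = {rid: n for n, rid in enumerate(sorted(set(region)))}
    labels = [[0] * len(image) for _ in image]
    for (r, c, s), rid in zip(blocks, region):
        for i in range(r, r + s):
            for j in range(c, c + s):
                labels[i][j] = ids[rid]
    return labels


def narrow_range(values):
    """Assumed criterion: gray levels span less than 10."""
    return max(values) - min(values) < 10


image = [
    [10, 10, 50, 52],
    [10, 11, 51, 50],
    [10, 10, 10, 12],
    [11, 10, 11, 10],
]
labels = split_and_merge(image, narrow_range)
for row in labels:
    print(row)
# [0, 0, 1, 1]
# [0, 0, 1, 1]
# [0, 0, 0, 0]
# [0, 0, 0, 0]
```

In this small case the image splits once into four 2x2 blocks, and the three dark blocks are then merged into a single L-shaped region, which a quadtree split alone could not represent. The pairwise merge loop is quadratic in the number of blocks and is chosen for readability rather than speed.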

Homogeneity

After each split, a test is necessary to determine whether each new region needs further splitting. The criterion for the test is the homogeneity of the region. There are several ways to define homogeneity; some examples are:

Uniformity - the region is homogeneous if its gray levels are constant or vary within a given threshold.
Local mean vs global mean - the region is homogeneous if its mean is greater than the mean of the whole image.
Variance - the gray level variance is defined as

$\sigma^{2} = \frac{1}{N-1} \sum_{(r,c) \in R} [I(r,c) - \bar{I}]^{2}$

where r and c are the row and column indices, I(r,c) is the gray level at that pixel, N is the number of pixels in the region R, and $\bar{I} = \frac{1}{N} \sum_{(r,c) \in R} I(r,c)$ is the mean gray level of the region.

For example, a region could be considered homogeneous only if its variance is less than a specified value.
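
The three criteria listed in this section can be written as small Python predicates. This is a sketch under assumptions: a region is passed as a flat list of gray levels, the function names and thresholds are hypothetical, "within a given threshold" is read as a gray-level range no larger than the threshold, and the variance of a one-pixel region is taken to be zero because the formula divides by N - 1.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of gray levels."""
    return sum(values) / len(values)


def is_uniform(region, threshold):
    """Uniformity: the gray levels span no more than `threshold` (assumed reading)."""
    return max(region) - min(region) <= threshold


def mean_exceeds_global(region, image_pixels):
    """Local mean vs global mean: the region mean is greater than the image mean."""
    return mean(region) > mean(image_pixels)


def variance(region):
    """Gray level variance with the 1 / (N - 1) normalization used in the formula."""
    n = len(region)
    if n < 2:
        return 0.0  # assumption: a single pixel has no spread
    m = mean(region)
    return sum((v - m) ** 2 for v in region) / (n - 1)


def has_low_variance(region, max_variance):
    """Variance criterion: homogeneous if the variance is below `max_variance`."""
    return variance(region) < max_variance


region = [10, 12, 11, 13]
print(variance(region))               # mean is 11.5, so the variance is 5 / 3
print(is_uniform(region, 3))          # True: the range is exactly 3
print(has_low_variance(region, 2.0))  # True: 5 / 3 is below 2.0
```

For the sample region the squared deviations from the mean 11.5 are 2.25, 0.25, 0.25 and 2.25, which sum to 5 and give a variance of 5/3 after dividing by N - 1 = 3.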

Data structure

Splitting partitions the image into nested quadrants; the figure that accompanied this section showed a partition carried to 3 levels.

Each level of partitioning can be represented in a tree-like structure.
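
One way to represent such a tree in Python is sketched below. It assumes a square image stored as a list of lists with a power-of-two side; the QuadNode class, the helper functions, the sample image and the range-based criterion are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QuadNode:
    """A square block of the image; a split node has exactly four children."""
    row: int
    col: int
    size: int
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def build(image, row, col, size, is_homogeneous):
    """Build the quadtree for one block, splitting it while the test fails."""
    node = QuadNode(row, col, size)
    values = [image[i][j]
              for i in range(row, row + size)
              for j in range(col, col + size)]
    if size > 1 and not is_homogeneous(values):
        half = size // 2
        node.children = [
            build(image, row + dr, col + dc, half, is_homogeneous)
            for dr, dc in ((0, 0), (0, half), (half, 0), (half, half))
        ]
    return node


def depth(node):
    """Number of levels in the tree, counting the root as level 1."""
    return 1 + max((depth(child) for child in node.children), default=0)


def leaves(node):
    """All unsplit blocks, in top-left, top-right, bottom-left, bottom-right order."""
    if node.is_leaf():
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


image = [
    [10, 10, 10, 10],
    [10, 90, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
root = build(image, 0, 0, 4, lambda v: max(v) - min(v) < 10)
print(depth(root))        # 3: the whole image, its quadrants, single pixels
print(len(leaves(root)))  # 7: four pixels in the top-left quadrant plus three 2x2 blocks
```

Only the quadrant containing the bright pixel is subdivided, so the tree is deeper there than elsewhere; the leaves of the tree are the blocks that a later merge step would combine.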

Example

The following example shows the segmentation of a gray scale image using MATLAB.^[2]^[3] The homogeneity criterion is a threshold on the gray-level range: a region is homogeneous if max(region) - min(region) < 10.

[Figure: the blocks created during splitting]

[Figure: the segmented image]

References

[1] Umbaugh, Scott E. (2017-11-30). Digital Image Processing and Analysis with MATLAB and CVIPtools (3rd ed.). ISBN 9781498766074. OCLC 1016899766.
[2] Gonzalez, Rafael C.; Woods, Richard E.; Eddins, Steven L. (2004). Digital Image Processing Using MATLAB. Upper Saddle River, NJ: Pearson/Prentice Hall. ISBN 0130085197. OCLC 54345501.
[3] "Quadtree decomposition - MATLAB qtdecomp". www.mathworks.com. Retrieved 2018-04-24.

