Simple interactive object extraction

Last updated

Simple interactive object extraction (SIOX) is an algorithm for extracting foreground objects from color images and videos with very little user interaction. [1] It has been implemented as "foreground selection" tool in the GIMP (since version 2.3.3), as part of the tracer tool in Inkscape (since 0.44pre3), and as function in ImageJ and Fiji (plug-in). Experimental implementations were also reported for Blender and Krita. Although the algorithm was originally designed for videos, virtually all implementations use SIOX primarily for still image segmentation. In fact, it is often said to be the current de facto standard for this task in the open-source world.

Contents

Initially, a free hand selection tool is used to specify the region of interest. It must contain all foreground objects to extract and as few background as possible. The pixels outside the region of interest form the sure background while the inner region define a superset of the foreground, i.e. the unknown region. A so-called foreground brush is then used to mark representative foreground regions. The algorithm outputs a selection mask. The selection can be refined by either adding further foreground markings or by adding background markings using the background brush.

Technically, the algorithm performs the following steps:

For video segmentation the sure background and sure foreground regions are learned from motion statistics. SIOX also features tools that allow sub-pixel accurate refinement of edges and high texture areas, the so-called "detail refinement brushes".

As with all segmentation algorithms, there are always pictures where the algorithm does not yield perfect results. The most critical drawback of SIOX is the color dependence. Although many photos are well-separable by color, the algorithm cannot deal with camouflage. If the foreground and background share many identical shades of similar colors, the algorithm might give a result with parts missing or incorrectly classified foreground. SIOX performs about equally well on different benchmarks compared to graph-based segmentation methods, such as Grabcut. SIOX is, however, more noise robust and can therefore also be used for the segmentation of videos. Graph-based segmentation methods search for a minimum cut and therefore tend to not perform optimally with complex structures.

The algorithm has initially been developed at the department of computer science at Freie Universitaet Berlin. The main developer, Gerald Friedland, is now faculty at the EECS department of the University of California at Berkeley and also a Principal Data Scientist at Lawrence Livermore National Lab. He continues to support the development through mentoring, e.g. in the Google Summer of Code.

Notes

  1. Friedland, G., Jantz, K., Lenz, T., Wiesel, F., and Rojas, R. (2006). "A Practical Approach to Boundary Accurate Multi-Object Extraction from Still Images and Videos". Eighth IEEE International Symposium on Multimedia (ISM'06). pp. 307–316. doi:10.1109/ISM.2006.9. ISBN   978-0-7695-2746-8. S2CID   13938666.{{cite book}}: CS1 maint: multiple names: authors list (link)

Related Research Articles

<span class="mw-page-title-main">GIMP</span> Open source raster graphics editor

GIMP is a free and open-source raster graphics editor used for image manipulation (retouching) and image editing, free-form drawing, transcoding between different image file formats, and more specialized tasks. It is not designed to be used for drawing, though some artists and creators have used it in this way.

<span class="mw-page-title-main">Chroma key</span> Compositing technique, also known as green screen

Chroma key compositing, or chroma keying, is a visual-effects and post-production technique for compositing (layering) two or more images or video streams together based on colour hues. The technique has been used in many fields to remove a background from the subject of a photo or video – particularly the newscasting, motion picture, and video game industries. A colour range in the foreground footage is made transparent, allowing separately filmed background footage or a static image to be inserted into the scene. The chroma keying technique is commonly used in video production and post-production. This technique is also referred to as colour keying, colour-separation overlay, or by various terms for specific colour-related variants such as green screen or blue screen; chroma keying can be done with backgrounds of any colour that are uniform and distinct, but green and blue backgrounds are more commonly used because they differ most distinctly in hue from any human skin colour. No part of the subject being filmed or photographed may duplicate the colour used as the backing, or the part may be erroneously identified as part of the backing.

<span class="mw-page-title-main">Adobe Illustrator</span> Vector graphics editor from Adobe Inc.

Adobe Illustrator is a vector graphics editor and design program developed and marketed by Adobe Inc. Originally designed for the Apple Macintosh, development of Adobe Illustrator began in 1985. Along with Creative Cloud, Illustrator CC was released. The latest version, Illustrator 2023, was released on October 18, 2022, and is the 27th generation in the product line. Adobe Illustrator was reviewed as the best vector graphics editing program in 2021 by PC Magazine.

<span class="mw-page-title-main">Image segmentation</span> Partitioning a digital image into segments

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

<span class="mw-page-title-main">Otsu's method</span> In computer vision and image processing

In computer vision and image processing, Otsu's method, named after Nobuyuki Otsu, is used to perform automatic image thresholding. In the simplest form, the algorithm returns a single intensity threshold that separate pixels into two classes, foreground and background. This threshold is determined by minimizing intra-class intensity variance, or equivalently, by maximizing inter-class variance. Otsu's method is a one-dimensional discrete analogue of Fisher's Discriminant Analysis, is related to Jenks optimization method, and is equivalent to a globally optimal k-means performed on the intensity histogram. The extension to multi-level thresholding was described in the original paper, and computationally efficient implementations have since been proposed.

<span class="mw-page-title-main">Clone tool</span>

The clone tool, as it is known in Adobe Photoshop, Inkscape, GIMP, and Corel PhotoPaint, is used in digital image editing to replace information for one part of a picture with information from another part. In other image editing software, its equivalent is sometimes called a rubber stamp tool or a clone brush.

Mattes are used in photography and special effects filmmaking to combine two or more image elements into a single, final image. Usually, mattes are used to combine a foreground image with a background image. In this case, the matte is the background painting. In film and stage, mattes can be physically huge sections of painted canvas, portraying large scenic expanses of landscapes.

<span class="mw-page-title-main">Color quantization</span>

In computer graphics, color quantization or color image quantization is quantization applied to color spaces; it is a process that reduces the number of distinct colors used in an image, usually with the intention that the new image should be as visually similar as possible to the original image. Computer algorithms to perform color quantization on bitmaps have been studied since the 1970s. Color quantization is critical for displaying images with many colors on devices that can only display a limited number of colors, usually due to memory limitations, and enables efficient compression of certain types of images.

Connected-component labeling (CCL), connected-component analysis (CCA), blob extraction, region labeling, blob discovery, or region extraction is an algorithmic application of graph theory, where subsets of connected components are uniquely labeled based on a given heuristic. Connected-component labeling is not to be confused with segmentation.

As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems, such as image smoothing, the stereo correspondence problem, image segmentation, object co-segmentation, and many other computer vision problems that can be formulated in terms of energy minimization. Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph. Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution. Although many computer vision algorithms involve cutting a graph, the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization.

In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents in images, videos, or algorithms or applications that produce such descriptions. They describe elementary characteristics such as the shape, the color, the texture or the motion, among others.

GrabCut is an image segmentation method based on graph cuts.

The random walker algorithm is an algorithm for image segmentation. In the first description of the algorithm, a user interactively labels a small number of pixels with known labels, e.g., "object" and "background". The unlabeled pixels are each imagined to release a random walker, and the probability is computed that each pixel's random walker first arrives at a seed bearing each label, i.e., if a user places K seeds, each with a different label, then it is necessary to compute, for each pixel, the probability that a random walker leaving the pixel will first arrive at each seed. These probabilities may be determined analytically by solving a system of linear equations. After computing these probabilities for each pixel, the pixel is assigned to the label for which it is most likely to send a random walker. The image is modeled as a graph, in which each pixel corresponds to a node which is connected to neighboring pixels by edges, and the edges are weighted to reflect the similarity between the pixels. Therefore, the random walk occurs on the weighted graph.

<span class="mw-page-title-main">Image editing</span> Processes of altering images, digital or traditional photos, adding, pasting, cutting words

Image editing encompasses the processes of altering images, whether they are digital photographs, traditional photo-chemical photographs, or illustrations. Traditional analog image editing is known as photo retouching, using tools such as an airbrush to modify photographs or editing illustrations with any traditional art medium. Graphic software programs, which can be broadly grouped into vector graphics editors, raster graphics editors, and 3D modelers, are the primary tools with which a user may manipulate, enhance, and transform images. Many image editing programs are also used to render or create computer art from scratch. The term "image editing" usually refers only to the editing of 2D images, not 3D ones.

<span class="mw-page-title-main">Foreground detection</span>

Foreground detection is one of the major tasks in the field of computer vision and image processing whose aim is to detect changes in image sequences. Background subtraction is any technique which allows an image's foreground to be extracted for further processing.

Foreground and background or background and foreground may refer to:

In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Image segmentation techniques labels the pixels to be a part of pixels with certain characteristics at a particular time. Here, the pixels are segmented depending on its relative movement over a period of time i.e. the time of the video sequence.

Gerald Friedland is an adjunct professor at the Electrical Engineering and Computer Science Department of the University of California, Berkeley. He also maintains an affiliation with the International Computer Science Institute.

<span class="mw-page-title-main">Teknomo–Fernandez algorithm</span>

The Teknomo–Fernandez algorithm , is an efficient algorithm for generating the background image of a given video sequence.

Video matting is a technique for separating the video into two or more layers, usually foreground and background, and generating alpha mattes which determine blending of the layers. The technique is very popular in video editing because it allows to substitute the background, or process the layers individually.

References