Range segmentation is the task of segmenting (dividing) a range image, an image containing depth information for each pixel, into segments (regions) such that all the points of the same surface belong to the same region, different regions do not overlap, and the union of the regions covers the entire image.
There have been two main approaches to the range segmentation problem: region-based range segmentation and edge-based range segmentation.
Region-based range segmentation algorithms can be further categorized into two major groups: parametric model-based range segmentation algorithms and region-growing algorithms.
Algorithms of the first group are based on assuming a parametric surface model and grouping data points so that all of them can be considered as points of a surface from the assumed parametric model (an instance of that model). [1] [2]
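As a minimal sketch of the parametric idea, assume the simplest surface model, a plane z = a*x + b*y + c, fitted by least squares. The function names `fit_plane` and `points_on_surface` and the tolerance `tol` are illustrative assumptions, not taken from the cited algorithms, which may use richer surface models and more robust estimators.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point set: plane is not uniquely defined")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return tuple(solution)


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) points (assumed model)."""
    sxx = sxy = sx = syy = sy = n = 0.0
    sxz = syz = sz = 0.0
    # Accumulate the normal equations for design rows [x, y, 1].
    for x, y, z in points:
        sxx += x * x
        sxy += x * y
        sx += x
        syy += y * y
        sy += y
        n += 1
        sxz += x * z
        syz += y * z
        sz += z
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    return solve3(m, [sxz, syz, sz])


def points_on_surface(points, plane, tol=0.1):
    """Keep the points whose depth lies within tol of the fitted plane."""
    a, b, c = plane
    return [(x, y, z) for x, y, z in points if abs(a * x + b * y + c - z) <= tol]
```

Points returned by `points_on_surface` can be treated as one region; the remaining points would be passed to another model instance.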
Region-growing algorithms start by segmenting an image into initial regions. These regions are then merged or extended by employing a region growing strategy. [3] [4] The initial regions can be obtained using different methods, including iterative or random methods. A drawback of algorithms of this group is that in general they produce distorted boundaries because the segmentation usually is carried out at region level instead of pixel level.
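The growing step can be illustrated with a simplified variant that starts from single-pixel seeds rather than from precomputed initial regions. The function name `region_grow`, the 4-neighbour connectivity, and the depth tolerance `tol` are assumptions made for this sketch; published algorithms typically use surface-fit criteria rather than a raw depth difference.

```python
from collections import deque


def region_grow(depth, tol=0.5):
    """Label a depth image (list of rows) by growing regions across
    4-connected neighbours whose depths differ by at most tol."""
    rows, cols = len(depth), len(depth[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # pixel already belongs to a region
            # Start a new region from this unlabeled seed pixel.
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] == -1
                            and abs(depth[nr][nc] - depth[r][c]) <= tol):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels, next_label
```

Every pixel receives exactly one label, so the regions do not overlap and together cover the whole image, matching the definition of range segmentation given above.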
Edge-based range segmentation algorithms are based on edge detection and on labeling edges using the jump boundaries (discontinuities). They apply an edge detector to extract edges from a range image. Once boundaries are extracted, edges with common properties are clustered together. A typical example of an edge-based range segmentation algorithm is presented by Fan et al. [5] The segmentation procedure starts by detecting discontinuities using zero-crossings and curvature values. The image is segmented at the discontinuities to obtain an initial segmentation. In the next step, the initial segmentation is refined by fitting quadratics whose coefficients are calculated using the least squares method. In general, a drawback of edge-based range segmentation algorithms is that, although they produce clean and well-defined boundaries between different regions, they tend to produce gaps between boundaries. In addition, for curved surfaces, discontinuities are smooth and hard to locate, and therefore these algorithms tend to under-segment the range image. Although the range image segmentation problem has been studied for a number of years, the task of segmenting range images of curved surfaces is yet to be satisfactorily resolved. [6]
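A much simpler stand-in for the discontinuity detection step is thresholding depth differences between neighbouring pixels. This is not the zero-crossing and curvature test of Fan et al.; the function name `jump_edges` and the `threshold` parameter are assumptions for illustration only.

```python
def jump_edges(depth, threshold=1.0):
    """Mark pixels on either side of a depth jump larger than threshold.

    depth is a list of rows; the result is a boolean grid of the same shape.
    """
    rows, cols = len(depth), len(depth[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Compare with the right-hand neighbour.
            if c + 1 < cols and abs(depth[r][c + 1] - depth[r][c]) > threshold:
                edges[r][c] = True
                edges[r][c + 1] = True
            # Compare with the neighbour below.
            if r + 1 < rows and abs(depth[r + 1][c] - depth[r][c]) > threshold:
                edges[r][c] = True
                edges[r + 1][c] = True
    return edges
```

A fixed threshold of this kind finds step discontinuities but misses the smooth transitions on curved surfaces, which is the under-segmentation problem described above.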
Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do.
Edge detection includes a variety of mathematical methods that aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The points at which image brightness changes sharply are typically organized into a set of curved line segments termed edges. The same problem of finding discontinuities in one-dimensional signals is known as step detection and the problem of finding signal discontinuities over time is known as change detection. Edge detection is a fundamental tool in image processing, machine vision and computer vision, particularly in the areas of feature detection and feature extraction.
In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple segments. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
In computer vision and image processing, feature detection includes methods for computing abstractions of image information and making a local decision at every image point as to whether an image feature of a given type is present at that point. The resulting features are subsets of the image domain, often in the form of isolated points, continuous curves or connected regions.
Iterative closest point (ICP) is an algorithm employed to minimize the difference between two clouds of points. ICP is often used to reconstruct 2D or 3D surfaces from different scans, to localize robots and achieve optimal path planning, to co-register bone models, etc.
Video tracking is the process of locating a moving object over time using a camera. It has a variety of uses, some of which are: human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, medical imaging and video editing. Video tracking can be a time-consuming process due to the amount of data that is contained in video. Adding further to the complexity is the possible need to use object recognition techniques for tracking, a challenging problem in its own right.
Scale-space segmentation or multi-scale segmentation is a general framework for signal and image segmentation, based on the computation of image descriptors at multiple scales of smoothing.
Livewire is a segmentation technique that allows a user to select regions of interest to be extracted quickly and accurately, using simple mouse clicks. It is based on the lowest-cost path algorithm by Edsger W. Dijkstra. First, the image is convolved with a Sobel filter to extract edges. Each pixel of the resulting image is a vertex of the graph and has edges going to the four pixels around it: up, down, left, and right. The edge costs are defined by a cost function. In 1995, Eric N. Mortensen and William A. Barrett extended the livewire segmentation tool; their version is known as Intelligent Scissors. In 2010, Leo Grady extended the Livewire algorithm to 3D segmentation.
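The path search at the core of Livewire can be sketched with Dijkstra's algorithm on a 4-connected pixel grid. The sketch assumes a precomputed `cost` grid (for example, low values where the Sobel response is strong); the function name `live_wire_path` and the convention that moving into a pixel costs `cost[r][c]` are illustrative choices, not the cost function of any particular implementation.

```python
import heapq


def live_wire_path(cost, start, goal):
    """Return the lowest-cost 4-connected path from start to goal.

    cost is a list of rows of non-negative numbers; start and goal are
    (row, col) tuples. Entering pixel (r, c) costs cost[r][c] (assumed).
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk the predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path
```

In an interactive tool, `start` would be the last mouse click and `goal` the current cursor position, so the path is recomputed as the cursor moves.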
As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems, such as image smoothing, the stereo correspondence problem, image segmentation, and many other computer vision problems that can be formulated in terms of energy minimization. Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph. Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution. Although many computer vision algorithms involve cutting a graph, the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization.
In computer vision, the bag-of-words model can be applied to image classification, by treating image features as words. In document classification, a bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is a vector of occurrence counts of a vocabulary of local image features.
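A minimal sketch of building such a vector, assuming the vocabulary is already available as a list of reference descriptors (for instance, cluster centres) and that each local feature is assigned to its nearest vocabulary entry by squared Euclidean distance. The function name `bag_of_visual_words` and the nearest-neighbour assignment rule are assumptions for illustration.

```python
def bag_of_visual_words(descriptors, vocabulary):
    """Count how many descriptors fall closest to each vocabulary entry."""
    histogram = [0] * len(vocabulary)
    for d in descriptors:
        # Index of the nearest visual word by squared Euclidean distance.
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, vocabulary[k])),
        )
        histogram[nearest] += 1
    return histogram
```

The resulting histogram has one entry per visual word and can be passed to any classifier that accepts fixed-length vectors.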
In computer vision, 3D object recognition involves recognizing and determining 3D information, such as the pose, volume, or shape, of user-chosen 3D objects in a photograph or range scan. Typically, an example of the object to be recognized is presented to a vision system in a controlled environment, and then for an arbitrary input such as a video stream, the system locates the previously presented object. This can be done either off-line, or in real-time. The algorithms for solving this problem are specialized for locating a single pre-identified object, and can be contrasted with algorithms which operate on general classes of objects, such as face recognition systems or 3D generic object recognition. Due to the low cost and ease of acquiring photographs, a significant amount of research has been devoted to 3D object recognition in photographs.
In computer vision, maximally stable extremal regions (MSER) are used as a method of blob detection in images. This technique was proposed by Matas et al. to find correspondences between image elements from two images with different viewpoints. This method of extracting a comprehensive number of corresponding image elements contributes to the wide-baseline matching, and it has led to better stereo matching and object recognition algorithms.
The image segmentation problem is concerned with partitioning an image into multiple regions according to some homogeneity criterion. This article is primarily concerned with graph theoretic approaches to image segmentation. Segmentation-based object categorization can be viewed as a specific case of spectral clustering applied to image segmentation.
Pedestrian detection is an essential and significant task in any intelligent video surveillance system, as it provides the fundamental information for semantic understanding of the video footage. It has an obvious extension to automotive applications due to the potential for improving safety systems. As of 2017, many car manufacturers offered it as an ADAS option.
Demetri Terzopoulos is a Distinguished Professor of Computer Science in the Henry Samueli School of Engineering and Applied Science at the University of California, Los Angeles, where he directs the UCLA Computer Graphics & Vision Laboratory.
In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Image segmentation techniques label each pixel as belonging to a group of pixels that share certain characteristics at a particular time. Here, the pixels are segmented according to their relative movement over a period of time, i.e. the duration of the video sequence.
A dynamic texture is a texture with motion, as found in videos of sea waves, fire, smoke, wavy trees, etc. A dynamic texture has a spatially repetitive pattern with a time-varying visual pattern. Modeling and analyzing dynamic textures is a topic of image processing and pattern recognition in computer vision.
Gregory D. Hager is the Mandell Bellmore Professor of Computer Science and founding director of the Johns Hopkins Malone Center for Engineering in Healthcare at Johns Hopkins University.
Gradient vector flow (GVF), a computer vision framework introduced by Chenyang Xu and Jerry L. Prince, is the vector field that is produced by a process that smooths and diffuses an input vector field. It is usually used to create a vector field from images that points to object edges from a distance. It is widely used in image analysis and computer vision applications for object tracking, shape recognition, segmentation, and edge detection. In particular, it is commonly used in conjunction with active contour models.