Range imaging

Range imaging is the name for a collection of techniques that are used to produce a 2D image showing the distance to points in a scene from a specific point, normally associated with some type of sensor device.

The resulting range image has pixel values that correspond to the distance. If the sensor that is used to produce the range image is properly calibrated, the pixel values can be given directly in physical units, such as meters.

Types of range cameras

The sensor device that is used for producing the range image is sometimes referred to as a range camera or depth camera. Range cameras can operate according to a number of different techniques, some of which are presented here.

Stereo triangulation

Stereo triangulation is an application of stereophotogrammetry in which the depth data of the pixels are determined from images acquired with a stereo or multiple-camera setup. In this way it is possible to determine the depth to points in the scene, for example, relative to the midpoint of the baseline between the cameras' focal points. In order to solve the depth measurement problem using a stereo camera system it is necessary to first find corresponding points in the different images. Solving this correspondence problem is one of the main difficulties of the technique. For instance, it is difficult to find correspondences for image points that lie inside regions of homogeneous intensity or color. As a consequence, range imaging based on stereo triangulation can usually produce reliable depth estimates only for a subset of the points visible in all cameras.

The advantage of this technique is that the measurement is more or less passive; it does not require special conditions in terms of scene illumination. The other techniques mentioned here do not have to solve the correspondence problem but are instead dependent on particular scene illumination conditions.
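As a rough illustration (not taken from any particular system), the sketch below assumes a rectified stereo pair and converts a disparity map into metric depth using the pinhole relation Z = f·B/d, where f is the focal length in pixels and B the baseline in meters; the function and parameter values are assumptions.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Convert a disparity map (in pixels) to metric depth.

    Assumes a rectified stereo pair: Z = f * B / d. Pixels with zero or
    negative disparity (no correspondence found) are marked invalid (NaN).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.nan)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Illustrative usage: a 640x480 disparity map from a stereo matcher.
disparity = np.random.uniform(1.0, 64.0, size=(480, 640))
depth_m = disparity_to_depth(disparity, focal_length_px=700.0, baseline_m=0.12)
```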

Sheet of light triangulation

If the scene is illuminated with a sheet of light, this creates a reflected line as seen from the light source. From any point out of the plane of the sheet, the line typically appears as a curve whose exact shape depends both on the distance between the observer and the light source and on the distance between the light source and the reflected points. By observing the reflected sheet of light with a camera (often a high-resolution camera) and knowing the positions and orientations of both camera and light source, it is possible to determine the distances between the reflected points and the light source or camera.

By moving either the light source (and normally also the camera) or the scene in front of the camera, a sequence of depth profiles of the scene can be generated. These can be represented as a 2D range image.
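As a simplified sketch (with hypothetical names and calibration values), one common formulation detects the laser line in each camera column, back-projects the pixel to a viewing ray through the camera centre, and intersects that ray with the known plane of the light sheet expressed in camera coordinates:

```python
import numpy as np

def laser_line_to_points(image, fx, fy, cx, cy, plane_n, plane_d):
    """Triangulate a projected laser line against a known light plane.

    For each image column, the row with peak intensity is taken as the
    laser line. Each pixel is back-projected to a ray through the camera
    centre (pinhole intrinsics fx, fy, cx, cy) and intersected with the
    light-sheet plane n.X = d in camera coordinates. Returns (N, 3) points.
    """
    rows = np.argmax(image, axis=0)              # peak row per column
    cols = np.arange(image.shape[1])
    rays = np.stack([(cols - cx) / fx,
                     (rows - cy) / fy,
                     np.ones_like(cols, dtype=float)], axis=1)
    t = plane_d / (rays @ plane_n)               # ray parameter at the plane
    return rays * t[:, None]

# Illustrative values: a light sheet roughly facing the camera.
img = np.random.rand(480, 640)
pts = laser_line_to_points(img, fx=800, fy=800, cx=320, cy=240,
                           plane_n=np.array([0.0, 0.5, 1.0]), plane_d=0.8)
```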

Structured light

By illuminating the scene with a specially designed light pattern, structured light, depth can be determined using only a single image of the reflected light. The structured light can take the form of horizontal and vertical lines, points, or checkerboard patterns. A light stage is essentially a generic structured-light range imaging device, originally created for reflectance capture.
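As an illustrative sketch (not any particular scanner's pipeline), the snippet below decodes a stack of thresholded binary Gray-code stripe images into per-pixel projector column indices, which a real system would then triangulate against the camera geometry; the image sizes and bit count are assumptions.

```python
import numpy as np

def decode_gray_code(bit_images):
    """Decode a stack of binary Gray-code stripe images.

    bit_images: array of shape (n_bits, H, W) with values 0 or 1,
    most-significant bit first. Returns an (H, W) map of projector
    column indices, which can then be triangulated to recover depth.
    """
    bits = np.asarray(bit_images, dtype=np.uint32)
    # Gray code -> binary: b[0] = g[0], b[i] = b[i-1] XOR g[i]
    binary = np.zeros_like(bits)
    binary[0] = bits[0]
    for i in range(1, bits.shape[0]):
        binary[i] = binary[i - 1] ^ bits[i]
    # Pack the binary bit planes into integer column indices.
    weights = 2 ** np.arange(bits.shape[0] - 1, -1, -1, dtype=np.uint32)
    return np.tensordot(weights, binary, axes=1)

# Illustrative usage: 10 thresholded stripe images for a 1024-column projector.
stripes = np.random.randint(0, 2, size=(10, 480, 640))
columns = decode_gray_code(stripes)
```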

Time-of-flight

The depth can also be measured using the standard time-of-flight (ToF) technique, more or less like a radar, in that a range image similar to a radar image is produced, except that a light pulse is used instead of an RF pulse. It is also not unlike a LIDAR, except that ToF is scannerless, i.e., the entire scene is captured with a single light pulse, as opposed to point by point with a rotating laser beam. Time-of-flight cameras are relatively new devices that capture a whole scene in three dimensions with a dedicated image sensor, and therefore have no need for moving parts. A time-of-flight laser radar with a fast-gating intensified CCD camera achieves sub-millimeter depth resolution. With this technique a short laser pulse illuminates the scene, and the intensified CCD camera opens its high-speed shutter for only a few hundred picoseconds. The 3D information is calculated from a 2D image series gathered with increasing delay between the laser pulse and the shutter opening. [1]
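A minimal sketch of the gated approach just described, with hypothetical gate timings: for each pixel the gate delay with the strongest return is taken as the round-trip time, and depth follows from d = c·t/2 (real systems interpolate between gates to reach much finer resolution).

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def gated_tof_depth(frames, delays_s):
    """Estimate depth from a gated time-of-flight image series.

    frames: array (n_delays, H, W) of intensities recorded with the
    shutter opened at increasing delays after the laser pulse.
    delays_s: the corresponding gate delays in seconds.
    The delay with the strongest return per pixel is taken as the
    round-trip time, so depth = c * t / 2.
    """
    frames = np.asarray(frames, dtype=float)
    peak_idx = np.argmax(frames, axis=0)           # (H, W) index of best gate
    round_trip = np.asarray(delays_s)[peak_idx]    # (H, W) round-trip times
    return C * round_trip / 2.0

# Illustrative usage: 64 gates spaced 100 ps apart (about 1.5 cm per gate).
delays = np.arange(64) * 100e-12
frames = np.random.rand(64, 240, 320)
depth_m = gated_tof_depth(frames, delays)
```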

Interferometry

By illuminating points with coherent light and measuring the phase shift of the reflected light relative to the light source it is possible to determine depth. Under the assumption that the true range image is a more or less continuous function of the image coordinates, the correct depth can be obtained using a technique called phase-unwrapping. See terrestrial SAR interferometry.
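As a toy sketch of the phase-unwrapping step, assuming a smoothly varying range and no phase residues (a robust 2D unwrapper such as skimage.restoration.unwrap_phase would be preferable for noisy data); the wavelength and array sizes are assumptions.

```python
import numpy as np

def depth_from_phase(wrapped_phase, wavelength_m):
    """Recover relative depth from wrapped interferometric phase.

    Naive 2D unwrapping: unwrap along rows, then along columns, assuming
    the true range is a smooth function of the image coordinates. The
    round-trip path difference gives depth = lambda * phase / (4 * pi).
    """
    unwrapped = np.unwrap(np.unwrap(wrapped_phase, axis=1), axis=0)
    return wavelength_m * unwrapped / (4.0 * np.pi)

# Illustrative usage: wrapped phase in (-pi, pi] from a 1550 nm source.
phase = np.random.uniform(-np.pi, np.pi, size=(256, 256))
relative_depth = depth_from_phase(phase, wavelength_m=1550e-9)
```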

Coded aperture

Depth information may be partially or wholly inferred alongside intensity through reverse convolution of an image captured with a specially designed coded aperture pattern: a specific, complex arrangement of holes through which the incoming light is either allowed through or blocked. The complex shape of the aperture creates a non-uniform blurring of the image for those parts of the scene not at the focal plane of the lens. The extent of blurring across the scene, which is related to the displacement from the focal plane, may be used to infer the depth. [2]

In order to identify the size of the blur (needed to decode depth information) in the captured image, two approaches can be used: 1) deblurring the captured image with different blurs, or 2) learning a set of linear filters that identify the type of blur.

The first approach uses correct mathematical deconvolution that takes account of the known aperture design pattern; this deconvolution can identify where and to what degree the scene has been convolved with out-of-focus light selectively falling on the capture surface, and reverse the process. [3] Thus the blur-free scene may be retrieved together with the size of the blur.
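A loose sketch of this first approach (not the exact method of [3], which relies on a sparse-gradient image prior): deconvolve an image patch with each candidate PSF scale derived from the known coded aperture, and score the results with a crude gradient-sparsity heuristic, since the wrong blur size tends to leave ringing artifacts. The function names, the regularization value, and the stand-in PSFs are all assumptions.

```python
import numpy as np
from numpy.fft import fft2, ifft2

def wiener_deconvolve(image, psf, reg=1e-2):
    """Frequency-domain Wiener-style deconvolution with a known PSF."""
    H = fft2(psf, s=image.shape)
    F = np.conj(H) * fft2(image) / (np.abs(H) ** 2 + reg)
    return np.real(ifft2(F))

def select_blur_scale(patch, psf_bank):
    """Deblur a patch with each candidate PSF scale and keep the one whose
    result has the smallest total gradient magnitude (a crude proxy for
    the absence of ringing). Returns the best index and deblurred patch."""
    scores, estimates = [], []
    for psf in psf_bank:
        sharp = wiener_deconvolve(patch, psf)
        gy, gx = np.gradient(sharp)
        scores.append(np.sum(np.abs(gx)) + np.sum(np.abs(gy)))
        estimates.append(sharp)
    best = int(np.argmin(scores))
    return best, estimates[best]

# Illustrative usage: candidate PSFs would be the coded pattern scaled to
# different blur widths; uniform kernels stand in for them here.
patch = np.random.rand(64, 64)
psf_bank = [np.ones((k, k)) / (k * k) for k in (3, 5, 7, 9)]
scale_idx, deblurred = select_blur_scale(patch, psf_bank)
```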

The second approach instead extracts the extent of the blur while bypassing the recovery of the blur-free image, and therefore without performing reverse convolution. Using a technique based on principal component analysis (PCA), the method learns off-line a bank of filters that uniquely identify each size of blur; these filters are then applied directly to the captured image, as a normal convolution. [4] The most important advantage of this approach is that no information about the coded aperture pattern is required. Because of its efficiency, this algorithm has also been extended to video sequences with moving and deformable objects. [5]
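A very loose sketch of the application step only: the filters are assumed to have already been learned off-line, and each pixel is labelled by its strongest filter response (the per-pixel scoring used in [4] differs from this simple argmax). The filter bank and image here are random stand-ins.

```python
import numpy as np
from scipy.signal import fftconvolve

def classify_blur(image, filter_bank):
    """Convolve the image with a bank of pre-learned filters (one per blur
    size) and label each pixel with the filter giving the strongest
    response. Returns an integer map of blur-size indices, a proxy for depth."""
    responses = np.stack([np.abs(fftconvolve(image, f, mode="same"))
                          for f in filter_bank])
    return np.argmax(responses, axis=0)

# Illustrative usage with random stand-in filters (real ones come from PCA).
image = np.random.rand(240, 320)
filter_bank = [np.random.randn(9, 9) for _ in range(5)]
blur_labels = classify_blur(image, filter_bank)
```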

Since the depth of a point is inferred from the extent of its blurring, which is caused by light spreading from the corresponding scene point across the entire surface of the aperture and being distorted according to this spread, this is a complex form of stereo triangulation: each point in the image is effectively spatially sampled across the width of the aperture.

This technology has lately been used in the iPhone X. Several other phones from Samsung, and computers from Microsoft, have experimented with similar technology, but they do not use the 3D mapping.

References

  1. Busck, Jens; Heiselberg, Henning (2004). "High accuracy 3D laser radar". Technical University of Denmark.
  2. Martinello, Manuel (2012). Coded Aperture Imaging (PDF). Heriot-Watt University.
  3. Levin, Anat; Fergus, Rob; Durand, Fredo; Freeman, William T. "Image and depth from a conventional camera with a coded aperture". MIT.
  4. Martinello, Manuel; Favaro, Paolo (2011). "Single Image Blind Deconvolution with Higher-Order Texture Statistics" (PDF). Video Processing and Computational Video. Lecture Notes in Computer Science. Vol. 7082. Springer-Verlag. pp. 124–151. doi:10.1007/978-3-642-24870-2_6. ISBN 978-3-642-24869-6.
  5. Martinello, Manuel; Favaro, Paolo (2012). "Depth estimation from a video sequence with moving and deformable objects". IET Conference on Image Processing (IPR 2012) (PDF). p. 131. doi:10.1049/cp.2012.0425. ISBN 978-1-84919-632-1.