Binocular disparity


Binocular disparity refers to the difference in image location of an object seen by the left and right eyes, resulting from the eyes' horizontal separation (parallax). The mind uses binocular disparity to extract depth information from the two-dimensional retinal images in stereopsis. In computer vision, binocular disparity refers to the difference in coordinates of similar features within two stereo images.


A similar disparity can be used in rangefinding by a coincidence rangefinder to determine distance and/or altitude to a target. In astronomy, the disparity between different locations on the Earth can be used to determine the parallax of various celestial bodies, and Earth's orbit can be used for stellar parallax.

Definition

Figure 1. Definition of binocular disparity (far and near).

Human eyes are horizontally separated by about 50–75 mm (interpupillary distance), depending on the individual. Thus, each eye has a slightly different view of the world around it. This can easily be seen by alternately closing one eye while looking at a vertical edge: the binocular disparity appears as an apparent horizontal shift of the edge between the two views.

At any given moment, the lines of sight of the two eyes meet at a point in space. This point projects to the same location (i.e. the center) on the retina of each eye. However, because the two eyes view the scene from slightly different positions, many other points in space do not fall on corresponding retinal locations. Visual binocular disparity is defined as the difference between the points of projection in the two eyes and is usually expressed in degrees of visual angle. [1]

The term "binocular disparity" refers to geometric measurements made external to the eye. The disparity of the images on the actual retina depends on factors internal to the eye, especially the location of the nodal points, even if the cross section of the retina is a perfect circle. Disparity on retina conforms to binocular disparity when measured as degrees, while much different if measured as distance due to the complicated structure inside eye.

Figure 1: The full black circle is the point of fixation. The blue object lies nearer to the observer, so it has a "near" disparity dn. Objects lying farther away (green) correspondingly have a "far" disparity df. Binocular disparity is the angle between two lines of projection: one is the real projection from the object to the actual point of projection; the other is the imaginary projection running through the nodal point of the fixation point.
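For small angles, this disparity can be approximated in closed form. As a rough sketch (the symbols here are assumed for illustration, not taken from the figure), with interpupillary distance a, fixation distance D, and an object displaced in depth by ΔD with ΔD ≪ D, the disparity in radians is approximately

$$\eta \;=\; a\left(\frac{1}{D} - \frac{1}{D + \Delta D}\right) \;\approx\; \frac{a\,\Delta D}{D^{2}},$$

so disparity falls off roughly with the square of the viewing distance.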

In computer vision, binocular disparity is calculated from stereo images taken from a pair of stereo cameras. The distance between these cameras, called the baseline, affects the disparity of a specific point on their respective image planes: as the baseline increases, the disparity increases, because a greater angle is needed to align the lines of sight on the point. In computer vision, however, binocular disparity is expressed as the coordinate difference of the point between the right and left images rather than as a visual angle, and is usually measured in pixels.
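This relationship can be summarised by the standard rectified-stereo formula, where f is the focal length in pixels, B the baseline, and z the distance from the cameras to the point (symbols chosen here for illustration):

$$d = \frac{f\,B}{z},$$

which shows directly that disparity grows linearly with the baseline and shrinks as the point moves farther away.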

Tricking neurons with 2D images

Figure 2. Simulation of disparity from depth in the plane (relates to Figure 1).

Brain cells (neurons) in a part of the brain responsible for processing visual information coming from the retinae (the primary visual cortex) can detect the existence of disparity in their input from the eyes. Specifically, these neurons are active if an object with "their" specific disparity lies within the part of the visual field to which they have access (their receptive field). [2]

Researchers investigating the precise properties of these neurons with respect to disparity present visual stimuli with different disparities to the cells and observe whether they are active. One way to present stimuli with different disparities is to place objects at varying depths in front of the eyes. However, this method may not be precise enough, because objects placed farther away possess smaller disparities, while closer objects have greater ones. Instead, neuroscientists use the alternative method schematised in Figure 2.

Figure 2: The disparity of an object at a different depth than the fixation point can alternatively be produced by presenting an image of the object to one eye and a laterally shifted version of the same image to the other eye. The full black circle is the point of fixation. Objects at varying depths are placed along the line of fixation of the left eye. The same disparity produced by a shift in depth of an object (filled coloured circles) can also be produced by laterally shifting the object at constant depth in the picture one eye sees (black circles with coloured margin). Note that for near disparities the lateral shift has to be larger to correspond to the same depth compared with far disparities. This is what neuroscientists usually do with random-dot stimuli to study disparity selectivity of neurons, since the lateral distance required to test disparities is less than the distance required using depth tests. This principle has also been applied in autostereogram illusions.

Computing disparity using digital stereo images

The disparity of features between two stereo images is usually computed as a shift to the left of an image feature when viewed in the right image. [3] For example, a single point that appears at the x coordinate t (measured in pixels) in the left image may be present at the x coordinate t − 3 in the right image. In this case, the disparity at that location in the right image would be 3 pixels.
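Written as a formula, with x_L and x_R the horizontal pixel coordinates of the matched feature in the left and right images:

$$d = x_L - x_R = t - (t - 3) = 3 \text{ pixels}.$$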

Stereo images may not always be correctly aligned to allow for quick disparity calculation. For example, the set of cameras may be slightly rotated off level. Through a process known as image rectification, both images are rotated to allow for disparities in only the horizontal direction (i.e. there is no disparity in the y image coordinates). [3] This is a property that can also be achieved by precise alignment of the stereo cameras before image capture.
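As an illustration, rectification is available in common libraries. A minimal sketch using OpenCV, assuming the calibration parameters (camera matrices, distortion coefficients, and the rotation R and translation T between the cameras) have already been estimated; all variable names here are illustrative:

```python
import cv2

# Assumed inputs from a prior stereo calibration: K1, K2 (3x3 camera
# matrices), d1, d2 (distortion coefficients), R, T (rotation and
# translation from camera 1 to camera 2), and the image size (w, h).
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, d1, K2, d2, (w, h), R, T)

# Per-camera lookup maps that rotate each image onto the common plane.
map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)

# After remapping, corresponding points share the same row (y coordinate),
# so all remaining disparity is horizontal.
left_rect = cv2.remap(left, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)
```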

Computer algorithm

After rectification, the correspondence problem can be solved using an algorithm that scans both the left and right images for matching image features. A common approach to this problem is to form a smaller image patch around every pixel in the left image. These image patches are compared to all possible disparities in the right image by comparing their corresponding image patches. For example, for a disparity of 1, the patch in the left image would be compared to a similar-sized patch in the right, shifted to the left by one pixel. The comparison between these two patches can be made by computing a measure from one of the following equations, each of which compares the pixels of the two patches. For all of the following equations, L and R refer to the left and right images, r and c refer to the current row and column of either image being examined, and d refers to the disparity of the right image.
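Two standard patch-comparison measures of this kind, given here as representative examples (both take their lowest value at the best match), are the sum of absolute differences (SAD) and the sum of squared differences (SSD), summed over the rows r and columns c of the patch:

$$\mathrm{SAD}(d) = \sum_{r}\sum_{c} \bigl|\, L(r, c) - R(r, c - d) \,\bigr|$$

$$\mathrm{SSD}(d) = \sum_{r}\sum_{c} \bigl(\, L(r, c) - R(r, c - d) \,\bigr)^{2}$$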

The disparity with the lowest computed value using one of the above methods is considered the disparity for the image feature. This lowest score indicates that the algorithm has found the best match of corresponding features in both images.
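A minimal sketch of this search, assuming a rectified grayscale pair stored as NumPy arrays and using the SAD measure above (function and parameter names are illustrative):

```python
import numpy as np

def disparity_map_sad(left, right, max_disparity=64, patch_radius=3):
    """Brute-force SAD block matching on a rectified grayscale stereo pair.

    left, right: 2D float arrays of equal shape.
    Returns a disparity map aligned with the right image.
    """
    h, w = right.shape
    r = patch_radius
    disp = np.zeros((h, w), dtype=np.int32)
    for row in range(r, h - r):
        for col in range(r, w - r):
            patch_r = right[row - r:row + r + 1, col - r:col + r + 1]
            best_d, best_cost = 0, np.inf
            # A feature at column col in the right image appears at
            # column col + d in the left image.
            for d in range(0, min(max_disparity, w - 1 - r - col) + 1):
                patch_l = left[row - r:row + r + 1,
                               col + d - r:col + d + r + 1]
                cost = np.abs(patch_l - patch_r).sum()  # SAD
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[row, col] = best_d
    return disp
```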

The method described above is a brute-force search algorithm. With large patch and/or image sizes, this technique can be very time consuming, because pixels are constantly re-examined to find the lowest matching score: the patches of neighbouring pixels overlap, so much of this work is unnecessary repetition. A more efficient algorithm remembers all the sums from the previous pixel; an even more efficient algorithm also remembers the column sums from the previous row. Techniques that save previous information in this way can greatly increase the efficiency of this image analysis, as in the sketch below.
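One way to realise this saving is with running (cumulative) sums. The following sketch (illustrative names again; for interior pixels it computes the same result as the brute-force version) builds an integral image of the per-pixel differences at each disparity, so each patch sum costs four lookups rather than a full re-scan:

```python
def disparity_map_sad_fast(left, right, max_disparity=64, patch_radius=3):
    """Same matching as above, but patch sums are reused via integral images."""
    h, w = right.shape
    r = patch_radius
    k = 2 * r + 1  # patch side length
    best_cost = np.full((h, w), np.inf)
    disp = np.zeros((h, w), dtype=np.int32)
    for d in range(max_disparity + 1):
        if w - d < k:  # image too narrow for this shift
            break
        # Column j of `ad` pairs right-image column j with left column j + d.
        ad = np.abs(left[:, d:].astype(np.float64) - right[:, :w - d])
        # Integral image (double cumulative sum with zero padding).
        ii = np.pad(ad, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
        # sums[i, j] is the SAD of the k-by-k patch centred at (i + r, j + r).
        sums = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
        cost = np.full((h, w), np.inf)
        cost[r:h - r, r:w - d - r] = sums
        better = cost < best_cost
        best_cost[better] = cost[better]
        disp[better] = d
    return disp
```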

Uses of disparity from images

Knowledge of disparity can be used in further extraction of information from stereo images. One case in which disparity is most useful is depth/distance calculation. Disparity and distance from the cameras are inversely related: as the distance from the cameras increases, the disparity decreases. This allows for depth perception in stereo images. Using geometry and algebra, the points that appear in the 2D stereo images can be mapped as coordinates in 3D space.
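A minimal sketch of that mapping for rectified pinhole cameras, assuming a focal length f in pixels, baseline B, and principal point (cx, cy) (all parameter names are illustrative):

```python
def triangulate(x, y, d, f, B, cx, cy):
    """Back-project pixel (x, y) with disparity d (in pixels, d > 0)
    to 3D coordinates in the camera frame."""
    Z = f * B / d          # depth is inversely proportional to disparity
    X = (x - cx) * Z / f
    Y = (y - cy) * Z / f
    return X, Y, Z
```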

This concept is particularly useful for navigation. For example, the Mars Exploration Rover uses a similar method for scanning the terrain for obstacles. [4] The rover captures a pair of images with its stereoscopic navigation cameras and disparity calculations are performed in order to detect elevated objects (such as boulders). [5] Additionally, location and speed data can be extracted from subsequent stereo images by measuring the displacement of objects relative to the rover. In some cases, this is the best source of this type of information as the encoder sensors in the wheels may be inaccurate due to tire slippage.


Related Research Articles

<span class="mw-page-title-main">Saccade</span> Eye movement

A saccade is a quick, simultaneous movement of both eyes between two or more phases of fixation in the same direction. In contrast, in smooth-pursuit movements the eyes move smoothly instead of in jumps; such movements can be associated with a shift in frequency of an emitted signal or with a movement of a body part or device. Controlled cortically by the frontal eye fields (FEF), or subcortically by the superior colliculus, saccades serve as a mechanism for fixation, rapid eye movement, and the fast phase of optokinetic nystagmus. The word appears to have been coined in the 1880s by French ophthalmologist Émile Javal, who used a mirror on one side of a page to observe eye movement in silent reading, and found that it involves a succession of discontinuous individual movements.

<span class="mw-page-title-main">Binocular vision</span> Type of vision with two eyes facing the same direction

In biology, binocular vision is a type of vision in which an animal has two eyes capable of facing the same direction to perceive a single three-dimensional image of its surroundings. Binocular vision does not typically refer to vision where an animal has eyes on opposite sides of its head and shares no field of view between them, like in some animals.

<span class="mw-page-title-main">Stereoscopy</span> Technique for creating or enhancing the impression of depth in an image

Stereoscopy is a technique for creating or enhancing the illusion of depth in an image by means of stereopsis for binocular vision. The word stereoscopy derives from Greek στερεός (stereos) 'firm, solid' and σκοπέω (skopeō) 'to look, to see'. Any stereoscopic image is called a stereogram. Originally, stereogram referred to a pair of stereo images which could be viewed using a stereoscope.

<span class="mw-page-title-main">Depth perception</span> Visual ability to perceive the world in 3D

Depth perception is the ability to perceive distance to objects in the world using the visual system and visual perception. It is a major factor in perceiving the world in three dimensions.

<span class="mw-page-title-main">Autostereogram</span> Visual illusion of 3D scene

An autostereogram is a two-dimensional (2D) image that can create the optical illusion of a three-dimensional (3D) scene. Autostereograms use only one image to accomplish the effect while normal stereograms require two. The 3D scene in an autostereogram is often unrecognizable until it is viewed properly, unlike typical stereograms. Viewing any kind of stereogram properly may cause the viewer to experience vergence-accommodation conflict.

A random-dot stereogram (RDS) is a stereo pair of images of random dots that, when viewed with the aid of a stereoscope, or with the eyes focused on a point in front of or behind the images, produces a sensation of depth due to stereopsis, with objects appearing to be in front of or behind the display level.

Stereopsis is the component of depth perception retrieved through binocular vision. Stereopsis is not the only contributor to depth perception, but it is a major one. In binocular vision, each eye receives a different image because the eyes are in slightly different positions on the head. These positional differences are referred to as "horizontal disparities" or, more generally, "binocular disparities". Disparities are processed in the visual cortex of the brain to yield depth perception. While binocular disparities are naturally present when viewing a real three-dimensional scene with two eyes, they can also be simulated by artificially presenting two different images separately to each eye using a method called stereoscopy. The perception of depth in such cases is also referred to as "stereoscopic depth".

<span class="mw-page-title-main">Horopter</span> All points in space which project onto the same points in the retinas of both eyes

In vision science, the horopter was originally defined in geometric terms as the locus of points in space that make the same angle at each eye with the fixation point, although more recently in studies of binocular vision it is taken to be the locus of points in space that have the same disparity as fixation. This can be defined theoretically as the points in space that project on corresponding points in the two retinas, that is, on anatomically identical points. The horopter can also be measured empirically, in which case it is defined using some criterion.

<span class="mw-page-title-main">Fixation disparity</span> Eye condition

Fixation disparity is a tendency of the eyes to drift in the direction of the heterophoria. While heterophoria refers to a fusion-free vergence state, fixation disparity refers to a small misalignment of the visual axes when both eyes are open in an observer with normal fusion and binocular vision. The misalignment may be vertical, horizontal, or both, and is much smaller than that of strabismus. While strabismus prevents binocular vision, fixation disparity preserves binocular vision, though it may reduce a patient's level of stereopsis. A patient may have a different fixation disparity at distance than at near. Observers with a fixation disparity are more likely to report eye strain in demanding visual tasks; therefore, tests of fixation disparity belong to the diagnostic tools used by eye care professionals. Remediation includes vision therapy, prism eyeglasses, or visual ergonomics at the workplace.

The stereo cameras approach is a method of distilling a noisy video signal into a coherent data set that a computer can begin to process into actionable symbolic objects, or abstractions. Stereo cameras is one of many approaches used in the broader fields of computer vision and machine vision.

<span class="mw-page-title-main">Image rectification</span> Transformation process to project images

Image rectification is a transformation process used to project images onto a common image plane. This process has several degrees of freedom and there are many strategies for transforming images to the common plane. Image rectification is used in computer stereo vision to simplify the problem of finding matching points between images, and in geographic information systems (GIS) to merge images taken from multiple perspectives into a common map coordinate system.

<span class="mw-page-title-main">3D reconstruction</span> Process of capturing the shape and appearance of real objects

In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects. This process can be accomplished either by active or passive methods. If the model is allowed to change its shape in time, this is referred to as non-rigid or spatio-temporal reconstruction.

<span class="mw-page-title-main">3D stereo view</span> Enables viewing of objects through any stereo pattern

A 3D stereo view is the viewing of objects through any stereo pattern.

Computer stereo vision is the extraction of 3D information from digital images, such as those obtained by a CCD camera. By comparing information about a scene from two vantage points, 3D information can be extracted by examining the relative positions of objects in the two panels. This is similar to the biological process of stereopsis.

<span class="mw-page-title-main">Wiggle stereoscopy</span> 3-D image display method

Wiggle stereoscopy is an example of stereoscopy in which left and right images of a stereogram are animated. This technique is also called wiggle 3-D, wobble 3-D, wigglegram, or sometimes Piku-Piku.

2D to 3D video conversion is the process of transforming 2D ("flat") film to 3D form, which in almost all cases is stereo, so it is the process of creating imagery for each eye from one 2D image.

Stereoscopic acuity, also stereoacuity, is the smallest detectable depth difference that can be seen in binocular vision.

In vision science, cyclodisparity is the difference in the rotation angle of an object or scene viewed by the left and right eyes. Cyclodisparity can result from the eyes' torsional rotation (cyclorotation) or can be created artificially by presenting to the eyes two images that need to be rotated relative to each other for binocular fusion to take place.

Binocular neurons are neurons in the visual system that assist in the creation of stereopsis from binocular disparity. They have been found in the primary visual cortex where the initial stage of binocular convergence begins. Binocular neurons receive inputs from both the right and left eyes and integrate the signals together to create a perception of depth.

Semi-global matching (SGM) is a computer vision algorithm for the estimation of a dense disparity map from a rectified stereo image pair, introduced in 2005 by Heiko Hirschmüller while working at the German Aerospace Center. Given its predictable run time, its favourable trade-off between quality of the results and computing time, and its suitability for fast parallel implementation in ASIC or FPGA, it has encountered wide adoption in real-time stereo vision applications such as robotics and advanced driver assistance systems.

References

  1. Qian, N., Binocular Disparity and the Perception of Depth, Neuron, 18, 359–368, 1997.
  2. Gonzalez, F. and Perez, R., Neural mechanisms underlying stereoscopic vision, Prog Neurobiol, 55(3), 191–224, 1998.
  3. Linda G. Shapiro and George C. Stockman (2001). Computer Vision. Prentice Hall, 371–409. ISBN 0-13-030796-3.
  4. "The Computer Vision Laboratory." JPL.NASA.GOV. JPL/NASA, n.d. Web. 5 Jun 2011.
  5. "Spacecraft: Surface Operations: Rover." JPL.NASA.GOV. JPL/NASA, n.d. Web. 5 Jun 2011.