Chessboard detection

Last updated January 20, 2025

Chessboards arise frequently in computer vision theory and practice because their highly structured geometry is well-suited for algorithmic detection and processing. The appearance of chessboards in computer vision can be divided into two main areas: camera calibration and feature extraction. This article provides a unified discussion of the role that chessboards play in the canonical methods from these two areas, including references to the seminal literature, examples, and pointers to software implementations.

Chessboard camera calibration

A classical problem in computer vision is three-dimensional (3D) reconstruction, where one seeks to infer the 3D structure of a scene from two-dimensional (2D) images of it.^[1] Practical cameras are complex devices, and photogrammetry is needed to model the relationship between image sensor measurements and the 3D world. In the standard pinhole camera model, one models the relationship between world coordinates $\mathbf {X}$ and image (pixel) coordinates $\mathbf {x}$ via the perspective transformation

$$\mathbf {x} =K{\begin{bmatrix}R&t\end{bmatrix}}\mathbf {X} ,\quad \mathbf {x} \in \mathbb {P} ^{2},\quad \mathbf {X} \in \mathbb {P} ^{3},$$

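where $\mathbb {P} ^{n}$ is the projective space of dimension $n$, $K$ is the $3\times 3$ matrix of intrinsic camera parameters, and $R$ and $t$ are the rotation and translation that map world coordinates into the camera frame.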
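As a minimal sketch of the perspective transformation, the following Python snippet projects one world point to pixel coordinates. The focal length, principal point, identity rotation, and translation are illustrative assumptions and do not come from any real camera.

```python
def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

def project(K, R, t, X):
    """Map a 3D point X to pixel coordinates via x = K [R | t] X."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]   # 3x4 matrix [R | t]
    M = mat_mul(K, Rt)                             # 3x4 camera matrix
    x = mat_vec(M, list(X) + [1.0])                # homogeneous image point
    return (x[0] / x[2], x[1] / x[2])              # divide out the scale

# Assumed intrinsics: focal length 800 px, principal point (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
# Assumed pose: no rotation, world origin 5 units in front of the camera.
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]

u, v = project(K, R, t, (0.5, -0.25, 0.0))
print(u, v)
```

With these values the world origin projects to the principal point, and points farther from the optical axis move proportionally farther from it in the image.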

In this setting, camera calibration is the process of estimating the parameters of the $3\times 4$ matrix $M=K{\begin{bmatrix}R&t\end{bmatrix}}$ of the perspective model. Camera calibration is an important step in the computer vision pipeline because many subsequent algorithms require knowledge of camera parameters as input.^[2] Chessboards are often used during camera calibration because they are simple to construct, and their planar grid structure defines many natural interest points in an image. The following two methods are classic calibration techniques that often employ chessboards.

Direct linear transformation

Direct linear transformation (DLT) calibration uses correspondences between world points and camera image points to estimate camera parameters. In particular, DLT calibration exploits the fact that the perspective pinhole camera model defines a set of similarity relations that can be solved via the direct linear transformation algorithm.^[3] To employ this approach, one requires accurate coordinates of a non-degenerate set of points in 3D space. A common way to achieve this is to construct a camera calibration rig (example below) built from three mutually perpendicular chessboards. Since the corners of each square are equidistant, it is straightforward to compute the 3D coordinates of each corner given the width of each square. The advantage of DLT calibration is its simplicity; arbitrary cameras can be calibrated by solving a single homogeneous linear system. However, the practical use of DLT calibration is limited by the necessity of a 3D calibration rig and the fact that extremely accurate 3D coordinates are required to avoid numerical instability.^[1]
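The construction of the linear system can be sketched as follows. The Python snippet generates corner coordinates for a hypothetical rig whose three boards lie in the planes Z = 0, Y = 0, and X = 0, projects them with an assumed camera matrix, and stacks the two linear equations that each correspondence contributes. The rig layout, square width, and camera matrix are illustrative assumptions. Solving the system (typically by taking the right singular vector of the stacked matrix with the smallest singular value) is omitted here.

```python
def rig_corners(squares, width):
    """Corner coordinates of three mutually perpendicular boards (assumed layout)."""
    steps = [k * width for k in range(1, squares + 1)]
    corners = []
    for a in steps:
        for b in steps:
            corners.append((a, b, 0.0))   # board in the plane Z = 0
            corners.append((a, 0.0, b))   # board in the plane Y = 0
            corners.append((0.0, a, b))   # board in the plane X = 0
    return corners

def project(M, X):
    """Apply a 3x4 camera matrix M to a 3D point and dehomogenize."""
    Xh = list(X) + [1.0]
    x = [sum(m * c for m, c in zip(row, Xh)) for row in M]
    return (x[0] / x[2], x[1] / x[2])

def dlt_rows(X, uv):
    """Two rows of the homogeneous system A p = 0 for one correspondence."""
    Xh = list(X) + [1.0]
    u, v = uv
    zeros = [0.0] * 4
    row_u = Xh + zeros + [-u * c for c in Xh]
    row_v = zeros + Xh + [-v * c for c in Xh]
    return [row_u, row_v]

# Assumed ground-truth camera matrix M = K [R | t], used only to simulate pixels.
M = [[800.0, 0.0, 320.0, 160.0],
     [0.0, 800.0, 240.0, 120.0],
     [0.0, 0.0, 1.0, 0.5]]

points = rig_corners(squares=4, width=0.03)   # 3 cm squares, 48 corners
A = []
for X in points:
    A.extend(dlt_rows(X, project(M, X)))

# The entries of M, flattened row by row, lie in the null space of A.
p = [entry for row in M for entry in row]
residual = max(abs(sum(a * b for a, b in zip(row, p))) for row in A)
print(len(A), residual)
```

Because the corners span three different planes, the point set is not coplanar, which is the non-degeneracy the method relies on.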

Example: calibration rig

3D calibration rig built from three mutually perpendicular chessboards

Multiplane calibration

Multiplane calibration is a variant of camera auto-calibration that allows one to compute the parameters of a camera from two or more views of a planar surface. The seminal work in multiplane calibration is due to Zhang.^[4] Zhang's method calibrates cameras by solving a particular homogeneous linear system that captures the homographic relationships between multiple perspective views of the same plane. This multiview approach is popular because, in practice, it is more natural to capture multiple views of a single planar surface - like a chessboard - than to construct a precise 3D calibration rig, as required by DLT calibration. The following figures demonstrate a practical application of multiplane camera calibration from multiple views of a chessboard.^[5]
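The homographic relationship can be illustrated with a short Python sketch. If the chessboard lies in the world plane Z = 0, the projection reduces to a $3\times 3$ homography $H = K{\begin{bmatrix}r_1&r_2&t\end{bmatrix}}$, where $r_1$ and $r_2$ are the first two columns of $R$. Since $r_1$ and $r_2$ are orthonormal, each view constrains $K$ through two equations, which is the kind of constraint Zhang's method stacks across views. The intrinsics, rotation angle, and translation below are illustrative assumptions, and the closed-form inverse assumes a zero-skew $K$ with equal focal lengths.

```python
import math

def mat_mul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def mat_vec(M, v):
    return [sum(m * c for m, c in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dehom(x):
    return (x[0] / x[2], x[1] / x[2])

# Assumed intrinsics and pose (rotation of 30 degrees about the y-axis).
f, cx, cy = 800.0, 320.0, 240.0
K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
a = math.radians(30.0)
R = [[math.cos(a), 0.0, math.sin(a)],
     [0.0, 1.0, 0.0],
     [-math.sin(a), 0.0, math.cos(a)]]
t = [0.1, -0.2, 2.0]

# Plane-to-image homography H = K [r1 r2 t].
H = mat_mul(K, [[R[i][0], R[i][1], t[i]] for i in range(3)])

def project_full(X, Y):
    """Project the plane point (X, Y, 0) with the full pinhole model."""
    cam = [dot(R[i], (X, Y, 0.0)) + t[i] for i in range(3)]
    return dehom(mat_vec(K, cam))

def project_plane(X, Y):
    """Project the same point with the homography alone."""
    return dehom(mat_vec(H, (X, Y, 1.0)))

def k_inv(v):
    """Apply the inverse of the zero-skew K above in closed form."""
    return [(v[0] - cx * v[2]) / f, (v[1] - cy * v[2]) / f, v[2]]

# Constraints on K from one view: K^-1 h1 and K^-1 h2 are orthonormal.
q1 = k_inv([H[i][0] for i in range(3)])
q2 = k_inv([H[i][1] for i in range(3)])
orthogonality = dot(q1, q2)                # should vanish
norm_gap = dot(q1, q1) - dot(q2, q2)       # should vanish

print(project_full(0.06, 0.09), project_plane(0.06, 0.09))
print(orthogonality, norm_gap)
```

In a real calibration the homographies are estimated from detected chessboard corners rather than built from a known $K$, and the two constraints per view are solved for the unknown intrinsics.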

Example: multiplane calibration

Multiple views of a chessboard for multiplane calibration

Reconstructed orientations (camera-centric coordinates)

Reconstructed orientations (world-centric coordinates)

Chessboard feature extraction

The second context in which chessboards arise in computer vision is to demonstrate several canonical feature extraction algorithms. In feature extraction, one seeks to identify image interest points, which summarize the semantic content of an image and, hence, offer a reduced dimensionality representation of one's data.^[2] Chessboards - in particular - are often used to demonstrate feature extraction algorithms because their regular geometry naturally exhibits local image features like edges, lines, and corners. The following sections demonstrate the application of common feature extraction algorithms to a chessboard image.

Corners

Corners are a natural local image feature exploited in many computer vision systems. Loosely speaking, one can define a corner as the intersection of two edges, and a variety of corner detection algorithms formalize this notion. Corners are a useful image feature because they are necessarily distinct from their neighboring pixels. The Harris corner detector is a standard algorithm for corner detection in computer vision.^[6] The algorithm works by analyzing the eigenvalues of the 2D discrete structure tensor matrix at each image pixel and flagging a pixel as a corner when both eigenvalues of its structure tensor are sufficiently large. Intuitively, the eigenvalues of the structure tensor matrix associated with a given pixel describe the gradient strength in a neighborhood of that pixel. As such, a structure tensor matrix with two large eigenvalues corresponds to an image neighborhood with large gradients in orthogonal directions - i.e., a corner.
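The idea can be sketched in a few lines of Python on a synthetic 2x2-square chessboard. The snippet uses the Harris and Stephens response $\det(S) - k\,\operatorname{trace}(S)^2$ of the structure tensor $S$, which is large only when both eigenvalues are large. The central-difference gradients, the unweighted square window, the window radius, and the value $k = 0.04$ are simplifying assumptions; practical implementations usually smooth with a Gaussian window.

```python
def chessboard(squares=2, size=8):
    """Synthetic chessboard image with values 0.0 and 1.0."""
    n = squares * size
    return [[1.0 if (x // size + y // size) % 2 == 0 else 0.0
             for x in range(n)] for y in range(n)]

def harris_response(img, radius=2, k=0.04):
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Central-difference gradients (left at zero on the image border).
    for y in range(h):
        for x in range(w):
            if 0 < x < w - 1:
                ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            if 0 < y < h - 1:
                iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Structure tensor entries summed over a square window.
            sxx = sxy = syy = 0.0
            for v in range(max(0, y - radius), min(h, y + radius + 1)):
                for u in range(max(0, x - radius), min(w, x + radius + 1)):
                    sxx += ix[v][u] * ix[v][u]
                    sxy += ix[v][u] * iy[v][u]
                    syy += iy[v][u] * iy[v][u]
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp

img = chessboard()
resp = harris_response(img)

# Location of the strongest response, as (response, x, y).
best = max((resp[y][x], x, y)
           for y in range(len(img)) for x in range(len(img[0])))
print(best)
```

On this image the response is positive near the point where the four squares meet, negative along the straight edges between squares (one large eigenvalue only), and zero inside the uniform squares.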

A chessboard contains natural corners at the boundaries between board squares, so one would expect corner detection algorithms to successfully detect them in practice. Indeed, the following figure demonstrates Harris corner detection applied to a perspective-transformed chessboard image. Clearly, the Harris detector is able to accurately detect the corners of the board.

Example: corner detection

Perspective-transformed chessboard image

Output of Harris corner detector

Lines

Lines are another natural local image feature exploited in many computer vision systems. Geometrically, the set of all lines in a 2D image can be parametrized by polar coordinates $(\rho ,\theta )$ describing the distance and angle, respectively, of their normal vectors with respect to the origin. The discrete Hough transform exploits this idea by transforming a spatial image into a matrix in $(\rho ,\theta )$ -space whose $(i,j)$ -th entry counts the number of image edge points that lie on the line parametrized by $(\rho _{i},\theta _{j})$ .^[7]^[8]^[9] As such, one can detect lines in an image by simply searching for local maxima of its discrete Hough transform.
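A minimal Python sketch of the voting procedure follows. It uses the normal form $\rho = x\cos \theta + y\sin \theta$ with $\theta$ sampled in one-degree steps over $[0^{\circ}, 180^{\circ})$ and $\rho$ rounded to integer bins; these discretization choices and the two synthetic lines are assumptions made for illustration. Picking the two largest cells stands in for a proper local-maximum search, which would normally suppress neighboring cells around each peak.

```python
import math

def hough_accumulator(points, rho_max, n_theta=180):
    """Count, for each (rho, theta) bin, the points lying on that line."""
    thetas = [math.pi * j / n_theta for j in range(n_theta)]
    acc = [[0] * n_theta for _ in range(2 * rho_max + 1)]
    for x, y in points:
        for j, theta in enumerate(thetas):
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[int(round(rho)) + rho_max][j] += 1   # shift so negative rho fits
    return acc

# Synthetic edge points: a horizontal line y = 5 and a vertical line x = 12.
points = {(x, 5) for x in range(60)} | {(12, y) for y in range(60)}

rho_max = 60
acc = hough_accumulator(points, rho_max)

# Strongest two cells as (votes, rho, theta in degrees).
cells = [(acc[i][j], i - rho_max, j)
         for i in range(len(acc)) for j in range(len(acc[0]))]
top = sorted(cells, reverse=True)[:2]
print(top)
```

The horizontal line collects its votes at $\theta = 90^{\circ}$ and the vertical line at $\theta = 0^{\circ}$, mirroring the two families of parallel lines that a chessboard produces.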

The grid structure of a chessboard naturally defines two sets of parallel lines in an image of it. Therefore, one expects that line detection algorithms should successfully detect these lines in practice. Indeed, the following figure demonstrates Hough transform-based line detection applied to a perspective-transformed chessboard image. Clearly, the Hough transform is able to accurately detect the lines induced by the board squares.

Example: line detection

Perspective-transformed chessboard image

Canny edge detector applied to chessboard image

Hough transform of edge image with 19 largest local maxima denoted

Lines parameterized by Hough transform local maxima

The following MATLAB code generates the above images using the Image Processing Toolbox:

% Load image (edge() expects a 2D grayscale image)
I = imread('Perspective_chessboard.png');

% Compute edge image
BW = edge(I, 'canny');

% Compute Hough transform
[H, theta, rho] = hough(BW);

% Find local maxima of Hough transform
numpeaks = 19;
thresh = ceil(0.1 * max(H(:)));
P = houghpeaks(H, numpeaks, 'threshold', thresh);

% Extract image lines
lines = houghlines(BW, theta, rho, P, 'FillGap', 50, 'MinLength', 60);

% --------------------------------------------------------------------------
% Display results
% --------------------------------------------------------------------------

% Original image
figure; imshow(I);

% Edge image
figure; imshow(BW);

% Hough transform
figure; image(theta, rho, imadjust(mat2gray(H)), 'CDataMapping', 'scaled');
hold on; colormap(gray(256));
plot(theta(P(:, 2)), rho(P(:, 1)), 'o', 'color', 'r');

% Detected lines
figure; imshow(I); hold on;
n = size(I, 2);
for k = 1:length(lines)
    % Overlay kth line, extended across the full image width
    x = [lines(k).point1(1) lines(k).point2(1)];
    y = [lines(k).point1(2) lines(k).point2(2)];
    % Named extendLine to avoid shadowing the built-in line function
    extendLine = @(z) ((y(2) - y(1)) / (x(2) - x(1))) * (z - x(1)) + y(1);
    plot([1 n], extendLine([1 n]), 'Color', 'r');
end

Limitations

The main limitation of using chessboard patterns for geometric camera calibration is that, because of their highly repetitive structure, they must be completely visible in the camera image. This assumption may be violated, for example, when specular reflections caused by inhomogeneous lighting make chessboard detection fail at some of the corners. The requirement of a completely visible chessboard target also hampers the measurement of camera distortion close to the image corners.

To solve this issue, chessboard targets can be combined with a position encoding. One popular way is to place ArUco markers^[10] inside the light chessboard squares. The main advantage of such ChArUco targets^[11] is that all light chessboard squares are uniquely coded and identifiable. This also makes single-image multiplane calibration possible by placing multiple targets with different ArUco markers in one scene.

An alternative way of adding position encoding to chessboard patterns is the PuzzleBoard pattern:^[12] each chessboard edge is given one bit of information such that local parts of the pattern show a unique bit pattern. In comparison to ChArUco patterns, the position encoding can be read at much lower resolutions.

An example of a PuzzleBoard pattern with 8x11 chessboard corners. Each 3x3 tile pattern is unique.

References

[1] D. Forsyth and J. Ponce. Computer Vision: A Modern Approach. Prentice Hall. (2002). ISBN 978-0262061582.
[2] R. Szeliski. Computer Vision: Algorithms and Applications. Springer Science and Business Media. (2010). ISBN 978-1848829350.
[3] O. Faugeras. Three-dimensional Computer Vision. MIT Press. (1993). ISBN 978-0262061582.
[4] Z. Zhang. "A flexible new technique for camera calibration." IEEE Transactions on Pattern Analysis and Machine Intelligence. vol. 22(11), pp. 1330-1334 (2000).
[5] J. Bouguet. "Camera calibration toolbox for MATLAB." http://www.vision.caltech.edu/bouguetj/calib_doc/. (2013).
[6] C. Harris and M. Stephens. "A combined corner and edge detector." Proceedings of the 4th Alvey Vision Conference. pp. 147-151 (1988).
[7] L. Shapiro and G. Stockman. Computer Vision. Prentice-Hall, Inc. (2001). ISBN 978-0130307965.
[8] R. Duda and P. Hart. "Use of the Hough transformation to detect lines and curves in pictures." Comm. ACM, vol. 15, pp. 11-15 (1972).
[9] P. Hough. "Machine analysis of bubble chamber pictures." Proc. Int. Conf. High Energy Accelerators and Instrumentation. (1959).
[10] S. Garrido-Jurado et al. "Automatic generation and detection of highly reliable fiducial markers under occlusion." Pattern Recognition, vol. 47(6), pp. 2280-2292. https://dl.acm.org/doi/abs/10.1016/J.PATCOG.2014.01.005. (2014).
[11] OpenCV. https://docs.opencv.org/3.4/df/d4a/tutorial_charuco_detection.html.
[12] P. Stelldinger, et al. "PuzzleBoard: A New Camera Calibration Pattern with Position Encoding." German Conference on Pattern Recognition. https://users.informatik.haw-hamburg.de/~stelldinger/pub/PuzzleBoard/. (2024).

External links

The following links are pointers to popular implementations of chessboard-related computer vision algorithms.

Camera Calibration Toolbox for MATLAB - MATLAB toolbox implementing many common camera calibration methods
Camera Calibration and 3D Reconstruction - OpenCV implementation of many common camera calibration methods
Multiplane Camera Calibration From Multiple Chessboard Views - MATLAB example of applying multiview auto-calibration to a series of chessboard images
MATLAB chessboard detection - MATLAB function from the Computer Vision System Toolbox for detecting chessboards in images
OpenCV chessboard detection - OpenCV function for detecting chessboards in images
MATLAB Harris corner detection - MATLAB function for performing Harris corner detection
OpenCV Harris corner detection - OpenCV function for performing Harris corner detection
MATLAB Hough transform - MATLAB function for computing the Hough transform
OpenCV Hough transform - OpenCV function for computing the Hough transform
mrgingham - tool for detection of chessboards

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.