Tomasi–Kanade factorization

Last updated

The Tomasi–Kanade factorization is the seminal work by Carlo Tomasi and Takeo Kanade in the early 1990s. [1] It charted out an elegant and simple solution based on a SVD-based factorization scheme for analysing image measurements of a rigid object captured from different views using a weak perspective camera model. The crucial observation made by authors was that if all the measurements (i.e., image co-ordinates of all the points in all the views) are collected in a single matrix, the point trajectories will reside in a certain subspace. The dimension of the subspace in which the image data resides is a direct consequence of two factors:


  1. The type of camera that projects the scene (for example, affine or perspective)
  2. The nature of inspected object (for instance, rigid or non-rigid).

The low-dimensionality of the subspace is mirrored (captured) trivially as reduced rank of the measurement matrix. This reduced rank of measurement matrix can be motivated from the fact that, the position of the projection of an object point on the image plane is constrained as the motion of each point is globally described by a precise geometric model.


The rigid-body factorization introduced in provides a description of 3D structure of a rigid object in terms of a set of feature points extracted from salient image features. After tracking the points throughout all the images composing the temporal sequence, a set of trajectories is available. These trajectories are constrained globally at each frame by the rigid transformation which the shape is undergoing, i.e., trajectory of every point will have similar profile.

Let the location of a point j in a frame i be defined as pij = (xij, yij)T where xij and yij are horizontal and vertical image co-ordinates respectively .

A compact representation of the image measurements can be expressed by collecting all the non-homogeneous co-ordinates in a single matrix, called the observation matrix P such that

P is a 2F × N matrix, where F is the number of frames and N the number of feature points. Ideally, the observation matrix, should contain perfect information about the object being tracked. Unfortunately, in practice, most state-of-art trackers can only provide point tracks that are incomplete (due to occlusion) and inaccurate (due to sensor noise) if placed in an unstructured environment.

As mentioned earlier, the central premise behind the factorization approach is that a measurement matrix P is rank limited. Further, it is possible to factor P into two sub-matrices: a motion and a shape matrix, M and S of size 2F × r and N × r respectively.

The size and structure of S generally depends on the shape properties (for example whether it is rigid or non-rigid) and M depends both on the type of camera model we assume and the shape properties. The essence of factorization method is computing

The optimal r-rank approximation of P with respect to the Frobenius norm can be found out using a SVD-based scheme.

Related Research Articles

In mathematics, a linear map is a mapping VW between two modules that preserves the operations of addition and scalar multiplication. If a linear map is a bijection then it is called a linear isomorphism.

Linear subspace subset of a vector space that forms a vector space itself

In mathematics, and more specifically in linear algebra, a linear subspace, also known as a vector subspace is a vector space that is a subset of some larger vector space. A linear subspace is usually called simply a subspace when the context serves to distinguish it from other types of subspaces.

Kinematics is a subfield of classical mechanics that describes the motion of points, bodies (objects), and systems of bodies without considering the forces that cause them to move. Kinematics, as a field of study, is often referred to as the "geometry of motion" and is occasionally seen as a branch of mathematics. A kinematics problem begins by describing the geometry of the system and declaring the initial conditions of any known values of position, velocity and/or acceleration of points within the system. Then, using arguments from geometry, the position, velocity and acceleration of any unknown parts of the system can be determined. The study of how forces act on bodies falls within kinetics, not kinematics. For further details, see analytical dynamics.

Row and column spaces linear algebra

In linear algebra, the column space of a matrix A is the span of its column vectors. The column space of a matrix is the image or range of the corresponding matrix transformation.

In mathematics, matrix addition is the operation of adding two matrices by adding the corresponding entries together. However, there are other operations which could also be considered as a kind of addition for matrices, the direct sum and the Kronecker sum.

Matrix multiplication Mathematical operation in linear algebra

In mathematics, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the second matrix. The result matrix, known as the matrix product, has the number of rows of the first and the number of columns of the second matrix.

Work (physics) Process of energy transfer to an object via force application through displacement

In physics, work is the product of force and displacement. A force is said to do work if, when acting, there is a displacement of the point of application in the direction of the force.

In linear algebra, a Toeplitz matrix or diagonal-constant matrix, named after Otto Toeplitz, is a matrix in which each descending diagonal from left to right is constant. For instance, the following matrix is a Toeplitz matrix:

Moment of inertia physical quantity in rotational dynamics

The moment of inertia, otherwise known as the mass moment of inertia, angular mass or rotational inertia, of a rigid body is a quantity that determines the torque needed for a desired angular acceleration about a rotational axis; similar to how mass determines the force needed for a desired acceleration. It depends on the body's mass distribution and the axis chosen, with larger moments requiring more torque to change the body's rotation rate.

In linear algebra, Cramer's rule is an explicit formula for the solution of a system of linear equations with as many equations as unknowns, valid whenever the system has a unique solution. It expresses the solution in terms of the determinants of the (square) coefficient matrix and of matrices obtained from it by replacing one column by the column vector of right-hand-sides of the equations. It is named after Gabriel Cramer (1704–1752), who published the rule for an arbitrary number of unknowns in 1750, although Colin Maclaurin also published special cases of the rule in 1748.

In mathematics, the Hessian matrix or Hessian is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse originally used the term "functional determinants".

In mathematics, a block matrix or a partitioned matrix is a matrix that is interpreted as having been broken into sections called blocks or submatrices. Intuitively, a matrix interpreted as a block matrix can be visualized as the original matrix with a collection of horizontal and vertical lines, which break it up, or partition it, into a collection of smaller matrices. Any matrix may be interpreted as a block matrix in one or more ways, with each interpretation defined by how its rows and columns are partitioned.

In mathematics, more specifically in multivariable calculus, the implicit function theorem is a tool that allows relations to be converted to functions of several real variables. It does so by representing the relation as the graph of a function. There may not be a single function whose graph can represent the entire relation, but there may be such a function on a restriction of the domain of the relation. The implicit function theorem gives a sufficient condition to ensure that there is such a function.

In mathematics, more specifically in linear algebra and functional analysis, the kernel of a linear mapping, also known as the null space or nullspace, is the set of vectors in the domain of the mapping which are mapped to the zero vector. That is, given a linear map L : VW between two vector spaces V and W, the kernel of L is the set of all elements v of V for which L(v) = 0, where 0 denotes the zero vector in W, or more symbolically:

In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices. It collects the various partial derivatives of a single function with respect to many variables, and/or of a multivariate function with respect to a single variable, into vectors and matrices that can be treated as single entities. This greatly simplifies operations such as finding the maximum or minimum of a multivariate function and solving systems of differential equations. The notation used here is commonly used in statistics and engineering, while the tensor index notation is preferred in physics.

In computer vision, the Lucas–Kanade method is a widely used differential method for optical flow estimation developed by Bruce D. Lucas and Takeo Kanade. It assumes that the flow is essentially constant in a local neighbourhood of the pixel under consideration, and solves the basic optical flow equations for all the pixels in that neighbourhood, by the least squares criterion.

In the mathematical field of differential geometry, the affine geometry of curves is the study of curves in an affine space, and specifically the properties of such curves which are invariant under the special affine group

Kepler orbit describes the motion of an orbiting body as an ellipse, parabola, or hyperbola

In celestial mechanics, a Kepler orbit is the motion of one body relative to another, as an ellipse, parabola, or hyperbola, which forms a two-dimensional orbital plane in three-dimensional space. A Kepler orbit can also form a straight line. It considers only the point-like gravitational attraction of two bodies, neglecting perturbations due to gravitational interactions with other objects, atmospheric drag, solar radiation pressure, a non-spherical central body, and so on. It is thus said to be a solution of a special case of the two-body problem, known as the Kepler problem. As a theory in classical mechanics, it also does not take into account the effects of general relativity. Keplerian orbits can be parametrized into six orbital elements in various ways.

In analytical mechanics, the mass matrix is a symmetric matrix M that expresses the connection between the time derivative of the generalized coordinate vector q of a system and the kinetic energy T of that system, by the equation

In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Image segmentation techniques labels the pixels to be a part of pixels with certain characteristics at a particular time. Here, the pixels are segmented depending on its relative movement over a period of time i.e. the time of the video sequence.


  1. Carlo Tomasi and Takeo Kanade. (November 1992). "Shape and motion from image streams under orthography: a factorization method". International Journal of Computer Vision. 9 (2): 137–154. CiteSeerX . doi:10.1007/BF00129684.

See also