Motion capture

Motion capture of two pianists' right hands playing the same piece (slow motion, no sound) [1]

Motion capture (sometimes referred to as mo-cap or mocap, for short) is the process of recording the movement of objects or people. It is used in military, entertainment, sports, and medical applications, and for validation of computer vision [2] and robotics. [3] In filmmaking and video game development, it refers to recording the actions of human actors and using that information to animate digital character models in 2D or 3D computer animation. [4] [5] [6] When it includes face and fingers or captures subtle expressions, it is often referred to as performance capture. [7] In many fields, motion capture is sometimes called motion tracking, but in filmmaking and games, motion tracking usually refers more to match moving.





In motion capture sessions, movements of one or more actors are sampled many times per second. Early techniques used images from multiple cameras to calculate 3D positions. [8] Often the purpose of motion capture is to record only the movements of the actor, not his or her visual appearance. This animation data is mapped to a 3D model so that the model performs the same actions as the actor. This process may be contrasted with the older technique of rotoscoping, as seen in Ralph Bakshi's The Lord of the Rings (1978) and American Pop (1981), in which the animated character movements were achieved by tracing over a live-action actor. An actor is filmed performing an action, and the recorded film is then projected onto an animation table frame-by-frame. Animators trace the live-action footage onto animation cels, capturing the actor's outline and motions frame-by-frame, and then fill in the traced outlines with the animated character. The completed animation cels are then photographed frame-by-frame, exactly matching the movements and actions of the live-action footage. The end result is that the animated character replicates exactly the live-action movements of the actor. However, this process takes a considerable amount of time and effort.




Camera movements can also be motion captured so that a virtual camera in the scene will pan, tilt or dolly around the stage driven by a camera operator while the actor is performing. At the same time, the motion capture system can capture the camera and props as well as the actor's performance. This allows the computer-generated characters, images and sets to have the same perspective as the video images from the camera. A computer processes the data and displays the movements of the actor, providing the desired camera positions in terms of objects in the set. Retroactively obtaining camera movement data from the captured footage is known as match moving or camera tracking.


Motion capture offers several advantages over traditional computer animation of a 3D model.






Motion capture performers from Buckinghamshire New University

Video games often use motion capture to animate athletes, martial artists, and other in-game characters. [12] [13] This has been done since the Sega Model 2 arcade game Virtua Fighter 2 in 1994. [14] By mid-1995 the use of motion capture in video game development had become commonplace, and developer/publisher Acclaim Entertainment had gone so far as to have its own in-house motion capture studio built into its headquarters. [13] Namco's 1995 arcade game Soul Edge used passive optical system markers for motion capture. [15]




Movies use motion capture for CG effects, in some cases replacing traditional cel animation, and for completely computer-generated creatures, such as Gollum, the Mummy, King Kong, Davy Jones from Pirates of the Caribbean, the Na'vi from Avatar, and Clu from Tron: Legacy. The Great Goblin, the three stone-trolls, many of the orcs and goblins in the 2012 film The Hobbit: An Unexpected Journey, and Smaug were created using motion capture.

Star Wars: Episode I – The Phantom Menace (1999) was the first feature-length film to include a main character created using motion capture (that character being Jar Jar Binks, played by Ahmed Best), and the Indian-American film Sinbad: Beyond the Veil of Mists (2000) was the first feature-length film made primarily with motion capture, although many character animators also worked on the film, which had a very limited release. 2001's Final Fantasy: The Spirits Within was the first widely released movie to be made primarily with motion capture technology. Despite its poor box-office performance, supporters of motion capture technology took notice.

The Lord of the Rings: The Two Towers was the first feature film to utilize a real-time motion capture system. This method streamed the actions of actor Andy Serkis into the computer generated skin of Gollum / Smeagol as it was being performed. [16]

Of the three nominees for the 2006 Academy Award for Best Animated Feature, two (Monster House and the winner Happy Feet) used motion capture, and only Disney·Pixar's Cars was animated without it. In the ending credits of Pixar's film Ratatouille, a stamp appears labelling the film as "100% Pure Animation – No Motion Capture!"

Since 2001, motion capture has been used extensively to produce films that attempt to simulate or approximate the look of live-action cinema, with nearly photorealistic digital character models. The Polar Express used motion capture to allow Tom Hanks to perform as several distinct digital characters (for which he also provided the voices). The 2007 adaptation of the saga Beowulf animated digital characters whose appearances were based in part on the actors who provided their motions and voices. James Cameron's highly popular Avatar used this technique to create the Na'vi that inhabit Pandora. The Walt Disney Company produced Robert Zemeckis's A Christmas Carol using this technique. In 2007, Disney acquired Zemeckis's ImageMovers Digital (which produced motion capture films), but closed it in 2011 after a string of failures.

Television series produced entirely with motion capture animation include Laflaque in Canada, Sprookjesboom and Cafe de Wereld  [ nl ] in The Netherlands, and Headcases in the UK.

Virtual reality and augmented reality providers, such as uSens and Gestigon, allow users to interact with digital content in real time by capturing hand motions. This can be useful for training simulations, visual perception tests, or performing virtual walk-throughs in a 3D environment. Motion capture technology is frequently used in digital puppetry systems to drive computer-generated characters in real time.

Gait analysis is one application of motion capture in clinical medicine. Techniques allow clinicians to evaluate human motion across several biomechanical factors, often while streaming this information live into analytical software.

Some physical therapy clinics utilize motion capture as an objective way to quantify patient progress. [17]

During the filming of James Cameron's Avatar, all of the scenes involving this process were directed in real time using Autodesk MotionBuilder software to render a screen image that allowed the director and the actors to see what they would look like in the movie, making it easier to direct the film as it would be seen by the viewer. This method allowed views and angles not possible from a pre-rendered animation. Cameron was so proud of his results that he invited Steven Spielberg and George Lucas on set to view the system in action.

In Marvel's critically acclaimed The Avengers, Mark Ruffalo used motion capture so he could play his character, the Hulk, rather than have him be only CGI as in previous films, making Ruffalo the first actor to play both the human and Hulk versions of Bruce Banner.

FaceRig software uses facial recognition technology from ULSee Inc. to map a player's facial expressions, and body tracking technology from Perception Neuron to map body movement, onto a 3D or 2D character's motion onscreen. [18] [19]

During the Game Developers Conference 2016 in San Francisco, Epic Games demonstrated full-body motion capture live in Unreal Engine. The whole scene, from the upcoming game Hellblade about a woman warrior named Senua, was rendered in real time. The keynote [20] was a collaboration between Unreal Engine, Ninja Theory, 3Lateral, Cubic Motion, IKinema, and Xsens.

Methods and systems

Reflective markers attached to skin to identify bony landmarks and the 3D motion of body segments
Silhouette tracking

Motion tracking or motion capture started as a photogrammetric analysis tool in biomechanics research in the 1970s and 1980s, and expanded into education, training, sports, and recently computer animation for television, cinema, and video games as the technology matured. Traditionally, the performer wears markers near each joint so that the motion can be identified from the positions of, or angles between, the markers. Acoustic, inertial, LED, magnetic, or reflective markers, or combinations of any of these, are tracked, optimally at a sampling rate at least twice the frequency of the desired motion. The resolution of the system matters in both the spatial and the temporal dimension, as motion blur causes much the same problems as low resolution. Since the beginning of the 21st century, the rapid growth of technology has enabled new methods. Most modern systems can extract the silhouette of the performer from the background, after which all joint angles are calculated by fitting a mathematical model to the silhouette. For movements that produce no visible change in the silhouette, hybrid systems are available that use both markers and silhouettes, but with fewer markers.[ citation needed ] In robotics, some motion capture systems are based on simultaneous localization and mapping. [21]
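The sampling guideline above, capturing at no less than twice the highest frequency present in the motion, can be sketched as a quick check. This is a minimal illustration of the Nyquist criterion; the function name and the optional safety margin are assumptions for this sketch, not standard parameters:

```python
def min_sample_rate_hz(max_motion_hz, margin=1.0):
    """Minimum capture rate per the Nyquist guideline: at least twice the
    highest frequency component of the motion being recorded. An optional
    safety margin (> 1.0) can be applied on top of the theoretical minimum."""
    return 2.0 * max_motion_hz * margin
```

For example, rapid hand motion with frequency content up to 15 Hz calls for at least 30 samples per second; in practice, commercial systems run well above this (120 fps and up), which also helps with motion blur.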

Optical systems

Optical systems utilize data captured from image sensors to triangulate the 3D position of a subject between two or more cameras calibrated to provide overlapping projections. Data acquisition is traditionally implemented using special markers attached to an actor; however, more recent systems can generate accurate data by tracking surface features identified dynamically for each particular subject. Tracking a large number of performers, or expanding the capture area, is accomplished by adding more cameras. These systems produce data with three degrees of freedom for each marker, and rotational information must be inferred from the relative orientation of three or more markers; for instance, shoulder, elbow, and wrist markers providing the angle of the elbow. Newer hybrid systems combine inertial sensors with optical sensors to reduce occlusion, increase the number of users, and improve the ability to track without manual data clean-up.[ citation needed ]
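The two core computations described above, triangulating a marker from two calibrated views and inferring a joint angle from three markers, can be sketched as follows. This is an illustrative linear (DLT) formulation, not the algorithm of any particular vendor's system; the function names are hypothetical:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a marker's 3D world position from
    its pixel coordinates in two calibrated cameras.
    P1, P2: 3x4 camera projection matrices; x1, x2: (u, v) image points."""
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def joint_angle_deg(a, b, c):
    """Angle at joint b formed by markers a-b-c, e.g. shoulder-elbow-wrist
    markers giving the elbow angle."""
    u = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
```

Adding more cameras simply contributes more rows to the linear system, which is how extra cameras improve coverage and robustness against occlusion.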

Passive markers

A dancer wearing a suit used in an optical motion capture system
Markers are placed at specific points on an actor's face during facial optical motion capture.

Passive optical systems use markers coated with a retroreflective material to reflect light that is generated near the camera's lens. The camera's threshold can be adjusted so that only the bright reflective markers are sampled, ignoring skin and fabric.

The centroid of the marker is estimated as a position within the two-dimensional image that is captured. The grayscale value of each pixel can be used to provide sub-pixel accuracy by finding the centroid of the Gaussian.
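The sub-pixel estimate described above can be sketched as an intensity-weighted centroid over the marker blob's pixels. A production system might instead fit a 2D Gaussian to the blob, but the weighted centroid captures the idea; the function name is illustrative:

```python
import numpy as np

def subpixel_centroid(patch):
    """Intensity-weighted centroid of a grayscale patch containing a single
    bright marker blob. Returns (row, col) with sub-pixel precision: each
    pixel's coordinates are weighted by its grayscale value."""
    patch = patch.astype(float)
    rows, cols = np.indices(patch.shape)
    total = patch.sum()
    return (rows * patch).sum() / total, (cols * patch).sum() / total
```

Because every pixel's grayscale value contributes to the average, the estimated marker position is not limited to whole-pixel coordinates, which is what gives optical systems accuracy finer than the sensor's pixel grid.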

An object with markers attached at known positions is used to calibrate the cameras and obtain their positions and the lens distortion of each camera is measured. If two calibrated cameras see a marker, a three-dimensional fix can be obtained. Typically a system will consist of around 2 to 48 cameras. Systems of over three hundred cameras exist to try to reduce marker swap. Extra cameras are required for full coverage around the capture subject and multiple subjects.

Vendors provide constraint software to reduce the problem of marker swapping, since all passive markers appear identical. Unlike active marker systems and magnetic systems, passive systems do not require the user to wear wires or electronic equipment. [22] Instead, hundreds of rubber balls with reflective tape are attached, and the tape needs to be replaced periodically. The markers are usually attached directly to the skin (as in biomechanics), or they are velcroed to a performer wearing a full-body spandex/lycra suit designed specifically for motion capture. This type of system can capture large numbers of markers at frame rates usually around 120 to 160 fps, although by lowering the resolution and tracking a smaller region of interest they can track as high as 10,000 fps.

Active marker

Active optical systems triangulate positions by illuminating one LED at a time very quickly, or multiple LEDs with software to identify them by their relative positions, somewhat akin to celestial navigation. Rather than reflecting back light that is generated externally, the markers themselves are powered to emit their own light. Since the inverse square law provides one quarter of the power at twice the distance, this can increase the distances and volume available for capture. It also enables a high signal-to-noise ratio, resulting in very low marker jitter and a correspondingly high measurement resolution (often down to 0.1 mm within the calibrated volume).

The TV series Stargate SG-1 produced episodes using an active optical system for the VFX, allowing the actor to walk around props that would make motion capture difficult for other non-active optical systems.[ citation needed ]

ILM used active markers in Van Helsing to allow capture of Dracula's flying brides on very large sets similar to Weta's use of active markers in Rise of the Planet of the Apes . The power to each marker can be provided sequentially in phase with the capture system providing a unique identification of each marker for a given capture frame at a cost to the resultant frame rate. The ability to identify each marker in this manner is useful in realtime applications. The alternative method of identifying markers is to do it algorithmically requiring extra processing of the data.
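The sequential powering scheme described above can be sketched as follows: when marker power is phase-locked to the capture clock, each detection's identity is implicit in the frame index, at the cost of dividing the per-marker update rate. The function names here are illustrative, not from any real system's API:

```python
def strobed_marker_id(frame_index, num_markers):
    """Sequential strobing: if frame k illuminates marker k mod N, the single
    marker visible in a frame is uniquely identified by the frame index alone,
    with no algorithmic disambiguation needed."""
    return frame_index % num_markers

def per_marker_rate_hz(camera_fps, num_markers):
    """Effective update rate for each marker under sequential strobing:
    the camera frame rate divided by the number of markers in the cycle."""
    return camera_fps / num_markers
```

This makes the trade-off in the text concrete: a 240 fps camera strobing 4 markers updates each marker at only 60 Hz, which is why the alternative of identifying simultaneously lit markers algorithmically (at full frame rate) requires extra processing instead.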

There are also possibilities to find the position by using coloured LED markers. In these systems, each colour is assigned to a specific point of the body.

One of the earliest active marker systems, in the 1980s, was a hybrid passive-active mocap system with rotating mirrors and colored glass reflective markers, which used masked linear array detectors.

Time modulated active marker

A high-resolution uniquely identified active marker system with 3,600 × 3,600 resolution at 960 hertz providing real-time submillimeter positions

Active marker systems can be further refined by strobing one marker on at a time, or by tracking multiple markers over time and modulating the amplitude or pulse width to provide a marker ID. 12-megapixel spatial resolution modulated systems show more subtle movements than 4-megapixel optical systems by having both higher spatial and temporal resolution. Directors can see the actor's performance in real time and watch the results on the motion capture-driven CG character. The unique marker IDs reduce turnaround by eliminating marker swapping and providing much cleaner data than other technologies. LEDs with onboard processing and radio synchronization allow motion capture outdoors in direct sunlight, while capturing at 120 to 960 frames per second thanks to a high-speed electronic shutter. Computer processing of modulated IDs allows less hand clean-up, or filtered results, for lower operational costs. This higher accuracy and resolution requires more processing than passive technologies, but the additional processing is done at the camera to improve resolution via subpixel or centroid processing, providing both high resolution and high speed. These motion capture systems typically cost around $20,000 for an eight-camera, 12-megapixel spatial resolution, 120-hertz system with one actor.

IR sensors can compute their location when lit by mobile multi-LED emitters, e.g. in a moving car. With an ID per marker, these sensor tags can be worn under clothing and tracked at 500 Hz in broad daylight.

Semi-passive imperceptible marker

One can reverse the traditional approach based on high-speed cameras. Systems such as Prakash use inexpensive multi-LED high-speed projectors. The specially built multi-LED IR projectors optically encode the space. Instead of retroreflective or active light-emitting diode (LED) markers, the system uses photosensitive marker tags to decode the optical signals. By attaching tags with photo sensors to scene points, the tags can compute not only their own locations, but also their own orientation, incident illumination, and reflectance.
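The text above does not specify exactly how the projectors "optically encode the space"; a common scheme for structured-light space labeling is to project a sequence of Gray-coded stripe patterns, so that each photosensor can recover its stripe index from the on/off sequence it observes, with adjacent stripes differing in only one bit. A minimal decoding sketch, under that assumption:

```python
def binary_to_gray(n):
    """Gray-encode an integer: adjacent values differ in exactly one bit,
    so a sensor on a stripe boundary reads an index that is off by at most one."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Invert the Gray code back to a plain binary integer."""
    b = g
    while g:
        g >>= 1
        b ^= g
    return b

def decode_stripe_index(bits):
    """Recover a photosensor's stripe index from the sequence of on/off
    samples it read while the projector flashed the patterns, MSB first."""
    g = 0
    for bit in bits:
        g = (g << 1) | bit
    return gray_to_binary(g)
```

With k projected patterns a sensor resolves one of 2^k stripes, which is why a handful of high-speed flashes suffices to localize an unlimited number of tags simultaneously.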

These tracking tags work in natural lighting conditions and can be imperceptibly embedded in attire or other objects. The system supports an unlimited number of tags in a scene, with each tag uniquely identified to eliminate marker reacquisition issues. Since the system eliminates a high speed camera and the corresponding high-speed image stream, it requires significantly lower data bandwidth. The tags also provide incident illumination data which can be used to match scene lighting when inserting synthetic elements. The technique appears ideal for on-set motion capture or real-time broadcasting of virtual sets but has yet to be proven.

Underwater motion capture system

Motion capture technology has been available for researchers and scientists for a few decades, which has given new insight into many fields.

Underwater cameras

The vital part of the system, the underwater camera, has a waterproof housing. The housing has a finish that withstands corrosion and chlorine, which makes it well suited for use in basins and swimming pools. There are two types of cameras; industrial high-speed cameras can also be used as infrared cameras. The infrared underwater cameras come with a cyan light strobe instead of the typical IR light, for minimum falloff under water, and the high-speed cameras come with an LED light or with the option of using image processing.

Underwater motion capture camera
Motion tracking in swimming using image processing
Measurement volume

An underwater camera is typically able to measure 15–20 meters, depending on the water quality, the camera, and the type of marker used. Unsurprisingly, the best range is achieved when the water is clear, and, as always, the measurement volume also depends on the number of cameras. A range of underwater markers are available for different circumstances.


Different pools require different mountings and fixtures. Therefore, all underwater motion capture systems are uniquely tailored to suit each specific pool installment. For cameras placed in the center of the pool, specially designed tripods, using suction cups, are provided.


Emerging techniques and research in computer vision are leading to the rapid development of the markerless approach to motion capture. Markerless systems, such as those developed at Stanford University, the University of Maryland, MIT, and the Max Planck Institute, do not require subjects to wear special equipment for tracking. Special computer algorithms are designed to allow the system to analyze multiple streams of optical input and identify human forms, breaking them down into constituent parts for tracking. ESC Entertainment, a subsidiary of Warner Brothers Pictures created specially to enable virtual cinematography, including photorealistic digital look-alikes, for filming The Matrix Reloaded and The Matrix Revolutions, used a technique called Universal Capture that utilized a seven-camera setup and tracked the optical flow of all pixels over all the 2-D planes of the cameras for motion, gesture, and facial expression capture, leading to photorealistic results.

Traditional systems

Traditionally, markerless optical motion tracking is used to keep track of various objects, including airplanes, launch vehicles, missiles, and satellites. Many such optical motion tracking applications occur outdoors, requiring differing lens and camera configurations. High-resolution images of the target being tracked can thereby provide more information than just motion data. The image obtained from NASA's long-range tracking system of the space shuttle Challenger's fatal launch provided crucial evidence about the cause of the accident. Optical tracking systems are also used to identify known spacecraft and space debris, although they have a disadvantage compared to radar in that the objects must reflect or emit sufficient light. [23]

An optical tracking system typically consists of three subsystems: the optical imaging system, the mechanical tracking platform and the tracking computer.

The optical imaging system is responsible for converting the light from the target area into a digital image that the tracking computer can process. Depending on the design of the optical tracking system, the optical imaging system can vary from as simple as a standard digital camera to as specialized as an astronomical telescope on the top of a mountain. The specification of the optical imaging system determines the upper limit of the effective range of the tracking system.

The mechanical tracking platform holds the optical imaging system and is responsible for manipulating the optical imaging system in such a way that it always points to the target being tracked. The dynamics of the mechanical tracking platform combined with the optical imaging system determines the tracking system's ability to keep the lock on a target that changes speed rapidly.

The tracking computer is responsible for capturing the images from the optical imaging system, analyzing the images to extract the target position, and controlling the mechanical tracking platform to follow the target. There are several challenges. First, the tracking computer has to be able to capture images at a relatively high frame rate, which places a requirement on the bandwidth of the image-capturing hardware. Second, the image processing software has to be able to extract the target image from its background and calculate its position; several textbook image processing algorithms are designed for this task. This problem can be simplified if the tracking system can expect certain characteristics that are common to all the targets it will track. The next problem is to control the tracking platform to follow the target. This is a typical control system design problem, involving modeling the system dynamics and designing controllers to control it; it becomes a challenge, however, if the tracking platform the system has to work with is not designed for real-time operation.
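The closed loop described above, extracting the target's position in the image and commanding the platform to re-center it, can be sketched with a simple proportional controller. This is a deliberately minimal illustration (a real system would model the platform dynamics and likely use PID or state-space control); the function name and gain are assumptions:

```python
def slew_command_dps(target_px, image_center_px, gain=0.02):
    """Proportional pointing correction: map the target's pixel offset from
    image center to pan/tilt slew-rate commands in degrees per second.
    A positive offset to the right of center commands a positive pan rate."""
    pan = gain * (target_px[0] - image_center_px[0])
    tilt = gain * (target_px[1] - image_center_px[1])
    return pan, tilt
```

When the target drifts 20 pixels right of a 1280×1024 image's center, for instance, the controller commands a small rightward pan and zero tilt, driving the error back toward zero on each frame; the loop rate is bounded by the frame rate, which is why capture bandwidth is listed as the first challenge.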

The software that runs such systems is also customized for the corresponding hardware components. One example of such software is OpticTracker, which controls computerized telescopes to track moving objects at great distances, such as planes and satellites. Another option is the software SimiShape, which can also be used in a hybrid mode in combination with markers.

Non-optical systems

Inertial systems

Inertial motion capture [24] technology is based on miniature inertial sensors, biomechanical models and sensor fusion algorithms. [25] The motion data of the inertial sensors (inertial guidance system) is often transmitted wirelessly to a computer, where the motion is recorded or viewed. Most inertial systems use inertial measurement units (IMUs) containing a combination of gyroscopes, magnetometers, and accelerometers to measure rotational rates. These rotations are translated to a skeleton in the software. Much like optical markers, the more IMU sensors, the more natural the data. No external cameras, emitters or markers are needed for relative motions, although they are required to give the absolute position of the user if desired. Inertial motion capture systems capture the full six degrees of freedom of human body motion in real time and can give limited direction information if they include a magnetic bearing sensor, although this is at much lower resolution and susceptible to electromagnetic noise. Benefits of using inertial systems include capturing in a variety of environments including tight spaces, no solving, portability, and large capture areas. Disadvantages include lower positional accuracy and positional drift, which can compound over time. These systems are similar to the Wii controllers but are more sensitive and have greater resolution and update rates. They can accurately measure the direction to the ground to within a degree. The popularity of inertial systems is rising amongst game developers, [9] mainly because of the quick and easy setup resulting in a fast pipeline. A range of suits is now available from various manufacturers, with base prices ranging from US$1,000 to US$80,000.
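A minimal example of the sensor fusion such systems rely on is a complementary filter, which blends the gyroscope's responsive but drift-prone rate integration with the accelerometer's noisy but drift-free gravity reference. This is a textbook illustration of the principle, not any vendor's algorithm:

```python
import math

def complementary_filter(gyro_rates, accel_samples, dt, alpha=0.98):
    """Estimate a single tilt angle (radians) from gyroscope and
    accelerometer streams.

    The gyroscope integrates rotational rate (accurate short-term but
    drifts over time); the accelerometer gives an absolute gravity
    reference (noisy short-term but drift-free). Blending the two is
    the simplest form of the per-joint sensor fusion an IMU suit runs.
    """
    angle = 0.0
    for rate, (ax, az) in zip(gyro_rates, accel_samples):
        gyro_angle = angle + rate * dt      # dead-reckoned update
        accel_angle = math.atan2(ax, az)    # gravity-based reference
        angle = alpha * gyro_angle + (1 - alpha) * accel_angle
    return angle

# Sensor held still at a 0.1 rad tilt: the gyro reads ~0 while the
# accelerometer consistently points at the true angle, so the estimate
# converges to 0.1 rad instead of drifting.
n = 500
est = complementary_filter([0.0] * n,
                           [(math.sin(0.1), math.cos(0.1))] * n,
                           dt=0.01)
```

The `alpha` weight controls how much the drift-free reference corrects the integrated estimate each step; the positional drift mentioned above is what remains when no such absolute reference is available.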

Mechanical motion

Mechanical motion capture systems directly track body joint angles and are often referred to as exoskeleton motion capture systems, due to the way the sensors are attached to the body. A performer attaches the skeletal-like structure to their body, and as they move, so do the articulated mechanical parts, measuring the performer's relative motion. Mechanical motion capture systems are real-time, relatively low-cost, occlusion-free, and wireless (untethered) systems that have unlimited capture volume. Typically, they are rigid structures of jointed, straight metal or plastic rods linked together with potentiometers that articulate at the joints of the body. These suits tend to be in the $25,000 to $75,000 range, plus an external absolute positioning system. Some suits provide limited force feedback or haptic input.
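The core computation behind such a system is forward kinematics: the potentiometers report joint angles directly, and the limb pose is reconstructed by chaining link rotations. A small 2D sketch (hypothetical link lengths, planar joints only) of that reconstruction:

```python
import math

def forward_kinematics(joint_angles, link_lengths):
    """Convert measured joint angles (e.g. potentiometer readings on an
    exoskeleton arm) into 2D positions of each joint relative to the root.

    This is why the capture is relative: the suit recovers the limb pose
    from angles alone, while an external absolute positioning system is
    still needed to locate the performer in the room.
    """
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for angle, length in zip(joint_angles, link_lengths):
        heading += angle                 # accumulate relative rotation
        x += length * math.cos(heading)  # advance along the link
        y += length * math.sin(heading)
        points.append((x, y))
    return points

# Two-link "arm": shoulder rotated 90 degrees up, elbow bent 90 degrees
# back, with 0.3 m and 0.25 m links.
pose = forward_kinematics([math.pi / 2, -math.pi / 2], [0.3, 0.25])
```

A real suit applies the same chaining in 3D with a full biomechanical skeleton, but the principle (angles in, joint positions out) is identical.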

Magnetic systems

Magnetic systems calculate position and orientation from the relative magnetic flux of three orthogonal coils on both the transmitter and each receiver. [26] The relative intensity of the voltage or current of the three coils allows these systems to calculate both range and orientation by meticulously mapping the tracking volume. The sensor output is 6DOF, which provides useful results with two-thirds the number of markers required in optical systems; one on the upper arm and one on the lower arm suffice for elbow position and angle.[ citation needed ] The markers are not occluded by nonmetallic objects but are susceptible to magnetic and electrical interference from metal objects in the environment, like rebar (steel reinforcing bars in concrete) or wiring, which affect the magnetic field, and from electrical sources such as monitors, lights, cables and computers. The sensor response is nonlinear, especially toward the edges of the capture area. The wiring from the sensors tends to preclude extreme performance movements. [26] With magnetic systems, it is possible to monitor the results of a motion capture session in real time. [26] The capture volumes for magnetic systems are dramatically smaller than they are for optical systems. Among magnetic systems, there is a distinction between alternating-current (AC) and direct-current (DC) systems: DC systems use square pulses, while AC systems use sine-wave pulses.
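The range half of the calculation comes from how a magnetic dipole's field falls off with the cube of distance, which is also why the response is so nonlinear toward the edges of the capture area. A simplified single-axis sketch (assuming one calibration measurement, not a full volume map):

```python
def estimate_range(measured_field, calib_field, calib_range):
    """Estimate transmitter-to-receiver range from field strength.

    A magnetic dipole's field strength falls off as 1/r^3, so the ratio
    of a calibration reading to the current reading gives the range.
    Real systems combine nine such coil-pair readings (3 transmitter x
    3 receiver coils) to recover all six degrees of freedom.
    """
    return calib_range * (calib_field / measured_field) ** (1.0 / 3.0)

# Field strength drops to 1/8 of the calibration value -> twice the range.
r = estimate_range(measured_field=0.125, calib_field=1.0, calib_range=1.0)
```

The cubic falloff means a small reading error at the edge of the volume translates into a large range error, matching the nonlinearity noted above.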

Facial motion capture

Most traditional motion capture hardware vendors provide for some type of low resolution facial capture utilizing anywhere from 32 to 300 markers with either an active or passive marker system. All of these solutions are limited by the time it takes to apply the markers, calibrate the positions and process the data. Ultimately the technology also limits their resolution and raw output quality levels.

High-fidelity facial motion capture, also known as performance capture, is the next generation of fidelity and is utilized to record the more complex movements in a human face in order to capture higher degrees of emotion. Facial capture is currently organizing itself into several distinct camps, including traditional motion capture data, blend-shape-based solutions, capturing the actual topology of an actor's face, and proprietary systems.

The two main techniques are stationary systems, in which an array of cameras captures the facial expressions from multiple angles and software such as the stereo mesh solver from OpenCV creates a 3D surface mesh, and systems that use light arrays to calculate the surface normals from the variance in brightness as the light source, camera position, or both are changed. These techniques tend to be limited in feature resolution only by the camera resolution, apparent object size, and number of cameras. If the user's face occupies 50 percent of the working area of the camera and the camera has megapixel resolution, then sub-millimeter facial motions can be detected by comparing frames. Recent work focuses on increasing the frame rates and performing optical flow so that the motions can be retargeted to other computer-generated faces, rather than just making a 3D mesh of the actor and their expressions.
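The brightness-variance approach is essentially photometric stereo: under a Lambertian reflectance assumption, each light direction yields one linear equation in the surface normal. A textbook sketch of that step (a general illustration, not any specific product's pipeline):

```python
import numpy as np

def surface_normal(intensities, light_dirs):
    """Recover a surface normal from brightness measured under several
    known light directions.

    For a Lambertian surface, intensity = albedo * dot(light, normal),
    so stacking one equation per light gives a small linear system
    whose solution is the albedo-scaled normal.
    """
    L = np.asarray(light_dirs, dtype=float)    # one light direction per row
    I = np.asarray(intensities, dtype=float)   # one brightness per light
    g, *_ = np.linalg.lstsq(L, I, rcond=None)  # g = albedo * normal
    return g / np.linalg.norm(g)               # unit surface normal

# A patch facing straight up ([0, 0, 1]) observed under three lights.
lights = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
true_n = np.array([0.0, 0.0, 1.0])
obs = [float(np.dot(l, true_n)) for l in lights]   # simulated brightness
n = surface_normal(obs, lights)
```

Running this per pixel over the whole face, then integrating the normal field, is what turns changing illumination into a dense surface mesh.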

RF positioning

RF (radio frequency) positioning systems are becoming more viable[ citation needed ] as higher-frequency RF devices allow greater precision than older RF technologies such as traditional radar. The speed of light is about 30 centimeters per nanosecond (billionth of a second), so a 10 gigahertz (billion cycles per second) RF signal enables an accuracy of about 3 centimeters. By measuring amplitude to a quarter wavelength, it is possible to improve the resolution down to about 8 mm. To achieve the resolution of optical systems, frequencies of 50 gigahertz or higher are needed, which are almost as dependent on line of sight and as easy to block as optical systems. Multipath and reradiation of the signal are likely to cause additional problems, but these technologies will be ideal for tracking larger volumes with reasonable accuracy, since the required resolution at 100-meter distances is not likely to be as high. Many RF scientists[ who? ] believe that radio frequency will never produce the accuracy required for motion capture.

Non-traditional systems

An alternative approach was developed where the actor is given an unlimited walking area through the use of a rotating sphere, similar to a hamster ball, which contains internal sensors recording the angular movements, removing the need for external cameras and other equipment. Even though this technology could potentially lead to much lower costs for motion capture, the basic sphere is only capable of recording a single continuous direction. Additional sensors worn on the person would be needed to record anything more.

Another alternative is using a six-degrees-of-freedom (6DOF) motion platform with an integrated omni-directional treadmill, combined with high-resolution optical motion capture, to achieve the same effect. The captured person can walk in an unlimited area, negotiating different uneven terrains. Applications include medical rehabilitation for balance training, biomechanical research and virtual reality.

See also

Related Research Articles

Visual effects is the process by which imagery is created or manipulated outside the context of a live-action shot in filmmaking.

Cave automatic virtual environment

A cave automatic virtual environment is an immersive virtual reality environment where projectors are directed to between three and six of the walls of a room-sized cube. The name is also a reference to the allegory of the Cave in Plato's Republic in which a philosopher contemplates perception, reality and illusion.

Gesture recognition

Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion or state but commonly originate from the face or hand. Current focuses in the field include emotion recognition from the face and hand gesture recognition. Users can use simple gestures to control or interact with devices without physically touching them. Many approaches have been made using cameras and computer vision algorithms to interpret sign language. However, the identification and recognition of posture, gait, proxemics, and human behaviors is also the subject of gesture recognition techniques. Gesture recognition can be seen as a way for computers to begin to understand human body language, thus building a richer bridge between machines and humans than primitive text user interfaces or even GUIs, which still limit the majority of input to keyboard and mouse. Using gesture recognition, it is possible to point a finger at the computer screen so that the cursor moves accordingly. This could make conventional input devices such as the mouse and keyboard redundant.

In visual effects, match moving is a technique that allows the insertion of computer graphics into live-action footage with correct position, scale, orientation, and motion relative to the photographed objects in the shot. The term is used loosely to describe several different methods of extracting camera motion information from a motion picture. Sometimes referred to as motion tracking or camera solving, match moving is related to rotoscoping and photogrammetry. Match moving is sometimes confused with motion capture, which records the motion of objects, often human actors, rather than the camera. Typically, motion capture requires special cameras and sensors and a controlled environment. Match moving is also distinct from motion control photography, which uses mechanical hardware to execute multiple identical camera moves. Match moving, by contrast, is typically a software-based technology, applied after the fact to normal footage recorded in uncontrolled environments with an ordinary camera.

Wired glove

A wired glove is an input device for human–computer interaction worn like a glove.

Computer facial animation is primarily an area of computer graphics that encapsulates methods and techniques for generating and animating images or models of a character face. The character can be a human, a humanoid, an animal, a fantasy creature or character, etc. Due to its subject and output type, it is also related to many other scientific and artistic fields from psychology to traditional animation. The importance of human faces in verbal and non-verbal communication and advances in computer graphics hardware and software have caused considerable scientific, technological, and artistic interests in computer facial animation.

Virtual cinematography

Virtual cinematography is the set of cinematographic techniques performed in a computer graphics environment. This includes a wide variety of subjects like photographing real objects, often with stereo or multi-camera setup, for the purpose of recreating them as three-dimensional objects and algorithms for automated creation of real and simulated camera angles.

Facial motion capture is the process of electronically converting the movements of a person's face into a digital database using cameras or laser scanners. This database may then be used to produce CG computer animation for movies, games, or real-time avatars. Because the motion of CG characters is derived from the movements of real people, it results in more realistic and nuanced computer character animation than if the animation were created manually.

Motion analysis is used in computer vision, image processing, high-speed photography and machine vision to study methods and applications in which two or more consecutive images from an image sequence, e.g. produced by a video camera or high-speed camera, are processed to produce information based on the apparent motion in the images. In some applications, the camera is fixed relative to the scene and objects are moving around in the scene; in some applications the scene is more or less fixed and the camera is moving; and in some cases both the camera and the scene are moving.

Visual odometry

In robotics and computer vision, visual odometry is the process of determining the position and orientation of a robot by analyzing the associated camera images. It has been used in a wide variety of robotic applications, such as on the Mars Exploration Rovers.

A Hand-Over is a term used in the animation industry to refer to the process of adding finger and hand motion capture data to the pre-existing full-body motion capture data, using a hand motion capture device.

In 3D user interaction (3DUI) the human interacts with a computer or other device with an aspect of three-dimensional space. This interaction is created thanks to the interfaces, which will be the intermediaries between human and machine.

Inertial navigation system

An inertial navigation system (INS) is a navigation device that uses a computer, motion sensors (accelerometers) and rotation sensors (gyroscopes) to continuously calculate by dead reckoning the position, the orientation, and the velocity of a moving object without the need for external references. Often the inertial sensors are supplemented by a barometric altimeter and occasionally by magnetic sensors (magnetometers) and/or speed measuring devices. INSs are used on vehicles such as ships, aircraft, submarines, guided missiles, and spacecraft. Other terms used to refer to inertial navigation systems or closely related devices include inertial guidance system, inertial instrument, inertial measurement unit (IMU) and many other variations. Older INS systems generally used an inertial platform as their mounting point to the vehicle and the terms are sometimes considered synonymous.

Positioning systems use positioning technology to determine the position and orientation of an object or person in a room, a building, or the world.

Finger tracking

In the field of gesture recognition and image processing, finger tracking is a high-resolution technique employed to track the consecutive positions of the user's fingers and hence represent objects in 3D. In addition, the finger-tracking technique is used as an input tool for the computer, acting as an external device similar to a keyboard and a mouse.

The history of computer animation began as early as the 1940s and 1950s, when people began to experiment with computer graphics, most notably John Whitney. It was only by the early 1960s, when digital computers had become widely established, that new avenues for innovative computer graphics blossomed. Initially, uses were mainly for scientific, engineering and other research purposes, but artistic experimentation began to make its appearance by the mid-1960s. By the mid-1970s, many such efforts were beginning to enter into public media. Much computer graphics at this time involved 2-dimensional imagery, though increasingly, as computer power improved, efforts to achieve 3-dimensional realism became the emphasis. By the late 1980s, photo-realistic 3D was beginning to appear in films, and by the mid-1990s it had developed to the point where 3D animation could be used for entire feature film production.

Faceware Technologies is an American company that designs facial animation and motion capture technology. The company was established under Image Metrics and became its own company at the beginning of 2012.

X-ray motion analysis is a technique used to track the movement of objects using X-rays. This is done by placing the subject to be imaged in the center of the X-ray beam and recording the motion using an image intensifier and a high-speed camera, allowing for high quality videos sampled many times per second. Depending on the settings of the X-rays, this technique can visualize specific structures in an object, such as bones or cartilage. X-ray motion analysis can be used to perform gait analysis, analyze joint movement, or record the motion of bones obscured by soft tissue. The ability to measure skeletal motions is a key aspect to one's understanding of vertebrate biomechanics, energetics, and motor control.

Positional tracking

Positional tracking detects the precise position of head-mounted displays, controllers, other objects or body parts within Euclidean space. Positional tracking registers the exact position by recognizing rotation and recording translational movements. Because virtual reality is about emulating and altering reality, it is important to track accurately how objects move in real life in order to represent them inside VR. The position and orientation of a real object in space are determined with the help of special sensors or markers. Sensors record the signal from the real object when it moves or is moved and transmit the received information to the computer.


  1. Goebl, W.; Palmer, C. (2013). Balasubramaniam, Ramesh (ed.). "Temporal Control and Hand Movement Efficiency in Skilled Music Performance". PLoS ONE. 8 (1): e50901. Bibcode:2013PLoSO...850901G. doi:10.1371/journal.pone.0050901. PMC   3536780 . PMID   23300946.
  2. David Noonan, Peter Mountney, Daniel Elson, Ara Darzi and Guang-Zhong Yang. A Stereoscopic Fibroscope for Camera Motion and 3D Depth Recovery During Minimally Invasive Surgery. In Proc. ICRA 2009, pp. 4463-4468.
  3. Yamane, Katsu, and Jessica Hodgins. "Simultaneous tracking and balancing of humanoid robots for imitating human motion capture data." Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on. IEEE, 2009.
  4. NY Castings, Joe Gatt, Motion Capture Actors: Body Movement Tells the Story Archived 2014-07-03 at the Wayback Machine , Accessed June 21, 2014
  5. Andrew Harris Salomon, Feb. 22, 2013, Backstage Magazine, Growth In Performance Capture Helping Gaming Actors Weather Slump, Accessed June 21, 2014, "..But developments in motion-capture technology, as well as new gaming consoles expected from Sony and Microsoft within the year, indicate that this niche continues to be a growth area for actors. And for those who have thought about breaking in, the message is clear: Get busy...."
  6. Ben Child, 12 August 2011, The Guardian, Andy Serkis: why won't Oscars go ape over motion-capture acting? Star of Rise of the Planet of the Apes says performance capture is misunderstood and its actors deserve more respect, Accessed June 21, 2014
  7. Hugh Hart, January 24, 2012, Wired magazine, When will a motion capture actor win an Oscar?, Accessed June 21, 2014, "...the Academy of Motion Picture Arts and Sciences’ historic reluctance to honor motion-capture performances .. Serkis, garbed in a sensor-embedded Lycra body suit, quickly mastered the then-novel art and science of performance-capture acting. ..."
  8. Cheung, German KM, et al. "A real time system for robust 3D voxel reconstruction of human motions." Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on. Vol. 2. IEEE, 2000.
  9. "Xsens MVN Animate - Products". Xsens 3D motion tracking. Retrieved 2019-01-22.
  10. "The Next Generation 1996 Lexicon A to Z: Motion Capture". Next Generation . No. 15. Imagine Media. March 1996. p. 37.
  11. "Motion Capture". Next Generation . Imagine Media (10): 50. October 1995.
  12. Jon Radoff, Anatomy of an MMORPG, "Archived copy". Archived from the original on 2009-12-13. Retrieved 2009-11-30.
  13. "Hooray for Hollywood! Acclaim Studios". GamePro . IDG (82): 28–29. July 1995.
  14. Wawro, Alex (October 23, 2014). "Yu Suzuki Recalls Using Military Tech to Make Virtua Fighter 2". Gamasutra . Retrieved 18 August 2016.
  15. "History of Motion Capture". Retrieved 2013-08-10.
  16. Savage, Annaliza (12 July 2012). "Gollum Actor: How New Motion-Capture Tech Improved The Hobbit". Wired . Retrieved 29 January 2017.
  17. "Markerless Motion Capture | EuMotus". Markerless Motion Capture | EuMotus. Retrieved 2018-10-12.
  18. Corriea, Alexa Ray (30 June 2014). "This facial recognition software lets you be Octodad". Retrieved 4 January 2017.
  19. Plunkett, Luke. "Turn Your Human Face Into A Video Game Character". Retrieved 4 January 2017.
  20. "Put your (digital) game face on". 24 April 2016. Retrieved 4 January 2017.
  21. Sturm, Jürgen, et al. "A benchmark for the evaluation of RGB-D SLAM systems." Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE, 2012.
  22. "Motion Capture: Optical Systems". Next Generation . Imagine Media (10): 53. October 1995.
  23. Veis, G. (1963). "Optical tracking of artificial satellites". Space Science Reviews. 2 (2): 250–296. Bibcode:1963SSRv....2..250V. doi:10.1007/BF00216781.
  24. "Full 6DOF Human Motion Tracking Using Miniature Inertial Sensors" (PDF).
  25. "A history of motion capture". Xsens 3D motion tracking. Retrieved 2019-01-22.
  26. "Motion Capture: Magnetic Systems". Next Generation . Imagine Media (10): 51. October 1995.