Motion analysis

Motion analysis is a topic in computer vision, image processing, high-speed photography and machine vision that studies methods and applications in which two or more consecutive images from an image sequence, e.g., produced by a video camera or high-speed camera, are processed to produce information based on the apparent motion in the images. In some applications the camera is fixed relative to the scene and objects move within it, in others the scene is more or less fixed and the camera moves, and in some cases both the camera and the scene are moving.

The motion analysis processing can in the simplest case be to detect motion, i.e., find the points in the image where something is moving. More complex types of processing can be to track a specific object in the image over time, to group points that belong to the same rigid object that is moving in the scene, or to determine the magnitude and direction of the motion of every point in the image. The information that is produced is often related to a specific image in the sequence, corresponding to a specific time-point, but it then also depends on the neighboring images. This means that motion analysis can produce time-dependent information about motion.

Applications of motion analysis can be found in rather diverse areas, such as surveillance, medicine, the film industry, automotive crash safety, [1] ballistic firearm studies, [2] biological science, [3] flame propagation, [4] and navigation of autonomous vehicles.

Background

Principle of a pinhole camera. Light rays from an object pass through a small hole to form an image.
The motion field that corresponds to the relative motion of some 3D point.

A video camera can be seen as an approximation of a pinhole camera, which means that each point in the image is illuminated by some (normally one) point in the scene in front of the camera, usually by means of light that the scene point reflects from a light source. Each visible point in the scene is projected along a straight line that passes through the camera aperture and intersects the image plane. This means that, at a specific point in time, each point in the image refers to a specific point in the scene. This scene point has a position relative to the camera, and if this relative position changes, it corresponds to a relative motion in 3D. It is a relative motion since it does not matter whether it is the scene point, the camera, or both that are moving. It is only when there is a change in the relative position that the camera is able to detect motion. By projecting the relative 3D motion of all visible points back into the image, the result is the motion field, describing the apparent motion of each image point in terms of a magnitude and direction of velocity of that point in the image plane. A consequence of this observation is that if the relative 3D motion of a scene point is along its projection line, the corresponding apparent motion is zero.
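The projection described above can be made concrete with the standard pinhole model (a sketch in conventional notation, assumed here rather than taken from a specific source):

```latex
% Pinhole projection: a scene point (X, Y, Z) in camera coordinates,
% with Z along the optical axis and focal length f, maps to the image point
x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}

% Differentiating with respect to time relates the 3D velocity
% (\dot X, \dot Y, \dot Z) to the motion-field velocity (\dot x, \dot y):
\dot x = f\,\frac{\dot X Z - X \dot Z}{Z^2}, \qquad
\dot y = f\,\frac{\dot Y Z - Y \dot Z}{Z^2}
```

Motion purely along the projection line scales X, Y, and Z by the same factor, so the ratios X/Z and Y/Z, and hence the image point, stay fixed; this is exactly the zero-apparent-motion case.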

The camera measures the intensity of light at each image point, a light field. In practice, a digital camera measures this light field at discrete points, pixels, but given that the pixels are sufficiently dense, the pixel intensities can be used to represent most characteristics of the light field that falls onto the image plane. A common assumption of motion analysis is that the light reflected from the scene points does not vary over time. As a consequence, if an intensity I has been observed at some point in the image, at some point in time, the same intensity I will be observed at a position that is displaced relative to the first one as a consequence of the apparent motion. Another common assumption is that there is a fair amount of variation in the detected intensity over the pixels in an image. A consequence of this assumption is that if the scene point that corresponds to a certain pixel in the image has a relative 3D motion, then the pixel intensity is likely to change over time.
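The constant-intensity assumption can be stated formally; a first-order Taylor expansion of it yields the classical optical flow constraint (standard notation, assumed here rather than taken from this article):

```latex
% Brightness constancy: the intensity of a scene point is unchanged
% after the apparent displacement (\delta x, \delta y) over time \delta t:
I(x, y, t) = I(x + \delta x,\; y + \delta y,\; t + \delta t)

% Expanding the right-hand side to first order and writing
% (u, v) = (\delta x / \delta t,\; \delta y / \delta t) for the apparent
% velocity gives the optical flow constraint equation:
I_x u + I_y v + I_t = 0
```

This is one equation in two unknowns per pixel, which is why the second assumption, sufficient intensity variation across pixels, matters: additional constraints such as local smoothness are needed to recover the full motion field.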

Methods

Motion detection

One of the simplest types of motion analysis is to detect image points that refer to moving points in the scene. The typical result of this processing is a binary image where all image points (pixels) that relate to moving points in the scene are set to 1 and all other points are set to 0. This binary image is then further processed, e.g., to remove noise, group neighboring pixels, and label objects. Motion detection can be done using several methods; the two main groups are differential methods and methods based on background segmentation.
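A minimal sketch of the differential approach, assuming two 8-bit grayscale frames held as NumPy arrays (the function name and threshold value are illustrative):

```python
import numpy as np

def detect_motion(prev_frame, curr_frame, threshold=25):
    """Differential motion detection: mark pixels whose intensity
    changed by more than `threshold` between two grayscale frames."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)  # binary mask: 1 = moving

# Two synthetic 8-bit frames: a bright square shifts right by 2 pixels.
prev_frame = np.zeros((8, 8), dtype=np.uint8)
curr_frame = np.zeros((8, 8), dtype=np.uint8)
prev_frame[2:5, 1:4] = 200
curr_frame[2:5, 3:6] = 200

mask = detect_motion(prev_frame, curr_frame)
print(mask.sum())  # pixels flagged as moving (trailing and leading edges)
```

The resulting binary mask is what the subsequent steps (noise removal, grouping, labeling) operate on.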

Applications

Human motion analysis

In the areas of medicine, sports, [5] video surveillance, physical therapy, [6] and kinesiology, [7] human motion analysis has become an investigative and diagnostic tool. See the section on motion capture for more detail on the technologies. Human motion analysis can be divided into three categories: human activity recognition, human motion tracking, and analysis of body and body part movement.

Human activity recognition is most commonly used for video surveillance, specifically automatic motion monitoring for security purposes. Most efforts in this area rely on state-space approaches, in which sequences of static postures are statistically analyzed and compared to modeled movements. Template-matching is an alternative method whereby static shape patterns are compared to pre-existing prototypes. [8]

Human motion tracking can be performed in two or three dimensions. Depending on the complexity of analysis, representations of the human body range from basic stick figures to volumetric models. Tracking relies on the correspondence of image features between consecutive frames of video, taking into consideration information such as position, color, shape, and texture. Edge detection can be performed by comparing the color and/or contrast of adjacent pixels, looking specifically for discontinuities or rapid changes. [9] Three-dimensional tracking is fundamentally identical to two-dimensional tracking, with the added factor of spatial calibration. [8]
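The adjacent-pixel comparison described for edge detection can be sketched with simple forward differences (a minimal illustration; the function name and threshold are assumptions, not taken from the cited work):

```python
import numpy as np

def edge_map(frame, threshold=30):
    """Flag pixels where intensity changes sharply relative to the
    right and lower neighbors (forward-difference gradient)."""
    f = frame.astype(np.int16)
    gx = np.abs(f[:, 1:] - f[:, :-1])[:-1, :]  # horizontal differences
    gy = np.abs(f[1:, :] - f[:-1, :])[:, :-1]  # vertical differences
    return ((gx + gy) > threshold).astype(np.uint8)

# A synthetic frame with a bright block on a dark background:
# discontinuities appear along the block boundary.
frame = np.zeros((8, 8), dtype=np.uint8)
frame[2:6, 2:6] = 180
print(edge_map(frame).sum())  # number of boundary pixels flagged
```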

Motion analysis of body parts is critical in the medical field. In postural and gait analysis, joint angles are used to track the location and orientation of body parts. Gait analysis is also used in sports to optimize athletic performance or to identify motions that may cause injury or strain. Tracking software that does not require the use of optical markers is especially important in these fields, where the use of markers may impede natural movement. [8] [10]

Motion analysis in manufacturing

Motion analysis is also applicable in the manufacturing process. [11] Using high-speed video cameras and motion analysis software, one can monitor and analyze assembly lines and production machines to detect inefficiencies or malfunctions. Manufacturers of sports equipment, such as baseball bats and hockey sticks, also use high-speed video analysis to study the impact of projectiles. An experimental setup for this type of study typically uses a triggering device, external sensors (e.g., accelerometers, strain gauges), data acquisition modules, a high-speed camera, and a computer for storing the synchronized video and data. Motion analysis software calculates parameters such as distance, velocity, acceleration, and deformation angles as functions of time. This data is then used to design equipment for optimal performance. [12]
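The velocity and acceleration curves computed from tracked positions can be sketched with finite differences (the frame rate, units, and function name here are assumptions for illustration):

```python
import numpy as np

def kinematics(positions, fps):
    """Estimate velocity and acceleration from tracked positions
    (one position per frame) by finite differences."""
    dt = 1.0 / fps
    velocity = np.diff(positions) / dt      # units per second
    acceleration = np.diff(velocity) / dt   # units per second^2
    return velocity, acceleration

# Tracked x-positions (mm) of a marker filmed at 1000 fps,
# moving at constant speed.
positions = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
v, a = kinematics(positions, fps=1000)
print(v)  # constant 2000 mm/s
print(a)  # zero acceleration
```

Each differentiation shortens the series by one sample, which is why high frame rates (and hence many samples) matter when acceleration or deformation rates are the quantities of interest.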

Additional applications for motion analysis

The object and feature detecting capabilities of motion analysis software can be applied to count and track particles, such as bacteria, [13] [14] viruses, [15] ionic polymer-metal composites, [16] [17] micron-sized polystyrene beads, [18] aphids, [19] and projectiles. [20]

Related Research Articles

Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the forms of decisions. Understanding in this context means the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.

Chroma key

Chroma key compositing, or chroma keying, is a visual-effects and post-production technique for compositing (layering) two or more images or video streams together based on colour hues. The technique has been used in many fields to remove a background from the subject of a photo or video – particularly the newscasting, motion picture, and video game industries. A colour range in the foreground footage is made transparent, allowing separately filmed background footage or a static image to be inserted into the scene. The chroma keying technique is commonly used in video production and post-production. This technique is also referred to as colour keying, colour-separation overlay, or by various terms for specific colour-related variants such as green screen or blue screen; chroma keying can be done with backgrounds of any colour that are uniform and distinct, but green and blue backgrounds are more commonly used because they differ most distinctly in hue from any human skin colour. No part of the subject being filmed or photographed may duplicate the colour used as the backing, or the part may be erroneously identified as part of the backing.

Motion blur

Motion blur is the apparent streaking of moving objects in a photograph or a sequence of frames, such as a film or animation. It results when the image being recorded changes during the recording of a single exposure, due to rapid movement or long exposure.

Motion capture

Motion capture is the process of recording the movement of objects or people. It is used in military, entertainment, sports, medical applications, and for validation of computer vision and robots. In filmmaking and video game development, it refers to recording actions of human actors and using that information to animate digital character models in 2D or 3D computer animation. When it includes face and fingers or captures subtle expressions, it is often referred to as performance capture. In many fields, motion capture is sometimes called motion tracking, but in filmmaking and games, motion tracking usually refers more to match moving.

Image segmentation

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

Motion detection is the process of detecting a change in the position of an object relative to its surroundings or a change in the surroundings relative to an object. It can be achieved by either mechanical or electronic methods. When it is done by natural organisms, it is called motion perception.

A high-speed camera is a device capable of capturing moving images with exposures of less than 1/1,000 second or frame rates in excess of 250 fps. It is used for recording fast-moving objects as photographic images onto a storage medium. After recording, the images stored on the medium can be played back in slow motion. Early high-speed cameras used film to record the high-speed events, but were superseded by entirely electronic devices using an image sensor, recording, typically, over 1,000 fps onto DRAM, to be played back slowly to study the motion for scientific study of transient phenomena.

Gesture recognition

Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. It is a subdiscipline of computer vision. Gestures can originate from any bodily motion or state, but commonly originate from the face or hand. Focuses in the field include emotion recognition from the face and hand gesture recognition, since these are all forms of expression. Users can make simple gestures to control or interact with devices without physically touching them. Many approaches have been made using cameras and computer vision algorithms to interpret sign language; however, the identification and recognition of posture, gait, proxemics, and human behaviors is also the subject of gesture recognition techniques. Gesture recognition can be seen as a way for computers to begin to understand human body language, building a better bridge between machines and humans than older text user interfaces or even GUIs, which still limit the majority of input to keyboard and mouse, and letting users interact naturally without any mechanical devices.

In visual effects, match moving is a technique that allows the insertion of computer graphics into live-action footage with correct position, scale, orientation, and motion relative to the photographed objects in the shot. The term is used loosely to describe several different methods of extracting camera motion information from a motion picture. Sometimes referred to as motion tracking or camera solving, match moving is related to rotoscoping and photogrammetry. Match moving is sometimes confused with motion capture, which records the motion of objects, often human actors, rather than the camera. Typically, motion capture requires special cameras and sensors and a controlled environment. Match moving is also distinct from motion control photography, which uses mechanical hardware to execute multiple identical camera moves. Match moving, by contrast, is typically a software-based technology, applied after the fact to normal footage recorded in uncontrolled environments with an ordinary camera.

Motion estimation

Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another; usually from adjacent frames in a video sequence. It is an ill-posed problem as the motion is in three dimensions but the images are a projection of the 3D scene onto a 2D plane. The motion vectors may relate to the whole image or specific parts, such as rectangular blocks, arbitrary shaped patches or even per pixel. The motion vectors may be represented by a translational model or many other models that can approximate the motion of a real video camera, such as rotation and translation in all three dimensions and zoom.
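A minimal exhaustive block-matching sketch for a single block, one common way of estimating such motion vectors (the block size, search range, and sum-of-absolute-differences cost are illustrative choices):

```python
import numpy as np

def match_block(prev, curr, top, left, size=4, search=2):
    """Exhaustive block matching: find the offset (dy, dx), within
    +/- `search` pixels, minimizing the sum of absolute differences
    (SAD) between a block in `curr` and candidate blocks in `prev`.
    The returned (dy, dx) means the block at (top, left) in `curr`
    matched `prev` at (top + dy, left + dx)."""
    block = curr[top:top+size, left:left+size].astype(np.int32)
    best, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > prev.shape[0] or x + size > prev.shape[1]:
                continue  # candidate block falls outside the frame
            cand = prev[y:y+size, x:x+size].astype(np.int32)
            sad = np.abs(block - cand).sum()
            if best is None or sad < best:
                best, best_vec = sad, (dy, dx)
    return best_vec

# A distinctive 4x4 pattern moves down 1 pixel and right 2 pixels.
rng = np.random.default_rng(0)
pattern = rng.integers(1, 256, (4, 4), dtype=np.uint8)
prev = np.zeros((12, 12), dtype=np.uint8)
curr = np.zeros((12, 12), dtype=np.uint8)
prev[3:7, 3:7] = pattern
curr[4:8, 5:9] = pattern

print(match_block(prev, curr, top=4, left=5))  # (-1, -2): came from up-left
```

Real codecs refine this basic scheme with hierarchical searches and sub-pixel interpolation, but the cost-minimization idea is the same.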

Image stitching

Image stitching or photo stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Commonly performed through the use of computer software, most approaches to image stitching require nearly exact overlaps between images and identical exposures to produce seamless results, although some stitching algorithms actually benefit from differently exposed images by doing high-dynamic-range imaging in regions of overlap. Some digital cameras can stitch their photos internally.

In the fields of computing and computer vision, pose represents the position and orientation of an object, usually in three dimensions. Poses are often stored internally as transformation matrices. The term “pose” is largely synonymous with the term “transform”, but a transform may often include scale, whereas pose does not.

The following are common definitions related to the machine vision field.

Range imaging is the name for a collection of techniques that are used to produce a 2D image showing the distance to points in a scene from a specific point, normally associated with some type of sensor device.

Rolling shutter

Rolling shutter is a method of image capture in which a still picture or each frame of a video is captured not by taking a snapshot of the entire scene at a single instant in time but rather by scanning across the scene rapidly, vertically, horizontally or rotationally. In other words, not all parts of the image of the scene are recorded at exactly the same instant. This produces predictable distortions of fast-moving objects or rapid flashes of light. This is in contrast with "global shutter" in which the entire frame is captured at the same instant.

Visual odometry

In robotics and computer vision, visual odometry is the process of determining the position and orientation of a robot by analyzing the associated camera images. It has been used in a wide variety of robotic applications, such as on the Mars Exploration Rovers.

2D to 3D video conversion is the process of transforming 2D ("flat") film to 3D form, which in almost all cases is stereo, so it is the process of creating imagery for each eye from one 2D image.

Foreground detection

Foreground detection is one of the major tasks in the field of computer vision and image processing whose aim is to detect changes in image sequences. Background subtraction is any technique which allows an image's foreground to be extracted for further processing.
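A minimal background-subtraction sketch using a running-average background model (the blending factor and threshold are illustrative assumptions):

```python
import numpy as np

def background_subtract(frames, alpha=0.1, threshold=20):
    """Foreground masks via a running-average background model: the
    background estimate drifts toward each new frame, and pixels far
    from the estimate are labeled foreground (1)."""
    background = frames[0].astype(np.float64)
    masks = []
    for frame in frames[1:]:
        f = frame.astype(np.float64)
        mask = (np.abs(f - background) > threshold).astype(np.uint8)
        background = (1 - alpha) * background + alpha * f  # slow update
        masks.append(mask)
    return masks

# A static background; a small bright object appears in the last frame.
bg = np.full((6, 6), 50, dtype=np.uint8)
frames = [bg.copy() for _ in range(4)]
frames[3][2:4, 2:4] = 200

masks = background_subtract(frames)
print(masks[-1].sum())  # only the newly appeared object is flagged
```

The slow update lets the model absorb gradual illumination changes while still flagging objects that appear or move quickly.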

In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories in a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Whereas image segmentation labels pixels that share certain characteristics at a particular time, here the pixels are segmented according to their relative movement over the time span of the video sequence.

An event camera, also known as a neuromorphic camera, silicon retina or dynamic vision sensor, is an imaging sensor that responds to local changes in brightness. Event cameras do not capture images using a shutter as conventional (frame) cameras do. Instead, each pixel inside an event camera operates independently and asynchronously, reporting changes in brightness as they occur, and staying silent otherwise.

References

  1. Munsch, Marie. "Lateral Glazing Characterization Under Head Impact: Experimental and Numerical Investigation" (PDF). Retrieved 20 December 2013.
  2. "Handgun Wounding Effects Due to Bullet Rotational Velocity" (PDF). Archived from the original (PDF) on 22 December 2013. Retrieved 18 February 2013.
  3. Anderson, Christopher V. (2010). "Ballistic tongue projection in chameleons maintains high performance at low temperature" (PDF). Proceedings of the National Academy of Sciences. 107 (12): 5495–5499. Bibcode:2010PNAS..107.5495A. doi:10.1073/pnas.0910778107. PMC 2851764. PMID 20212130. Retrieved 2 June 2010.
  4. Mogi, Toshio. "Self-ignition and flame propagation of high-pressure hydrogen jet during sudden discharge from a pipe" (PDF). International Journal of Hydrogen Energy 34 (2009): 5810–5816. Retrieved 28 April 2009.
  5. Payton, Carl J. "Biomechanical Evaluation of Movement in Sport and Exercise" (PDF). Archived from the original (PDF) on 2014-01-08. Retrieved 8 January 2014.
  6. "Markerless Motion Capture + Motion Analysis | EuMotus". www.eumotus.com. Retrieved 2018-03-25.
  7. Hedrick, Tyson L. (2011). "Morphological and kinematic basis of the hummingbird flight stroke: scaling of flight muscle transmission ratio". Proceedings. Biological Sciences. 279 (1735): 1986–1992. doi:10.1098/rspb.2011.2238. PMC 3311889. PMID 22171086.
  8. Aggarwal, JK and Q Cai. "Human Motion Analysis: A Review." Computer Vision and Image Understanding 73, no. 3 (1999): 428–440.
  9. Fan, J, EA El-Kwae, M-S Hacid, and F Liang. "Novel tracking-based moving object extraction algorithm." J Electron Imaging 11, 393 (2002).
  10. Green, RD, L Guan, and JA Burne. "Video analysis of gait for diagnosing movement disorders." J Electron Imaging 9, 16 (2000).
  11. Longana, M.L. "High-strain rate imaging & full-field optical techniques for material characterization" (PDF). Archived from the original (PDF) on January 8, 2014. Retrieved Nov 22, 2012.
  12. Masi, CG. "Vision improves bat performance." Vision Systems Design. June 2006
  13. Borrok, M. J., et al. (2009). Structure-based design of a periplasmic binding protein antagonist that prevents domain closure. ACS Chemical Biology, 4, 447-456.
  14. Borrok, M. J., Kolonko, E. M., and Kiessling, L. L. (2008). Chemical probes of bacterial signal transduction reveal that repellents stabilize and attractants destabilize the chemoreceptor array. ACS Chemical Biology, 3, 101-109.
  15. Shopov, A. et al. "Improvements in image analysis and fluorescence microscopy to discriminate and enumerate bacteria and viruses in aquatic samples, or cells, and to analyze sprays and fragmenting debris." Aquatic Microbial Ecology 22 (2000): 103-110.
  16. Park, J. K., and Moore, R. B. (2009). Influence of ordered morphology on the anisotropic actuation in uniaxially oriented electroactive polymer systems. ACS Applied Materials & Interfaces, 1, 697-702.
  17. Phillips, A. K., and Moore, R. B. (2005). Ionic actuators based on novel sulfonated ethylene vinyl alcohol copolymer membranes. Polymer, 46, 7788-7802.
  18. Nott, M. (2005). Teaching Brownian motion: demonstrations and role play. School Science Review, 86, 18-28.
  19. Kay, S., and Steinkraus, D. C. (2005). Effect of Neozygites fresenii infection on cotton aphid movement. AAES Research Series 543, 245-248. Fayetteville, AR: Arkansas Agricultural Experiment Station. Available from http://arkansasagnews.uark.edu/543-43.pdf
  20. Sparks, C. et al. "Comparison and Validation of Smooth Particle Hydrodynamics (SPH) and Coupled Euler Lagrange (CEL) Techniques for Modeling Hydrodynamic Ram." 46th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, Austin, Texas, Apr. 18-21, 2005.