Structure from motion

Structure from motion (SfM) [1] is a photogrammetric range imaging technique for estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals. It is studied in the fields of computer vision and visual perception.

Principle

[Image] Digital surface model of a motorway interchange construction site
[Image] Real photo x SfM with texture color x SfM with simple shader. Made with Python Photogrammetry Toolbox GUI and rendered in Blender with Cycles.
[Image] Bezmiechowa airfield 3D digital surface model extracted from data collected during a 30-minute flight of a Pteryx UAV

Humans perceive a great deal of information about the three-dimensional structure of their environment by moving around in it. When the observer moves, objects around them move by different amounts depending on their distance from the observer. This is known as motion parallax, and the depth information it provides can be used to generate an accurate 3D representation of the surrounding world. [2]

Finding structure from motion presents a similar problem to finding structure from stereo vision. In both instances, the correspondence between images must be found and the 3D object reconstructed.

To find correspondences between images, features such as corner points (edges with gradients in multiple directions) are tracked from one image to the next. One of the most widely used feature detectors is the scale-invariant feature transform (SIFT). It uses the maxima of a difference-of-Gaussians (DoG) pyramid as features. The first step in SIFT is finding a dominant gradient direction; to make the descriptor rotation-invariant, it is rotated to fit this orientation. [3] Another common feature detector is SURF (speeded-up robust features). [4] In SURF, the DoG is replaced with a Hessian-matrix-based blob detector, and instead of evaluating gradient histograms, SURF computes sums of gradient components and sums of their absolute values. [5] Its use of integral images allows features to be detected extremely quickly with a high detection rate. [6] Compared with SIFT, SURF is therefore a faster feature detector, at the cost of less accurate feature positions. [5] Another type of feature recently made practical for structure from motion is the general curve (e.g., locally an edge with gradients in one direction), part of a technology known as pointless SfM, [7] [8] which is useful when point features are insufficient, as is common in man-made environments. [9]
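
As a concrete illustration, the following minimal sketch detects SIFT keypoints with OpenCV in Python (opencv-python 4.4 or later); the image filename is a placeholder and is not taken from the sources above.

    import cv2

    # Load one frame of the sequence in grayscale (placeholder filename).
    image = cv2.imread("frame_000.jpg", cv2.IMREAD_GRAYSCALE)

    # Detect SIFT keypoints and compute their 128-dimensional descriptors.
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(image, None)

    # Each keypoint stores a sub-pixel position, scale and dominant orientation;
    # the descriptors are what get matched between images.
    print(len(keypoints), descriptors.shape)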

The features detected in all the images are then matched. One matching algorithm that tracks features from one image to another is the Lucas–Kanade tracker. [10]
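
A minimal sketch of such tracking with the pyramidal Lucas–Kanade implementation in OpenCV is shown below; the frame filenames are placeholders, and corners are first detected with OpenCV's goodFeaturesToTrack.

    import cv2

    prev_img = cv2.imread("frame_000.jpg", cv2.IMREAD_GRAYSCALE)
    next_img = cv2.imread("frame_001.jpg", cv2.IMREAD_GRAYSCALE)

    # Corner points (gradients in multiple directions) to be tracked.
    prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=500,
                                       qualityLevel=0.01, minDistance=7)

    # Track the corners into the next frame with pyramidal Lucas-Kanade.
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_img, next_img, prev_pts, None)

    # Keep only the points that were tracked successfully.
    good_prev = prev_pts[status.ravel() == 1]
    good_next = next_pts[status.ravel() == 1]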

Some of the matched features will be incorrectly matched, so the matches must also be filtered. RANSAC (random sample consensus) is the algorithm usually used to remove these outlier correspondences. In the paper by Fischler and Bolles, RANSAC was used to solve the location determination problem (LDP), where the objective is to determine the points in space that project onto an image as a set of landmarks with known locations. [11]
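
A hedged sketch of this filtering step, assuming the tracked point arrays good_prev and good_next from the previous example: a fundamental matrix is fitted with RANSAC, and correspondences that violate the resulting epipolar constraint are discarded as outliers.

    import cv2
    import numpy as np

    pts1 = np.float32(good_prev).reshape(-1, 2)
    pts2 = np.float32(good_next).reshape(-1, 2)

    # Fit a fundamental matrix with RANSAC; the mask marks inlier matches.
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC,
                                            ransacReprojThreshold=1.0, confidence=0.99)
    pts1_in = pts1[inlier_mask.ravel() == 1]
    pts2_in = pts2[inlier_mask.ravel() == 1]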

The feature trajectories over time are then used to reconstruct their 3D positions and the camera's motion. [12] An alternative is given by so-called direct approaches, where geometric information (3D structure and camera motion) is directly estimated from the images, without intermediate abstraction to features or corners. [13]
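
For the two-view case, the sketch below recovers the relative camera motion and triangulates a sparse point cloud with OpenCV; the intrinsic matrix K is an illustrative assumption, and pts1_in / pts2_in are the inlier correspondences from the previous example.

    import cv2
    import numpy as np

    # Assumed pinhole intrinsics (focal length and principal point in pixels).
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])

    # Essential matrix and relative pose (rotation R, translation direction t).
    E, mask = cv2.findEssentialMat(pts1_in, pts2_in, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, pose_mask = cv2.recoverPose(E, pts1_in, pts2_in, K)

    # Triangulate the matched points; the first camera is taken as the origin.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    points_h = cv2.triangulatePoints(P1, P2, pts1_in.T, pts2_in.T)
    points_3d = (points_h[:3] / points_h[3]).T  # homogeneous -> Euclidean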

There are several approaches to structure from motion. In incremental SfM, [14] camera poses are solved for and added one by one to the collection. In global SfM, [15] [16] the poses of all cameras are solved for at the same time. A somewhat intermediate approach is out-of-core SfM, where several partial reconstructions are computed that are then integrated into a global solution.
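
The skeleton below illustrates, under many simplifying assumptions, how an incremental pipeline registers each new image against the reconstruction built so far; the helper functions named in the comments are hypothetical placeholders, not part of any particular library.

    import cv2

    def register_next_image(points_3d, points_2d, K):
        """Estimate a new camera pose from 2D-3D correspondences (PnP + RANSAC)."""
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(points_3d, points_2d, K, None)
        return rvec, tvec, inliers

    # for image in remaining_images:
    #     pts_3d, pts_2d = match_against_reconstruction(image)   # hypothetical helper
    #     rvec, tvec, inliers = register_next_image(pts_3d, pts_2d, K)
    #     triangulate_new_points(...)                             # hypothetical helper
    #     bundle_adjust(...)                                      # periodic global refinement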

Applications

Geosciences

Structure-from-motion photogrammetry with multi-view stereo provides hyperscale landform models using images acquired from a range of digital cameras and, optionally, a network of ground control points. The technique is not limited in temporal frequency and can provide point cloud data comparable in density and accuracy to those generated by terrestrial and airborne laser scanning at a fraction of the cost. [17] [18] [19] Structure from motion is also useful in remote or rugged environments where terrestrial laser scanning is limited by equipment portability and airborne laser scanning is limited by terrain roughness causing loss of data and image foreshortening. The technique has been applied in many settings such as rivers, [20] badlands, [21] sandy coastlines, [22] [23] fault zones, [24] landslides, [25] and coral reef settings. [26] SfM has also been applied successfully to the assessment of large wood accumulation volume [27] and porosity [28] in fluvial systems, as well as to the characterization of rock masses through the determination of properties such as the orientation and persistence of discontinuities. [29] [30] A full range of digital cameras can be utilized, including digital SLRs, compact digital cameras and even smartphones. Generally, though, higher-accuracy data will be achieved with more expensive cameras, which have lenses of higher optical quality. The technique therefore offers exciting opportunities to characterize surface topography in unprecedented detail and, with multi-temporal data, to detect elevation, position and volumetric changes that are symptomatic of earth surface processes. Structure from motion can be placed in the context of other digital surveying methods.

Cultural heritage

Cultural heritage is present everywhere, and its structural control, documentation and conservation is one of humanity's main duties (UNESCO). From this point of view, SfM is used to properly assess the state of a structure and to plan and cost maintenance, control and restoration efforts. Serious constraints are often connected with the accessibility of the site and the impossibility of installing invasive surveying pillars, which rule out traditional surveying routines (such as total stations); SfM offers a non-invasive approach, requiring no direct interaction between the structure and any operator. The approach is sufficiently accurate when only qualitative considerations are needed, and it is fast enough to respond to a monument's immediate management needs. [31] The first operational phase is a careful preparation of the photogrammetric survey, in which the relation between the best distance from the object, the focal length, the ground sampling distance (GSD) and the sensor's resolution is established. With this information, the planned photographic acquisitions must be made with a vertical overlap of at least 60% (figure 02). [32]
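
The planning relation mentioned above can be sketched with the standard pinhole approximation GSD = distance × pixel size / focal length; the numerical values below are illustrative assumptions, not figures from the cited sources.

    def ground_sampling_distance(distance_m, focal_length_mm, pixel_size_um):
        """Ground sampling distance in metres per pixel (pinhole approximation)."""
        return distance_m * (pixel_size_um * 1e-6) / (focal_length_mm * 1e-3)

    # Example: a 24 mm lens with 4.4 micrometre pixels photographed from 10 m away.
    gsd = ground_sampling_distance(10.0, 24.0, 4.4)
    print(f"GSD = {gsd * 1000:.1f} mm per pixel")   # roughly 1.8 mm per pixel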

Furthermore, structure-from-motion photogrammetry represents a non-invasive, highly flexible and low-cost methodology to digitalize historical documents. [33]

See also

Related Research Articles

Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the forms of decisions. Understanding in this context means the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.

Photogrammetry is the science and technology of obtaining reliable information about physical objects and the environment through the process of recording, measuring and interpreting photographic images and patterns of electromagnetic radiant imagery and other phenomena.

Super-resolution imaging (SR) is a class of techniques that enhance (increase) the resolution of an imaging system. In optical SR the diffraction limit of systems is transcended, while in geometrical SR the resolution of digital imaging sensors is enhanced.

3D scanning is the process of analyzing a real-world object or environment to collect three dimensional data of its shape and possibly its appearance. The collected data can then be used to construct digital 3D models.

The correspondence problem refers to the problem of ascertaining which parts of one image correspond to which parts of another image, where differences are due to movement of the camera, the elapse of time, and/or movement of objects in the photos.

Takeo Kanade is a Japanese computer scientist and one of the world's foremost researchers in computer vision. He is the U.A. and Helen Whitaker Professor at the Carnegie Mellon School of Computer Science. He has approximately 300 peer-reviewed academic publications and holds around 20 patents.

Articulated body pose estimation in computer vision is the study of algorithms and systems that recover the pose of an articulated body, which consists of joints and rigid parts, using image-based observations. It is one of the longest-lasting problems in computer vision because of the complexity of the models that relate observation with pose, and because of the variety of situations in which it would be useful.

In photogrammetry and computer stereo vision, bundle adjustment is simultaneous refining of the 3D coordinates describing the scene geometry, the parameters of the relative motion, and the optical characteristics of the camera(s) employed to acquire the images, given a set of images depicting a number of 3D points from different viewpoints. Its name refers to the geometrical bundles of light rays originating from each 3D feature and converging on each camera's optical center, which are adjusted optimally according to an optimality criterion involving the corresponding image projections of all points.
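
A minimal sketch of this idea, assuming cameras parameterised by a rotation vector and translation and using SciPy's generic least-squares solver rather than a dedicated bundle-adjustment library; the data layout here is an illustrative assumption.

    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial.transform import Rotation

    def reprojection_residuals(params, n_cams, n_pts, K, cam_idx, pt_idx, observed_2d):
        # params holds 6 numbers per camera (rotation vector + translation), then 3 per point.
        poses = params[:n_cams * 6].reshape(n_cams, 6)
        points = params[n_cams * 6:].reshape(n_pts, 3)
        # Rotate and translate each observed point into its camera's frame.
        R = Rotation.from_rotvec(poses[cam_idx, :3]).as_matrix()
        cam_pts = np.einsum('nij,nj->ni', R, points[pt_idx]) + poses[cam_idx, 3:]
        # Project with the shared intrinsics K and compare with the image observations.
        proj = (K @ cam_pts.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        return (proj - observed_2d).ravel()

    # result = least_squares(reprojection_residuals, x0,
    #                        args=(n_cams, n_pts, K, cam_idx, pt_idx, observed_2d))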

In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects. This process can be accomplished either by active or passive methods. If the model is allowed to change its shape in time, this is referred to as non-rigid or spatio-temporal reconstruction.

A structured-light 3D scanner is a 3D scanning device for measuring the three-dimensional shape of an object using projected light patterns and a camera system.

In robotics and computer vision, visual odometry is the process of determining the position and orientation of a robot by analyzing the associated camera images. It has been used in a wide variety of robotic applications, such as on the Mars Exploration Rovers.

The Conference on Computer Vision and Pattern Recognition (CVPR) is an annual conference on computer vision and pattern recognition, which is regarded as one of the most important conferences in its field. According to Google Scholar Metrics (2022), it is the highest impact computing venue.

The Institute of Photogrammetry and GeoInformation (IPI) is a research institute that is part of the consortium of institutes operating under the aegis of Leibniz University situated in Hannover, Germany. The current research at IPI focuses both on terrestrial and extraterrestrial image interpretation. The basic themes of research revolve around computer vision, 3D geometry, image processing and machine learning. IPI contributes regularly with state-of-the-art methods to interpret high resolution images received from the HRSC probe of the Mars Express mission.

PhotoModeler is a software application that performs image-based modeling and close-range photogrammetry – producing 3D models and measurements from photography. The software is used for close-range, aerial and UAV photogrammetry.

3D reconstruction from multiple images is the creation of three-dimensional models from a set of images. It is the reverse process of obtaining 2D images from 3D scenes.

A digital outcrop model (DOM), also called a virtual outcrop model, is a digital 3D representation of the outcrop surface, mostly in the form of a textured polygon mesh.

In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Image segmentation techniques label pixels that share certain characteristics at a particular time; here, pixels are segmented depending on their relative movement over a period of time, i.e. the duration of the video sequence.

Egocentric vision or first-person vision is a sub-field of computer vision that entails analyzing images and videos captured by a wearable camera, which is typically worn on the head or on the chest and naturally approximates the visual field of the camera wearer. Consequently, visual data capture the part of the scene on which the user focuses to carry out the task at hand and offer a valuable perspective to understand the user's activities and their context in a naturalistic setting.

Michael J. Black is an American-born computer scientist working in Tübingen, Germany. He is a founding director at the Max Planck Institute for Intelligent Systems where he leads the Perceiving Systems Department in research focused on computer vision, machine learning, and computer graphics. He is also an Honorary Professor at the University of Tübingen.

Video super-resolution (VSR) is the process of generating high-resolution video frames from the given low-resolution video frames. Unlike single-image super-resolution (SISR), the main goal is not only to restore more fine details while saving coarse ones, but also to preserve motion consistency.

References

  1. S. Ullman (1979). "The interpretation of structure from motion" (PDF). Proceedings of the Royal Society of London. 203 (1153): 405–426. Bibcode:1979RSPSB.203..405U. doi:10.1098/rspb.1979.0006. hdl: 1721.1/6298 . PMID   34162. S2CID   11995230.
  2. Linda G. Shapiro; George C. Stockman (2001). Computer Vision. Prentice Hall. ISBN   978-0-13-030796-5.
  3. D. G. Lowe (2004). "Distinctive image features from scale-invariant keypoints". International Journal of Computer Vision. 60 (2): 91–110. CiteSeerX   10.1.1.73.2924 . doi:10.1023/b:visi.0000029664.99615.94. S2CID   221242327.
  4. H. Bay; T. Tuytelaars & L. Van Gool (2006). "Surf: Speeded up robust features". 9th European Conference on Computer Vision.
  5. K. Häming & G. Peters (2010). "The structure-from-motion reconstruction pipeline – a survey with focus on short image sequences". Kybernetika. 46 (5): 926–937.
  6. Viola, P.; Jones, M. (2001). "Rapid object detection using a boosted cascade of simple features". Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001. Vol. 1. Kauai, HI, USA: IEEE Comput. Soc. pp. I–511–I-518. doi:10.1109/CVPR.2001.990517. ISBN   978-0-7695-1272-3. S2CID   2715202.
  7. Nurutdinova, Irina; Fitzgibbon, Andrew (2015). "Towards Pointless Structure from Motion: 3D Reconstruction and Camera Parameters from General 3D Curves" (PDF). 2015 IEEE International Conference on Computer Vision (ICCV). pp. 2363–2371. doi:10.1109/ICCV.2015.272. ISBN 978-1-4673-8391-2. S2CID 9120123.
  8. Fabbri, Ricardo; Giblin, Peter; Kimia, Benjamin (2012). "Camera Pose Estimation Using First-Order Curve Differential Geometry". Computer Vision – ECCV 2012 (PDF). Lecture Notes in Computer Science. Vol. 7575. pp. 231–244. doi:10.1007/978-3-642-33765-9_17. ISBN   978-3-642-33764-2. S2CID   15402824.
  9. Apple, ARKIT team (2018). "Understanding ARKit Tracking and Detection". WWDC.
  10. B. D. Lucas & T. Kanade (1981). "An iterative image registration technique with an application to stereo vision". Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI '81).
  11. M. A. Fischler & R. C. Bolles (1981). "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography". Commun. ACM. 24 (6): 381–395. doi: 10.1145/358669.358692 . S2CID   972888.
  12. F. Dellaert; S. Seitz; C. Thorpe & S. Thrun (2000). "Structure from Motion without Correspondence" (PDF). IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
  13. Engel, Jakob; Schöps, Thomas; Cremers, Daniel (2014). "LSD-SLAM: Large-Scale Direct Monocular SLAM". European Conference on Computer Vision (ECCV) 2014 (PDF).
  14. J.L. Schönberger & J.M. Frahm (2016). "Structure-from-Motion Revisited" (PDF). IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
  15. C. Tomasi & T. Kanade (1992). "Shape and motion from image streams under orthography: a factorization method". International Journal of Computer Vision. 9 (2): 137–154. CiteSeerX   10.1.1.131.9807 . doi:10.1007/BF00129684. S2CID   2931825.
  16. V.M. Govindu (2001). "Combining two-view constraints for motion estimation". Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001. Vol. 2. pp. II-218–II-225. doi:10.1109/CVPR.2001.990963. ISBN   0-7695-1272-0. S2CID   8252027.
  17. Westoby, M. J.; Brasington, J.; Glasser, N. F.; Hambrey, M. J.; Reynolds, J. M. (2012-12-15). "'Structure-from-Motion' photogrammetry: A low-cost, effective tool for geoscience applications". Geomorphology. 179: 300–314. Bibcode:2012Geomo.179..300W. doi:10.1016/j.geomorph.2012.08.021. S2CID   33695861.
  18. James, M. R.; Robson, S. (2012-09-01). "Straightforward reconstruction of 3D surfaces and topography with a camera: Accuracy and geoscience application" (PDF). Journal of Geophysical Research: Earth Surface. 117 (F3): F03017. Bibcode:2012JGRF..117.3017J. doi: 10.1029/2011jf002289 . ISSN   2156-2202.
  19. Fonstad, Mark A.; Dietrich, James T.; Courville, Brittany C.; Jensen, Jennifer L.; Carbonneau, Patrice E. (2013-03-30). "Topographic structure from motion: a new development in photogrammetric measurement" (PDF). Earth Surface Processes and Landforms . 38 (4): 421–430. Bibcode:2013ESPL...38..421F. doi:10.1002/esp.3366. ISSN   1096-9837. S2CID   15601931.
  20. Javernick, L.; Brasington, J.; Caruso, B. (2014). "Modeling the topography of shallow braided rivers using Structure-from-Motion photogrammetry". Geomorphology. 213: 166–182. Bibcode:2014Geomo.213..166J. doi:10.1016/j.geomorph.2014.01.006.
  21. Smith, Mark William; Vericat, Damià (2015-09-30). "From experimental plots to experimental landscapes: topography, erosion and deposition in sub-humid badlands from Structure-from-Motion photogrammetry" (PDF). Earth Surface Processes and Landforms. 40 (12): 1656–1671. Bibcode:2015ESPL...40.1656S. doi:10.1002/esp.3747. ISSN   1096-9837. S2CID   128402144.
  22. Goldstein, Evan B; Oliver, Amber R; deVries, Elsemarie; Moore, Laura J; Jass, Theo (2015-10-22). "Ground control point requirements for structure-from-motion derived topography in low-slope coastal environments". PeerJ PrePrints. doi: 10.7287/peerj.preprints.1444v1 . ISSN   2167-9843.
  23. Mancini, Francesco; Dubbini, Marco; Gattelli, Mario; Stecchi, Francesco; Fabbri, Stefano; Gabbianelli, Giovanni (2013-12-09). "Using Unmanned Aerial Vehicles (UAV) for High-Resolution Reconstruction of Topography: The Structure from Motion Approach on Coastal Environments". Remote Sensing. 5 (12): 6880–6898. Bibcode:2013RemS....5.6880M. doi: 10.3390/rs5126880 . hdl: 11380/1055514 .
  24. Johnson, Kendra; Nissen, Edwin; Saripalli, Srikanth; Arrowsmith, J. Ramón; McGarey, Patrick; Scharer, Katherine; Williams, Patrick; Blisniuk, Kimberly (2014-10-01). "Rapid mapping of ultrafine fault zone topography with structure from motion". Geosphere. 10 (5): 969–986. Bibcode:2014Geosp..10..969J. doi: 10.1130/GES01017.1 .
  25. Del Soldato, M.; Riquelme, A.; Bianchini, S.; Tomàs, R.; Di Martire, D.; De Vita, P.; Moretti, S.; Calcaterra, D. (2018-06-06). "Multisource data integration to investigate one century of evolution for the Agnone landslide (Molise, southern Italy)". Landslides. 15 (11): 2113–2128. doi: 10.1007/s10346-018-1015-z . hdl: 2158/1131012 . ISSN   1612-510X.
  26. Bryson, Mitch; Duce, Stephanie; Harris, Dan; Webster, Jody M.; Thompson, Alisha; Vila-Concejo, Ana; Williams, Stefan B. (2016). "Geomorphic changes of a coral shingle cay measured using Kite Aerial Photography". Geomorphology. 270: 1–8. Bibcode:2016Geomo.270....1B. doi:10.1016/j.geomorph.2016.06.018.
  27. Spreitzer, Gabriel; Tunnicliffe, Jon; Friedrich, Heide (2019-12-01). "Using Structure from Motion photogrammetry to assess large wood (LW) accumulations in the field". Geomorphology. 346: 106851. Bibcode:2019Geomo.34606851S. doi:10.1016/j.geomorph.2019.106851. S2CID   202908775.
  28. Spreitzer, Gabriel; Tunnicliffe, Jon; Friedrich, Heide (2020). "Large wood (LW) 3D accumulation mapping and assessment using structure from Motion photogrammetry in the laboratory". Journal of Hydrology. 581: 124430. Bibcode:2020JHyd..58124430S. doi:10.1016/j.jhydrol.2019.124430. S2CID   209465940.
  29. Riquelme, A.; Cano, M.; Tomás, R.; Abellán, A. (2017-01-01). "Identification of Rock Slope Discontinuity Sets from Laser Scanner and Photogrammetric Point Clouds: A Comparative Analysis". Procedia Engineering. 191: 838–845. doi: 10.1016/j.proeng.2017.05.251 . hdl: 10045/67538 . ISSN   1877-7058.
  30. Jordá Bordehore, Luis; Riquelme, Adrian; Cano, Miguel; Tomás, Roberto (2017-09-01). "Comparing manual and remote sensing field discontinuity collection used in kinematic stability assessment of failed rock slopes" (PDF). International Journal of Rock Mechanics and Mining Sciences. 97: 24–32. Bibcode:2017IJRMM..97...24J. doi:10.1016/j.ijrmms.2017.06.004. hdl: 10045/67528 . ISSN   1365-1609.
  31. Guidi, G.; Beraldin, J.A.; Atzeni, C. (2004). "High accuracy 3D modelling of cultural heritage: The digitizing of Donatello". IEEE Transactions on Image Processing. 13: 370–380.
  32. Kraus, K., 2007. Photogrammetry: Geometry from Image and Laser Scans. Walter de Gruyter, 459 pp. ISBN   978-3-11-019007-6
  33. Brandolini, Filippo; Patrucco, Giacomo (September 2019). "Structure-from-Motion (SFM) Photogrammetry as a Non-Invasive Methodology to Digitalize Historical Documents: A Highly Flexible and Low-Cost Approach?". Heritage. 2 (3): 2124–2136. doi: 10.3390/heritage2030128 . hdl: 2434/666172 .

Further reading