Structure from motion


Structure from motion (SfM) [1] is a photogrammetric range imaging technique for estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals. It is studied in the fields of computer vision and visual perception.


Principle

Digital surface model of motorway interchange construction site
Real photo x SfM with texture color x SfM with simple shader. Made with Python Photogrammetry Toolbox GUI and rendered in Blender with Cycles.
Bezmiechowa airfield 3D digital surface model extracted from data collected during a 30-minute flight of the Pteryx UAV

Humans perceive a great deal of information about the three-dimensional structure of their environment by moving around it. When the observer moves, objects around them appear to move by different amounts depending on their distance from the observer. This is known as motion parallax, and the depth information it provides can be used to generate an accurate 3D representation of the surrounding world. [2]
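
The relation between distance and apparent motion can be made concrete with a small sketch. It assumes an idealized pinhole camera that translates sideways, parallel to the image plane; the function names and the numbers are illustrative and do not come from the cited sources.

```python
def parallax_shift(focal_px, baseline_m, depth_m):
    """Image shift (pixels) of a static point at depth_m when a pinhole
    camera translates sideways by baseline_m (assumed idealized model)."""
    return focal_px * baseline_m / depth_m


def depth_from_shift(focal_px, baseline_m, shift_px):
    """Invert the relation above: recover depth from the observed shift."""
    return focal_px * baseline_m / shift_px


# A near object (2 m) shifts far more than a distant one (50 m)
# for the same 0.5 m sideways camera motion.
near = parallax_shift(focal_px=1000, baseline_m=0.5, depth_m=2)
far = parallax_shift(focal_px=1000, baseline_m=0.5, depth_m=50)
```

Because the shift is inversely proportional to depth, distant points barely move, which is why depth estimates for far objects are more sensitive to measurement noise.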

Finding structure from motion presents a similar problem to finding structure from stereo vision. In both instances, the correspondence between images must be found and the 3D object reconstructed.

To find correspondence between images, features such as corner points (edges with gradients in multiple directions) are tracked from one image to the next. One of the most widely used feature detectors is the scale-invariant feature transform (SIFT). It uses the maxima from a difference-of-Gaussians (DoG) pyramid as features. To compute a descriptor, SIFT first finds the dominant gradient direction; the descriptor is then rotated to fit this orientation, which makes it rotation-invariant. [3] Another common feature detector is SURF (speeded-up robust features). [4] In SURF, the DoG is replaced with a Hessian matrix-based blob detector. Also, instead of evaluating gradient histograms, SURF computes the sums of gradient components and the sums of their absolute values. [5] Its use of integral images allows features to be detected extremely quickly with a high detection rate. [6] Compared with SIFT, SURF is therefore a faster feature detector, at the cost of less accurate feature positions. [5] Another type of feature recently made practical for structure from motion is the general curve (e.g., locally an edge with gradients in one direction), part of a technology known as pointless SfM, [7] [8] which is useful when point features are insufficient, as is common in man-made environments. [9]
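
The speed advantage attributed to integral images can be illustrated with a minimal sketch. It shows only the summed-area table and a constant-time box sum, not SURF itself; the function names are illustrative.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros.
    ii[y][x] holds the sum of all pixels above and to the left of (y, x)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the pixels in rows top..bottom-1 and columns left..right-1,
    using four table lookups regardless of the box size."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
lower_right = box_sum(table, 1, 1, 3, 3)  # 5 + 6 + 8 + 9 = 28
```

Once the table is built, any box filter response costs four lookups, which is what makes evaluating box-shaped approximations of blob filters at many scales cheap.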

The features detected in all the images are then matched. One of the matching algorithms that tracks features from one image to another is the Lucas–Kanade tracker. [10]
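
As a rough illustration of the idea behind Lucas–Kanade tracking, the sketch below estimates the shift of a single window between two synthetic images by solving the 2x2 least-squares system built from image gradients. It is a single, non-iterative, non-pyramidal step under a small-motion assumption; the synthetic image, window size and function names are assumptions made for this example.

```python
import math


def make_image(shift_x=0.0, shift_y=0.0, size=32):
    """Smooth synthetic image whose pattern is displaced by (shift_x, shift_y)."""
    return [[math.sin(0.3 * (x - shift_x)) + math.sin(0.25 * (y - shift_y))
             for x in range(size)] for y in range(size)]


def lucas_kanade_step(img1, img2, cx, cy, half=10):
    """Estimate the (u, v) displacement of the window centred at (cx, cy)."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (img1[y][x + 1] - img1[y][x - 1]) / 2.0  # gradient in x
            iy = (img1[y + 1][x] - img1[y - 1][x]) / 2.0  # gradient in y
            it = img2[y][x] - img1[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window lacks texture in two directions")
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt]
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


frame1 = make_image()
frame2 = make_image(shift_x=0.2, shift_y=0.1)
u, v = lucas_kanade_step(frame1, frame2, cx=16, cy=16)
```

The determinant check reflects why corner-like features are preferred: a window with gradients in only one direction makes the system singular, so its motion cannot be recovered uniquely.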

Some of the matched features are inevitably matched incorrectly, so the matches should also be filtered. RANSAC (random sample consensus) is the algorithm usually used to remove outlier correspondences. In the paper by Fischler and Bolles, RANSAC is used to solve the location determination problem (LDP), where the objective is to determine the point in space from which an image was obtained, given a set of landmarks with known locations that appear in the image. [11]
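
The filtering step can be sketched with a deliberately simplified model. The example below assumes that correct matches share a single 2D translation, so one correspondence is enough to form a hypothesis; real SfM pipelines typically fit a fundamental matrix, essential matrix or homography instead. The data, threshold and function name are illustrative.

```python
import random


def ransac_translation(matches, threshold=1.0, iterations=100, seed=0):
    """matches: list of ((x1, y1), (x2, y2)) point pairs.
    Returns (shift, inlier_indices) for the best-supported translation."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: one correspondence defines a candidate translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Consensus set: matches whose displacement agrees with the candidate.
        inliers = [i for i, ((ax, ay), (bx, by)) in enumerate(matches)
                   if (bx - ax - dx) ** 2 + (by - ay - dy) ** 2 <= threshold ** 2]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the model on all inliers (here: the mean displacement).
    n = len(best_inliers)
    mean_dx = sum(matches[i][1][0] - matches[i][0][0] for i in best_inliers) / n
    mean_dy = sum(matches[i][1][1] - matches[i][0][1] for i in best_inliers) / n
    return (mean_dx, mean_dy), best_inliers


points = [(0, 0), (10, 3), (4, 7), (8, 1), (2, 9), (6, 5), (12, 11), (3, 14)]
good = [((x, y), (x + 5, y - 2)) for x, y in points]      # true shift (5, -2)
bad = [((1, 1), (20, 30)), ((5, 5), (-7, 2)), ((9, 2), (9, 40))]  # mismatches
shift, inliers = ransac_translation(good + bad)
```

The same sample, score and refit loop carries over to richer models; only the minimal sample size and the residual function change.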

The feature trajectories over time are then used to reconstruct their 3D positions and the camera's motion. [12] An alternative is given by so-called direct approaches, where geometric information (3D structure and camera motion) is directly estimated from the images, without intermediate abstraction to features or corners. [13]
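
To illustrate how a 3D position can follow from matched features once the camera poses are known, the sketch below triangulates one point with the midpoint method: it finds the closest points on two viewing rays and averages them. It assumes axis-aligned pinhole cameras with the principal point at the image origin, which is a simplification; full pipelines estimate poses and structure jointly. All names are illustrative.

```python
def pixel_to_ray(u, v, focal_px):
    """Viewing-ray direction for pixel (u, v), assuming an axis-aligned
    pinhole camera with the principal point at the image origin."""
    return (u / focal_px, v / focal_px, 1.0)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be recovered")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((p + q) / 2.0 for p, q in zip(p1, p2))


# The same scene point seen from two camera centres 1 m apart.
ray1 = pixel_to_ray(100, 200, focal_px=1000)   # camera at the origin
ray2 = pixel_to_ray(0, 200, focal_px=1000)     # camera at (1, 0, 0)
point = triangulate_midpoint((0.0, 0.0, 0.0), ray1, (1.0, 0.0, 0.0), ray2)
```

With noisy feature positions the two rays no longer intersect, which is why the midpoint (or a reprojection-error minimization) is used rather than an exact intersection.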

There are several approaches to structure from motion. In incremental SfM, [14] camera poses are solved for and added one by one to the collection. In global SfM, [15] [16] the poses of all cameras are solved for at the same time. A somewhat intermediate approach is out-of-core SfM, where several partial reconstructions are computed that are then integrated into a global solution.

Applications

Geosciences

Structure-from-motion photogrammetry with multi-view stereo provides hyperscale landform models using images acquired from a range of digital cameras and, optionally, a network of ground control points. The technique is not limited in temporal frequency and can provide point cloud data comparable in density and accuracy to those generated by terrestrial and airborne laser scanning at a fraction of the cost. [17] [18] [19] Structure from motion is also useful in remote or rugged environments where terrestrial laser scanning is limited by equipment portability and airborne laser scanning is limited by terrain roughness, which causes loss of data and image foreshortening. The technique has been applied in many settings, such as rivers, [20] badlands, [21] sandy coastlines, [22] [23] fault zones, [24] landslides, [25] [26] and coral reefs. [27] SfM has also been successfully applied to the assessment of changes [28] and of large wood accumulation volume [29] and porosity [30] in fluvial systems, to the characterization of rock masses through the determination of properties such as the orientation and persistence of discontinuities, [31] [32] and to the evaluation of the stability of rock cut slopes. [33] A full range of digital cameras can be used, including digital SLRs, compact digital cameras and even smartphones. Generally, though, higher-accuracy data are achieved with more expensive cameras, which include lenses of higher optical quality. The technique therefore offers opportunities to characterize surface topography in unprecedented detail and, with multi-temporal data, to detect elevation, position and volumetric changes that are symptomatic of earth surface processes. Structure from motion can be placed in the context of other digital surveying methods.

Cultural heritage

Cultural heritage is present everywhere. Its structural control, documentation and conservation are among humanity's main duties (UNESCO). From this point of view, SfM is used to properly assess situations and to estimate the effort and cost of planning, maintenance, control and restoration. Site accessibility is often seriously constrained, and invasive surveying pillars often cannot be installed, which rules out traditional surveying routines (such as total stations). SfM provides a non-invasive approach that requires no direct interaction between the structure and any operator. Its accuracy is sufficient where only qualitative considerations are needed, and it is fast enough to respond to the monument's immediate management needs. [34] The first operational phase is careful preparation of the photogrammetric survey, which establishes the relation between the best distance from the object, the focal length, the ground sampling distance (GSD) and the sensor's resolution. With this information, the planned photographic acquisitions must be made with a vertical overlap of at least 60% (figure 02). [35]
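
The relation between distance, focal length, sensor resolution and GSD mentioned above can be sketched with the standard pinhole approximation, in which the GSD equals the pixel pitch scaled by the ratio of object distance to focal length. The camera values below are hypothetical and only serve to show the arithmetic.

```python
def gsd_mm(sensor_width_mm, image_width_px, focal_mm, distance_m):
    """Ground sampling distance in mm per pixel (pinhole approximation)."""
    pixel_pitch_mm = sensor_width_mm / image_width_px
    return pixel_pitch_mm * (distance_m * 1000.0) / focal_mm


def station_spacing_m(gsd_mm_per_px, image_extent_px, overlap):
    """Distance between consecutive camera positions that yields the
    requested overlap (0..1) along the given image dimension."""
    footprint_m = gsd_mm_per_px * image_extent_px / 1000.0
    return footprint_m * (1.0 - overlap)


# Hypothetical camera: 24 mm wide sensor, 6000 x 4000 px, 50 mm lens, 10 m away.
gsd = gsd_mm(sensor_width_mm=24.0, image_width_px=6000,
             focal_mm=50.0, distance_m=10.0)
spacing = station_spacing_m(gsd, image_extent_px=4000, overlap=0.6)
```

Halving the distance or doubling the focal length halves the GSD, and also shrinks the footprint, so more images are needed to keep the same overlap.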

Furthermore, structure-from-motion photogrammetry represents a non-invasive, highly flexible and low-cost methodology to digitalize historical documents. [36]


References

  1. S. Ullman (1979). "The interpretation of structure from motion" (PDF). Proceedings of the Royal Society of London. 203 (1153): 405–426. Bibcode:1979RSPSB.203..405U. doi:10.1098/rspb.1979.0006. hdl:1721.1/6298. PMID 34162. S2CID 11995230.
  2. Linda G. Shapiro; George C. Stockman (2001). Computer Vision. Prentice Hall. ISBN 978-0-13-030796-5.
  3. D. G. Lowe (2004). "Distinctive image features from scale-invariant keypoints". International Journal of Computer Vision. 60 (2): 91–110. CiteSeerX 10.1.1.73.2924. doi:10.1023/b:visi.0000029664.99615.94. S2CID 221242327.
  4. H. Bay; T. Tuytelaars & L. Van Gool (2006). "Surf: Speeded up robust features". 9th European Conference on Computer Vision.
  5. K. Häming & G. Peters (2010). "The structure-from-motion reconstruction pipeline – a survey with focus on short image sequences". Kybernetika. 46 (5): 926–937.
  6. Viola, P.; Jones, M. (2001). "Rapid object detection using a boosted cascade of simple features". Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001. Vol. 1. Kauai, HI, USA: IEEE Comput. Soc. pp. I-511–I-518. doi:10.1109/CVPR.2001.990517. ISBN 978-0-7695-1272-3. S2CID 2715202.
  7. Nurutdinova, Irina; Fitzgibbon, Andrew (2015). "Towards Pointless Structure from Motion: 3D Reconstruction and Camera Parameters from General 3D Curves" (PDF). 2015 IEEE International Conference on Computer Vision (ICCV). pp. 2363–2371. doi:10.1109/ICCV.2015.272. ISBN 978-1-4673-8391-2. S2CID 9120123.
  8. Fabbri, Ricardo; Giblin, Peter; Kimia, Benjamin (2012). "Camera Pose Estimation Using First-Order Curve Differential Geometry". Computer Vision – ECCV 2012 (PDF). Lecture Notes in Computer Science. Vol. 7575. pp. 231–244. doi:10.1007/978-3-642-33765-9_17. ISBN 978-3-642-33764-2. S2CID 15402824.
  9. Apple, ARKit team (2018). "Understanding ARKit Tracking and Detection". WWDC.
  10. B. D. Lucas & T. Kanade (1981). "An iterative image registration technique with an application to stereo vision". IJCAI-81.
  11. M. A. Fischler & R. C. Bolles (1981). "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography". Commun. ACM. 24 (6): 381–395. doi:10.1145/358669.358692. S2CID 972888.
  12. F. Dellaert; S. Seitz; C. Thorpe & S. Thrun (2000). "Structure from Motion without Correspondence" (PDF). IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
  13. Engel, Jakob; Schöps, Thomas; Cremers, Daniel (2014). "LSD-SLAM: Large-Scale Direct Monocular SLAM". European Conference on Computer Vision (ECCV) 2014 (PDF).
  14. J.L. Schönberger & J.M. Frahm (2016). "Structure-from-Motion Revisited" (PDF). IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
  15. C. Tomasi & T. Kanade (1992). "Shape and motion from image streams under orthography: a factorization method". International Journal of Computer Vision. 9 (2): 137–154. CiteSeerX 10.1.1.131.9807. doi:10.1007/BF00129684. S2CID 2931825.
  16. V.M. Govindu (2001). "Combining two-view constraints for motion estimation". Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001. Vol. 2. pp. II-218–II-225. doi:10.1109/CVPR.2001.990963. ISBN 0-7695-1272-0. S2CID 8252027.
  17. Westoby, M. J.; Brasington, J.; Glasser, N. F.; Hambrey, M. J.; Reynolds, J. M. (2012-12-15). "'Structure-from-Motion' photogrammetry: A low-cost, effective tool for geoscience applications". Geomorphology. 179: 300–314. Bibcode:2012Geomo.179..300W. doi:10.1016/j.geomorph.2012.08.021. hdl:2160/11389. S2CID 33695861.
  18. James, M. R.; Robson, S. (2012-09-01). "Straightforward reconstruction of 3D surfaces and topography with a camera: Accuracy and geoscience application" (PDF). Journal of Geophysical Research: Earth Surface. 117 (F3): F03017. Bibcode:2012JGRF..117.3017J. doi:10.1029/2011jf002289. ISSN 2156-2202.
  19. Fonstad, Mark A.; Dietrich, James T.; Courville, Brittany C.; Jensen, Jennifer L.; Carbonneau, Patrice E. (2013-03-30). "Topographic structure from motion: a new development in photogrammetric measurement" (PDF). Earth Surface Processes and Landforms. 38 (4): 421–430. Bibcode:2013ESPL...38..421F. doi:10.1002/esp.3366. ISSN 1096-9837. S2CID 15601931.
  20. Javernick, L.; Brasington, J.; Caruso, B. (2014). "Modeling the topography of shallow braided rivers using Structure-from-Motion photogrammetry". Geomorphology. 213: 166–182. Bibcode:2014Geomo.213..166J. doi:10.1016/j.geomorph.2014.01.006.
  21. Smith, Mark William; Vericat, Damià (2015-09-30). "From experimental plots to experimental landscapes: topography, erosion and deposition in sub-humid badlands from Structure-from-Motion photogrammetry" (PDF). Earth Surface Processes and Landforms. 40 (12): 1656–1671. Bibcode:2015ESPL...40.1656S. doi:10.1002/esp.3747. ISSN 1096-9837. S2CID 128402144.
  22. Goldstein, Evan B; Oliver, Amber R; deVries, Elsemarie; Moore, Laura J; Jass, Theo (2015-10-22). "Ground control point requirements for structure-from-motion derived topography in low-slope coastal environments". PeerJ PrePrints. doi:10.7287/peerj.preprints.1444v1. ISSN 2167-9843.
  23. Mancini, Francesco; Dubbini, Marco; Gattelli, Mario; Stecchi, Francesco; Fabbri, Stefano; Gabbianelli, Giovanni (2013-12-09). "Using Unmanned Aerial Vehicles (UAV) for High-Resolution Reconstruction of Topography: The Structure from Motion Approach on Coastal Environments". Remote Sensing. 5 (12): 6880–6898. Bibcode:2013RemS....5.6880M. doi:10.3390/rs5126880. hdl:11380/1055514.
  24. Johnson, Kendra; Nissen, Edwin; Saripalli, Srikanth; Arrowsmith, J. Ramón; McGarey, Patrick; Scharer, Katherine; Williams, Patrick; Blisniuk, Kimberly (2014-10-01). "Rapid mapping of ultrafine fault zone topography with structure from motion". Geosphere. 10 (5): 969–986. Bibcode:2014Geosp..10..969J. doi:10.1130/GES01017.1.
  25. Del Soldato, M.; Riquelme, A.; Bianchini, S.; Tomàs, R.; Di Martire, D.; De Vita, P.; Moretti, S.; Calcaterra, D. (2018-06-06). "Multisource data integration to investigate one century of evolution for the Agnone landslide (Molise, southern Italy)". Landslides. 15 (11): 2113–2128. Bibcode:2018Lands..15.2113D. doi:10.1007/s10346-018-1015-z. hdl:2158/1131012. ISSN 1612-510X.
  26. Tomás, Roberto; Pinheiro, Marisa; Pinto, Pedro; Pereira, Eduardo; Miranda, Tiago (August 2023). "Preliminary analysis of the mechanisms, characteristics, and causes of a recent catastrophic structurally controlled rock planar slide in Esposende (northern Portugal)". Landslides. 20 (8): 1657–1665. Bibcode:2023Lands..20.1657T. doi:10.1007/s10346-023-02082-y. ISSN 1612-510X.
  27. Bryson, Mitch; Duce, Stephanie; Harris, Dan; Webster, Jody M.; Thompson, Alisha; Vila-Concejo, Ana; Williams, Stefan B. (2016). "Geomorphic changes of a coral shingle cay measured using Kite Aerial Photography". Geomorphology. 270: 1–8. Bibcode:2016Geomo.270....1B. doi:10.1016/j.geomorph.2016.06.018.
  28. Conesa-García, Carmelo; Puig-Mengual, Carlos; Riquelme, Adrián; Tomás, Roberto; Martínez-Capel, Francisco; García-Lorenzo, Rafael; Pastor, José L.; Pérez-Cutillas, Pedro; Martínez-Salvador, Alberto; Cano-Gonzalez, Miguel (2022-02-01). "Changes in stream power and morphological adjustments at the event-scale and high spatial resolution along an ephemeral gravel-bed channel". Geomorphology. 398: 108053. Bibcode:2022Geomo.39808053C. doi:10.1016/j.geomorph.2021.108053. hdl:10251/190056. ISSN 0169-555X.
  29. Spreitzer, Gabriel; Tunnicliffe, Jon; Friedrich, Heide (2019-12-01). "Using Structure from Motion photogrammetry to assess large wood (LW) accumulations in the field". Geomorphology. 346: 106851. Bibcode:2019Geomo.34606851S. doi:10.1016/j.geomorph.2019.106851. S2CID 202908775.
  30. Spreitzer, Gabriel; Tunnicliffe, Jon; Friedrich, Heide (2020). "Large wood (LW) 3D accumulation mapping and assessment using structure from Motion photogrammetry in the laboratory". Journal of Hydrology. 581: 124430. Bibcode:2020JHyd..58124430S. doi:10.1016/j.jhydrol.2019.124430. S2CID 209465940.
  31. Riquelme, A.; Cano, M.; Tomás, R.; Abellán, A. (2017-01-01). "Identification of Rock Slope Discontinuity Sets from Laser Scanner and Photogrammetric Point Clouds: A Comparative Analysis". Procedia Engineering. 191: 838–845. doi:10.1016/j.proeng.2017.05.251. hdl:10045/67538. ISSN 1877-7058.
  32. Jordá Bordehore, Luis; Riquelme, Adrian; Cano, Miguel; Tomás, Roberto (2017-09-01). "Comparing manual and remote sensing field discontinuity collection used in kinematic stability assessment of failed rock slopes" (PDF). International Journal of Rock Mechanics and Mining Sciences. 97: 24–32. Bibcode:2017IJRMM..97...24J. doi:10.1016/j.ijrmms.2017.06.004. hdl:10045/67528. ISSN 1365-1609.
  33. Tomás, R.; Riquelme, A.; Cano, M.; Pastor, J. L.; Pagán, J. I.; Asensio, J. L.; Ruffo, M. (2020-06-23). "Evaluación de la estabilidad de taludes rocosos a partir de nubes de puntos 3D obtenidas con un vehículo aéreo no tripulado". Revista de Teledetección (55): 1. doi:10.4995/raet.2020.13168. ISSN 1988-8740.
  34. Guidi, G.; Beraldin, J.A.; Atzeni, C. (2004). "High accuracy 3D modelling of cultural heritage: The digitizing of Donatello". IEEE Trans. Image Process. 13: 370–380.
  35. Kraus, K. (2007). Photogrammetry: Geometry from Image and Laser Scans. Walter de Gruyter. 459 pp. ISBN 978-3-11-019007-6.
  36. Brandolini, Filippo; Patrucco, Giacomo (September 2019). "Structure-from-Motion (SFM) Photogrammetry as a Non-Invasive Methodology to Digitalize Historical Documents: A Highly Flexible and Low-Cost Approach?". Heritage. 2 (3): 2124–2136. doi:10.3390/heritage2030128. hdl:2434/666172.
