Structure from motion (psychophysics)

In visual perception, structure from motion (SFM) refers to how humans (and other animals) recover the three-dimensional structure of an object from its motion. An important function of the human visual system is to capture the three-dimensional structure of objects using different kinds of visual cues. [1]

SFM is a motion-based visual cue: the motion of a two-dimensional projection is used to recover a three-dimensional object, [2] and this cue works well even in the absence of other depth cues. [3] Psychological, and especially psychophysical, studies have focused on this topic for decades.

Psychophysical studies

Biological motion demonstration: dots representing a person walking.

In a 1953 study of SFM, Wallach and O'Connell tested the kinetic depth effect. They found that the moving shadow image of a rotating three-dimensional object can serve as a cue for recovering the structure of the physical object quite well. [4] Johansson's 1973 study showed that we can perceive a human form walking or dancing simply from the projected motion of several points on the body; [5] this motion pattern was later termed biological motion. [6]

One proposal for how we generate a 3D surface representation of an object is that the visual system detects structure by integrating information over space and time. [7] Other studies agree that SFM is a process with several aspects: [8] the perceived direction of rotation, [9] the perceived orientation of the rotation axis, [9] surface interpolation effects, [10] and object recognition.

Given its complexity, SFM involves high-level visual processing. Studies have shown that area MT, rather than V1 (the primary visual cortex), is directly involved in generating the SFM percept. [8] Neurons in MT also respond to motion parallax and signal depth independently of other depth cues, [11] and MT's representation of three-dimensional structure further confirms the close relationship between area MT and SFM. V1 activity, by contrast, is only indirectly related to SFM perception, through feedback that V1 receives from MT. [8] [12]

Several studies also demonstrate the importance of motion perception to SFM in detecting three-dimensional structure. 3D objects can be perceived from the 2D projections of a moving object on a screen, but not from stationary 2D images. [4] [13] One essential condition for accurate SFM perception is that the contours and lines of the object's projection change simultaneously. [4] A relatively invariant point-lifetime threshold for SFM (50–85 ms) has been found; this threshold is close to the threshold for velocity measurement, which suggests that velocity measurement is involved in SFM processing. [7] Given such a mechanism, the human visual system can derive an accurate SFM model even in the presence of noise. [14]
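The classic laboratory stimulus behind these findings, a flat field of moving dots that is perceived as a rotating 3D shape, can be sketched in a few lines. The cylinder shape, dot count, and rotation speed below are illustrative choices, not parameters from the cited studies:

```python
import numpy as np

def sfm_cylinder_stimulus(n_dots=200, radius=1.0, height=2.0,
                          omega=0.5, t=0.0, rng=None):
    """Orthographic projection of dots on a rotating cylinder.

    Returns (x, y) screen coordinates at time t. Animated over t, the
    flat dot field is perceived as a rotating 3D cylinder; any single
    frame looks like a structureless 2D dot pattern.
    """
    rng = np.random.default_rng(rng)
    # Fixed dot positions on the cylinder surface.
    theta = rng.uniform(0, 2 * np.pi, n_dots)       # angle around the axis
    y = rng.uniform(-height / 2, height / 2, n_dots)  # height along the axis
    # Rotate about the vertical axis, then project orthographically
    # by simply dropping the depth (z) coordinate.
    x = radius * np.cos(theta + omega * t)
    return x, y

x0, y0 = sfm_cylinder_stimulus(t=0.0, rng=0)
x1, y1 = sfm_cylinder_stimulus(t=0.1, rng=0)  # same dots, a moment later
```

Only the horizontal coordinates change between frames; it is this changing 2D motion pattern, not any single frame, that carries the 3D structure.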

Being a complex process, SFM requires more than an orthographic-projection approximation, even though many experiments have used orthographic projections. Studies have found that higher-order visual cues such as acceleration and perspective projection are involved in this process, not just first-order flow (meaning that SFM is partly a top-down process). [15] Combining all orders of visual cues gives the best estimate of 3D objects. [15]
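The difference between the orthographic approximation and full perspective projection can be illustrated with a small sketch; the focal length and viewer distance below are arbitrary illustrative values:

```python
import numpy as np

def project(points, mode="orthographic", focal=2.0, viewer_z=5.0):
    """Project 3D points (N x 3 array) to 2D screen coordinates.

    Orthographic: drop the depth (z) coordinate entirely.
    Perspective: scale x and y by focal / (viewer_z - z), so nearer
    points (larger z) project farther from the image center.
    """
    pts = np.asarray(points, dtype=float)
    if mode == "orthographic":
        return pts[:, :2]
    scale = focal / (viewer_z - pts[:, 2])
    return pts[:, :2] * scale[:, None]

points = np.array([[1.0, 1.0, 1.0],    # near point
                   [1.0, 1.0, -1.0]])  # far point, same x and y
ortho = project(points, "orthographic")
persp = project(points, "perspective")
```

Under orthographic projection the near and far points coincide on screen, so their depth difference is invisible in any single frame; under perspective projection they separate, and as the object moves this depth-dependent scaling produces the higher-order cues that orthographic displays discard.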


References

  1. Whitehead, Bruce A. (July 1981). "James J. Gibson: The ecological approach to visual perception. Boston: Houghton Mifflin, 1979, 332 pp". Behavioral Science. 26 (3): 308–309. doi:10.1002/bs.3830260313. ISSN 0005-7940.
  2. "APA Upgrades APA PsycNET Content Delivery Platform". PsycEXTRA Dataset. 2017. doi:10.1037/e500792018-001. Retrieved 2020-06-16.
  3. Rogers, Brian; Graham, Maureen (April 1979). "Motion Parallax as an Independent Cue for Depth Perception". Perception. 8 (2): 125–134. doi:10.1068/p080125. ISSN 0301-0066. PMID 471676. S2CID 32993507.
  4. "APA PsycNet". psycnet.apa.org. Retrieved 2020-06-28.
  5. Aloimonos, J.; Brown, C. M. (1989-04-01). "On the kinetic depth effect". Biological Cybernetics. 60 (6): 445–455. doi:10.1007/BF00204700. ISSN 1432-0770. PMID 2719982. S2CID 11865192.
  6. Johansson, Gunnar (1973-06-01). "Visual perception of biological motion and a model for its analysis". Perception & Psychophysics. 14 (2): 201–211. doi:10.3758/BF03212378. ISSN 1532-5962.
  7. Treue, Stefan; Husain, Masud; Andersen, Richard A. (1991-01-01). "Human perception of structure from motion". Vision Research. 31 (1): 59–75. doi:10.1016/0042-6989(91)90074-F. ISSN 0042-6989. PMID 2006555. S2CID 45082357.
  8. Grunewald, Alexander; Bradley, David C.; Andersen, Richard A. (2002-07-15). "Neural Correlates of Structure-from-Motion Perception in Macaque V1 and MT". The Journal of Neuroscience. 22 (14): 6195–6207. doi:10.1523/JNEUROSCI.22-14-06195.2002. ISSN 0270-6474. PMC 6757912. PMID 12122078.
  9. Pollick, F. E.; Nishida, S.; Koike, Y.; Kawato, M. (July 1994). "Perceived motion in structure from motion: pointing responses to the axis of rotation". Perception & Psychophysics. 56 (1): 91–109. doi:10.3758/bf03211693. ISSN 0031-5117. PMID 8084735.
  10. Treue, S.; Andersen, R. A.; Ando, H.; Hildreth, E. C. (January 1995). "Structure-from-motion: perceptual evidence for surface interpolation". Vision Research. 35 (1): 139–148. doi:10.1016/0042-6989(94)e0069-w. ISSN 0042-6989. PMID 7839603. S2CID 2057062.
  11. Nadler, Jacob W.; Angelaki, Dora E.; DeAngelis, Gregory C. (2008-04-03). "A neural representation of depth from motion parallax in macaque visual cortex". Nature. 452 (7187): 642–645. Bibcode:2008Natur.452..642N. doi:10.1038/nature06814. ISSN 1476-4687. PMC 2422877. PMID 18344979.
  12. Maunsell, JH; van Essen, DC (1983-12-01). "The connections of the middle temporal visual area (MT) and their relationship to a cortical hierarchy in the macaque monkey". The Journal of Neuroscience. 3 (12): 2563–2586. doi:10.1523/jneurosci.03-12-02563.1983. ISSN 0270-6474. PMC 6564662. PMID 6655500.
  13. Vuong, Quoc C; Friedman, Alinda; Plante, Courtney (January 2009). "Modulation of Viewpoint Effects in Object Recognition by Shape and Motion Cues". Perception. 38 (11): 1628–1648. doi:10.1068/p6430. ISSN 0301-0066. PMID 20120262. S2CID 15584115.
  14. Hildreth, Ellen C.; Grzywacz, Norberto M.; Adelson, Edward H.; Inada, Victor K. (1990-01-01). "The perceptual buildup of three-dimensional structure from motion". Perception & Psychophysics. 48 (1): 19–36. doi:10.3758/BF03205008. hdl:1721.1/6512. ISSN 1532-5962. PMID 2377437.
  15. "APA PsycNet". psycnet.apa.org. Retrieved 2020-06-21.