Geon (psychology)

Geons are the simple 2D or 3D forms, such as cylinders, bricks, wedges, cones, circles and rectangles, that correspond to the simple parts of an object in Biederman's recognition-by-components theory. [1] The theory proposes that visual input is matched against structural representations of objects in the brain. These structural representations consist of geons and the relations among them (e.g., an ice cream cone could be broken down into a sphere located above a cone). Only a modest number of geons (< 40) are assumed. When geons are combined in different relations to each other (e.g., on-top-of, larger-than, end-to-end, end-to-middle) and allowed coarse metric variation, such as in aspect ratio and 2D orientation, billions of possible 2- and 3-geon objects can be generated. Two classes of shape-based visual identification are not done through geon representations: (a) distinguishing between similar faces, and (b) classifying objects that lack definite boundaries, such as bushes or a crumpled garment. Typically, such identifications are not viewpoint-invariant.
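
The idea of a structural representation can be made concrete with a small sketch. The code below is only an illustration, not part of the theory's formal statement: the class names, the `match` function, and the second stored object are hypothetical, and exact set equality stands in for whatever matching process the visual system actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A qualitative relation between two geons, e.g. 'on-top-of'."""
    name: str
    subject: str    # the geon the relation is stated about
    reference: str  # the geon it is related to


@dataclass(frozen=True)
class StructuralDescription:
    """An object described as a set of geons plus the relations among them."""
    label: str
    geons: frozenset
    relations: frozenset


# The ice cream cone example from the text: a sphere located above a cone.
ICE_CREAM_CONE = StructuralDescription(
    label="ice cream cone",
    geons=frozenset({"sphere", "cone"}),
    relations=frozenset({Relation("on-top-of", "sphere", "cone")}),
)

# Hypothetical second object: same geons, opposite relation.
CONE_ON_SPHERE = StructuralDescription(
    label="cone resting on a sphere",
    geons=frozenset({"sphere", "cone"}),
    relations=frozenset({Relation("on-top-of", "cone", "sphere")}),
)


def match(geons, relations, memory):
    """Return labels of stored descriptions with the same geons and relations."""
    geons, relations = frozenset(geons), frozenset(relations)
    return [d.label for d in memory
            if d.geons == geons and d.relations == relations]


memory = [ICE_CREAM_CONE, CONE_ON_SPHERE]
found = match({"sphere", "cone"},
              {Relation("on-top-of", "sphere", "cone")},
              memory)
print(found)  # ['ice cream cone']
```

The two stored objects share the same geons and differ only in the relation, which is why the description has to include relations and not just a list of parts.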

Properties of geons

Figure: Two cases of two interrelated geons. What does the reader imagine in each case?

Geons have four essential properties:

  1. View-invariance: Each geon can be distinguished from the others from almost any viewpoint, except for “accidents” at highly restricted angles in which one geon projects an image that could be a different geon, as, for example, when an end-on view of a cylinder could be mistaken for a sphere or circle. Objects represented as an arrangement of geons would, similarly, be viewpoint invariant.
  2. Stability or resistance to visual noise: Because the geons are simple, they are readily supported by the Gestalt property of smooth continuation, which makes their identification robust to partial occlusion and degradation by visual noise, as, for example, when a cylinder is viewed behind a bush.
  3. Invariance to illumination direction, surface markings, and texture.
  4. High distinctiveness: The geons differ qualitatively, with only two or three levels of an attribute, such as straight vs. curved, parallel vs. nonparallel, or positive vs. negative curvature. These qualitative differences are readily distinguished, which makes the geons, and the objects composed of them, readily distinguishable.

Derivation of invariant properties of geons

Viewpoint invariance: The viewpoint invariance of geons derives from their being distinguished by three nonaccidental properties (NAPs) of contours that do not change with orientation in depth:

  1. Whether the contour is straight or curved,
  2. The type of vertex formed when two or three contours coterminate in the image (that is, end together at the same point): an L (2 contours), a fork (3 contours with all angles < 180°), or an arrow (3 contours with one angle > 180°), and
  3. Whether a pair of contours is parallel or not (with allowance for perspective). When not parallel, the contours can be straight (converging or diverging) or curved, with positive or negative curvature forming a convex or concave envelope, respectively (see Figure below).
Figure: Geon2.png

NAPs can be distinguished from metric properties (MPs), such as the degree of non-zero curvature of a contour or its length, which do vary with changes in orientation in depth.
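
The vertex types listed above can be told apart from the image directions of the contours that meet at a point. The following is a minimal sketch under stated assumptions: the function name `classify_vertex` and its input format (a list of directions in degrees) are hypothetical, and cases the list does not cover, such as an angle of exactly 180°, are returned as "unclassified".

```python
def classify_vertex(directions_deg):
    """Classify a vertex from the image directions (in degrees) of the
    contours that leave it.

    Returns "L" for two contours, "fork" for three contours with every
    angle between neighbours below 180 degrees, and "arrow" for three
    contours with one angle above 180 degrees.
    """
    angles = sorted(d % 360.0 for d in directions_deg)
    n = len(angles)
    if n == 2:
        return "L"
    if n != 3:
        return "unclassified"
    # Angular gaps between neighbouring contours, going once around the
    # vertex; the three gaps always sum to 360 degrees.
    gaps = [(angles[(i + 1) % n] - angles[i]) % 360.0 for i in range(n)]
    largest = max(gaps)
    if abs(largest - 180.0) < 1e-9:
        return "unclassified"  # exactly 180 degrees is not in the list above
    return "arrow" if largest > 180.0 else "fork"


print(classify_vertex([0, 90]))          # L
print(classify_vertex([90, 210, 330]))   # fork
print(classify_vertex([60, 90, 120]))    # arrow
```

Only the ordering of the contours around the vertex and the comparison with 180° matter here; the exact angle values, which are metric properties, do not affect the label as long as they stay on the same side of 180°.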

Invariance to lighting direction and surface characteristics

Geons can be determined from the contours that mark the edges at orientation and depth discontinuities of an image of an object, i.e., the contours that specify a good line drawing of the object’s shape or volume. Orientation discontinuities define those edges where there is a sharp change in the orientation of the normal to the surface of a volume, as occurs at the contour at the boundaries of the different sides of a brick. A depth discontinuity is where the observer’s line of sight jumps from the surface of an object to the background (i.e., is tangent to the surface), as occurs at the sides of a cylinder. The same contour might mark both an orientation and depth discontinuity, as with the back edge of a brick. Because the geons are based on these discontinuities, they are invariant to variations in the direction of lighting, shadows, and surface texture and markings.
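
As a rough illustration of the two kinds of discontinuity, the sketch below labels samples along a single scan line of depth values. Everything in it is an assumption made for the example: the function name, the one-dimensional depth profile, and the thresholds `jump` and `bend` are hypothetical, and a sharp change in slope is used as a stand-in for a sharp change in the surface normal.

```python
def label_discontinuities(depth, jump=1.0, bend=0.5):
    """Label interior samples of a 1D depth profile.

    depth: distances from the viewer sampled along one scan line.
    jump:  hypothetical threshold on the depth change between neighbours;
           larger changes count as a depth discontinuity.
    bend:  hypothetical threshold on the change in slope; larger changes
           (without a depth jump) count as an orientation discontinuity.
    """
    labels = {}
    for i in range(1, len(depth) - 1):
        left = depth[i] - depth[i - 1]
        right = depth[i + 1] - depth[i]
        if abs(left) > jump or abs(right) > jump:
            labels[i] = "depth"
        elif abs(right - left) > bend:
            labels[i] = "orientation"
    return labels


# Background at depth 10; an object with two flat faces meeting at index 4.
profile = [10, 10, 5, 4, 3, 4, 5, 10, 10]
labels = label_discontinuities(profile)
print(labels)
# {1: 'depth', 2: 'depth', 4: 'orientation', 6: 'depth', 7: 'depth'}
```

The function takes only geometry as input, with no intensity values, which mirrors the point of the paragraph: edges defined this way do not depend on lighting direction, shadows, or surface markings.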

Geons and generalized cones

The geons constitute a partition of the set of generalized cones, [2] which are the volumes created when a cross section is swept along an axis. For example, a circle swept along a straight axis would define a cylinder (see Figure). A rectangle swept along a straight axis would define a "brick" (see Figure). Four dimensions with contrastive values (i.e., mutually exclusive values) define the current set of geons (see Figure):

  1. Shape of cross section: round vs. straight. For example, as stated above, a rectangle swept along a straight axis would define a "brick" and the cross section would be straight.
  2. Axis: straight vs. curved.
  3. Size of cross-section as it is swept along an axis: constant vs. expanding (or contracting) vs. expanding then contracting vs. contracting then expanding. The cross section size of a "brick" would be constant.
  4. Termination of geon with constant-sized cross-sections: truncated vs. converging to a point vs. rounded.

These variations in how geons are generated create shapes that differ in NAPs.
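
The contrastive dimensions above can be written down as a small data structure. This is a sketch only: the class and function names are hypothetical, the termination dimension is left optional and is not enumerated, and the number of combinations the code produces is not a claim about the size of the geon set.

```python
from dataclasses import dataclass, fields
from enum import Enum
from itertools import product
from typing import Optional


class CrossSection(Enum):
    ROUND = "round"
    STRAIGHT = "straight"


class Axis(Enum):
    STRAIGHT = "straight"
    CURVED = "curved"


class Size(Enum):
    CONSTANT = "constant"
    EXPANDING = "expanding (or contracting)"
    EXPANDING_THEN_CONTRACTING = "expanding then contracting"
    CONTRACTING_THEN_EXPANDING = "contracting then expanding"


class Termination(Enum):
    TRUNCATED = "truncated"
    POINT = "converging to a point"
    ROUNDED = "rounded"


@dataclass(frozen=True)
class GeonClass:
    """One cell of the partition, identified by its contrastive values."""
    cross_section: CrossSection
    axis: Axis
    size: Size
    termination: Optional[Termination] = None  # left unset in this sketch


# The two examples given in the text.
CYLINDER = GeonClass(CrossSection.ROUND, Axis.STRAIGHT, Size.CONSTANT)
BRICK = GeonClass(CrossSection.STRAIGHT, Axis.STRAIGHT, Size.CONSTANT)


def differing_dimensions(a, b):
    """Names of the dimensions on which two geon classes differ."""
    return [f.name for f in fields(a)
            if getattr(a, f.name) != getattr(b, f.name)]


# Every combination of the first three dimensions (termination omitted).
combinations = [GeonClass(c, a, s)
                for c, a, s in product(CrossSection, Axis, Size)]

print(differing_dimensions(CYLINDER, BRICK))  # ['cross_section']
```

Because each dimension takes one of a few mutually exclusive values, two geon classes always differ in at least one qualitative attribute, as the cylinder and the brick do in the shape of their cross section.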

Experimental tests of the viewpoint invariance of geons

There is now considerable support for the major assumptions of geon theory (see Recognition-by-components theory). One issue that generated some discussion was the finding [3] that geons are viewpoint invariant: recognizing or matching a geon at an orientation in depth not previously experienced incurs little or no cost in speed or accuracy. Some studies [4] reported modest costs in matching geons at new orientations in depth, but these studies had several methodological shortcomings. [5] [6]

Research on geons

Geons and how they are interpreted have been the subject of considerable research. Kim Kirkpatrick-Steger, Edward A. Wasserman and Irving Biederman found that both the individual geons and their spatial composition are important in recognition. [7] Furthermore, the findings of this research seem to indicate that sensitivity to nonaccidental properties can be found in all shape-discriminating species. [8]

Notes

  1. Biederman, Irving (1987). "Recognition-by-components: A theory of human image understanding" (PDF). Psychological Review. 94 (2): 115–47. doi:10.1037/0033-295X.94.2.115. PMID 3575582.
  2. Nevatia, R. (1982). Machine Perception. Prentice-Hall. [page needed]
  3. Biederman, Irving; Gerhardstein, Peter C. (1993). "Recognizing depth-rotated objects: Evidence and conditions for three-dimensional viewpoint invariance" (PDF). Journal of Experimental Psychology: Human Perception and Performance. 19 (6): 1162–82. doi:10.1037/0096-1523.19.6.1162. PMID 8294886.
  4. Tarr, Michael J.; Williams, Pepper; Hayward, William G.; Gauthier, Isabel (1998). "Three-dimensional object recognition is viewpoint dependent". Nature Neuroscience. 1 (4): 275–7. doi:10.1038/1089. PMID 10195159. S2CID 14389169.
  5. Biederman, I; Bar, M (1999). "One-shot viewpoint invariance in matching novel objects". Vision Research. 39 (17): 2885–99. doi:10.1016/S0042-6989(98)00309-5. PMID 10492817. S2CID 2494577.
  6. Dill, Marcus; Edelman, Shimon (2001). "Imperfect invariance to object translation in the discrimination of complex shapes". Perception. 30 (6): 707–24. doi:10.1068/p2953. PMID 11464559. S2CID 12607120.
  7. Biederman, Irving; Kirkpatrick-Steger, Kim; Wasserman, Edward (1998). "Effects of Geon Deletion, Scrambling, and Movement on Picture Recognition in Pigeons". Journal of Experimental Psychology: Animal Behavior Processes. 24 (1): 34–46. doi:10.1037/0097-7403.24.1.34. PMID 9438964.
  8. Biederman, Irving; Kirkpatrick-Steger, Kim; Wasserman, Edward (1998). "Effects of Geon Deletion, Scrambling, and Movement on Picture Recognition in Pigeons". Journal of Experimental Psychology: Animal Behavior Processes. 24 (1): 34–46. doi:10.1037/0097-7403.24.1.34. PMID 9438964.
