Visual routine

A visual routine is a means of extracting information from a visual scene.

In his studies of human visual cognition, Shimon Ullman proposed that the human visual system's task of perceiving shape properties and spatial relations is split into two successive stages: an early "bottom-up" stage, during which base representations are generated from the visual input, and a later "top-down" stage, during which high-level primitives dubbed "visual routines" extract the desired information from the base representations. [1] In humans, the base representations generated during the bottom-up stage correspond to retinotopic maps (more than 15 of which exist in the cortex) for properties such as color, edge orientation, speed of motion, and direction of motion. These base representations rely on fixed operations performed uniformly over the entire visual field, and make no use of object-specific knowledge, task-specific knowledge, or other higher-level information. [2]
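
The character of the bottom-up stage can be illustrated with a short sketch. The following Python fragment is a minimal illustration, not Ullman's implementation; the function and map names are invented for this example. It applies the same fixed filters at every location of an image, with no task- or object-specific knowledge, yielding simple feature maps analogous to base representations:

```python
# Illustrative sketch of fixed, uniform "bottom-up" operations.
import numpy as np
from scipy import ndimage

def base_representations(image: np.ndarray) -> dict:
    """Compute simple feature maps from a 2-D grayscale image (float array).

    Every operation is applied identically at every pixel; nothing here
    depends on what objects the image contains or on the task at hand.
    """
    gx = ndimage.sobel(image, axis=1)  # horizontal intensity gradient
    gy = ndimage.sobel(image, axis=0)  # vertical intensity gradient
    return {
        "edge_magnitude": np.hypot(gx, gy),      # edge strength at each pixel
        "edge_orientation": np.arctan2(gy, gx),  # local edge orientation
        "smoothed": ndimage.gaussian_filter(image, sigma=2.0),
    }
```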

The visual routines proposed by Ullman are high-level primitives which parse the structure of a scene, extracting spatial information from the base representations. These visual routines are composed of sequences of elementary visual operators specific to the task at hand. Visual routines differ from the fixed operations of the base representations in that they are not applied uniformly over the entire visual field; rather, they are applied only to the objects or areas that the routines specify. [1]

Ullman lists the following as examples of visual operators: shifting the processing focus, indexing a salient item for further processing, spreading activation over an area delimited by boundaries, tracing boundaries, and marking a location or object for future reference. When combined into visual routines, these elementary operators can be used to perform relatively sophisticated spatial tasks such as counting the number of objects satisfying a certain property, or recognizing a complex shape. [1]
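
The flavor of such a composition can be sketched in code. The following Python fragment is a simplified illustration under the assumption of a binary image, with all names hypothetical; it combines three of the operators above (indexing an unmarked item, spreading activation over a bounded region, and marking the locations reached) into a routine that counts the objects in the image:

```python
# Illustrative composition of elementary visual operators into a routine
# that counts connected objects in a 2-D boolean image. Simplified sketch;
# not Ullman's implementation.
import numpy as np

def index_unmarked(image, marked):
    """Indexing: return the location of some foreground item that has not
    yet been marked, or None if none remains."""
    candidates = np.argwhere(image & ~marked)
    return tuple(candidates[0]) if len(candidates) else None

def spread_activation(image, marked, seed):
    """Spreading activation: flood outward from seed, stopping at the
    object's boundary, and mark every location reached."""
    stack = [seed]
    while stack:
        r, c = stack.pop()
        if (0 <= r < image.shape[0] and 0 <= c < image.shape[1]
                and image[r, c] and not marked[r, c]):
            marked[r, c] = True  # marking: remember this location
            stack += [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]

def count_objects(image):
    """A routine built from the operators: index an unmarked item, spread
    activation over it, and repeat, counting the successes."""
    marked = np.zeros(image.shape, dtype=bool)
    count = 0
    while (seed := index_unmarked(image, marked)) is not None:
        spread_activation(image, marked, seed)
        count += 1
    return count
```

For example, applied to a boolean array containing two separate blobs of True pixels, count_objects returns 2, because indexing succeeds exactly once per connected object before every foreground location is marked.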

A number of researchers have implemented visual routines for processing camera images, to perform tasks like determining the object a human in the camera image is pointing at. [3] [4] [5] Researchers have also applied the visual routines approach to artificial map representations, for playing real-time 2D video games. In those cases, however, the map of the video game was provided directly, obviating the need to deal with real-world perceptual tasks like object recognition and occlusion compensation.

Related Research Articles

Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do.

The user interface (UI), in the industrial design field of human-computer interaction, is the space where interactions between humans and machines occur. The goal of this interaction is to allow effective operation and control of the machine from the human end, whilst the machine simultaneously feeds back information that aids the operators' decision-making process. Examples of this broad concept of user interfaces include the interactive aspects of computer operating systems, hand tools, heavy machinery operator controls, and process controls. The design considerations applicable when creating user interfaces are related to or involve such disciplines as ergonomics and psychology.

Attention is the behavioral and cognitive process of selectively concentrating on a discrete aspect of information, whether considered subjective or objective, while ignoring other perceivable information. It is a state of arousal. As William James (1890) wrote, "[Attention] is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence." Attention has also been described as the allocation of limited cognitive processing resources.

Solid modeling is a consistent set of principles for mathematical and computer modeling of three-dimensional solids. Solid modeling is distinguished from related areas of geometric modeling and computer graphics by its emphasis on physical fidelity. Together, the principles of geometric and solid modeling form the foundation of 3D-computer-aided design and in general support the creation, exchange, visualization, animation, interrogation, and annotation of digital models of physical objects.

Soar is a cognitive architecture, originally created by John Laird, Allen Newell, and Paul Rosenbloom at Carnegie Mellon University. It is now maintained and developed by John Laird's research group at the University of Michigan.

Sensory processing is the process that organizes sensation from one's own body and the environment, thus making it possible to use the body effectively within the environment. Specifically, it deals with how the brain processes inputs from multiple sensory modalities, such as proprioception, vision, audition, touch, olfaction, the vestibular sense, interoception, and taste, into usable functional outputs.

Dysmetria is a lack of coordination of movement typified by the undershoot or overshoot of intended position with the hand, arm, leg, or eye. It is a type of ataxia. It can also include an inability to judge distance or scale.

In computing, 3D interaction is a form of human-machine interaction where users are able to move and perform interaction in 3D space. Both the human and the machine process information in which the physical position of elements in 3D space is relevant.

In computer vision, the bag-of-words model can be applied to image classification, by treating image features as words. In document classification, a bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is a vector of occurrence counts of a vocabulary of local image features.
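
A minimal sketch of this encoding, assuming a pre-computed vocabulary of K cluster centres (for example, obtained by k-means clustering of local descriptors), might look as follows; the function and variable names are illustrative:

```python
# Illustrative bag-of-visual-words encoding.
import numpy as np

def bag_of_visual_words(descriptors: np.ndarray, vocabulary: np.ndarray) -> np.ndarray:
    """Encode an image's local descriptors (N x D) as a histogram of
    occurrence counts over a vocabulary of visual words (K x D)."""
    # Squared distance from every descriptor to every visual word.
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # nearest visual word per descriptor
    return np.bincount(nearest, minlength=len(vocabulary))  # the "bag"
```

Two images can then be compared, or classified, via their histograms rather than their raw pixels.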

In 3D user interaction (3DUI), the human interacts with a computer or other device via an aspect of three-dimensional space. This interaction is made possible by interfaces that serve as intermediaries between human and machine.

Dr. Barbara Landau is a professor in the Department of Cognitive Science at Johns Hopkins University and also chairs the department. Landau specializes in language learning, spatial representation, and the relationships between these foundational systems of human knowledge. She examines questions about how the two systems work together to enhance human cognition and whether one is actually foundational to the other. She is known for her research on unusual cases of development and is a leading authority on language and spatial information in people with Williams syndrome.

Visual object recognition refers to the ability to identify the objects in view based on visual input. One important signature of visual object recognition is "object invariance", or the ability to identify objects across changes in the detailed context in which objects are viewed, including changes in illumination, object pose, and background context.

Perceptual learning is the learning of improved perceptual skills, such as differentiating two musical tones from one another, or categorizing spatial and temporal patterns relevant to real-world expertise, as in reading, seeing relations among chess pieces, or judging whether an X-ray image shows a tumor.

There are many types of artificial neural networks (ANNs).

Haptic memory is the form of sensory memory specific to touch stimuli. Haptic memory is used regularly when assessing the necessary forces for gripping and interacting with familiar objects. It may also influence one's interactions with novel objects of an apparently similar size and density. Similar to visual iconic memory, traces of haptically acquired information are short-lived and prone to decay after approximately two seconds. Haptic memory is best for stimuli applied to areas of the skin that are more sensitive to touch. Haptics involves at least two subsystems: cutaneous, or everything skin-related, and kinesthetic, or joint angle and the relative location of the body. Haptics generally involves active, manual examination and is quite capable of processing the physical traits of objects and surfaces.

Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.

Biased competition theory advocates the idea that each object in the visual field competes for cortical representation and cognitive processing. The theory suggests that visual processing can be biased by other mental processes, such as bottom-up and top-down systems, which prioritize certain features of an object, or whole items, for attention and further processing. Biased competition theory is, simply stated, the competition of objects for processing. This competition can be biased, often toward the object that is currently attended in the visual field, or alternatively toward the object most relevant to behavior.

In deep learning, a convolutional neural network is a class of deep neural networks, most commonly applied to analyzing visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics. They have applications in image and video recognition, recommender systems, image classification, medical image analysis, natural language processing, and financial time series.

Human performance modeling (HPM) is a method of quantifying human behavior, cognition, and processes; a tool used by human factors researchers and practitioners for both the analysis of human function and for the development of systems designed for optimal user experience and interaction. It is a complementary approach to other usability testing methods for evaluating the impact of interface features on operator performance.

This glossary of artificial intelligence terms is about artificial intelligence, its sub-disciplines, and related fields.

References

  1. "Ullman's Visual Routines, and Tekkotsu Sketches" (PDF).
  2. Huang, J.; Wechsler, H. (April 2000). "Visual routines for eye location using learning and evolution". IEEE Transactions on Evolutionary Computation. 4 (1): 73–82. doi:10.1109/4235.843496. ISSN 1089-778X.
  3. Johnson, M. P. (August 1996). "Automated creation of visual routines using genetic programming". Proceedings of the 13th International Conference on Pattern Recognition. Vol. 1. pp. 951–956. doi:10.1109/ICPR.1996.546164. ISBN 978-0-8186-7282-8.
  4. Aste, Marco; Rossi, Massimo; Cattoni, Roldano; Caprile, Bruno (1998-06-01). "Visual routines for real-time monitoring of vehicle behavior". Machine Vision and Applications. 11 (1): 16–23. CiteSeerX 10.1.1.48.5736. doi:10.1007/s001380050086. ISSN 0932-8092.
  5. Rao, Satyajit. "Visual Routines and Attention" (PDF). MIT Computer Science and Artificial Intelligence Laboratory.