Pandemonium architecture is a theory in cognitive science that describes how visual images are processed by the brain. It has applications in artificial intelligence and pattern recognition. The theory was developed by the artificial intelligence pioneer Oliver Selfridge in 1959. It describes the process of object recognition as the exchange of signals within a hierarchical system of detection and association, the elements of which Selfridge metaphorically termed "demons". This model is now recognized as the basis of visual perception in cognitive science.
Pandemonium architecture arose in response to the inability of template matching theories to offer a biologically plausible explanation of the image constancy phenomenon. Contemporary researchers have praised this architecture for its elegance and creativity: the idea of having multiple independent systems (e.g., feature detectors) working in parallel to address the image constancy phenomenon of pattern recognition is powerful yet simple. The basic idea of the pandemonium architecture is that a pattern is first perceived in its parts before the "whole". [1]
Pandemonium architecture was one of the first computational models in pattern recognition. Although not perfect, the pandemonium architecture influenced the development of modern connectionist, artificial intelligence, and word recognition models. [2]
Most research in perception has been focused on the visual system, investigating the mechanisms of how we see and understand objects. A critical function of our visual system is its ability to recognize patterns, but the mechanism by which this is achieved is unclear. [3]
The earliest theory that attempted to explain how we recognize patterns is the template matching model. According to this model, we compare all external stimuli against an internal mental representation. If there is "sufficient" overlap between the perceived stimulus and the internal representation, we will "recognize" the stimulus. Although some machines follow a template matching model (e.g., bank machines verifying signatures and account numbers), the theory is critically flawed in explaining the phenomenon of image constancy: we can easily recognize a stimulus regardless of changes in its form of presentation (e.g., the letter T is easily recognized whether it is printed large or small, or in a different typeface). It is highly unlikely that we have a stored template for every variation of every single pattern. [4]
As a result of the biological plausibility criticism of the template matching model, feature detection models began to rise. In a feature detection model, the image is first perceived in its basic individual elements before it is recognized as a whole object. For example, when we are presented with the letter A, we would first see a short horizontal line and two slanted long diagonal lines. Then we would combine the features to complete the perception of A. Each unique pattern consists of a different combination of features, which means that patterns formed from the same features generate the same recognition. That is, regardless of how we rotate the letter A, it is still perceived as the letter A. It is easy for this sort of architecture to account for the image constancy phenomenon because one only needs to "match" at the basic featural level, which is presumed to be limited and finite, and thus biologically plausible. The best known feature detection model is called the pandemonium architecture. [4]
The pandemonium architecture was originally developed by Oliver Selfridge in the late 1950s. The architecture is composed of different groups of "demons" working independently to process the visual stimulus. Each group of demons is assigned to a specific stage in recognition, and within each group, the demons work in parallel. There are four major groups of demons in the original architecture. [3]
Stage | Demon name | Function |
---|---|---|
1 | Image demon | Records the image that is received in the retina. |
2 | Feature demons | There are many feature demons, each representing a specific feature. For example, there is a feature demon for short straight lines, another for curved lines, and so forth. Each feature demon's job is to "yell" if it detects the feature it corresponds to. Note that feature demons are not meant to represent any specific neurons, but rather a group of neurons with similar functions. For example, the vertical line feature demon represents the neurons that respond to vertical lines in the retinal image. |
3 | Cognitive demons | Watch the "yelling" from the feature demons. Each cognitive demon is responsible for a specific pattern (e.g., a letter in the alphabet). The "yelling" of the cognitive demons is based on how much of their pattern was detected by the feature demons. The more features the cognitive demons find that correspond to their pattern, the louder they "yell". For example, if the curved, long straight and short angled line feature demons are yelling really loud, the R letter cognitive demon might get really excited, and the P letter cognitive demon might be somewhat excited as well; but the Z letter cognitive demon is very likely to be quiet. |
4 | Decision demon | Represents the final stage of processing. It listens to the "yelling" produced by the cognitive demons and selects the loudest one. The demon that gets selected becomes our conscious perception. Continuing with our previous example, the R cognitive demon would be the loudest, followed by P; therefore we would perceive R, but if we were to make a mistake because of poor display conditions (e.g., letters are quickly flashed or have parts occluded), it would likely be P. Note that the "pandemonium" simply refers to the cumulative "yelling" produced by the system. |
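The four demon stages above can be sketched as a toy program. The feature sets per letter below are illustrative assumptions for the R/P/Z example, not Selfridge's actual feature inventory:

```python
# Cognitive demons: each letter demon "knows" the features it expects.
# These feature sets are assumptions chosen to mirror the R/P/Z example.
LETTER_FEATURES = {
    "R": {"vertical", "curve", "oblique"},
    "P": {"vertical", "curve"},
    "Z": {"horizontal", "oblique"},
}

def decision_demon(image_features: set) -> str:
    """Pick the cognitive demon that 'yells' loudest.

    A cognitive demon's loudness is simply how many of its expected
    features the feature demons detected in the image.
    """
    loudness = {
        letter: len(expected & image_features)
        for letter, expected in LETTER_FEATURES.items()
    }
    return max(loudness, key=loudness.get)

# All three of R's features present: R yells loudest (3 vs. 2 for P, 1 for Z).
perceived = decision_demon({"vertical", "curve", "oblique"})  # "R"
```

In this sketch the "image demon" is just the input set, and the feature demons are implicit in the set intersection; a fuller model would compute the feature set from pixels.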
The concept of feature demons, that specific neurons are dedicated to performing specialized processing, is supported by research in neuroscience. Hubel and Wiesel found that specific cells in a cat's brain responded to specific lengths and orientations of a line. Similar findings were discovered in frogs, octopuses and a variety of other animals. Octopuses were found to be sensitive only to the verticality of lines, whereas frogs demonstrated a wider range of sensitivity. These animal experiments demonstrate that feature detectors seem to be a very primitive development; that is, they did not result from the higher cognitive development of humans. Not surprisingly, there is also evidence that the human brain possesses these elementary feature detectors as well. [5] [6] [7]
Moreover, this architecture is capable of learning, similar to a back-propagation-style neural network. The weights between the cognitive and feature demons can be adjusted in proportion to the difference between the correct pattern and the activation of the cognitive demons. To continue with our previous example, when we first learn the letter R, we learn that it is composed of a curved, a long straight, and a short angled line. Thus when we perceive those features, we perceive R. However, the letter P consists of very similar features, so during the early stages of learning, this architecture is likely to mistakenly identify R as P. But through repeated exposure in which R's features are confirmed as R, the weights between R's features and P are adjusted so that the P response becomes inhibited (e.g., learning to inhibit the P response when a short angled line is detected). In principle, a pandemonium architecture can recognize any pattern. [8]
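This kind of weight adjustment can be sketched with a simple delta rule. The feature names, initial weights, and learning rate below are assumptions for illustration, not parameters from Selfridge's model:

```python
FEATURES = ["curve", "long_straight", "short_angled"]
LETTERS = ["R", "P"]

# weights[letter][feature]: how strongly a feature excites that letter demon.
weights = {letter: {f: 0.5 for f in FEATURES} for letter in LETTERS}
RATE = 0.1  # assumed learning rate

def activations(present: set) -> dict:
    """Each cognitive demon's loudness: sum of weights of detected features."""
    return {
        letter: sum(w for f, w in weights[letter].items() if f in present)
        for letter in LETTERS
    }

def learn(present: set, correct: str) -> None:
    """Delta rule: nudge each demon toward 1.0 if correct, 0.0 otherwise."""
    acts = activations(present)
    for letter in LETTERS:
        target = 1.0 if letter == correct else 0.0
        error = target - acts[letter]
        for f in present:
            weights[letter][f] += RATE * error

# Repeated exposure to R's features, confirmed as R, inhibits the P response.
for _ in range(50):
    learn({"curve", "long_straight", "short_angled"}, "R")
```

After training, the R demon out-yells the P demon on R's feature set, even though P initially shared most of its weights.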
As mentioned earlier, this architecture makes error predictions based on the amount of overlapping features: for example, the most likely error for R should be P. Thus, to show that this architecture represents the human pattern recognition system, these predictions must be put to the test. Researchers have constructed scenarios where various letters are presented in conditions that make them difficult to identify; the types of errors observed were then used to generate confusion matrices, in which all of the errors for each letter are recorded. Generally, the results from these experiments matched the error predictions of the pandemonium architecture. As a result of these experiments, some researchers have also proposed models that attempted to list all of the basic features in the Roman alphabet. [9] [10] [11] [12]
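A confusion matrix of this kind is just a tally of (presented, responded) pairs. The trial data below are hypothetical, invented only to show the bookkeeping:

```python
from collections import defaultdict

# Hypothetical trials from a brief-exposure task: (presented, responded).
trials = [
    ("R", "R"), ("R", "P"), ("R", "R"),
    ("P", "P"), ("P", "R"),
    ("Z", "Z"),
]

# confusion[presented][responded] counts each response to each letter;
# off-diagonal cells are the identification errors.
confusion = defaultdict(lambda: defaultdict(int))
for presented, responded in trials:
    confusion[presented][responded] += 1
```

The pandemonium prediction is then a claim about the off-diagonal cells: `confusion["R"]["P"]` should exceed `confusion["R"]["Z"]`, because R shares more features with P than with Z.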
A major criticism of the pandemonium architecture is that it adopts completely bottom-up processing: recognition is entirely driven by the physical characteristics of the targeted stimulus. This means that it is unable to account for any top-down processing effects, such as context effects (e.g., pareidolia), where contextual cues can facilitate processing (e.g., the word superiority effect: it is relatively easier to identify a letter when it is part of a word than in isolation). However, this is not a fatal criticism of the overall architecture, because it is relatively easy to add a group of contextual demons to work alongside the cognitive demons to account for these context effects. [13]
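One way such a contextual demon could work is to bias the cognitive demons' loudness toward letters that fit the surrounding word. The word list, loudness values, and boost amount below are illustrative assumptions, not part of the original model:

```python
# Assumed mini-lexicon for the contextual demon.
WORDS = {"WORD", "WORK"}

def contextual_boost(prefix: str, loudness: dict, boost: float = 0.5) -> dict:
    """Add a bonus to letters that extend `prefix` into a known word.

    `loudness` maps candidate letters to their bottom-up (feature-driven)
    loudness; the contextual demon layers a top-down bonus on top.
    """
    plausible = {
        w[len(prefix)]
        for w in WORDS
        if w.startswith(prefix) and len(w) > len(prefix)
    }
    return {
        letter: vol + (boost if letter in plausible else 0.0)
        for letter, vol in loudness.items()
    }

# Bottom-up evidence alone is ambiguous between D and O...
raw = {"D": 1.0, "O": 1.0}
# ...but after "WOR", only D completes a known word, so it wins.
biased = contextual_boost("WOR", raw)
```

This keeps the demon metaphor intact: the contextual demon simply "yells" alongside the feature demons, and the decision demon still just picks the loudest letter.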
Although the pandemonium architecture is built on the claim that it can account for the image constancy phenomenon, some researchers have argued otherwise, pointing out that it might share the same flaws as the template matching models. For example, the letter H is composed of two long vertical lines and a short horizontal line; but if we rotate the H 90 degrees in either direction, it is composed of two long horizontal lines and a short vertical line. To recognize the rotated H as H, we would need a rotated-H cognitive demon. Thus we might end up with a system that requires a large number of cognitive demons to produce accurate recognition, which would invite the same biological plausibility criticism levelled at the template matching models. However, it is rather difficult to judge the validity of this criticism, because the pandemonium architecture does not specify how and what features are extracted from incoming sensory information; it simply outlines the possible stages of pattern recognition. That raises its own problem: it is almost impossible to criticize a model that does not include specific parameters. The theory also appears rather incomplete without defining how and what features are extracted, which proves especially problematic with complex patterns (e.g., extracting the weight and features of a dog). [3] [14]
Some researchers have also pointed out that the evidence supporting the pandemonium architecture has been very narrow in its methodology. The majority of the research supporting this architecture has referred to its ability to recognize simple schematic drawings selected from a small finite set (e.g., letters of the Roman alphabet). Evidence from these types of experiments can lead to overgeneralized and misleading conclusions, because the recognition process for complex, three-dimensional patterns could be very different from that for simple schematics. Furthermore, some have criticized the methodology used in generating the confusion matrix, because it confounds perceptual confusion (errors in identification caused by overlapping features between the error and the correct answer) with post-perceptual guessing (people randomly guessing because they cannot be sure what they saw). However, these criticisms were somewhat addressed when similar results were replicated with other paradigms (e.g., go/no-go and same-different tasks), supporting the claim that humans do have elementary feature detectors. These new paradigms relied on reaction time as the dependent variable, which also avoided the problem of empty cells inherent in the confusion matrix (statistical analyses are difficult to conduct and interpret when the data have empty cells). [7]
Additionally, some researchers have pointed out that feature accumulation theories like the pandemonium architecture get the processing stages of pattern recognition almost backwards. This criticism was mainly used by advocates of the global-to-local theory, who argued and provided evidence that perception begins with a blurry view of the whole that refines over time, implying that feature extraction does not happen in the early stages of recognition. [15] However, there is nothing to prevent a demon from recognizing a global pattern in parallel with other demons recognizing local patterns within the global pattern.
The pandemonium architecture has been applied to solve several real-world problems, such as translating hand-sent Morse code and identifying hand-printed letters. The overall accuracy of pandemonium-based models is impressive, even when the system is given only a short learning period. For example, Doyle constructed a pandemonium-based system with over 30 complex feature analyzers. He then fed his system several hundred letters for learning. During this phase, the system analyzed the input letter and generated its own output (what the system identified the letter as). The output from the system was compared against the correct identification, which sent an error signal back to the system to adjust the weights between the feature analyzers accordingly. In the testing phase, unfamiliar letters were presented (letters of a different style and size than those presented in the learning phase), and the system achieved nearly 90% accuracy. Because of its impressive capability to recognize words, all modern theories of how humans read and recognize words follow this hierarchical structure: word recognition begins with feature extraction of the letters, which then activates the letter detectors [16] (e.g., SOLAR, [17] SERIOL, [18] IA, [19] DRC [20] ).
Based on the original pandemonium architecture, John Jackson extended the theory to explain phenomena beyond perception. Jackson offered the analogy of an arena to account for "consciousness". His arena consisted of a stand, a playing field, and a sub-arena, and was populated by a multitude of demons. The demons on the playing field were the active demons, as they represent the active elements of human consciousness. The demons in the stands watch those on the playing field until something excites them; each demon is excited by different things. The more excited the demons get, the louder they yell. If a demon yells past a set threshold, it gets to join the other demons on the playing field and perform its function, which may then excite other demons, and this cycle continues. The sub-arena in the analogy functions as the learning and feedback mechanism of the system. Learning here works as in other neural-style networks: by modifying the connection strengths between the demons, in other words, how the demons respond to each other's yelling. This multiple-agent approach to human information processing became the assumption of many modern artificial intelligence systems. [21] [22]
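Jackson's arena cycle can be sketched as a threshold rule. The demon names, excitation values, and threshold below are illustrative assumptions, not details from Jackson's account:

```python
THRESHOLD = 1.0  # assumed activation threshold for joining the field

class Demon:
    def __init__(self, name, excited_by):
        self.name = name
        self.excited_by = excited_by  # names of demons that excite this one
        self.excitement = 0.0

def run_cycle(playing_field, stands):
    """One arena cycle: demons in the stands listen to the playing field,
    accumulate excitement, and jump in once they yell past the threshold."""
    for demon in list(stands):
        demon.excitement += sum(
            1.0 for active in playing_field if active.name in demon.excited_by
        )
        if demon.excitement > THRESHOLD:
            stands.remove(demon)
            playing_field.append(demon)

see_ball = Demon("see-ball", excited_by=set())
catch = Demon("catch", excited_by={"see-ball"})
field, stands = [see_ball], [catch]

run_cycle(field, stands)  # "catch" hears "see-ball" once: not yet loud enough
run_cycle(field, stands)  # second cycle pushes it past threshold: it joins
```

Once "catch" is on the field it could in turn excite further demons, which is the self-sustaining cycle Jackson's analogy describes.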
Although the pandemonium architecture arose as a response to a major criticism of the template matching theories, the two are actually rather similar in some sense: there is a process in which a specific set of features for items is matched against some sort of mental representation. The critical difference between the two is that in the template matching theories the image is compared directly against an internal representation, whereas in the pandemonium architecture the image is first decomposed and processed at the featural level. This grants the pandemonium architecture tremendous power, because it is capable of recognizing a stimulus despite changes in its size, style and other transformations, without presuming an unlimited pattern memory. It is also unlikely that the template matching theories would function properly when faced with realistic visual inputs, where objects are presented in three dimensions and often occluded by other objects (e.g., half of a book is covered by a piece of paper, but we can still recognize it as a book with relative ease). Nonetheless, some researchers have conducted experiments comparing the two theories. Not surprisingly, the results often favored a hierarchical feature-building model like the pandemonium architecture. [23] [24] [25]
The Hebbian model resembles feature-oriented theories like the pandemonium architecture in many respects. The first level of processing in the Hebbian model consists of cell assemblies, which have functions very similar to those of the feature demons. However, cell assemblies are more limited than the feature demons, because they can only extract lines, angles and contours. The cell assemblies are combined to form phase sequences, whose function is very similar to that of the cognitive demons. In a sense, many consider the Hebbian model to be a crossover between the template and feature matching theories, as the features extracted in the Hebbian model can be considered basic templates. [8]
Perception is the organization, identification, and interpretation of sensory information in order to represent and understand the presented information or environment. All perception involves signals that go through the nervous system, which in turn result from physical or chemical stimulation of the sensory system. Vision involves light striking the retina of the eye; smell is mediated by odor molecules; and hearing involves pressure waves.
An illusion is a distortion of the senses, which can reveal how the mind normally organizes and interprets sensory stimulation. Although illusions distort the human perception of reality, they are generally shared by most people.
Categorization is a type of cognition involving conceptual differentiation between characteristics of conscious experience, such as objects, events, or ideas. It involves the abstraction and differentiation of aspects of experience by sorting and distinguishing between groupings, through classification or typification on the basis of traits, features, similarities or other criteria that are universal to the group. Categorization is considered one of the most fundamental cognitive abilities, and it is studied particularly by psychology and cognitive linguistics.
Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent patterns. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power.
Hebbian theory is a neuropsychological theory claiming that an increase in synaptic efficacy arises from a presynaptic cell's repeated and persistent stimulation of a postsynaptic cell. It is an attempt to explain synaptic plasticity, the adaptation of brain neurons during the learning process. It was introduced by Donald Hebb in his 1949 book The Organization of Behavior. The theory is also called Hebb's rule, Hebb's postulate, and cell assembly theory. Hebb states it as follows:
Let us assume that the persistence or repetition of a reverberatory activity tends to induce lasting cellular changes that add to its stability. ... When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.
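Hebb's postulate is often summarized as the update rule Δw = η · pre · post: the weight between two cells grows when they are active together. A minimal sketch, with an assumed learning rate and activity values:

```python
def hebbian_update(weight: float, pre: float, post: float,
                   rate: float = 0.1) -> float:
    """One Hebbian step: delta_w = rate * pre * post.

    The weight increases only when the presynaptic and postsynaptic
    cells are active at the same time.
    """
    return weight + rate * pre * post

w = 0.0
for _ in range(5):  # cell A repeatedly takes part in firing cell B
    w = hebbian_update(w, pre=1.0, post=1.0)
# w has grown: A's efficiency, as one of the cells firing B, is increased

# If B is silent (post = 0.0), the weight does not change.
w_silent = hebbian_update(0.3, pre=1.0, post=0.0)
```

Note this bare form only strengthens weights; practical variants (e.g., Oja's rule) add normalization or decay so weights do not grow without bound.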
Apophenia is the tendency to perceive meaningful connections between unrelated things.
Feature integration theory is a theory of attention developed in 1980 by Anne Treisman and Garry Gelade that suggests that when perceiving a stimulus, features are "registered early, automatically, and in parallel, while objects are identified separately" and at a later stage in processing. The theory has been one of the most influential psychological models of human visual attention.
In cognitive psychology, the word superiority effect (WSE) refers to the phenomenon that people have better recognition of letters presented within words as compared to isolated letters and to letters presented within nonword strings. Studies have also found a WSE when letter identification within words is compared to letter identification within pseudowords and pseudohomophones.
Visual search is a type of perceptual task requiring attention that typically involves an active scan of the visual environment for a particular object or feature among other objects or features. Visual search can take place with or without eye movements. The ability to consciously locate an object or target amongst a complex array of stimuli has been extensively studied over the past 40 years. Practical examples of using visual search can be seen in everyday life, such as when one is picking out a product on a supermarket shelf, when animals are searching for food among piles of leaves, when trying to find a friend in a large crowd of people, or simply when playing visual search games such as Where's Wally?
Speech perception is the process by which the sounds of language are heard, interpreted, and understood. The study of speech perception is closely linked to the fields of phonology and phonetics in linguistics and cognitive psychology and perception in psychology. Research in speech perception seeks to understand how human listeners recognize speech sounds and use this information to understand spoken language. Speech perception research has applications in building computer systems that can recognize speech, in improving speech recognition for hearing- and language-impaired listeners, and in foreign-language teaching.
In psychology and cognitive neuroscience, pattern recognition is a cognitive process that matches information from a stimulus with information retrieved from memory.
CHREST is a symbolic cognitive architecture based on the concepts of limited attention, limited short-term memories, and chunking. The architecture takes into account low-level aspects of cognition such as reference perception, long- and short-term memory stores, and methodology of problem-solving, as well as high-level aspects such as the use of strategies. Learning, which is essential in the architecture, is modelled as the development of a network of nodes (chunks) which are connected in various ways. This can be contrasted with Soar and ACT-R, two other cognitive architectures, which use productions for representing knowledge. CHREST has often been used to model learning using large corpora of stimuli representative of the domain, such as chess games for the simulation of chess expertise or child-directed speech for the simulation of children's development of language. In this respect, the simulations carried out with CHREST have a flavour closer to those carried out with connectionist models than with traditional symbolic models.
Object recognition – technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the image of the objects may vary somewhat in different view points, in many different sizes and scales or even when they are translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge for computer vision systems. Many approaches to the task have been implemented over multiple decades.
Common coding theory is a cognitive psychology theory describing how perceptual representations and motor representations are linked. The theory claims that there is a shared representation for both perception and action. More important, seeing an event activates the action associated with that event, and performing an action activates the associated perceptual event.
Visual object recognition refers to the ability to identify the objects in view based on visual input. One important signature of visual object recognition is "object invariance", or the ability to identify objects across changes in the detailed context in which objects are viewed, including changes in illumination, object pose, and background context.
Perceptual learning is learning better perception skills such as differentiating two musical tones from one another or categorizations of spatial and temporal patterns relevant to real-world expertise. Examples of this may include reading, seeing relations among chess pieces, and knowing whether or not an X-ray image shows a tumor.
In cognitive science, prototype-matching is a theory of pattern recognition that describes the process by which a sensory unit registers a new stimulus and compares it to the prototype, or standard model, of said stimulus. Unlike template matching and featural analysis, an exact match is not expected for prototype-matching, allowing for a more flexible model. An object is recognized by the sensory unit when a similar prototype match is found.
Statistical language acquisition, a branch of developmental psycholinguistics, studies the process by which humans develop the ability to perceive, produce, comprehend, and communicate with natural language in all of its aspects through the use of general learning mechanisms operating on statistical patterns in the linguistic input. Statistical learning acquisition claims that infants' language-learning is based on pattern perception rather than an innate biological grammar. Several statistical elements such as frequency of words, frequent frames, phonotactic patterns and other regularities provide information on language structure and meaning for facilitation of language acquisition.
Object-based attention refers to the relationship between an ‘object’ representation and a person’s visually stimulated, selective attention, as opposed to a relationship involving either a spatial or a feature representation; although these types of selective attention are not necessarily mutually exclusive. Research into object-based attention suggests that attention improves the quality of the sensory representation of a selected object, and results in the enhanced processing of that object’s features.
Biological motion perception is the act of perceiving the fluid unique motion of a biological agent. The phenomenon was first documented by Swedish perceptual psychologist, Gunnar Johansson, in 1973. There are many brain areas involved in this process, some similar to those used to perceive faces. While humans complete this process with ease, from a computational neuroscience perspective there is still much to be learned as to how this complex perceptual problem is solved. One tool which many research studies in this area use is a display stimuli called a point light walker. Point light walkers are coordinated moving dots that simulate biological motion in which each dot represents specific joints of a human performing an action.