Feature integration theory is a theory of attention developed in 1980 by Anne Treisman and Garry Gelade that suggests that when perceiving a stimulus, features are "registered early, automatically, and in parallel, while objects are identified separately" and at a later stage in processing. The theory has been one of the most influential psychological models of human visual attention.
According to Treisman, the first stage of the feature integration theory is the preattentive stage. During this stage, different parts of the brain automatically gather information about basic features (colors, shape, movement) that are found in the visual field. The idea that features are automatically separated appears counterintuitive. However, we are not aware of this process because it occurs early in perceptual processing, before we become conscious of the object.
The second stage of feature integration theory is the focused attention stage, in which a subject combines the individual features of an object to perceive the whole object. Combining individual features requires attention, and selecting the object occurs within a "master map" of locations. The master map contains all the locations in which features have been detected, and each location in the master map has access to the multiple feature maps. These feature maps, or sub-maps, contain a large storage base of features such as color, shape, orientation, sound, and movement. [1] [2] When attention is focused at a particular location on the map, the features currently in that position are attended to and are stored in "object files". If the object is familiar, associations are made between the object and prior knowledge, which results in identification of that object. This top-down process, using prior knowledge to inform a current situation or decision, is paramount in identifying or recognizing objects. [3] [4] In support of this stage, researchers often refer to patients with Bálint's syndrome. Because of damage to the parietal lobe, these patients are unable to focus attention on individual objects. Given a stimulus that requires combining features, people with Bálint's syndrome are unable to focus attention long enough to combine the features, which supports this stage of the theory. [5]
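The relationship between the feature maps, the master map of locations, and object files can be illustrated with a small data-structure sketch. This is only a toy illustration, not a model proposed by Treisman: the display, the function names (`build_feature_maps`, `build_master_map`, `attend`), and the representation of an object file as a dictionary are all assumptions made for the example.

```python
# Hypothetical toy display: location -> features registered preattentively.
DISPLAY = {
    (0, 0): {"color": "red", "shape": "T"},
    (1, 0): {"color": "blue", "shape": "O"},
}

def build_feature_maps(display):
    """Split the display into one map per feature dimension (the sub-maps)."""
    feature_maps = {}
    for location, features in display.items():
        for dimension, value in features.items():
            feature_maps.setdefault(dimension, {})[location] = value
    return feature_maps

def build_master_map(feature_maps):
    """Master map: every location at which any feature was detected."""
    locations = set()
    for per_location in feature_maps.values():
        locations.update(per_location)
    return locations

def attend(location, master_map, feature_maps):
    """Focusing attention on a location binds its features into an object file."""
    if location not in master_map:
        return None  # nothing was detected there, so nothing can be bound
    return {dim: m[location] for dim, m in feature_maps.items() if location in m}

feature_maps = build_feature_maps(DISPLAY)
master_map = build_master_map(feature_maps)
print(attend((0, 0), master_map, feature_maps))  # {'color': 'red', 'shape': 'T'}
```

In this sketch the features of a location are combined only when `attend` is called on it, which mirrors the claim that binding requires focused attention.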
Treisman distinguishes between two kinds of visual search tasks, "feature search" and "conjunction search". Feature searches can be performed fast and pre-attentively for targets defined by only one feature, such as color, shape, perceived direction of lighting, movement, or orientation. Features should "pop out" during search and should be able to form illusory conjunctions. Conversely, conjunction searches occur with the combination of two or more features and are identified serially. Conjunction search is much slower than feature search and requires conscious attention and effort. In multiple experiments, some referenced in this article, Treisman concluded that color, orientation, and intensity are features for which feature searches may be performed.
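The contrast between the two kinds of search can be sketched as a difference in how cost grows with display size. The code below is a deliberately simplified illustration, not Treisman's experimental procedure: it assumes that a feature search costs one step regardless of the number of items, that a conjunction search inspects items one at a time until the target is found, and that the target sits at the end of the display (the worst case for a serial scan). All names and the display contents are hypothetical.

```python
def feature_search(display, dimension, value):
    """Parallel check of a single feature map; modeled as one step for any display size."""
    found = any(item[dimension] == value for item in display)
    return found, 1

def conjunction_search(display, target):
    """Serial self-terminating scan: attend to one item at a time until the target is bound."""
    for steps, item in enumerate(display, start=1):
        if item == target:
            return True, steps
    return False, len(display)

def make_display(n_distractors, target):
    """Alternating red O / green T distractors with the target placed last."""
    distractors = [{"color": "red", "shape": "O"}, {"color": "green", "shape": "T"}]
    items = [distractors[i % 2] for i in range(n_distractors)]
    items.append(target)
    return items

POP_OUT_TARGET = {"color": "blue", "shape": "O"}      # unique color among the distractors
CONJUNCTION_TARGET = {"color": "red", "shape": "T"}   # shares one feature with each distractor

for n in (4, 8, 16):
    _, feature_steps = feature_search(make_display(n, POP_OUT_TARGET), "color", "blue")
    _, conjunction_steps = conjunction_search(
        make_display(n, CONJUNCTION_TARGET), CONJUNCTION_TARGET
    )
    print(n + 1, feature_steps, conjunction_steps)
# prints:
# 5 1 5
# 9 1 9
# 17 1 17
```

The feature-search cost stays flat while the conjunction-search cost grows with the number of items, which is the qualitative pattern the theory predicts.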
As a reaction to the feature integration theory, Wolfe (1994) proposed the Guided Search Model 2.0. According to this model, attention is directed to an object or location through a preattentive process. The preattentive process, as Wolfe explains, directs attention in both a bottom-up and a top-down way. Information acquired through both bottom-up and top-down processing is ranked according to priority. The priority ranking guides visual search and makes the search more efficient. Whether the Guided Search Model 2.0 or the feature integration theory is the "correct" theory of visual search is still debated.
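The idea of a priority ranking built from bottom-up and top-down information can be sketched as a weighted sum of two scores per item. This is a heavily simplified illustration and not Wolfe's actual model: the scoring functions, the weights `w_bottom_up` and `w_top_down`, and the display are assumptions chosen for the example. Here the bottom-up score is how much an item differs from the other items, and the top-down score is how many target features it matches.

```python
def bottom_up(index, display):
    """Salience: mean number of feature dimensions on which an item differs from the others."""
    item = display[index]
    others = [o for i, o in enumerate(display) if i != index]
    if not others:
        return 0.0
    differences = sum(sum(item[d] != other[d] for d in item) for other in others)
    return differences / len(others)

def top_down(item, target):
    """Goal relevance: number of target features the item matches."""
    return sum(item[d] == v for d, v in target.items())

def priority_order(display, target, w_bottom_up=1.0, w_top_down=1.0):
    """Indices of the display items, ranked from highest to lowest combined activation."""
    activation = [
        w_bottom_up * bottom_up(i, display) + w_top_down * top_down(item, target)
        for i, item in enumerate(display)
    ]
    # sorted() is stable, so tied items keep their original display order.
    return sorted(range(len(display)), key=lambda i: -activation[i])

display = [
    {"color": "red", "shape": "O"},
    {"color": "red", "shape": "O"},
    {"color": "green", "shape": "T"},
    {"color": "red", "shape": "T"},    # the target
    {"color": "green", "shape": "T"},
]
target = {"color": "red", "shape": "T"}
print(priority_order(display, target))  # [3, 0, 1, 2, 4]
```

Under these assumptions the target is visited first even though it is defined by a conjunction of features, which illustrates how guidance could make search more efficient than an unordered serial scan.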
To test the notion that attention plays a vital role in visual perception, Treisman and Schmidt (1982) designed an experiment to show that features may exist independently of one another early in processing. Participants were shown a display of four objects flanked by two black numbers. The display was flashed for one-fifth of a second, followed by a random-dot masking field that appeared on screen to eliminate "any residual perception that might remain after the stimuli were turned off". [6] Participants were to report the black numbers they saw, followed by what they saw at each location where the shapes had previously been. The results of this experiment supported Treisman and Schmidt's hypothesis. In 18% of trials, participants reported seeing shapes "made up of a combination of features from two different stimuli", [7] even when the stimuli differed greatly; this is often referred to as an illusory conjunction. Illusory conjunctions also occur outside the laboratory. For example, you may glimpse a passing person wearing a red shirt and yellow hat and very quickly transform him or her into one wearing a yellow shirt and red hat. The feature integration theory provides an explanation for illusory conjunctions: because features exist independently of one another during early processing and are not associated with a specific object, they can easily be combined incorrectly, both in laboratory settings and in real-life situations. [8]
As previously mentioned, patients with Bálint's syndrome have provided support for the feature integration theory. In particular, research participant R.M., who had Bálint's syndrome and was unable to focus attention on individual objects, experienced illusory conjunctions when presented with simple stimuli such as a "blue O" or a "red T". In 23% of trials, even when able to view the stimulus for as long as 10 seconds, R.M. reported seeing a "red O" or a "blue T". [9] This finding is in accordance with feature integration theory's prediction that a person who lacks focused attention will erroneously combine features.
If people use their prior knowledge or experience to perceive an object, they are less likely to make mistakes, or illusory conjunctions. To explain this phenomenon, Treisman and Souther (1986) conducted an experiment in which they presented participants with three shapes for which illusory conjunctions could occur. Surprisingly, when participants were told that they were being shown a carrot, lake, and tire (in place of the orange triangle, blue oval, and black circle, respectively), illusory conjunctions did not occur. [10] Treisman maintained that prior knowledge plays an important role in accurate perception. Normally, bottom-up processing is used for identifying novel objects; once we recall prior knowledge, however, top-down processing is used. This explains why people are better at identifying familiar objects than unfamiliar ones.
When a reader identifies letters, not only are the letters' shapes picked up but also other features, such as their colors and surrounding elements. Individual letters are processed serially when spatially conjoined with another letter. The locations of each feature of a letter are not known in advance, even while the letter is in front of the reader. Since the location of the letter's features and/or the location of the letter is unknown, feature interchanges can occur if one is not attentively focused. This is known as lateral masking, which, in this case, refers to a difficulty in separating a letter from the background. [11]
Perception is the organization, identification, and interpretation of sensory information in order to represent and understand the presented information or environment. All perception involves signals that go through the nervous system, which in turn result from physical or chemical stimulation of the sensory system. Vision involves light striking the retina of the eye; smell is mediated by odor molecules; and hearing involves pressure waves.
Attention, or focus, is the concentration of awareness on some phenomenon to the exclusion of other stimuli. It is the selective concentration on discrete information, either subjectively or objectively. William James (1890) wrote that "Attention is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence." Attention has also been described as the allocation of limited cognitive processing resources. Attention is manifested by an attentional bottleneck in the amount of data the brain can process each second; for example, in human vision, less than 1% of the visual input data stream of 1 MByte/sec can enter the bottleneck, leading to inattentional blindness.
In psychology, parallel processing is the ability of the brain to simultaneously process incoming stimuli of differing quality. Parallel processing is associated with the visual system in that the brain divides what it sees into four components: color, motion, shape, and depth. These are individually analyzed and then compared to stored memories, which helps the brain identify what you are viewing. The brain then combines all of these into the field of view that is then seen and comprehended. This is a continual and seamless operation. For example, if one is standing between two different groups of people who are simultaneously carrying on two different conversations, one may be able to pick up only some information from both conversations at the same time. Parallel processing has been linked, by some experimental psychologists, to the Stroop effect. In the Stroop effect, an inability to attend to all stimuli is seen through people's selective attention.
The consciousness and binding problem is the problem of how objects, background and abstract or emotional features are combined into a single experience.
Simultanagnosia is a rare neurological disorder characterized by the inability of an individual to perceive more than a single object at a time. This type of visual attention problem is one of three major components of Bálint's syndrome, an uncommon and incompletely understood variety of severe neuropsychological impairments involving space representation. The term "simultanagnosia" was first coined in 1924 by Wolpert to describe a condition where the affected individual could see individual details of a complex scene but failed to grasp the overall meaning of the image.
Anne Marie Treisman was an English psychologist who specialised in cognitive psychology.
Inhibition of return (IOR) refers to an orientation mechanism that briefly enhances the speed and accuracy with which an object is detected after the object is attended, but then impairs detection speed and accuracy. IOR is usually measured with a cue-response paradigm, in which a person presses a button when they detect a target stimulus following the presentation of a cue that indicates the location in which the target will appear. The cue can be exogenous, or endogenous. Inhibition of return results from oculomotor activation, regardless of whether it was produced by exogenous signals or endogenously. Although IOR occurs for both visual and auditory stimuli, IOR is greater for visual stimuli, and is studied more often than auditory stimuli.
Visual search is a type of perceptual task requiring attention that typically involves an active scan of the visual environment for a particular object or feature among other objects or features. Visual search can take place with or without eye movements. The ability to consciously locate an object or target amongst a complex array of stimuli has been extensively studied over the past 40 years. Practical examples of using visual search can be seen in everyday life, such as when one is picking out a product on a supermarket shelf, when animals are searching for food among piles of leaves, when trying to find a friend in a large crowd of people, or simply when playing visual search games such as Where's Wally?
Illusory conjunctions are psychological effects in which participants combine features of two objects into one object. There are visual illusory conjunctions, auditory illusory conjunctions, and illusory conjunctions produced by combinations of visual and tactile stimuli. Visual illusory conjunctions are thought to occur due to a lack of visual spatial attention, which depends on fixation and the amount of time allotted to focus on an object. With a short span of time to interpret an object, blending of different aspects within a region of the visual field – like shapes and colors – can occasionally be skewed, which results in visual illusory conjunctions. For example, in a study designed by Anne Treisman and Schmidt, participants were required to view a visual presentation of numbers and shapes in different colors. Some shapes were larger than others but all shapes and numbers were evenly spaced and shown for just 200 ms. When the participants were asked to recall the shapes they reported answers such as a small green triangle instead of a small green circle. If the space between the objects is smaller, illusory conjunctions occur more often.
N2pc refers to an ERP component linked to selective attention. The N2pc appears over visual cortex contralateral to the location in space to which subjects are attending; if subjects pay attention to the left side of the visual field, the N2pc appears in the right hemisphere of the brain, and vice versa. This characteristic makes it a useful tool for directly measuring the general direction of a person's attention with fine-grained temporal resolution.
Attenuation theory, also known as Treisman’s Attenuation Model, is a model of selective attention proposed by Anne Treisman, and can be seen as a revision of Donald Broadbent's filter model. Treisman proposed attenuation theory as a means to explain how unattended stimuli sometimes came to be processed in a more rigorous manner than what Broadbent's filter model could account for. As a result, attenuation theory added layers of sophistication to Broadbent's original idea of how selective attention might operate: claiming that instead of a filter which barred unattended inputs from ever entering awareness, it was a process of attenuation. Thus, the attenuation of unattended stimuli would make it difficult, but not impossible to extract meaningful content from irrelevant inputs, so long as stimuli still possessed sufficient "strength" after attenuation to make it through a hierarchical analysis process.
Broadbent's filter model is an early selection theory of attention.
Object-based attention refers to the relationship between an ‘object’ representation and a person’s visually stimulated, selective attention, as opposed to a relationship involving either a spatial or a feature representation; although these types of selective attention are not necessarily mutually exclusive. Research into object-based attention suggests that attention improves the quality of the sensory representation of a selected object, and results in the enhanced processing of that object’s features.
Biased competition theory advocates the idea that each object in the visual field competes for cortical representation and cognitive processing. This theory suggests that the process of visual processing can be biased by other mental processes such as bottom-up and top-down systems which prioritize certain features of an object or whole items for attention and further processing. Biased competition theory is, simply stated, the competition of objects for processing. This competition can be biased, often toward the object that is currently attended in the visual field, or alternatively toward the object most relevant to behavior.
Emotion perception refers to the capacities and abilities of recognizing and identifying emotions in others, in addition to biological and physiological processes involved. Emotions are typically viewed as having three components: subjective experience, physical changes, and cognitive appraisal; emotion perception is the ability to make accurate decisions about another's subjective experience by interpreting their physical changes through sensory systems responsible for converting these observed changes into mental representations. The ability to perceive emotion is believed to be both innate and subject to environmental influence and is also a critical component in social interactions. How emotion is experienced and interpreted depends on how it is perceived. Likewise, how emotion is perceived is dependent on past experiences and interpretations. Emotion can be accurately perceived in humans. Emotions can be perceived visually, audibly, through smell and also through bodily sensations and this process is believed to be different from the perception of non-emotional material.
Natural scene perception refers to the process by which an agent visually takes in and interprets scenes that it typically encounters in natural modes of operation. This process has been modeled in several different ways that are guided by different concepts.
In cognitive psychology, intertrial priming is an accumulation of the priming effect over multiple trials, where "priming" is the effect of the exposure to one stimulus on subsequently presented stimuli. Intertrial priming occurs when a target feature is repeated from one trial to the next, and typically results in speeded response times to the target. A target is the stimulus participants are required to search for. For example, intertrial priming occurs when the task is to respond to either a red or a green target, and the response time to a red target is faster if the preceding trial also has a red target.
Visual spatial attention is a form of visual attention that involves directing attention to a location in space. Similar to its temporal counterpart visual temporal attention, these attention modules have been widely implemented in video analytics in computer vision to provide enhanced performance and human interpretable explanation of deep learning models.
Visual indexing theory, also known as FINST theory, is a theory of early visual perception developed by Zenon Pylyshyn in the 1980s. It proposes a pre-attentive mechanism whose function is to individuate salient elements of a visual scene, and track their locations across space and time. Developed in response to what Pylyshyn viewed as limitations of prominent theories of visual perception at the time, visual indexing theory is supported by several lines of empirical evidence.
Ensemble coding, also known as ensemble perception or summary representation, is a theory in cognitive neuroscience about the internal representation of groups of objects in the human mind. Ensemble coding proposes that such information is recorded via summary statistics, particularly the average or variance. Experimental evidence tends to support the theory for low-level visual information, such as shapes and sizes, as well as some high-level features such as face gender. Nonetheless, it remains unclear the extent to which ensemble coding applies to high-level or non-visual stimuli, and the theory remains the subject of active research.
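The notion of a summary representation can be made concrete with a short sketch that reduces a set of items to its average and variance. This is only an illustration of what "summary statistics" means here, not a model of how the brain computes them; the function name `summarize` and the item sizes are hypothetical.

```python
import statistics

def summarize(sizes):
    """Reduce a group of item sizes to two summary statistics."""
    return {
        "mean": statistics.mean(sizes),
        "variance": statistics.pvariance(sizes),  # population variance of the group
    }

# Hypothetical sizes of four circles in a display (arbitrary units).
circle_sizes = [2.0, 4.0, 4.0, 6.0]
print(summarize(circle_sizes))  # {'mean': 4.0, 'variance': 2.0}
```

The summary discards the individual sizes and keeps only two numbers for the whole group, which is the sense in which the representation is an ensemble rather than a list of objects.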