In the study of vision, visual short-term memory (VSTM) is one of three broad memory systems, alongside iconic memory and long-term memory. VSTM is a type of short-term memory, but one limited to information within the visual domain.
The term VSTM refers in a theory-neutral manner to the non-permanent storage of visual information over an extended period of time. [1] The visuospatial sketchpad is a VSTM subcomponent within the theoretical model of working memory proposed by Alan Baddeley, in which it is argued that working memory aids in mental tasks like planning and comparison. [2] [3] Whereas iconic memories are fragile, decay rapidly, and cannot be actively maintained, visual short-term memories are robust to subsequent stimuli and last over many seconds. VSTM is distinguished from long-term memory, on the other hand, primarily by its very limited capacity. [4] [5]
The introduction of stimuli which were hard to verbalize, and unlikely to be held in long-term memory, revolutionized the study of VSTM in the early 1970s. [6] [7] [8] The basic experimental technique required observers to indicate whether two matrices, [7] [8] or figures, [6] separated by a short temporal interval, were the same. The finding that observers were able to report that a change had occurred, at levels significantly above chance, indicated that they were able to encode aspects of the first stimulus in a purely visual store, at least for the period until the presentation of the second stimulus. However, as the stimuli used were complex, and the nature of the change relatively uncontrolled, these experiments left open various questions, such as:
Much effort has been dedicated to investigating the capacity limits of VSTM. In a typical change-detection task, observers are presented with two arrays, composed of a number of stimuli. The two arrays are separated by a short temporal interval, and the task of observers is to decide if the first and second arrays are identical, or whether one item differs across the two displays. [lower-alpha 1] Performance is critically dependent on the number of items in the array. While performance is generally almost perfect for arrays of one or two items, correct responses invariably decline in a monotonic fashion as more items are added. Different theoretical models have been put forward to explain limits on VSTM storage, and distinguishing between them remains an active area of research.
A prominent class of model proposes that observers are limited by the total number of items which can be encoded, because the capacity of VSTM itself is limited. [lower-alpha 2] This type of model has obvious similarities to urn models used in probability theory. [lower-alpha 3] In essence, an urn model assumes that VSTM is restricted in storage capacity to only a few items, k (often estimated to lie in the range of three to five in adults, though fewer in children [9] ). The probability that a suprathreshold change will be detected is simply the probability that the change element is encoded in VSTM (i.e., k/N, where N is the number of items in the array and N is at least k). This capacity limit has been linked to the posterior parietal cortex, the activity of which initially increases with the number of stimuli in the arrays, but saturates at higher set-sizes. [10] Although urn models are used commonly to describe performance limitations in VSTM, [lower-alpha 4] it is only recently that the actual structure of items stored has been considered. Luck and colleagues have reported a series of experiments designed specifically to elucidate the structure of information held in VSTM. [11] This work provides evidence that items stored in VSTM are coherent objects, and not the more elementary features of which those objects are composed.
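The urn-model prediction described above can be illustrated with a minimal Python sketch. It assumes that exactly k of the N displayed items are stored at random, that exactly one item changes, and that a change is detected if and only if the changed item is stored; the function names and the choice of k = 4 are illustrative assumptions, not taken from the cited studies.

```python
import random
from fractions import Fraction


def p_detect(k: int, n: int) -> Fraction:
    """Exact urn-model probability that the changed item is among the stored items.

    k: assumed storage capacity in items; n: number of items in the array.
    When n <= k every item is stored, so detection is certain.
    """
    if n <= 0 or k < 0:
        raise ValueError("n must be positive and k non-negative")
    return Fraction(min(k, n), n)


def simulate(k: int, n: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: store min(k, n) random items, change one random item."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        stored = set(rng.sample(range(n), min(k, n)))
        changed = rng.randrange(n)
        hits += changed in stored
    return hits / trials


# Predicted detection probability for an assumed capacity of k = 4 items.
for n in (1, 2, 4, 6, 8, 12):
    print(n, float(p_detect(4, n)), round(simulate(4, n), 3))
```

Under these assumptions performance is perfect up to the capacity limit and then falls off as k/N, which is the qualitative pattern the urn model is meant to capture.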
An alternative framework has more recently been put forward by Wilken and Ma, who suggest that apparent capacity limitations in VSTM are caused by a monotonic decline in the quality of the internal representations stored (i.e., a monotonic increase in noise) as a function of set-size. In this conception, capacity limitations in memory are not caused by a limit on the number of things that can be encoded, but by a decline in the quality of the representation of each thing as more things are added to memory. In their 2004 experiments, they varied color, spatial frequency, and orientation of objects stored in VSTM using a signal detection theory approach. [lower-alpha 5] The participants were asked to report differences between the visual stimuli presented to them in consecutive order. The investigators found that different stimuli were encoded independently and in parallel, and that the major factor limiting report performance was neuronal noise (which is a function of visual set-size). [12]
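The idea that noise grows with set-size can be sketched with a simple equal-variance signal detection calculation. The power-law noise function, its parameters (`sigma1`, `alpha`), and the fixed decision criterion below are assumptions chosen for illustration; they are not the fitted model from the 2004 experiments, and the sketch ignores the fact that an observer must monitor every item in the display.

```python
import math


def phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def noise_sd(n_items: int, sigma1: float = 1.0, alpha: float = 0.5) -> float:
    """Assumed noise growth with set-size: sigma(N) = sigma1 * N**alpha."""
    return sigma1 * n_items ** alpha


def d_prime(delta: float, n_items: int, sigma1: float = 1.0, alpha: float = 0.5) -> float:
    """Sensitivity to a change of magnitude delta in one stored item."""
    return delta / noise_sd(n_items, sigma1, alpha)


def hit_rate(dp: float, criterion: float) -> float:
    """P(report change | change), with the criterion measured from the no-change mean."""
    return phi(dp - criterion)


def false_alarm_rate(criterion: float) -> float:
    """P(report change | no change)."""
    return phi(-criterion)


# Sensitivity and hit rate fall smoothly as items are added; there is no fixed item limit.
for n in (1, 2, 4, 8):
    dp = d_prime(2.0, n)
    print(n, round(dp, 3), round(hit_rate(dp, 1.0), 3), round(false_alarm_rate(1.0), 3))
```

In contrast to the urn model, nothing here is ever "not stored": every item is represented, but each representation becomes noisier as set-size grows.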
Under this framework, the key limiting factor on working memory performance is the precision with which visual information can be stored, not the number of items that can be remembered. [12] Further evidence for this theory was obtained by Bays and Husain using a discrimination task. They showed that, unlike a "slot" model of VSTM, a signal-detection model could account both for discrimination performance in their study and previous results from change detection tasks. [lower-alpha 6] These authors proposed that VSTM is a flexible resource, shared out between elements of a visual scene—items that receive more resource are stored with greater precision. In support of this, they showed that increasing the salience of one item in a memory array led to that item being recalled with increased resolution, but at the cost of reducing resolution of storage for the other items in the display. [13]
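The trade-off described above, in which a salient item gains resolution at the expense of the others, can be sketched as a fixed resource divided according to weights. The linear mapping from resource share to precision, the constant `k`, and the weight values are illustrative assumptions rather than the relation estimated by Bays and Husain.

```python
import math


def allocate(weights, total=1.0):
    """Split a fixed resource among items in proportion to their (assumed) weights."""
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return [total * w / weight_sum for w in weights]


def recall_sd(share, k=1.0):
    """Assumed mapping: precision = k * share, so recall SD = 1 / sqrt(precision)."""
    return 1.0 / math.sqrt(k * share)


# Four items with equal weights, versus the first item made three times as salient.
equal = [recall_sd(s) for s in allocate([1, 1, 1, 1])]
cued = [recall_sd(s) for s in allocate([3, 1, 1, 1])]

print("equal:", [round(sd, 3) for sd in equal])
print("cued: ", [round(sd, 3) for sd in cued])
```

Because the total resource is fixed, lowering the recall variability of the salient item necessarily raises it for the remaining items.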
Psychophysical experiments suggest that information is encoded in VSTM across multiple parallel channels, each channel associated with a particular perceptual attribute. [14] Within this framework, a decrease in an observer's ability to detect a change with increasing set-size can be attributed to two different processes:
However, the Greenlee-Thomas model [15] suffers from two failings as a model for the effects of set-size in VSTM. First, it has only been empirically tested with displays composed of one or two elements. It has been shown repeatedly in various experimental paradigms that set-size effects differ for displays composed of a relatively small number of elements (i.e., 4 items or fewer), and those associated with larger displays (i.e., more than 4 items). The Greenlee-Thomas model offers no explanation for why this might be so. Second, while Magnussen, Greenlee, and Thomas [18] are able to use this model to predict that greater interference will be found when dual decisions are made within the same perceptual dimension, rather than across different perceptual dimensions, this prediction lacks quantitative rigor, and is unable to accurately anticipate the size of the threshold increase, or give a detailed explanation of its underlying causes.
In addition to the Greenlee-Thomas model, there are two other prominent approaches for describing set-size effects in VSTM. These two approaches can be referred to as sample size models, [19] and urn models. [lower-alpha 7] They differ from the Greenlee-Thomas model by:
There is some evidence of an intermediate visual store with characteristics of both iconic memory and VSTM. [20] This intermediate store is proposed to have high capacity (up to 15 items) and prolonged memory trace duration (up to 4 seconds). It coexists with VSTM but, unlike VSTM, its contents can be overwritten by subsequent visual stimuli. [21] Further studies suggest an involvement of visual area V4 in the retention of information about the color of the stimulus in visual working memory, [22] [23] and a role for the VO1 area in retaining information about its shape. [23] It has been shown that in the VO2 region all characteristics of the stimulus retained in memory are combined into a holistic image. [23]
VSTM is thought to be the visual component of the working memory system, and as such it is used as a buffer for temporary information storage during naturally occurring tasks. But what naturally occurring tasks actually require VSTM? Most work on this issue has focused on the role of VSTM in bridging the sensory gaps caused by saccadic eye movements. These sudden shifts of gaze typically occur 2–4 times per second, and vision is briefly suppressed while the eyes are moving. Thus, the visual input consists of a series of spatially shifted snapshots of the overall scene, separated by brief gaps. Over time, a rich and detailed long-term memory representation is constructed from these brief glimpses of the input, and VSTM is thought to bridge the gaps between these glimpses and to allow the relevant portions of one glimpse to be aligned with the relevant portions of the next glimpse. Both spatial and object VSTM systems may play important roles in the integration of information across eye movements. Eye movements are also affected by VSTM representations. The constructed representations held in VSTM can affect eye movements even when the task does not explicitly require eye movements: the direction of small microsaccades points towards the location of objects in VSTM. [24]
Long-term memory (LTM) is the stage of the Atkinson–Shiffrin memory model in which informative knowledge is held indefinitely. It is defined in contrast to sensory memory, the initial stage, and short-term or working memory, the second stage, which persists for about 18 to 30 seconds. LTM is grouped into two categories known as explicit memory and implicit memory. Explicit memory is broken down into episodic and semantic memory, while implicit memory includes procedural memory and emotional conditioning.
Short-term memory is the capacity for holding a small amount of information in an active, readily available state for a short interval. For example, short-term memory holds a phone number that has just been recited. The duration of short-term memory is estimated to be on the order of seconds. The commonly cited capacity of 7 items, found in Miller's Law, has been superseded by 4±1 items. In contrast, long-term memory holds information indefinitely.
The Atkinson–Shiffrin model is a model of memory proposed in 1968 by Richard Atkinson and Richard Shiffrin. The model asserts that human memory has three separate components:
Iconic memory is the visual sensory memory register, a fast-decaying store of visual information. It is a component of the visual memory system, which also includes visual short-term memory (VSTM) and long-term memory (LTM). Iconic memory is described as a very brief, pre-categorical, high capacity memory store. It contributes to VSTM by providing a coherent representation of our entire visual perception for a very brief period of time. Iconic memory assists in accounting for phenomena such as change blindness and continuity of experience during saccades. Iconic memory is no longer thought of as a single entity but is instead thought to be composed of at least two distinctive components. Classic experiments, including Sperling's partial report paradigm, as well as modern techniques continue to provide insight into the nature of this sensory memory store.
In cognitive psychology, chunking is a process by which small individual pieces of a set of information are bound together to create a meaningful whole later on in memory. The chunks, by which the information is grouped, are meant to improve short-term retention of the material, thus bypassing the limited capacity of working memory and allowing the working memory to be more efficient. A chunk is a collection of basic units that are strongly associated with one another, and have been grouped together and stored in a person's memory. These chunks can be retrieved easily due to their coherent grouping. It is believed that individuals create higher-order cognitive representations of the items within the chunk. The items are more easily remembered as a group than as the individual items themselves. These chunks can be highly subjective because they rely on an individual's perceptions and past experiences, which are linked to the information set. The size of the chunks generally ranges from two to six items but often differs based on language and culture.
Baddeley's model of working memory is a model of human memory proposed by Alan Baddeley and Graham Hitch in 1974, in an attempt to present a more accurate model of primary memory. Working memory splits primary memory into multiple components, rather than considering it to be a single, unified construct.
Visual memory describes the relationship between perceptual processing and the encoding, storage and retrieval of the resulting neural representations. Visual memory operates over a broad time range, from the brief intervals spanning eye movements to the years over which one can still visually navigate to a previously visited location. Visual memory is a form of memory which preserves some characteristics of our senses pertaining to visual experience. We are able to place in memory visual information which resembles objects, places, animals or people in a mental image. The experience of visual memory is also referred to as the mind's eye, through which we can retrieve from our memory a mental image of original objects, places, animals or people. Visual memory is one of several cognitive systems, which are all interconnected parts that combine to form the human memory. Palinopsia, the persistence or recurrence of a visual image after the stimulus has been removed, is a dysfunction of visual memory.
Inattentional blindness or perceptual blindness occurs when an individual fails to perceive an unexpected stimulus in plain sight, purely as a result of a lack of attention rather than any vision defects or deficits. When it becomes impossible to attend to all the stimuli in a given situation, a temporary "blindness" effect can occur, as individuals fail to see unexpected but often salient objects or stimuli.
Attentional blink (AB) is a phenomenon that reflects temporal limitations in the ability to deploy visual attention. When people must identify two visual stimuli in quick succession, accuracy for the second stimulus is poor if it occurs within 200 to 500 ms of the first.
Visual search is a type of perceptual task requiring attention that typically involves an active scan of the visual environment for a particular object or feature among other objects or features. Visual search can take place with or without eye movements. The ability to consciously locate an object or target amongst a complex array of stimuli has been extensively studied over the past 40 years. Practical examples of using visual search can be seen in everyday life, such as when one is picking out a product on a supermarket shelf, when animals are searching for food among piles of leaves, when trying to find a friend in a large crowd of people, or simply when playing visual search games such as Where's Wally?
Memory has the ability to encode, store and recall information. Memories give an organism the capability to learn and adapt from previous experiences as well as build relationships. Encoding allows a perceived item of use or interest to be converted into a construct that can be stored within the brain and recalled later from long-term memory. Working memory stores information for immediate use or manipulation, which is aided through hooking onto previously archived items already present in the long-term memory of an individual.
Repetition priming refers to improvements in a behavioural response when stimuli are repeatedly presented. The improvements can be measured in terms of accuracy or reaction time and can occur when the repeated stimuli are either identical or similar to previous stimuli. These improvements have been shown to be cumulative, so as the number of repetitions increases the responses get continually faster up to a maximum of around seven repetitions. These improvements are also found when the repeated items are changed slightly in terms of orientation, size and position. The size of the effect is also modulated by the length of time the item is presented for and the length of time between the first and subsequent presentations of the repeated items.
Priming is the idea that exposure to one stimulus may influence a response to a subsequent stimulus, without conscious guidance or intention. The priming effect is the positive or negative effect of a rapidly presented stimulus on the processing of a second stimulus that appears shortly after. Generally speaking, a priming effect depends on the existence of some positive or negative relationship between the priming and target stimuli. For example, the word nurse might be recognized more quickly following the word doctor than following the word bread. Priming can be perceptual, associative, repetitive, positive, negative, affective, semantic, or conceptual. Priming effects involve word recognition, semantic processing, attention, unconscious processing, and many other issues, and are related to differences in various writing systems. Onset of priming effects can be almost instantaneous.
Emotion can have a powerful effect on humans and animals. Numerous studies have shown that the most vivid autobiographical memories tend to be of emotional events, which are likely to be recalled more often and with more clarity and detail than neutral events.
Perceptual learning is learning better perception skills such as differentiating two musical tones from one another or categorizations of spatial and temporal patterns relevant to real-world expertise. Examples of this may include reading, seeing relations among chess pieces, and knowing whether or not an X-ray image shows a tumor.
Haptic memory is the form of sensory memory specific to touch stimuli. Haptic memory is used regularly when assessing the necessary forces for gripping and interacting with familiar objects. It may also influence one's interactions with novel objects of an apparently similar size and density. Similar to visual iconic memory, traces of haptically acquired information are short lived and prone to decay after approximately two seconds. Haptic memory is best for stimuli applied to areas of the skin that are more sensitive to touch. Haptics involves at least two subsystems: cutaneous, or everything skin related, and kinesthetic, or joint angle and the relative location of the body. Haptics generally involves active, manual examination and is quite capable of processing physical traits of objects and surfaces.
Broadbent's filter model is an early selection theory of attention.
In cognitive psychology, intertrial priming is an accumulation of the priming effect over multiple trials, where "priming" is the effect of the exposure to one stimulus on subsequently presented stimuli. Intertrial priming occurs when a target feature is repeated from one trial to the next, and typically results in speeded response times to the target. A target is the stimulus participants are required to search for. For example, intertrial priming occurs when the task is to respond to either a red or a green target, and the response time to a red target is faster if the preceding trial also has a red target.
During every moment of an organism's life, sensory information is being taken in by sensory receptors and processed by the nervous system. Sensory information is stored in sensory memory just long enough to be transferred to short-term memory. Humans have five traditional senses: sight, hearing, taste, smell, touch. Sensory memory (SM) allows individuals to retain impressions of sensory information after the original stimulus has ceased. A common demonstration of SM is a child's ability to write letters and make circles by twirling a sparkler at night. When the sparkler is spun fast enough, it appears to leave a trail which forms a continuous image. This "light trail" is the image that is represented in the visual sensory store known as iconic memory. The other two types of SM that have been most extensively studied are echoic memory, and haptic memory; however, it is reasonable to assume that each physiological sense has a corresponding memory store. For example, children have been shown to remember specific "sweet" tastes during incidental learning trials but the nature of this gustatory store is still unclear. However, sensory memories might be related to a region of the thalamus, which serves as a source of signals encoding past experiences in the neocortex.
Ensemble coding, also known as ensemble perception or summary representation, is a theory in cognitive neuroscience about the internal representation of groups of objects in the human mind. Ensemble coding proposes that such information is recorded via summary statistics, particularly the average or variance. Experimental evidence tends to support the theory for low-level visual information, such as shapes and sizes, as well as some high-level features such as face gender. Nonetheless, the extent to which ensemble coding applies to high-level or non-visual stimuli remains unclear, and the theory remains the subject of active research.
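As a concrete illustration of what a summary representation might contain, the following Python sketch reduces a set of item sizes to its mean and variance. The function name and the example sizes are hypothetical, and the sketch makes no claim about how the visual system actually computes such statistics.

```python
from statistics import mean, pvariance


def ensemble_summary(values):
    """Reduce a set of feature values (e.g. item sizes) to its mean and variance."""
    return mean(values), pvariance(values)


# Hypothetical sizes of four items in a display, in arbitrary units.
sizes = [1.0, 2.0, 3.0, 4.0]
average, spread = ensemble_summary(sizes)
print(average, spread)
```

The two numbers stand in for the whole set: the individual sizes cannot be recovered from them, which is the sense in which the representation is a summary.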