Eye movement in scene viewing

A Parisian street scene painted by Jean Béraud. A fashionable beauty reads the posters on the kiosk while two gentlemen surreptitiously ogle her.

Eye movement in scene viewing refers to the visual processing of information presented in scenes. This phenomenon has been studied in a range of areas such as cognitive psychology and psychophysics, where eye movement can be monitored under experimental conditions. A core aspect of these studies is the division of eye movements into saccades, the rapid movements of the eyes, and fixations, the focus of the eyes on a point. Several factors influence eye movement in scene viewing: the task and knowledge of the viewer (top-down factors) and the properties of the image being viewed (bottom-up factors). The study of eye movement in scene viewing helps to understand visual processing in more natural environments.

Eye movement includes the voluntary or involuntary movement of the eyes, helping in acquiring, fixating and tracking visual stimuli. A special type of eye movement, rapid eye movement, occurs during REM sleep.

Visual processing refers to the brain's ability to use and interpret visual information from the world around us. Converting light energy into a meaningful image is a complex process facilitated by numerous brain structures and higher-level cognitive processes. On an anatomical level, light energy first enters the eye through the cornea, where the light is bent. After passing through the cornea, light passes through the pupil and then the lens of the eye, where it is bent to a greater degree and focused upon the retina. The retina is where a group of light-sensing cells, called photoreceptors, are located. There are two types of photoreceptors: rods and cones. Rods are sensitive to dim light, and cones are better able to transduce bright light. Photoreceptors connect to bipolar cells, which induce action potentials in retinal ganglion cells. These retinal ganglion cells form a bundle at the optic disc, which is a part of the optic nerve. The optic nerves from the two eyes meet at the optic chiasm, where nerve fibers from each nasal retina cross. As a result, the right half of each eye's visual field is represented in the left hemisphere and the left half of each eye's visual field is represented in the right hemisphere. The optic tract then diverges into two visual pathways, the geniculostriate pathway and the tectopulvinar pathway, which send visual information to the visual cortex of the occipital lobe for higher-level processing.

Cognitive psychology is the scientific study of mental processes such as "attention, language use, memory, perception, problem solving, creativity, and thinking". Much of the work derived from cognitive psychology has been integrated into other areas of psychological study, including educational psychology, social psychology, personality psychology, abnormal psychology, and developmental psychology, and into other modern disciplines such as cognitive science, linguistics, and economics.

Typically, when presented with a scene, viewers demonstrate short fixation durations and long saccade amplitudes in the earlier phases of viewing an image, representing ambient processing. This is followed by longer fixations and shorter saccades in the latter phases of scene viewing, representing focal processing (Pannasch et al., 2008).
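The ambient-to-focal progression described above can be illustrated with a minimal sketch that labels each fixation from its duration and the amplitude of the saccade that follows it. The `Fixation` type, the function names, and both thresholds (180 ms and 5 degrees) are assumptions chosen for illustration; they are not taken from Pannasch et al. (2008).

```python
from dataclasses import dataclass

# Hypothetical cut-offs, used only to make the example concrete.
DURATION_THRESHOLD_MS = 180.0
AMPLITUDE_THRESHOLD_DEG = 5.0


@dataclass
class Fixation:
    duration_ms: float       # how long the eyes rested at this point
    next_saccade_deg: float  # amplitude of the saccade that follows


def processing_mode(fix, dur_thresh=DURATION_THRESHOLD_MS,
                    amp_thresh=AMPLITUDE_THRESHOLD_DEG):
    """Label one fixation as 'ambient', 'focal', or 'mixed'."""
    short_fixation = fix.duration_ms < dur_thresh
    long_saccade = fix.next_saccade_deg > amp_thresh
    if short_fixation and long_saccade:
        return "ambient"   # short fixation followed by a long saccade
    if not short_fixation and not long_saccade:
        return "focal"     # long fixation followed by a short saccade
    return "mixed"         # the two measures disagree


def label_scanpath(fixations):
    """Label every fixation of a scanpath in viewing order."""
    return [processing_mode(f) for f in fixations]
```

For a scanpath that starts with short fixations and long saccades and then settles into longer fixations, the labels would be expected to shift from "ambient" towards "focal" over the course of viewing.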

Eye movement behaviour in scene viewing differs across levels of cognitive development. Fixation durations shorten and saccade amplitudes lengthen with increasing age. In children, saccades reach the amplitude normally found in adults earlier (4–6 years old) than fixation durations reach adult levels (6–8 years old). Yet the typical progression during scene viewing from ambient processing to focal processing has been observed from the age of 2 years (Helo, Pannasch, Sirri & Rämä, 2014).

Spatial variation

Particular factors affect where the eyes fixate: bottom-up factors inherent to the stimulus and top-down factors inherent to the viewer. Even an initial glimpse of a scene has been found to generate an abstract representation of the image that can be stored in memory for use in subsequent eye movements (Castelhano & Henderson, 2007).

Among bottom-up factors, eye guidance can be affected by the local contrast or salience of features in an image (Itti & Koch, 2000). Examples include an area with a large difference in luminance (Parkhurst et al., 2002), a greater density of edges (Mannan, Ruddock & Wooding, 1996), or binocular disparity, which signals the distance of different objects in the scene (Jansen et al., 2009).
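As a rough illustration of one bottom-up cue, local luminance contrast, the sketch below scores non-overlapping square patches of a grayscale image by the standard deviation of their pixel values and returns the patch with the highest score. This is a deliberately simplified stand-in, not the saliency model of Itti & Koch (2000); the function names and the patch-based measure are assumptions.

```python
from statistics import pstdev


def patch_contrast(image, top, left, size):
    """Standard deviation of luminance inside one square patch."""
    values = [image[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    return pstdev(values)


def most_contrasting_patch(image, size):
    """Return (top, left, contrast) of the non-overlapping patch
    with the highest luminance contrast.

    `image` is a list of rows of grayscale values; patches that would
    run past the image border are skipped.
    """
    best = None
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            contrast = patch_contrast(image, top, left, size)
            if best is None or contrast > best[2]:
                best = (top, left, contrast)
    return best
```

On an otherwise uniform image, the patch containing a sharp light-dark transition is selected, which mirrors the idea that regions of high local contrast are candidate fixation targets.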

Top-down factors have more of an impact than bottom-up features on fixation positions. Behaviourally relevant, more interesting information in a scene is more salient than low-level features, drawing fixations more frequently and more quickly from scene onset (Onat, Açik, Schumann & König, 2014). Local scene colour at a fixation position also influences where fixations occur. The presence of colour can increase the likelihood of an item being processed as a semantic object, as it can aid discrimination of the object and make it more interesting to view (Amano & Foster, 2014). When viewers are semantically primed by being presented with consistently similar scenes, the density of fixations increases and fixation durations decrease (Henderson, Weeks Jr., & Hollingworth, 1999).

Information separate from what is presented in a scene also affects the area being fixated. Eye movements can be guided anticipatorily by linguistic input: if an item in the scene is mentioned verbally, the listener is more likely to move their visual focus to that object (Staub, Abbott & Bogartz, 2012). With regard to factors relating to viewers rather than the scene, differences have been found in cross-cultural research. Westerners tend to concentrate on focal objects in a scene, looking at them more often and more quickly than East Asians, who attend more to contextual information and make more saccades to the background of the scene (Chua, Boland & Nisbett, 2002).

Temporal variation

Fixations last about 300 ms on average, although there is large variability around this figure. Some of this variability can be explained by global properties of an image, which affect both bottom-up and top-down processing.

During natural scene viewing, masking an image by replacing it with a grey field during fixations increases fixation durations (Henderson & Pierce, 2008). More subtle degradations of an image, such as a decrease in luminance during fixations, also lengthen fixation durations (Henderson, Nuthmann & Luke, 2013). The effect is asymmetric: an increase in luminance also increases fixation durations (Walshe & Nuthmann, 2014). However, manipulating factors that affect top-down processing, such as blurring or phase noise, increases fixation durations when it degrades a scene and decreases them when it enhances a scene (Henderson, Olejarczyk, Luke & Schmidt, 2014; Einhäuser et al., 2006).

Gaussian blur

In image processing, a Gaussian blur is the result of blurring an image by a Gaussian function. It is a widely used effect in graphics software, typically to reduce image noise and reduce detail. The visual effect of this blurring technique is a smooth blur resembling that of viewing the image through a translucent screen, distinctly different from the bokeh effect produced by an out-of-focus lens or the shadow of an object under usual illumination. Gaussian smoothing is also used as a pre-processing stage in computer vision algorithms in order to enhance image structures at different scales—see scale space representation and scale space implementation.
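A minimal sketch of Gaussian blurring on a grayscale image stored as a list of rows is given below. It builds a normalised one-dimensional Gaussian kernel and applies it first along rows and then along columns, relying on the fact that the two-dimensional Gaussian is separable. The function names, the default radius of three standard deviations, and the edge-replication border handling are assumptions made for this example.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, repeating edge pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for c in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                # Clamp the source index so borders reuse the edge pixel.
                src = min(max(c + k - radius, 0), width - 1)
                acc += w * row[src]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    """Swap rows and columns."""
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma, radius=None):
    """Blur horizontally, then vertically, with the same 1D kernel."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))  # assumed cut-off
    kernel = gaussian_kernel(sigma, radius)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))
```

Blurring a single bright pixel with this sketch spreads its intensity symmetrically over its neighbours, and a uniform image is left unchanged because the kernel weights sum to one.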

In signal processing, phase noise is the frequency-domain representation of random fluctuations in the phase of a waveform, corresponding to time-domain deviations from perfect periodicity ("jitter"). Generally speaking, radio-frequency engineers speak of the phase noise of an oscillator, whereas digital-system engineers work with the jitter of a clock.

Furthermore, temporal and spatial aspects interact in a complex manner. When a picture is first presented on the screen, fixations made within the first second are more likely to be directed toward the left side of the scene, whereas the opposite holds true for the remaining part of the presentation (Ossandón et al., 2014).

Related Research Articles

Attention

Attention is the behavioral and cognitive process of selectively concentrating on a discrete aspect of information, whether deemed subjective or objective, while ignoring other perceivable information. It is a state of arousal. It is the taking possession by the mind in clear and vivid form of one out of what seem several simultaneous objects or trains of thought. Focalization, the concentration of consciousness, is of its essence. Attention has also been described as the allocation of limited cognitive processing resources.

Saccade

A saccade is a quick, simultaneous movement of both eyes between two or more phases of fixation in the same direction. In contrast, in smooth pursuit movements, the eyes move smoothly instead of in jumps. The phenomenon can be associated with a shift in frequency of an emitted signal or a movement of a body part or device. Controlled cortically by the frontal eye fields (FEF), or subcortically by the superior colliculus, saccades serve as a mechanism for fixation, rapid eye movement, and the fast phase of optokinetic nystagmus. The word appears to have been coined in the 1880s by French ophthalmologist Émile Javal, who used a mirror on one side of a page to observe eye movement in silent reading, and found that it involves a succession of discontinuous individual movements.

Binocular vision

In biology, binocular vision is a type of vision in which an animal having two eyes is able to perceive a single three-dimensional image of its surroundings. Neurological researcher Manfred Fahle has stated six specific advantages of having two eyes rather than just one:

  1. It gives a creature a spare eye in case one is damaged.
  2. It gives a wider field of view. For example, humans have a maximum horizontal field of view of approximately 190 degrees with two eyes, approximately 120 degrees of which makes up the binocular field of view flanked by two uniocular fields of approximately 40 degrees.
  3. It can give stereopsis in which binocular disparity provided by the two eyes' different positions on the head gives precise depth perception. This also allows a creature to break the camouflage of another creature.
  4. It allows the angles of the eyes' lines of sight, relative to each other (vergence), and those lines relative to a particular object to be determined from the images in the two eyes. These properties are necessary for the third advantage.
  5. It allows a creature to see more of, or all of, an object behind an obstacle. This advantage was pointed out by Leonardo da Vinci, who noted that a vertical column closer to the eyes than an object at which a creature is looking might block some of the object from the left eye but that part of the object might be visible to the right eye.
  6. It gives binocular summation in which the ability to detect faint objects is enhanced.

Iconic memory is the visual sensory memory (SM) register pertaining to the visual domain and a fast-decaying store of visual information. It is a component of the visual memory system which also includes visual short-term memory (VSTM) and long-term memory (LTM). Iconic memory is described as a very brief, pre-categorical, high-capacity memory store. It contributes to VSTM by providing a coherent representation of our entire visual perception for a very brief period of time. Iconic memory assists in accounting for phenomena such as change blindness and continuity of experience during saccades. Iconic memory is no longer thought of as a single entity but is instead considered to be composed of at least two distinctive components. Classic experiments including Sperling's partial report paradigm as well as modern techniques continue to provide insight into the nature of this SM store.

Human eye

The human eye is an organ which reacts to light and pressure. As a sense organ, the mammalian eye allows vision. Human eyes help to provide a three-dimensional, moving image, normally coloured in daylight. Rod and cone cells in the retina allow conscious light perception and vision, including color differentiation and the perception of depth. The human eye can differentiate between about 10 million colors and is possibly capable of detecting a single photon.

Superior colliculus

The superior colliculus is a paired structure of the mammalian midbrain. In other vertebrates the homologous structure is known as the optic tectum or simply tectum. The adjective form tectal is commonly used for mammals as well as other vertebrates.

Eye tracking

Eye tracking is the process of measuring either the point of gaze or the motion of an eye relative to the head. An eye tracker is a device for measuring eye positions and eye movement. Eye trackers are used in research on the visual system, in psychology, in psycholinguistics, marketing, as an input device for human-computer interaction, and in product design. There are a number of methods for measuring eye movement. The most popular variant uses video images from which the eye position is extracted. Other methods use search coils or are based on the electrooculogram.

Microsaccades are a kind of fixational eye movement. They are small, jerk-like, involuntary eye movements, similar to miniature versions of voluntary saccades. They typically occur during prolonged visual fixation, not only in humans, but also in animals with foveal vision. Microsaccade amplitudes vary from 2 to 120 arcminutes. The first empirical evidence for their existence was provided by Robert Darwin, the father of Charles Darwin.

Fixation (visual)

Fixation or visual fixation is the maintaining of the visual gaze on a single location. An animal can exhibit visual fixation if it possesses a fovea in the anatomy of its eye. The fovea is typically located at the center of the retina and is the point of clearest vision. The species in which fixational eye movement has been found thus far include humans, primates, cats, rabbits, turtles, salamanders, and owls. Regular eye movement alternates between saccades and visual fixations, the notable exception being smooth pursuit, which is controlled by a different neural substrate that appears to have developed for hunting prey. The term "fixation" can refer either to the point in time and space of focus or to the act of fixating. Fixation, in the sense of the act of fixating, is the period between any two saccades, during which the eyes are relatively stationary and virtually all visual input occurs. In the absence of retinal jitter, a laboratory condition known as retinal stabilization, perceptions tend to rapidly fade away. To maintain visibility, the nervous system carries out a mechanism called fixational eye movement, which continuously stimulates neurons in the early visual areas of the brain responding to transient stimuli. There are three categories of fixational eye movements: microsaccades, ocular drifts, and ocular microtremor. Although the existence of these movements has been known since the 1950s, only recently have their functions started to become clear.
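The description of a fixation as the relatively stationary period between two saccades can be turned into a simple velocity-threshold detector. The sketch below is an assumption-laden illustration, not a method described in this article: the sample format `(time_ms, x_deg, y_deg)`, the function name, and the 30 degrees-per-second threshold are all hypothetical choices.

```python
import math


def detect_fixations(samples, velocity_threshold=30.0):
    """Group slow-moving gaze samples into fixations.

    samples: list of (time_ms, x_deg, y_deg) tuples with strictly
             increasing timestamps.
    velocity_threshold: assumed cut-off in degrees per second; faster
             sample-to-sample movement is treated as a saccade.
    Returns a list of (start_ms, end_ms) pairs, one per fixation.
    """
    fixations = []
    start = None
    end = None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt_s = (t1 - t0) / 1000.0
        velocity = math.hypot(x1 - x0, y1 - y0) / dt_s
        if velocity < velocity_threshold:
            if start is None:
                start = t0          # a new fixation begins
            end = t1                # extend the current fixation
        elif start is not None:
            fixations.append((start, end))  # saccade ends the fixation
            start = None
    if start is not None:
        fixations.append((start, end))      # close a trailing fixation
    return fixations
```

Fixation durations then follow directly as `end_ms - start_ms` for each returned pair. Real recordings usually need additional steps, such as noise filtering and a minimum-duration criterion, which are omitted here.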

Eye movement in music reading

Eye movement in music reading is the scanning of a musical score by a musician's eyes. This usually occurs as the music is read during performance, although musicians sometimes scan music silently to study it. The phenomenon has been studied by researchers from a range of backgrounds, including cognitive psychology and music education. These studies have typically reflected a curiosity among performing musicians about a central process in their craft, and a hope that investigating eye movement might help in the development of more effective methods of training musicians' sight reading skills.

Eye movement in reading involves the visual processing of written text. This was described by the French ophthalmologist Louis Émile Javal in the late 19th century. He reported that eyes do not move continuously along a line of text, but make short, rapid movements (saccades) intermingled with short stops (fixations). Javal's observations were characterised by a reliance on naked-eye observation of eye movement in the absence of technology. From the late 19th to the mid-20th century, investigators used early tracking technologies to assist their observation, in a research climate that emphasised the measurement of human behaviour and skill for educational ends. Most basic knowledge about eye movement was obtained during this period. Since the mid-20th century, there have been three major changes: the development of non-invasive eye-movement tracking equipment; the introduction of computer technology to enhance the power of this equipment to pick up, record, and process the huge volume of data that eye movement generates; and the emergence of cognitive psychology as a theoretical and methodological framework within which reading processes are examined. Sereno & Rayner (2003) believed that the best current approach to discover immediate signs of word recognition is through recordings of eye movement and event-related potential.

The gaze-contingency paradigm is a general term for techniques allowing a computer screen display to change in function depending on where the viewer is looking. Gaze-contingent techniques are part of the eye movement field of study in psychology.
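One common gaze-contingent manipulation is a moving window, in which only the region around the current gaze position is shown and the rest of the display is masked. The sketch below assumes a grayscale image stored as a list of rows and a gaze position given as `(row, col)`; the function name, the circular window, and the grey mask value of 128 are illustrative assumptions.

```python
import math


def moving_window(image, gaze, radius, mask_value=128):
    """Keep pixels within `radius` of the gaze point; mask the rest.

    image: list of rows of grayscale values.
    gaze: (row, col) of the current gaze position in pixel units.
    """
    return [[pixel
             if math.hypot(r - gaze[0], c - gaze[1]) <= radius
             else mask_value
             for c, pixel in enumerate(row)]
            for r, row in enumerate(image)]
```

In an experiment the function would be called again whenever the eye tracker reports a new gaze position, so that the visible window follows the viewer's eyes.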

Saccadic suppression of image displacement (SSID) is the phenomenon in visual perception where the brain selectively blocks visual processing during eye movements in such a way that large changes in object location in the visual scene during a saccade or blink are not detected.

Eye–hand coordination is the coordinated control of eye movement with hand movement and the processing of visual input to guide reaching and grasping, along with the use of proprioception of the hands to guide the eyes. Eye–hand coordination has been studied in activities as diverse as the movement of solid objects such as wooden blocks, archery, sporting performance, music reading, computer gaming, copy-typing, and even tea-making. It is part of the mechanisms of performing everyday tasks; in its absence, most people would be unable to carry out even the simplest of actions such as picking up a book from a table or playing a video game. Although it is also known as hand–eye coordination, medical sources without exception, and most psychological sources, use the term eye–hand coordination.

Foveated imaging

Foveated imaging is a digital image processing technique in which the image resolution, or amount of detail, varies across the image according to one or more "fixation points." A fixation point indicates the highest resolution region of the image and corresponds to the center of the eye's retina, the fovea.
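The idea of resolution falling off with distance from a fixation point can be sketched by averaging pixels over blocks whose size grows with eccentricity. This is only one simple way to approximate foveation; the function names, the ring width, the doubling of block size per ring, and the cap on the coarsest level are assumptions made for the example.

```python
import math


def block_size_at(row, col, fixation, ring_width, max_level):
    """Block edge length for a pixel: 1 near the fixation point,
    doubling with each ring of `ring_width` pixels, up to 2**max_level."""
    distance = math.hypot(row - fixation[0], col - fixation[1])
    level = min(int(distance // ring_width), max_level)
    return 2 ** level


def block_mean(image, row, col, size):
    """Mean of the grid-aligned size x size block containing (row, col)."""
    top = (row // size) * size
    left = (col // size) * size
    rows, cols = len(image), len(image[0])
    values = [image[r][c]
              for r in range(top, min(top + size, rows))
              for c in range(left, min(left + size, cols))]
    return sum(values) / len(values)


def foveate(image, fixation, ring_width=2.0, max_level=2):
    """Return a copy of `image` whose detail decreases with distance
    from `fixation`, given as (row, col)."""
    return [[block_mean(image, r, c,
                        block_size_at(r, c, fixation, ring_width, max_level))
             for c in range(len(image[0]))]
            for r in range(len(image))]
```

Pixels at the fixation point keep their original values, while pixels far from it are replaced by averages over progressively larger blocks, which lowers the effective resolution in the periphery.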

Chronostasis is a type of temporal illusion in which the first impression following the introduction of a new event or task-demand to the brain can appear to be extended in time. For example, chronostasis temporarily occurs when fixating on a target stimulus, immediately following a saccade. This elicits an overestimation in the temporal duration for which that target stimulus was perceived. This effect can extend apparent durations by up to 500 ms and is consistent with the idea that the visual system models events prior to perception.

Transsaccadic memory is the neural process that allows humans to perceive their surroundings as a seamless, unified image despite rapid changes in fixation points. The human eyes move rapidly and repeatedly, focusing on a single point for only a short period of time before moving to the next point. These rapid eye movements are called saccades. If a video camera were to perform such high speed changes in focal points, the image on screen would be a blurry, nauseating mess. Despite this rapidly changing input to the visual system, the normal experience is of a stable visual world; an example of perceptual constancy. Transsaccadic memory is a system that contributes to this stability.

Parafovea

Parafovea or the parafoveal belt is a region in the retina that circumscribes the fovea and is part of the macula lutea. It is circumscribed by the perifovea.

Binocular switch suppression (BSS) is a fairly new technique developed to suppress usually salient images from one's awareness. Unlike previous methods such as visual masking, this new empirical method allows one to investigate the neural and behavioural consequences during the period of visual suppression itself and not just after the presentation of the target stimuli. Suppressing usually salient images from an individual's awareness is regarded as a popular experimental manipulation in visual perception and cognitive neuroscience. Some popular and familiar examples of such manipulation include binocular rivalry, continuous flash suppression (CFS), visual masking and flicker switch suppression.

References