Human visual system model

Last updated January 23, 2023

A human visual system model (HVS model) is used by image processing, video processing and computer vision experts to deal with biological and psychological processes that are not yet fully understood. Such a model is used to simplify the behaviours of what is a very complex system. As our knowledge of the true visual system improves, the model is updated.

One example of taking advantage of an HVS model is lossy image compression, such as JPEG. Our HVS model says that we are much less sensitive to high-frequency detail, so JPEG can quantise these components more coarsely without a perceptible loss of quality. Similar concepts are applied in audio compression, where sound frequencies inaudible to humans are bandstop filtered.
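The idea of quantising high frequencies more harshly can be sketched in a few lines of Python. This is a minimal illustration, not JPEG itself: it applies an orthonormal 8×8 DCT-II to one block and divides each coefficient by a step size that grows with spatial frequency. The quantisation table `Q` (step `1 + 4*(u + v)`) is a hypothetical stand-in, not the table from the JPEG standard, and level shifting, zig-zag ordering and entropy coding are omitted.

```python
import math

N = 8  # block size used by JPEG

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

# Hypothetical table: the step size grows with spatial frequency (u + v).
# This is NOT the quantisation table from the JPEG standard.
Q = [[1 + 4 * (u + v) for v in range(N)] for u in range(N)]

def quantise(coeffs, table=Q):
    """Divide each coefficient by its step size and round to the nearest integer."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)]
            for u in range(N)]

if __name__ == "__main__":
    # Smooth ramp along y plus a faint +/-1 checkerboard (fine detail).
    block = [[8 * y + (1 if (x + y) % 2 else -1) for y in range(N)]
             for x in range(N)]
    q = quantise(dct2(block))
    print("DC:", q[0][0], "highest frequency:", q[N - 1][N - 1])
```

For this block the DC coefficient survives unchanged (224), while the checkerboard's largest coefficient, at the highest frequency (7, 7), is nonzero before quantisation but rounds to zero afterwards.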

Several HVS features derive from our evolution, when we needed to defend ourselves or hunt for food. Optical illusions often provide demonstrations of HVS features.

Block diagram of HVS

Assumptions about the HVS

Low-pass filter characteristic (limited number of rods in human eye): see Mach bands
Lack of colour resolution (fewer cones in human eye than rods)
Motion sensitivity
- More sensitive in peripheral vision
- Stronger than texture sensitivity, e.g. viewing a camouflaged animal
Texture sensitivity stronger than disparity sensitivity: 3D depth resolution does not need to be so accurate
Integral face recognition (babies smile at faces)
- Depth-inverted face looks normal (facial features overrule depth information)
- Upside-down face with inverted mouth and eyes looks normal[1]
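The low-pass characteristic in the first item can be illustrated with a toy model. The minimal sketch below stands in for the eye with a circular moving-average (box) filter of a hypothetical width of 4 samples and measures Michelson contrast before and after filtering for a coarse and a fine sinusoidal grating. It does not model Mach bands or the real optics of the eye; it only shows that a low-pass filter removes fine detail far more than coarse detail.

```python
import math

def grating(period, n=64, mean=100.0, amp=50.0):
    """Sinusoidal luminance grating of n samples; period is in samples."""
    return [mean + amp * math.sin(2 * math.pi * i / period) for i in range(n)]

def box_blur(signal, width):
    """Circular moving average over `width` samples: a crude low-pass filter."""
    n = len(signal)
    return [sum(signal[(i + k) % n] for k in range(width)) / width
            for i in range(n)]

def michelson(signal):
    """Michelson contrast (Imax - Imin) / (Imax + Imin)."""
    hi, lo = max(signal), min(signal)
    return (hi - lo) / (hi + lo)

if __name__ == "__main__":
    EYE_WIDTH = 4  # hypothetical filter width in samples
    for period in (32, 4):  # coarse grating, then fine grating
        g = grating(period)
        print(f"period {period}: contrast {michelson(g):.3f} -> "
              f"{michelson(box_blur(g, EYE_WIDTH)):.3f}")
```

The coarse grating keeps most of its contrast, while the fine grating, whose period equals the filter width, is averaged to an almost uniform grey.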

Examples of taking advantage of an HVS model

Flicker frequency of film and television, using persistence of vision to fool the viewer into seeing a continuous image
Interlaced television, painting half images (fields) to give the impression of a higher flicker frequency
Colour television (chrominance at half the resolution of luminance, corresponding to the proportions of rods and cones in the eye)
Image compression (higher frequencies are difficult to see, so they are quantised more harshly)
Motion estimation (use luminance and ignore colour)
Watermarking and Steganography
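The colour-television item relies on chroma subsampling. As a minimal sketch, assuming the image is already split into a luma plane and two chroma planes (Cb, Cr) with even dimensions, the 4:2:0 scheme below averages each 2×2 block of a chroma plane, so each chroma plane keeps a quarter of its samples while luma stays at full resolution. The function names are illustrative, and nearest-neighbour upsampling is simply the easiest reconstruction to write out.

```python
def subsample_420(plane):
    """4:2:0 chroma subsampling: average each 2x2 block of a chroma plane.

    `plane` is a list of rows with even width and height.
    """
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def upsample_420(plane):
    """Nearest-neighbour reconstruction of a subsampled plane to full size."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))  # and each row vertically
    return out

def sample_counts(width, height):
    """Samples per frame for full-resolution Y, Cb, Cr versus 4:2:0 chroma."""
    full = 3 * width * height
    subsampled = width * height + 2 * (width // 2) * (height // 2)
    return full, subsampled
```

For a 4×4 image, full-resolution Y, Cb and Cr need 48 samples; with 4:2:0 chroma they need 24, half the data.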

Related Research Articles

In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.
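Run-length encoding is one of the simplest lossless techniques: it removes the redundancy of repeated values by storing each run once together with its length. The sketch below is illustrative only; the function names are hypothetical, and decoding reproduces the input exactly.

```python
from itertools import groupby

def rle_encode(data):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(data)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list exactly."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

if __name__ == "__main__":
    row = [0, 0, 0, 0, 255, 255, 0]  # e.g. one row of a binary image
    encoded = rle_encode(row)
    print(encoded)
    assert rle_decode(encoded) == row  # nothing was lost
```

Here seven samples become three `(value, run_length)` pairs: `[(0, 4), (255, 2), (0, 1)]`.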

Lossy compression: data compression approach that reduces data size while discarding or changing some of it

In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size for storing, handling, and transmitting content. The different versions of the photo of the cat on this page show how higher degrees of approximation create coarser images as more details are removed. This contrasts with lossless data compression, which does not degrade the data. The amount of data reduction possible using lossy compression is much higher than with lossless techniques.

Video is an electronic medium for the recording, copying, playback, broadcasting, and display of moving visual media. Video was first developed for mechanical television systems, which were quickly replaced by cathode-ray tube (CRT) systems which, in turn, were replaced by flat panel displays of several types.

Frame rate is the frequency (rate) at which consecutive images (frames) are captured or displayed. The term applies equally to film and video cameras, computer graphics, and motion capture systems. Frame rate may also be called the frame frequency, and be expressed in hertz. Frame rate in electronic camera specifications may refer to the maximal possible rate, where, in practice, other settings may reduce the frequency to a lower number.
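Because frame rate is a frequency in hertz, the time between consecutive frames is its reciprocal. A trivial sketch with illustrative function names:

```python
def frame_period_ms(fps):
    """Time between consecutive frames, in milliseconds, for a frame rate in Hz."""
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    return 1000.0 / fps

def frame_count(duration_s, fps):
    """Number of whole frames captured or displayed in duration_s seconds."""
    return int(duration_s * fps)
```

At 25 Hz, for example, consecutive frames are 40 ms apart, and two seconds at 24 Hz contain 48 frames.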

Image compression is a type of data compression applied to digital images, to reduce their cost for storage or transmission. Algorithms may take advantage of visual perception and the statistical properties of image data to provide superior results compared with generic data compression methods which are used for other digital data.

Transform coding is a type of data compression for "natural" data like audio signals or photographic images. The transformation is typically lossless on its own but is used to enable better quantization, which then results in a lower quality copy of the original input.
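To illustrate that the transform step on its own loses nothing, the sketch below uses a one-level integer Haar transform (the lifting-based S-transform) rather than the DCT used by JPEG. It is a swapped-in, simpler transform, chosen because integer arithmetic makes exact invertibility easy to check. Information is lost only when the difference coefficients are quantised; the function names and the step size of 4 are illustrative.

```python
def haar_forward(x):
    """One level of the integer Haar (S-)transform on an even-length list of ints.

    Returns (averages, differences); floor division keeps everything integer.
    """
    pairs = list(zip(x[0::2], x[1::2]))
    s = [(a + b) // 2 for a, b in pairs]
    d = [a - b for a, b in pairs]
    return s, d

def haar_inverse(s, d):
    """Exact inverse of haar_forward."""
    out = []
    for avg, diff in zip(s, d):
        a = avg + (diff + 1) // 2
        out.extend([a, a - diff])
    return out

def quantise_uniform(values, step):
    """Round each value to the nearest multiple of `step`: the lossy step."""
    return [step * round(v / step) for v in values]

if __name__ == "__main__":
    x = [10, 12, 50, 49, 7, 3]
    s, d = haar_forward(x)
    print(haar_inverse(s, d) == x)                  # transform alone is lossless
    print(haar_inverse(s, quantise_uniform(d, 4)))  # quantised copy differs
```

The averages carry most of the signal, so coarsely quantising only the small differences yields a lower-quality copy that is still close to the input.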

Within visual perception, an optical illusion is an illusion caused by the visual system and characterized by a visual percept that arguably appears to differ from reality. Illusions come in a wide variety; their categorization is difficult because the underlying cause is often not clear, but a classification proposed by Richard Gregory is useful as an orientation. According to that classification, there are three main classes (physical, physiological, and cognitive illusions), and in each class there are four kinds: ambiguities, distortions, paradoxes, and fictions. A classical example of a physical distortion would be the apparent bending of a stick half immersed in water; an example of a physiological paradox is the motion aftereffect. An example of a physiological fiction is an afterimage. Three typical cognitive distortions are the Ponzo, Poggendorff, and Müller-Lyer illusions. Physical illusions are caused by the physical environment, e.g. by the optical properties of water. Physiological illusions arise in the eye or the visual pathway, e.g. from the effects of excessive stimulation of a specific receptor type. Cognitive visual illusions are the result of unconscious inferences and are perhaps those most widely known.

Compression artifact: distortion of media caused by lossy data compression

A compression artifact is a noticeable distortion of media caused by the application of lossy compression. Lossy data compression involves discarding some of the media's data so that it becomes small enough to be stored within the desired disk space or transmitted (streamed) within the available bandwidth. If the compressor cannot store enough data in the compressed version, the result is a loss of quality, or introduction of artifacts. The compression algorithm may not be intelligent enough to discriminate between distortions of little subjective importance and those objectionable to the user.

The flicker fusion threshold, critical flicker frequency (CFF) or flicker fusion rate, is a concept in the psychophysics of vision. It is defined as the frequency at which an intermittent light stimulus appears to be completely steady to the average human observer. A traditional term for flicker fusion is "persistence of vision", but this has also been used to describe positive afterimages or motion blur. Although flicker can be detected for many waveforms representing time-variant fluctuations of intensity, it is conventionally, and most easily, studied in terms of sinusoidal modulation of intensity.
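The sinusoidal modulation mentioned above can be written as I(t) = I0 (1 + m sin 2πft), where I0 is the mean intensity, m the modulation depth and f the frequency; for this waveform the Michelson contrast (Imax - Imin) / (Imax + Imin) equals m. The sketch below checks that numerically. The CFF itself varies with viewing conditions, so the hypothetical `appears_steady` helper takes it as an assumed input rather than hard-coding a value.

```python
import math

def intensity(t, mean, depth, freq_hz):
    """Sinusoidally modulated intensity I(t) = mean * (1 + depth * sin(2*pi*f*t))."""
    return mean * (1 + depth * math.sin(2 * math.pi * freq_hz * t))

def michelson(samples):
    """Michelson contrast (Imax - Imin) / (Imax + Imin)."""
    hi, lo = max(samples), min(samples)
    return (hi - lo) / (hi + lo)

def appears_steady(freq_hz, cff_hz):
    """True if the modulation is at or above the (assumed) fusion threshold."""
    return freq_hz >= cff_hz

if __name__ == "__main__":
    f = 10.0
    # 40 samples across one period (0.1 s), including the peak and the trough
    samples = [intensity(k / 400, mean=100.0, depth=0.3, freq_hz=f)
               for k in range(40)]
    print(round(michelson(samples), 6))  # matches the modulation depth
```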

In digital photography, computer-generated imagery, and colorimetry, a grayscale image is one in which the value of each pixel is a single sample representing only an amount of light; that is, it carries only intensity information. Grayscale images, a kind of black-and-white or gray monochrome, are composed exclusively of shades of gray. The contrast ranges from black at the weakest intensity to white at the strongest.
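A common way to collapse RGB to a single intensity sample is a weighted sum that favours green. This sketch uses the Rec. 601 luma weights (0.299, 0.587, 0.114) on 8-bit gamma-encoded values; other standards, for example Rec. 709, use different weights, and the function names are illustrative.

```python
def to_gray(r, g, b):
    """8-bit grey value from 8-bit R, G, B using Rec. 601 luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def image_to_gray(rgb_rows):
    """Convert a list of rows of (r, g, b) tuples to a list of rows of grey values."""
    return [[to_gray(*px) for px in row] for row in rgb_rows]
```

Pure green maps to a much lighter grey than pure blue at the same 8-bit value, reflecting the eye's greater sensitivity to green.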

Peak signal-to-noise ratio (PSNR) is an engineering term for the ratio between the maximum possible power of a signal and the power of corrupting noise that affects the fidelity of its representation. Because many signals have a very wide dynamic range, PSNR is usually expressed as a logarithmic quantity using the decibel scale.
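The definition above translates directly to PSNR = 10 log10(MAX² / MSE), where MAX is the largest possible sample value (255 for 8-bit data) and MSE is the mean squared error between the original and the distorted signal. A minimal sketch on flat lists of samples, with illustrative function names:

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(original, distorted, max_value=255):
    """PSNR in decibels: 10 * log10(MAX^2 / MSE); infinite for identical inputs."""
    err = mse(original, distorted)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)
```

Identical inputs give infinite PSNR; an error of exactly 1 at every sample of 8-bit data gives about 48.13 dB.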

Ambiguous images or reversible figures are visual forms that create ambiguity by exploiting graphical similarities and other properties of visual system interpretation between two or more distinct image forms. These are famous for inducing the phenomenon of multistable perception. Multistable perception is the occurrence of an image being able to provide multiple, although stable, perceptions.

Tone mapping is a technique used in image processing and computer graphics to map one set of colors to another to approximate the appearance of high-dynamic-range images in a medium that has a more limited dynamic range. Print-outs, CRT or LCD monitors, and projectors all have a limited dynamic range that is inadequate to reproduce the full range of light intensities present in natural scenes. Tone mapping addresses the problem of strong contrast reduction from the scene radiance to the displayable range while preserving the image details and color appearance important to appreciate the original scene content.
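As one concrete, simple tone-mapping operator (not the only approach), the sketch below follows the global operator associated with Reinhard et al.: scale luminance so its log-average maps to a "key" value, then compress with L / (1 + L), which maps any non-negative luminance into [0, 1). The key of 0.18 and the small `eps` guarding log(0) are assumed defaults, and colour handling is omitted.

```python
import math

def reinhard_global(luminance, key=0.18, eps=1e-6):
    """Global tone mapping in the style of Reinhard et al.

    1. Scale scene luminance so that its log-average maps to `key`.
    2. Compress with L / (1 + L), mapping [0, inf) into [0, 1).
    """
    log_avg = math.exp(sum(math.log(eps + l) for l in luminance) / len(luminance))
    scaled = [key * l / log_avg for l in luminance]
    return [l / (1.0 + l) for l in scaled]
```

Even a luminance range spanning six orders of magnitude ends up inside [0, 1) with its ordering preserved; the result can then be multiplied by 255 for an 8-bit display.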

Chubb illusion: optical illusion

The Chubb illusion is an optical illusion or error in visual perception in which the apparent contrast of an object varies substantially to most viewers depending on its relative contrast to the field on which it is displayed. These visual illusions are of particular interest to researchers because they may provide valuable insights in regard to the workings of human visual systems.

Parallax scanning depth enhancing imaging methods rely on discrete parallax differences between depth planes in a scene. The differences are caused by a parallax scan. When properly balanced (tuned) and displayed, the discrete parallax differences are perceived by the brain as depth.

Foveated imaging is a digital image processing technique in which the image resolution, or amount of detail, varies across the image according to one or more "fixation points". A fixation point indicates the highest resolution region of the image and corresponds to the center of the eye's retina, the fovea.
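Foveated imaging can be sketched by letting the blur radius grow with distance from the fixation point. The falloff rule (radius +1 for every `ring` pixels of distance, capped at `max_radius`) is hypothetical, as are the function names; a box blur is used only because it is the simplest low-pass filter to write out.

```python
import math

def blur_radius(x, y, fx, fy, ring=8, max_radius=4):
    """Hypothetical falloff: 0 at the fixation point, +1 every `ring` pixels."""
    d = math.hypot(x - fx, y - fy)
    return min(max_radius, int(d // ring))

def foveate(img, fx, fy, ring=8, max_radius=4):
    """Replace each pixel by the mean of a (2r+1) x (2r+1) window, clipped at edges.

    `img` is a list of rows of grey values; (fx, fy) is the fixation point.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            r = blur_radius(x, y, fx, fy, ring, max_radius)
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            vals = [img[j][i] for j in ys for i in xs]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out
```

Near the fixation point pixels pass through unchanged; far from it, fine detail such as a one-pixel checkerboard is averaged towards grey.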

Visual perception: ability to interpret the surrounding environment using light in the visible spectrum

Visual perception is the ability to interpret the surrounding environment through photopic vision, color vision, scotopic vision, and mesopic vision, using light in the visible spectrum reflected by objects in the environment. This is different from visual acuity, which refers to how clearly a person sees. A person can have problems with visual perceptual processing even if they have 20/20 vision.

2.5D is an effect in visual perception. It is the construction of an apparently three-dimensional environment from 2D retinal projections. While the result is technically 2D, it allows for the illusion of depth. It is easier for the eye to discern the distance between two items than the depth of a single object in the view field. Computers can use 2.5D to make images of human faces look lifelike.

A phantom contour is a type of illusory contour. Most illusory contours are seen in still images, such as the Kanizsa triangle and the Ehrenstein illusion. A phantom contour, however, is perceived in the presence of moving or flickering images with contrast reversal. The rapid, continuous alternation between opposing, but correlated, adjacent images creates the perception of a contour that is not physically present in the still images. Quaid et al. have also authored a PhD thesis on the phantom contour illusion and its spatiotemporal limits which maps out limits and proposes mechanisms for its perception centering around magnocellularly driven visual area MT.

A color appearance model (CAM) is a mathematical model that seeks to describe the perceptual aspects of human color vision, i.e. viewing conditions under which the appearance of a color does not tally with the corresponding physical measurement of the stimulus source.

References

[1] Margaret Thatcher Illusion – Mighty Optical Illusions

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
