Visual information fidelity (VIF) is a full reference image quality assessment index based on natural scene statistics and the notion of image information extracted by the human visual system. [1] It was developed by Hamid R. Sheikh and Alan Bovik at the Laboratory for Image and Video Engineering (LIVE) at the University of Texas at Austin in 2006. It is deployed in the core of the Netflix VMAF video quality monitoring system, which controls the picture quality of all encoded videos streamed by Netflix.
Images and videos of three-dimensional visual environments come from a common class: the class of natural scenes. Natural scenes form a tiny subspace in the space of all possible signals, and researchers have developed sophisticated models to characterize their statistics. Most real-world distortion processes disturb these statistics and make the image or video signals unnatural. The VIF index employs natural scene statistics (NSS) models in conjunction with a distortion (channel) model to quantify the information shared between the test and the reference images. Further, the VIF index is based on the hypothesis that this shared information is an aspect of fidelity that relates well with visual quality. In contrast to prior approaches based on human visual system (HVS) error sensitivity and measurement of structure, [2] this statistical approach, used in an information-theoretic setting, yields a full reference (FR) quality assessment (QA) method that does not rely on any HVS or viewing geometry parameter, nor any constants requiring optimization, and yet is competitive with state-of-the-art QA methods. [3]
Specifically, the reference image is modeled as being the output of a stochastic 'natural' source that passes through the HVS channel and is processed later by the brain. The information content of the reference image is quantified as being the mutual information between the input and output of the HVS channel. This is the information that the brain could ideally extract from the output of the HVS. The same measure is then quantified in the presence of an image distortion channel that distorts the output of the natural source before it passes through the HVS channel, thereby measuring the information that the brain could ideally extract from the test image. This is shown pictorially in Figure 1. The two information measures are then combined to form a visual information fidelity measure that relates visual quality to relative image information.
A Gaussian scale mixture (GSM) is used to statistically model the wavelet coefficients of a steerable pyramid decomposition of an image. [4] The model is described below for a given subband of the multi-scale multi-orientation decomposition and can be extended to other subbands similarly. Let the wavelet coefficients in a given subband be $\mathcal{C} = \{\bar{C}_i : i \in \mathcal{I}\}$, where $\mathcal{I}$ denotes the set of spatial indices across the subband and each $\bar{C}_i$ is an $M$-dimensional vector. The subband is partitioned into non-overlapping blocks of $M$ coefficients each, where each block corresponds to $\bar{C}_i$. According to the GSM model, $\bar{C}_i = S_i \bar{U}_i$, where $S_i$ is a positive scalar and $\bar{U}_i$ is a Gaussian vector with mean zero and covariance $\mathbf{C}_U$. Further, the non-overlapping blocks are assumed to be independent of each other, and the random field $\mathcal{S} = \{S_i : i \in \mathcal{I}\}$ is assumed independent of $\mathcal{U} = \{\bar{U}_i : i \in \mathcal{I}\}$.
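As a minimal sketch of how these GSM parameters might be fit in practice (NumPy assumed; the helper name gsm_parameters and the 3×3 block size $M = 9$ are illustrative choices, not part of the original formulation), the field covariance $\mathbf{C}_U$ can be estimated by averaging outer products of the blocks, and each multiplier by its maximum likelihood value $\hat{s}_i^2 = \bar{C}_i^T \mathbf{C}_U^{-1} \bar{C}_i / M$:

```python
import numpy as np

def gsm_parameters(subband, M=9):
    """Estimate GSM parameters for one wavelet subband (illustrative sketch).

    subband : 2-D array of wavelet coefficients.
    M       : coefficients per block vector C_i (here a 3x3 neighbourhood).
    """
    k = int(np.sqrt(M))
    h, w = subband.shape
    # Partition the subband into non-overlapping k x k blocks,
    # each flattened into an M-dimensional vector C_i.
    blocks = (subband[:h - h % k, :w - w % k]
              .reshape(h // k, k, w // k, k)
              .transpose(0, 2, 1, 3)
              .reshape(-1, M))
    blocks = blocks - blocks.mean(axis=0)          # enforce a zero-mean field
    # Covariance C_U estimated by averaging outer products of the blocks.
    C_U = blocks.T @ blocks / blocks.shape[0]
    # ML estimate of each multiplier: s_i^2 = C_i^T C_U^{-1} C_i / M.
    C_U_inv = np.linalg.inv(C_U)
    s_sq = np.einsum('ij,jk,ik->i', blocks, C_U_inv, blocks) / M
    return C_U, s_sq
```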
The distortion process is modeled using a combination of signal attenuation and additive noise in the wavelet domain. Mathematically, if $\mathcal{D} = \{\bar{D}_i : i \in \mathcal{I}\}$ denotes the random field from a given subband of the distorted image, $\mathcal{G} = \{g_i : i \in \mathcal{I}\}$ is a deterministic scalar attenuation field, and $\mathcal{V} = \{\bar{V}_i : i \in \mathcal{I}\}$, where $\bar{V}_i$ is a zero-mean Gaussian vector with covariance $\mathbf{C}_V = \sigma_v^2 \mathbf{I}$, then

$$\bar{D}_i = g_i \bar{C}_i + \bar{V}_i.$$
Further, $\mathcal{V}$ is modeled to be independent of $\mathcal{S}$ and $\mathcal{U}$.
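Since the channel model is a per-block linear regression of the distorted coefficients on the reference ones, $g_i$ and $\sigma_v^2$ can be fit by least squares. A hedged sketch (NumPy assumed; the helper name channel_parameters and the small regularizer eps are illustrative):

```python
import numpy as np

def channel_parameters(ref_blocks, dist_blocks, eps=1e-10):
    """Fit D_i = g_i * C_i + V_i per block by least squares (sketch).

    ref_blocks, dist_blocks : (N, M) arrays of matched reference and
                              distorted coefficient blocks.
    """
    ref = ref_blocks - ref_blocks.mean(axis=1, keepdims=True)
    dist = dist_blocks - dist_blocks.mean(axis=1, keepdims=True)
    var_c = (ref * ref).mean(axis=1)               # reference variance
    cov_cd = (ref * dist).mean(axis=1)             # cross-covariance
    g = cov_cd / (var_c + eps)                     # attenuation g_i
    # Residual variance of the additive noise V_i:
    # E[D^2] - g * cov(C, D), clipped to stay non-negative.
    sigma_v_sq = (dist * dist).mean(axis=1) - g * cov_cd
    return g, np.maximum(sigma_v_sq, 0.0)
```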
The duality of HVS models and NSS implies that several aspects of the HVS have already been accounted for in the source model. Here, the HVS is additionally modeled on the hypothesis that the uncertainty in the perception of visual signals limits the amount of information that can be extracted from the source and distorted images. This uncertainty is modeled as visual noise in the HVS model. In particular, the HVS noise in a given subband of the wavelet decomposition is modeled as additive white Gaussian noise. Let $\mathcal{N} = \{\bar{N}_i : i \in \mathcal{I}\}$ and $\mathcal{N}' = \{\bar{N}'_i : i \in \mathcal{I}\}$ be random fields, where $\bar{N}_i$ and $\bar{N}'_i$ are zero-mean Gaussian vectors with covariance $\mathbf{C}_N = \mathbf{C}_{N'} = \sigma_n^2 \mathbf{I}$. Further, let $\mathcal{E} = \mathcal{C} + \mathcal{N}$ and $\mathcal{F} = \mathcal{D} + \mathcal{N}'$ denote the visual signals at the output of the HVS for the reference and test images respectively, so that $\bar{E}_i = \bar{C}_i + \bar{N}_i$ and $\bar{F}_i = \bar{D}_i + \bar{N}'_i$. Note that $\mathcal{N}$ and $\mathcal{N}'$ are random fields independent of $\mathcal{S}$, $\mathcal{U}$, and $\mathcal{V}$.
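In the closed-form information expressions below, this noise never needs to be sampled; it enters only through the constant $\sigma_n^2$. Purely to illustrate the model itself, a sketch of the two noisy channels (NumPy assumed; the value 0.1 for $\sigma_n^2$ is an arbitrary placeholder, not the tuned constant):

```python
import numpy as np

rng = np.random.default_rng(0)

def hvs_output(coeff_blocks, sigma_n_sq=0.1):
    """Simulate E = C + N (or F = D + N'): add white Gaussian HVS noise."""
    noise = rng.normal(scale=np.sqrt(sigma_n_sq), size=coeff_blocks.shape)
    return coeff_blocks + noise
```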
Let $\bar{C}^N = (\bar{C}_1, \ldots, \bar{C}_N)$ denote the vector of all blocks from a given subband. Let $S^N$, $\bar{D}^N$, $\bar{E}^N$, and $\bar{F}^N$ be defined similarly. Let $\hat{s}^N$ denote the maximum likelihood estimate of $S^N$ given $\bar{C}^N$ and $\mathbf{C}_U$. The amount of information extracted from the reference image is obtained as

$$I(\bar{C}^N; \bar{E}^N \mid S^N = \hat{s}^N) = \frac{1}{2} \sum_{i=1}^{N} \log_2 \frac{\left| \hat{s}_i^2 \mathbf{C}_U + \sigma_n^2 \mathbf{I} \right|}{\left| \sigma_n^2 \mathbf{I} \right|},$$

while the amount of information extracted from the test image is given as

$$I(\bar{C}^N; \bar{F}^N \mid S^N = \hat{s}^N) = \frac{1}{2} \sum_{i=1}^{N} \log_2 \frac{\left| g_i^2 \hat{s}_i^2 \mathbf{C}_U + (\sigma_v^2 + \sigma_n^2) \mathbf{I} \right|}{\left| (\sigma_v^2 + \sigma_n^2) \mathbf{I} \right|}.$$

Denoting the $N$ blocks in subband $j$ of the wavelet decomposition by $\bar{C}^{N,j}$, and similarly for the other variables, the VIF index is defined as

$$\mathrm{VIF} = \frac{\sum_{j} I(\bar{C}^{N,j}; \bar{F}^{N,j} \mid S^{N,j} = \hat{s}^{N,j})}{\sum_{j} I(\bar{C}^{N,j}; \bar{E}^{N,j} \mid S^{N,j} = \hat{s}^{N,j})}.$$
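Because each determinant ratio factors over the eigenvalues $\lambda_k$ of $\mathbf{C}_U$ (e.g. $|\hat{s}_i^2 \mathbf{C}_U + \sigma_n^2 \mathbf{I}| / |\sigma_n^2 \mathbf{I}| = \prod_k (1 + \hat{s}_i^2 \lambda_k / \sigma_n^2)$), the two sums reduce to scalar log terms. A hedged sketch of the per-subband computation (NumPy assumed; it reuses the hypothetical helpers above, and the default $\sigma_n^2 = 0.1$ is again a placeholder):

```python
import numpy as np

def vif_subband_terms(s_sq, C_U, g, sigma_v_sq, sigma_n_sq=0.1):
    """Return (test_info, ref_info) for one subband (sketch).

    Evaluates the two conditional mutual information sums using the
    eigenvalues of C_U, so each determinant factors into M scalar terms.
    """
    lam = np.linalg.eigvalsh(C_U)                  # eigenvalues of C_U
    s_sq = s_sq[:, None]                           # (N, 1) multipliers
    g_sq = (g ** 2)[:, None]
    sv = sigma_v_sq[:, None]
    # I(C; E | s) = 0.5 * sum_i sum_k log2(1 + s_i^2 lam_k / sigma_n^2)
    ref_info = 0.5 * np.log2(1.0 + s_sq * lam / sigma_n_sq).sum()
    # I(C; F | s) = 0.5 * sum_i sum_k log2(1 + g_i^2 s_i^2 lam_k
    #                                      / (sigma_v^2 + sigma_n^2))
    test_info = 0.5 * np.log2(1.0 + g_sq * s_sq * lam / (sv + sigma_n_sq)).sum()
    return test_info, ref_info

# VIF is then the ratio of the two sums accumulated over all subbands j:
#   vif = sum_j test_info_j / sum_j ref_info_j
```

A VIF value of 1 indicates the test image carries all the reference information (a pristine copy); values below 1 indicate information loss, while contrast enhancement can push the ratio above 1.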
The Spearman's rank-order correlation coefficient (SROCC) between the VIF index scores of distorted images on the LIVE Image Quality Assessment Database and the corresponding human opinion scores is evaluated to be 0.96. [citation needed]