Video quality

Video quality is a characteristic of a video passed through a video transmission or processing system that describes perceived video degradation (typically, compared to the original video). Video processing systems may introduce some amount of distortion or artifacts in the video signal that negatively impacts the user's perception of a system. For many stakeholders in video production and distribution, assurance of video quality is an important task.

Video quality evaluation is performed to describe the quality of a set of video sequences under study. Video quality can be evaluated objectively (by mathematical models) or subjectively (by asking users for their rating). Also, the quality of a system can be determined offline (i.e., in a laboratory setting for developing new codecs or services), or in-service (to monitor and ensure a certain level of quality).

From analog to digital video

Since the world's first video sequence was recorded and transmitted, many video processing systems have been designed. Such systems encode video streams and transmit them over various kinds of networks or channels. In the era of analog video systems, it was possible to evaluate the quality aspects of a video processing system by calculating the system's frequency response using test signals (for example, a collection of color bars and circles).

Digital video systems have almost fully replaced analog ones, and quality evaluation methods have changed. The performance of a digital video processing and transmission system can vary significantly and depends on many factors including the characteristics of the input video signal (e.g. amount of motion or spatial details), the settings used for encoding and transmission, and the channel fidelity or network performance.

Objective video quality

Objective video quality models are mathematical models that approximate results from subjective quality assessment, in which human observers are asked to rate the quality of a video. [1] In this context, the term model may refer to a simple statistical model in which several independent variables (e.g. the packet loss rate on a network and the video coding parameters) are fit against results obtained in a subjective quality evaluation test using regression techniques. A model may also be a more complicated algorithm implemented in software or hardware.
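As an illustrative sketch of the simpler kind of model, the following fits invented subjective scores against two hypothetical independent variables (packet loss rate and bitrate) using ordinary least squares. The data, the choice of variables, and the resulting coefficients are assumptions for demonstration only, not any standardized model:

```python
# Toy statistical quality model: fit hypothetical MOS values against
# packet loss rate and bitrate with ordinary least squares.
# All numbers below are invented for illustration.
import numpy as np

packet_loss = np.array([0.0, 0.5, 1.0, 2.0, 4.0])  # percent
bitrate = np.array([4.0, 3.0, 2.0, 1.5, 1.0])      # Mbit/s
mos = np.array([4.6, 4.1, 3.5, 2.9, 2.0])          # subjective scores (1-5 scale)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(packet_loss), packet_loss, bitrate])
coeffs, *_ = np.linalg.lstsq(X, mos, rcond=None)

def predict_mos(loss, rate):
    """Predict a quality score from the fitted linear model."""
    return float(coeffs[0] + coeffs[1] * loss + coeffs[2] * rate)
```

Unlike a panel of viewers, the fitted model returns the same score for the same inputs every time.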

Terminology

The terms model and metric are often used interchangeably in the field to mean a descriptive statistic which provides an indicator of quality. The term “objective” relates to the fact that, in general, quality models are based on criteria that can be measured objectively – that is, free from human interpretation. They can be automatically evaluated by a computer program. Unlike a panel of human observers, an objective model should always deterministically output the same quality score for a given set of input parameters.

Objective quality models are sometimes also referred to as instrumental (quality) models, [2] [3] in order to emphasize their application as measurement instruments. Some authors suggest that the term “objective” is misleading, as it “implies that instrumental measurements bear objectivity, which they only do in case that they can be generalized.” [4]

Classification of objective video quality models

Figure: Classification of objective video quality models into Full-Reference, Reduced-Reference and No-Reference.
Figure: No-reference image and video quality assessment methods.

Objective models can be classified by the amount of information available about the original signal, the received signal, or whether there is a signal present at all: [5] Full-Reference (FR) models have access to the entire original signal, Reduced-Reference (RR) models use only partial information extracted from it, and No-Reference (NR) models operate on the received signal alone.

Use of picture quality models for video quality estimation

Some models that are used for video quality assessment (such as PSNR or SSIM) are simply image quality models, whose output is calculated for every frame of a video sequence. An overview of recent no-reference image quality models has also been given in a journal paper by Shahid et al. [5]

The quality measure of every frame in a video (as determined by an image quality model) can then be recorded and pooled over time to assess the quality of an entire video sequence. While this method is easy to implement, it does not factor in certain kinds of degradations that develop over time, such as the moving artifacts caused by packet loss and its concealment. A video quality model that considers the temporal aspects of quality degradations, like VQM or the MOVIE Index, may be able to produce more accurate predictions of human-perceived quality.
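A minimal sketch of this pooling approach, using PSNR as the per-frame image metric and arithmetic-mean pooling over synthetic frames (a real pipeline would decode actual video):

```python
import numpy as np

def psnr(ref, dist, peak=255.0):
    """PSNR between two 8-bit frames; infinite for identical frames."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Synthetic 10-frame "video" plus a noisy, distorted copy.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(10, 64, 64), dtype=np.uint8)
noise = rng.integers(-5, 6, size=reference.shape)
distorted = np.clip(reference.astype(int) + noise, 0, 255).astype(np.uint8)

# Score every frame, then pool over time with the arithmetic mean.
per_frame = [psnr(r, d) for r, d in zip(reference, distorted)]
sequence_score = sum(per_frame) / len(per_frame)
```

Mean pooling is the simplest choice, and it is exactly this step that ignores temporally localized degradations; temporally aware models replace it with more elaborate machinery.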

Video quality artifacts

The estimation of visual artifacts is a well-known technique for estimating overall video quality. The majority of these artifacts are compression artifacts caused by lossy compression. The attributes typically estimated by pixel-based metrics fall into two groups:

Spatial

Temporal
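As an illustration of a pixel-based spatial estimate, the following toy blockiness measure (an invented example, not a standardized metric) compares luminance jumps at assumed 8x8 coding-block boundaries with jumps elsewhere:

```python
import numpy as np

def blockiness(frame, block=8):
    """Naive no-reference blockiness estimate: mean absolute horizontal
    luminance step across block-grid columns minus the mean step at all
    other columns. Larger values suggest stronger blocking artifacts."""
    frame = frame.astype(np.float64)
    diffs = np.abs(np.diff(frame, axis=1))  # horizontal neighbour differences
    cols = np.arange(diffs.shape[1])
    boundary = (cols + 1) % block == 0      # differences straddling a block edge
    return float(diffs[:, boundary].mean() - diffs[:, ~boundary].mean())
```

On a flat frame the measure is zero; on a frame tiled with constant 8x8 blocks of different brightness, all luminance steps fall on block boundaries and the measure is large.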

Examples of video quality metrics

This section lists examples of video quality metrics.

Full-Reference metrics

PSNR (Peak Signal-to-Noise Ratio), image metric: Calculated between every frame of the original and the degraded video signal. PSNR is the most widely used objective image quality metric. However, PSNR values do not correlate well with perceived picture quality due to the complex, highly non-linear behaviour of the human visual system.

SSIM [8] (Structural SIMilarity), image metric: SSIM is a perception-based model that considers image degradation as perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms.

MOVIE Index [9] (MOtion-based Video Integrity Evaluation), video metric: The MOVIE index is a neuroscience-based model for predicting the perceptual quality of a (possibly compressed or otherwise distorted) motion picture or video against a pristine reference video.

VMAF [10] (Video Multimethod Assessment Fusion), video metric: VMAF uses four features (VIF, DLM, MCPD and AN-SNR) to predict video quality. These features are fused using SVM-based regression to provide a single output score, which is then temporally pooled over the entire video sequence using the arithmetic mean to provide an overall differential mean opinion score (DMOS).

VQM [11], video metric: This model was standardized in ITU-T Rec. J.144 in 2001.

Reduced-Reference metrics

SRR [12] (SSIM Reduced-Reference), video metric: The SRR value is calculated as the ratio of the received (target) video signal's SSIM to the reference video pattern's SSIM values.

ST-RRED [13], video metric: Computes wavelet coefficients of frame differences between adjacent frames in a video sequence, modeled by a Gaussian scale mixture, to evaluate reduced-reference entropic differences and obtain the temporal RRED. In conjunction with the spatial RRED indices, obtained by applying the RRED index to every frame of the video, this yields the spatio-temporal RRED.

ITU-T Rec. P.1204.4, video metric: This reduced-reference model compares features extracted from a reference video with those of a distorted (compressed) video. [14]

No-Reference metrics

NIQE [15] (Naturalness Image Quality Evaluator), image metric: This IQA model is founded on perceptually relevant spatial-domain natural scene statistics (NSS) features extracted from local image patches, which effectively capture the essential low-order statistics of natural images.

BRISQUE [16] (Blind/Referenceless Image Spatial Quality Evaluator), image metric: Extracts the pointwise statistics of local normalized luminance signals and measures image naturalness (or lack thereof) based on measured deviations from a natural image model. It also models the distribution of pairwise statistics of adjacent normalized luminance signals, which provides distortion orientation information.

Video-BLIINDS [17], video metric: Computes statistical models of DCT coefficients of frame differences together with a motion characterization, then predicts a score from these features using an SVM.

ITU-T Rec. P.1203.1, video metric: Part of the P.1203 family of standards; it can use metadata only (codec, resolution, bitrate, framerate), frame information (frame types and sizes), or the entire bitstream to analyze the quality of a compressed video. It is primarily intended for use in the context of HTTP adaptive streaming.

ITU-T Rec. P.1204.3, video metric: This model uses the video bitstream to analyze compression/coding quality based on features such as quantization parameters and motion vectors. [14]

ITU-T Rec. P.1204.5, video metric: A hybrid model that uses the decoded pixels and information about the video codec to determine final video quality. [14]
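To make the structure of such metrics concrete, here is a single-window simplification of the SSIM formula. Real SSIM averages the same expression over local windows; the constants follow the commonly used choices C1 = (0.01 * peak)^2 and C2 = (0.03 * peak)^2:

```python
import numpy as np

def global_ssim(x, y, peak=255.0):
    """Single-window SSIM sketch: the luminance, contrast and structure
    terms are computed once over the whole image rather than over
    local windows as in the full metric."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

An identical image pair scores exactly 1; structural change lowers the score even when mean luminance and variance are unchanged.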

Training and performance evaluation

Since objective video quality models are expected to predict results given by human observers, they are developed with the aid of subjective test results. During the development of an objective model, its parameters should be trained so as to achieve the best correlation between the objectively predicted values and the subjective scores, often available as mean opinion scores (MOS).

The most widely used subjective test materials are in the public domain and include still pictures, motion pictures, streaming video, high definition, 3-D (stereoscopic), and special-purpose picture quality-related datasets. [18] These so-called databases are created by various research laboratories around the world. Some of them have become de facto standards, including several public-domain subjective picture quality databases created and maintained by the Laboratory for Image and Video Engineering (LIVE) as well as the Tampere Image Database 2008. A collection of databases can be found in the QUALINET Databases repository. The Consumer Digital Video Library (CDVL) hosts freely available video test sequences for model development.

Some databases also provide pre-computed metric scores to allow others to benchmark new metrics against existing ones. Examples can be seen in the table below.

Examples of Video Model Benchmark Databases

LIVE-VQC: 585 videos, 11 metrics (no-reference)
KoNViD-1k: 1,200 videos, 11 metrics (no-reference)
YouTube-UGC: 1,500 videos, 9 metrics (no-reference)
MSU No-Reference VQA: 2,500 videos, 15 metrics (no-reference)
MSU Full-Reference VQA: 2,500 videos, 44 metrics (full-reference)
LIVE-FB Large-Scale Social Video Quality: 39,000 videos, 6 metrics (no-reference)
LIVE-ETRI: 437 videos, 5 metrics (no-reference)
LIVE Livestream: 315 videos, 3 metrics (no-reference)

In theory, a model can be trained on a set of data in such a way that it produces perfectly matching scores on that dataset. However, such a model will be over-trained and will therefore not perform well on new datasets. It is therefore advised to validate models against new data and use the resulting performance as a real indicator of the model's prediction accuracy.
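The over-training effect can be demonstrated in a few lines: a high-degree polynomial fitted to six invented score points reproduces the training data almost exactly, yet predicts unseen conditions worse than a simple linear fit. All data here are synthetic, chosen only to illustrate the point:

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 6)                       # e.g. normalized bitrate
y_train = 4.5 - 2.5 * x_train + rng.normal(0, 0.1, 6)    # noisy synthetic scores
x_test = np.linspace(0.05, 0.95, 6)                      # unseen conditions
y_test = 4.5 - 2.5 * x_test                              # underlying true relation

p_over = np.polyfit(x_train, y_train, 5)  # one coefficient per data point
p_lin = np.polyfit(x_train, y_train, 1)   # simple linear model

rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
train_rmse = rmse(np.polyval(p_over, x_train), y_train)   # essentially zero
test_rmse_over = rmse(np.polyval(p_over, x_test), y_test)
test_rmse_lin = rmse(np.polyval(p_lin, x_test), y_test)
```

The degree-5 fit interpolates the training points (near-zero training error) but its error on held-out points is dominated by the noise it memorized, which is why validation on new data is the meaningful performance indicator.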

To measure the performance of a model, some frequently used metrics are the linear correlation coefficient, Spearman's rank correlation coefficient, and the root mean square error (RMSE). Other metrics are the kappa coefficient and the outliers ratio. ITU-T Rec. P.1401 gives an overview of statistical procedures to evaluate and compare objective models.
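These performance statistics are straightforward to compute. The sketch below implements them with plain NumPy against invented subjective/predicted score pairs; the simple rank computation in the Spearman variant assumes no tied values:

```python
import numpy as np

def pearson(a, b):
    """Pearson linear correlation coefficient."""
    a = np.asarray(a, float) - np.mean(a)
    b = np.asarray(b, float) - np.mean(b)
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks
    (this simple double-argsort ranking assumes there are no ties)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(a), rank(b))

def rmse(a, b):
    """Root mean square error between predictions and subjective scores."""
    return float(np.sqrt(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)))

subjective = np.array([1.2, 2.4, 3.1, 3.9, 4.6])  # invented MOS values
predicted = np.array([1.5, 2.2, 3.4, 3.7, 4.4])   # invented model outputs
```

A perfectly monotone prediction yields a Spearman correlation of 1 even when the absolute errors (captured by RMSE) are nonzero, which is why the metrics are reported together.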

Uses and application of objective models

Objective video quality models can be used in various application areas. In video codec development, the performance of a codec is often evaluated in terms of PSNR or SSIM. For service providers, objective models can be used for monitoring a system. For example, an IPTV provider may choose to monitor their service quality by means of objective models, rather than asking users for their opinion, or waiting for customer complaints about bad video quality. Some standardized models have found commercial application, including PEVQ and VQuad-HD. SSIM is also part of a commercially available video quality toolset (SSIMWAVE). VMAF is used by Netflix to tune their encoding and streaming algorithms, and to quality-control all streamed content. [19] [20] It is also used by other technology companies such as Bitmovin [21] and has been integrated into software such as FFmpeg.

An objective model should only be used in the context that it was developed for. For example, a model that was developed using a particular video codec is not guaranteed to be accurate for another video codec. Similarly, a model trained on tests performed on a large TV screen should not be used for evaluating the quality of a video watched on a mobile phone.

Other approaches

When estimating the quality of a video codec, all of the mentioned objective methods may require repeated post-encoding tests to determine the encoding parameters that satisfy a required level of visual quality. This makes them time-consuming, complex, and impractical for real commercial applications. There is ongoing research into novel objective evaluation methods that can predict the perceived quality of encoded video before the actual encoding is performed. [22]

Subjective video quality

The main goal of many objective video quality metrics is to automatically estimate the average user's (viewer's) opinion on the quality of a video processed by a system. Procedures for subjective video quality measurements are described in ITU-R recommendation BT.500 and ITU-T recommendation P.910. In such tests, video sequences are shown to a group of viewers. The viewers' opinion is recorded and averaged into the mean opinion score to evaluate the quality of each video sequence. However, the testing procedure may vary depending on what kind of system is tested.
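Computing a mean opinion score from panel ratings is a simple average. The sketch below uses invented ratings on the common 5-point scale and adds a normal-approximation 95% confidence interval (subjective-testing recommendations typically use the Student-t distribution for small panels):

```python
import numpy as np

# Invented ratings from a hypothetical panel of eight viewers on a
# 5-point absolute category rating scale (1 = bad ... 5 = excellent).
ratings = np.array([4, 5, 3, 4, 4, 5, 3, 4])

mos = float(ratings.mean())  # mean opinion score
# Normal-approximation 95% confidence interval for the MOS.
ci95 = float(1.96 * ratings.std(ddof=1) / np.sqrt(len(ratings)))
```

The confidence interval shrinks with panel size, which is one reason subjective tests prescribe minimum numbers of viewers.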

Tools for video quality assessment

FFmpeg: Free and open-source multimedia tool that incorporates some video quality metrics. Availability: free. License: open source. Included metrics: PSNR, SSIM, VMAF.

MSU VQMT: A software suite for objective video quality assessment (full-reference and no-reference). Availability: free for basic metrics, paid for HDR metrics. License: proprietary. Included metrics: PSNR, SSIM, MS-SSIM, 3SSIM, VMAF, NIQE, VQM, Delta, MSAD, MSE, plus MSU-developed metrics (Blurring Metric, Blocking Metric, Brightness Flicking Metric, Drop Frame Metric, Noise Estimation Metric).

EPFL VQMT: Various metrics implemented in OpenCV (C++) based on existing MATLAB implementations. Availability: free. License: open source. Included metrics: PSNR, PSNR-HVS, PSNR-HVS-M, SSIM, MS-SSIM, VIFp.

OpenVQ: A toolkit implementing various metrics, including the authors' OPVQ. Availability: free. License: open source. Included metrics: PSNR, SSIM, OPVQ.

Elecard: A commercial video quality estimation program. Availability: demo version available. License: proprietary. Included metrics: PSNR, APSNR, MSAD, MSE, SSIM, Delta, VQM, NQI, VMAF, VIF.

AviSynth: A video processing tool that can be used as a plugin or via scripting. Availability: free. License: open source. Included metrics: SSIM.

VQ Probe: Software for calculating video quality metrics. Availability: free. License: proprietary. Included metrics: PSNR, SSIM, VMAF.

vmaf.dev: An online video quality calculation tool implementing VMAF. Availability: free. License: open source. Included metrics: VMAF.


References

  1. "Objective video quality assessment methods for Video assistant refereeing (VAR) System" (PDF).
  2. Raake, Alexander (2006). Speech quality of VoIP : assessment and prediction. Wiley InterScience (Online service). Chichester, England: Wiley. ISBN   9780470030608. OCLC   85785040.
  3. Möller, Sebastian (2000). Assessment and Prediction of Speech Quality in Telecommunications. Boston, MA: Springer US. ISBN   9781475731170. OCLC   851800613.
  4. Raake, Alexander; Egger, Sebastian (2014). Quality of Experience. T-Labs Series in Telecommunication Services. Springer, Cham. pp. 11–33. doi:10.1007/978-3-319-02681-7_2. ISBN   9783319026800.
  5. Shahid, Muhammad; Rossholm, Andreas; Lövström, Benny; Zepernick, Hans-Jürgen (2014-08-14). "No-reference image and video quality assessment: a classification and review of recent approaches". EURASIP Journal on Image and Video Processing. 2014: 40. doi: 10.1186/1687-5281-2014-40 . ISSN   1687-5281.
  6. Barman, Nabajeet; Reznik, Yuriy; Martini, Maria G. (2023). "A Subjective Dataset for Multi-Screen Video Streaming Applications". arXiv: 2305.03138 [cs.MM].
  7. Lee, Seon-Oh; Jung, Kwang-Su; Sim, Dong-Gyu (2010). "Real-time Objective Quality Assessment based on Coding Parameters Extracted from H.264/AVC Bitstream". IEEE Transactions on Consumer Electronics. 56 (2): 1071–1078. doi:10.1109/TCE.2010.5506041. S2CID   23190244.
  8. Wang, Zhou; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. (2004-04-01). "Image quality assessment: from error visibility to structural similarity". IEEE Transactions on Image Processing. 13 (4): 600–612. Bibcode:2004ITIP...13..600W. CiteSeerX   10.1.1.2.5689 . doi:10.1109/TIP.2003.819861. ISSN   1057-7149. PMID   15376593. S2CID   207761262.
  9. Seshadrinathan, K.; Bovik, A.C. (2010-02-01). "Motion Tuned Spatio-Temporal Quality Assessment of Natural Videos". IEEE Transactions on Image Processing. 19 (2): 335–350. Bibcode:2010ITIP...19..335S. CiteSeerX   10.1.1.153.9018 . doi:10.1109/TIP.2009.2034992. ISSN   1057-7149. PMID   19846374. S2CID   15356687.
  10. vmaf: Perceptual video quality assessment based on multi-method fusion, Netflix, Inc., 2017-07-14, retrieved 2017-07-15
  11. "Description of Video Quality Metric (VQM) Software - ITS". its.ntia.gov. Retrieved 2023-07-12.
  12. Kourtis, M.-A.; Koumaras, H.; Liberal, F. (July–August 2016). "Reduced-reference video quality assessment using a static video pattern". Journal of Electronic Imaging. 25 (4): 043011. Bibcode:2016JEI....25d3011K. doi: 10.1117/1.jei.25.4.043011 .
  13. Soundararajan, R.; Bovik, A.C. (2013-04-04). "Video Quality Assessment by Reduced Reference Spatio-Temporal Entropic Differencing". IEEE Transactions on Circuits and Systems for Video Technology. 23 (4): 684–694. doi:10.1109/tcsvt.2012.2214933. S2CID   206661510.
  14. Raake, Alexander; Borer, Silvio; Satti, Shahid M.; Gustafsson, Jorgen; Rao, Rakesh Rao Ramachandra; Medagli, Stefano; List, Peter; Goring, Steve; Lindero, David; Robitza, Werner; Heikkila, Gunnar; Broom, Simon; Schmidmer, Christian; Feiten, Bernhard; Wustenhagen, Ulf (2020). "Multi-Model Standard for Bitstream-, Pixel-Based and Hybrid Video Quality Assessment of UHD/4K: ITU-T P.1204". IEEE Access. 8: 193020–193049. doi: 10.1109/ACCESS.2020.3032080 . ISSN   2169-3536. S2CID   226293635.
  15. Mittal, A.; Soundararajan, R.; Bovik, A.C. (March 2013). "Making a "Completely Blind" Image Quality Analyzer". IEEE Signal Processing Letters. 20 (3): 209–212. Bibcode:2013ISPL...20..209M. doi:10.1109/lsp.2012.2227726. S2CID   16892725.
  16. Mittal, A.; Moorthy, A.K.; Bovik, A.C. (2011-11-09). "Blind/Referenceless Image Spatial Quality Evaluator". 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR). pp. 723–727. doi:10.1109/acssc.2011.6190099. ISBN   978-1-4673-0323-1. S2CID   16388844.
  17. Saad, M. A.; Bovik, A. C.; Charrier, C. (March 2014). "Blind Prediction of Natural Video Quality". IEEE Transactions on Image Processing. 23 (3): 1352–1365. Bibcode:2014ITIP...23.1352S. CiteSeerX   10.1.1.646.9045 . doi:10.1109/tip.2014.2299154. ISSN   1057-7149. PMID   24723532. S2CID   14314450.
  18. Liu, Tsung-Jung; Lin, Yu-Chieh; Lin, Weisi; Kuo, C.-C. Jay (2013). "Visual quality assessment: recent developments, coding applications and future trends". APSIPA Transactions on Signal and Information Processing. 2. doi: 10.1017/atsip.2013.5 . hdl: 10356/106287 . ISSN   2048-7703.
  19. Blog, Netflix Technology (2016-06-06). "Toward A Practical Perceptual Video Quality Metric". Netflix TechBlog. Retrieved 2017-10-08.
  20. Blog, Netflix Technology (2018-10-26). "VMAF: The Journey Continues". Medium. Retrieved 2019-10-23.
  21. "Per-Scene Adaptation: Going Beyond Bitrate". Bitmovin. 2018-01-05. Retrieved 2019-10-23.
  22. Koumaras, H.; Kourtis, A.; Martakos, D.; Lauterjung, J. (2007-09-01). "Quantified PQoS assessment based on fast estimation of the spatial and temporal activity level". Multimedia Tools and Applications. 34 (3): 355–374. doi:10.1007/s11042-007-0111-1. ISSN   1380-7501. S2CID   14136479.
