This gallery shows the results of numerous image scaling algorithms.
The size of an image can be changed in several ways. Consider resizing a 160x160 pixel photo to the following 40x40 pixel thumbnail, and then scaling the thumbnail back up to a 160x160 pixel image. Also consider doubling the size of the following image containing text.
Thumbnail | Text |
---|---|
(image) | (image) |
Original photo | Upscaled thumbnail | Upscaled text | Algorithm and description |
---|---|---|---|
(image) | (image) | (image) | **Nearest-neighbor interpolation:** One of the simpler ways of increasing image size, replacing every pixel with a block of pixels of the same color. The resulting image is larger than the original and preserves all the original detail, but has (possibly undesirable) jaggedness. The diagonal lines of the "W", for example, now show the "stairway" shape characteristic of nearest-neighbor interpolation. Other scaling methods below are better at preserving smooth contours in the image. A minimal NumPy sketch appears below the table. |
(image) | (image) | (image) | **Bilinear interpolation:** Linear (or bilinear, in two dimensions) interpolation is typically good for changing the size of an image, but it causes some undesirable softening of detail and can still be somewhat jagged. A minimal sketch appears below the table. |
(image) | (image) | (image) | **Bicubic interpolation:** Bicubic interpolation considers a 4x4 neighborhood of pixels and produces smoother results with fewer interpolation artifacts than bilinear interpolation. Better scaling methods include Lanczos resampling and Mitchell-Netravali filters. A library-based sketch appears below the table. |
(image) | (image) | (image) | **Fourier-based interpolation:** Simple interpolation in the frequency domain, padding the spectrum with zero components (a smooth-window-based approach would reduce the ringing). Besides the good preservation of detail, notable are the ringing and the circular bleeding of content from the left border to the right border (and vice versa). A sketch appears below the table. |
(image) | (image) | (image) | **Edge-directed interpolation:** Edge-directed interpolation algorithms aim to preserve edges in the image after scaling, unlike other algorithms, which can produce staircase artifacts around diagonal lines or curves. Examples of algorithms for this task include New Edge-Directed Interpolation (NEDI), [1] [2] Edge-Guided Image Interpolation (EGGI), [3] Iterative Curvature-Based Interpolation (ICBI), [citation needed] and Directional Cubic Convolution Interpolation (DCCI). [4] A study found that DCCI had the best scores in PSNR and SSIM on a series of test images. [5] |
(image) | (image) | (image) | **Pixel art scaling algorithms (hqx):** For magnifying computer graphics with low resolution and few colors (usually 2 to 256), better results are achieved by pixel art scaling algorithms such as hqx or xbr. These produce sharp edges and maintain a high level of detail. Unfortunately, because of the standardized 218x80 pixel size, the "Wiki" image cannot use HQ4x or 4xBRZ to better demonstrate the artifacts they may produce, such as row shifting. |
(image) | (image) | (image) | **Pixel art scaling algorithms (xbr):** The xbr family is very useful for creating smooth edges. It deforms shapes significantly, which in many cases creates a very appealing result, but it also produces an effect similar to posterization by grouping local areas into a single color, and it removes small details located between larger ones that connect together. The example images use 4xBRZ and 2xBRZ respectively. |
(images) | (images) | (image) | **Pixel art scaling algorithms (GemCutter):** An adaptable technique which can deliver variable amounts of detail or smoothness. It aims to preserve the shape and coordinates of original details without blurring them into neighboring ones. It avoids blending pixels which directly touch each other, and instead blends pixels only with their diagonal neighbors. The "Cutter" name comes from its tendency to cut the corners of squares and turn them into diamonds, as well as to create distinct faces along stair-stepped pixels, i.e. those which lie along the angles of edges found on a diamond. The "Gem" prefix refers both to the diamond cut and to many traditional gem cuts which involve cutting corners at a 45-degree angle. |
(image) | (image) | (image) | **Image tracing:** Vectorization first creates a resolution-independent vector representation of the graphic to be scaled. Then the resolution-independent version is rendered as a raster image at the desired resolution. This technique is used by Adobe Illustrator Live Trace, Inkscape, and several recent papers. [6] Scalable Vector Graphics are well suited to simple geometric images, while photographs do not fare well with vectorization due to their complexity. A hedged command-line sketch appears below the table. |
(image) | (image) | (image) | **Deep convolutional neural networks:** Using machine learning, convincing details are generated as best guesses by learning common patterns from a training data set. The upscaled result is sometimes described as a hallucination because the information introduced may not correspond to the content of the source. Enhanced deep residual network (EDSR) methods have been developed by optimizing conventional residual neural network architectures. [7] Programs that use this method include waifu2x, Imglarger and Neural Enhance. A toy sketch in this spirit appears below the table. |
(image) | (image) | (image) | **Deep convolutional neural networks using perceptual loss:** Developed on the basis of the super-resolution generative adversarial network (SRGAN) method, [8] enhanced SRGAN (ESRGAN) [9] is an incremental tweaking of the same generative adversarial network basis. Both methods rely on a perceptual loss function [10] to evaluate training iterations; a sketch of such a loss appears below the table. |
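The sketches that follow are illustrative, hedged implementations of several methods from the table; all filenames, parameter values, and helper names are assumptions introduced here, not taken from the original gallery. First, integer-factor nearest-neighbor upscaling amounts to replicating each pixel into a block, which is why hard edges and staircase artifacts survive intact:

```python
import numpy as np

def nearest_neighbor_upscale(image: np.ndarray, factor: int) -> np.ndarray:
    """Upscale by an integer factor by replicating every pixel into a
    factor x factor block (works for 2-D grayscale and 3-D RGB arrays)."""
    return image.repeat(factor, axis=0).repeat(factor, axis=1)

# A 2x2 checkerboard becomes an 8x8 image with hard, jagged edges intact.
tile = np.array([[0, 255], [255, 0]], dtype=np.uint8)
print(nearest_neighbor_upscale(tile, 4))
```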
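Bilinear interpolation instead averages the four nearest input pixels around each output sample, which is what softens detail. A minimal NumPy sketch (the align-corners sampling convention is an illustrative choice):

```python
import numpy as np

def bilinear_resize(image: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Resize a 2-D array; each output pixel is a weighted average of its
    four nearest input pixels."""
    img = image.astype(float)
    h, w = img.shape
    ys = np.linspace(0, h - 1, new_h)          # map output rows into input
    xs = np.linspace(0, w - 1, new_w)          # map output cols into input
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]                    # vertical weights
    wx = (xs - x0)[None, :]                    # horizontal weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```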
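Bicubic and Lanczos resampling are rarely hand-rolled; a library call is the usual route. A sketch using Pillow (9.1 or later for the Resampling enum; the filenames are hypothetical):

```python
from PIL import Image

img = Image.open("thumb_40x40.png")                         # hypothetical input
bicubic = img.resize((160, 160), Image.Resampling.BICUBIC)
lanczos = img.resize((160, 160), Image.Resampling.LANCZOS)  # windowed sinc
bicubic.save("up_bicubic.png")
lanczos.save("up_lanczos.png")
```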
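Fourier-based interpolation pads the centered spectrum with zeros; the implicit rectangular window is what produces the ringing, and the DFT's periodicity explains the left-to-right bleeding. A sketch assuming even image dimensions and an integer factor:

```python
import numpy as np

def fourier_upscale(image: np.ndarray, factor: int) -> np.ndarray:
    """Upscale by zero-padding the centered 2-D spectrum, then inverting."""
    h, w = image.shape                               # assumes even h and w
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    ph, pw = (factor - 1) * h // 2, (factor - 1) * w // 2
    padded = np.pad(spectrum, ((ph, ph), (pw, pw)))  # zeros = rectangular window
    out = np.fft.ifft2(np.fft.ifftshift(padded)) * factor**2  # restore amplitude
    return np.real(out)
```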
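Image tracing is usually delegated to an external tracer. A hedged sketch, assuming the potrace command-line tool and the cairosvg Python package are installed (filenames are hypothetical): vectorize the bitmap, then re-render the SVG at double resolution:

```python
import subprocess
import cairosvg

# Vectorize: potrace reads a bitmap (e.g. PBM) and -s emits SVG.
subprocess.run(["potrace", "-s", "wiki_text.pbm", "-o", "wiki_text.svg"],
               check=True)

# Rasterize the resolution-independent SVG at 2x the original size.
cairosvg.svg2png(url="wiki_text.svg", write_to="wiki_text_2x.png", scale=2.0)
```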
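The EDSR line of work stacks batch-norm-free residual blocks before a sub-pixel upsampler. A toy PyTorch sketch in that spirit (the layer counts and the residual scaling of 0.1 are illustrative, not the published architecture):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """EDSR-style block: conv-ReLU-conv, no batch norm, scaled residual."""
    def __init__(self, ch: int = 64, scale: float = 0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.scale = scale

    def forward(self, x):
        return x + self.scale * self.body(x)

class TinySR(nn.Module):
    """Toy 2x super-resolver: head conv, residual body, PixelShuffle tail."""
    def __init__(self, ch: int = 64, n_blocks: int = 4):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.tail = nn.Sequential(
            nn.Conv2d(ch, ch * 4, 3, padding=1),
            nn.PixelShuffle(2),            # trades channels for 2x resolution
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        feat = self.head(x)
        return self.tail(feat + self.body(feat))

lr = torch.randn(1, 3, 40, 40)             # e.g. the 40x40 thumbnail
print(TinySR()(lr).shape)                  # torch.Size([1, 3, 80, 80])
```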
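A perceptual loss compares images in the feature space of a frozen classifier rather than in pixel space. A sketch using VGG-19 features from torchvision (the cut-off at layer 16 is an illustrative choice, not the exact layer used by SRGAN or ESRGAN; ImageNet input normalization is omitted for brevity):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

# Frozen feature extractor; needs torchvision >= 0.13 for the weights enum.
vgg = vgg19(weights=VGG19_Weights.DEFAULT).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Distance between super-resolved and ground-truth images in VGG
    feature space; small pixel shifts cost less than under plain MSE."""
    return F.mse_loss(vgg(sr), vgg(hr))
```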
Super-resolution imaging (SR) is a class of techniques that enhance (increase) the resolution of an imaging system. In optical SR the diffraction limit of systems is transcended, while in geometrical SR the resolution of digital imaging sensors is enhanced.
Texture synthesis is the process of algorithmically constructing a large digital image from a small digital sample image by taking advantage of its structural content. It is an object of research in computer graphics and is used in many fields, among them digital image editing, 3D computer graphics, and post-production of films.
In computer graphics and digital imaging, image scaling refers to the resizing of a digital image. In video technology, the magnification of digital material is known as upscaling or resolution enhancement.
Demosaicing, also known as color reconstruction, is a digital image processing algorithm used to reconstruct a full color image from the incomplete color samples output from an image sensor overlaid with a color filter array (CFA) such as a Bayer filter. It is also known as CFA interpolation or debayering.
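As a concrete illustration, classic bilinear demosaicing fills each missing color sample by averaging its nearest recorded neighbors. A NumPy/SciPy sketch assuming an RGGB Bayer layout (the kernels are the standard bilinear ones; border pixels are only approximated by the zero-padded convolution):

```python
import numpy as np
from scipy.signal import convolve2d

def bilinear_demosaic(raw: np.ndarray) -> np.ndarray:
    """Reconstruct RGB from an RGGB Bayer mosaic by channel-wise
    bilinear interpolation of the sparse color planes."""
    h, w = raw.shape
    r = np.zeros((h, w)); g = np.zeros((h, w)); b = np.zeros((h, w))
    r[0::2, 0::2] = raw[0::2, 0::2]        # red sites
    g[0::2, 1::2] = raw[0::2, 1::2]        # green sites (two per 2x2 cell)
    g[1::2, 0::2] = raw[1::2, 0::2]
    b[1::2, 1::2] = raw[1::2, 1::2]        # blue sites
    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
    return np.dstack([convolve2d(r, k_rb, mode="same"),
                      convolve2d(g, k_g,  mode="same"),
                      convolve2d(b, k_rb, mode="same")])
```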
Compressed sensing is a signal processing technique for efficiently acquiring and reconstructing a signal, by finding solutions to underdetermined linear systems. This is based on the principle that, through optimization, the sparsity of a signal can be exploited to recover it from far fewer samples than required by the Nyquist–Shannon sampling theorem. There are two conditions under which recovery is possible. The first one is sparsity, which requires the signal to be sparse in some domain. The second one is incoherence, which is applied through the restricted isometry property, which is sufficient for sparse signals. Compressed sensing has applications in, for example, magnetic resonance imaging (MRI), where the incoherence condition is typically satisfied.
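A standard way to make this concrete is l1-regularized least squares, solved here with iterative soft-thresholding (ISTA); the problem sizes and the regularization weight are illustrative:

```python
import numpy as np

def ista(A: np.ndarray, y: np.ndarray, lam: float = 0.1, iters: int = 500):
    """Minimize 0.5*||Ax - y||^2 + lam*||x||_1 to recover a sparse x
    from underdetermined measurements y = Ax."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - step * (A.T @ (A @ x - y))      # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)  # shrinkage
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))              # 50 measurements, 200 unknowns
x_true = np.zeros(200)
x_true[[5, 70, 140]] = [1.0, -2.0, 1.5]         # 3-sparse ground truth
x_hat = ista(A, A @ x_true)
print(np.flatnonzero(np.abs(x_hat) > 0.5))      # ideally [5, 70, 140]
```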
Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.
Deep learning is a subset of machine learning that focuses on utilizing neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience and is centered around stacking artificial neurons into layers and "training" them to process data. The adjective "deep" refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
Adversarial machine learning is the study of attacks on machine learning algorithms, and of the defenses against such attacks. A survey from May 2020 revealed that practitioners report a dire need for better protection of machine learning systems in industrial applications.
In computer vision, a saliency map is an image that highlights either the region on which people's eyes focus first or the most relevant regions for machine learning models. The goal of a saliency map is to reflect the degree of importance of a pixel to the human visual system or an otherwise opaque ML model.
A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent framework for approaching generative artificial intelligence. The concept was initially developed by Ian Goodfellow and his colleagues in June 2014. In a GAN, two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss.
Neural architecture search (NAS) is a technique for automating the design of artificial neural networks (ANN), a widely used model in the field of machine learning. NAS has been used to design networks that are on par with or outperform hand-designed architectures. Methods for NAS can be categorized according to the search space, search strategy, and performance estimation strategy used.
Neural style transfer (NST) refers to a class of software algorithms that manipulate digital images, or videos, in order to adopt the appearance or visual style of another image. NST algorithms are characterized by their use of deep neural networks for the sake of image transformation. Common uses for NST are the creation of artificial artwork from photographs, for example by transferring the appearance of famous paintings to user-supplied photographs. Several notable mobile apps use NST techniques for this purpose, including DeepArt and Prisma. This method has been used by artists and designers around the globe to develop new artwork based on existent style(s).
Artificial neural networks (ANNs) are models created using machine learning to perform a number of tasks. Their creation was inspired by biological neural circuitry. While some of the computational implementations of ANNs relate to earlier discoveries in mathematics, the first implementation of ANNs was by psychologist Frank Rosenblatt, who developed the perceptron. Little research was conducted on ANNs in the 1970s and 1980s, with the AAAI calling this period an "AI winter".
An energy-based model (EBM) is an application of canonical ensemble formulation from statistical physics for learning from data. The approach prominently appears in generative artificial intelligence.
Deep learning in photoacoustic imaging combines the hybrid imaging modality of photoacoustic imaging (PA) with the rapidly evolving field of deep learning. Photoacoustic imaging is based on the photoacoustic effect, in which optical absorption causes a rise in temperature, which causes a subsequent rise in pressure via thermo-elastic expansion. This pressure rise propagates through the tissue and is sensed via ultrasonic transducers. Due to the proportionality between the optical absorption, the rise in temperature, and the rise in pressure, the ultrasound pressure wave signal can be used to quantify the original optical energy deposition within the tissue.
The Fréchet inception distance (FID) is a metric used to assess the quality of images created by a generative model, like a generative adversarial network (GAN) or a diffusion model.
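The standard definition fits Gaussians to the Inception-feature distributions of the real and generated image sets and measures the Fréchet (2-Wasserstein) distance between them:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\bigl( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} \bigr)
```

where (mu_r, Sigma_r) and (mu_g, Sigma_g) are the mean and covariance of Inception features computed on real and generated images, respectively.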
Video super-resolution (VSR) is the process of generating high-resolution video frames from given low-resolution video frames. Unlike single-image super-resolution (SISR), the main goal is not only to restore fine details while preserving coarse ones, but also to maintain motion consistency.
A vision transformer (ViT) is a transformer designed for computer vision. A ViT decomposes an input image into a series of patches, serializes each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings.
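The patch step is a single reshape plus one matrix multiplication. A PyTorch sketch (the patch size of 16 and embedding width of 192 are illustrative choices):

```python
import torch

def patch_embed(img: torch.Tensor, patch: int, proj: torch.Tensor) -> torch.Tensor:
    """Split a (C, H, W) image into non-overlapping patches, flatten each,
    and project to the embedding width with one matrix multiplication."""
    c, h, w = img.shape
    p = img.unfold(1, patch, patch).unfold(2, patch, patch)  # (C, H/p, W/p, p, p)
    p = p.permute(1, 2, 0, 3, 4).reshape(-1, c * patch * patch)
    return p @ proj                                  # (num_patches, dim)

img = torch.randn(3, 224, 224)
proj = torch.randn(3 * 16 * 16, 192)                 # illustrative projection
print(patch_embed(img, 16, proj).shape)              # torch.Size([196, 192])
```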
Small object detection is a particular case of object detection where various techniques are employed to detect small objects in digital images and videos. "Small objects" are objects having a small pixel footprint in the input image. In areas such as aerial imagery, state-of-the-art object detection techniques have underperformed because of small objects.
A text-to-video model is a machine learning model that uses a natural language description as input to produce a video relevant to the input text. Advancements during the 2020s in the generation of high-quality, text-conditioned videos have largely been driven by the development of video diffusion models.