Face hallucination

Face hallucination refers to any super-resolution technique that applies specifically to faces. It comprises techniques that take noisy or low-resolution facial images and convert them into high-resolution images using knowledge about typical facial features. Because it can help facial recognition systems identify faces faster and more effectively, face hallucination has become an active area of research.

Differences between face hallucination and super-resolution

Image super-resolution is a class of techniques that enhance the resolution of an image, often using a set of low-resolution images. The main difference between the two is that face hallucination is super-resolution specialized to face images: it always employs face priors that are strongly tied to the face domain.

Measures

An image is considered high resolution when it measures 128×96 pixels.[citation needed] The goal of face hallucination is therefore to bring the input image up to that size. The input image usually measures 32×24 or 16×12 pixels.[citation needed]

Moreover, a key challenge in face hallucination is face alignment: most methods require the test sample to be accurately aligned with the training samples, and even a slight misalignment can degrade both the method and its result.

The algorithm

In the last two decades,[when?] many face hallucination algorithms have been proposed. Although the existing methods have achieved great success, there is still much room for improvement.

Common algorithms usually perform two steps: the first step generates a global face image that preserves the characteristics of the face, using a probabilistic maximum a posteriori (MAP) method; the second step produces a residual image to compensate for the result of the first step. Furthermore, these algorithms are based on a set of high- and low-resolution training image pairs, incorporating image super-resolution techniques into facial image synthesis.

Any face hallucination algorithm must be based on three constraints:

Data constraint
When smoothed and down-sampled, the output image should closely match the original low-resolution image.
Global constraint
The resulting image must always contain the common features of a human face, and those features must remain coherent. Without this constraint, the output could be too noisy.
Local constraint
The output image must show the specific features of the input face, resembling photorealistic local detail. Without this constraint, the resulting image could be too smooth.
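
These three constraints can be combined into a single objective. As a rough illustration only (the smoothing/down-sampling operator D, the weights λg and λl, and the two energy terms are generic placeholders rather than a formulation from any particular paper), the hallucinated face H* can be written as

    H* = argmin over H of ‖D(H) − L‖² + λg·Eglobal(H) + λl·Elocal(H)

where L is the low-resolution input, the first term enforces the data constraint, Eglobal penalizes deviation from a global face model, and Elocal penalizes implausible local detail.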

Methods

Face hallucination enhances facial features while increasing image resolution, and several different methods have been proposed to do so.

Interpolation

The simplest way to increase image resolution is direct interpolation, which raises the pixel count of the input image with algorithms such as nearest-neighbour, bilinear and variants of cubic spline interpolation. Another approach is to learn how to interpolate from a set of high-resolution training samples together with their corresponding low-resolution versions (Baker and Kanade,[1] p. 4).

However, the results are very poor, since no new information is added in the process. That is why new methods have been proposed in recent years.
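
As a minimal illustration, direct interpolation can be performed with an off-the-shelf image library; the sketch below uses Pillow, and the input size, file name and 8× factor are placeholders only:

```python
# Direct interpolation of a low-resolution face with Pillow.
from PIL import Image

lr = Image.open("face_16x12.png")       # hypothetical 16x12 low-resolution face
size = (lr.width * 8, lr.height * 8)    # an 8x factor reaches the 128x96 target

hr_nearest = lr.resize(size, Image.NEAREST)    # nearest-neighbour
hr_bilinear = lr.resize(size, Image.BILINEAR)  # bilinear
hr_bicubic = lr.resize(size, Image.BICUBIC)    # cubic-spline variant
```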

Face hallucination based on Bayes' theorem

This method was proposed by Baker and Kanade,[1] pioneers of the face hallucination technique.

The algorithm is based on a Bayesian MAP formulation and uses gradient descent to optimize the objective function; it generates the high-frequency details from a parent structure with the assistance of training samples.
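
A minimal sketch of this kind of MAP estimation by gradient descent is given below. It assumes a simple Gaussian likelihood and a generic quadratic smoothness prior, whereas Baker and Kanade's actual prior on high-frequency detail is learned from training faces (the "parent structure"):

```python
import numpy as np

def downsample(h, f=4):
    # smoothing/down-sampling operator: average pooling over f x f blocks
    H, W = h.shape
    return h.reshape(H // f, f, W // f, f).mean(axis=(1, 3))

def upsample_adjoint(r, f=4):
    # adjoint of the pooling operator, used in the data-term gradient
    return np.kron(r, np.ones((f, f))) / f**2

def map_estimate(l, f=4, lam=0.1, step=1.0, iters=200):
    h = np.kron(l, np.ones((f, f)))          # initialise by pixel replication
    for _ in range(iters):
        data_grad = upsample_adjoint(downsample(h, f) - l, f)
        # gradient of a quadratic smoothness prior (discrete Laplacian)
        prior_grad = 4 * h - sum(np.roll(h, s, ax) for s in (1, -1) for ax in (0, 1))
        h -= step * (data_grad + lam * prior_grad)
    return h
```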

Super-resolution from multiple views using learnt image models

Capel and Zisserman [2] were the first to propose a local (region-based) face image super-resolution method.

It divides the face image into four key regions: the eyes, nose, mouth and cheek areas. For each area it learns a separate principal component analysis (PCA) basis and reconstructs the area separately. However, face images reconstructed with this method show visible artifacts between the different regions.
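
A minimal sketch of the per-region PCA reconstruction follows. It assumes faces are already aligned and that X holds vectorised training patches for one region (eyes, nose, mouth or cheeks); Capel and Zisserman additionally learn the mapping between low- and high-resolution patches, which is not shown:

```python
import numpy as np

def fit_region_pca(X, k):
    # X: (n_samples, n_pixels) vectorised training patches for one region
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                       # region mean and top-k PCA basis

def reconstruct_region(patch, mu, basis):
    coeff = basis @ (patch - mu)            # project onto the region subspace
    return mu + basis.T @ coeff

# each of the four regions is reconstructed independently, which is why
# visible seams can appear at region boundaries
```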

Face Hallucination via Sparse Coding

This method was proposed by Yang et al.[3] and hallucinates a high-resolution face image from a low-resolution input. It exploits facial structure by using non-negative matrix factorization (NMF) to learn a localized part-based subspace, which is effective for super-resolving the incoming face.

The detailed facial structure is then further enhanced using a local patch method based on sparse representation.
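
The global, part-based step can be sketched with scikit-learn's NMF as below; the training matrix and number of components are placeholder assumptions, and the paper's patch-wise sparse-coding step for local detail is indicated only by the final comment:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
faces_hr = rng.random((200, 128 * 96))     # placeholder vectorised training faces

nmf = NMF(n_components=49, max_iter=500)   # localized part-based subspace
W = nmf.fit_transform(faces_hr)            # per-face activations
H = nmf.components_                        # non-negative face "parts"

# a low-resolution input is super-resolved globally by finding non-negative
# activations whose smoothed/down-sampled reconstruction matches it; local
# detail is then restored patch-by-patch via sparse representation
```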

Face Hallucination by Eigentransformation

This method was proposed by Wang and Tang [4] and uses an eigentransformation. It treats the solution as a transformation between different image styles and applies principal component analysis (PCA) to the low-resolution face image. By selecting the number of "eigenfaces", the method extracts the facial information contained in the low-resolution image while removing noise.

In the eigentransformation algorithm, the hallucinated face image is synthesized as a linear combination of high-resolution training images, with the combination coefficients derived from the low-resolution face image using PCA. The algorithm improves resolution by inferring high-frequency facial detail from the low-frequency facial information, taking advantage of the correlation between the two. Because of the structural similarity among face images, there is a strong correlation between the high-frequency and low-frequency bands in a multiresolution analysis. For high-resolution face images, PCA can compact this correlated information onto a small number of principal components, which in the eigentransformation process can be inferred from the principal components of the low-resolution face through the mapping between high- and low-resolution training pairs.
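
A minimal sketch of the eigentransformation is given below. It assumes aligned, vectorised low-resolution training faces L (one per column) with high-resolution counterparts H, and uses the small Gram-matrix trick familiar from eigenfaces; keeping only the leading k components discards the noise-dominated directions mentioned above:

```python
import numpy as np

def eigentransform(x_lr, L, H, k=50):
    # L: (lr_pixels, n) low-resolution training faces, one per column
    # H: (hr_pixels, n) corresponding high-resolution faces
    mu_l, mu_h = L.mean(axis=1), H.mean(axis=1)
    A, B = L - mu_l[:, None], H - mu_h[:, None]
    lam, V = np.linalg.eigh(A.T @ A)         # PCA via the small Gram matrix
    lam, V = lam[::-1][:k], V[:, ::-1][:, :k]
    # combination coefficients over training faces from the LR PCA projection
    c = V @ np.diag(1.0 / lam) @ V.T @ A.T @ (x_lr - mu_l)
    # synthesise the HR face as the same linear combination of HR exemplars
    return mu_h + B @ c
```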

Two-step approach

This method was developed by C. Liu and Shum [5] [6] and integrates a global parametric model with a local non-parametric one. The global model is a linear parametric inference, while the local model is a patch-based non-parametric Markov network.

In the first step, the method learns the relationship between high-resolution images and their smoothed, down-sampled counterparts. In the second step, the residue between an original high-resolution image and the reconstruction produced by the learned linear model is modelled with a non-parametric Markov network to capture the high-frequency content of faces.
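
The two-step structure can be sketched as below, with ordinary least squares standing in for the learned global linear model; the patch-based Markov network for the residue is indicated only by a comment, and all data are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
L_train = rng.random((100, 32 * 24))    # placeholder low-resolution faces
H_train = rng.random((100, 128 * 96))   # placeholder high-resolution faces
l_input = rng.random(32 * 24)           # placeholder test face

# step 1: global linear parametric inference from low to high resolution
W, *_ = np.linalg.lstsq(L_train, H_train, rcond=None)
h_global = l_input @ W                  # keeps the global facial structure

# step 2: the residue H_train - L_train @ W between originals and global
# reconstructions would be modelled patch-by-patch with a non-parametric
# Markov network to restore high-frequency detail (not shown)
```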

Face hallucination based on MCA

This algorithm formulates face hallucination as an image decomposition problem and proposes a method based on morphological component analysis (MCA).[7]

The method is presented as a three-step framework. First, the low-resolution input image is up-sampled by interpolation; the interpolated image can be represented as a superposition of a global high-resolution image and an "unsharp mask". Second, the interpolated image is decomposed using MCA to obtain the global approximation of the high-resolution image. Finally, facial detail is compensated onto the estimated high-resolution image using neighbour reconstruction of position-patches.
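
A minimal sketch of the three-step skeleton follows, with a Gaussian filter standing in for the MCA decomposition into a smooth global image plus an "unsharp mask"; the actual method uses learned MCA dictionaries, and the position-patch detail compensation is indicated only by a comment:

```python
import numpy as np
from scipy.ndimage import zoom, gaussian_filter

rng = np.random.default_rng(0)
lr_face = rng.random((16, 12))            # placeholder low-resolution face

interp = zoom(lr_face, 8, order=3)        # step 1: interpolation to 128x96
global_hr = gaussian_filter(interp, 2.0)  # step 2: smooth global component
unsharp_mask = interp - global_hr         #         residual "unsharp mask"

# step 3: facial detail from training faces would be compensated onto
# global_hr via neighbour reconstruction of position-patches (not shown)
```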

Other methods

Results

All of the methods presented above produce satisfactory results, so it is difficult to determine which method is the most effective and which gives the best result.

References

  1. Baker, Simon; Kanade, Takeo. "Hallucinating Faces". Retrieved 18 November 2014.
  2. Capel, D.; Zisserman, A. (2001). "Super-resolution from multiple views using learnt image models" (PDF). Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001). Vol. 2. Kauai, Hawaii. pp. 627–634. doi:10.1109/CVPR.2001.991022. ISBN 978-0-7695-1272-3. S2CID 14090080. Retrieved 4 March 2015.
  3. Yang, Jianchao; Tang, Hao; Ma, Yi; Huang, Thomas. "Face Hallucination via Sparse Coding" (PDF). Retrieved 4 March 2015.
  4. Wang, Xiaogang; Tang, Xiaoou (2005). "Hallucinating Face by Eigentransformation" (PDF). Retrieved 17 November 2014.
  5. Liu, C.; Shum, H.Y.; Freeman, W.T. (October 2007). "Face Hallucination: Theory and Practice". Retrieved 20 November 2014.
  6. Liu, C.; Shum, H.Y.; Freeman, W.T. (October 2007). "Face Hallucination: Theory and Practice" (PDF). Retrieved 20 November 2014.
  7. Liang, Yan; Xie, Xiaohua; Lai, Jian-Huang (October 2012). "Face Hallucination based on Morphological Component Analysis" (PDF). Archived from the original (PDF) on 5 December 2014. Retrieved 21 November 2014.
