View synthesis

Last updated

In computer graphics, view synthesis, or novel view synthesis, is a task which consists of generating images of a specific subject or scene from a specific point of view, when the only available information is pictures taken from different points of view. [1]

Contents

This task was only recently (late 2010s – early 2020s) tackled with significant success, mostly as a result of advances in machine learning. Notable successful methods are Neural radiance fields, and 3D gaussian splatting.

Applications of view synthesis are numerous, one of them being Free viewpoint television.

See also

Related Research Articles

<span class="mw-page-title-main">Point cloud</span> Set of data points in three-dimensional space

A point cloud is a discrete set of data points in space. The points may represent a 3D shape or object. Each point position has its set of Cartesian coordinates. Points may contain data other than position such as RGB colors, normals, timestamps and others. Point clouds are generally produced by 3D scanners or by photogrammetry software, which measure many points on the external surfaces of objects around them. As the output of 3D scanning processes, point clouds are used for many purposes, including to create 3D computer-aided design (CAD) or geographic information systems (GIS) models for manufactured parts, for metrology and quality inspection, and for a multitude of visualizing, animating, rendering, and mass customization applications.

In computer graphics and computer vision, image-based modeling and rendering (IBMR) methods rely on a set of two-dimensional images of a scene to generate a three-dimensional model and then render some novel views of this scene.

<span class="mw-page-title-main">Autostereoscopy</span> A method of displaying stereoscopic images without the use of special headgear or glasses

Autostereoscopy is any method of displaying stereoscopic images without the use of special headgear, glasses, something that affects vision, or anything for eyes on the part of the viewer. Because headgear is not required, it is also called "glasses-free 3D" or "glassesless 3D".

Object recognition – technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the image of the objects may vary somewhat in different view points, in many different sizes and scales or even when they are translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge for computer vision systems. Many approaches to the task have been implemented over multiple decades.

<span class="mw-page-title-main">Bilateral filter</span> Smoothing filler for images

A bilateral filter is a non-linear, edge-preserving, and noise-reducing smoothing filter for images. It replaces the intensity of each pixel with a weighted average of intensity values from nearby pixels. This weight can be based on a Gaussian distribution. Crucially, the weights depend not only on Euclidean distance of pixels, but also on the radiometric differences. This preserves sharp edges.

An area of computer vision is active vision, sometimes also called active computer vision. An active vision system is one that can manipulate the viewpoint of the camera(s) in order to investigate the environment and get better information from it.

<span class="mw-page-title-main">3D reconstruction</span> Process of capturing the shape and appearance of real objects

In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects. This process can be accomplished either by active or passive methods. If the model is allowed to change its shape in time, this is referred to as non-rigid or spatio-temporal reconstruction.

<span class="mw-page-title-main">Omnidirectional (360-degree) camera</span> Camera that can see in all directions

In photography, an omnidirectional camera, also known as 360-degree camera, is a camera having a field of view that covers approximately the entire sphere or at least a full circle in the horizontal plane. Omnidirectional cameras are important in areas where large visual field coverage is needed, such as in panoramic photography and robotics.

High-level synthesis (HLS), sometimes referred to as C synthesis, electronic system-level (ESL) synthesis, algorithmic synthesis, or behavioral synthesis, is an automated design process that takes an abstract behavioral specification of a digital system and finds a register-transfer level structure that realizes the given behavior.

<span class="mw-page-title-main">Saraju Mohanty</span> Indian-American computer scientist

Saraju Mohanty is an Indian-American professor of the Department of Computer Science and Engineering, and the director of the Smart Electronic Systems Laboratory, at the University of North Texas in Denton, Texas. Mohanty received a Glorious India Award – Rich and Famous NRIs of America in 2017 for his contributions to the discipline. Mohanty is a researcher in the areas of "smart electronics for smart cities/villages", "smart healthcare", "application-Specific things for efficient edge computing", and "methodologies for digital and mixed-signal hardware". He has made significant research contributions to security by design (SbD) for electronic systems, hardware-assisted security (HAS) and protection, high-level synthesis of digital signal processing (DSP) hardware, and mixed-signal integrated circuit computer-aided design and electronic design automation. Mohanty has been the editor-in-chief (EiC) of the IEEE Consumer Electronics Magazine during 2016-2021. He has held the Chair of the IEEE Computer Society's Technical Committee on Very Large Scale Integration during 2014-2018. He holds 4 US patents in the areas of his research, and has published 500 research articles and 5 books. He is ranked among top 2% faculty around the world in Computer Science and Engineering discipline as per the standardized citation metric adopted by the Public Library of Science Biology journal.

Dorin Comaniciu is a Romanian-American computer scientist. He is the Senior Vice President of Artificial Intelligence and Digital Innovation at Siemens Healthcare.

<span class="mw-page-title-main">Michael J. Black</span> American-born computer scientist

Michael J. Black is an American-born computer scientist working in Tübingen, Germany. He is a founding director at the Max Planck Institute for Intelligent Systems where he leads the Perceiving Systems Department in research focused on computer vision, machine learning, and computer graphics. He is also an Honorary Professor at the University of Tübingen.

An energy-based model (EBM) (also called a Canonical Ensemble Learning(CEL) or Learning via Canonical Ensemble (LCE)) is an application of canonical ensemble formulation of statistical physics for learning from data problems. The approach prominently appears in generative models (GMs).

In the domain of physics and probability, the filters, random fields, and maximum entropy (FRAME) model is a Markov random field model of stationary spatial processes, in which the energy function is the sum of translation-invariant potential functions that are one-dimensional non-linear transformations of linear filter responses. The FRAME model was originally developed by Song-Chun Zhu, Ying Nian Wu, and David Mumford for modeling stochastic texture patterns, such as grasses, tree leaves, brick walls, water waves, etc. This model is the maximum entropy distribution that reproduces the observed marginal histograms of responses from a bank of filters, where for each filter tuned to a specific scale and orientation, the marginal histogram is pooled over all the pixels in the image domain. The FRAME model is also proved to be equivalent to the micro-canonical ensemble, which was named the Julesz ensemble. Gibbs sampler is adopted to synthesize texture images by drawing samples from the FRAME model.

<span class="mw-page-title-main">Video super-resolution</span> Generating high-resolution video frames from given low-resolution ones

Video super-resolution (VSR) is the process of generating high-resolution video frames from the given low-resolution video frames. Unlike single-image super-resolution (SISR), the main goal is not only to restore more fine details while saving coarse ones, but also to preserve motion consistency.

A neural radiance field (NeRF) is a method based on deep learning for reconstructing a three-dimensional representation of a scene from two-dimensional images. The NeRF model enables downstream applications of novel view synthesis, scene geometry reconstruction, and obtaining the reflectance properties of the scene. Additional scene properties such as camera poses may also be jointly learned. First introduced in 2020, it has since gained significant attention for its potential applications in computer graphics and content creation.

<span class="mw-page-title-main">3D Face Morphable Model</span> Generative model for 3D textured faces

In computer vision and computer graphics, the 3D Face Morphable Model (3DFMM) is a generative technique for modeling textured 3D faces. The generation of new faces is based on a pre-existing database of example faces acquired through a 3D scanning procedure. All these faces are in dense point-to-point correspondence, which enables the generation of a new realistic face (morph) by combining the acquired faces. A new 3D face can be inferred from one or multiple existing images of a face or by arbitrarily combining the example faces. 3DFMM provides a way to represent face shape and texture disentangled from external factors, such as camera parameters and illumination.

<span class="mw-page-title-main">3D Gaussian splatting</span>

The 3D Gaussian splatting is a technique used in the field of Real-Time Radiance Field Rendering. It enables the creation of high-quality real-time novel-view scenes by stringing together multiple photos or videos, which had historically been a big challenge.

References

  1. "Velasco, Jose Maria", Benezit Dictionary of Artists, Oxford University Press, 2011-10-31, doi:10.1093/benz/9780199773787.article.b00189225 , retrieved 2024-03-31