View synthesis

In computer graphics, view synthesis, or novel view synthesis, is the task of generating images of a specific subject or scene from a specific point of view when the only available information is a set of pictures taken from different points of view. [1]

This task was tackled with significant success only recently (late 2010s to early 2020s), mostly as a result of advances in machine learning. Notable successful methods include neural radiance fields and 3D Gaussian splatting.

Applications of view synthesis are numerous; one of them is free viewpoint television.

Related Research Articles

Point cloud – set of data points in three-dimensional space

A point cloud is a discrete set of data points in space. The points may represent a 3D shape or object. Each point position has its set of Cartesian coordinates. Point clouds are generally produced by 3D scanners or by photogrammetry software, which measure many points on the external surfaces of objects around them. As the output of 3D scanning processes, point clouds are used for many purposes, including to create 3D computer-aided design (CAD) or geographic information systems (GIS) models for manufactured parts, for metrology and quality inspection, and for a multitude of visualizing, animating, rendering, and mass customization applications.
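
A point cloud can be represented as little more than an N×3 array of Cartesian coordinates, optionally paired with per-point attributes such as color. The sketch below is a minimal illustration using NumPy; the synthetic data (points sampled on a unit sphere as a stand-in for scanner output) and the variable names are assumptions made for the example, not any particular library's API.

```python
import numpy as np

# Minimal point-cloud sketch: N points, each a Cartesian (x, y, z)
# position with an optional per-point RGB color.
rng = np.random.default_rng(0)
n = 1000

# Assumption: sample points uniformly on a unit sphere as a stand-in
# for the output of a 3D scanner or photogrammetry pipeline.
directions = rng.normal(size=(n, 3))
points = directions / np.linalg.norm(directions, axis=1, keepdims=True)
colors = rng.uniform(0.0, 1.0, size=(n, 3))  # RGB in [0, 1]

# A common elementary operation: recenter the cloud on its centroid.
centered = points - points.mean(axis=0)
print(centered.shape)  # (1000, 3)
```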

Image segmentation – partitioning a digital image into segments

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
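
A toy illustration of this labeling view of segmentation is global intensity thresholding, which assigns one of two labels to every pixel. The sketch below is a hedged example in NumPy; the threshold value and the synthetic image are assumptions chosen for demonstration.

```python
import numpy as np

def threshold_segmentation(image: np.ndarray, t: float) -> np.ndarray:
    """Label each pixel: 1 if brighter than threshold t, else 0."""
    return (image > t).astype(np.uint8)

# Toy grayscale image: dark background with a bright square "object".
img = np.zeros((8, 8))
img[2:6, 2:6] = 0.9

labels = threshold_segmentation(img, t=0.5)
print(int(labels.sum()))  # 16 foreground pixels
```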

In computer graphics and computer vision, image-based modeling and rendering (IBMR) methods rely on a set of two-dimensional images of a scene to generate a three-dimensional model and then render novel views of this scene.

Linda Shapiro

Linda G. Shapiro is a professor in the Department of Computer Science and Engineering, a professor of electrical engineering, and an adjunct professor of biomedical informatics and medical education at the University of Washington.

Visual hull

A visual hull is a geometric entity created by the shape-from-silhouette 3D reconstruction technique introduced by A. Laurentini. This technique assumes that the foreground object in an image can be separated from the background. Under this assumption, the original image can be thresholded into a foreground/background binary image, called a silhouette image. The foreground mask, known as a silhouette, is the 2D projection of the corresponding 3D foreground object. Along with the camera viewing parameters, the silhouette defines a back-projected generalized cone that contains the actual object; this cone is called a silhouette cone. The intersection of the cones produced from two or more silhouette images taken from different viewpoints is called the visual hull, which is a bounding geometry of the actual 3D object. When the reconstructed geometry is only used for rendering from a different viewpoint, the implicit reconstruction together with the rendering can be done using graphics hardware.
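
A common way to approximate the intersection of silhouette cones is voxel carving: a voxel is kept only if it projects inside the foreground silhouette of every view. The sketch below is a toy version with two synthetic orthographic views of a unit sphere; the camera model, grid resolution, and helper names are assumptions made for brevity.

```python
import numpy as np

# Voxel-carving sketch under a toy orthographic camera model:
# a voxel survives only if it lies inside every silhouette.
res = 32
axis = np.linspace(-1.0, 1.0, res)
X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")

# Two synthetic silhouettes of a unit sphere, viewed along z and x.
sil_z = (X[:, :, 0] ** 2 + Y[:, :, 0] ** 2) <= 1.0  # image over (x, y)
sil_x = (Y[0] ** 2 + Z[0] ** 2) <= 1.0              # image over (y, z)

def inside(sil, u, v):
    """Look up silhouette occupancy at continuous coords in [-1, 1]."""
    i = np.clip(((u + 1) / 2 * (res - 1)).astype(int), 0, res - 1)
    j = np.clip(((v + 1) / 2 * (res - 1)).astype(int), 0, res - 1)
    return sil[i, j]

# Intersect the two back-projected silhouette cones.
hull = inside(sil_z, X, Y) & inside(sil_x, Y, Z)
print(int(hull.sum()), "voxels inside the visual hull")
```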

Object recognition – technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, even though the appearance of an object may vary somewhat with viewpoint, size, and scale, or when it is translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge for computer vision systems, and many approaches to it have been implemented over multiple decades.

Bilateral filter – smoothing filter for images

A bilateral filter is a non-linear, edge-preserving, and noise-reducing smoothing filter for images. It replaces the intensity of each pixel with a weighted average of intensity values from nearby pixels. This weight can be based on a Gaussian distribution. Crucially, the weights depend not only on Euclidean distance of pixels, but also on the radiometric differences. This preserves sharp edges.
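
A brute-force grayscale implementation makes the two weight terms explicit: a spatial Gaussian over pixel distance and a range Gaussian over intensity difference. The sketch below is a deliberately simple reference version; the parameter names (sigma_s, sigma_r) are conventional but chosen here for the example, and a practical implementation would use a faster approximation.

```python
import numpy as np

def bilateral_filter(img, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter on a float grayscale image.

    Each output pixel is a weighted average of its neighborhood;
    the weights fall off with spatial distance (sigma_s) and with
    intensity difference (sigma_r), which preserves edges.
    """
    h, w = img.shape
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    padded = np.pad(img, radius, mode="reflect")
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            rng_w = np.exp(-((patch - img[y, x]) ** 2) / (2 * sigma_r ** 2))
            weights = spatial * rng_w
            out[y, x] = (weights * patch).sum() / weights.sum()
    return out

# Usage: smooth a noisy image while keeping its step edge sharp.
img = np.kron([[0.2, 0.8]], np.ones((8, 8)))  # 8x16 image with one edge
noisy = img + 0.02 * np.random.default_rng(0).normal(size=img.shape)
smoothed = bilateral_filter(noisy)
```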

An area of computer vision is active vision, sometimes also called active computer vision. An active vision system is one that can manipulate the viewpoint of the camera(s) in order to investigate the environment and get better information from it.

3D reconstruction – process of capturing the shape and appearance of real objects

In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects. This process can be accomplished either by active or passive methods. If the model is allowed to change its shape in time, this is referred to as non-rigid or spatio-temporal reconstruction.
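
An elementary building block of passive methods is triangulation: recovering a 3D point from its projections in two calibrated views. The sketch below uses the standard linear (DLT) formulation; the toy camera matrices and test point are assumptions made for the example.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: solve x1 ~ P1 @ X, x2 ~ P2 @ X."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)      # null vector of A is the solution
    X = vt[-1]
    return X[:3] / X[3]              # de-homogenize

# Two toy cameras: one at the origin, one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0, 1.0])
x1 = P1 @ X_true; x1 = x1[:2] / x1[2]
x2 = P2 @ X_true; x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))   # ~ [0.5, 0.2, 4.0]
```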

Omnidirectional (360-degree) camera – camera that can see in all directions

In photography, an omnidirectional camera, also known as 360-degree camera, is a camera having a field of view that covers approximately the entire sphere or at least a full circle in the horizontal plane. Omnidirectional cameras are important in areas where large visual field coverage is needed, such as in panoramic photography and robotics.
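
Omnidirectional images are commonly stored in an equirectangular layout, where the image axes correspond to the longitude and latitude of the viewing direction. The sketch below shows one such mapping from a unit direction vector to normalized image coordinates; the axis conventions are an assumption, since different tools orient the sphere differently.

```python
import numpy as np

def equirect_uv(d):
    """Map a unit direction (x, y, z) to equirectangular (u, v) in [0, 1]."""
    x, y, z = d
    lon = np.arctan2(x, z)               # longitude in (-pi, pi]
    lat = np.arcsin(np.clip(y, -1, 1))   # latitude in [-pi/2, pi/2]
    u = lon / (2 * np.pi) + 0.5          # wraps horizontally
    v = 0.5 - lat / np.pi                # top of image = "up"
    return u, v

# The forward direction maps to the image center.
print(equirect_uv(np.array([0.0, 0.0, 1.0])))  # (0.5, 0.5)
```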

Saraju Mohanty – Indian-American computer scientist

Saraju Mohanty is an Indian-American professor in the Department of Computer Science and Engineering, and the director of the Smart Electronic Systems Laboratory, at the University of North Texas in Denton, Texas. Mohanty received a Glorious India Award – Rich and Famous NRIs of America in 2017 for his contributions to the discipline. Mohanty is a researcher in the areas of "smart electronics for smart cities/villages", "smart healthcare", "application-specific things for efficient edge computing", and "methodologies for digital and mixed-signal hardware". He has made significant research contributions to security by design (SbD) for electronic systems, hardware-assisted security (HAS) and protection, high-level synthesis of digital signal processing (DSP) hardware, and mixed-signal integrated circuit computer-aided design and electronic design automation. Mohanty was the editor-in-chief (EiC) of the IEEE Consumer Electronics Magazine from 2016 to 2021, and chaired the IEEE Computer Society's Technical Committee on Very Large Scale Integration from 2014 to 2018. He holds 4 US patents in the areas of his research and has published 500 research articles and 5 books. He is ranked among the top 2% of faculty worldwide in the computer science and engineering discipline, according to the standardized citation metric adopted by the Public Library of Science Biology journal.

Michael J. Black – American-born computer scientist

Michael J. Black is an American-born computer scientist working in Tübingen, Germany. He is a founding director at the Max Planck Institute for Intelligent Systems where he leads the Perceiving Systems Department in research focused on computer vision, machine learning, and computer graphics. He is also an Honorary Professor at the University of Tübingen.

An energy-based model (EBM), also called Canonical Ensemble Learning (CEL) or Learning via Canonical Ensemble (LCE), is an application of the canonical ensemble formulation of statistical physics to learning from data. The approach appears prominently in generative models (GMs).
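
In its most common form, an EBM assigns each configuration x a scalar energy E_θ(x) and defines a Boltzmann (canonical ensemble) distribution over configurations; one standard way to write this (notation assumed here) is:

```latex
p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z(\theta)},
\qquad
Z(\theta) = \int \exp(-E_\theta(x)) \, dx
```

The partition function Z(θ) is generally intractable, which is why training and sampling usually rely on approximations such as Markov chain Monte Carlo.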

In the domain of physics and probability, the filters, random fields, and maximum entropy (FRAME) model is a Markov random field model of stationary spatial processes, in which the energy function is the sum of translation-invariant potential functions that are one-dimensional non-linear transformations of linear filter responses. The FRAME model was originally developed by Song-Chun Zhu, Ying Nian Wu, and David Mumford for modeling stochastic texture patterns such as grasses, tree leaves, brick walls, and water waves. The model is the maximum entropy distribution that reproduces the observed marginal histograms of responses from a bank of filters, where for each filter, tuned to a specific scale and orientation, the marginal histogram is pooled over all the pixels in the image domain. The FRAME model has also been proved equivalent to the micro-canonical ensemble, which was named the Julesz ensemble. A Gibbs sampler is used to synthesize texture images by drawing samples from the FRAME model.
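
Concretely, writing F_k for the k-th filter in the bank and λ_k for its learned one-dimensional potential, the Gibbs form of the FRAME density described above can be written as follows (notation assumed here for illustration):

```latex
p(I; \Lambda) = \frac{1}{Z(\Lambda)}
\exp\Big\{ -\sum_{k=1}^{K} \sum_{x} \lambda_k\big( (F_k * I)(x) \big) \Big\}
```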

Video super-resolution – generating high-resolution video frames from given low-resolution ones

Video super-resolution (VSR) is the process of generating high-resolution video frames from given low-resolution video frames. Unlike single-image super-resolution (SISR), the goal is not only to restore fine details while preserving coarse ones, but also to maintain motion consistency across frames.

Hatice Gunes is a Turkish computer scientist who is Professor of Affective Intelligence & Robotics at the University of Cambridge, where she leads the Affective Intelligence & Robotics Lab. Her research considers human-robot interaction and the development of sophisticated technologies with emotional intelligence.

A neural radiance field (NeRF) is a method based on deep learning for reconstructing a three-dimensional representation of a scene from sparse two-dimensional images. The NeRF model enables learning of novel view synthesis, scene geometry, and the reflectance properties of the scene. Additional scene properties such as camera poses may also be jointly learned. NeRF enables rendering of photorealistic views from novel viewpoints. First introduced in 2020, it has since gained significant attention for its potential applications in computer graphics and content creation.
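
At the core of NeRF is differentiable volume rendering: the color of a camera ray is a transmittance-weighted sum of the colors and densities that the network predicts at sampled points along the ray. A standard discrete form (with σ_i the predicted density, c_i the predicted color, and δ_i the spacing between adjacent samples) is:

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)
```

Minimizing the difference between these rendered ray colors and the corresponding pixels of the input photographs is what drives training.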

Xiaoming Liu is a Chinese-American computer scientist and academic. He is a professor in the Department of Computer Science and Engineering, an MSU Foundation Professor, and the Anil K. and Nandita Jain Endowed Professor of Engineering at Michigan State University.

References

  1. "Velasco, Jose Maria", Benezit Dictionary of Artists, Oxford University Press, 2011-10-31, doi:10.1093/benz/9780199773787.article.b00189225, retrieved 2024-03-31.