Free viewpoint television (FTV) is a system for viewing natural video that allows the user to interactively control the viewpoint and generate new views of a dynamic scene from any 3D position.[1] The equivalent system for computer-simulated video is known as virtual reality. With FTV, the focus of attention is controlled by the viewer rather than a director, so each viewer may be observing a unique viewpoint. It remains to be seen how FTV will affect television watching as a group activity.
Systems for rendering arbitrary views of natural scenes have long been well known in the computer vision community, but only in recent years have their speed and quality reached levels suitable for serious consideration as an end-user system.
Professor Masayuki Tanimoto of Nagoya University (Japan) has done much to promote the term "free viewpoint television" and has published many papers on the ray space representation, although other techniques can be, and are, used for FTV.
QuickTime VR might be considered a predecessor to FTV.
To acquire the views necessary for high-quality rendering of the scene from any angle, several cameras are placed around the scene, either in a studio environment or at an outdoor venue such as a sporting arena. The resulting multiview video (MVV) must then be packaged so that the data can be compressed and so that the user's viewing device can easily access the relevant views to interpolate new ones.
It is not enough to simply place cameras around the scene to be captured: the geometry of the camera setup must be measured by a process known in computer vision as camera calibration. Manual alignment would be too cumbersome, so typically a "best effort" physical alignment is performed and a test pattern is then captured and used to generate the calibration parameters.
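As a concrete illustration, the sketch below calibrates a single camera from images of a chessboard test pattern using OpenCV. The 9x6 board size and the `calib/*.png` image directory are assumptions of the example, not details of any particular FTV system.

```python
# Minimal camera-calibration sketch using OpenCV and a chessboard test
# pattern. The board size and image paths are assumed for illustration.
import glob
import cv2
import numpy as np

pattern_size = (9, 6)  # inner corners per row and column
# Known 3D positions of the corners on the flat board (z = 0 plane).
board = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
board[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)

object_points, image_points = [], []
for path in sorted(glob.glob("calib/*.png")):   # one image per board pose
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        object_points.append(board)
        image_points.append(corners)

# Solve for the intrinsics (camera matrix, lens distortion) and the
# per-image extrinsics (rotation and translation vectors).
rms, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    object_points, image_points, gray.shape[::-1], None, None)
```

In a multi-camera rig, the same pattern would be observed by every camera so that the recovered extrinsics share a single world coordinate frame.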
Restricted free viewpoint television views of large environments can be captured from a single camera system mounted on a moving platform. Depth data must also be captured, as it is necessary to generate the free viewpoint. The Google Street View capture system is an example with limited functionality. The first full commercial implementation, iFlex, was delivered in 2009 by Real Time Race.[2]
Multiview video capture varies from partial (usually about 30 degrees) to complete (360-degree) coverage of the scene. It is therefore possible to output stereoscopic views suitable for viewing with a 3D display or other 3D methods. Systems with more physical cameras capture more of the viewable scene, although certain regions are likely to remain occluded from any viewpoint. A larger number of cameras also makes higher-quality output possible because less interpolation is needed, as the naive sketch below illustrates.
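A deliberately naive sketch of why camera density matters: with only the two captured views nearest the requested angle available, a new viewpoint can be approximated by a weighted blend, and the wider the angular gap the worse the ghosting. Real FTV renderers use depth- or ray-space-based synthesis rather than plain blending; all names below are illustrative.

```python
import numpy as np

def interpolate_view(views, angles, target_angle):
    """views: list of HxWx3 uint8 arrays; angles: camera angles in degrees."""
    order = np.argsort([abs(a - target_angle) for a in angles])
    i, j = int(order[0]), int(order[1])        # two nearest cameras
    span = abs(angles[j] - angles[i])
    w = 0.0 if span == 0 else abs(target_angle - angles[i]) / span
    # Cross-fade: the sparser the rig, the larger the gap this blend must
    # bridge, and the more visible the interpolation artifacts become.
    blended = (1 - w) * views[i].astype(np.float32) \
              + w * views[j].astype(np.float32)
    return blended.astype(np.uint8)
```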
More cameras mean that efficient coding of the multiview video is required. This is not necessarily a major disadvantage, as several representations can remove the redundancy in MVV, such as inter-view coding using MPEG-4 or Multiview Video Coding, the ray space representation, and geometry videos.[3]
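The redundancy these representations exploit can be shown with a toy example: neighbouring views are largely similar, so storing one base view in full plus small residuals for the others is cheaper than storing every view independently. Real Multiview Video Coding adds disparity-compensated prediction; this sketch keeps only the core idea.

```python
import numpy as np

def encode_pair(base_view, other_view):
    # Neighbouring cameras see almost the same scene, so the residual is
    # mostly near zero and entropy-codes far better than the raw view.
    residual = other_view.astype(np.int16) - base_view.astype(np.int16)
    return base_view, residual

def decode_pair(base_view, residual):
    # Lossless round trip: adding the residual back recovers the view.
    return (base_view.astype(np.int16) + residual).astype(np.uint8)
```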
In terms of hardware, the user requires a viewing device that can decode MVV and synthesize new viewpoints, and a 2D or 3D display.
In March 2009, the Moving Picture Experts Group (MPEG) standardized Multiview Video Coding as Annex H of MPEG-4 AVC, following the work of a group called 3DAV (3D Audio and Visual) headed by Aljoscha Smolic[4] at the Heinrich Hertz Institute.
Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the form of decisions. "Understanding" in this context signifies the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.
A point cloud is a discrete set of data points in space. The points may represent a 3D shape or object. Each point position is given by a set of Cartesian coordinates. Points may carry data other than position, such as RGB colors, normals, and timestamps. Point clouds are generally produced by 3D scanners or by photogrammetry software, which measure many points on the external surfaces of objects around them. As the output of 3D scanning processes, point clouds are used for many purposes, including creating 3D computer-aided design (CAD) or geographic information systems (GIS) models for manufactured parts, metrology and quality inspection, and a multitude of visualization, animation, rendering, and mass-customization applications.
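A minimal sketch of such a container, with one record per point holding a position plus the optional attributes mentioned above; the field names are illustrative, not a standard layout.

```python
import numpy as np

# One record per point: position plus optional per-point attributes.
cloud = np.zeros(4, dtype=[
    ("xyz", np.float32, 3),      # Cartesian coordinates
    ("rgb", np.uint8, 3),        # colour
    ("normal", np.float32, 3),   # surface normal
    ("timestamp", np.float64),   # capture time
])
cloud["xyz"] = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
centroid = cloud["xyz"].mean(axis=0)   # bulk queries stay simple
```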
Bullet time is a visual effect or visual impression of detaching the time and space of a camera from that of its visible subject. It is a depth-enhanced simulation of variable-speed action and performance found in films, broadcast advertisements, and real-time graphics within video games and other special media. It is characterized by its extreme transformation of both time and space. This is almost impossible with conventional slow motion, as the physical camera would have to move implausibly fast; the concept implies that only a "virtual camera", often illustrated within the confines of a computer-generated environment such as a virtual world or virtual reality, would be capable of "filming" bullet-time types of moments. Technical and historical variations of this effect have been referred to as time slicing, view morphing, temps mort and virtual cinematography.
Advanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding. It is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video industry developers as of September 2019. It supports a maximum resolution of 8K UHD.
Autodesk 3ds Max, formerly 3D Studio and 3D Studio Max, is a professional 3D computer graphics program for making 3D animations, models, games and images. It is developed and produced by Autodesk Media and Entertainment. It has modeling capabilities and a flexible plugin architecture, and runs only on the Microsoft Windows platform. It is frequently used by video game developers, many TV commercial studios, and architectural visualization studios. It is also used for movie effects and movie pre-visualization. 3ds Max features shaders, dynamic simulation, particle systems, radiosity, normal map creation and rendering, global illumination, a customizable user interface, and its own scripting language.
A camcorder is a self-contained portable electronic device whose primary function is video capture and recording. It is typically equipped with an articulating screen mounted on the left side, a belt on the right side to facilitate holding, a hot-swappable battery facing the user, hot-swappable recording media, and an internally contained quiet optical zoom lens.
In 3D computer graphics, hidden-surface determination is the process of identifying which surfaces and parts of surfaces can be seen from a particular viewing angle. A hidden-surface determination algorithm is a solution to the visibility problem, which was one of the first major problems in the field of 3D computer graphics. The process of hidden-surface determination is sometimes called hiding, and such an algorithm is sometimes called a hider. When referring to line rendering it is known as hidden-line removal. Hidden-surface determination is necessary to render a scene correctly, so that features hidden behind the model itself are not drawn and only the naturally visible portions of the scene appear.
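The z-buffer is the canonical example of such an algorithm: every candidate fragment is tested against the closest depth recorded so far at its pixel, and only nearer fragments are drawn. The sketch below assumes a rasterizer supplying (x, y, depth, colour) fragments already exists.

```python
import numpy as np

def zbuffer(fragments, width, height):
    depth = np.full((height, width), np.inf)       # nothing drawn yet
    image = np.zeros((height, width, 3), np.uint8)
    for x, y, z, colour in fragments:
        if z < depth[y, x]:          # nearer than everything drawn so far
            depth[y, x] = z
            image[y, x] = colour     # hidden fragments never reach here
    return image
```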
A volumetric display device is a display device that forms a visual representation of an object in three physical dimensions, as opposed to the planar image of traditional screens that simulate depth through a number of different visual effects. One definition offered by pioneers in the field is that volumetric displays create 3D imagery via the emission, scattering, or relaying of illumination from well-defined regions in (x,y,z) space.
Real-time computer graphics or real-time rendering is the sub-field of computer graphics focused on producing and analyzing images in real time. The term can refer to anything from rendering an application's graphical user interface (GUI) to real-time image analysis, but is most often used in reference to interactive 3D computer graphics, typically using a graphics processing unit (GPU). One example of this concept is a video game that rapidly renders changing 3D environments to produce an illusion of motion.
A visual hull is a geometric entity created by the shape-from-silhouette 3D reconstruction technique introduced by A. Laurentini. This technique assumes that the foreground object in an image can be separated from the background. Under this assumption, the original image can be thresholded into a foreground/background binary image, called a silhouette image. The foreground mask, known as a silhouette, is the 2D projection of the corresponding 3D foreground object. Along with the camera viewing parameters, the silhouette defines a back-projected generalized cone that contains the actual object; this cone is called a silhouette cone. The intersection of the cones produced from silhouette images taken from different viewpoints is called a visual hull, which is a bounding geometry of the actual 3D object. When the reconstructed geometry is only used for rendering from a different viewpoint, the implicit reconstruction together with the rendering can be done using graphics hardware.
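A voxel-carving sketch of the construction: a voxel belongs to the visual hull only if it projects inside every silhouette. The `project(camera, point)` function mapping a 3D point to pixel coordinates and the binary silhouette images are assumed inputs.

```python
import numpy as np

def visual_hull(voxel_centers, cameras, silhouettes, project):
    """voxel_centers: (N, 3) array; silhouettes: binary HxW arrays."""
    keep = np.ones(len(voxel_centers), dtype=bool)
    for cam, sil in zip(cameras, silhouettes):
        h, w = sil.shape
        for i, point in enumerate(voxel_centers):
            if not keep[i]:
                continue
            u, v = project(cam, point)   # 3D point -> pixel coordinates
            # Carve away voxels outside the image or outside the silhouette
            # cone; whatever survives every view bounds the true object.
            if not (0 <= u < w and 0 <= v < h) or not sil[int(v), int(u)]:
                keep[i] = False
    return voxel_centers[keep]
```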
TDVision Systems, Inc., was a company that designed products and system architectures for stereoscopic video coding, stereoscopic video games, and head mounted displays. The company was founded by Manuel Gutierrez Novelo and Isidoro Pessah in Mexico in 2001 and moved to the United States in 2004.
2D-plus-Depth is a stereoscopic video coding format used for 3D displays such as Philips WOWvx. Philips discontinued work on the WOWvx line in 2009, citing "current market developments". This Philips technology is now used by the company SeeCubic, led by former key 3D engineers and scientists from Philips, which offers autostereoscopic 3D displays that use the 2D-plus-Depth format for 3D video input.
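A sketch of how a display might derive a second view from a 2D-plus-Depth frame: each pixel is shifted horizontally by a disparity proportional to its depth value (depth-image-based rendering). The disparity scaling and the hole filling for disoccluded regions are display-specific and omitted here.

```python
import numpy as np

def render_shifted_view(image, depth, max_disparity=16):
    """image: HxWx3 uint8; depth: HxW uint8 (255 = nearest)."""
    h, w, _ = image.shape
    out = np.zeros_like(image)
    disparity = (depth.astype(np.float32) / 255.0 * max_disparity).astype(int)
    for y in range(h):
        for x in range(w):
            nx = x + disparity[y, x]   # nearer pixels shift further
            if 0 <= nx < w:
                out[y, nx] = image[y, x]
    return out   # disocclusion holes remain black in this sketch
```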
In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects. This process can be accomplished either by active or passive methods. If the model is allowed to change its shape in time, this is referred to as non-rigid or spatio-temporal reconstruction.
Multiview Video Coding (MVC) is a stereoscopic video coding standard for video compression that allows video sequences captured simultaneously from multiple camera angles to be encoded in a single video stream. It uses the 2D-plus-Delta method and is an amendment to the H.264 video compression standard, developed jointly by MPEG and VCEG with contributions from a number of companies, including Panasonic and LG Electronics.
A variety of computer graphic techniques have been used to display video game content throughout the history of video games. The predominance of individual techniques has evolved over time, primarily due to hardware advances and restrictions such as the processing power of central or graphics processing units.
DVB 3D-TV is a deprecated standard, partially released at the end of 2010, that included techniques and procedures for sending a three-dimensional video signal over existing DVB transmission standards. A commercial-requirements text existed for 3D TV broadcasters and set-top box manufacturers, but it contained no technical information.
Chessboards arise frequently in computer vision theory and practice because their highly structured geometry is well-suited for algorithmic detection and processing. The appearance of chessboards in computer vision can be divided into two main areas: camera calibration and feature extraction. This article provides a unified discussion of the role that chessboards play in the canonical methods from these two areas, including references to the seminal literature, examples, and pointers to software implementations.
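On the feature-extraction side, a typical sketch detects the board's inner corners and then refines them to sub-pixel accuracy with OpenCV; the `frame.png` input and the 9x6 board size are assumptions of the example.

```python
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
found, corners = cv2.findChessboardCorners(gray, (9, 6))
if found:
    # Refine each corner within an 11x11 search window until convergence.
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
```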
This is a glossary of terms relating to computer graphics.
Volumetric capture or volumetric video is a technique that captures a three-dimensional space, such as a location or performance. This type of volumography acquires data that can be viewed on flat screens as well as with 3D displays and VR headsets. Consumer-facing formats are numerous, and the required motion-capture techniques lean on computer graphics, photogrammetry, and other computation-based methods. The viewer generally experiences the result in a real-time engine and has direct input in exploring the generated volume.