Free viewpoint television

Free viewpoint television (FTV) is a system for viewing natural video, allowing the user to interactively control the viewpoint and generate new views of a dynamic scene from any 3D position.[1] The equivalent system for computer-simulated video is known as virtual reality. With FTV, the focus of attention can be controlled by the viewers rather than a director, meaning that each viewer may be observing a unique viewpoint. It remains to be seen how FTV will affect television watching as a group activity.

History

Systems for rendering arbitrary views of natural scenes have long been known in the computer vision community, but only in recent years have their speed and quality reached levels suitable for serious consideration as an end-user system.

Professor Masayuki Tanimoto of Nagoya University (Japan) has done much to promote the use of the term "free viewpoint television" and has published many papers on the ray-space representation, although other techniques can be, and are, used for FTV.

QuickTime VR might be considered a predecessor to FTV.

Capture and display

In order to acquire the views necessary for a high-quality rendering of the scene from any angle, several cameras are placed around the scene, either in a studio environment or at an outdoor venue such as a sporting arena. The resulting multiview video (MVV) must then be packaged suitably so that the data may be compressed, and so that the user's viewing device may easily access the views it needs to interpolate new ones.
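
One way to picture the access pattern this packaging must support is to index frames by both view and time. The sketch below is purely illustrative; the class and its fields are invented for this example, and a real MVV container stores compressed data together with calibration metadata.

    # Illustrative only: multiview video as frames indexed by (view, time).
    # Real containers hold compressed views plus calibration metadata.
    import numpy as np

    class MultiviewVideo:
        def __init__(self, n_views, n_frames, height, width):
            # One luminance plane per (view, frame), uncompressed here.
            self.frames = np.zeros((n_views, n_frames, height, width), np.uint8)

        def view_pair(self, t, left, right):
            """Fetch the two captured views a renderer would interpolate
            between to synthesize a virtual viewpoint at time t."""
            return self.frames[left, t], self.frames[right, t]

    mvv = MultiviewVideo(n_views=8, n_frames=30, height=270, width=480)
    a, b = mvv.view_pair(t=12, left=2, right=3)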

It is not enough to simply place cameras around the scene to be captured: the geometry of the camera setup must be measured, a process known in computer vision as camera calibration. Manual alignment would be too cumbersome, so typically a "best effort" physical alignment is performed first, after which a test pattern is captured and used to generate the calibration parameters.
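
For a single camera, the test-pattern step is commonly performed with a checkerboard and standard computer-vision tooling. The following is a minimal sketch using the OpenCV library; the file names, board dimensions, and square size are hypothetical, and a multi-camera rig would additionally need the per-camera results registered into one common coordinate frame.

    # Minimal single-camera calibration sketch using OpenCV and a
    # checkerboard test pattern. File names and board geometry are
    # hypothetical.
    import glob
    import cv2
    import numpy as np

    BOARD = (9, 6)      # inner corners per checkerboard row and column
    SQUARE = 0.025      # square size in metres

    # 3D corner coordinates in the board's own frame (z = 0 plane)
    obj = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

    obj_pts, img_pts = [], []
    for path in glob.glob("cam0_pattern_*.png"):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, BOARD)
        if found:
            obj_pts.append(obj)
            img_pts.append(corners)

    # Intrinsics (K, distortion) plus one extrinsic pose per pattern view
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)
    print("reprojection RMS error:", rms)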

Restricted free viewpoint television views of large environments can be captured from a single camera system mounted on a moving platform. Depth data must also be captured, since it is needed to generate the free viewpoint. The Google Street View capture system is an example with limited functionality. The first full commercial implementation, iFlex, was delivered in 2009 by Real Time Race.[2]

Multiview video capture varies from partial (usually about 30 degrees) to complete (360-degree) coverage of the scene. Because multiple views are available, it is also possible to output stereoscopic views suitable for a 3D display or other 3D methods. Systems with more physical cameras can capture more of the viewable scene, although certain regions are likely to remain occluded from any viewpoint. A larger number of cameras also makes higher-quality output possible, because less interpolation is needed.
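
A common family of techniques for this interpolation is depth-image-based rendering, in which pixels of a captured reference view are re-projected into the virtual view using per-pixel depth. The heavily simplified sketch below assumes rectified (horizontally aligned) cameras, so re-projection reduces to a horizontal disparity shift; the function name and parameters are illustrative.

    # Simplified depth-image-based rendering for rectified cameras:
    # shift each reference pixel horizontally by its disparity, scaled
    # by where the virtual camera sits along the baseline.
    import numpy as np

    def synthesize(ref, depth, baseline, focal, alpha):
        """ref: HxW grayscale image; depth: HxW depth map in metres;
        baseline: camera spacing in metres; focal: focal length in
        pixels; alpha in [0, 1]: virtual position along the baseline."""
        h, w = depth.shape
        out = np.zeros_like(ref)
        disparity = alpha * baseline * focal / depth    # shift in pixels
        xs = np.arange(w)
        for y in range(h):
            xt = np.clip((xs + disparity[y]).astype(int), 0, w - 1)
            out[y, xt] = ref[y]     # naive forward warp: no depth
        return out                  # ordering, and holes are left black

A real renderer would also resolve occlusions by depth ordering and fill the remaining holes from a second reference view, which is one reason denser camera coverage improves output quality.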

More cameras mean that efficient coding of the multiview video is required. This is not necessarily a major disadvantage, since there are representations that can remove the redundancy in MVV, such as inter-view coding using MPEG-4 or Multiview Video Coding, the ray-space representation, and geometry videos.[3]
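
The redundancy these representations exploit can be shown in toy form: a neighbouring view is predicted from an already-coded view, and only the residual is stored. The sketch below uses a single global disparity shift as the predictor; actual standards such as Multiview Video Coding use block-based disparity- and motion-compensated prediction, so this illustrates the idea rather than the method.

    # Toy inter-view redundancy removal: predict view B from coded
    # view A with one global disparity shift; store only the residual.
    import numpy as np

    def encode(view_a, view_b, disparity):
        prediction = np.roll(view_a, disparity, axis=1)
        residual = view_b.astype(np.int16) - prediction
        return residual             # entropy-coded in a real codec

    def decode(view_a, residual, disparity):
        prediction = np.roll(view_a, disparity, axis=1)
        return (prediction + residual).astype(np.uint8)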

In terms of hardware, the user requires a viewing device that can decode MVV and synthesize new viewpoints, and a 2D or 3D display.

Standardization

In March 2009 the Moving Picture Experts Group (MPEG) standardized Multiview Video Coding as Annex H of MPEG-4 AVC, following the work of a group called '3DAV' (3D Audio and Visual) headed by Aljoscha Smolic[4] at the Heinrich-Hertz Institute.

Related Research Articles

Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do.

Digital video is an electronic representation of moving visual images (video) in the form of encoded digital data. This is in contrast to analog video, which represents moving visual images with analog signals. Digital video comprises a series of digital images displayed in rapid succession.

Stereoscopy

Stereoscopy is a technique for creating or enhancing the illusion of depth in an image by means of stereopsis for binocular vision. The word stereoscopy derives from Greek στερεός (stereos) 'firm, solid', and σκοπέω (skopeō) 'to look, to see'. Any stereoscopic image is called a stereogram. Originally, stereogram referred to a pair of stereo images which could be viewed using a stereoscope.

Bullet time is a visual effect or visual impression of detaching the time and space of a camera from those of its visible subject. It is a depth-enhanced simulation of variable-speed action and performance found in films, broadcast advertisements, and real-time graphics within video games and other special media. It is characterized by its extreme transformation of both time and space. This is almost impossible with conventional slow motion, as the physical camera would have to move implausibly fast; the concept implies that only a "virtual camera", often illustrated within the confines of a computer-generated environment such as a virtual world or virtual reality, would be capable of "filming" bullet-time types of moments. Technical and historical variations of this effect have been referred to as time slicing, view morphing, temps mort and virtual cinematography.

Advanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, Advanced Video Coding, is a video compression standard based on block-oriented, motion-compensated integer-DCT coding. It is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video industry developers as of September 2019. It supports resolutions up to and including 8K UHD.

Hidden-surface determination

In 3D computer graphics, hidden-surface determination is the process of identifying what surfaces and parts of surfaces can be seen from a particular viewing angle. A hidden-surface determination algorithm is a solution to the visibility problem, which was one of the first major problems in the field of 3D computer graphics. The process of hidden-surface determination is sometimes called hiding, and such an algorithm is sometimes called a hider. When referring to line rendering it is known as hidden-line removal. Hidden-surface determination is necessary to render a scene correctly, so that features hidden behind the model itself are not drawn and only the naturally visible portions of the graphic appear.
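
One widely used hidden-surface algorithm, not named in the summary above, is z-buffering, which keeps for each pixel the depth of the nearest fragment drawn so far. A minimal sketch:

    # Minimal z-buffer: keep, per pixel, the fragment nearest the camera.
    import numpy as np

    H, W = 64, 64
    color = np.zeros((H, W, 3), np.uint8)
    zbuf = np.full((H, W), np.inf)       # "infinitely far" until drawn

    def plot(x, y, z, rgb):
        """Draw a fragment only if it is nearer than what is there."""
        if z < zbuf[y, x]:
            zbuf[y, x] = z
            color[y, x] = rgb

    plot(10, 10, 5.0, (255, 0, 0))   # red fragment at depth 5
    plot(10, 10, 9.0, (0, 255, 0))   # farther green fragment stays hidden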

The light field is a vector function that describes the amount of light flowing in every direction through every point in space. The space of all possible light rays is given by the five-dimensional plenoptic function, and the magnitude of each ray is given by the radiance. Michael Faraday was the first to propose that light should be interpreted as a field, much like the magnetic fields on which he had been working for several years. The phrase light field was coined by Andrey Gershun in a classic paper on the radiometric properties of light in three-dimensional space (1936).
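
In the standard notation (not quoted from the summary above), the plenoptic function assigns a radiance L to every position and viewing direction:

    L = L(x, y, z, \theta, \phi)

In regions free of occluders, radiance is constant along each ray, which is why the five-dimensional function is often reduced to a four-dimensional light field parameterized by a ray's intersections with two planes.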

A volumetric display device is a graphic display device that forms a visual representation of an object in three physical dimensions, as opposed to the planar image of traditional screens that simulate depth through a number of different visual effects. One definition offered by pioneers in the field is that volumetric displays create 3D imagery via the emission, scattering, or relaying of illumination from well-defined regions in (x,y,z) space.

Non-photorealistic rendering

Non-photorealistic rendering (NPR) is an area of computer graphics that focuses on enabling a wide variety of expressive styles for digital art, in contrast to traditional computer graphics, which focuses on photorealism. NPR is inspired by other artistic modes such as painting, drawing, technical illustration, and animated cartoons. NPR has appeared in movies and video games in the form of cel-shaded animation as well as in scientific visualization, architectural illustration and experimental animation.

Real-time computer graphics

Real-time computer graphics or real-time rendering is the sub-field of computer graphics focused on producing and analyzing images in real time. The term can refer to anything from rendering an application's graphical user interface (GUI) to real-time image analysis, but is most often used in reference to interactive 3D computer graphics, typically using a graphics processing unit (GPU). One example of this concept is a video game that rapidly renders changing 3D environments to produce an illusion of motion.

Autodesk Softimage

Autodesk Softimage, or simply Softimage, is a discontinued 3D computer graphics application for producing 3D computer graphics, 3D modeling, and computer animation. Now owned by Autodesk and formerly titled Softimage|XSI, the software has been predominantly used in the film, video game, and advertising industries for creating computer-generated characters, objects, and environments.

Clipping, in the context of computer graphics, is a method to selectively enable or disable rendering operations within a defined region of interest. Mathematically, clipping can be described using the terminology of constructive geometry. A rendering algorithm only draws pixels in the intersection between the clip region and the scene model. Lines and surfaces outside the view volume are removed.
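
A minimal example of the idea, using an axis-aligned rectangular clip region (all names here are illustrative):

    # Rectangle ("scissor") clipping: pixel writes are suppressed
    # outside the defined region of interest.
    import numpy as np

    frame = np.zeros((64, 64, 3), np.uint8)
    CLIP = (8, 8, 40, 40)                  # x0, y0, x1, y1

    def put_pixel(x, y, rgb):
        x0, y0, x1, y1 = CLIP
        if x0 <= x < x1 and y0 <= y < y1:  # draw only inside the rect
            frame[y, x] = rgb

    put_pixel(10, 10, (255, 255, 255))     # drawn
    put_pixel(50, 50, (255, 255, 255))     # clipped away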

3D computer graphics

3D computer graphics, or three-dimensional computer graphics, are graphics that use a three-dimensional representation of geometric data that is stored in the computer for the purposes of performing calculations and rendering 2D images. The resulting images may be stored for viewing later or displayed in real time. Unlike 3D film and similar techniques, the result is two-dimensional, without the illusion of being solid.

2D-plus-depth

2D-plus-depth is a stereoscopic video coding format that is used for 3D displays, such as Philips WOWvx. Philips discontinued work on the WOWvx line in 2009, citing "current market developments". This Philips technology is now used by the company SeeCubic, led by former key 3D engineers and scientists from Philips, which offers autostereoscopic 3D displays that use the 2D-plus-depth format for 3D video input.

Multiview Video Coding is a stereoscopic video coding standard for video compression that allows for the efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream. It uses the 2D plus Delta method and is an amendment to the H.264 video compression standard, developed jointly by MPEG and VCEG, with contributions from a number of companies, primarily Panasonic and LG Electronics.

2D Plus Delta is a method of encoding 3D images, listed as part of the MPEG-2 and MPEG-4 standards, specifically in the H.264 implementation of the Multiview Video Coding extension. The technology originally started as a proprietary method for stereoscopic video coding and content deployment. It uses the left or right channel as the 2D version, and the optimized difference or disparity (Delta) between that channel's view and the second eye's view is injected into the video stream as user_data, a secondary stream, an independent stream, an enhancement layer, or a NALu for deployment. The Delta data can be a spatial stereo disparity, temporally predictive, bidirectional, or optimized motion compensation.

A variety of computer graphic techniques have been used to display video game content throughout the history of video games. The predominance of individual techniques has evolved over time, primarily due to hardware advances and restrictions such as the processing power of central or graphics processing units.

DVB 3D-TV

DVB 3D-TV is a standard, partially released at the end of 2010, that includes techniques and procedures for sending a three-dimensional video signal over existing DVB transmission standards. Currently there is a commercial-requirements text for 3D TV broadcasters and set-top box manufacturers, but it contains no technical information.

This is a glossary of terms relating to computer graphics.

Egocentric vision or first-person vision is a sub-field of computer vision that entails analyzing images and videos captured by a wearable camera, which is typically worn on the head or on the chest and naturally approximates the visual field of the camera wearer. Consequently, visual data capture the part of the scene on which the user focuses to carry out the task at hand and offer a valuable perspective to understand the user's activities and their context in a naturalistic setting.

References

  1. Tanimoto, Masayuki. "FTV (free-viewpoint television)." APSIPA Transactions on Signal and Information Processing 1 (2012).
  2. "Real Time Racing | Automotive News". Diseno-art.com. 2009-11-02. Archived from the original on 2011-04-17. Retrieved 2010-09-13.
  3. "Geometry videos: A new representation for 3D animations" . Retrieved 2016-12-08.
  4. "Joschi's Home (Aljoscha Smolic)". Iphome.hhi.de. Archived from the original on 2009-11-26. Retrieved 2010-09-13.
