2D-plus-Depth is a stereoscopic video coding format used for 3D displays such as Philips WOWvx. Philips discontinued work on the WOWvx line in 2009, citing "current market developments". [1] The technology is currently used by SeeCubic, a company led by former key Philips 3D engineers and scientists, which offers autostereoscopic 3D displays that accept the 2D-plus-Depth format for 3D video input. [2]
The 2D-plus-Depth format is described in a Philips white paper [3] and in several articles. [4]
Each 2D image frame is supplemented with a greyscale depth map that indicates whether a specific pixel in the 2D image should appear in front of the display (white) or behind the screen plane (black). The 256 grey levels can build a smooth gradient of depth within the image. Processing within the monitor uses this input to render the multiview images.
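The depth-dependent pixel shift at the heart of this rendering step can be sketched as follows. This is a minimal illustration of depth-image-based rendering for a single scanline, not the Philips algorithm; the shift scale and the zero-parallax depth value of 128 are illustrative assumptions.

```python
def render_view(image_row, depth_row, max_shift_px=8, zero_plane=128):
    """Shift pixels of one scanline by a depth-dependent disparity.

    depth 255 (white) -> appears in front of the screen
    depth 0   (black) -> recedes behind the screen plane
    """
    width = len(image_row)
    out = [None] * width
    for x, (pixel, depth) in enumerate(zip(image_row, depth_row)):
        # Map 0..255 depth to a signed shift around the zero-parallax plane.
        shift = round((depth - zero_plane) / 255 * max_shift_px)
        target = x + shift
        if 0 <= target < width:
            # Later writes overwrite earlier ones at the same target;
            # a full implementation would resolve conflicts by depth order.
            out[target] = pixel
    # Holes (None) are the uncovered occlusion areas mentioned below;
    # here they are crudely filled from the nearest left neighbour.
    for x in range(width):
        if out[x] is None:
            out[x] = out[x - 1] if x > 0 else image_row[0]
    return out
```

A scanline whose depth map is uniformly at the zero-parallax plane is rendered unchanged, while brighter or darker depth values slide pixels left or right, which is what produces the per-eye parallax in the multiview output.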
Supported by various companies across the display industry, 2D-plus-Depth has been standardized in MPEG as an extension for 3D filed under ISO/IEC FDIS 23002-3:2007(E). [5]
There is also an extension of the 2D-plus-Depth format called the WOWvx Declipse format, described in the same Philips white paper, "3D Interface Specifications". In this advanced format, two more planes are added to the original 2D image and its depth map for each frame: the background areas covered by foreground objects, and their respective depth map. Each frame in the Declipse format is therefore described by an image containing four parts, or quadrants. This extension improves potential visual quality by providing data for more correct and precise filling of the uncovered occlusion areas created by shifting foreground objects during multiview generation.
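The four-quadrant packing can be illustrated with a short sketch. The quadrant arrangement below (image and depth on top, background and background depth underneath) is an assumption for illustration; the exact layout is defined in the Philips white paper.

```python
def pack_declipse(image, depth, background, bg_depth):
    """Pack four equally sized planes (lists of pixel rows) into one
    double-size Declipse-style frame of four quadrants."""
    # All four planes must have the same number of rows.
    assert len({len(p) for p in (image, depth, background, bg_depth)}) == 1
    top = [a + b for a, b in zip(image, depth)]           # image | depth
    bottom = [a + b for a, b in zip(background, bg_depth)]  # background | bg depth
    return top + bottom
```

Packing all four planes into one frame lets the format travel through ordinary 2D video pipelines, with the display unpacking the quadrants before rendering.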
2D-plus-Depth has the advantage that it has a limited bandwidth increase compared to 2D (compressed greyscale increases bandwidth 5–20%) so that it can be used in existing distribution infrastructures.
2D-plus-Depth offers flexibility and compatibility with existing production equipment and compression tools. [6] [7]
It allows applications to use different 3D display screen sizes and designs in the same system.
Another advantage is that depth maps are produced in the course of 2D-to-stereo 3D conversion by almost any approach to this video transformation. [8] A 2D-plus-Depth representation of converted stereo footage is therefore available in almost all cases. Given the scarcity of 3D content shot natively in stereo and the large number of converted 3D films, this is a significant benefit.
2D-plus-Depth is not compatible with existing 2D or 3D-Ready displays. The format has been criticized due to the limited amount of depth that can be displayed in an 8-bit greyscale.
2D-plus-Depth cannot handle transparency (semi-transparent objects in the scene) or occlusion (an object blocking the view of another). The 2D plus DOT format takes these factors into account. [9] Additionally, it cannot handle reflection, refraction (beyond simple transparency) or other optical phenomena.
Creation of accurate 2D-plus-Depth can be costly and difficult, though recent advances in range imaging have made this process more accessible. [10] [11]
2D-plus-Depth lacks the potential resolution increase offered by using two complete images.
Depth cannot be reliably estimated from a monocular video in most cases. Notable exceptions are camera-motion scenes in which objects are static or nearly so, and landscape scenes whose depth maps can be approximated well enough with a gradient; these allow automatic depth estimation. [12] [13] In the general case, only a semi-automatic approach is viable for 2D to 2D-plus-Depth conversion. Philips developed a 3D content creation software suite named BlueBox, [14] which includes semi-automated conversion of 2D content into the 2D-plus-Depth format and automatic generation of 2D-plus-Depth from stereo. A similar semi-automatic approach to high-quality 2D to 2D-plus-Depth conversion is implemented in YUVsoft's 2D to 3D Suite, available as a set of plugins for the After Effects and NUKE video compositing software. [15]
Stereoscopic to 2D-plus-Depth conversion involves several algorithms, including scene change detection, segmentation, motion estimation and image matching. Automatic stereo to 2D-plus-Depth conversion is now possible thanks to high-performance software and GPU technology, even in live real-time mode. [16]
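The image-matching step can be sketched with a toy block matcher: for each pixel of the left scanline, search a small range of horizontal disparities in the right scanline and keep the best sum-of-absolute-differences match. Production converters layer segmentation, motion estimation and temporal filtering on top of this; the window size and search range below are illustrative assumptions.

```python
def disparity_row(left, right, max_disp=3, half_win=1):
    """Estimate per-pixel disparity between two scanlines by SAD matching."""
    width = len(left)
    disp = []
    for x in range(width):
        best_d, best_cost = 0, float("inf")
        # A point at x in the left view appears at x - d in the right view.
        for d in range(min(max_disp, x) + 1):
            cost = 0
            for dx in range(-half_win, half_win + 1):
                xl = min(max(x + dx, 0), width - 1)      # clamp to the row
                xr = min(max(x + dx - d, 0), width - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp.append(best_d)
    # Larger disparity means a nearer object; scaling disparities to the
    # 0-255 greyscale range yields a 2D-plus-Depth style depth map.
    return disp
```

For example, a bright feature at position 2 in the left row that sits at position 1 in the right row is matched at disparity 1, marking it as nearer than the zero-disparity background.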
Other 3D formats include stereo (left-right or alternating frames), multiview formats (such as Multiview Video Coding and Scalable Video Coding), and 2D plus Delta.
Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the form of decisions. "Understanding" in this context signifies the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.
Stereoscopy is a technique for creating or enhancing the illusion of depth in an image by means of stereopsis for binocular vision. The word stereoscopy derives from Greek στερεός (stereos) 'firm, solid' and σκοπέω (skopeō) 'to look, to see'. Any stereoscopic image is called a stereogram. Originally, stereogram referred to a pair of stereo images which could be viewed using a stereoscope.
Advanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding. It is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video industry developers as of September 2019. It supports a maximum resolution of 8K UHD.
A volumetric display device is a display device that forms a visual representation of an object in three physical dimensions, as opposed to the planar image of traditional screens that simulate depth through a number of different visual effects. One definition offered by pioneers in the field is that volumetric displays create 3D imagery via the emission, scattering, or relaying of illumination from well-defined regions in (x,y,z) space.
Anaglyph 3D is the stereoscopic 3D effect achieved by means of encoding each eye's image using filters of different colors, typically red and cyan. Anaglyph 3D images contain two differently filtered colored images, one for each eye. When viewed through the "color-coded" "anaglyph glasses", each of the two images is visible to the eye it is intended for, revealing an integrated stereoscopic image. The visual cortex of the brain fuses this into the perception of a three-dimensional scene or composition.
3D scanning is the process of analyzing a real-world object or environment to collect three dimensional data of its shape and possibly its appearance. The collected data can then be used to construct digital 3D models.
Autostereoscopy is any method of displaying stereoscopic images without requiring the viewer to wear special headgear, glasses, or any other vision-altering device. Because headgear is not required, it is also called "glasses-free 3D" or "glassesless 3D".
3D computer graphics, sometimes called CGI, 3-D-CGI or three-dimensional computer graphics, are graphics that use a three-dimensional representation of geometric data that is stored in the computer for the purposes of performing calculations and rendering digital images, usually 2D images but sometimes 3D images. The resulting images may be stored for viewing later or displayed in real time.
3DMLW is a discontinued open-source project, and a XML-based Markup Language for representing interactive 3D and 2D content on the World Wide Web.
3D television (3DTV) is television that conveys depth perception to the viewer by employing techniques such as stereoscopic display, multi-view display, 2D-plus-depth, or any other form of 3D display. Most modern 3D television sets use an active shutter 3D system or a polarized 3D system, and some are autostereoscopic without the need of glasses. As of 2017, most 3D TV sets and services are no longer available from manufacturers.
Multi View Video Coding is a stereoscopic video coding standard for video compression that allows for encoding video sequences captured simultaneously from multiple camera angles in a single video stream. It uses the 2D plus Delta method and is an amendment to the H.264 video compression standard, developed jointly by MPEG and VCEG, with contributions from a number of companies such as Panasonic and LG Electronics.
2D Plus Delta is a method of encoding a 3D image and is listed as a part of MPEG2 and MPEG4 standards, specifically on the H.264 implementation of the Multiview Video Coding extension. This technology originally started as a proprietary method for Stereoscopic Video Coding and content deployment that utilizes the left or right channel as the 2D version and the optimized difference or disparity (Delta) between that image channel view and a second eye image view is injected into the video stream as user data, secondary stream, independent stream, enhancement layer or NALu for deployment. The Delta data can be either a spatial stereo disparity, temporal predictive, bidirectional, or optimized motion compensation.
The Fujifilm FinePix Real 3D W series is a line of consumer-grade digital cameras designed to capture stereoscopic images that recreate the perception of 3D depth, in both still and video formats, while retaining standard 2D still image and video modes. The cameras feature a pair of lenses and an autostereoscopic display which directs pixels of the two offset images to the user's left and right eyes simultaneously. Methods are included for extending or contracting the stereoscopic baseline, either with an asynchronous timer or by manually depressing the shutter twice. The dual-lens architecture also enables novel modes such as simultaneous near and far zoom capture of a 2D image. The remainder of the camera is similar to other compact digital cameras.
iClone is a real-time 3D animation and rendering software program. Real-time playback is enabled by using a 3D videogame engine for instant on-screen rendering.
Nvidia 3D Vision is a discontinued stereoscopic gaming kit from Nvidia which consists of LC shutter glasses and driver software which enables stereoscopic vision for any Direct3D game, with various degrees of compatibility. There have been many examples of shutter glasses. Electrically controlled mechanical shutter glasses date back to the middle of the 20th century. LCD shutter glasses appeared in the 1980s, one example of which is Sega's SegaScope. This was available for Sega's game console, the Master System. The NVIDIA 3D Vision gaming kit introduced in 2008 made this technology available for mainstream consumers and PC gamers.
DVB 3D-TV is a deprecated standard, partially released at the end of 2010, which included techniques and procedures for sending a three-dimensional video signal over existing DVB transmission standards. There was a commercial requirements text for 3D TV broadcasters and set-top box manufacturers, but it contained no technical information.
Wiggle stereoscopy is an example of stereoscopy in which left and right images of a stereogram are animated. This technique is also called wiggle 3-D, wobble 3-D, wigglegram, or sometimes Piku-Piku.
2D to 3D video conversion is the process of transforming 2D ("flat") film into 3D form, which in almost all cases is stereo; it is thus the process of creating imagery for each eye from a single 2D image.
3D reconstruction from multiple images is the creation of three-dimensional models from a set of images. It is the reverse process of obtaining 2D images from 3D scenes.
Video matting is a technique for separating a video into two or more layers, usually foreground and background, and generating alpha mattes which determine the blending of the layers. The technique is very popular in video editing because it allows the background to be substituted or the layers to be processed individually.