2D to 3D conversion

Process type: digital and print
Industrial sector(s): Film and television, print production
Main technologies or sub-processes: Computer software
Product(s): Movies, television shows, social media, printed images

2D to 3D video conversion (also called 2D to stereo 3D conversion and stereo conversion) is the process of transforming 2D ("flat") film to 3D form. In almost all cases the 3D form is stereo, so the process amounts to creating imagery for each eye from a single 2D image.

Overview

2D-to-3D conversion adds the binocular disparity depth cue to digital images. If done properly, it greatly improves the immersive effect of stereo video compared with 2D video. To be successful, however, the conversion must be sufficiently accurate and correct: the quality of the original 2D images should not deteriorate, and the introduced disparity cue should not contradict other cues used by the brain for depth perception. If done properly and thoroughly, the conversion produces stereo video of similar quality to "native" stereo video, which is shot in stereo and accurately adjusted and aligned in post-production. [1]

Two approaches to stereo conversion can be loosely defined: quality semiautomatic conversion for cinema and high quality 3DTV, and low-quality automatic conversion for cheap 3DTV, VOD and similar applications.

Re-rendering of computer animated films

Computer animated 2D films made with 3D models can be re-rendered in stereoscopic 3D by adding a second virtual camera if the original data is still available. This is technically not a conversion; therefore, such re-rendered films have the same quality as films originally produced in stereoscopic 3D. Examples of this technique include the re-release of Toy Story and Toy Story 2. Revisiting the original computer data for the two films took four months, and adding the 3D took an additional six months. [2] However, not all CGI films are re-rendered for a 3D re-release, because of the costs, time required, lack of skilled resources or missing computer data.

Importance and applicability

With the increase in films released in 3D, 2D to 3D conversion has become more common. The majority of non-CGI stereo 3D blockbusters are converted fully or at least partially from 2D footage. Even Avatar, notable for its extensive stereo filming, contains several scenes shot in 2D and converted to stereo in post-production. [3] Reasons for shooting in 2D instead of stereo can be financial, technical and sometimes artistic. [1] [4]

Even in the case of stereo shooting, conversion can frequently be necessary. Besides hard-to-shoot scenes, there can be mismatches in stereo views that are too big to adjust, and it is simpler to perform 2D to stereo conversion, treating one of the stereo views as the original 2D source.

General problems

Regardless of the particular algorithm, all conversion workflows should solve the following tasks: [4] [5]

  1. Allocation of "depth budget" – defining the range of permitted disparity or depth, which depth value corresponds to the screen position (the so-called "convergence point" position), and the permitted distance ranges for out-of-the-screen effects and behind-the-screen background objects. If an object in a stereo pair is in exactly the same spot for both eyes, it appears on the screen surface and is in zero parallax. Objects in front of the screen are said to be in negative parallax, and background imagery behind the screen is in positive parallax. These correspond to negative or positive offsets in object positions between the left-eye and right-eye images.
  2. Control of comfortable disparity depending on scene type and motion – too much parallax or conflicting depth cues may cause eye strain and nausea.
  3. Filling of uncovered areas – left or right view images show a scene from a different angle, and parts of objects or entire objects covered by the foreground in the original 2D image should become visible in a stereo pair. Sometimes the background surfaces are known or can be estimated, so they should be used for filling uncovered areas. Otherwise the unknown areas must be filled in by an artist or inpainted, since the exact reconstruction is not possible.
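
The depth-budget bookkeeping in task 1 can be sketched as a mapping from a normalized depth value to a signed pixel disparity. This is only an illustrative sketch: the linear mapping, the depth convention (0.0 = nearest, 1.0 = farthest), the sign convention (disparity = x_right - x_left) and the names `depth_to_disparity`, `screen_depth`, `max_negative_px` and `max_positive_px` are assumptions made here, not part of any particular conversion tool.

```python
def depth_to_disparity(depth, screen_depth, max_negative_px, max_positive_px):
    """Map a normalized depth (0.0 = nearest, 1.0 = farthest) to a signed
    disparity in pixels, measured as x_right - x_left.

    Negative values lie in front of the screen, zero lies on the screen
    plane, positive values lie behind it. All parameters are hypothetical.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    if not 0.0 < screen_depth < 1.0:
        raise ValueError("screen_depth must be strictly between 0 and 1")
    if depth < screen_depth:
        # In front of the screen: scale linearly up to the negative budget.
        return -max_negative_px * (screen_depth - depth) / screen_depth
    # On or behind the screen: scale linearly up to the positive budget.
    return max_positive_px * (depth - screen_depth) / (1.0 - screen_depth)


def parallax_type(disparity_px):
    """Name the parallax region for a signed disparity."""
    if disparity_px < 0:
        return "negative (in front of the screen)"
    if disparity_px > 0:
        return "positive (behind the screen)"
    return "zero (on the screen surface)"
```

With a budget of 10 pixels in front of the screen and 20 pixels behind it, the nearest depth maps to -10 pixels, the screen plane to 0, and the farthest depth to +20 pixels. Production workflows may use non-linear mappings and vary the budget per shot.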

High quality conversion methods should also deal with many typical problems including:

Quality semiautomatic conversion

Depth-based conversion

Most semiautomatic methods of stereo conversion use depth maps and depth-image-based rendering. [4] [5]

The idea is that a separate auxiliary picture known as the "depth map" is created for each frame or for a series of homogenous frames to indicate depths of objects present in the scene. The depth map is a separate grayscale image having the same dimensions as the original 2D image, with various shades of gray to indicate the depth of every part of the frame. While depth mapping can produce a fairly potent illusion of 3D objects in the video, it inherently does not support semi-transparent objects or areas, nor does it represent occluded surfaces; to emphasize this limitation, depth-based 3D representations are often explicitly referred to as 2.5D. [6] [7] These and other similar issues should be dealt with via a separate method. [6] [8] [9]

Figure: An example of a depth map.
Figure: Generating and reconstructing 3D shapes from single or multi-view depth maps or silhouettes.
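
A grayscale depth map of the kind described above could be assembled from per-surface mattes roughly as follows. This is a minimal sketch using NumPy; the function name `compose_depth_map`, the flat depth level per surface, and the "brighter = nearer" convention are assumptions for illustration, and real depth maps usually carry gradients and relief within each surface.

```python
import numpy as np

def compose_depth_map(shape, surfaces, background_level=0):
    """Build an 8-bit grayscale scene depth map from per-surface mattes.

    shape:    (H, W) of the original 2D frame
    surfaces: list of (mask, level) pairs ordered from farthest to nearest;
              mask is an (H, W) boolean matte, level is a gray value 0..255
              (assumed convention: brighter = nearer)
    """
    depth = np.full(shape, background_level, dtype=np.uint8)
    for mask, level in surfaces:
        depth[mask] = level  # nearer surfaces overwrite farther ones
    return depth
```

The result has the same dimensions as the frame, with one gray value per pixel. Because each pixel holds a single value, nothing behind the nearest surface is represented, which is the occlusion limitation noted above.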

The major steps of depth-based conversion methods are:

  1. Depth budget allocation – how much total depth in the scene and where the screen plane will be.
  2. Image segmentation, creation of mattes or masks, usually by rotoscoping. Each important surface should be isolated. The level of detail depends on the required conversion quality and budget.
  3. Depth map creation. Each isolated surface should be assigned a depth map. The separate depth maps should be composed into a scene depth map. This is an iterative process requiring adjustment of objects, shapes and depth, and visualization of intermediate results in stereo. Depth micro-relief (3D shape) is added to the most important surfaces to prevent the "cardboard" effect, in which stereo imagery looks like a combination of flat images set at different depths.
  4. Stereo generation based on 2D+Depth with any supplemental information such as clean plates, restored background and transparency maps. When the process is complete, a left and a right image will have been created. Usually the original 2D image is treated as the center image, so that two stereo views are generated. However, some methods propose to use the original image as one eye's image and to generate only the other eye's image to minimize the conversion cost. [4] During stereo generation, pixels of the original image are shifted to the left or to the right depending on the depth map, the maximum selected parallax, and the screen surface position.
  5. Reconstruction and painting of any uncovered areas not filled by the stereo generator.
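
Steps 4 and 5 can be illustrated with a very small depth-image-based rendering sketch: each pixel is shifted horizontally by half of its disparity, and target pixels that receive nothing are reported as holes to be filled later. The function name `render_view`, the grayscale input, the integer rounding and the sign convention (disparity = x_right - x_left, negative in front of the screen) are assumptions made for this example; production renderers use sub-pixel resampling and handle edges far more carefully.

```python
import numpy as np

def render_view(image, disparity, direction):
    """Forward-warp a grayscale image horizontally by a per-pixel disparity.

    image:     (H, W) array of pixel values
    disparity: (H, W) signed disparity in pixels (x_right - x_left);
               negative = in front of the screen, positive = behind it
    direction: -0.5 for the left view, +0.5 for the right view, so that the
               original image acts as the center view
    Returns (view, hole_mask); hole_mask is True where no source pixel landed.
    """
    h, w = image.shape
    view = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    shift = np.rint(direction * disparity).astype(int)
    # Visit pixels from far to near so nearer pixels overwrite farther ones.
    order = np.argsort(-disparity, axis=None, kind="stable")
    ys, xs = np.unravel_index(order, (h, w))
    for y, x in zip(ys, xs):
        xt = x + shift[y, x]
        if 0 <= xt < w:
            view[y, xt] = image[y, x]
            filled[y, xt] = True
    return view, ~filled
```

Calling the function once with `direction=-0.5` and once with `direction=+0.5` yields the two views. The returned hole mask marks the uncovered areas that step 5 has to reconstruct or paint.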

Stereo can be presented in any format for preview purposes, including anaglyph.
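
As an example of a quick preview format, a red-cyan anaglyph can be built by taking the red channel from the left view and the green and blue channels from the right view. The sketch below assumes NumPy arrays in RGB channel order; the function name `red_cyan_anaglyph` is hypothetical, and this simple channel swap ignores the color-correction steps that dedicated anaglyph methods apply.

```python
import numpy as np

def red_cyan_anaglyph(left_rgb, right_rgb):
    """Combine two (H, W, 3) RGB views into one red-cyan anaglyph preview."""
    if left_rgb.shape != right_rgb.shape:
        raise ValueError("views must have the same shape")
    anaglyph = right_rgb.copy()          # green and blue come from the right view
    anaglyph[..., 0] = left_rgb[..., 0]  # red comes from the left view
    return anaglyph
```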

The time-consuming steps are image segmentation/rotoscoping, depth map creation and uncovered area filling. The last of these is especially important for the highest quality conversion.

There are various automation techniques for depth map creation and background reconstruction. For example, automatic depth estimation can be used to generate initial depth maps for certain frames and shots. [11]

People engaged in such work may be called depth artists. [12]

Multi-layering

A development of depth mapping, multi-layering works around the limitations of depth mapping by introducing several layers of grayscale depth masks to implement limited semi-transparency. Similar to a simple technique, [13] multi-layering involves applying a depth map to more than one "slice" of the flat image, resulting in a much better approximation of depth and protrusion. The more layers are processed separately per frame, the higher the quality of the 3D illusion tends to be.
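
One way to picture multi-layering is as alpha compositing of separately shifted layers. The sketch below is an illustration under simplifying assumptions, not a description of any specific product: each layer has a single disparity instead of its own depth map, images are grayscale NumPy arrays, and the names `shift_horizontal` and `composite_layers` are hypothetical.

```python
import numpy as np

def shift_horizontal(arr, dx):
    """Shift a 2D array dx pixels to the right (left if dx < 0), padding with zeros."""
    out = np.zeros_like(arr)
    w = arr.shape[1]
    if abs(dx) >= w:
        return out  # everything is shifted out of the frame
    if dx > 0:
        out[:, dx:] = arr[:, :w - dx]
    elif dx < 0:
        out[:, :w + dx] = arr[:, -dx:]
    else:
        out[:] = arr
    return out

def composite_layers(layers, eye):
    """Alpha-composite layers back to front for one eye.

    layers: list of (color, alpha, disparity_px) ordered from farthest to
            nearest; color and alpha are (H, W) float arrays, alpha in [0, 1],
            disparity_px is x_right - x_left for the whole layer
    eye:    -0.5 for the left view, +0.5 for the right view
    """
    out = np.zeros_like(layers[0][0], dtype=float)
    for color, alpha, disparity_px in layers:
        dx = int(round(eye * disparity_px))
        c = shift_horizontal(color, dx)
        a = shift_horizontal(alpha, dx)
        out = a * c + (1.0 - a) * out  # partial alpha keeps the layer behind visible
    return out
```

Because every layer carries its own alpha, a half-transparent foreground pixel blends with whatever lies behind it after the shift, which a single depth map cannot express.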

Other approaches

3D reconstruction and re-projection may be used for stereo conversion. It involves creating a 3D model of the scene, extracting the original image surfaces as textures for the 3D objects and, finally, rendering the 3D scene from two virtual cameras to obtain stereo video. The approach works well enough for scenes with static rigid objects, such as urban shots with buildings or interior shots, but has problems with non-rigid bodies and soft fuzzy edges. [3]

Another method is to set up both left and right virtual cameras, each offset from the original camera so that the total offset is split between them, and then to paint out the occlusion edges of isolated objects and characters. In essence, this means clean-plating several background, midground and foreground elements.
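
Splitting the offset between the two virtual cameras amounts to moving each camera half of the interaxial distance along the original camera's horizontal axis. The following sketch shows only that placement step; the function name `stereo_camera_positions` and its parameters are hypothetical, and convergence (toe-in or sensor shift) is deliberately left out.

```python
def stereo_camera_positions(center, right_axis, interaxial):
    """Return (left, right) camera positions offset symmetrically from the
    original camera.

    center:     (x, y, z) position of the original camera
    right_axis: unit vector pointing to the camera's right
    interaxial: distance between the two virtual cameras
    """
    half = interaxial / 2.0
    left = tuple(c - half * r for c, r in zip(center, right_axis))
    right = tuple(c + half * r for c, r in zip(center, right_axis))
    return left, right
```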

Binocular disparity can also be derived from simple geometry. [14]
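
One widely used geometric relation (given here as a general illustration, not as the method of [14]) follows from similar triangles: for a viewer with eye separation e at distance V from the screen, a point perceived at distance Z from the viewer has on-screen parallax p = e (Z - V) / Z. The function name and parameter names below are assumptions; all quantities use the same length unit.

```python
def screen_parallax(eye_separation, viewing_distance, point_distance):
    """On-screen parallax of a point perceived at point_distance from the viewer.

    Derived from similar triangles between the two eyes and the point.
    The result is zero on the screen plane, negative in front of it and
    positive behind it, approaching eye_separation for very distant points.
    """
    if viewing_distance <= 0 or point_distance <= 0:
        raise ValueError("distances must be positive")
    return eye_separation * (point_distance - viewing_distance) / point_distance
```

For example, with a 0.064 m eye separation and a screen 2 m away, a point perceived 4 m away needs 0.032 m of positive parallax, while a point perceived 1 m away needs 0.064 m of negative parallax.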

Automatic conversion

Depth from motion

It is possible to automatically estimate depth using different types of motion. In the case of camera motion, a depth map of the entire scene can be calculated. Object motion can also be detected, and moving areas can be assigned smaller depth values than the background. Occlusions provide information on the relative position of moving surfaces. [15] [16]
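
The object-motion idea can be sketched with simple frame differencing: pixels that change between two frames are treated as moving and given a smaller depth value than the static background. This is a deliberately crude illustration that assumes a static camera and grayscale frames; the function name, the threshold and the two depth levels are hypothetical, and practical systems rely on optical flow or camera tracking instead.

```python
import numpy as np

def depth_from_frame_difference(prev_frame, next_frame, threshold,
                                near=0.2, far=1.0):
    """Assign a smaller depth value to pixels that changed between two frames.

    prev_frame, next_frame: (H, W) grayscale frames from a static camera
    threshold:              minimum absolute difference that counts as motion
    near, far:              depth values for moving and static pixels
    """
    diff = np.abs(next_frame.astype(float) - prev_frame.astype(float))
    moving = diff > threshold
    return np.where(moving, near, far)
```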

Depth from focus

Approaches of this type are also called "depth from defocus" and "depth from blur". [15] [17] In "depth from defocus" (DFD) approaches, depth is estimated from the amount of blur of the object under consideration, whereas "depth from focus" (DFF) approaches compare the sharpness of an object over a range of images taken at different focus distances to find its distance from the camera. DFD needs only two or three images at different focus settings to work properly, whereas DFF needs at least 10 to 15 images but is more accurate.
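
The DFF idea of comparing sharpness across a focal stack can be sketched as follows. The sharpness measure (squared 4-neighbour Laplacian), the function names and the per-pixel arg-max selection are choices made for this example; practical implementations typically aggregate sharpness over a window and interpolate between focus steps.

```python
import numpy as np

def laplacian_energy(img):
    """Per-pixel sharpness: squared response of a 4-neighbour Laplacian."""
    p = np.pad(img.astype(float), 1, mode="edge")
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    return lap ** 2

def depth_from_focus(stack, focus_distances):
    """Pick, per pixel, the focus distance at which that pixel is sharpest.

    stack:           sequence of (H, W) grayscale images of the same scene
    focus_distances: focus distance used for each image, in the same order
    """
    sharpness = np.stack([laplacian_energy(img) for img in stack])
    best = np.argmax(sharpness, axis=0)  # index of the sharpest image per pixel
    return np.asarray(focus_distances, dtype=float)[best]
```

In regions without texture, every image in the stack looks equally flat, so the estimate there is unreliable and defaults to the first image's focus distance.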

If the sky is detected in the processed image, it can also be taken into account that more distant objects, besides being hazy, should be more desaturated and more bluish because of a thick air layer. [17]

Depth from perspective

The idea of the method is based on the fact that parallel lines, such as railroad tracks and roadsides, appear to converge with distance, eventually reaching a vanishing point at the horizon. Finding this vanishing point gives the farthest point of the whole image. [15] [17]

The more the lines converge, the farther away they appear to be. So, for depth map, the area between two neighboring vanishing lines can be approximated with a gradient plane.
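
Given a vanishing point, a coarse depth map can be approximated by letting depth increase toward that point. The sketch below uses a single radial gradient, which is a simplification of the per-region gradient planes described above; the vanishing point is assumed to be already detected, and the function name and the convention (1.0 = farthest, 0.0 = nearest) are hypothetical.

```python
import numpy as np

def depth_toward_vanishing_point(height, width, vp_x, vp_y):
    """Depth map that is farthest (1.0) at the vanishing point and falls off
    linearly with image distance from it, reaching 0.0 at the farthest pixel."""
    ys, xs = np.mgrid[0:height, 0:width]
    dist = np.hypot(xs - vp_x, ys - vp_y)
    max_dist = dist.max()
    if max_dist == 0:
        return np.ones((height, width))  # single-pixel image
    return 1.0 - dist / max_dist
```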

Conversion artifacts

3D quality metrics

PQM

PQM [18] mimics the human visual system (HVS): its results align very closely with the Mean Opinion Score (MOS) obtained from subjective tests. PQM quantifies luminance distortion and contrast distortion using an approximation (variances) weighted by the mean of each pixel block to obtain the distortion in an image. This distortion is subtracted from 1 to obtain the objective quality score.

HV3D

The HV3D [19] quality metric was designed with human visual 3D perception in mind. It takes into account the quality of the individual right and left views, the quality of the cyclopean view (the fusion of the right and left views, which is what the viewer perceives), and the quality of the depth information.

VQMT3D

The VQMT3D project [20] includes several developed metrics for evaluating the quality of 2D to 3D conversion based on the cardboard effect, edge-sharpness mismatch, stuck-to-background objects, and comparison with the 2D version.

See also


References

  1. Barry Sandrew. "2D – 3D Conversion Can Be Better Than Native 3D".
  2. Murphy, Mekado (October 1, 2009). "Buzz and Woody Add a Dimension". The New York Times. Retrieved February 18, 2010.
  3. Seymour, Mike (2012-05-08). "Art of Stereo Conversion: 2D to 3D – 2012". fxguide. Retrieved 2024-07-11.
  4. Scott Squires. "2D to 3D Conversions".
  5. Jon Karafin. "State-of-the-Art 2D to 3D Conversion and Stereo VFX". International 3D Society University. Presentation from the October 21, 2011 3DU-Japan event in Tokyo. Archived 2012-04-26 at the Wayback Machine.
  6. Wu, Jiajun; et al. (2017). MarrNet: 3D Shape Reconstruction via 2.5D Sketches (PDF). Conference on Neural Information Processing Systems (NeurIPS). pp. 540–550.
  7. Tateno, Keisuke; et al. (2016). When 2.5D is not enough: Simultaneous reconstruction, segmentation and recognition on dense SLAM (PDF). IEEE International Conference on Robotics and Automation (ICRA). pp. 2295–2302.
  8. Rock, Jason; et al. (2015). Completing 3D Object Shape from One Depth Image (PDF). IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2484–2493.
  9. Shin, Daeyun; et al. (2019). 3D Scene Reconstruction with Multi-layer Depth and Epipolar Transformers (PDF). IEEE International Conference on Computer Vision (ICCV). pp. 2172–2182.
  10. Soltani, A. A.; Huang, H.; Wu, J.; Kulkarni, T. D.; Tenenbaum, J. B. "Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes With Deep Generative Networks". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1511–1519. GitHub. 2019-07-11.
  11. YUVsoft. "2D–to–Stereo 3D Conversion Process".
  12. Mike Eisenberg (31 October 2011). "Interview with 3D Artist Adam Hlavac". Screen Rant. Retrieved 28 December 2015.
  13. Cutler, James. "Masking Multiple Layers in Adobe Photoshop". Archived from the original on January 18, 2012.
  14. "Converting a 2D picture to a 3D Lenticular Print".
  15. Dr. Lai-Man Po. "Automatic 2D-to-3D Video Conversion Techniques for 3DTV". Department of Electronic Engineering, City University of Hong Kong. 13 April 2010.
  16. "Automatic 2D to 2D-plus-Depth conversion sample for a camera motion scene".
  17. Qingqing We. "Converting 2D to 3D: A Survey" (PDF). Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology. Archived from the original (PDF) on 2012-04-15.
  18. Joveluro, P.; Malekmohamadi, H.; Fernando, W. A. C.; Kondoz, A. M. (2010). "Perceptual Video Quality Metric for 3D video quality assessment". 2010 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video. IEEE. pp. 1–4. doi:10.1109/3dtv.2010.5506331. ISBN 978-1-4244-6377-0.
  19. Banitalebi-Dehkordi, Amin; Pourazad, Mahsa T.; Nasiopoulos, Panos (2013). "3D video quality metric for 3D video compression". IVMSP 2013. IEEE. pp. 1–4. arXiv:1803.04629. doi:10.1109/ivmspw.2013.6611930. ISBN 978-1-4673-5858-3.
  20. VQMT3D.
