Deinterlacing

Deinterlacing is the process of converting interlaced video into a non-interlaced or progressive form. Interlaced video signals are commonly found in analog television, digital television (HDTV) when in the 1080i format, some DVD titles, and a smaller number of Blu-ray discs.

An interlaced video frame consists of two fields taken in sequence: the first containing all the odd lines of the image, and the second all the even lines. Analog television employed this technique because it required less transmission bandwidth while keeping a high refresh rate for smoother and more life-like motion. A non-interlaced (or progressive scan) signal using the same bandwidth would update the display only half as often, which was found to create perceived flicker or stutter. CRT-based displays were able to display interlaced video correctly due to their completely analog nature, blending in the alternating lines seamlessly. However, since the early 2000s, displays such as televisions and computer monitors have become almost entirely digital - in that the display is composed of discrete pixels - and on such displays interlacing becomes noticeable and can appear as a distracting visual defect. The deinterlacing process should try to minimize these defects.

Deinterlacing is thus a necessary process, and it comes built into most modern DVD players, Blu-ray players, LCD/LED televisions, digital projectors, TV set-top boxes, professional broadcast equipment, and computer video players and editors - though with varying levels of quality.

Deinterlacing has been researched for decades and employs complex processing algorithms; however, consistent results have been very hard to achieve. [1] [2]

Background

Example of interlaced video (slowed down)

Both video and photographic film capture a series of frames (still images) in rapid succession; however, television systems read the captured image by serially scanning the image sensor by lines (rows). In analog television, each frame is divided into two consecutive fields, one containing all even lines, another with the odd lines. The fields are captured in succession at a rate twice that of the nominal frame rate. For instance, PAL and SECAM systems have a rate of 25 frames/sec or 50 fields/sec, while the NTSC system delivers 29.97 frames/sec or 59.94 fields/sec. This process of dividing frames into half-resolution fields at double the frame rate is known as interlacing.
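The division of a frame into two half-resolution fields can be illustrated with a minimal Python sketch (the frame is modeled as a list of scan-line rows; the function name is illustrative):

```python
# Split a progressive frame (a list of rows) into its two interlaced
# fields: one field holds every other scan line starting from the first,
# the other holds the lines in between.
def split_into_fields(frame):
    top_field = frame[0::2]     # lines 1, 3, 5, ... of the picture
    bottom_field = frame[1::2]  # lines 2, 4, 6, ...
    return top_field, bottom_field

frame = [[10, 11], [20, 21], [30, 31], [40, 41]]
top, bottom = split_into_fields(frame)
# top == [[10, 11], [30, 31]], bottom == [[20, 21], [40, 41]]
```

In an interlaced system the two fields are captured one after the other, which is what doubles the temporal rate relative to the frame rate.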

Since the interlaced signal contains the two fields of a video frame shot at two different times, it enhances motion perception for the viewer and reduces flicker by taking advantage of the persistence of vision effect. This results in an effective doubling of temporal resolution compared with non-interlaced footage (for frame rates equal to field rates). However, an interlaced signal requires a display that is natively capable of showing the individual fields in sequential order, and only traditional CRT-based TV sets can do so, due to their electronic scanning and lack of a fixed pixel resolution.

Most modern displays, such as LCD, DLP and plasma displays, are not able to work in interlaced mode, because they are fixed-resolution displays and only support progressive scanning. In order to display an interlaced signal on such displays, the two interlaced fields must be converted to one progressive frame with a process known as deinterlacing. However, when the two fields taken at different points in time are re-combined into a full frame displayed at once, visual defects called interlace artifacts or combing occur on moving objects in the image. A good deinterlacing algorithm should try to avoid interlacing artifacts as much as possible without sacrificing image quality in the process, which is hard to achieve consistently. There are several techniques that extrapolate the missing picture information, but these fall into the category of intelligent frame creation and require complex algorithms and substantial processing power.

Deinterlacing techniques require complex processing and thus can introduce a delay into the video feed. While not generally noticeable, this can result in the display of older video games lagging behind controller input. Many TVs thus have a "game mode" in which minimal processing is done in order to maximize speed at the expense of image quality. Deinterlacing is only partly responsible for such lag; scaling also involves complex algorithms that take milliseconds to run.

Progressive source material

Some interlaced video may have been originally created from progressive footage, and the deinterlacing process should consider this as well.

Typical movie material is shot on 24 frames/s film. Converting film to interlaced video typically uses a process called telecine, whereby each frame is converted to multiple fields. In some cases, each film frame can be represented by exactly two progressive segmented frames (PsF); this format does not require a complex deinterlacing algorithm because each field contains a part of the very same progressive frame. However, to match the 50-field interlaced PAL/SECAM or 59.94/60-field interlaced NTSC signal, frame-rate conversion is necessary using various "pulldown" techniques. Most advanced TV sets can restore the original 24 frame/s signal using an inverse telecine process. Another option is to speed up 24-frame film by 4% (to 25 frames/s) for PAL/SECAM conversion; this method is still widely used for DVDs, as well as television broadcasts (SD & HD) in the PAL markets.
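One common phase of the 2:3 ("three-two") pulldown pattern mentioned above can be sketched in Python: four film frames are spread over ten fields, i.e. five interlaced video frames, two of which mix fields from different film frames - exactly the frames an inverse-telecine process must detect and undo. The function is an illustration, not a production telecine:

```python
# Sketch of 2:3 pulldown: alternate two and three repeated fields per
# film frame, then pair consecutive fields into interlaced video frames.
# Four 24 frame/s film frames thus become five ~30 frame/s video frames.
def two_three_pulldown(film_frames):
    fields = []
    for i, frame in enumerate(film_frames):
        repeats = 2 if i % 2 == 0 else 3  # 2 fields, then 3, then 2, ...
        fields.extend([frame] * repeats)
    return [tuple(fields[j:j + 2]) for j in range(0, len(fields), 2)]

print(two_three_pulldown(["A", "B", "C", "D"]))
# [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
```

The ('B', 'C') and ('C', 'D') pairs are the "dirty" frames: woven directly, they would show combing even though the source was fully progressive.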

DVDs can either encode movies using one of these methods, or store original 24 frame/s progressive video and use MPEG-2 decoder tags to instruct the video player on how to convert it to the interlaced format. Most movies on Blu-ray discs preserve the original non-interlaced 24 frame/s motion film rate and allow output in the progressive 1080p24 format directly to display devices, with no conversion necessary.

Some 1080i HDV camcorders also offer PsF mode with cinema-like frame rates of 24 or 25 frame/s. TV production crews can also use special film cameras which operate at 25 or 30 frame/s; such material does not need frame-rate conversion for broadcasting in the intended video system format.

Deinterlacing methods

When someone watches interlaced video on a progressive monitor with poor deinterlacing, they can see "combing" in movement between two fields of one frame.

Deinterlacing requires the display to buffer one or more fields and recombine them into full frames. In theory this would be as simple as capturing one field and combining it with the next field to be received, producing a single frame. However, the originally recorded signal was produced from two fields at different points in time, and without special processing any motion across the fields usually results in a "combing" effect where alternate lines are slightly displaced from each other.
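The naive field-recombination step described above - usually called weaving - can be sketched in Python; with a moving vertical edge as input, the output shows the alternate-line displacement that appears as combing (rows are lists of pixel values; names are illustrative):

```python
# Weave: interleave two fields back into one full frame. If the fields
# were captured at different times, moving edges end up displaced on
# alternate lines - the "combing" artifact.
def weave(top_field, bottom_field):
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame

# A vertical edge that moved one pixel right between the two fields:
top = [[1, 0, 0, 0], [1, 0, 0, 0]]
bottom = [[0, 1, 0, 0], [0, 1, 0, 0]]
print(weave(top, bottom))
# [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]]  <- combing
```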

There are various methods to deinterlace video, each producing different problems or artifacts of its own. Some methods produce far fewer artifacts than others.

Most deinterlacing techniques fall under three broad groups:

  1. Field combination deinterlacing, which takes the even and odd fields and combines them into one frame. This halves the perceived frame rate (the temporal resolution), whereby 50i or 60i is converted to 25p or 30p.
  2. Field extension deinterlacing, which takes each field (with only half the lines) and extends it to the entire screen to make a frame. This halves the vertical resolution of the image but maintains the original field rate (50i or 60i is converted to 50p or 60p).
  3. Motion compensation deinterlacing, which uses more advanced algorithms to detect motion across fields and switches techniques when necessary. This produces the best quality result, but requires the most processing power.

Modern deinterlacing systems therefore buffer several fields and use techniques like edge detection in an attempt to find the motion between the fields. This is then used to interpolate the missing lines from the original field, reducing the combing effect. [3]

Field combination deinterlacing

These methods take the even and odd fields and combine them into one frame. They retain the full vertical resolution at the expense of the temporal resolution (perceived frame-rate) whereby 50i/60i is converted to 24p/25p/30p which may lose the smooth, fluid feel of the original. However, if the interlaced signal was originally produced from a lower frame-rate source such as film, then no information is lost and these methods may suffice.

Weaving
Blending
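Blending, in a minimal form, averages the two fields line by line, which removes combing but leaves ghosting (a faint double image) on moving objects. This Python sketch writes the averaged data to both output lines; names are illustrative:

```python
# Blend: average corresponding lines of the two fields, so that motion
# between the fields appears as a ghosted mix rather than combing.
def blend(top_field, bottom_field):
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        mixed = [(a + b) / 2 for a, b in zip(top_line, bottom_line)]
        frame.append(mixed)
        frame.append(mixed)  # both output lines carry the averaged data
    return frame

print(blend([[100, 0]], [[0, 100]]))
# [[50.0, 50.0], [50.0, 50.0]]
```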

Field extension deinterlacing

These methods take each field (with only half the lines) and extend it to the entire screen to make a frame. This may halve the vertical resolution of the image but aims to maintain the original field-rate (50i or 60i is converted to 50p or 60p).

Half-sizing
Line doubling
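Line doubling in its simplest form repeats each field line to fill a full frame, keeping the original field rate at half the vertical resolution. This Python sketch shows that nearest-neighbor variant (averaging adjacent field lines instead would give a smoother, linearly interpolated "bob"):

```python
# Line doubling ("bob"): fill a full frame from a single field by
# duplicating each scan line into the gap left by the missing field.
def line_double(field):
    frame = []
    for line in field:
        frame.append(line)
        frame.append(line[:])  # duplicate the line to fill the gap
    return frame

print(line_double([[1, 2], [3, 4]]))
# [[1, 2], [1, 2], [3, 4], [3, 4]]
```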

Line doubling is sometimes confused with deinterlacing in general, or with interpolation (image scaling) which uses spatial filtering to generate extra lines and hence reduce the visibility of pixelation on any type of display. [4] The terminology 'line doubler' is used more frequently in high end consumer electronics, while 'deinterlacing' is used more frequently in the computer and digital video arena.

Motion compensation deinterlacing

More advanced deinterlacing algorithms combine the traditional field combination methods (weaving and blending) and field extension methods (bob or line doubling) to create a high-quality progressive video sequence. One of the basic hints to the direction and amount of motion is the direction and length of combing artifacts in the interlaced signal.

The best algorithms also try to predict the direction and the amount of image motion between subsequent fields in order to better blend the two fields together. They may employ algorithms similar to block motion compensation used in video compression. For example, if two fields had a person's face moving to the left, weaving would create combing, and blending would create ghosting. Advanced motion compensation (ideally) would see that the face in several fields is the same image, just moved to a different position, and would try to detect direction and amount of such motion. The algorithm would then try to reconstruct the full detail of the face in both output frames by combining the images together, moving parts of each field along the detected direction by the detected amount of movement. Deinterlacers that use this technique are often superior because they can use information from many fields, as opposed to just one or two, however they require powerful hardware to achieve this in real-time.

Motion compensation needs to be combined with scene change detection (which has its own challenges), otherwise it will attempt to find motion between two completely different scenes. A poorly implemented motion compensation algorithm would interfere with natural motion and could lead to visual artifacts which manifest as "jumping" parts in what should be a stationary or a smoothly moving image.
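A simplified motion-adaptive scheme - a step below true motion compensation, which would also shift pixels along estimated motion vectors - can be sketched in Python: the two fields of the same parity surrounding the current field are compared per pixel, and the deinterlacer weaves where the picture is static and falls back to the current field where motion is detected. All names and the threshold are illustrative:

```python
# Motion-adaptive sketch: reconstruct the missing lines of the current
# (bottom-field) frame. Where the previous and next top fields agree,
# the area is static and the old field line can be woven in; where they
# differ, the pixel is taken from the current field instead.
def motion_adaptive(prev_top, cur_bottom, next_top, threshold=10):
    frame = []
    for y, bottom_line in enumerate(cur_bottom):
        rebuilt = []
        for x, old in enumerate(prev_top[y]):
            if abs(next_top[y][x] - old) <= threshold:
                rebuilt.append(old)             # static area: weave
            else:
                rebuilt.append(bottom_line[x])  # moving area: current field
        frame.append(rebuilt)
        frame.append(bottom_line)
    return frame

print(motion_adaptive([[5, 0]], [[9, 9]], [[5, 100]]))
# [[5, 9], [9, 9]]  <- first pixel woven (static), second replaced (moving)
```

A real deinterlacer would interpolate the "moving" pixels from neighboring lines rather than copying them, and would gate the whole process with scene change detection as described above.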

Quality measurement

Different deinterlacing methods have different quality and speed characteristics.

The quality of a deinterlacing method is usually measured with the following approach:

  1. A set of progressive videos is composed.
  2. Each of these videos is interlaced.
  3. Each interlaced video is deinterlaced with the method under test.
  4. Each deinterlaced video is compared with the corresponding source video via an objective video quality metric, such as PSNR, SSIM or VMAF.
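Step 4 above can be sketched with a plain-Python PSNR computation for 8-bit samples (peak value 255); images are modeled as lists of rows, and names are illustrative:

```python
import math

# PSNR between a reference progressive frame and a deinterlaced frame,
# for 8-bit samples. Higher is better; identical frames give infinity.
def psnr(reference, test):
    ref = [p for row in reference for p in row]
    out = [p for row in test for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(255 ** 2 / mse)

print(round(psnr([[255]], [[250]]), 2))
# 34.15
```

SSIM and VMAF follow the same pattern of comparing each output frame against the progressive source, but with perceptually motivated models instead of raw squared error.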

The main speed metric is frames per second (FPS): how many frames the deinterlacer is able to process per second. When reporting FPS, it is necessary to specify the frame resolution and the hardware used, because the speed of a given deinterlacing method depends significantly on both factors.
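A minimal FPS measurement along these lines, in Python, times a batch of field pairs through a deinterlacing function and reports the best of several runs (function and parameter names are illustrative):

```python
import time

# Measure deinterlacer throughput in frames per second: process a batch
# of field pairs, time the loop with a monotonic clock, and keep the
# fastest of several runs to reduce noise from other processes.
def measure_fps(deinterlace, field_pairs, runs=3):
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        for top, bottom in field_pairs:
            deinterlace(top, bottom)
        best = min(best, time.perf_counter() - start)
    return len(field_pairs) / best
```

The resolution of the test frames and the hardware in use must be reported alongside the resulting number, as noted above.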

Benchmarks

Deinterlacing Challenge 2019

This benchmark compared 8 different deinterlacing methods on a synthetic video. The video contains a moving three-dimensional Lissajous curve, chosen to make it challenging for modern deinterlacing methods. The authors used MSE and PSNR as objective metrics and also measured processing speed in FPS. Some methods received only a visual comparison, others only an objective one. [5]

Other algorithms from the Deinterlacing Challenge 2019

  Algorithm                            MSE     PSNR   Speed (FPS)  Open source
  Vegas De-interlace Blend [6]         8.086   43.59  43.53        No
  Vegas De-interlace Interpolate [6]   16.426  41.29  23.58        No

MSU Deinterlacer Benchmark

This benchmark compared more than 20 methods on 40 video sequences with a total length of 834 frames. Its authors state that the main feature of the benchmark is its comprehensive comparison of methods, with visual comparison tools, performance plots and parameter tuning. The authors used PSNR and SSIM as objective metrics. [7]

Top algorithms of the MSU Deinterlacer Benchmark

  Algorithm                                    PSNR    SSIM   Speed (FPS)  Open source
  MSU Deinterlacer [8]                         40.708  0.983  1.3          No
  VapourSynth TDeintMod [9]                    39.916  0.977  50.29        Yes
  NNEDI [10]                                   39.625  0.978  1.91         Yes
  FFmpeg Bob Weaver Deinterlacing Filter [11]  39.679  0.976  46.45        Yes
  VapourSynth EEDI3 [12]                       39.373  0.977  51.9         Yes
  Real-Time Deep Video Deinterlacer [13]       39.203  0.976  0.27         Yes

The author of VapourSynth TDeintMod states that it is a bi-directional motion-adaptive deinterlacer. The NNEDI method uses a neural network to deinterlace video sequences. The FFmpeg Bob Weaver Deinterlacing Filter is part of the well-known framework for video and audio processing. VapourSynth EEDI3 is an abbreviation for "enhanced edge directed interpolation 3"; the authors of this method state that it works by finding the best non-decreasing warping between two lines according to a cost functional. The authors of the Real-Time Deep Video Deinterlacer use a deep CNN to obtain the best quality of output video.

Where deinterlacing is performed

Deinterlacing of an interlaced video signal can be done at various points in the TV production chain.

Progressive media

Deinterlacing is required for interlaced archive programs when the broadcast format or media format is progressive, as in EDTV 576p or HDTV 720p50 broadcasting, or mobile DVB-H broadcasting; there are two ways to achieve this.

Interlaced media

When the broadcast format or media format is interlaced, real-time deinterlacing should be performed by embedded circuitry in a set-top box, television, external video processor, DVD or DVR player, or TV tuner card. Since consumer electronics equipment is typically far cheaper, has considerably less processing power and uses simpler algorithms compared to professional deinterlacing equipment, the quality of deinterlacing may vary broadly and typical results are often poor even on high-end equipment. [citation needed]

Using a computer for playback and/or processing potentially allows a broader choice of video players and/or editing software not limited to the quality offered by the embedded consumer electronics device, so at least theoretically higher deinterlacing quality is possible – especially if the user can pre-convert interlaced video to progressive scan before playback, using advanced and time-consuming deinterlacing algorithms (i.e. employing the "production" method).

However, the quality of both free and commercial consumer-grade software may not be up to the level of professional software and equipment. Also, most users are not trained in video production; this often causes poor results as many people do not know much about deinterlacing and are unaware that the frame rate is half the field rate. Many codecs/players do not even deinterlace by themselves and rely on the graphics card and video acceleration API to do proper deinterlacing.

Concerns over effectiveness

The European Broadcasting Union has argued against the use of interlaced video in production and broadcasting, recommending 720p 50 fps (frames per second) as the current production format and working with the industry to introduce 1080p50 as a future-proof production standard which offers higher vertical resolution, better quality at lower bitrates, and easier conversion to other formats such as 720p50 and 1080i50. [14] [15] The main argument is that no matter how complex the deinterlacing algorithm may be, the artifacts in the interlaced signal cannot be completely eliminated, because some information is lost between frames.

Yves Faroudja, the founder of Faroudja Labs and Emmy Award winner for his achievements in deinterlacing technology, has stated that "interlace to progressive does not work" and advised against using interlaced signal. [2] [16]


References

  1. Jung, J.H.; Hong, S.H. (2011). "Deinterlacing method based on edge direction refinement using weighted maximum frequent filter". Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication. ACM. ISBN 978-1-4503-0571-6.
  2. Philip Laven (26 January 2005). "EBU Technical Review No. 301 (January 2005)". EBU. Archived from the original on 16 June 2006.
  3. US patent 4698675, Casey, Robert F., "Progressive Scan Display System Having Intra-field and Inter-field Process Modes", issued October 6, 1987, assigned to RCA Corporation.
  4. PC Magazine. "PCMag Definition: Deinterlace". Archived from the original on 7 October 2012. Retrieved 26 August 2017.
  5. "Deinterlacing Challenge 2019".
  6. "VEGAS Creative Software - Faster. More creative. Endless possibilities". www.vegascreativesoftware.com.
  7. "MSU Deinterlacer Benchmark". Archived from the original on 13 February 2021. Retrieved 24 February 2021.
  8. "VirtualDub MSU Deinterlacer [deinterlacing plugin for VirtualDub]". compression.ru.
  9. "Description". GitHub. 13 October 2021.
  10. "Dubhater/Vapoursynth-nnedi3". GitHub. 21 October 2021.
  11. "FFmpeg Filters Documentation".
  12. "Description". GitHub. 24 June 2021.
  13. Zhu, Haichao; Liu, Xueting; Mao, Xiangyu; Wong, Tien-Tsin (2017). "Real-time Deep Video Deinterlacing". arXiv: 1708.00187 [cs.CV].
  14. "EBU R115-2005: FUTURE HIGH DEFINITION TELEVISION SYSTEMS" (PDF). EBU. May 2005. Archived (PDF) from the original on 26 March 2009. Retrieved 24 May 2009.
  15. "10 things you need to know about... 1080p/50" (PDF). EBU. September 2009. Retrieved 26 June 2010.
  16. Philip Laven (25 January 2005). "EBU Technical Review No. 300 (October 2004)". EBU. Archived from the original on 7 June 2011.