Three-two pull down

An illustration of the process

Three-two pull down (3:2 pull down) is a term used in filmmaking and television production for the post-production process of transferring film to video.


The process converts 24 frames per second into 29.97 frames per second by turning every four film frames into five video frames and slowing playback slightly. Film runs at a standard rate of 24 frames per second, whereas NTSC video has a signal frame rate of 29.97 frames per second. Every interlaced video frame consists of two fields. In the three-two pull down, the telecine adds a third video field (a half frame) to every second frame, but the untrained eye cannot see the addition of this extra video field. In the figure, the film frames A–D are the true or original images, since each was photographed as a complete frame. The A, B, and D frames on the right in the NTSC footage are original frames. The third and fourth frames have been created by blending fields from different frames.



In the United States and other countries where television uses the 59.94 Hz vertical scanning frequency, video is broadcast at 29.97 frame/s. For the film's motion to be accurately rendered on the video signal, a telecine must use a technique called the 2:3 pull down (or a variant called 3:2 pull down) to convert from 24 to 29.97 frame/s.

The term "pulldown" comes from the mechanical process of "pulling" (physically moving) the film downward within the film portion of the transport mechanism to advance it from one frame to the next at a repetitive rate (nominally 24 frames/s). The conversion to video is accomplished in two steps.

The first step is to slow down the film motion by 1/1000 to 23.976 frames/s (or 24 frames every 1.001 seconds). This difference in speed is imperceptible to the viewer. For a two-hour film, play time is extended by 7.2 seconds.
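The arithmetic of this first step can be checked with a short sketch. The function names below are illustrative only, and exact fractions are used to avoid rounding; it assumes the slowdown is exactly 24 frames every 1.001 seconds, as stated above.

```python
from fractions import Fraction

FILM_FPS = Fraction(24)
SLOWDOWN = Fraction(1001, 1000)  # 24 frames now take 1.001 s instead of 1 s


def slowed_rate(fps=FILM_FPS):
    """Film rate after the 1/1000 slowdown, as an exact fraction."""
    return fps / SLOWDOWN


def added_seconds(runtime_seconds):
    """Extra play time introduced by the slowdown."""
    return Fraction(runtime_seconds) * (SLOWDOWN - 1)


print(f"{float(slowed_rate()):.3f} frame/s")             # 23.976 frame/s
print(f"{float(added_seconds(2 * 60 * 60)):.1f} s")      # 7.2 s for a two-hour film
```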

The second step is distributing cinema frames into video fields. At 23.976 frame/s, there are four frames of film for every five frames of 29.97 Hz video: 23.976 / 29.97 = 4 / 5.

These four frames need to be "stretched" into five frames by exploiting the interlaced nature of video. Since an interlaced video frame is made up of two incomplete fields (one for the odd-numbered lines of the image, and one for the even-numbered lines), conceptually four frames need to be used in ten fields (to produce five frames).

The term "2:3" comes from the pattern for producing fields in the new video frames. The pattern of 2-3 is an abbreviation of the actual pattern of 2-3-2-3, which indicates that the first film frame is used in 2 fields, the second film frame is used in 3 fields, the third film frame is used in 2 fields, and the fourth film frame is used in 3 fields, producing a total of 10 fields, or 5 video frames. If the four film frames are called A, B, C and D, the five video frames produced are A1-A2, B1-B2, B2-C1, C2-D1 and D1-D2. That is, frame A is used 2 times (in both fields of the first video frame); frame B is used 3 times (in both fields of the second video frame and in one of the fields of the third video frame); frame C is used 2 times (in the other field of the third video frame, and in one of the fields of the fourth video frame); and frame D is used 3 times (in the other field of the fourth video frame, and in both fields of the fifth video frame). The 2-3-2-3 cycle repeats itself completely after four film frames have been exposed.
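The field distribution described above can be sketched in a few lines. This is a simplified model, not a telecine implementation: the function name `pulldown_frames` is hypothetical, and each field is labelled only by its source film frame (the field numbers 1 and 2 used in the text are left out).

```python
def pulldown_frames(film_frames, cadence=(2, 3, 2, 3)):
    """Spread film frames over video fields, then pair fields into video frames."""
    fields = []
    for i, frame in enumerate(film_frames):
        # Each film frame is repeated for 2 or 3 fields, following the cadence.
        fields.extend([frame] * cadence[i % len(cadence)])
    # Every two consecutive fields form one interlaced video frame.
    return [tuple(fields[i:i + 2]) for i in range(0, len(fields) - 1, 2)]


print(pulldown_frames("ABCD"))
# [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
```

Four film frames yield ten fields and therefore five video frames, of which the third and fourth combine fields from two different film frames.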



The alternative "3:2" pattern is similar to the one shown above, except it is shifted by one frame. For instance, a cycle that starts with film frame B yields a 3:2 pattern: B1-B2, B2-C1, C2-D1, D1-D2, A1-A2 or 3-2-3-2 or simply 3-2. In other words, there is no difference between the 2-3 and 3-2 patterns. In fact, the "3-2" notation is misleading because, according to SMPTE standards, the first frame of every four-frame film sequence is scanned twice, not three times. [1]

Modern alternatives

The above method is a "classic" 2:3, which was used before frame buffers allowed for holding more than one frame. It has the disadvantage of creating two dirty frames (each a mix of two different film frames) and three clean frames (each matching an unmodified film frame) in every five video frames.

The preferred method for doing a 2:3 creates only one dirty frame in every five (i.e. 3:3:2:2 or 2:3:3:2 or 2:2:3:3). The 3-3-2-2 pattern produces A1-A2 A2-B1 B1-B2 C1-C2 D1-D2, where only the second frame is dirty. While this method has slightly more judder, it allows for easier upconversion (the dirty frame can be dropped without losing information) and better overall compression when encoding. Note that interlaced displays such as CRTs show only fields, not frames, so no dirty frames appear on them. Dirty frames may appear in other methods of displaying the interlaced video.
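The dirty-frame counts for the classic and alternative cadences can be compared with a small sketch. The helper `dirty_frame_count` is a hypothetical name, and the model again labels each field only by its source film frame.

```python
def dirty_frame_count(cadence, film_frames="ABCD"):
    """Count video frames whose two fields come from different film frames."""
    fields = []
    for frame, repeats in zip(film_frames, cadence):
        fields.extend([frame] * repeats)
    # Pair up consecutive fields: (0, 1), (2, 3), ...
    pairs = zip(fields[0::2], fields[1::2])
    return sum(1 for first, second in pairs if first != second)


for cadence in [(2, 3, 2, 3), (3, 3, 2, 2), (2, 3, 3, 2), (2, 2, 3, 3)]:
    print(cadence, dirty_frame_count(cadence))
# (2, 3, 2, 3) 2
# (3, 3, 2, 2) 1
# (2, 3, 3, 2) 1
# (2, 2, 3, 3) 1
```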


The frame rate of NTSC video is 29.97 frames per second, about one-thousandth slower than the 30 frame/s of the original monochrome system. The change came with the NTSC color encoding process, which required the line rate to be a sub-multiple of the 3.579545 MHz color "burst" frequency: 15734.2637 Hz (a frame rate of 29.9700 Hz) rather than the 15750 Hz line rate locked to the 60 Hz AC line (a frame rate of 30 Hz). This was done to maintain compatibility with black and white televisions.
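The relationship between these figures can be reproduced numerically. The sketch below relies on two values that are not stated in the text and are supplied here as assumptions: the line rate is taken as 2/455 of the color subcarrier frequency, and a frame is taken to contain 525 lines.

```python
from fractions import Fraction

COLOR_SUBCARRIER_HZ = Fraction(3_579_545)  # burst frequency quoted in the text
SUBCARRIER_TO_LINE = Fraction(2, 455)      # assumption: NTSC line rate = subcarrier * 2/455
LINES_PER_FRAME = 525                      # assumption: 525 lines per NTSC frame


def color_line_rate():
    """Line rate derived from the color subcarrier."""
    return COLOR_SUBCARRIER_HZ * SUBCARRIER_TO_LINE


def frame_rate(line_rate_hz):
    """Frame rate for a given line rate."""
    return Fraction(line_rate_hz) / LINES_PER_FRAME


print(f"{float(color_line_rate()):.4f} Hz")              # 15734.2637 Hz
print(f"{float(frame_rate(color_line_rate())):.4f} Hz")  # 29.9700 Hz
print(f"{float(frame_rate(15750)):.4f} Hz")              # 30.0000 Hz
```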

Because of this 0.1% speed difference, when converting film to video, or vice versa, the sync will drift and the audio will end up out of sync by 3.6 seconds per hour. To correct this error, the audio can be either pulled up or pulled down. A pull up speeds the sound up by 0.1% and is used for transferring video to film. A pull down slows the sound down by 0.1% and is necessary for transferring film to video.
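As a rough illustration, a 0.1% speed difference accumulates to about 3.6 seconds over one hour, and the same factor can be applied to an audio sample rate. The 48 kHz sample rate and the function names below are assumptions chosen for the example, and the factor is taken as exactly 1001/1000.

```python
from fractions import Fraction

PULL = Fraction(1001, 1000)  # assumed exact form of the 0.1% speed change


def drift_seconds(duration_seconds):
    """Sync drift accumulated if the audio is not pulled to match the picture."""
    return Fraction(duration_seconds) * (PULL - 1)


def pulled_down(sample_rate_hz):
    """Effective sample rate after slowing audio by 0.1% (film to video)."""
    return Fraction(sample_rate_hz) / PULL


def pulled_up(sample_rate_hz):
    """Effective sample rate after speeding audio up by 0.1% (video to film)."""
    return Fraction(sample_rate_hz) * PULL


print(f"{float(drift_seconds(3600)):.1f} s per hour")  # 3.6 s per hour
print(f"{float(pulled_down(48_000)):.2f} Hz")          # 47952.05 Hz
print(f"{float(pulled_up(48_000)):.2f} Hz")            # 48048.00 Hz
```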


  1. Poynton, Charles (2003). Digital Video and HDTV: Algorithms and Interfaces. ISBN 9781558607927. p. 430.