Audio synchronizer

An audio synchronizer is a variable audio delay used to correct or maintain audio-video synchronization, [1] the loss of which is commonly called lip sync error. See, for example, the specification for audio-to-video timing given in ATSC Document IS-191. [2] Modern television systems use large amounts of video signal processing, such as MPEG preprocessing, encoding and decoding, video synchronization, and resolution conversion in pixelated displays. This processing can delay the video signal by anywhere from a few microseconds to tens of seconds. If the program is displayed with this video delay and the audio is not delayed to match, the picture reaches the viewer after the corresponding sound is heard. This effect is commonly referred to as A/V sync error or lip sync error, and it can seriously impair the viewer's enjoyment of the program.

Error correction

To correct audio-video sync problems, the video processing circuitry outputs a DDO (digital delay output) signal, which carries information about the amount of delay the video signal experiences due to the video processing. The DDO may, for example, be provided by equipment which adheres to the SMPTE [3] Audio to Video Synchronization Standard. The audio synchronizer receives the DDO signal and in response delays the audio by an equivalent amount, thereby maintaining proper audio-video sync. Modern audio synchronizers operate by digitizing the audio signal and writing it into a ring memory, most commonly a RAM-based memory with independent read and write ability. At the appropriate delay time (as conveyed by the DDO) after an audio sample (or group of samples) is written into the memory, that stored sample is read back from the ring memory. Storage and reading of the audio samples take place continuously in response to the respective memory write and read addresses, which increment by 1 count for every write or read operation. For example, an audio sample would be written at address 5, a different sample read from (previously written) address 1, another sample written at address 6, yet another read from address 2, write at 7, read from 3, and so on. The delay between writing and reading a particular sample is 4 addresses which, when multiplied by the time it takes to change from one address to the next, gives the total audio delay.
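The ring memory described above can be sketched in a few lines of Python. This is a minimal illustration, not the design of any particular product: the class name `RingDelay`, the helper `delay_seconds`, the memory size, and the 48 kHz sample rate are all assumptions made for the example. The write address leads the read address by a fixed number of addresses, and both advance by one count per sample.

```python
class RingDelay:
    """Fixed-delay ring memory: one write and one read per sample period.

    Hypothetical sketch; names and sizes are illustrative assumptions.
    """

    def __init__(self, size, delay):
        if not 0 < delay < size:
            raise ValueError("delay must be between 1 and size - 1 addresses")
        self.mem = [0.0] * size      # memory starts out holding silence
        self.size = size
        self.write_addr = delay      # write address leads the read address
        self.read_addr = 0

    def process(self, sample):
        """Store one new sample and return the one written `delay` periods ago."""
        self.mem[self.write_addr] = sample
        out = self.mem[self.read_addr]
        # Both addresses increment by 1 count and wrap at the end of memory.
        self.write_addr = (self.write_addr + 1) % self.size
        self.read_addr = (self.read_addr + 1) % self.size
        return out


def delay_seconds(delay_addresses, sample_rate_hz):
    """Total delay = address difference x time per address step."""
    return delay_addresses / sample_rate_hz


ring = RingDelay(size=16, delay=4)
outputs = [ring.process(float(n)) for n in range(1, 11)]
# The first 4 outputs are the initial silence; after that the input
# reappears 4 sample periods late.
```

With a 4-address difference and an assumed 48 kHz sample clock, `delay_seconds(4, 48000)` gives the delay in seconds; real synchronizers need far larger memories to cover delays of several seconds.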

Tracking changes

Unfortunately, video delays frequently make quick and large changes; for example, a jump in delay time from 2 seconds to 6 seconds is possible. To maintain proper audio-video sync, the audio delay must track these video delay changes. Changing the audio delay requires changing the difference between the write address and the read address. This change can be accomplished by causing either the write or read address to jump forward or backward. However, the jump causes some audio samples to be repeated or lost, resulting in an unwanted and annoying pop, click, gap, distortion and/or noise in the audio signal. Some audio synchronizers operate by making repeated, very small jumps that cause unwanted (but less annoying) distortion and noise in the audio signal, rather than pops, gaps, and clicks. Other audio synchronizers change delay by changing the speed at which audio is read from the ring memory. If audio samples are read out of the memory more slowly than they are written, the delay increases. If audio samples are read out faster than they are written, the delay decreases. Variable speed reading prevents pops, clicks, gaps, distortion and noise from being introduced into the audio, but it does create unwanted and annoying pitch errors: reading faster than writing raises the audio pitch, and reading slower than writing lowers it.
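The two approaches can be contrasted with a small simulation. The sketch below is illustrative only: the function names `read_with_jump` and `read_with_variable_speed` are hypothetical, and instead of real audio each function returns the position in the input stream that is heard at each output time, so repeated or skipped samples are easy to see. Negative positions stand for the silence before the input started.

```python
def read_with_jump(n, delay_before, delay_after, jump_at):
    """Input sample index heard at each output time when the delay jumps.

    The read address is moved abruptly at time `jump_at`.
    """
    heard = []
    for t in range(n):
        delay = delay_before if t < jump_at else delay_after
        heard.append(t - delay)
    return heard


def read_with_variable_speed(n, initial_delay, read_speed):
    """Input position heard at each output time with variable speed reading.

    The read position advances by `read_speed` per written sample
    (1.0 = normal speed). Fractional positions would need interpolation
    in a real device; that step is omitted here.
    """
    heard = []
    read_pos = -float(initial_delay)
    for _ in range(n):
        heard.append(read_pos)
        read_pos += read_speed
    return heard


# Delay increased from 2 to 4 addresses at t = 6: samples 2 and 3 repeat.
longer = read_with_jump(10, delay_before=2, delay_after=4, jump_at=6)

# Delay decreased from 4 to 2 addresses at t = 6: samples 2 and 3 are lost.
shorter = read_with_jump(10, delay_before=4, delay_after=2, jump_at=6)

# Reading at half speed: no sample is repeated or lost, the delay grows
# by 0.5 address per sample, and everything plays back at half pitch.
slow = read_with_variable_speed(5, initial_delay=2, read_speed=0.5)
```

The half-speed figure is deliberately extreme so the effect is visible in a handful of samples; it is not a realistic operating point.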

Variable speed reading

Audio synchronizers that use variable speed reading are generally preferred in professional applications, because control of the audio delay is generally more accurate and more easily accomplished. In lower performance devices the pitch errors are left uncompensated and are instead kept to a level not generally perceived by the average viewer by limiting how far the reading speed may change. Typically the change limit is on the order of 0.2%. Unfortunately, this also limits the rate of delay change: when a large video delay change occurs, the slow tracking rate of these uncompensated synchronizers can leave the audio-video sync off for several seconds or minutes until the audio delay catches up with the video delay. Additionally, listeners with excellent pitch perception may notice and be annoyed by even these small pitch errors.
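The trade-off between pitch error and tracking speed follows from simple arithmetic, sketched below. The function names and the 0.5 second example are assumptions chosen for illustration. If the reading speed may deviate from the writing speed by at most a fraction `speed_limit`, the delay can change by at most that fraction of a second per second of audio.

```python
import math


def catch_up_seconds(delay_change_s, speed_limit):
    """Time needed to absorb a delay change at the maximum speed deviation.

    speed_limit is a fraction, e.g. 0.002 for a 0.2% change limit.
    """
    if speed_limit <= 0:
        raise ValueError("speed_limit must be positive")
    return abs(delay_change_s) / speed_limit


def pitch_shift_cents(speed_limit):
    """Uncompensated pitch error, in cents, when reading at 1 + speed_limit."""
    return 1200.0 * math.log2(1.0 + speed_limit)


# A 0.5 s video delay change tracked at a 0.2% limit takes about 250 s.
slow_tracking = catch_up_seconds(0.5, 0.002)

# The pitch error at that limit is a few cents (hundredths of a semitone).
pitch_error = pitch_shift_cents(0.002)
```

The same functions show why a larger delay change is a problem at this limit: the catch-up time scales linearly with the size of the change.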

Pitch correction circuit

In higher performance audio synchronizers, the rate of delay change is allowed to be much faster, generally on the order of 25%, and the resulting pitch error is corrected with a pitch correction circuit. The pitch correction circuitry is frequently a proprietary design, owing to the difficulty of performing the correction so that the errors are imperceptible to critical listeners. These higher performance audio synchronizers allow the audio delay to track even large and quick video delay changes without generating artifacts that are perceptible, even to critical listeners, for most audio program material.

Recent developments

Recent developments in video processing devices permit those devices to sense in advance when a large video delay change will need to be made, and to communicate that change information to the audio synchronizer. This advance notice from the video processing device allows the audio synchronizer to anticipate the change and take advantage of particular audio material (e.g., periods of relative silence or periods without music) to make the corresponding large audio delay change without risking noticeable audio artifacts. Further developments permit handshaking between the video processing device and the audio synchronizer to control when the video delay change is made. This optimizes the timing of the tracking audio delay change, further reducing the risk of noticeable audio artifacts while also reducing the risk of missynchronization due to rapid video delay changes.
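One way a synchronizer might use advance notice is to look through the audio it has already buffered and pick the quietest stretch in which to make the delay change. The article does not say how real devices detect suitable material, so the sketch below is purely an assumption: the function name `quietest_window` is hypothetical, and it simply finds the window with the lowest energy using a sliding sum.

```python
def quietest_window(samples, window):
    """Return (start_index, mean_square) of the lowest-energy window.

    Hypothetical helper: scans `samples` for the `window`-sample stretch
    with the smallest sum of squares, as a stand-in for "relative silence".
    Ties go to the earliest window.
    """
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    energy = sum(s * s for s in samples[:window])
    best_start, best_energy = 0, energy
    for start in range(1, len(samples) - window + 1):
        # Slide the window: add the sample entering, drop the one leaving.
        energy += samples[start + window - 1] ** 2 - samples[start - 1] ** 2
        if energy < best_energy:
            best_start, best_energy = start, energy
    return best_start, best_energy / window


buffered = [5, -4, 3, 0, 0, 1, -6, 7]
start, level = quietest_window(buffered, window=2)
# `start` marks where a read-address change would disturb the least audio.
```

A practical design would also have to weigh how long the change can be postponed before the sync error itself becomes noticeable, which is where the handshaking described above comes in.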

References

  1. Aldo Cugnini (2007-09-01). "Managing lip sync". Broadcast Engineering. Archived from the original on 2011-07-27. Retrieved 2011-07-27.
  2. IS-191: Relative Timing of Sound and Vision for Broadcast Operations, ATSC, 2003-06-26, archived from the original on 2012-03-21
  3. SMPTE ST 2064