Professional video over IP

Last updated November 09, 2024

Professional video over IP systems use an existing standard video codec to reduce the program material to a bitstream (e.g., an MPEG transport stream), and then carry that bitstream over an Internet Protocol (IP) network, encapsulated in a stream of IP packets. This is typically accomplished using some variant of the Real-time Transport Protocol (RTP).
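As an illustration of the encapsulation step, the following Python sketch wraps groups of MPEG transport stream packets in a fixed 12-byte RTP header. It is a minimal sketch, not a complete sender: the function names, the default SSRC value and the placeholder timestamp increment are assumptions made for this example, and grouping seven 188-byte packets per datagram is a common convention rather than a requirement.

```python
import struct

TS_PACKET_SIZE = 188        # bytes in one MPEG transport stream packet
TS_PACKETS_PER_RTP = 7      # 7 * 188 = 1316 bytes, fits a 1500-byte Ethernet MTU
RTP_PAYLOAD_TYPE_MP2T = 33  # static RTP payload type for MPEG-2 transport streams
RTP_VERSION = 2


def build_rtp_packet(ts_packets: bytes, sequence: int, timestamp: int, ssrc: int) -> bytes:
    """Prefix a group of whole TS packets with a 12-byte fixed RTP header."""
    if len(ts_packets) == 0 or len(ts_packets) % TS_PACKET_SIZE != 0:
        raise ValueError("payload must be a whole number of 188-byte TS packets")
    first_byte = RTP_VERSION << 6        # no padding, no extension, no CSRC entries
    second_byte = RTP_PAYLOAD_TYPE_MP2T  # marker bit cleared
    header = struct.pack(
        "!BBHII",                        # network byte order, 12 bytes in total
        first_byte,
        second_byte,
        sequence & 0xFFFF,               # 16-bit sequence number wraps around
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + ts_packets


def packetize(ts_stream: bytes, ssrc: int = 0x12345678):
    """Yield RTP packets carrying up to seven TS packets each (hypothetical helper)."""
    chunk = TS_PACKET_SIZE * TS_PACKETS_PER_RTP
    for seq, offset in enumerate(range(0, len(ts_stream), chunk)):
        # Placeholder timestamp: a real sender would derive it from a 90 kHz clock.
        yield build_rtp_packet(ts_stream[offset:offset + chunk], seq, seq * 3000, ssrc)
```

Each resulting packet would then be sent as the payload of a UDP datagram. The sequence number in the header is what lets a receiver detect lost or reordered packets.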

Carrying professional video over IP networks poses special challenges compared to most non-time-critical IP traffic. Many of these problems are similar to those encountered in voice over IP, but the engineering requirements are more stringent. In particular, professional broadcast environments impose very strict quality of service requirements.

Packet loss

Even well-engineered IP networks tend to have a small residual packet loss rate, caused by low-probability statistical congestion events and by amplification of bit errors in the underlying hardware. Most professional solutions therefore use some kind of forward error correction so that the encoded video stream can be reconstructed even if a few packets are lost. This is usually applied at the packet level, since the encapsulated video bitstream is typically designed to tolerate only low levels of bit or burst errors, not the loss of whole packets. Resending packets is not an option because of the sequential nature of the underlying video signal: for live video, a re-sent packet would arrive well after the next frame of video.
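Packet-level forward error correction can be illustrated with a simple one-dimensional XOR parity scheme: one extra parity packet is sent per group, and any single lost packet in that group can be rebuilt from the survivors. The sketch below is a simplified illustration under stated assumptions (equal-length packets, loss positions already known from sequence numbers, hypothetical function names); it does not reproduce the wire format of any particular standard.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet as the XOR of every packet in the group."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing packet (marked None) using the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("one parity packet can repair only one loss per group")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of the parity with all surviving packets equals the lost packet.
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired
```

For example, with a group of four packets and one parity packet, losing any one of the four still allows the receiver to reconstruct the whole group, at the cost of 25% extra bandwidth and a delay of one group while the parity packet is awaited.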

Network delay variation

Network delay variation can be kept to a minimum by using a high-speed network backbone and ensuring that video traffic does not encounter excessive queuing delays. This is typically done either by keeping the network well below its full capacity or by prioritizing video traffic using traffic engineering techniques (see below).

The remaining delay variation can be removed by buffering, at the expense of added time delay. If forward error correction is used, a small proportion of packets arriving after the deadline can be tolerated, as they can be discarded on receipt and treated in the same way as lost packets. Added delay of more than 250 ms is particularly problematic with pan-tilt-zoom (PTZ) cameras, as it makes operator control difficult.
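The trade-off between buffer depth and late packets can be sketched as a fixed-delay playout schedule. In the Python example below, every packet is played out a constant delay after it was sent, and a packet that arrives after its deadline is reported as late so that it can be handled like a lost packet. The `Arrival` record, the function name and the assumption that sender and receiver timestamps share a common timebase are all made for the purposes of this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Arrival:
    sequence: int           # packet sequence number
    send_time_ms: float     # when the sender emitted the packet
    arrival_time_ms: float  # when the receiver got it (same timebase assumed)


def schedule_playout(
    arrivals: List[Arrival], buffer_delay_ms: float
) -> Tuple[List[Tuple[int, float]], List[int]]:
    """Split packets into on-time (sequence, playout time) pairs and late sequences."""
    on_time: List[Tuple[int, float]] = []
    late: List[int] = []
    for a in arrivals:
        deadline = a.send_time_ms + buffer_delay_ms  # fixed end-to-end delay
        if a.arrival_time_ms <= deadline:
            on_time.append((a.sequence, deadline))
        else:
            late.append(a.sequence)                  # discard, treat as lost
    on_time.sort()
    late.sort()
    return on_time, late
```

A larger `buffer_delay_ms` reduces the number of late packets but adds the same amount to the end-to-end delay, which is the cost described above.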

Timing reconstruction

The other problem presented by latency variation is that it makes recovery of the underlying timing of the video signal, and hence synchronization, far more difficult. This is typically solved by genlocking both ends of the system to external station sync signals, usually derived from sources such as GPS or atomic clocks, so that only coarse timing information needs to be extracted at the receiving end to achieve high-quality video synchronization. This coarse timing data is usually extracted using a phase-locked loop with a long time constant.
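The effect of a long time constant can be illustrated with a first-order tracking loop, the simplest software analogue of a phase-locked loop. Each update compares a jittery offset measurement with the current estimate and corrects the estimate by only a small fraction of the error, so that network jitter averages out. The class name, parameters and update rule below are assumptions chosen for this sketch; a production design would typically use a higher-order loop that also tracks frequency.

```python
class OffsetTracker:
    """First-order loop that smooths noisy timing-offset measurements."""

    def __init__(self, time_constant_s: float, update_interval_s: float):
        if not 0 < update_interval_s <= time_constant_s:
            raise ValueError("update interval must be positive and at most the time constant")
        # A small gain corresponds to a long time constant: slow but smooth tracking.
        self.gain = update_interval_s / time_constant_s
        self.estimate = None

    def update(self, measured_offset_s: float) -> float:
        """Feed one measurement and return the smoothed offset estimate."""
        if self.estimate is None:
            self.estimate = measured_offset_s     # initialise on the first sample
        else:
            error = measured_offset_s - self.estimate
            self.estimate += self.gain * error    # correct by a fraction of the error
        return self.estimate
```

With a 10 s time constant and updates every 10 ms, each measurement moves the estimate by only 0.1% of the observed error, so millisecond-scale jitter on individual packets has little effect on the recovered timing.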

Adequate bandwidth

Even with packet loss mitigation, video over IP will only work if the network can carry the content with some reasonable maximum packet loss rate. In practice, this means that video over IP will not work on overloaded networks. Since IP does not of itself offer any traffic guarantees, they must be provided at the network engineering level. One approach is the "quality of service" approach, which simply allocates enough bandwidth to video-carrying traffic that it will not congest under any possible load pattern. Other approaches include dynamic reduction in frame rate or resolution, Network Admission Control, bandwidth reservation, traffic shaping, and traffic prioritization. These require more complex network engineering, but work when the simple approach of building a non-blocking network is not possible. See RSVP for one approach to IP network traffic engineering.
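The idea behind admission control and bandwidth reservation can be sketched for a single link: a new stream is accepted only if the sum of all reserved rates stays within a budget below the link capacity. The class below is a hypothetical illustration, and the 80% utilization ceiling is an assumed figure for the example, not a recommended value.

```python
class LinkAdmission:
    """Tracks bandwidth reservations on one link and rejects oversubscription."""

    def __init__(self, capacity_mbps: float, max_utilization: float = 0.8):
        if capacity_mbps <= 0 or not 0 < max_utilization <= 1:
            raise ValueError("capacity must be positive and utilization in (0, 1]")
        # Headroom below full capacity keeps queuing delay and loss low.
        self.budget_mbps = capacity_mbps * max_utilization
        self.reserved = {}

    def request(self, stream_id: str, rate_mbps: float) -> bool:
        """Reserve bandwidth for a stream; return False if it would not fit."""
        if stream_id in self.reserved:
            raise ValueError("stream already has a reservation")
        if sum(self.reserved.values()) + rate_mbps > self.budget_mbps:
            return False
        self.reserved[stream_id] = rate_mbps
        return True

    def release(self, stream_id: str) -> None:
        """Free the reservation when a stream ends."""
        self.reserved.pop(stream_id, None)
```

On a 1000 Mbit/s link with this ceiling, two 270 Mbit/s streams would be admitted and a third refused until one of the first two is released, so the streams already admitted are never pushed into congestion.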

The Pro-MPEG Wide Area Network group has done much work on creating a draft standard for interoperable professional video over IP.

Use in the security industry

Within the security products industry, IP-based closed-circuit television (CCTV) has made gains over the analog market. The key components of IP-based CCTV remain consistent with analog technologies: image capture, using IP-based cameras, analog cameras with IP-based encoders, or a combination of the two; image transmission; storage and retrieval, which uses technologies such as RAID arrays and iSCSI for recorded and indexed video; and video management, which affords web browser-enabled management and control of IP-based CCTV systems.

One key advantage of IP-based CCTV is the ability to use network infrastructure, with its available bandwidth, switching and routing, rather than dedicated coaxial cabling. However, running bandwidth-intensive surveillance video over corporate data networks may worsen network performance.

A number of companies produce video management software to help manage the capture and storage of video content. Digital video also makes video content analysis possible, which allows automatic detection and identification of various kinds of objects or motion.

Another emerging model is off-site storage of surveillance video. Online surveillance providers use cloud computing technologies to consolidate multi-site surveillance video over the web.

Manufacturers of CCTV equipment have been integrating IP network technology into their product ranges.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.