I-Frame Delay

Last updated August 11, 2021

I-Frame Delay (IFD) is a scheduling technique for adaptive streaming of MPEG video. The idea is that the streaming scheduler drops video frames when the transmission buffer is full because of insufficient bandwidth, thereby reducing the transmitted bit rate. The algorithm has the following characteristics:^[1]

the number of frames currently in the buffer (not the number of bytes) indicates buffer fullness,
less important frames (B-frames) are dropped from the buffer before more important frames (I-frames and P-frames),
the transmission of I-frames is delayed when conditions are bad, even if they are out of date with respect to the display time (they can still be used to decode subsequent inter-predicted frames).

I-Frame Delay algorithm

The IFD mechanism is divided into two parts:^[1]

as the stream is parsed and packetized into network packets, it is also analyzed, and the packets are tagged with a priority number reflecting the frame type (I-frame, P-frame or B-frame). Non-video packets are given the highest priority, so audio is never dropped.
during transmission, packets are dropped by the IFD scheduler when the bandwidth is insufficient.
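
The tagging step in the first part can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `PRIORITY`, `Packet` and `packetize`, the packet size `mtu`, and the specific mapping of I-, P- and B-frames to 10, 11 and 12 are all assumptions. Only the values 0 (system), 2 (audio) and "10 and higher" (video) come from the example discussed in this article, with a lower number meaning a more important packet.

```python
from dataclasses import dataclass

# Hypothetical priority numbers (lower number = more important).
# 0, 2 and "10 and higher" follow the article's example; the exact
# per-frame-type video values are an assumption made for illustration.
PRIORITY = {"system": 0, "audio": 2, "I": 10, "P": 11, "B": 12}


@dataclass
class Packet:
    payload: bytes
    priority: int


def packetize(kind: str, data: bytes, mtu: int = 1400) -> list[Packet]:
    """Split one frame (or non-video unit) into packets tagged with its priority."""
    prio = PRIORITY[kind]
    return [Packet(data[i:i + mtu], prio) for i in range(0, len(data), mtu)]


if __name__ == "__main__":
    for pkt in packetize("B", b"x" * 3000):
        print(len(pkt.payload), pkt.priority)
```

Every packet of a frame carries the same priority, so a scheduler can later discard a whole frame by its tag without parsing the video payload again.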

The IFD buffer should be large enough to hold several frames, with a minimum of two: one for the frame currently being sent (referred to below as ScheduledFrame) and one for the frame waiting to be sent (referred to as WaitingFrame). Increasing the IFD buffer size could permit more elaborate prioritization, but it can also increase latency and memory usage.^[1] The figure below depicts an example of the buffer filling. The numbers represent the priority of a packet.

Here the video frame priority numbers are 10 and higher. The packets with priority number 12 belong to the frame scheduled for sending, and the packets with number 11 belong to the waiting frame. In the figure, a packet that belongs to the next frame is about to enter the buffer.

As can be seen, video packets can be interleaved with non-video packets (audio or system packets, with priority numbers 2 and 0 respectively). When a packet belonging to the next frame is about to be written to the IFD buffer and the buffer is full, the IFD scheduler drops a frame based on the priority assigned earlier. When the network bandwidth is so low that P-frames also need to be dropped, the GOP (Group of Pictures) is marked as "disturbed" and the rest of the GOP (which depends on the dropped P-frame) is dropped as well.

If only B-frames are dropped, there should be no distortion in the frame image, because no subsequent frames depend on them.^[1] Dropping frames causes the video playback to freeze temporarily. The duration of the freeze depends on the number of frames dropped, after which playback resumes from the next frame that got through.^[1] For an IFD implementation with a two-frame buffer, the algorithm is shown below.

procedure Enqueue(NextFrame)
  if DisturbedGOP == True then
    if NextFrame is type I then             # New GOP is encountered
      DisturbedGOP = False                  # Reset disturbed GOP flag
    end
  end
  if DisturbedGOP == True then
    Drop NextFrame                          # Discard rest of disturbed GOP
    return
  end
  if WaitingFrame is empty then
    WaitingFrame = NextFrame
  else
    if NextFrame is type I then
      WaitingFrame = NextFrame
    else
      if NextFrame is type B then
        Drop NextFrame
      else
        if WaitingFrame is type I or P then
          Drop NextFrame
          if NextFrame is type P then       # Discarded frame is P-frame
            DisturbedGOP = True             # Set disturbed GOP flag
          end
        else
          WaitingFrame = NextFrame
        end
      end
    end
  end
end
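
The Enqueue procedure can be translated into runnable Python roughly as follows. This is a sketch under stated assumptions rather than a reference implementation: the `Frame` and `IFDBuffer` classes, the `dropped` list used to record discarded frames, and the `take_waiting` helper (which stands in for the sender moving WaitingFrame into ScheduledFrame) are hypothetical names introduced here. The sketch also assumes that a waiting frame overwritten by a newer one counts as dropped.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    kind: str  # "I", "P" or "B"
    seq: int   # position in the stream, used only for illustration


@dataclass
class IFDBuffer:
    waiting: Optional[Frame] = None   # WaitingFrame in the pseudocode
    disturbed_gop: bool = False       # DisturbedGOP flag
    dropped: list = field(default_factory=list)  # assumption: log of discarded frames

    def _replace_waiting(self, nxt: Frame) -> None:
        # Assumption: the overwritten waiting frame is treated as dropped.
        if self.waiting is not None:
            self.dropped.append(self.waiting)
        self.waiting = nxt

    def enqueue(self, nxt: Frame) -> None:
        if self.disturbed_gop and nxt.kind == "I":
            self.disturbed_gop = False        # a new GOP starts, reset the flag
        if self.disturbed_gop:
            self.dropped.append(nxt)          # discard the rest of the disturbed GOP
            return
        if self.waiting is None:
            self.waiting = nxt
        elif nxt.kind == "I":
            self._replace_waiting(nxt)        # an I-frame always takes the slot
        elif nxt.kind == "B":
            self.dropped.append(nxt)          # B-frames are dropped first
        elif self.waiting.kind in ("I", "P"):
            self.dropped.append(nxt)          # nxt is a P-frame here
            self.disturbed_gop = True         # later frames depend on it
        else:
            self._replace_waiting(nxt)        # a P-frame displaces a waiting B-frame

    def take_waiting(self) -> Optional[Frame]:
        """Hypothetical helper: the sender claims the waiting frame."""
        frame, self.waiting = self.waiting, None
        return frame


if __name__ == "__main__":
    buf = IFDBuffer()
    # Simulate a stalled link: frames arrive but nothing is sent.
    for seq, kind in enumerate("IBPBPI"):
        buf.enqueue(Frame(kind, seq))
    print([f"{f.kind}{f.seq}" for f in buf.dropped], buf.waiting)
```

In this stalled-link run the first I-frame waits, the B-frame is dropped, the dropped P-frame marks the GOP as disturbed so the remaining B- and P-frames are discarded, and the next I-frame resets the flag and takes over the waiting slot.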

References

1. Marek Burza, Jeffrey Kang, Peter van der Stok; Adaptive Streaming of MPEG-based Audio/Video Content over Wireless Networks; Journal of Multimedia vol. 2, no. 2, April 2007; ISSN 1796-2048