Real-time Transport Protocol

Communication protocol
Abbreviation: RTP
Purpose: Delivering audio and video
Developer(s): Audio-Video Transport Working Group of the IETF
Introduction: January 1996
Based on: Network Voice Protocol [1]
RFC(s): RFC 1889, RFC 3550, RFC 3551

The Real-time Transport Protocol (RTP) is a network protocol for delivering audio and video over IP networks. RTP is used in communication and entertainment systems that involve streaming media, such as telephony, video teleconference applications including WebRTC, television services and web-based push-to-talk features.

RTP typically runs over the User Datagram Protocol (UDP) and is used in conjunction with the RTP Control Protocol (RTCP). While RTP carries the media streams (e.g., audio and video), RTCP monitors transmission statistics and quality of service (QoS) and aids synchronization of multiple streams. RTP is one of the technical foundations of Voice over IP and in this context is often used together with a signaling protocol such as the Session Initiation Protocol (SIP), which establishes connections across the network.

RTP was developed by the Audio-Video Transport Working Group of the Internet Engineering Task Force (IETF) and first published in 1996 as RFC 1889, which was superseded by RFC 3550 in 2003. [2]

Overview

Research on audio and video over packet-switched networks dates back to the early 1970s. RFC 741, which specified the Network Voice Protocol, was published in 1977. The Internet Engineering Task Force (IETF) began developing RTP in 1992, [1] and went on to develop the Session Announcement Protocol (SAP), the Session Description Protocol (SDP), and the Session Initiation Protocol (SIP).

RTP is designed for end-to-end, real-time transfer of streaming media. The protocol provides facilities for jitter compensation and detection of packet loss and out-of-order delivery, which are common, especially during UDP transmissions on an IP network. RTP allows data transfer to multiple destinations through IP multicast. [3] RTP is regarded as the primary standard for audio/video transport in IP networks and is used with an associated profile and payload format. [4] The design of RTP is based on the architectural principle known as application-layer framing where protocol functions are implemented in the application as opposed to the operating system's protocol stack.
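As an illustration of the jitter measurement that supports jitter compensation, the following is a minimal sketch of the interarrival jitter estimator described in RFC 3550. The function names are hypothetical, arrival times are assumed to be already converted to RTP timestamp units, and 32-bit timestamp wraparound is ignored for brevity.

```python
def update_jitter(jitter, prev_transit, arrival, rtp_timestamp):
    """Return (new_jitter, transit) after one received packet.

    Both arrival and rtp_timestamp are expressed in RTP timestamp units.
    """
    transit = arrival - rtp_timestamp
    if prev_transit is None:
        # First packet: nothing to compare against yet.
        return jitter, transit
    d = abs(transit - prev_transit)
    # Exponential smoothing with gain 1/16, as in RFC 3550.
    return jitter + (d - jitter) / 16.0, transit


def estimate_jitter(packets):
    """packets is an iterable of (arrival, rtp_timestamp) pairs."""
    jitter, prev_transit = 0.0, None
    for arrival, rtp_timestamp in packets:
        jitter, prev_transit = update_jitter(jitter, prev_transit, arrival, rtp_timestamp)
    return jitter
```

A receiver could use an estimate like this to size its playout buffer; the value is also what RTCP receiver reports carry back to the sender.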

Real-time multimedia streaming applications require timely delivery of information and can often tolerate some packet loss to achieve this goal. For example, loss of a packet in an audio application may result in loss of a fraction of a second of audio data, which can be made unnoticeable with suitable error concealment algorithms. [5] The Transmission Control Protocol (TCP), although standardized for RTP use, [6] is not normally used in RTP applications because TCP favors reliability over timeliness. Instead, most RTP implementations are built on the User Datagram Protocol (UDP). [5] Other transport protocols specifically designed for multimedia sessions are SCTP [7] and DCCP, [8] although, as of 2012, they were not in widespread use. [9]

RTP was developed by the Audio/Video Transport working group of the IETF standards organization. RTP is used in conjunction with other protocols such as H.323 and RTSP. [4] The RTP specification describes two protocols: RTP and RTCP. RTP is used for the transfer of multimedia data, and RTCP is used to periodically send control information and QoS parameters. [10]

The data transfer protocol, RTP, carries real-time data. Information provided by this protocol includes timestamps (for synchronization), sequence numbers (for packet loss and reordering detection) and the payload format which indicates the encoded format of the data. [11] The control protocol, RTCP, is used for quality of service (QoS) feedback and synchronization between the media streams. The bandwidth of RTCP traffic compared to RTP is small, typically around 5%. [11] [12]
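The use of sequence numbers for loss and reordering detection can be sketched as follows. This is a simplified illustration rather than the procedure of any particular implementation: the names `seq_delta` and `classify` are hypothetical, and the only protocol fact relied on is that the RTP sequence number is 16 bits wide and wraps around.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide


def seq_delta(prev, cur):
    """Signed distance from prev to cur modulo 2**16, in [-32768, 32767]."""
    half = SEQ_MOD // 2
    return ((cur - prev + half) % SEQ_MOD) - half


def classify(seqs):
    """Label each arriving sequence number relative to the highest seen so far.

    Returns a list of (sequence_number, label, packets_skipped) tuples.
    """
    events = []
    highest = None
    for s in seqs:
        if highest is None:
            events.append((s, "first", 0))
            highest = s
            continue
        d = seq_delta(highest, s)
        if d == 1:
            events.append((s, "in-order", 0))
            highest = s
        elif d > 1:
            # d - 1 packets are missing (or still in flight).
            events.append((s, "gap", d - 1))
            highest = s
        elif d == 0:
            events.append((s, "duplicate", 0))
        else:
            # Arrived after a packet with a higher sequence number.
            events.append((s, "late", 0))
    return events
```

Working with the signed modular distance rather than a plain subtraction keeps the logic correct across the wrap from 65535 to 0.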

RTP sessions are typically initiated between communicating peers using a signaling protocol, such as H.323, the Session Initiation Protocol (SIP), RTSP, or Jingle (XMPP). These protocols may use the Session Description Protocol to specify the parameters for the sessions. [13]
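As a rough illustration of how a session description can carry RTP parameters, the sketch below extracts the payload type mappings from the `a=rtpmap:` attributes of an SDP body. The function name `parse_rtpmap` and the sample description (addresses, port and codecs) are made up for the example, and a real SDP parser would handle many more cases.

```python
def parse_rtpmap(sdp_text):
    """Return {payload_type: (encoding_name, clock_rate)} from a=rtpmap lines."""
    mapping = {}
    prefix = "a=rtpmap:"
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Form: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<params>]
        pt_str, _, encoding = line[len(prefix):].partition(" ")
        parts = encoding.split("/")
        mapping[int(pt_str)] = (parts[0], int(parts[1]))
    return mapping


# Hypothetical session description used only for illustration.
EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0 96
a=rtpmap:0 PCMU/8000
a=rtpmap:96 opus/48000/2
"""
```

Calling `parse_rtpmap(EXAMPLE_SDP)` yields a dictionary that a receiver could consult when it sees a given payload type in an incoming packet.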

An RTP session is established for each multimedia stream. Audio and video streams may use separate RTP sessions, enabling a receiver to selectively receive components of a particular stream. [14] The RTP and RTCP design is independent of the transport protocol. Applications most typically use UDP with port numbers in the unprivileged range (1024 to 65535). [15] The Stream Control Transmission Protocol (SCTP) may be used when a reliable transport protocol is desired, and the Datagram Congestion Control Protocol (DCCP) when congestion control without reliable delivery is sufficient. The RTP specification recommends even port numbers for RTP, and the use of the next odd port number for the associated RTCP session. [16]:68 A single port can be used for RTP and RTCP in applications that multiplex the protocols. [17]
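The even/odd port convention can be sketched in a few lines. The helper name `rtp_rtcp_ports` is hypothetical, the range check mirrors the unprivileged range mentioned above, and rounding an odd request down to the next lower even port is shown as one way to satisfy the recommendation, not as the only valid policy.

```python
def rtp_rtcp_ports(requested_port):
    """Return (rtp_port, rtcp_port) following the even/odd convention."""
    if not 1024 <= requested_port <= 65535:
        raise ValueError("port outside the unprivileged range 1024-65535")
    # Use the even port for RTP; an odd request is rounded down.
    rtp_port = requested_port - (requested_port % 2)
    # RTCP takes the next higher (odd) port.
    return rtp_port, rtp_port + 1
```

An application that multiplexes RTP and RTCP on a single port would skip this pairing and use the same port for both.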

RTP is used by real-time multimedia applications such as voice over IP, audio over IP, WebRTC and Internet Protocol television.

Profiles and payload formats

RTP is designed to carry a multitude of multimedia formats, which permits the development of new formats without revising the RTP standard. To this end, the information required by a specific application of the protocol is not included in the generic RTP header. For each class of application (e.g., audio, video), RTP defines a profile and associated payload formats. [10] Every instantiation of RTP in a particular application requires a profile and a payload format specification. [16]:71

The profile defines the codecs used to encode the payload data and their mapping to payload format codes in the protocol field Payload Type (PT) of the RTP header. Each profile is accompanied by several payload format specifications, each of which describes the transport of particular encoded data. [4] Examples of audio payload formats are G.711, G.723, G.726, G.729, GSM, QCELP, MP3, and DTMF, and examples of video payloads are H.261, H.263, H.264, H.265 and MPEG-1/MPEG-2. [18] The mapping of MPEG-4 audio/video streams to RTP packets is specified in RFC 3016, and H.263 video payloads are described in RFC 2429. [19]
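To make the mapping from Payload Type codes to codecs concrete, here is a small sketch using a subset of the static assignments from the audio/video profile in RFC 3551. The table is deliberately incomplete, the function name `describe_payload_type` is hypothetical, and the treatment of the dynamic range 96 to 127 assumes the mapping was negotiated out of band (for example through a session description).

```python
# Subset of the static payload type assignments in RFC 3551:
# payload type -> (encoding name, RTP clock rate in Hz)
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", 8000),   # G.711 mu-law
    3: ("GSM", 8000),
    4: ("G723", 8000),
    8: ("PCMA", 8000),   # G.711 A-law
    18: ("G729", 8000),
    31: ("H261", 90000),
    34: ("H263", 90000),
}


def describe_payload_type(pt, dynamic_map=None):
    """Return (encoding_name, clock_rate) for a PT value, or None if unknown."""
    if not 0 <= pt <= 127:
        raise ValueError("the PT field is 7 bits wide")
    if pt in STATIC_PAYLOAD_TYPES:
        return STATIC_PAYLOAD_TYPES[pt]
    if 96 <= pt <= 127 and dynamic_map:
        # Dynamic payload types only have meaning within a negotiated session.
        return dynamic_map.get(pt)
    return None
```

Formats such as H.264 have no static code, so a receiver would pass in a `dynamic_map` like `{96: ("H264", 90000)}` built during session setup.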

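Examples of RTP profiles include the RTP profile for audio and video conferences with minimal control (RFC 3551) and the Secure Real-time Transport Protocol (SRTP, RFC 3711).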

Packet header

RTP packets are created at the application layer and handed to the transport layer for delivery. Each unit of RTP media data created by an application begins with the RTP packet header.

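RTP packet header (bit positions are given within each 32-bit word; see note 1)

Octet offset   Bit offset    Fields
0              0             Version (bits 0-1), P (bit 2), X (bit 3), CC (bits 4-7), M (bit 8), PT (bits 9-15), Sequence number (bits 16-31)
4              32            Timestamp (bits 0-31)
8              64            SSRC identifier (bits 0-31)
12             96            CSRC identifiers (CC entries of 32 bits each)
...
12+4×CC        96+32×CC      Profile-specific extension header ID (bits 0-15), Extension header length (bits 16-31)
16+4×CC        128+32×CC     Extension header
...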

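The RTP header has a minimum size of 12 bytes. After the header, optional header extensions may be present. This is followed by the RTP payload, the format of which is determined by the particular class of application. [22] The fields in the header are, in order: Version, P (padding), X (extension), CC (CSRC count), M (marker), PT (payload type), sequence number, timestamp, SSRC identifier and, if CC is non-zero, the CSRC identifiers.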
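As a rough illustration of the header layout in the table above, the following sketch parses the fixed 12-byte header, the CSRC list and the length of an optional header extension. The class and function names are hypothetical, and the sketch does no validation beyond length checks (for instance, it does not reject a version other than 2).

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: tuple
    header_length: int  # offset of the payload within the packet


def parse_rtp_header(packet):
    """Parse the RTP header at the start of packet (a bytes object)."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte minimum RTP header")
    # Network byte order: two octets, 16-bit sequence, 32-bit timestamp and SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    offset = 12 + 4 * cc
    if len(packet) < offset:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack("!%dI" % cc, packet[12:offset])
    has_extension = bool(b0 & 0x10)
    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated extension header")
        # The length field counts 32-bit words after this 4-byte extension header.
        _profile_id, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words
        if len(packet) < offset:
            raise ValueError("truncated extension data")
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=has_extension,
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
        header_length=offset,
    )
```

The payload then starts at `packet[header.header_length:]`; when the padding bit is set, trailing padding octets would still need to be stripped from it.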

Application design

A functional multimedia application requires other protocols and standards used in conjunction with RTP. Protocols such as SIP, Jingle, RTSP, H.225 and H.245 are used for session initiation, control and termination. Other standards, such as H.264, MPEG and H.263, are used for encoding the payload data as specified by the applicable RTP profile. [26]

An RTP sender captures the multimedia data, then encodes, frames and transmits it as RTP packets with appropriate timestamps and increasing sequence numbers. The sender sets the payload type field in accordance with connection negotiation and the RTP profile in use. The RTP receiver detects missing packets and may reorder packets. It decodes the media data in the packets according to the payload type and presents the stream to its user. [26]
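The sender side of this description can be sketched as a generator that wraps already-encoded frames in minimal RTP headers. The name `packetize` and its parameters are hypothetical, the sketch assumes fixed-size audio frames so that the timestamp advances by a constant number of samples, and it starts both counters at caller-supplied values for reproducibility, whereas RFC 3550 recommends random initial values.

```python
import struct


def packetize(frames, payload_type, ssrc, samples_per_frame, first_seq=0, first_ts=0):
    """Yield RTP packets (12-byte header plus payload) for encoded frames."""
    seq, ts = first_seq, first_ts
    for frame in frames:
        # 0x80: version 2, no padding, no extension, no CSRCs; marker bit clear.
        header = struct.pack("!BBHII", 0x80, payload_type & 0x7F, seq, ts, ssrc)
        yield header + frame
        seq = (seq + 1) & 0xFFFF                       # 16-bit wraparound
        ts = (ts + samples_per_frame) & 0xFFFFFFFF     # 32-bit wraparound
```

For 20 ms of 8 kHz audio per packet, `samples_per_frame` would be 160, so consecutive packets carry timestamps 160 apart while the sequence number increases by one.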

Notes

  1. Bits are ordered most significant to least significant; bit offset 0 is the most significant bit of the first octet. Octets are transmitted in network order. Bit transmission order is medium dependent.
  2. RFC 7273 provides a means for signalling the relationship between media clocks of different streams.

References

  1. Perkins 2003, p. 6.
  2. Wright, Gavin. "What is the Real-time Transport Protocol (RTP)?". TechTarget. Retrieved 2022-11-10.
  3. Hardy, Daniel (2002). Network. De Boeck Université. p. 298.
  4. Perkins 2003, p. 55.
  5. Perkins 2003, p. 46.
  6. RFC 4571.
  7. Farrel, Adrian (2004). The Internet and its protocols. Morgan Kaufmann. p. 363. ISBN 978-1-55860-913-6.
  8. Ozaktas, Haldun M.; Onural, Levent (2007). Three-Dimensional Television. Springer. p. 356. ISBN 978-3-540-72531-2.
  9. Hogg, Scott. "What About Stream Control Transmission Protocol (SCTP)?". Network World. Retrieved 2017-10-04.
  10. Peterson, Larry L. (2007). Computer Networks. Morgan Kaufmann. p. 430. ISBN 978-1-55860-832-0.
  11. Perkins 2003, p. 56.
  12. Peterson & Davie 2007, p. 435.
  13. RFC 4566: SDP: Session Description Protocol, M. Handley, V. Jacobson, C. Perkins, IETF (July 2006).
  14. Zurawski, Richard (2004). "RTP, RTCP and RTSP protocols". The industrial information technology handbook. CRC Press. pp. 28–7. ISBN 978-0-8493-1985-3.
  15. Collins, Daniel (2002). "Transporting Voice by using IP". Carrier grade voice over IP. McGraw-Hill Professional. p. 47. ISBN 978-0-07-136326-6.
  16. RFC 3550.
  17. Multiplexing RTP Data and Control Packets on a Single Port. IETF. April 2010. doi: 10.17487/RFC5761. RFC 5761. Retrieved November 21, 2015.
  18. Perkins 2003, p. 60.
  19. Chou, Philip A.; van der Schaar, Mihaela (2007). Multimedia over IP and wireless networks. Academic Press. p. 514. ISBN 978-0-12-088480-3.
  20. Perkins 2003, p. 367.
  21. Breese, Finley (2010). Serial Communication over RTP/CDP. BoD - Books on Demand. ISBN 978-3-8391-8460-8.
  22. Peterson & Davie 2007, p. 430.
  23. Peterson & Davie 2007, p. 431.
  24. Perkins 2003, p. 59.
  25. Peterson & Davie 2007, p. 432.
  26. Perkins 2003, pp. 11–13.
