Communication protocol | |
---|---|
Abbreviation | RTP |
Purpose | Delivering audio and video |
Developer(s) | Audio-Video Transport Working Group of the IETF |
Introduction | January 1996 |
Based on | Network Voice Protocol [1] |
RFC(s) | RFC 1889, 3550, 3551 |
The Real-time Transport Protocol (RTP) is a network protocol for delivering audio and video over IP networks. RTP is used in communication and entertainment systems that involve streaming media, such as telephony, video teleconference applications including WebRTC, television services and web-based push-to-talk features.
RTP typically runs over User Datagram Protocol (UDP). RTP is used in conjunction with the RTP Control Protocol (RTCP). While RTP carries the media streams (e.g., audio and video), RTCP is used to monitor transmission statistics and quality of service (QoS) and aids synchronization of multiple streams. RTP is one of the technical foundations of Voice over IP and in this context is often used in conjunction with a signaling protocol such as the Session Initiation Protocol (SIP) which establishes connections across the network.
RTP was developed by the Audio-Video Transport Working Group of the Internet Engineering Task Force (IETF) and first published in 1996 as RFC 1889, which was superseded by RFC 3550 in 2003. [2]
Research on audio and video over packet-switched networks dates back to the early 1970s. RFC 741, which specifies the Network Voice Protocol, was published in 1977. The Internet Engineering Task Force (IETF) began developing RTP in 1992, [1] and went on to develop the Session Announcement Protocol (SAP), the Session Description Protocol (SDP), and the Session Initiation Protocol (SIP).
RTP is designed for end-to-end, real-time transfer of streaming media. The protocol provides facilities for jitter compensation and detection of packet loss and out-of-order delivery, which are common, especially during UDP transmissions on an IP network. RTP allows data transfer to multiple destinations through IP multicast. [3] RTP is regarded as the primary standard for audio/video transport in IP networks and is used with an associated profile and payload format. [4] The design of RTP is based on the architectural principle known as application-layer framing where protocol functions are implemented in the application as opposed to the operating system's protocol stack.
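Jitter compensation depends on the receiver estimating how much packet spacing varies. The following is a minimal sketch of the running interarrival jitter estimate described in RFC 3550, assuming arrival times have already been converted to the same clock units as the RTP timestamps. The function names `update_jitter` and `estimate_jitter` are hypothetical and not part of any library.

```python
def update_jitter(jitter, prev_transit, rtp_timestamp, arrival):
    """Fold one packet into the running jitter estimate.

    rtp_timestamp and arrival must be expressed in the same clock units
    (an assumption of this sketch). Returns (new_jitter, transit).
    """
    transit = arrival - rtp_timestamp
    if prev_transit is None:
        # First packet: nothing to compare against yet.
        return jitter, transit
    d = abs(transit - prev_transit)
    # Exponential smoothing with gain 1/16, as in RFC 3550.
    jitter += (d - jitter) / 16.0
    return jitter, transit


def estimate_jitter(packets):
    """packets is an iterable of (rtp_timestamp, arrival) pairs."""
    jitter, prev_transit = 0.0, None
    for rtp_timestamp, arrival in packets:
        jitter, prev_transit = update_jitter(
            jitter, prev_transit, rtp_timestamp, arrival
        )
    return jitter
```

A stream whose packets all experience the same transit delay yields an estimate of zero; the estimate grows only when the delay changes from one packet to the next.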
Real-time multimedia streaming applications require timely delivery of information and often can tolerate some packet loss to achieve this goal. For example, loss of a packet in an audio application may result in loss of a fraction of a second of audio data, which can be made unnoticeable with suitable error concealment algorithms. [5] The Transmission Control Protocol (TCP), although standardized for RTP use, [6] is not normally used in RTP applications because TCP favors reliability over timeliness. Instead, the majority of RTP implementations are built on the User Datagram Protocol (UDP). [5] Other transport protocols specifically designed for multimedia sessions are SCTP [7] and DCCP, [8] although, as of 2012, they were not in widespread use. [9]
RTP was developed by the Audio/Video Transport working group of the IETF standards organization. RTP is used in conjunction with other protocols such as H.323 and RTSP. [4] The RTP specification describes two protocols: RTP and RTCP. RTP is used for the transfer of multimedia data, and RTCP is used to periodically send control information and QoS parameters. [10]
The data transfer protocol, RTP, carries real-time data. Information provided by this protocol includes timestamps (for synchronization), sequence numbers (for packet loss and reordering detection) and the payload format which indicates the encoded format of the data. [11] The control protocol, RTCP, is used for quality of service (QoS) feedback and synchronization between the media streams. The bandwidth of RTCP traffic compared to RTP is small, typically around 5%. [11] [12]
RTP sessions are typically initiated between communicating peers using a signaling protocol, such as H.323, the Session Initiation Protocol (SIP), RTSP, or Jingle (XMPP). These protocols may use the Session Description Protocol to specify the parameters for the sessions. [13]
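As an illustration of how session parameters are conveyed, the sketch below extracts the payload type bindings from the `a=rtpmap:` attribute lines of an SDP description. The helper name `parse_rtpmap` and the sample session text are hypothetical; a real SDP body carries many more fields, and this sketch ignores all of them.

```python
def parse_rtpmap(sdp_text):
    """Map payload type numbers to (encoding name, clock rate in Hz).

    Only a=rtpmap lines are examined; everything else is skipped.
    """
    mapping = {}
    prefix = "a=rtpmap:"
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Form: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<params>]
        pt_text, _, encoding = line[len(prefix):].partition(" ")
        parts = encoding.split("/")
        mapping[int(pt_text)] = (parts[0], int(parts[1]))
    return mapping


# Hypothetical audio session offering G.711 mu-law and DTMF events.
EXAMPLE_SDP = """\
m=audio 49170 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
"""
```

Calling `parse_rtpmap(EXAMPLE_SDP)` gives the receiver the table it needs to interpret the payload type field of incoming packets.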
An RTP session is established for each multimedia stream. Audio and video streams may use separate RTP sessions, enabling a receiver to selectively receive components of a particular stream. [14] The RTP and RTCP design is independent of the transport protocol. Applications most typically use UDP with port numbers in the unprivileged range (1024 to 65535). [15] The Stream Control Transmission Protocol (SCTP) may be used when a reliable transport protocol is desired, and the Datagram Congestion Control Protocol (DCCP) when congestion control without reliable delivery is preferred. The RTP specification recommends even port numbers for RTP and the use of the next odd port number for the associated RTCP session. [16] : 68 A single port can be used for RTP and RTCP in applications that multiplex the protocols. [17]
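The even/odd port convention can be sketched with ordinary UDP sockets. The example below is an assumption-laden illustration rather than a reference implementation: the function name `open_rtp_rtcp_pair`, the loopback address, and the starting port 5004 are choices made for this sketch only.

```python
import socket


def open_rtp_rtcp_pair(host="127.0.0.1", first_port=5004, last_port=65534):
    """Bind an even UDP port for RTP and the next odd port for RTCP.

    Scans upward from first_port until both ports of a pair are free.
    """
    start = first_port + (first_port % 2)  # round up to an even port
    for rtp_port in range(start, last_port, 2):
        rtp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rtcp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            rtp_sock.bind((host, rtp_port))
            rtcp_sock.bind((host, rtp_port + 1))
            return rtp_sock, rtcp_sock
        except OSError:
            # One of the two ports is taken; release both and try the next pair.
            rtp_sock.close()
            rtcp_sock.close()
    raise OSError("no free even/odd UDP port pair found")
```

Applications that multiplex RTP and RTCP on a single port would skip the second socket entirely.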
RTP is used by real-time multimedia applications such as voice over IP, audio over IP, WebRTC, Internet Protocol television, and professional video over IP including SMPTE 2022 and SMPTE 2110.
RTP is designed to carry a multitude of multimedia formats, which permits the development of new formats without revising the RTP standard. To this end, the information required by a specific application of the protocol is not included in the generic RTP header. For each class of application (e.g., audio, video), RTP defines a profile and associated payload formats. [10] Every instantiation of RTP in a particular application requires a profile and payload format specifications. [16] : 71
The profile defines the codecs used to encode the payload data and their mapping to payload format codes in the protocol field Payload Type (PT) of the RTP header. Each profile is accompanied by several payload format specifications, each of which describes the transport of particular encoded data. [4] Examples of audio payload formats are G.711, G.723, G.726, G.729, GSM, QCELP, MP3, and DTMF, and examples of video payloads are H.261, H.263, H.264, H.265 and MPEG-1/MPEG-2. [18] The mapping of MPEG-4 audio/video streams to RTP packets is specified in RFC 3016, and H.263 video payloads are described in RFC 2429. [19]
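The mapping from payload type codes to codecs can be pictured as a lookup table. The sketch below lists a few of the static assignments from the audio/video profile in RFC 3551; it is deliberately incomplete, and the names `STATIC_PAYLOAD_TYPES` and `describe_payload_type` are hypothetical.

```python
# Partial table of static payload types from RFC 3551:
# code -> (encoding name, media type, RTP clock rate in Hz)
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", "audio", 8000),    # G.711 mu-law
    3: ("GSM", "audio", 8000),
    4: ("G723", "audio", 8000),
    8: ("PCMA", "audio", 8000),    # G.711 A-law
    18: ("G729", "audio", 8000),
    31: ("H261", "video", 90000),
    34: ("H263", "video", 90000),
}


def describe_payload_type(pt):
    """Return a human-readable description of a 7-bit payload type code."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type is a 7-bit field (0-127)")
    if pt in STATIC_PAYLOAD_TYPES:
        name, media, clock = STATIC_PAYLOAD_TYPES[pt]
        return f"{name} ({media}, {clock} Hz)"
    if 96 <= pt <= 127:
        # Dynamic range: the meaning is bound by signaling for each session.
        return "dynamic (bound by signaling, e.g. SDP)"
    return "not listed in this sketch"
```

Newer codecs such as H.264 and H.265 have no static code and are carried in the dynamic range, which is why the binding has to be negotiated per session.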
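Examples of RTP profiles include the RTP Profile for Audio and Video Conferences with Minimal Control (RFC 3551) and the Secure Real-time Transport Protocol (SRTP, RFC 3711).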
RTP packets are created at the application layer and handed to the transport layer for delivery. Each unit of RTP media data created by an application begins with the RTP packet header.
Offset (octets) | Offset (bits) | Fields (bit positions within the 32-bit word) |
---|---|---|
0 | 0 | Version (0-1), P (2), X (3), CC (4-7), M (8), PT (9-15), Sequence number (16-31) |
4 | 32 | Timestamp (0-31) |
8 | 64 | SSRC identifier (0-31) |
12 | 96 | CSRC identifiers ... (CC entries of 32 bits each) |
12+4×CC | 96+32×CC | Profile-specific extension header ID (0-15), Extension header length (16-31) |
16+4×CC | 128+32×CC | Extension header ... |
The RTP header has a minimum size of 12 bytes. After the header, optional header extensions may be present. This is followed by the RTP payload, the format of which is determined by the particular class of application. [22] The header fields appear in the order shown in the table above.
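The header layout can be decoded with a few bit masks. The following is a minimal sketch, assuming a complete RTP packet is available as a `bytes` object; the names `RtpHeader` and `parse_rtp_header` are hypothetical, and padding removal and payload decoding are left out.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]
    payload_offset: int  # index of the first payload byte


def parse_rtp_header(packet: bytes) -> RtpHeader:
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed header")
    # Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    offset = 12 + 4 * cc
    if len(packet) < offset:
        raise ValueError("truncated CSRC list")
    csrc = struct.unpack(f"!{cc}I", packet[12:offset])
    has_extension = bool(b0 & 0x10)
    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated extension header")
        # Extension length counts 32-bit words after the 4-byte extension header.
        _ext_id, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words
        if len(packet) < offset:
            raise ValueError("truncated extension data")
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=has_extension,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
        payload_offset=offset,
    )
```

The payload is then `packet[header.payload_offset:]`, to be interpreted according to the payload type.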
A functional multimedia application requires other protocols and standards used in conjunction with RTP. Protocols such as SIP, Jingle, RTSP, H.225 and H.245 are used for session initiation, control and termination. Other standards, such as H.264, MPEG and H.263, are used for encoding the payload data as specified by the applicable RTP profile. [26]
An RTP sender captures the multimedia data, then encodes, frames and transmits it as RTP packets with appropriate timestamps and increasing sequence numbers. The sender sets the payload type field in accordance with connection negotiation and the RTP profile in use. The RTP receiver detects missing packets and may reorder packets. It decodes the media data in the packets according to the payload type and presents the stream to its user. [26]
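The receiver-side bookkeeping can be sketched as follows. This is a simplified illustration that works on a short burst of already-received sequence numbers and handles wraparound of the 16-bit counter; real receivers work incrementally with a jitter buffer. The function names `seq_delta` and `reorder_and_find_gaps` are hypothetical.

```python
def seq_delta(a, b):
    """Signed distance from b to a in 16-bit sequence space (-32768..32767)."""
    return ((a - b + 0x8000) & 0xFFFF) - 0x8000


def reorder_and_find_gaps(sequence_numbers):
    """Return (ordered, missing) for a short burst of sequence numbers.

    Assumes the burst spans far fewer than 32768 packets, so that
    wraparound can be resolved relative to the first packet seen.
    Duplicates are dropped.
    """
    if not sequence_numbers:
        return [], []
    first = sequence_numbers[0]
    present = {seq_delta(s, first) for s in sequence_numbers}
    offsets = sorted(present)
    ordered = [(first + o) & 0xFFFF for o in offsets]
    missing = [
        (first + o) & 0xFFFF
        for o in range(offsets[0], offsets[-1] + 1)
        if o not in present
    ]
    return ordered, missing
```

For example, a burst received as 65534, 0, 65535, 2 is put back into sending order across the wraparound, and sequence number 1 is reported as lost.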
H.263 is a video compression standard originally designed as a low-bit-rate compressed format for videotelephony. It was standardized by the ITU-T Video Coding Experts Group (VCEG) in a project ending in 1995/1996. It is a member of the H.26x family of video coding standards in the domain of the ITU-T.
The Internet Control Message Protocol (ICMP) is a supporting protocol in the Internet protocol suite. It is used by network devices, including routers, to send error messages and operational information indicating success or failure when communicating with another IP address. For example, an error is indicated when a requested service is not available or that a host or router could not be reached. ICMP differs from transport protocols such as TCP and UDP in that it is not typically used to exchange data between systems, nor is it regularly employed by end-user network applications.
The Real-Time Streaming Protocol (RTSP) is an application-level network protocol designed for multiplexing and packetizing multimedia transport streams over a suitable transport protocol. RTSP is used in entertainment and communications systems to control streaming media servers. The protocol is used for establishing and controlling media sessions between endpoints. Clients of media servers issue commands such as play, record and pause, to facilitate real-time control of the media streaming from the server to a client or from a client to the server.
The Session Initiation Protocol (SIP) is a signaling protocol used for initiating, maintaining, and terminating communication sessions that include voice, video and messaging applications. SIP is used in Internet telephony, in private IP telephone systems, as well as mobile phone calling over LTE (VoLTE).
The Session Description Protocol (SDP) is a format for describing multimedia communication sessions for the purposes of announcement and invitation. Its predominant use is in support of streaming media applications, such as voice over IP (VoIP) and video conferencing. SDP does not deliver any media streams itself but is used between endpoints for negotiation of network metrics, media types, and other associated properties. The set of properties and parameters is called a session profile.
The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. It originated in the initial network implementation in which it complemented the Internet Protocol (IP). Therefore, the entire suite is commonly referred to as TCP/IP. TCP provides reliable, ordered, and error-checked delivery of a stream of octets (bytes) between applications running on hosts communicating via an IP network. Major internet applications such as the World Wide Web, email, remote administration, and file transfer rely on TCP, which is part of the Transport layer of the TCP/IP suite. SSL/TLS often runs on top of TCP.
In computer networking, the User Datagram Protocol (UDP) is one of the core communication protocols of the Internet protocol suite used to send messages to other hosts on an Internet Protocol (IP) network. Within an IP network, UDP does not require prior communication to set up communication channels or data paths.
In telecommunications and computer networking, a network packet is a formatted unit of data carried by a packet-switched network. A packet consists of control information and user data; the latter is also known as the payload. Control information provides data for delivering the payload. Typically, control information is found in packet headers and trailers.
Digital storage media command and control (DSM-CC) is a toolkit for developing control channels associated with MPEG-1 and MPEG-2 streams. It is defined in part 6 of the MPEG-2 standard and uses a client/server model connected via an underlying network.
The RTP Control Protocol (RTCP) is a binary-encoded out-of-band signaling protocol that functions alongside the Real-time Transport Protocol (RTP). Its basic functionality and packet structure are defined in RFC 3550. RTCP provides statistics and control information for an RTP session. It partners with RTP in the delivery and packaging of multimedia data but does not transport any media data itself.
MPEG transport stream or simply transport stream (TS) is a standard digital container format for transmission and storage of audio, video, and Program and System Information Protocol (PSIP) data. It is used in broadcast systems such as DVB, ATSC and IPTV.
The Secure Real-time Transport Protocol (SRTP) is a profile for Real-time Transport Protocol (RTP) intended to provide encryption, message authentication and integrity, and replay attack protection to the RTP data in both unicast and multicast applications. It was developed by a small team of Internet Protocol and cryptographic experts from Cisco and Ericsson. It was first published by the IETF in March 2004 as RFC 3711.
Text over IP is a means of providing a real-time text (RTT) service that operates over IP-based networks. It complements Voice over IP (VoIP) and Video over IP.
UDP-Lite is a connectionless protocol that allows a potentially damaged data payload to be delivered to an application rather than being discarded by the receiving station. This is useful as it allows decisions about the integrity of the data to be made in the application layer, where the significance of the bits is understood. UDP-Lite is described in RFC 3828.
Video Share is an IP Multimedia Subsystem (IMS) enabled service for mobile networks that allows users engaged in a circuit-switched voice call to add a unidirectional video streaming session over the packet network during the voice call. Any of the parties on the voice call can initiate a video streaming session. There can be multiple video streaming sessions during a voice call, and each of these streaming sessions can be initiated by any of the parties on the voice call. The video source can either be the camera on the phone or a pre-recorded video clip.
The Real-time Transport Protocol (RTP) specifies a general-purpose data format and network protocol for transmitting digital media streams on Internet Protocol (IP) networks. The details of media encoding, such as signal sampling rate, frame size and timing, are specified in an RTP payload format. The format parameters of the RTP payload are typically communicated between transmission endpoints with the Session Description Protocol (SDP), but other protocols, such as the Extensible Messaging and Presence Protocol (XMPP) may be used.
Ravenna is a technology for real-time transport of audio and other media data over IP networks. Ravenna was introduced on September 10, 2010 at the International Broadcasting Convention in Amsterdam.
RTP-MIDI is a protocol to transport MIDI messages within Real-time Transport Protocol (RTP) packets over Ethernet and WiFi networks. It is completely open and free, and is compatible both with LAN and WAN application fields. Compared to MIDI 1.0, RTP-MIDI includes new features like session management, device synchronization and detection of lost packets, with automatic regeneration of lost data. RTP-MIDI is compatible with real-time applications, and supports sample-accurate synchronization for each MIDI message.
NACK-Oriented Reliable Multicast (NORM) is a transport layer Internet protocol designed to provide reliable transport in multicast groups in data networks. It is formally defined by the Internet Engineering Task Force (IETF) in Request for Comments (RFC) 5740, which was published in November 2009.
AES67 is a technical standard for audio over IP and audio over Ethernet (AoE) interoperability. The standard was developed by the Audio Engineering Society and first published in September 2013. It is a layer 3 protocol suite based on existing standards and is designed to allow interoperability between various IP-based audio networking systems such as RAVENNA, Wheatnet, Livewire, Q-LAN and Dante.