RTP payload formats

The Real-time Transport Protocol (RTP) specifies a general-purpose data format and network protocol for transmitting digital media streams on Internet Protocol (IP) networks. The details of media encoding, such as signal sampling rate, frame size and timing, are specified in an RTP payload format. The format parameters of the RTP payload are typically communicated between transmission endpoints with the Session Description Protocol (SDP), but other protocols, such as the Extensible Messaging and Presence Protocol (XMPP), may be used.
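As a rough illustration of how SDP carries these parameters, the sketch below parses an `a=rtpmap` attribute, which maps a payload type number to an encoding name, a clock rate and, for audio, an optional channel count. The names `RtpMap` and `parse_rtpmap` and the sample lines are assumptions made for this example; they do not come from any particular library.

```python
import re
from typing import NamedTuple, Optional


class RtpMap(NamedTuple):
    payload_type: int
    encoding_name: str
    clock_rate: int
    channels: Optional[int]  # None when the SDP line omits the parameter


# General shape of the attribute:
#   a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding parameters>]
_RTPMAP_RE = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)(?:/(\d+))?$")


def parse_rtpmap(line: str) -> RtpMap:
    """Parse one a=rtpmap line (hypothetical helper for illustration)."""
    match = _RTPMAP_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an rtpmap attribute: {line!r}")
    pt, name, rate, channels = match.groups()
    return RtpMap(int(pt), name, int(rate), int(channels) if channels else None)


if __name__ == "__main__":
    print(parse_rtpmap("a=rtpmap:0 PCMU/8000"))
    print(parse_rtpmap("a=rtpmap:96 opus/48000/2"))
```

For video formats the trailing encoding parameter is normally absent, so `channels` stays `None` in this sketch.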

Audio and video payload types

RFC 3551, entitled RTP Profile for Audio and Video Conferences with Minimal Control (the RTP/AVP profile), specifies the technical parameters of payload formats for audio and video streams.

The standard also describes the process of registering new payload types with IANA; additional payload formats and payload types are defined in other specifications, which are cited in the References column of the table below.

Payload identifiers 96–127 are used for payloads defined dynamically during a session. It is recommended to dynamically assign port numbers, although port numbers 5004 and 5005 have been registered for use of the profile when a dynamically assigned port is not required.
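The payload type ranges can be sketched in code. The example below is a minimal illustration, not a normative implementation: the function names are hypothetical, and the 72–76 and 77–95 ranges are taken from the table in this section. It also shows why 72–76 are reserved: the second byte of an RTP header holds the marker bit followed by the 7-bit payload type, so a payload type of 72 with the marker bit set yields the same byte value as RTCP packet type 200.

```python
def classify_payload_type(pt: int) -> str:
    """Return a coarse category for an RTP payload type number."""
    if not 0 <= pt <= 127:
        raise ValueError("the payload type is a 7-bit field (0-127)")
    if 96 <= pt <= 127:
        return "dynamic"
    if 77 <= pt <= 95:
        return "unassigned"
    if 72 <= pt <= 76:
        return "reserved (RTCP conflict)"
    # 0-71: statically assigned, reserved, or unassigned; consult the table.
    return "static range"


def second_header_byte(pt: int, marker: bool) -> int:
    """Second byte of an RTP header: marker bit (MSB) plus payload type."""
    if not 0 <= pt <= 127:
        raise ValueError("the payload type is a 7-bit field (0-127)")
    return (int(marker) << 7) | pt


if __name__ == "__main__":
    for pt in (0, 72, 80, 96):
        print(pt, classify_payload_type(pt))
    # With the marker bit set, payload type 72 collides with RTCP type 200.
    print(second_header_byte(72, marker=True))
```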

Applications should always support PCMU (payload type 0); previously, DVI4 (payload type 5) was also recommended, but this was removed in 2013 by RFC 7007.

| Payload type (PT) | Name | Type | No. of channels | Clock rate (Hz) [note 1] | Frame size (ms) | Default packet interval (ms) | Description | References |
|---|---|---|---|---|---|---|---|---|
| 0 | PCMU | audio | 1 | 8000 | any | 20 | ITU-T G.711 PCM μ-Law audio 64 kbit/s | RFC 3551 |
| 1 | reserved (previously FS-1016 CELP) | audio | 1 | 8000 | | | reserved, previously FS-1016 CELP audio 4.8 kbit/s | RFC 3551, previously RFC 1890 |
| 2 | reserved (previously G721 or G726-32) | audio | 1 | 8000 | | | reserved, previously ITU-T G.721 ADPCM audio 32 kbit/s or ITU-T G.726 audio 32 kbit/s | RFC 3551, previously RFC 1890 |
| 3 | GSM | audio | 1 | 8000 | 20 | 20 | European GSM Full Rate audio 13 kbit/s (GSM 06.10) | RFC 3551 |
| 4 | G723 | audio | 1 | 8000 | 30 | 30 | ITU-T G.723.1 audio | RFC 3551 |
| 5 | DVI4 | audio | 1 | 8000 | any | 20 | IMA ADPCM audio 32 kbit/s | RFC 3551 |
| 6 | DVI4 | audio | 1 | 16000 | any | 20 | IMA ADPCM audio 64 kbit/s | RFC 3551 |
| 7 | LPC | audio | 1 | 8000 | any | 20 | Experimental Linear Predictive Coding audio 5.6 kbit/s | RFC 3551 |
| 8 | PCMA | audio | 1 | 8000 | any | 20 | ITU-T G.711 PCM A-Law audio 64 kbit/s | RFC 3551 |
| 9 | G722 | audio | 1 | 8000 [note 2] | any | 20 | ITU-T G.722 audio 64 kbit/s | RFC 3551, Page 14 |
| 10 | L16 | audio | 2 | 44100 | any | 20 | Linear PCM 16-bit Stereo audio 1411.2 kbit/s, uncompressed [2] [3] [4] | RFC 3551, Page 27 |
| 11 | L16 | audio | 1 | 44100 | any | 20 | Linear PCM 16-bit audio 705.6 kbit/s, uncompressed | RFC 3551, Page 27 |
| 12 | QCELP | audio | 1 | 8000 | 20 | 20 | Qualcomm Code Excited Linear Prediction | RFC 2658, RFC 3551 |
| 13 | CN | audio | 1 | 8000 | | | Comfort noise. Payload type used with audio codecs that do not support comfort noise as part of the codec itself such as G.711, G.722.1, G.722, G.726, G.727, G.728, GSM 06.10, Siren, and RTAudio. | RFC 3389 |
| 14 | MPA | audio | 1, 2 | 90000 | 8–72 | | MPEG-1 or MPEG-2 audio only | RFC 3551, RFC 2250 |
| 15 | G728 | audio | 1 | 8000 | 2.5 | 20 | ITU-T G.728 audio 16 kbit/s | RFC 3551 |
| 16 | DVI4 | audio | 1 | 11025 | any | 20 | IMA ADPCM audio 44.1 kbit/s | RFC 3551 |
| 17 | DVI4 | audio | 1 | 22050 | any | 20 | IMA ADPCM audio 88.2 kbit/s | RFC 3551 |
| 18 | G729 | audio | 1 | 8000 | 10 | 20 | ITU-T G.729 and G.729a audio 8 kbit/s; Annex B is implied unless the annexb=no parameter is used | RFC 3551, Page 20, RFC 3555, Page 15 |
| 19 | reserved (previously CN) | audio | | | | | reserved, previously comfort noise | RFC 3551 |
| 25 | CELLB | video | | 90000 | | | Sun CellB video [5] | RFC 2029 |
| 26 | JPEG | video | | 90000 | | | JPEG video | RFC 2435 |
| 28 | nv | video | | 90000 | | | Xerox PARC's Network Video (nv) [6] [7] | RFC 3551, Page 32 |
| 31 | H261 | video | | 90000 | | | ITU-T H.261 video | RFC 4587 |
| 32 | MPV | video | | 90000 | | | MPEG-1 and MPEG-2 video | RFC 2250 |
| 33 | MP2T | audio/video | | 90000 | | | MPEG-2 transport stream | RFC 2250 |
| 34 | H263 | video | | 90000 | | | H.263 video, first version (1996) | RFC 3551, RFC 2190 |
| 72–76 | reserved | | | | | | reserved because RTCP packet types 200–204 would otherwise be indistinguishable from RTP payload types 72–76 with the marker bit set | RFC 3550, RFC 3551 |
| 77–95 | unassigned | | | | | | note that RTCP packet type 207 (XR, Extended Reports) would be indistinguishable from RTP payload type 79 with the marker bit set | RFC 3551, RFC 3611 |
| dynamic | H263-1998 | video | | 90000 | | | H.263 video, second version (1998) | RFC 3551, RFC 4629, RFC 2190 |
| dynamic | H263-2000 | video | | 90000 | | | H.263 video, third version (2000) | RFC 4629 |
| dynamic (or profile) | H264 AVC | video | | 90000 | | | H.264 video (MPEG-4 Part 10) | RFC 6184, previously RFC 3984 |
| dynamic (or profile) | H264 SVC | video | | 90000 | | | H.264 video | RFC 6190 |
| dynamic (or profile) | H265 | video | | 90000 | | | H.265 video (HEVC) | RFC 7798 |
| dynamic (or profile) | theora | video | | 90000 | | | Theora video | draft-barbato-avt-rtp-theora |
| dynamic | iLBC | audio | 1 | 8000 | 20, 30 | 20, 30 | Internet low Bitrate Codec 13.33 or 15.2 kbit/s | RFC 3952 |
| dynamic | PCMA-WB | audio | 1 | 16000 | 5 | | ITU-T G.711.1 A-law | RFC 5391 |
| dynamic | PCMU-WB | audio | 1 | 16000 | 5 | | ITU-T G.711.1 μ-law | RFC 5391 |
| dynamic | G718 | audio | | 32000 (placeholder) | 20 | | ITU-T G.718 | draft-ietf-payload-rtp-g718 |
| dynamic | G719 | audio | (various) | 48000 | 20 | | ITU-T G.719 | RFC 5404 |
| dynamic | G7221 | audio | | 16000, 32000 | 20 | | ITU-T G.722.1 and G.722.1 Annex C | RFC 5577 |
| dynamic | G726-16 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 16 kbit/s | RFC 3551 |
| dynamic | G726-24 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 24 kbit/s | RFC 3551 |
| dynamic | G726-32 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 32 kbit/s | RFC 3551 |
| dynamic | G726-40 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 40 kbit/s | RFC 3551 |
| dynamic | G729D | audio | 1 | 8000 | 10 | 20 | ITU-T G.729 Annex D | RFC 3551 |
| dynamic | G729E | audio | 1 | 8000 | 10 | 20 | ITU-T G.729 Annex E | RFC 3551 |
| dynamic | G7291 | audio | | 16000 | 20 | | ITU-T G.729.1 | RFC 4749 |
| dynamic | GSM-EFR | audio | 1 | 8000 | 20 | 20 | ITU-T GSM-EFR (GSM 06.60) | RFC 3551 |
| dynamic | GSM-HR-08 | audio | 1 | 8000 | 20 | | ITU-T GSM-HR (GSM 06.20) | RFC 5993 |
| dynamic (or profile) | AMR | audio | (various) | 8000 | 20 | | Adaptive Multi-Rate audio | RFC 4867 |
| dynamic (or profile) | AMR-WB | audio | (various) | 16000 | 20 | | Adaptive Multi-Rate Wideband audio (ITU-T G.722.2) | RFC 4867 |
| dynamic (or profile) | AMR-WB+ | audio | 1, 2 or omit | 72000 | 13.3–40 | | Extended Adaptive Multi Rate – WideBand audio | RFC 4352 |
| dynamic (or profile) | vorbis | audio | (various) | (various) | | | Vorbis audio | RFC 5215 |
| dynamic (or profile) | opus | audio | 1, 2 | 48000 [note 3] | 2.5–60 | 20 | Opus audio | RFC 7587 |
| dynamic (or profile) | speex | audio | 1 | 8000, 16000, 32000 | 20 | | Speex audio | RFC 5574 |
| dynamic | mpa-robust | audio | 1, 2 | 90000 | 24–72 | | Loss-Tolerant MP3 audio | RFC 5219 (previously RFC 3119) |
| dynamic (or profile) | MP4A-LATM | audio | | 90000 or others | | | MPEG-4 Audio (includes AAC) | RFC 6416 (previously RFC 3016) |
| dynamic (or profile) | MP4V-ES | video | | 90000 or others | | | MPEG-4 Visual | RFC 6416 (previously RFC 3016) |
| dynamic (or profile) | mpeg4-generic | audio/video | | 90000 or other | | | MPEG-4 Elementary Streams | RFC 3640 |
| dynamic | VP8 | video | | 90000 | | | VP8 video | RFC 7741 |
| dynamic | VP9 | video | | 90000 | | | VP9 video | draft-ietf-payload-vp9 |
| dynamic | L8 | audio | (various) | (various) | any | 20 | Linear PCM 8-bit audio with 128 offset | RFC 3551 Section 4.5.10 and Table 5 |
| dynamic | DAT12 | audio | (various) | (various) | any | 20 (by analogy with L16) | IEC 61119 12-bit nonlinear audio | RFC 3190 Section 3 |
| dynamic | L16 | audio | (various) | (various) | any | 20 | Linear PCM 16-bit audio | RFC 3551 Section 4.5.11, RFC 2586 |
| dynamic | L20 | audio | (various) | (various) | any | 20 (by analogy with L16) | Linear PCM 20-bit audio | RFC 3190 Section 4 |
| dynamic | L24 | audio | (various) | (various) | any | 20 (by analogy with L16) | Linear PCM 24-bit audio | RFC 3190 Section 4 |
| dynamic | raw | video | | 90000 | | | Uncompressed Video | RFC 4175 |
| dynamic | ac3 | audio | (various) | 32000, 44100, 48000 | | | Dolby AC-3 audio | RFC 4184 |
| dynamic | eac3 | audio | (various) | 32000, 44100, 48000 | | | Enhanced AC-3 audio | RFC 4598 |
| dynamic | t140 | text | | 1000 | | | Text over IP | RFC 4103 |
| dynamic | EVRC, EVRC0, EVRC1 | audio | | 8000 | | | EVRC audio | RFC 4788 |
| dynamic | EVRCB, EVRCB0, EVRCB1 | audio | | 8000 | | | EVRC-B audio | RFC 4788 |
| dynamic | EVRCWB, EVRCWB0, EVRCWB1 | audio | | 16000 | | | EVRC-WB audio | RFC 5188 |
| dynamic | jpeg2000 | video | | 90000 | | | JPEG 2000 video | RFC 5371 |
| dynamic | UEMCLIP | audio | | 8000, 16000 | | | UEMCLIP audio | RFC 5686 |
| dynamic | ATRAC3 | audio | | 44100 | | | ATRAC3 audio | RFC 5584 |
| dynamic | ATRAC-X | audio | | 44100, 48000 | | | ATRAC3+ audio | RFC 5584 |
| dynamic | ATRAC-ADVANCED-LOSSLESS | audio | | (various) | | | ATRAC Advanced Lossless audio | RFC 5584 |
| dynamic | DV | video | | 90000 | | | DV video | RFC 6469 (previously RFC 3189) |
| dynamic | BT656 | video | | | | | ITU-R BT.656 video | RFC 3555 |
| dynamic | BMPEG | video | | | | | Bundled MPEG-2 video | RFC 2343 |
| dynamic | SMPTE292M | video | | | | | SMPTE 292M video | RFC 3497 |
| dynamic | RED | audio | | | | | Redundant Audio Data | RFC 2198 |
| dynamic | VDVI | audio | | | | | Variable-rate DVI4 audio | RFC 3551 |
| dynamic | MP1S | video | | | | | MPEG-1 Systems Streams video | RFC 2250 |
| dynamic | MP2P | video | | | | | MPEG-2 Program Streams video | RFC 2250 |
| dynamic | tone | audio | | 8000 (default) | | | tone | RFC 4733 |
| dynamic | telephone-event | audio | | 8000 (default) | | | DTMF tone | RFC 4733 |
| dynamic | aptx | audio | 2–6 | (equal to sampling rate) | 4000 ÷ sample rate | 4 [note 4] | aptX audio | RFC 7310 |
| dynamic | jxsv | video | | 90000 | | | JPEG XS video | RFC 9134 |

  1. The "clock rate" is the rate at which the timestamp in the RTP header is incremented, which need not be the same as the codec's sampling rate. For instance, video codecs typically use a clock rate of 90000 so their frames can be more precisely aligned with the RTCP NTP timestamp, even though video sampling rates are typically in the range of 1–60 samples per second.
  2. Although the sampling rate for G.722 is 16000, its clock rate is 8000 to remain backwards compatible with RFC 1890, which incorrectly used this value. [1]
  3. Because Opus can change sampling rates dynamically, its clock rate is fixed at 48000, even when the codec will be operated at a lower sampling rate. The maxplaybackrate and sprop-maxcapturerate parameters in SDP can be used to indicate hints/preferences about the maximum sampling rate to encode/decode.
  4. For aptX, the packetization interval must be rounded down to the nearest packet interval that can contain an integer number of samples. So at sampling rates of 11025, 22050, or 44100, a packetization rate of "4" is rounded down to 3.99.
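
The arithmetic behind notes 1 and 4 can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and the rounding in `aptx_packet_interval_ms` is one reading of note 4 (take the largest whole number of samples that fits in the nominal interval), not text taken from RFC 7310.

```python
import math


def timestamp_increment(clock_rate_hz: int, packet_interval_ms: float) -> int:
    """RTP timestamp ticks that elapse during one packet interval."""
    return round(clock_rate_hz * packet_interval_ms / 1000)


def aptx_packet_interval_ms(sample_rate_hz: int, nominal_ms: float = 4.0) -> float:
    """Round the nominal interval down to a whole number of samples."""
    samples = math.floor(sample_rate_hz * nominal_ms / 1000)
    return samples * 1000 / sample_rate_hz


if __name__ == "__main__":
    # PCMU: 8000 Hz clock, 20 ms packets.
    print(timestamp_increment(8000, 20))
    # Video at 30 frames per second on the 90000 Hz clock.
    print(timestamp_increment(90000, 1000 / 30))
    # aptX at 44100 Hz: 176.4 samples do not fit, so 176 samples are used.
    print(round(aptx_packet_interval_ms(44100), 2))
```

Because G.722 uses a clock rate of 8000 (note 2), a 20 ms G.722 packet advances the timestamp by the same 160 ticks as a PCMU packet, even though it carries 320 audio samples.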

Text messaging payload

MIDI payload

See also

References

  1. RFC 3551, RTP Profile for Audio and Video Conferences with Minimal Control, H. Schulzrinne, S. Casner, The Internet Society (July 2003).
  2. "RFC 2586 - The Audio/L16 MIME content type". May 1999. Retrieved 2010-03-16.
  3. "RFC 3108 - Conventions for the use of the Session Description Protocol (SDP) for ATM Bearer Connections". May 2001. Retrieved 2010-03-16.
  4. "RFC 4856 - Media Type Registration of Payload Formats in the RTP Profile for Audio and Video Conferences - Registration of Media Type audio/L16". March 2007. Retrieved 2010-03-16.
  5. XIL Programmer's Guide, Chapter 22 "CellB Codec". August 1997. Retrieved on 2014-07-19.
  6. nv - network video on Henning Schulzrinne's website, Network Video on The University of Toronto's website, Retrieved on 2009-07-09.
  7. Ron Frederick Github with source code