Real-Time Messaging Protocol (RTMP) is a communication protocol for streaming audio, video, and data over the Internet. It was originally developed by Macromedia as a proprietary protocol for streaming between Flash Player and the Flash Communication Server. Adobe, which acquired Macromedia, has since released an incomplete version of the protocol specification for public use.
The RTMP protocol has multiple variations:
While the primary motivation for RTMP was to be a protocol for playing Flash video, it is also used in some other applications, such as the Adobe LiveCycle Data Services ES.
RTMP is a TCP-based protocol which maintains persistent connections and allows low-latency communication. To deliver streams smoothly and transmit as much information as possible, it splits streams into fragments, whose size is negotiated dynamically between the client and server. Sometimes the size is left unchanged; the default fragment sizes are 64 bytes for audio data, and 128 bytes for video data and most other data types. Fragments from different streams may then be interleaved and multiplexed over a single connection. With longer data chunks, the protocol carries only a one-byte header per fragment, incurring very little overhead. In practice, however, individual fragments are not typically interleaved. Instead, the interleaving and multiplexing is done at the packet level, with RTMP packets across several different active channels being interleaved in such a way as to ensure that each channel meets its bandwidth, latency, and other quality-of-service requirements. Packets interleaved in this fashion are treated as indivisible, and are not interleaved on the fragment level.
RTMP defines several virtual channels on which packets may be sent and received, and which operate independently of each other. For example, there is a channel for handling RPC requests and responses, a channel for video stream data, a channel for audio stream data, a channel for out-of-band control messages (fragment size negotiation, etc.), and so on. During a typical RTMP session, several channels may be active at any given time. When RTMP data is encoded, a packet header is generated. The packet header specifies, among other things, the ID of the channel on which it is to be sent, a timestamp of when it was generated (if necessary), and the size of the packet's payload. This header is then followed by the actual payload content of the packet, which is fragmented according to the currently agreed-upon fragment size before it is sent over the connection. The packet header itself is never fragmented, and its size does not count towards the data in the packet's first fragment. In other words, only the actual packet payload (the media data) is subject to fragmentation.
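The fragmentation rule described above can be sketched in Python. The function names and the 128-byte default argument are illustrative assumptions, not part of any RTMP library. The sketch only splits a payload and counts the one-byte headers carried by the fragments after the first; the packet header itself is left out of the fragments, as the text describes.

```python
def fragment_payload(payload: bytes, fragment_size: int = 128) -> list[bytes]:
    """Split a packet payload into fragments of at most fragment_size bytes.

    Only the payload is split; the packet header is assumed to be sent
    separately, ahead of the first fragment.
    """
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return [
        payload[offset:offset + fragment_size]
        for offset in range(0, len(payload), fragment_size)
    ]


def continuation_header_bytes(payload_length: int, fragment_size: int = 128) -> int:
    """Count the one-byte headers needed for fragments after the first one."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    fragment_count = -(-payload_length // fragment_size)  # ceiling division
    return max(fragment_count - 1, 0)
```

For a 300-byte payload and 128-byte fragments, this yields fragments of 128, 128 and 44 bytes, with two one-byte continuation headers.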
At a higher level, RTMP encapsulates MP3 or AAC audio and FLV1 video multimedia streams, and can make remote procedure calls (RPCs) using the Action Message Format. Any required RPC calls are made asynchronously, using a single client/server request/response model, such that real-time communication is not required. [2] [3]
RTMP sessions may be encrypted using either of two methods:
In RTMP Tunneled (RTMPT), RTMP data is encapsulated and exchanged via HTTP, and messages from the client (the media player, in this case) are addressed to port 80 (the default for HTTP) on the server.
While the messages in RTMPT are larger than the equivalent non-tunneled RTMP messages due to HTTP headers, RTMPT may facilitate the use of RTMP in scenarios where the use of non-tunneled RTMP would otherwise not be possible, such as when the client is behind a firewall that blocks non-HTTP and non-HTTPS outbound traffic.
The protocol works by sending commands through the POST URL, and AMF messages through the POST body. An example is
POST /open/1 HTTP/1.1
for a connection to be opened.
Adobe has released a specification for version 1.0 of the protocol, dated 21 December 2012. [4] The web landing page leading to that specification notes that "To benefit customers who want to protect their content, the open RTMP specification does not include Adobe's unique secure RTMP measures". [5]
A document accompanying the Adobe specification grants "non-exclusive, royalty-free, nontransferable, non-sublicensable, personal, worldwide" patent license to all implementations of the protocol, with two restrictions: one forbids use for intercepting streaming data ("any technology that intercepts streaming video, audio and/or data content for storage in any device or medium"), and another prohibits circumvention of "technological measures for the protection of audio, video and/or data content, including any of Adobe’s secure RTMP measures". [6]
Stefan Richter, author of some books on Flash, noted in 2008 that while Adobe is vague as to which patents apply to RTMP, U.S. patent 7,246,356 appears to be one of them. [2]
In 2011, Adobe sued Wowza Media Systems claiming, among other things, infringement of its RTMP patents. [7] [8] [9] In 2015, Adobe and Wowza announced that the lawsuits had been settled and dismissed with prejudice. [10]
Packets are sent over a TCP connection, which is established first between client and server. They contain a header and a body which, in the case of connection and control commands, is encoded using the Action Message Format (AMF). The header is split into the Basic Header (shown as detached from the rest in the diagram) and the Chunk Message Header. The Basic Header is the only constant part of the packet and is usually composed of a single composite byte, where the two most significant bits are the Chunk Type (fmt in the specification) and the rest form the Stream ID. Depending on the value of the former, some fields of the Message Header can be omitted and their values derived from previous packets; depending on the value of the latter, the Basic Header can be extended with one or two extra bytes (as in the case of the diagram, which has three bytes in total (c)). If the value of the six least significant bits of the Basic Header (BH) is 0, then the BH is two bytes long and represents Stream IDs 64 to 319 (64+255); if the value is 1, then the BH is three bytes long (with the last two bytes encoded as a 16-bit little-endian integer) and represents Stream IDs 64 to 65599 (64+65535); if the value is 2, then the BH is one byte long and is reserved for low-level protocol control messages and commands. The Chunk Message Header contains metadata such as the message size (measured in bytes), the Timestamp Delta and the Message Type. This last value is a single byte and defines whether the packet is an audio, video, command or "low level" RTMP packet such as an RTMP Ping.
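The Basic Header rules above can be illustrated with a short Python sketch. The function name and return shape are assumptions made for this example; the logic follows the one-, two- and three-byte cases as described in the text.

```python
def parse_basic_header(data: bytes) -> tuple[int, int, int]:
    """Return (chunk_type, stream_id, header_length) for a Basic Header.

    chunk_type is taken from the two most significant bits of the first
    byte; the six least significant bits select the header form.
    """
    if not data:
        raise ValueError("no data")
    first = data[0]
    chunk_type = first >> 6
    low_bits = first & 0x3F

    if low_bits == 0:
        # Two-byte form: Stream IDs 64 to 319.
        if len(data) < 2:
            raise ValueError("truncated two-byte Basic Header")
        return chunk_type, 64 + data[1], 2
    if low_bits == 1:
        # Three-byte form: the last two bytes are a 16-bit little-endian value.
        if len(data) < 3:
            raise ValueError("truncated three-byte Basic Header")
        return chunk_type, 64 + int.from_bytes(data[1:3], "little"), 3
    # One-byte form: the value is the Stream ID itself (2 is reserved
    # for low-level protocol control messages).
    return chunk_type, low_bits, 1
```

For instance, `parse_basic_header(b"\x03")` gives chunk type 0 and Stream ID 3, while `parse_basic_header(b"\xC3")` gives chunk type 3 for the same stream.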
An example is shown below as captured when a flash client executes the following code:
var stream:NetStream = new NetStream(connectionObject);
this will generate the following Chunk:
Hex Code | ASCII |
---|---|
03 00 0B 68 00 00 19 14 00 00 00 00 02 00 0C 63 72 65 61 74 65 53 74 72 65 61 6D 00 40 00 00 00 00 00 00 00 05 | ␃ ␀ @ I ␀ ␀ ␙ ␔ ␀ ␀ ␀ ␀ ␂ ␀ ␌ c r e a t e S t r e a m ␀ @ ␀ ␀ ␀ ␀ ␀ ␀ ␀ ␅ |
The packet starts with a Basic Header of a single byte (0x03) where the two most significant bits (b00000011) define a chunk header type of 0 while the rest (b00000011) define a Chunk Stream ID of 3. The four possible values of the header type and their significance are:
The last type (b11) is always used in the case of aggregate messages where, in the example above, the second message will start with an id of 0xC3 (b11000011) and would mean that all Message Header fields should be derived from the message with a stream Id of 3 (which would be the message right above it). The six least significant bits that form the Stream ID can take values between 3 and 63. Some values have special meaning, like 1 that stands for an extended ID format, in which case there will be two bytes following that. A value of two is for low level messages such as Ping and Set Client Bandwidth.
The next bytes of the RTMP Header (including the values in the example packet above) are decoded as follows:
The Message Type ID byte defines whether the packet contains audio/video data, a remote object or a command. Some possible values are:
Following the header, 0x02 denotes a string of size 0x000C and values 0x63 0x72 ... 0x6D ("createStream" command). Following that we have a 0x00 (number) which is the transaction id of value 2.0. The last byte is 0x05 (null) which means there are no arguments.
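As a worked illustration, the example chunk can be decoded with a few lines of Python. This is a sketch under stated assumptions: the field layout of the full (type 0) header (3-byte timestamp, 3-byte message length, 1-byte message type, 4-byte little-endian message stream ID) is taken from the public specification rather than from the text above, the Basic Header is assumed to be one byte, and only the three AMF0 markers seen in the example (number, string, null) are handled. All function names are hypothetical.

```python
import struct

EXAMPLE_CHUNK = bytes.fromhex(
    "03 00 0B 68 00 00 19 14 00 00 00 00 "
    "02 00 0C 63 72 65 61 74 65 53 74 72 65 61 6D "
    "00 40 00 00 00 00 00 00 00 05"
)


def decode_amf0_values(body: bytes) -> list:
    """Decode a flat sequence of AMF0 numbers, strings and nulls."""
    values, pos = [], 0
    while pos < len(body):
        marker = body[pos]
        pos += 1
        if marker == 0x00:  # number: 8-byte big-endian double
            (number,) = struct.unpack_from(">d", body, pos)
            pos += 8
            values.append(number)
        elif marker == 0x02:  # string: 2-byte length, then UTF-8 bytes
            (length,) = struct.unpack_from(">H", body, pos)
            pos += 2
            values.append(body[pos:pos + length].decode("utf-8"))
            pos += length
        elif marker == 0x05:  # null
            values.append(None)
        else:
            raise ValueError(f"unsupported AMF0 marker 0x{marker:02x}")
    return values


def decode_type0_chunk(chunk: bytes) -> dict:
    """Decode a chunk with a one-byte Basic Header and a full type 0 header."""
    if chunk[0] >> 6 != 0:
        raise ValueError("only chunk type 0 is handled in this sketch")
    length = int.from_bytes(chunk[4:7], "big")
    return {
        "chunk_stream_id": chunk[0] & 0x3F,
        "timestamp": int.from_bytes(chunk[1:4], "big"),
        "message_length": length,
        "message_type": chunk[7],
        "message_stream_id": int.from_bytes(chunk[8:12], "little"),
        "values": decode_amf0_values(chunk[12:12 + length]),
    }
```

Calling `decode_type0_chunk(EXAMPLE_CHUNK)` reports a message length of 25 bytes, a message type of 0x14 and the values `"createStream"`, `2.0` and `None`, matching the description above.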
Some of the message types shown above, such as Ping and Set Client/Server Bandwidth, are considered low level RTMP protocol messages which do not use the AMF encoding format. Command messages on the other hand, whether AMF0 (Message Type of 0x14) or AMF3 (0x11), use the format and have the general form shown below:
(String) <Command Name> (Number) <Transaction Id> (Mixed) <Argument> ex. Null, String, Object: {key1:value1, key2:value2 ... }
The transaction id is used for commands that can have a reply. The value can be either a string like in the example above or one or more objects, each composed of a set of key/value pairs where the keys are always encoded as strings while the values can be any AMF data type, including complex types like arrays.
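The general command form can be sketched as a small AMF0 encoder in Python. The function names are hypothetical, and the type markers used (0x00 number, 0x01 boolean, 0x02 string, 0x03 object, 0x05 null, and the 0x00 0x00 0x09 object terminator) are taken from the AMF0 format rather than from the text above. Arrays and other complex types are deliberately left out.

```python
import struct


def amf0_encode(value) -> bytes:
    """Encode a single value using a small subset of AMF0."""
    if value is None:
        return b"\x05"
    if isinstance(value, bool):  # checked before int, since bool is an int
        return b"\x01" + (b"\x01" if value else b"\x00")
    if isinstance(value, (int, float)):
        return b"\x00" + struct.pack(">d", float(value))
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) > 0xFFFF:
            raise ValueError("string too long for a short AMF0 string")
        return b"\x02" + struct.pack(">H", len(raw)) + raw
    if isinstance(value, dict):
        out = b"\x03"
        for key, item in value.items():
            raw_key = str(key).encode("utf-8")
            # Keys are length-prefixed strings without a type marker.
            out += struct.pack(">H", len(raw_key)) + raw_key + amf0_encode(item)
        return out + b"\x00\x00\x09"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_command(name: str, transaction_id: float, *arguments) -> bytes:
    """Encode <Command Name> <Transaction Id> <Argument>... as an AMF0 body."""
    encoded_args = b"".join(amf0_encode(argument) for argument in arguments)
    return amf0_encode(name) + amf0_encode(float(transaction_id)) + encoded_args
```

Under these assumptions, `encode_command("createStream", 2, None)` reproduces the 25-byte body of the example chunk shown earlier.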
Control messages are not AMF encoded. They start with a stream Id of 0x02 which implies a full (type 0) header and have a message type of 0x04. The header is followed by six bytes, which are interpreted as such:
The first two bytes of the message body define the Ping Type, which can apparently [11] take six possible values.
Pong is the name for a reply to a Ping, with the values used as seen above.
This relates to messages that have to do with the client up-stream and server down-stream bit-rate. The body is composed of four bytes showing the bandwidth value, with a possible extension of one byte which sets the Limit Type. This can have one of three possible values which can be: hard, soft or dynamic (either soft or hard).
The fragment (chunk) size is the value carried in the four bytes of the message body. A default value of 128 bytes exists, and the message is sent only when a change is wanted.
After a TCP connection is established, an RTMP connection is set up by performing a handshake in which each side sends three packets (also referred to as chunks in the official documentation). The official spec refers to these as C0-2 for the packets sent by the client and S0-2 for those sent by the server; they are not to be confused with RTMP packets, which can be exchanged only after the handshake is complete. These packets have a structure of their own. C1 contains a field setting the "epoch" timestamp, but since this can be set to zero, as is done in third-party implementations, the packet can be simplified. The client initialises the connection by sending the C0 packet with a constant value of 0x03 representing the current protocol version. Without waiting for S0 to arrive, it immediately follows with C1, which contains 1536 bytes: the first four represent the epoch timestamp, the second four are all 0, and the rest are random (and can be set to 0 in third-party implementations). C2 and S2 are an echo of S1 and C1 respectively, except that the second four bytes hold the time the respective message was received (instead of 0). After C2 and S2 are received, the handshake is considered complete.
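The packet layout of the handshake can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, no networking is performed, and the timestamps are assumed to be big-endian 32-bit integers, which the text above does not state.

```python
import os
import struct

HANDSHAKE_SIZE = 1536  # size of C1, S1, C2 and S2 in bytes


def build_c0_c1(epoch: int = 0, randomize: bool = True) -> tuple[bytes, bytes]:
    """Build the C0 and C1 packets that the client sends back to back."""
    c0 = b"\x03"  # protocol version
    filler_length = HANDSHAKE_SIZE - 8
    # The remainder is random, or all zero as in simplified implementations.
    filler = os.urandom(filler_length) if randomize else bytes(filler_length)
    c1 = struct.pack(">I", epoch) + b"\x00" * 4 + filler
    return c0, c1


def build_echo(peer_packet: bytes, received_at: int) -> bytes:
    """Build C2 from S1 (or S2 from C1).

    The packet is echoed with its second four bytes replaced by the time
    at which it was received.
    """
    if len(peer_packet) != HANDSHAKE_SIZE:
        raise ValueError("expected a 1536-byte packet")
    stamp = struct.pack(">I", received_at & 0xFFFFFFFF)
    return peer_packet[:4] + stamp + peer_packet[8:]
```

A client built on this sketch would send C0 and C1, read S0 and S1, and then send `build_echo(s1, now)` as C2.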
At this point, the client and server can negotiate a connection by exchanging AMF-encoded messages. These include key/value pairs which relate to variables that are needed for a connection to be established. An example message from the client is:
(Invoke) "connect" (TransactionID) 1.0 (Object1) { app: "sample", flashVer: "MAC 10,2,153,2", swfUrl: null, tcUrl: "rtmpt://127.0.0.1/sample", fpad: false, capabilities: 9947.75, audioCodecs: 3191, videoCodecs: 252, videoFunction: 1, pageUrl: null, objectEncoding: 3.0 }
The Flash Media Server and other implementations use the concept of an "app" to define a container for audio, video and other content, implemented as a folder on the server root which contains the media files to be streamed. The first variable contains the name of this app as "sample", which is the name provided by the Wowza Server for testing. The flashVer string is the same as returned by the ActionScript getversion() function. The audioCodecs and videoCodecs values are encoded as doubles and their meaning can be found in the original spec. The same is true for the videoFunction variable, which in this case is the self-explanatory SUPPORT_VID_CLIENT_SEEK constant. Of special interest is objectEncoding, which defines whether the rest of the communication will make use of the extended AMF3 format. As version 3 is the current default, the Flash client has to be told explicitly in ActionScript code to use AMF0 if that is requested. The server then replies with a ServerBW, a ClientBW and a SetPacketSize message sequence, finally followed by an Invoke, with an example message:
(Invoke) "_result" (transactionID) 1.0 (Object1) { fmsVer: "FMS/3,5,5,2004", capabilities: 31.0, mode: 1.0 } (Object2) { level: "status", code: "NetConnection.Connect.Success", description: "Connection succeeded", data: (array) { version: "3,5,5,2004" }, clientId: 1728724019, objectEncoding: 3.0 }
Some values above are serialised into properties of a generic ActionScript Object, which is then passed to the NetConnection event listener. The clientId will establish a number for the session to be started by the connection. Object encoding must match the value previously set.
To start a video stream, the client sends a "createStream" invocation followed by a ping message, followed by a "play" invocation with the file name as argument. The server will then reply with a series of "onStatus" commands followed by the video data as encapsulated within RTMP messages.
After a connection is established, media is sent by encapsulating the content of FLV tags into RTMP messages of type 8 and 9 for audio and video, respectively.
This refers to the HTTP-tunneled version of the protocol. It communicates over port 80 and passes the AMF data inside HTTP POST requests and responses. The sequence for connection is as follows:
POST /fcs/ident2 HTTP/1.1 Content-Type: application/x-fcs\r\n HTTP/1.0 404 Not Found
POST /open/1 HTTP/1.1 Content-Type: application/x-fcs\r\n HTTP/1.1 200 OK Content-Type: application/x-fcs\r\n 1728724019
The first request has an /fcs/ident2 path, and the correct reply is a 404 Not Found error. The client then sends an /open/1 request, to which the server must reply with a 200 OK, appending a random number that will be used as the session identifier for the communication. In this example, 1728724019 is returned in the response body.
POST /idle/1728724019/0 HTTP/1.1 HTTP/1.1 200 OK 0x01
From now on, the /idle/<session id>/<sequence #> request is a polling request where the session id has been generated and returned from the server and the sequence is just a number that increments by one for every request. The appropriate response is a 200 OK, with an integer returned in the body signifying the interval time. AMF data is sent through /send/<session id>/<sequence #>.
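The request paths described above can be generated with a small Python helper. The class name is hypothetical, and the sketch assumes that /idle and /send requests share a single sequence counter starting at 0, which the text does not spell out. No HTTP requests are actually made.

```python
from itertools import count


class RtmptPaths:
    """Generate RTMPT request paths for one tunneled session."""

    IDENT = "/fcs/ident2"  # first request; a 404 reply is expected
    OPEN = "/open/1"       # the server replies with the session id

    def __init__(self, session_id: str):
        self.session_id = session_id
        # Assumption: one counter shared by idle and send requests.
        self._sequence = count(0)

    def idle(self) -> str:
        """Path for the next polling request."""
        return f"/idle/{self.session_id}/{next(self._sequence)}"

    def send(self) -> str:
        """Path for the next request carrying AMF data in its POST body."""
        return f"/send/{self.session_id}/{next(self._sequence)}"
```

With the session id from the example, the first polling path is `/idle/1728724019/0`, and a subsequent send would use `/send/1728724019/1`.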
RTMP is implemented at these three stages:
The open-source RTMP client command-line tool rtmpdump is designed to play back or save to disk the full RTMP stream, including the RTMPE protocol Adobe uses for encryption. RTMPdump runs on Linux, Android, Solaris, Mac OS X, and most other Unix-derived operating systems, as well as Microsoft Windows. It originally supported all versions of 32-bit Windows including Windows 98; from version 2.2 the software runs only on Windows XP and above (although earlier versions remain fully functional).
Packages of the rtmpdump suite of software are available in the major open-source repositories (Linux distributions). These include the front-end apps "rtmpdump", "rtmpsrv" and "rtmpsuck."
Development of RTMPdump was restarted in October 2009, outside the United States, at the MPlayer site. [12] The current version features greatly improved functionality, and has been rewritten to take advantage of the benefits of the C programming language. In particular, the main functionality was built into a library (librtmp) which can easily be used by other applications. The RTMPdump developers have also written support for librtmp for MPlayer, FFmpeg, XBMC, cURL, VLC and a number of other open source software projects. Use of librtmp provides these projects with full support of RTMP in all its variants without any additional development effort.
FLVstreamer is a fork of RTMPdump without the code that Adobe claims violates the DMCA in the USA. It was developed as a response to Adobe's attempt in 2008 to suppress RTMPdump. FLVstreamer is an RTMP client that will save a stream of audio or video content from any RTMP server to disk, if encryption (RTMPE) is not enabled on the stream.
The Flash Video container in RTMP is limited to the H264 codec in most implementations. For this reason, the Veovera Software Organization, including Adobe, Google, Twitch, and Veriskope, published the enhanced RTMP specification, [13] which adds support for VP9, H265 and AV1 codecs in the Flash Video container FLV.