Communication protocol | |
Abbreviation | SIP |
---|---|
Purpose | Internet telephony |
Introduction | March 1999 |
OSI layer | Application layer (Layer 7) |
Port(s) | 5060, 5061 |
RFC(s) | 2543, 3261 |
Internet protocol suite |
---|
Application layer |
Transport layer |
Internet layer |
Link layer |
The Session Initiation Protocol (SIP) is a signaling protocol used for initiating, maintaining, and terminating communication sessions that include voice, video and messaging applications. [1] SIP is used in Internet telephony, in private IP telephone systems, as well as mobile phone calling over LTE (VoLTE). [2]
The protocol defines the specific format of messages exchanged and the sequence of communications for cooperation of the participants. SIP is a text-based protocol, incorporating many elements of the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP). [3] A call established with SIP may consist of multiple media streams, but no separate streams are required for applications, such as text messaging, that exchange data as payload in the SIP message.
SIP works in conjunction with several other protocols that specify and carry the session media. Most commonly, media type and parameter negotiation and media setup are performed with the Session Description Protocol (SDP), which is carried as payload in SIP messages. SIP is designed to be independent of the underlying transport layer protocol and can be used with the User Datagram Protocol (UDP), the Transmission Control Protocol (TCP), and the Stream Control Transmission Protocol (SCTP). For secure transmissions of SIP messages over insecure network links, the protocol may be encrypted with Transport Layer Security (TLS). For the transmission of media streams (voice, video) the SDP payload carried in SIP messages typically employs the Real-time Transport Protocol (RTP) or the Secure Real-time Transport Protocol (SRTP).
SIP was originally designed by Mark Handley, Henning Schulzrinne, Eve Schooler and Jonathan Rosenberg in 1996 to facilitate establishing multicast multimedia sessions on the Mbone. The protocol was standardized as RFC 2543 in 1999. In November 2000, SIP was accepted as a 3GPP signaling protocol and permanent element of the IP Multimedia Subsystem (IMS) architecture for IP-based streaming multimedia services in cellular networks. In June 2002 the specification was revised in RFC 3261 [4] and various extensions and clarifications have been published since. [5]
SIP was designed to provide a signaling and call setup protocol for IP-based communications supporting the call processing functions and features present in the public switched telephone network (PSTN) with a vision of supporting new multimedia applications. It has been extended for video conferencing, streaming media distribution, instant messaging, presence information, file transfer, Internet fax and online games. [1] [6] [7]
SIP is distinguished by its proponents for having roots in the Internet community rather than in the telecommunications industry. SIP has been standardized primarily by the Internet Engineering Task Force (IETF), while other protocols, such as H.323, have traditionally been associated with the International Telecommunication Union (ITU).
SIP is only involved in the signaling operations of a media communication session and is primarily used to set up and terminate voice or video calls. SIP can be used to establish two-party (unicast) or multiparty (multicast) sessions. It also allows modification of existing calls. The modification can involve changing addresses or ports, inviting more participants, and adding or deleting media streams. SIP has also found applications in messaging applications, such as instant messaging, and event subscription and notification.
SIP works in conjunction with several other protocols that specify the media format and coding and that carry the media once the call is set up. For call setup, the body of a SIP message contains a Session Description Protocol (SDP) data unit, which specifies the media format, codec and media communication protocol. Voice and video media streams are typically carried between the terminals using the Real-time Transport Protocol (RTP) or Secure Real-time Transport Protocol (SRTP). [3] [8]
Every resource of a SIP network, such as user agents, call routers, and voicemail boxes, are identified by a Uniform Resource Identifier (URI). The syntax of the URI follows the general standard syntax also used in Web services and e-mail. [9] The URI scheme used for SIP is sip and a typical SIP URI has the form sip:username@domainname or sip:username@hostport, where domainname requires DNS SRV records to locate the servers for SIP domain while hostport can be an IP address or a fully qualified domain name of the host and port. If secure transmission is required, the scheme sips is used. [10] [11]
SIP employs design elements similar to the HTTP request and response transaction model. [12] Each transaction consists of a client request that invokes a particular method or function on the server and at least one response. SIP reuses most of the header fields, encoding rules and status codes of HTTP, providing a readable text-based format.
SIP can be carried by several transport layer protocols including Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Stream Control Transmission Protocol (SCTP). [13] [14] SIP clients typically use TCP or UDP on port numbers 5060 or 5061 for SIP traffic to servers and other endpoints. Port 5060 is commonly used for non-encrypted signaling traffic whereas port 5061 is typically used for traffic encrypted with Transport Layer Security (TLS).
SIP-based telephony networks often implement call processing features of Signaling System 7 (SS7), for which special SIP protocol extensions exist, although the two protocols themselves are very different. SS7 is a centralized protocol, characterized by a complex central network architecture and dumb endpoints (traditional telephone handsets). SIP is a client-server protocol of equipotent peers. SIP features are implemented in the communicating endpoints, while the traditional SS7 architecture is in use only between switching centers.
The network elements that use the Session Initiation Protocol for communication are called SIP user agents. Each user agent (UA) performs the function of a user agent client (UAC) when it is requesting a service function, and that of a user agent server (UAS) when responding to a request. Thus, any two SIP endpoints may in principle operate without any intervening SIP infrastructure. However, for network operational reasons, for provisioning public services to users, and for directory services, SIP defines several specific types of network server elements. Each of these service elements also communicates within the client-server model implemented in user agent clients and servers. [15]
A user agent is a logical network endpoint that sends or receives SIP messages and manages SIP sessions. User agents have client and server components. The user agent client (UAC) sends SIP requests. The user agent server (UAS) receives requests and returns a SIP response. Unlike other network protocols that fix the roles of client and server, e.g., in HTTP, in which a web browser only acts as a client, and never as a server, SIP requires both peers to implement both roles. The roles of UAC and UAS only last for the duration of a SIP transaction. [6]
A SIP phone is an IP phone that implements client and server functions of a SIP user agent and provides the traditional call functions of a telephone, such as dial, answer, reject, call hold, and call transfer. [16] [17] SIP phones may be implemented as a hardware device or as a softphone. As vendors increasingly implement SIP as a standard telephony platform, the distinction between hardware-based and software-based SIP phones is blurred and SIP elements are implemented in the basic firmware functions of many IP-capable communications devices such as smartphones.
In SIP, as in HTTP, the user agent may identify itself using a message header field (User-Agent), containing a text description of the software, hardware, or the product name. The user agent field is sent in request messages, which means that the receiving SIP server can evaluate this information to perform device-specific configuration or feature activation. Operators of SIP network elements sometimes store this information in customer account portals, [18] where it can be useful in diagnosing SIP compatibility problems or in the display of service status.
A proxy server is a network server with UAC and UAS components that functions as an intermediary entity for the purpose of performing requests on behalf of other network elements. A proxy server primarily plays the role of call routing; it sends SIP requests to another entity closer to the destination. Proxies are also useful for enforcing policy, such as for determining whether a user is allowed to make a call. A proxy interprets, and, if necessary, rewrites specific parts of a request message before forwarding it.
SIP proxy servers that route messages to more than one destination are called forking proxies. The forking of a SIP request establishes multiple dialogs from the single request. Thus, a call may be answered from one of multiple SIP endpoints. For identification of multiple dialogs, each dialog has an identifier with contributions from both endpoints.
A redirect server is a user agent server that generates 3xx (redirection) responses to requests it receives, directing the client to contact an alternate set of URIs. A redirect server allows proxy servers to direct SIP session invitations to external domains.
A registrar is a SIP endpoint that provides a location service. It accepts REGISTER requests, recording the address and other parameters from the user agent. For subsequent requests, it provides an essential means to locate possible communication peers on the network. The location service links one or more IP addresses to the SIP URI of the registering agent. Multiple user agents may register for the same URI, with the result that all registered user agents receive the calls to the URI.
SIP registrars are logical elements and are often co-located with SIP proxies. To improve network scalability, location services may instead be located with a redirect server.
Session border controllers (SBCs) serve as middleboxes between user agents and SIP servers for various types of functions, including network topology hiding and assistance in NAT traversal. SBCs are an independently engineered solution and are not mentioned in the SIP RFC.
Gateways can be used to interconnect a SIP network to other networks, such as the PSTN, which use different protocols or technologies.
SIP is a text-based protocol with syntax similar to that of HTTP. There are two different types of SIP messages: requests and responses. The first line of a request has a method, defining the nature of the request, and a Request-URI, indicating where the request should be sent. [19] The first line of a response has a response code.
Requests initiate a functionality of the protocol. They are sent by a user agent client to the server and are answered with one or more SIP responses, which return a result code of the transaction, and generally indicate the success, failure, or other state of the transaction.
Request name | Description | Notes | RFC references |
---|---|---|---|
REGISTER | Register the URI listed in the To-header field with a location server and associates it with the network address given in a Contact header field. | The command implements a location service. | RFC 3261 |
INVITE | Initiate a dialog for establishing a call. The request is sent by a user agent client to a user agent server. | When sent during an established dialog (reinvite) it modifies the sessions, for example placing a call on hold. | RFC 3261 |
ACK | Confirm that an entity has received a final response to an INVITE request. | RFC 3261 | |
BYE | Signal termination of a dialog and end a call. | This message may be sent by either endpoint of a dialog. | RFC 3261 |
CANCEL | Cancel any pending request. | Usually means terminating a call while it is still ringing, before answer. | RFC 3261 |
UPDATE | Modify the state of a session without changing the state of the dialog. | RFC 3311 | |
REFER | Ask recipient to issue a request for the purpose of call transfer. | RFC 3515 | |
PRACK | Provisional acknowledgement. | PRACK is sent in response to provisional response (1xx). | RFC 3262 |
SUBSCRIBE | Initiates a subscription for notification of events from a notifier. | RFC 6665 | |
NOTIFY | Inform a subscriber of notifications of a new event. | RFC 6665 | |
PUBLISH | Publish an event to a notification server. | RFC 3903 | |
MESSAGE | Deliver a text message. | Used in instant messaging applications. | RFC 3428 |
INFO | Send mid-session information that does not modify the session state. | This method is often used for DTMF relay. | RFC 6086 |
OPTIONS | Query the capabilities of an endpoint. | It is often used for NAT keepalive purposes. | RFC 3261 |
Responses are sent by the user agent server indicating the result of a received request. Several classes of responses are recognized, determined by the numerical range of result codes: [20]
SIP defines a transaction mechanism to control the exchanges between participants and deliver messages reliably. A transaction is a state of a session, which is controlled by various timers. Client transactions send requests and server transactions respond to those requests with one or more responses. The responses may include provisional responses with a response code in the form 1xx, and one or multiple final responses (2xx – 6xx).
Transactions are further categorized as either type invite or type non-invite. Invite transactions differ in that they can establish a long-running conversation, referred to as a dialog in SIP, and so include an acknowledgment (ACK) of any non-failing final response, e.g., 200 OK.
The Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE) is the SIP-based suite of standards for instant messaging and presence information. Message Session Relay Protocol (MSRP) allows instant message sessions and file transfer.
The SIP developer community meets regularly at conferences organized by SIP Forum to test interoperability of SIP implementations. [22] The TTCN-3 test specification language, developed by a task force at ETSI (STF 196), is used for specifying conformance tests for SIP implementations. [23]
When developing SIP software or deploying a new SIP infrastructure, it is important to test the capability of servers and IP networks to handle certain call load: number of concurrent calls and number of calls per second. SIP performance tester software is used to simulate SIP and RTP traffic to see if the server and IP network are stable under the call load. [24] The software measures performance indicators like answer delay, answer/seizure ratio, RTP jitter and packet loss, round-trip delay time.
SIP connection is a marketing term for voice over Internet Protocol (VoIP) services offered by many Internet telephony service providers (ITSPs). The service provides routing of telephone calls from a client's private branch exchange (PBX) telephone system to the PSTN. Such services may simplify corporate information system infrastructure by sharing Internet access for voice and data, and removing the cost for Basic Rate Interface (BRI) or Primary Rate Interface (PRI) telephone circuits.
SIP trunking is a similar marketing term preferred for when the service is used to simplify a telecom infrastructure by sharing the carrier access circuit for voice, data, and Internet traffic while removing the need for PRI circuits. [25] [26]
SIP-enabled video surveillance cameras can initiate calls to alert the operator of events, such as the motion of objects in a protected area.
SIP is used in audio over IP for broadcasting applications where it provides an interoperable means for audio interfaces from different manufacturers to make connections with one another. [27]
The U.S. National Institute of Standards and Technology (NIST), Advanced Networking Technologies Division provides a public-domain Java implementation [28] that serves as a reference implementation for the standard. The implementation can work in proxy server or user agent scenarios and has been used in numerous commercial and research projects. It supports RFC 3261 in full and a number of extension RFCs including RFC 6665 (event notification) and RFC 3262 (reliable provisional responses).
Numerous other commercial and open-source SIP implementations exist. See List of SIP software.
SIP-I, Session Initiation Protocol with encapsulated ISUP, is a protocol used to create, modify, and terminate communication sessions based on ISUP using SIP and IP networks. Services using SIP-I include voice, video telephony, fax and data. SIP-I and SIP-T [29] are two protocols with similar features, notably to allow ISUP messages to be transported over SIP networks. This preserves all of the detail available in the ISUP header. [lower-alpha 1] SIP-I was defined by the ITU-T, whereas SIP-T was defined by the IETF. [30]
Concerns about the security of calls via the public Internet have been addressed by encryption of the SIP protocol for secure transmission. The URI scheme SIPS is used to mandate that SIP communication be secured with Transport Layer Security (TLS). SIPS URIs take the form sips:user@example.com.
End-to-end encryption of SIP is only possible if there is a direct connection between communication endpoints. While a direct connection can be made via Peer-to-peer SIP or via a VPN between the endpoints, most SIP communication involves multiple hops, with the first hop being from a user agent to the user agent's ITSP. For the multiple-hop case, SIPS will only secure the first hop; the remaining hops will normally not be secured with TLS and the SIP communication will be insecure. In contrast, the HTTPS protocol provides end-to-end security as it is done with a direct connection and does not involve the notion of hops.
The media streams (audio and video), which are separate connections from the SIPS signaling stream, may be encrypted using SRTP. The key exchange for SRTP is performed with SDES ( RFC 4568), or with ZRTP ( RFC 6189). When SDES is used, the keys will be transmitted via insecure SIP unless SIPS is used. One may also add a MIKEY ( RFC 3830) exchange to SIP to determine session keys for use with SRTP.
HTTP is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, where hypertext documents include hyperlinks to other resources that the user can easily access, for example by a mouse click or by tapping the screen in a web browser.
The Real-Time Streaming Protocol (RTSP) is an application-level network protocol designed for multiplexing and packetizing multimedia transport streams over a suitable transport protocol. RTSP is used in entertainment and communications systems to control streaming media servers. The protocol is used for establishing and controlling media sessions between endpoints. Clients of media servers issue commands such as play, record and pause, to facilitate real-time control of the media streaming from the server to a client or from a client to the server.
Extensible Messaging and Presence Protocol is an open communication protocol designed for instant messaging (IM), presence information, and contact list maintenance. Based on XML, it enables the near-real-time exchange of structured data between two or more network entities. Designed to be extensible, the protocol offers a multitude of applications beyond traditional IM in the broader realm of message-oriented middleware, including signalling for VoIP, video, file transfer, gaming and other uses.
Inter-Asterisk eXchange (IAX) is a communications protocol native to the Asterisk private branch exchange (PBX) software, and is supported by a few other softswitches, PBX systems, and softphones. It is used for transporting voice over IP telephony sessions between servers and to terminal devices.
STUN is a standardized set of methods, including a network protocol, for traversal of network address translator (NAT) gateways in applications of real-time voice, video, messaging, and other interactive communications.
The IP Multimedia Subsystem or IP Multimedia Core Network Subsystem (IMS) is a standardised architectural framework for delivering IP multimedia services. Historically, mobile phones have provided voice call services over a circuit-switched-style network, rather than strictly over an IP packet-switched network. Various voice over IP technologies are available on smartphones; IMS provides a standard protocol across vendors.
The Gateway Control Protocol is an implementation of the media gateway control protocol architecture for providing telecommunication services across a converged internetwork consisting of the traditional public switched telephone network (PSTN) and modern packet networks, such as the Internet. H.248 is the designation of the recommendations developed by the ITU Telecommunication Standardization Sector (ITU-T) and Megaco is a contraction of media gateway control protocol used by the earliest specifications by the Internet Engineering Task Force (IETF). The standard published in March 2013 by ITU-T is entitled H.248.1: Gateway control protocol: Version 3.
A session border controller (SBC) is a network element deployed to protect SIP based voice over Internet Protocol (VoIP) networks.
A back-to-back user agent (B2BUA) is a logical network element in Session Initiation Protocol (SIP) applications. SIP is a signaling protocol for managing multimedia Voice over Internet Protocol (VoIP) telephone calls. A back-to-back user agent operates between both end points of a communications session and divides the communication channel into two call legs, and mediates all SIP signaling between the endpoints of the session, from establishment to termination. As all control messages for each call flow through the B2BUA, a service provider may implement value-added features available during the call.
The SIP URI scheme is a Uniform Resource Identifier (URI) scheme for the Session Initiation Protocol (SIP) multimedia communications protocol. A SIP address is a URI that addresses a specific telephone extension on a voice over IP system. Such a number could be a private branch exchange or an E.164 telephone number dialled through a specific gateway. The scheme was defined in RFC 3261.
An application-level gateway is a security component that augments a firewall or NAT employed in a mobile network. It allows customized NAT traversal filters to be plugged into the gateway to support address and port translation for certain application layer "control/data" protocols such as FTP, BitTorrent, SIP, RTSP, file transfer in IM applications. In order for these protocols to work through NAT or a firewall, either the application has to know about an address/port number combination that allows incoming packets, or the NAT has to monitor the control traffic and open up port mappings dynamically as required. Legitimate application data can thus be passed through the security checks of the firewall or NAT that would have otherwise restricted the traffic for not meeting its limited filter criteria.
Video Share is an IP Multimedia System (IMS) enabled service for mobile networks that allows users engaged in a circuit switch voice call to add a unidirectional video streaming session over the packet network during the voice call. Any of the parties on the voice call can initiate a video streaming session. There can be multiple video streaming sessions during a voice call, and each of these streaming sessions can be initiated by any of the parties on the voice call. The video source can either be the camera on the phone or a pre-recorded video clip.
Peer-to-peer SIP (P2P-SIP) is an implementation of a distributed voice over Internet Protocol (VoIP) or instant messaging communications application using a peer-to-peer (P2P) architecture in which session control between communication end points is facilitated with the Session Initiation Protocol (SIP).
In computer networking, the Message Session Relay Protocol (MSRP) is a protocol for transmitting a series of related instant messages in the context of a communications session. An application instantiates the session with the Session Description Protocol (SDP) over Session Initiation Protocol (SIP) or other rendezvous methods.
The Media Gateway Control Protocol (MGCP) is a telecommunication protocol for signaling and call control in hybrid voice over IP (VoIP) and traditional telecommunication systems. It implements the media gateway control protocol architecture for controlling media gateways connected to the public switched telephone network (PSTN). The media gateways provide conversion of traditional electronic media to the Internet Protocol (IP) network. The protocol is a successor to the Simple Gateway Control Protocol (SGCP), which was developed by Bellcore and Cisco, and the Internet Protocol Device Control (IPDC).
Network address translators (NAT) are used to overcome the lack of IPv4 address availability by hiding an enterprise or even an operator's network behind one or few IP addresses. The devices behind the NAT use private IP addresses that are not routable in the public Internet. The Session Initiation Protocol (SIP) has established itself as the de facto standard for voice over IP (VoIP) communication. In order to establish a call, a caller sends a SIP message, which contains its own IP address. The callee is supposed to reply back with a SIP message destined to the IP addresses included in the received SIP message. This will obviously not work if the caller is behind a NAT and is using a private IP address.
The Session Initiation Protocol (SIP) is the signaling protocol selected by the 3rd Generation Partnership Project (3GPP) to create and control multimedia sessions with multiple participants in the IP Multimedia Subsystem (IMS). It is therefore a key element in the IMS framework.
JsSIP is a library for the programming language JavaScript. It takes advantage of SIP and WebRTC to provide a fully featured SIP endpoint in any website. JsSIP allows any website to get real-time communication features using audio and video. It makes it possible to build SIP user agents that send and receive audio and video calls as well as and text messages.
{{citation}}
: CS1 maint: ref duplicates default (link){{citation}}
: CS1 maint: ref duplicates default (link)