VoIP spam or SPIT (spam over Internet telephony) consists of unsolicited, automatically dialed telephone calls, typically made using voice over Internet Protocol (VoIP) technology. [1]
VoIP systems, like e-mail and other Internet applications, are susceptible to abuse by malicious parties, such as telemarketers and prank callers, who initiate unsolicited and unwanted communications. VoIP calling rates are low, and the technology provides convenient, often free tools, such as Asterisk and other applications.
The primary underlying technology driving this threat is the Session Initiation Protocol (SIP), [2] which is a standard for VoIP telecommunications.
Various techniques have been devised to detect spam calls. Some of them act before the recipient answers, so that a suspect call can be disconnected. These techniques rely on statistical analysis of call features, [3] such as the originating IP address or properties of the signaling and media messages. [4]
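As a rough illustration of statistical analysis on a single call feature, the following Python sketch counts call attempts per originating IP address in a sliding time window and blacklists any address that exceeds a threshold. The class name `CallRateMonitor`, the 60-second window, and the limit of 10 attempts are hypothetical choices, not values taken from the cited work. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class CallAttempt:
    timestamp: float  # seconds since some reference point
    source_ip: str


class CallRateMonitor:
    """Hypothetical per-source call-rate monitor with a sliding time window."""

    def __init__(self, window_seconds: float = 60.0, max_calls: int = 10):
        self.window_seconds = window_seconds
        self.max_calls = max_calls
        self._history = defaultdict(deque)  # source_ip -> timestamps inside the window
        self.blacklist = set()

    def observe(self, attempt: CallAttempt) -> bool:
        """Record an attempt; return True if its source should be treated as SPIT."""
        if attempt.source_ip in self.blacklist:
            return True
        window = self._history[attempt.source_ip]
        window.append(attempt.timestamp)
        # Discard attempts that have fallen out of the sliding window.
        while attempt.timestamp - window[0] > self.window_seconds:
            window.popleft()
        if len(window) > self.max_calls:
            self.blacklist.add(attempt.source_ip)
            return True
        return False


monitor = CallRateMonitor(window_seconds=60.0, max_calls=10)
# Twenty attempts from one address, one per second.
flags = [monitor.observe(CallAttempt(float(t), "203.0.113.7")) for t in range(20)]
print(flags.index(True))  # 10: the eleventh attempt exceeds max_calls
print(monitor.observe(CallAttempt(25.0, "198.51.100.2")))  # False
```

A real deployment would combine several features rather than a single rate, but the same windowed counting applies to each of them.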
VoIP spam is characterized as unsolicited calls initiated by voice over Internet Protocol systems. The spammer attempts to initiate a voice session and plays a recorded message if the receiver answers. Robocalls can be delivered automatically using telephony software, such as Asterisk.
RFC 5039 [1] describes some basic methods for mitigating telephony spam over SIP, including white lists, black lists, reputation systems, and consent-based communication.
Strong identification of the caller, for example as described in RFC 4474, [5] helps to mitigate SPIT. In the public switched telephone network (PSTN), Caller ID provides caller identification, but the displayed caller ID can be spoofed.
Various SPIT mitigation methods and frameworks have been proposed. The large body of work on email spam detection does not apply directly, because voice calls take place in real time. A comprehensive survey of Voice over IP security research (Chapter IV b) provides an overview. Many proposals focus on the reputation and behavior of callers, while others use machine learning classifiers with features extracted from the call's signaling or media data. A statistical analysis of the signaling traffic, in particular the call frequency, can be used to detect anomalies, monitor suspicious callers, and ultimately blacklist them. [3] A semi-supervised machine learning tool creates clusters of similar calls, and a human operator can flag any given cluster as spam.
A Voice Spam Detector (VSD) [6] is a multi-stage spam filter based on trust and reputation. The SPIDER project proposes a SPIT mitigation architecture [7] consisting of a detection layer, made up of various modules, and a decision layer. The VoIP SEAL system [8] works in stages: after a signaling analysis in the first stage, suspicious callers are subjected to tests (e.g., audio CAPTCHAs), and the callee is asked for feedback in later stages. SymRank [9] adapts the PageRank algorithm to compute the reputation of subscribers based on both incoming and outgoing calls. Outliers in total talk duration and in repetitive and reciprocal calls can also be used to detect suspicious callers. [9]
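The reputation idea behind SymRank can be loosely illustrated with plain weighted PageRank over a call graph. This is not SymRank itself. In this sketch, each answered call adds an assumed edge from caller to callee, weighted by talk time, so reputation flows to subscribers who are called. A source that only places outgoing calls and is never called back keeps only the baseline rank. The function name, the edge direction, and the sample data are hypothetical.

```python
from collections import defaultdict


def call_graph_pagerank(calls, damping=0.85, iterations=50):
    """Weighted PageRank over a call graph (illustrative, not SymRank).

    calls: list of (caller, callee, talk_seconds). Each answered call adds an
    edge caller -> callee weighted by talk time.
    """
    nodes, edges, out_weight = set(), defaultdict(float), defaultdict(float)
    for caller, callee, seconds in calls:
        nodes.update((caller, callee))
        if seconds > 0:  # unanswered calls carry no weight
            edges[(caller, callee)] += seconds
            out_weight[caller] += seconds
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by subscribers with no outgoing talk time is spread evenly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (caller, callee), seconds in edges.items():
            new_rank[callee] += damping * rank[caller] * seconds / out_weight[caller]
        rank = new_rank
    return rank


calls = [
    ("alice", "bob", 600), ("bob", "alice", 420), ("bob", "carol", 300),
    ("carol", "alice", 240), ("alice", "carol", 180),
    # A hypothetical robocaller: many short calls, never called back.
    ("robo", "alice", 8), ("robo", "bob", 5), ("robo", "carol", 6), ("robo", "dave", 4),
]
for name, score in sorted(call_graph_pagerank(calls).items(), key=lambda kv: kv[1]):
    print(f"{name:6s} {score:.3f}")
```

Because the robocaller receives no calls, it ends up with the lowest score. The cited work also considers outgoing behavior, which this simplified sketch ignores.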
SPIT detection can use machine learning algorithms, including semi-supervised ones. A method called pMPCK-Means [4] performs detection as soon as the call is established, which makes it possible to hang up a suspect call automatically. It is based on clustering: calls with similar features are grouped together, and human input is used to mark which clusters correspond to SPIT and which to legitimate calls. Call features include those extracted directly from signaling traffic, such as the source and destination addresses; those extracted from media traffic, such as the proportion of silence; and those derived from calls, such as their duration and frequency.
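The cluster-then-label workflow can be sketched with ordinary, unconstrained k-means standing in for pMPCK-Means. The three features (call duration, proportion of silence, and the caller's call rate), the sample values, the min-max scaling, and the deterministic farthest-first initialization are all assumptions made for illustration.

```python
import math


def fit_scaler(rows):
    """Per-feature (min, max) for min-max scaling."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def scale(row, lo, hi):
    return tuple((x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi))


def kmeans(points, k, iterations=20):
    # Deterministic farthest-first initialization.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    assign = [0] * len(points)
    for _ in range(iterations):
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign, centroids


# Hypothetical per-call features: (duration_s, silence_ratio, caller_calls_per_hour)
calls = [
    (300, 0.40, 2), (180, 0.35, 1), (420, 0.45, 3), (240, 0.38, 2), (600, 0.42, 1),  # ordinary
    (15, 0.05, 300), (20, 0.04, 450), (12, 0.06, 380), (25, 0.05, 220),              # robocalls
]
lo, hi = fit_scaler(calls)
points = [scale(c, lo, hi) for c in calls]
assign, centroids = kmeans(points, k=2)

# A human operator inspects call 5, marks it as SPIT, and its whole cluster is labelled SPIT.
spit_cluster = assign[5]


def is_spit(features):
    """Classify a new call by its nearest cluster centroid."""
    p = scale(features, lo, hi)
    nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
    return nearest == spit_cluster


print(is_spit((18, 0.05, 350)))  # True
print(is_spit((360, 0.40, 2)))   # False
```

The operator labels a single example, and that label spreads to every call in the same cluster. This is the labor-saving idea behind semi-supervised approaches.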
SPIT detection and mitigation can also be based solely on the caller's audio data. [10] [11] This approach uses audio identification techniques, similar to music identification, to detect calls with identical audio content, even after degradations such as noise or different audio codecs. A robust acoustic fingerprint (perceptual hash) is derived from spectral parameters of the audio, and replayed calls are identified by comparing fingerprints. [12] A prototype solution has been developed within the VIAT project.
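The fingerprint comparison can be sketched in the spirit of the Haitsma and Kalker (Philips) audio hash. Each frame's spectrum is split into bands. One bit is derived from the sign of the change in energy difference between adjacent bands across consecutive frames. Two calls are then compared by the fraction of bits that differ. The frame size, the number of bands, the naive DFT, and the synthetic test signals are assumptions; the VIAT prototype and [12] may use different parameters.

```python
import cmath
import math
import random

FRAME = 64  # samples per frame (assumed)
BANDS = 5   # frequency bands per frame (assumed); gives BANDS - 1 bits per frame

# Precomputed DFT basis for bins 1 .. FRAME/2 - 1 (DC and Nyquist skipped).
_BASIS = [[cmath.exp(-2j * math.pi * k * t / FRAME) for t in range(FRAME)]
          for k in range(1, FRAME // 2)]


def band_energies(frame):
    """Energy of each of BANDS contiguous bands of the frame's power spectrum."""
    power = [abs(sum(x * w for x, w in zip(frame, row))) ** 2 for row in _BASIS]
    width = len(power) // BANDS
    return [sum(power[b * width:(b + 1) * width]) for b in range(BANDS)]


def fingerprint(samples):
    """One bit per adjacent band pair per frame: sign of the change in band-energy difference."""
    frames = [samples[i:i + FRAME] for i in range(0, len(samples) - FRAME + 1, FRAME)]
    energies = [band_energies(f) for f in frames]
    bits = []
    for prev, cur in zip(energies, energies[1:]):
        for m in range(BANDS - 1):
            delta = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits.append(1 if delta > 0 else 0)
    return bits


def bit_error_rate(a, b):
    """Fraction of differing bits; a low value suggests the same audio was replayed."""
    n = min(len(a), len(b))
    return sum(x != y for x, y in zip(a[:n], b[:n])) / n


rng = random.Random(1)
recording = [rng.uniform(-1.0, 1.0) for _ in range(FRAME * 100)]
replayed = [x + rng.gauss(0.0, 0.01) for x in recording]  # same audio plus slight noise
unrelated = [rng.uniform(-1.0, 1.0) for _ in range(FRAME * 100)]

reference = fingerprint(recording)
print(bit_error_rate(reference, fingerprint(replayed)))   # replayed copy
print(bit_error_rate(reference, fingerprint(unrelated)))  # different audio
```

A practical system would use overlapping frames, an FFT, and many more bands. It would also search for the best time alignment between fingerprints instead of comparing them from the first frame.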
Researchers Azad and Morla (2013) studied a more accurate and secure approach to detecting spam callers. They proposed a scheme that detects spam calls without user interaction and without reviewing the content of the message beforehand. Their experiments showed that the system effectively detected spammers calling legitimate users, without accessing private information or requiring user interaction. [13]
Little information is available about implementations of SPIT mitigation measures by telephone companies. Some smartphone vendors have started to flag possible spam on incoming calls, such as Google on its Nexus Android devices [14] and Apple in its iOS 10 release. [15] SPIT is generally not yet considered as critical a problem as email spam. An automated analysis of the call signaling flow can help to discover SPIT, and commercial VoIP software for communication service providers, e.g. Acme Packet Palladion, may include behavioral analysis. Relevant indicators of SPIT include a high call attempt frequency, many concurrent calls, a low call completion rate, and a short average call duration.
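The indicators listed above can be computed from call detail records. The sketch is hypothetical throughout: the `CallRecord` fields, the thresholds, and the rule that a caller is flagged when at least three of the four indicators fire are illustrative assumptions. They are not taken from any vendor product.

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative thresholds, not values from any product or study.
MAX_ATTEMPTS_PER_HOUR = 100
MAX_CONCURRENT = 5
MIN_COMPLETION_RATIO = 0.3
MIN_AVG_DURATION_S = 30.0


@dataclass
class CallRecord:
    caller: str
    start: float    # seconds; for answered calls, assumed to be the answer time
    end: float      # seconds; hang-up, rejection, or timeout
    answered: bool


def max_concurrency(calls):
    """Peak number of simultaneous calls, via a sweep over start/end events."""
    # At equal times, (t, -1) sorts before (t, +1), so a call ending exactly
    # when another starts is not counted as overlapping.
    events = sorted([(c.start, 1) for c in calls] + [(c.end, -1) for c in calls])
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def spit_indicators(calls, window_hours):
    answered = [c for c in calls if c.answered]
    avg_duration = sum(c.end - c.start for c in answered) / len(answered) if answered else 0.0
    return {
        "high_attempt_rate": len(calls) / window_hours > MAX_ATTEMPTS_PER_HOUR,
        "many_concurrent": max_concurrency(calls) > MAX_CONCURRENT,
        "low_completion": len(answered) / len(calls) < MIN_COMPLETION_RATIO,
        "short_calls": avg_duration < MIN_AVG_DURATION_S,
    }


def flag_callers(records, window_hours=1.0, min_indicators=3):
    by_caller = defaultdict(list)
    for record in records:
        by_caller[record.caller].append(record)
    return {caller for caller, calls in by_caller.items()
            if sum(spit_indicators(calls, window_hours).values()) >= min_indicators}


# One hour of hypothetical records: a dialer placing bursts of 10 parallel calls,
# and an ordinary subscriber.
records = [CallRecord("dialer", (i // 10) * 18.0, (i // 10) * 18.0 + 10.0, i % 5 == 0)
           for i in range(200)]
records += [CallRecord("alice", t, t + 300.0, True) for t in (0.0, 1200.0, 2400.0)]
print(flag_callers(records))  # {'dialer'}
```

Requiring several indicators at once, rather than any single one, is a simple way to reduce false positives. A call center, for example, may place many concurrent calls and still complete most of them.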
The Real-time Transport Protocol (RTP) is a network protocol for delivering audio and video over IP networks. RTP is used in communication and entertainment systems that involve streaming media, such as telephony, video teleconference applications including WebRTC, television services and web-based push-to-talk features.
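As an illustration of RTP's packet format, this sketch parses the 12-byte fixed header defined in RFC 3550 and any CSRC identifiers that follow it. The fixed header carries the version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, and SSRC fields. Header extensions and padding are not processed, and the `RtpHeader` class is a hypothetical helper.

```python
import struct
from dataclasses import dataclass, field
from typing import List


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: List[int] = field(default_factory=list)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the RFC 3550 fixed header plus CSRC list (extensions and padding not handled)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=list(struct.unpack(f"!{csrc_count}I", packet[12:end])),
    )


# Version 2, marker set, payload type 0 (PCMU), followed by 160 bytes of audio.
packet = struct.pack("!BBHII", 0x80, 0x80 | 0, 1234, 160000, 0xDEADBEEF) + bytes(160)
print(parse_rtp_header(packet))
```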
The Session Initiation Protocol (SIP) is a signaling protocol used for initiating, maintaining, and terminating communication sessions that include voice, video and messaging applications. SIP is used in Internet telephony, in private IP telephone systems, as well as mobile phone calling over LTE (VoLTE).
Voice over Internet Protocol (VoIP), also called IP telephony, is a method and group of technologies for the delivery of voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. The terms Internet telephony, broadband telephony, and broadband phone service specifically refer to the provisioning of communications services over the Internet, rather than via the public switched telephone network (PSTN), also known as plain old telephone service (POTS).
Telephone number mapping is a system of unifying the international telephone number system of the public switched telephone network with the Internet addressing and identification name spaces. Internationally, telephone numbers are systematically organized by the E.164 standard, while the Internet uses the Domain Name System (DNS) for linking domain names to IP addresses and other resource information. Telephone number mapping systems provide facilities to determine applicable Internet communications servers responsible for servicing a given telephone number using DNS queries.
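The mapping from an E.164 number to its ENUM domain is mechanical. Non-digit characters are removed, the digits are reversed and separated by dots, and the `e164.arpa` suffix is appended. The resulting name is then queried in DNS for NAPTR records. The sketch below only builds the domain name and performs no DNS lookup; the function name is hypothetical.

```python
import re


def enum_domain(e164_number: str, suffix: str = "e164.arpa") -> str:
    """Build the ENUM domain name for an E.164 number (no DNS query is made)."""
    if not e164_number.startswith("+"):
        raise ValueError("expected a full E.164 number starting with '+'")
    digits = re.sub(r"\D", "", e164_number)  # drop '+', spaces, dashes, etc.
    if not 1 <= len(digits) <= 15:
        raise ValueError("E.164 numbers contain at most 15 digits")
    return ".".join(reversed(digits)) + "." + suffix


print(enum_domain("+1-555-123-4567"))  # 7.6.5.4.3.2.1.5.5.5.1.e164.arpa
```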
Inter-Asterisk eXchange (IAX) is a communications protocol native to the Asterisk private branch exchange (PBX) software, and is supported by a few other softswitches, PBX systems, and softphones. It is used for transporting VoIP telephony sessions between servers and to terminal devices.
Asterisk is a software implementation of a private branch exchange (PBX). In conjunction with suitable telephony hardware interfaces and network applications, Asterisk is used to establish and control telephone calls between telecommunication endpoints, such as customary telephone sets, destinations on the public switched telephone network (PSTN), and devices or services on voice over Internet Protocol (VoIP) networks. Its name comes from the asterisk (*) symbol for a signal used in dual-tone multi-frequency (DTMF) dialing.
A session border controller (SBC) is a network element deployed to protect SIP based voice over Internet Protocol (VoIP) networks.
An Internet telephony service provider (ITSP) offers digital telecommunications services based on Voice over Internet Protocol (VoIP) that are provisioned via the Internet.
Direct inward dialing (DID), also called direct dial-in (DDI) in Europe and Oceania, is a telecommunication service offered by telephone companies to subscribers who operate a private branch exchange (PBX) system. The feature provides service for multiple telephone numbers over one or more analog or digital physical circuits to the PBX, and transmits the dialed telephone number to the PBX so that a PBX extension is directly accessible for an outside caller, possibly bypassing an auto-attendant.
A VoIP phone or IP phone uses voice over IP technologies for placing and transmitting telephone calls over an IP network, such as the Internet. This is in contrast to a standard phone which uses the traditional public switched telephone network (PSTN).
ZRTP is a cryptographic key-agreement protocol to negotiate the keys for encryption between two end points in a Voice over IP (VoIP) telephony call based on the Real-time Transport Protocol. It uses Diffie–Hellman key exchange and the Secure Real-time Transport Protocol (SRTP) for encryption. ZRTP was developed by Phil Zimmermann, with help from Bryce Wilcox-O'Hearn, Colin Plumb, Jon Callas and Alan Johnston and was submitted to the Internet Engineering Task Force (IETF) by Zimmermann, Callas and Johnston on March 5, 2006 and published on April 11, 2011 as RFC 6189.
The SIP URI scheme is a Uniform Resource Identifier (URI) scheme for the Session Initiation Protocol (SIP) multimedia communications protocol. A SIP address is a URI that addresses a specific telephone extension on a voice over IP system. The address could be an extension on a private branch exchange or an E.164 telephone number dialled through a specific gateway. The scheme was defined in RFC 3261.
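A simplified SIP URI parser shows the main parts of the scheme defined in RFC 3261: an optional user part, the host and optional port, and `;`-separated URI parameters such as `transport` or `user=phone`. The second example corresponds to the case described above, an E.164 number dialled through a gateway. This is only a sketch. It ignores escaped characters, `;` or `?` inside the user part, and header fields after `?`, and the `SipUri` class is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SipUri:
    scheme: str
    user: Optional[str]
    host: str
    port: Optional[int]
    params: Dict[str, Optional[str]] = field(default_factory=dict)


def parse_sip_uri(uri: str) -> SipUri:
    scheme, colon, rest = uri.partition(":")
    scheme = scheme.lower()
    if not colon or scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    rest = rest.partition("?")[0]  # header fields are ignored
    hostport_part, *param_parts = rest.split(";")
    userinfo, at, hostport = hostport_part.rpartition("@")
    user = userinfo.split(":", 1)[0] if at else None  # drop any password
    if hostport.startswith("["):  # IPv6 reference, e.g. [2001:db8::1]:5060
        close = hostport.index("]")
        host, port_text = hostport[:close + 1], hostport[close + 2:]
    else:
        host, _, port_text = hostport.partition(":")
    params = {}
    for part in param_parts:
        name, eq, value = part.partition("=")
        params[name.lower()] = value if eq else None  # flag parameters such as ;lr
    return SipUri(scheme, user, host, int(port_text) if port_text else None, params)


print(parse_sip_uri("sip:alice@atlanta.example.com:5060;transport=tcp"))
print(parse_sip_uri("sip:+15551234567@gateway.example.com;user=phone"))
```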
Text over IP is a means of providing a real-time text (RTT) service that operates over IP-based networks. It complements Voice over IP (VoIP) and Video over IP.
Peer-to-peer SIP (P2P-SIP) is an implementation of a distributed voice over Internet Protocol (VoIP) or instant messaging communications application using a peer-to-peer (P2P) architecture in which session control between communication end points is facilitated with the Session Initiation Protocol (SIP).
A softphone is a software program for making telephone calls over the Internet using a general purpose computer rather than dedicated hardware. The softphone can be installed on a piece of equipment such as a desktop, mobile device, or other computer and allows the user to place and receive calls without requiring an actual telephone set. Often, a softphone is designed to behave like a traditional telephone, sometimes appearing as an image of a handset, with a display panel and buttons with which the user can interact. A softphone is usually used with a headset connected to the sound card of the PC or with a USB phone.
Federated VoIP is a form of packetized voice telephony that uses voice over IP between autonomous domains in the public Internet without the deployment of central virtual exchange points or switching centers for traffic routing. Federated VoIP uses decentralized addressing systems, such as ENUM, for location and identity information of participants and implements secure, trusted communications (TLS) for identity verification.
The media gateway control protocol architecture is a methodology of providing telecommunication services using decomposed multimedia gateways for transmitting telephone calls between an Internet Protocol network and traditional analog facilities of the public switched telephone network (PSTN). The architecture was originally defined in RFC 2805 and has been used in several prominent voice over IP (VoIP) protocol implementations, such as the Media Gateway Control Protocol (MGCP) and Megaco (H.248), both successors to the obsolete Simple Gateway Control Protocol (SGCP).
The Session Initiation Protocol (SIP) is the signaling protocol selected by the 3rd Generation Partnership Project (3GPP) to create and control multimedia sessions with two or more participants in the IP Multimedia Subsystem (IMS), and therefore is a key element in the IMS framework.
STIR/SHAKEN, or SHAKEN/STIR, is a suite of protocols and procedures intended to combat caller ID spoofing on public telephone networks. Caller ID spoofing is used by robocallers to mask their identity or to make it appear the call is from a legitimate source, often a nearby phone number with the same area code and exchange, or from well-known agencies like the Internal Revenue Service or Ontario Provincial Police. This sort of spoofing is common for calls originating from voice-over-IP (VoIP) systems, which can be located anywhere in the world.