IP traceback is any method for reliably determining the origin of a packet on the Internet. The IP protocol does not provide for the authentication of the source IP address of an IP packet, enabling the source address to be falsified in a strategy called IP address spoofing, and creating potential internet security and stability problems.
Use of false source IP addresses allows denial-of-service attacks (DoS) or one-way attacks (where the response from the victim host is so well known that return packets need not be received to continue the attack[ clarification needed ]). IP traceback is critical for identifying sources of attacks and instituting protection measures for the Internet. Most existing approaches to this problem have been tailored toward DoS attack detection. Such solutions require high numbers of packets to converge on the attack path(s).
Savage et al. [1] suggested probabilistically marking packets as they traverse routers through the Internet. They propose that the router mark the packet with either the router’s IP address or the edges of the path that the packet traversed to reach the router.
For the first alternative, marking packets with the router's IP address, analysis shows that in order to gain the correct attack path with 95% accuracy as many as 294,000 packets are required. The second approach, edge marking, requires that the two nodes that make up an edge mark the path with their IP addresses along with the distance between them. This approach would require more state information in each packet than simple node marking but would converge much faster. They suggest three ways to reduce the state information of these approaches into something more manageable. [1]
The first approach is to XOR each node forming an edge in the path with each other. Node a inserts its IP address into the packet and sends it to b. Upon being detected at b (by detecting a 0 in the distance), b XORs its address with the address of a. This new data entity is called an edge id and reduces the required state for edge sampling by half. Their next approach is to further take this edge id and fragment it into k smaller fragments. Then, randomly select a fragment and encode it, along with the fragment offset so that the correct corresponding fragment is selected from a downstream router for processing. When enough packets are received, the victim can reconstruct all of the edges the series of packets traversed (even in the presence of multiple attackers). [1]
Due to the high number of combinations required to rebuild a fragmented edge id, the reconstruction of such an attack graph is computationally intensive according to research by Song and Perrig. Furthermore, the approach results in a large number of false positives. As an example, with only 25 attacking hosts in a DDoS attack the reconstruction process takes days to build and results in thousands of false positives. [2]
Accordingly, Song and Perrig propose the following traceback scheme: instead of encoding the IP address interleaved with a hash, they suggest encoding the IP address into an 11 bit hash and maintain a 5 bit hop count, both stored in the 16-bit fragment ID field. This is based on the observation that a 5-bit hop count (32 max hops) is sufficient for almost all Internet routes. Further, they suggest that two different hashing functions be used so that the order of the routers in the markings can be determined. Next, if any given hop decides to mark it first checks the distance field for a 0, which implies that a previous router has already marked it. If this is the case, it generates an 11-bit hash of its own IP address and then XORs it with the previous hop. If it finds a non-zero hop count it inserts its IP hash, sets the hop count to zero and forwards the packet on. If a router decides not to mark the packet it merely increments the hop count in the overloaded fragment id field. [2]
Song and Perrig identify that this is not robust enough against collisions and thus suggest using a set of independent hash functions, randomly selecting one, and then hashing the IP along with a FID or function id and then encoding this. They state that this approach essentially reduces the probability of collision to (1/(211)m). For further details see Song and Perrig. [2]
Belenky and Ansari, outline a deterministic packet marking scheme. They describe a more realistic topology for the Internet – that is composed of LANs and ASs with a connective boundary – and attempt to put a single mark on inbound packets at the point of network ingress. Their idea is to put, with random probability of .5, the upper or lower half of the IP address of the ingress interface into the fragment id field of the packet, and then set a reserve bit indicating which portion of the address is contained in the fragment field. By using this approach they claim to be able to obtain 0 false positives with .99 probability after only 7 packets. [3]
Rayanchu and Barua provide another spin on this approach (called DERM). Their approach is similar in that they wish to use and encoded IP address of the input interface in the fragment id field of the packet. Where they differ from Belenky and Ansari is that they wish to encode the IP address as a 16-bit hash of that IP address. Initially they choose a known hashing function. They state that there would be some collisions if there were greater than 2^16 edge routers doing the marking. [4]
They attempt to mitigate the collision problem by introducing a random distributed selection of a hash function from the universal set, and then applying it to the IP address. In either hashing scenario, the source address and the hash are mapped together in a table for later look-up along with a bit indicating which portion of the address they have received. Through a complicated procedure and a random hash selection, they are capable of reducing address collision. By using a deterministic approach they reduce the time for their reconstruction procedure for their mark (the 16-bit hash). However, by encoding that mark through hashing they introduce the probability of collisions, and thus false-positives. [4]
Shokri and Varshovi introduced the concepts of Dynamic Marking and Mark-based Detection with "Dynamic Deterministic Packet Marking," (DDPM). In dynamic marking it is possible to find the attack agents in a large scale DDoS network. In the case of a DRDoS it enables the victim to trace the attack one step further back to the source, to find a master machine or the real attacker with only a few numbers of packets. The proposed marking procedure increases the possibility of DRDoS attack detection at the victim through mark-based detection. In the mark-based method, the detection engine takes into account the marks of the packets to identify varying sources of a single site involved in a DDoS attack. This significantly increases the probability of detection. In order to satisfy the end-to-end arguments approach, fate-sharing and also respect to the need for scalable and applicable schemes, only edge routers implement a simple marking procedure. The fairly negligible amount of delay and bandwidth overhead added to the edge routers make the DDPM implementable. [5]
S. Majumdar, D. Kulkarni and C. Ravishankar proposes a new method to traceback the origin of DHCP packets in ICDCN 2011. Their method adds a new DHCP option that contains the MAC address and the ingress port of the edge switch which had received the DHCP packet. This new option will be added to the DHCP packet by the edge switch. This solution follows DHCP RFCs. Previous IP traceback mechanisms have overloaded IP header fields with traceback information and thus are violating IP RFCs. Like other mechanisms, this paper also assumes that the network is trusted. The paper presents various performance issues in routers/switches that were considered while designing this practical approach. However, this approach is not applicable to any general IP packet. [6]
With router-based approaches, the router is charged with maintaining information regarding packets that pass through it. For example, Sager proposes to log packets and then data mine them later. This has the benefit of being out of band and thus not hindering the fast path.[ citation needed ]
Snoeren et al. propose marking within the router. The idea proposed in their paper is to generate a fingerprint of the packet, based upon the invariant portions of the packet (source, destination, etc.) and the first 8 bytes of payload (which is unique enough to have a low probability of collision). More specifically, m independent simple hash functions each generate an output in the range of 2n-1. A bit is then set at the index generated to create a fingerprint when combined with the output of all other hash functions. All fingerprints are stored in a 2n bit table for later retrieval. The paper shows a simple family of hash functions suitable for this purpose and present a hardware implementation of it. [7]
The space needed at each router is limited and controllable (2n bits). A small n makes the probability of collision of packet hashes (and false identification) higher. When a packet is to be traced back, it is forwarded to originating routers where fingerprint matches are checked. As time passes, the fingerprint information is “clobbered” by hashes generated by other packets. Thus, the selectivity of this approach degrades with the time that has passed between the passage of the packet and the traceback interrogation. [7]
Another known take on the router-based schemes comes from Hazeyama et al. In their approach, they wish to integrate the SPIE approach as outlined by Snoeren, [7] with their approach of recording the layer 2 link-id along with the network ID (VLAN or true ID), the MAC address of the layer 2 switch that received the packet and the link id it came in on. This information is then put into two look-up tables – both containing the switch (layer 2 router) MAC id for look-up. They rely on the MAC:port tuple as a method of tracing a packet back (even if the MAC address has been spoofed). [8]
To help mitigate the problem of storage limitations they use Snoeren’s hashing approach and implementation (SPIE) – modifying it to accept their information for hashing. They admit their algorithm is slow (O(N2)) and with only 3.3 million packet hashes being stored the approximate time before the digest tables are invalid is 1 minute. This dictates that any attack response must be real-time – a possibility only on single-administrative LAN domains. [8]
The ICMP traceback scheme Steven M. Bellovin proposes probabilistically sending an ICMP traceback packet forward to the destination host of an IP packet with some low probability. Thus, the need to maintain state in either the packet or the router is obviated. Furthermore, the low probability keeps the processing overhead as well as the bandwidth requirement low. Bellovin suggests that the selection also be based on pseudo-random numbers to help block attempts to time attack bursts. The problem with this approach is that routers commonly block ICMP messages because of security issues associated with them.
In this type of solution, an observer tracks an existing attack flow by examining incoming and outgoing ports on routers starting from the host under attack. Thus, such a solution requires having privileged access to routers along the attack path.
To bypass this restriction and automate this process, Stone proposes routing suspicious packets on an overlay network using ISP edge routers. By simplifying the topology, suspicious packets can easily be re-routed to a specialized network for further analysis.
By nature of DoS, any such attack will be sufficiently long lived for tracking in such a fashion to be possible. Layer-three topology changes, while hard to mask to a determined attacker, have the possibility of alleviating the DoS until the routing change is discovered and subsequently adapted to. Once the attacker has adapted, the re-routing scheme can once again adapt and re-route; causing an oscillation in the DoS attack; granting some ability to absorb the impact of such an attack.
Hal Burch and William Cheswick propose a controlled flooding of links to determine how this flooding affects the attack stream. Flooding a link will cause all packets, including packets from the attacker, to be dropped with the same probability. We can conclude from this that if a given link were flooded, and packets from the attacker slowed, then this link must be part of the attack path. Then recursively upstream routers are “coerced” into performing this test until the attack path is discovered. [9]
The traceback problem is complicated because of spoofed packets. Thus, a related effort is targeted towards preventing spoofed packets; known as ingress filtering. Ingress Filtering restricts spoofed packets at ingress points to the network by tracking the set of legitimate source networks that can use this router.
Park and Lee present an extension of Ingress Filtering at layer 3. They present a means of detecting false packets, at least to the subnet, by essentially making use of existing OSPF routing state to have routers make intelligent decisions about whether or not a packet should be routed.[ citation needed ]
The Dynamic Host Configuration Protocol (DHCP) is a network management protocol used on Internet Protocol (IP) networks for automatically assigning IP addresses and other communication parameters to devices connected to the network using a client–server architecture.
An Internet Protocol address is a numerical label such as 192.0.2.1 that is assigned to a device connected to a computer network that uses the Internet Protocol for communication. IP addresses serve two main functions: network interface identification, and location addressing.
The Internet Control Message Protocol (ICMP) is a supporting protocol in the Internet protocol suite. It is used by network devices, including routers, to send error messages and operational information indicating success or failure when communicating with another IP address. For example, an error is indicated when a requested service is not available or that a host or router could not be reached. ICMP differs from transport protocols such as TCP and UDP in that it is not typically used to exchange data between systems, nor is it regularly employed by end-user network applications.
Internet Protocol version 4 (IPv4) is the first version of the Internet Protocol (IP) as a standalone specification. It is one of the core protocols of standards-based internetworking methods in the Internet and other packet-switched networks. IPv4 was the first version deployed for production on SATNET in 1982 and on the ARPANET in January 1983. It is still used to route most Internet traffic today, even with the ongoing deployment of Internet Protocol version 6 (IPv6), its successor.
Internet Protocol version 6 (IPv6) is the most recent version of the Internet Protocol (IP), the communications protocol that provides an identification and location system for computers on networks and routes traffic across the Internet. IPv6 was developed by the Internet Engineering Task Force (IETF) to deal with the long-anticipated problem of IPv4 address exhaustion, and was intended to replace IPv4. In December 1998, IPv6 became a Draft Standard for the IETF, which subsequently ratified it as an Internet Standard on 14 July 2017.
The Internet Protocol (IP) is the network layer communications protocol in the Internet protocol suite for relaying datagrams across network boundaries. Its routing function enables internetworking, and essentially establishes the Internet.
Multiprotocol Label Switching (MPLS) is a routing technique in telecommunications networks that directs data from one node to the next based on labels rather than network addresses. Whereas network addresses identify endpoints, the labels identify established paths between endpoints. MPLS can encapsulate packets of various network protocols, hence the multiprotocol component of the name. MPLS supports a range of access technologies, including T1/E1, ATM, Frame Relay, and DSL.
In computer networking, IP address spoofing or IP spoofing is the creation of Internet Protocol (IP) packets with a false source IP address, for the purpose of impersonating another computing system.
Differentiated services or DiffServ is a computer networking architecture that specifies a mechanism for classifying and managing network traffic and providing quality of service (QoS) on modern IP networks. DiffServ can, for example, be used to provide low-latency to critical network traffic such as voice or streaming media while providing best-effort service to non-critical services such as web traffic or file transfers.
IP fragmentation is an Internet Protocol (IP) process that breaks packets into smaller pieces (fragments), so that the resulting pieces can pass through a link with a smaller maximum transmission unit (MTU) than the original packet size. The fragments are reassembled by the receiving host.
In Internet networking, a private network is a computer network that uses a private address space of IP addresses. These addresses are commonly used for local area networks (LANs) in residential, office, and enterprise environments. Both the IPv4 and the IPv6 specifications define private IP address ranges.
An overlay network is a computer network that is layered on top of another network. The concept of overlay networking is distinct from the traditional model of OSI layered networks, and almost always assumes that the underlay network is an IP network of some kind.
In computer networking, ingress filtering is a technique used to ensure that incoming packets are actually from the networks from which they claim to originate. This can be used as a countermeasure against various spoofing attacks where the attacker's packets contain fake IP addresses. Spoofing is often used in denial-of-service attacks, and mitigating these is a primary application of ingress filtering.
In public-key cryptography, a public key fingerprint is a short sequence of bytes used to identify a longer public key. Fingerprints are created by applying a cryptographic hash function to a public key. Since fingerprints are shorter than the keys they refer to, they can be used to simplify certain key management tasks. In Microsoft software, "thumbprint" is used instead of "fingerprint."
A forwarding information base (FIB), also known as a forwarding table or MAC table, is most commonly used in network bridging, routing, and similar functions to find the proper output network interface controller to which the input interface should forward a packet. It is a dynamic table that maps MAC addresses to ports. It is the essential mechanism that separates network switches from Ethernet hubs. Content-addressable memory (CAM) is typically used to efficiently implement the FIB, thus it is sometimes called a CAM table.
IEEE 802.1aq is an amendment to the IEEE 802.1Q networking standard which adds support for Shortest Path Bridging (SPB). This technology is intended to simplify the creation and configuration of Ethernet networks while enabling multipath routing.
An Internet Protocol version 6 address is a numeric label that is used to identify and locate a network interface of a computer or a network node participating in a computer network using IPv6. IP addresses are included in the packet header to indicate the source and the destination of each packet. The IP address of the destination is used to make decisions about routing IP packets to other networks.
An IPv6 packet is the smallest message entity exchanged using Internet Protocol version 6 (IPv6). Packets consist of control information for addressing and routing and a payload of user data. The control information in IPv6 packets is subdivided into a mandatory fixed header and optional extension headers. The payload of an IPv6 packet is typically a datagram or segment of the higher-level transport layer protocol, but may be data for an internet layer or link layer instead.
SCION is a Future Internet architecture that aims to offer high availability and efficient point-to-point packet delivery with network path selection, even in the presence of actively malicious network operators and devices. It has been developed by researchers at ETH Zurich since 2009, is deployed in production networks, and is currently being explored by the IETF Path Aware Networking Research Group.
Deterministic Networking (DetNet) is an effort by the IETF DetNet Working Group to study implementation of deterministic data paths for real-time applications with extremely low data loss rates, packet delay variation (jitter), and bounded latency, such as audio and video streaming, industrial automation, and vehicle control.