P2P caching

Peer-to-peer caching (P2P caching) is a computer network traffic management technology used by Internet Service Providers (ISPs) to accelerate content delivered over peer-to-peer (P2P) networks while reducing related bandwidth costs.

P2P caching is similar in principle to the content caching long used by ISPs to accelerate Web (HTTP) content. A P2P cache temporarily stores popular content that is flowing into an ISP's network. If content requested by a subscriber is available in the cache, the cache satisfies the request from its temporary storage, eliminating the data transfer over expensive transit links and reducing network congestion. This approach may expose ISPs to legal risk, because a significant portion of the files shared on P2P systems infringes copyright. [1]

P2P content responds well to caching because it shows high reuse, with request popularity following a Zipf-like distribution. [2] [3] [4] Different P2P communities exhibit different Zipf parameters, [4] which determine what fraction of files is requested more than once. For example, one P2P community may request 75% of its content multiple times while another may request only 10%.
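The link between popularity skew and cache effectiveness can be illustrated with a short calculation. The sketch below is a minimal illustration in Python; the catalogue size, skew values and cache size are made-up assumptions for the example, not figures from the cited studies. It estimates the best-case hit ratio of a cache that keeps only the most popular objects.

    import numpy as np

    def zipf_hit_ratio(num_objects, skew, cache_fraction):
        # Best-case hit ratio when the cache holds only the most popular objects,
        # assuming the i-th most popular object is requested with probability
        # proportional to 1 / i**skew (a Zipf-like law).
        ranks = np.arange(1, num_objects + 1)
        popularity = 1.0 / ranks ** skew
        popularity /= popularity.sum()
        cached_objects = int(cache_fraction * num_objects)
        return popularity[:cached_objects].sum()

    # Made-up example: a catalogue of 100,000 objects and a cache holding the top 5%.
    for skew in (0.6, 1.0):
        print(f"skew={skew}: hit ratio ~ {zipf_hit_ratio(100_000, skew, 0.05):.2f}")

A community with a more skewed distribution concentrates its requests on fewer objects, so even a modest cache can absorb a large share of its traffic.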

Some P2P caching devices can also accelerate HTTP video streaming traffic from YouTube, Facebook, RapidShare, MegaUpload, Google, AOL Video, MySpace and other web video-sharing sites. [5]

How P2P caching works

P2P caching involves creating a cache, or temporary storage space, for P2P data, using specialized communications hardware, disk storage and associated software. The cache is placed in the ISP's network, either co-located with the Internet transit links, at key aggregation points, or at each cable head-end.

Once a P2P cache is established, the network transparently redirects P2P traffic to the cache, which either serves the file directly or passes the request on to a remote P2P peer while simultaneously caching that data for the next user. How much benefit the caching delivers depends on how similar the content interests of the ISP's customers are. Because the pool of content shared on P2P systems is relatively small compared to the Web, and because users cluster around shared semantic, geographic, and organizational interests, [4] the hit ratio achievable by P2P caching can be significantly higher than that of HTTP/Web caching.[citation needed]
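In outline, the redirection logic reduces to a serve-from-cache-or-fetch-and-store decision for each content request. The following Python sketch is only illustrative; the class and function names are invented for the example and do not describe any particular caching product.

    class P2PCache:
        # Minimal sketch of the serve-or-fetch-and-store flow; names are illustrative.

        def __init__(self):
            self.store = {}  # content identifier (e.g. a hash) -> cached bytes

        def handle_request(self, content_id, fetch_from_remote_peer):
            if content_id in self.store:
                # Cache hit: serve locally, generating no transit traffic.
                return self.store[content_id]
            # Cache miss: fetch from a remote peer over the transit link,
            # keep a copy, and serve it; the next request becomes a local hit.
            data = fetch_from_remote_peer(content_id)
            self.store[content_id] = data
            return data

    cache = P2PCache()
    cache.handle_request("abc123", lambda cid: b"...chunk bytes...")  # miss: fetched remotely
    cache.handle_request("abc123", lambda cid: b"unused")             # hit: served from cache

The essential point is that a miss populates the cache, so subsequent requests for the same content are served locally.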

P2P caching typically works alongside a network traffic-inspection technology called Deep Packet Inspection (DPI). Service providers use DPI to understand what traffic is running across their networks and to classify it so that each type can be handled for the most efficient delivery. DPI products identify P2P packets and pass them to the P2P caching system so that it can cache and accelerate the traffic.
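A greatly simplified view of how DPI feeds the cache is a signature-based classifier that diverts recognised P2P flows to the cache and passes everything else through. The byte signatures and function names below are illustrative placeholders; production DPI systems use far more sophisticated identification.

    # Illustrative byte signatures only; real DPI products use far richer rules.
    P2P_SIGNATURES = [b"\x13BitTorrent protocol", b"GNUTELLA CONNECT"]

    def classify(payload: bytes) -> str:
        # Label a packet payload as P2P or other by simple signature matching.
        return "p2p" if any(sig in payload for sig in P2P_SIGNATURES) else "other"

    def route(payload: bytes, to_cache, to_internet):
        # Divert recognised P2P traffic to the cache; pass everything else through.
        (to_cache if classify(payload) == "p2p" else to_internet)(payload)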

Peerapp Ltd. holds the first patent [6] for P2P caching technology, which was filed in 2000.

The P2P bandwidth problem

In 2008, peer-to-peer traffic was estimated to account for 50% of all Internet traffic, and it was expected to quadruple between 2008 and 2013, reaching 3.3 exabytes per month, the equivalent of 500 million DVDs each month. [7] That trend did not continue, however: by 2016 global P2P traffic had begun to decline, with a forecast 6% decrease between 2016 and 2021. [8] This shift may be explained by the popularization of video-on-demand services, which have so far relied on centralized architectures for data distribution.

Increasing P2P traffic has created problems for ISPs. Networks can become saturated with P2P traffic, creating congestion for other types of Internet use. The cost of carrying P2P traffic is disproportionate to the revenue ISPs earn from these customers because of the flat-rate bandwidth packages commonly sold. To prevent P2P traffic from degrading service for all subscribers, ISPs typically face three choices: adding transit capacity, throttling or blocking P2P traffic, or caching it locally.

Caching relieves the bandwidth demand on critical Internet links and improves the experience for all users: P2P users, whose file sharing is faster when served from the cache, and non-P2P users, who see better performance on links no longer congested by P2P traffic.
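The transit saving can be estimated directly from the cache's byte hit ratio: the bandwidth freed is the P2P share of traffic multiplied by the fraction of those bytes served from the cache. A minimal illustration, with made-up numbers:

    def transit_savings_gbps(link_traffic_gbps, p2p_share, byte_hit_ratio):
        # Transit bandwidth no longer needed because the cache serves those bytes locally.
        return link_traffic_gbps * p2p_share * byte_hit_ratio

    # Made-up example: a 10 Gbit/s transit link, half of it P2P, with a 40% byte
    # hit ratio frees roughly 2 Gbit/s of transit capacity.
    print(transit_savings_gbps(10, 0.5, 0.4))  # 2.0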

The initial adopters of P2P caching have been ISPs in Asia, the Pacific Rim, Latin America, the Caribbean and the Middle East, whose subscribers are heavy users of P2P networks and where providing the additional bandwidth to handle P2P data is very costly due to the expense of international transit links.

P2P caching is expected to become an increasingly essential technology for ISPs and MSOs (multiple system operators) worldwide, particularly with the growing popularity of P2P content among broadband subscribers and the adoption of P2P as a content-distribution strategy by mainstream content providers such as the BBC.

P2P caching implementations

Sources

  1. Jacob, Assaf M.; Argento, Zoe (1 September 2010). "To Cache or Not to Cache – That is the Question; P2P 'System Caching' – The Copyright Dilemma". Whittier Law Review. 31: 421. SSRN 1670289.
  2. Sripanidkulchai, K. "The popularity of Gnutella queries and its implications on scalability". Retrieved 6 January 2012.
  3. Klemm, A.; Lindemann, C.; Vernon, M. K.; Waldhorst, O. P. (2004). Characterizing the query behavior in peer-to-peer file sharing systems (PDF). 4th ACM SIGCOMM Conference on Internet Measurement.
  4. Bandara, H. M. N. Dilum; Jayasumana, A. P. (June 2011). Exploiting communities for enhancing lookup performance in structured P2P systems. IEEE International Conference on Communications (ICC '11). doi:10.1109/icc.2011.5962882.
  5. "Archived copy". Archived from the original on 2010-06-09. Retrieved 2010-05-23.
  6. U.S. Patent No. 7,203,741 B2.
  7. Cisco. "Approaching the Zettabyte Era". Retrieved 6 January 2012.
  8. Cisco. "Cisco Visual Networking Index: Forecast and Methodology, 2016–2021". Retrieved 17 August 2018.
  9. Tyson, Gareth; Mauthe, Andreas; Kaune, Sebastian; Mu, Mu; Plagemann, Thomas. Corelli: A Peer-to-Peer Dynamic Replication Service for Supporting Latency-Dependent Content in Community Networks (PDF). Archived from the original on 2015-06-18. Retrieved 2012-04-26.
