Crowds (anonymity network)

Last updated

Crowds is a proposed anonymity network for anonymous web browsing. The main idea behind Crowds anonymity protocol is to hide each user's communications by routing them randomly within a group of similar users. Neither the collaborating group members nor the end receiver can therefore be sure where in the group the packet originated. Crowds was designed by Michael K. Reiter and Aviel D. Rubin. It defends against internal attackers and a corrupt receiver, but provides no anonymity against a global attacker or a local eavesdropper (see "Crowds: Anonymity For Web Transactions"). Crowds is vulnerable to the predecessor attack; this was discussed in Reiter and Rubin's paper and further expanded in "The Predecessor Attack: An Analysis of a Threat to Anonymous Communications Systems" by Matthew K. Wright, Micah Adler, And Brian Neil Levine. Crowds introduced the concept of users blending into a crowd of computers. [1]

Contents

How crowds works

  1. Each user joins a crowd of other users by registering himself at the blender which is a single server responsible for membership management. When a user registers, all the other members in the crowd are notified. The blender is also responsible for key distribution, as it distributes symmetric keys to individual pairs of jondos, used for encryption and decryption, respectively of packets routed along the virtual paths. [2]
  2. Each user is represented by a jondo on her machine which is an application that runs on a user's computer.
  3. Each jondo either submits a request to the end server or forwards it to a randomly chosen jondo (possibly itself). Other jondo tasks are to strip out any personal information such as cookies, identifying header fields.
  4. A jondo cannot tell if a request is initiated by the previous jondo or one before it.
  5. Request and reply follow the same virtual paths which are constructed using an algorithm involving probabilities. The virtual paths are torn down and reconstructed on a regular basis to allow anonymity for newly added members.

Definitions

Crowds uses and defines the following terms:

Sender
The initiator of a message
Receiver
The final recipient of a message
Probable Innocence
The attacker is unable to have greater than 50% confidence that any node initiated the message (a node appears equally likely to have initiated the message as to not have - each user is more likely innocent than not.)
Local Eavesdropper
An attacker that can observe all incoming and outgoing messages for any proper subset of the nodes
Corrupt Node
A node is corrupt if it uses information obtained from forwarding the message to determine the sender
The number of corrupt nodes
The number of nodes ( is the number of good nodes)
The probability of forwarding

Basic Design

Crowds works by making each node seem equally likely to be the initiator of the message. As we said each node joins the network by starting a jondo (from "John Doe"), which is a small process that will forward and receive requests from other users. When the jondo is started all nodes in the network are informed of the new node's entrance, and will begin to select him as a forwarder. To actually send a message a node chooses randomly (with uniform probability) from all nodes in the network and forwards the message to them. Upon receiving the message the node flips a biased coin (with probability ) and if it lands heads forwards it to another random node, otherwise it forwards it to the final destination. Each node when forwarding to another node records the predecessor and in this way a tunnel is built, this is used for the communication between the sender and the receiver.

The algorithm on each machine

OnReceive(Node P, Message M)

  1. Flip biased coin ()
    1. If Heads Then Select a uniformly random node and forward to them
    2. Else Forward to destination
  2. Record P so that a tunnel can be built

Security analysis

We consider the question of what information an attacker can learn about the senders and receivers of web transactions, given the mechanisms of Crowds we described.

Local eavesdropper

Recall that every message forwarded on a path, except for the final request to the end server, is encrypted. Thus, while the eavesdropper is able to view any message emanating from the user's computer, it only views a message submitted to the end server if the user's jondo ultimately submits the user's request itself. Since the probability that the user's jondo ultimately submits the request is 1/n where n is the size of the crowd when the path was created. Thus we learn that the probability that the eavesdropper learns the identity of the receiver decreases as a function of crowd size. Moreover, when the user's jondo does not ultimately submit the request, the local eavesdropper sees only the encrypted address of the end server, which we suggest yields receiver anonymity that is (informally) beyond suspicion. (beyond suspicion - no user is more suspicious than other).

Collaborating jondos

Consider a set of collaborating corrupted jondos in the crowd. Because each jondo can observe plaintext traffic on a path routed through it, any such traffic, including the address of the end server is exposed to this attacker. The question we consider here is if the attacker can determine who initiated the path. The goal of the collaborators is to determine the member that initiated the path. We now analyze how confident the collaborators can be that their immediate predecessor is in fact the path initiator:

  1. Let Hk, k >= 1, denote the event that the first collaborator on the path occupies the kth position on the path, where the initiator itself occupies the 0th position (and possibly others).
  2. Let define Hk+ = Hk or Hk+1 or Hk+2 or . . . .
  3. Let I denote the event that the first collaborator on the path is immediately preceded on the path by the path initiator.

Note that H1 => I, but the converse I => H1 is not true, because the initiating jondo might appear on the path multiple times. There can be a case where path is composed as follow:

initiator jondo(0 - position) ----> jondo(1 - position) ---->
initiator jondo(2 - position) ----> Collaborating jondo(3 - position)

Note that the first collaborator on the path is in the third position.

4.Given this notation, the collaborators now hope to determine:

P(I|H1+) - given that a collaborator is on the path, what is the probability that the path initiator is the first collaborator's immediate predecessor?

Definition:
The path initiator has probable innocence if P(I|H1+)<=1\2.

In order to yield probable innocence for the path initiator, certain conditions must be met in our system. In particular, let pf > 1/2 (the probability of forwarding in the system.)

(c - number of collaborators in the crowd)

(n - total number of crowd members when the path is formed)

The theorem below gives a sufficient condition on pf, c, and n to ensure probable innocence for the path initiator.

Theorem:The path initiator has probable innocence against c collaborators in case

Proof: we want to show that pf > 1/2   if  

note that:

P(Hi) =

in order for the first collaborator to be in the ith position on the path, the path must first wander to i-1 noncollaborators each time with probability of , each of which chooses to forward the path with probability pf, and then to a collaborator with probability .

The next two facts follow immediately from this

P(H1+) =

P(H2+) =

P(H1) =

P(I|H1) =

P(I|H2) =

Now, P(I) can be captured as

P(I) = P(H1)P(I|H1) + P(H2+)P(IH2+) =

since I=>H1+

P(I|H1+)= = =

so, if

then P(I|H1+)<=1\2

E.g. if pf=3\4, then probable innocence is guaranteed as long as n >= 3(c + 1).

Static paths

Dynamic paths tends to decrease the anonymity properties provided by the system against collaborating jondos. The reason is that the probable innocence vanishes if the collaborators are able to link many distinct paths as being initiated by the same jondo. Collaborating jondos might be able to link paths initiated by the same unknown jondo based on related path content or timing of communication on paths. To prevent this, we made paths static, so the attacker simply does not have multiple paths to link to the same jondo.

Embedded images and timing attacks

An HTML page can include a URL (e.g., the address of an image) that, when the page is retrieved, causes the user's browser to automatically issue another request. It is the immediate nature of these requests that poses the greatest opportunity for timing attacks by collaborating jondos. The first collaborating jondo on a path, upon returning a web page on that path containing a URL that will be automatically retrieved, can time the duration until it receives the request for that URL. If the duration is sufficiently short, then this could reveal that the collaborator's immediate predecessor is the initiator of the request.

Prevention

When a jondo receives an HTML reply to a request that it either received directly from a user's browser or submitted directly to an end server, it parses the HTML page to identify all URLs that the user's browser will automatically request as a result of receiving this reply. The last jondo on the path requests these URLs and sends them back along the same path on which the original request was received. The user's jondo, upon receiving requests for these URLs from the user's browser, does not forward these requests on the path, but rather simply waits for the URLs contents to arrive on the path and then feeds them to the browser. In this way, other jondos on the path never see the requests that are generated by the browser, and thus cannot glean timing information from them.

Scale

The measure of scale that we evaluate is the expected total number of appearances that each jondo makes on all paths at any point in time. For example, if a jondo occupies two positions on one path and one position on another, then it makes a total of three appearances on these paths.

Theorem : In a crowd of size n, the expected total number of appearances that any jondo makes on all paths is

Each jondo's expected number of appearances on paths is virtually constant as a function of the size of the crowd. This suggests that crowds should be able to grow quite large.

Attacks

Crowds provides perfect anonymity against a corrupt receiver (i.e. see Degree of anonymity) as all members appear equally likely to have been the initiator. As we showed against collaborating corrupt nodes Crowds provides probable innocence as long as (see the paper for the derivation of this), and provides a degree of anonymity . Against the predecessor attack Crowds succumbs in ; this attack works by a corrupt node retaining the previous hop in the path, as this will be the sender more than any other node over the rounds of rebuilding the network it will become apparent who the initiator is. Reiter and Rubin mention this and recommend long (and if possible infinite) time between path reformations (caused when a node in the path leaves the network). Crowds is unable to protect against a global eavesdropper as it cannot use encryption on the links, this is because each node in Crowds is able to communicate with every other node (a fully connected graph), because of this setting up symmetric keys requires pairwise keys; this is too large of a number to be feasible. Against a local eavesdropper again Crowds provides no protection as the eavesdropper will see a message coming out of a node that did not enter, and this positively identifies the node as the sender.

See also

Related Research Articles

The erlang is a dimensionless unit that is used in telephony as a measure of offered load or carried load on service-providing elements such as telephone circuits or telephone switching equipment. A single cord circuit has the capacity to be used for 60 minutes in one hour. Full utilization of that capacity, 60 minutes of traffic, constitutes 1 erlang.

<span class="mw-page-title-main">Entropy (information theory)</span> Expected amount of information needed to specify the output of a stochastic data source

In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable , which takes values in the alphabet and is distributed according to :

<span class="mw-page-title-main">Queueing theory</span> Mathematical study of waiting lines, or queues

Queueing theory is the mathematical study of waiting lines, or queues. A queueing model is constructed so that queue lengths and waiting time can be predicted. Queueing theory is generally considered a branch of operations research because the results are often used when making business decisions about the resources needed to provide a service.

<span class="mw-page-title-main">GNUnet</span> Framework for decentralized, peer-to-peer networking which is part of the GNU Project

GNUnet is a software framework for decentralized, peer-to-peer networking and an official GNU package. The framework offers link encryption, peer discovery, resource allocation, communication over many transports and various basic peer-to-peer algorithms for routing, multicast and network size estimation.

An anonymous P2P communication system is a peer-to-peer distributed application in which the nodes, which are used to share resources, or participants are anonymous or pseudonymous. Anonymity of participants is usually achieved by special routing overlay networks that hide the physical location of each node from other participants.

<span class="mw-page-title-main">Naive Bayes spam filtering</span>

Naive Bayes classifiers are a popular statistical technique of e-mail filtering. They typically use bag-of-words features to identify email spam, an approach commonly used in text classification.

Kademlia is a distributed hash table for decentralized peer-to-peer computer networks designed by Petar Maymounkov and David Mazières in 2002. It specifies the structure of the network and the exchange of information through node lookups. Kademlia nodes communicate among themselves using UDP. A virtual or overlay network is formed by the participant nodes. Each node is identified by a number or node ID. The node ID serves not only as identification, but the Kademlia algorithm uses the node ID to locate values.

Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes. Belief propagation is commonly used in artificial intelligence and information theory, and has demonstrated empirical success in numerous applications, including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.

In anonymity networks, it is important to be able to measure quantitatively the guarantee that is given to the system. The degree of anonymity is a device that was proposed at the 2002 Privacy Enhancing Technology (PET) conference. Two papers put forth the idea of using entropy as the basis for formally measuring anonymity: "Towards an Information Theoretic Metric for Anonymity", and "Towards Measuring Anonymity". The ideas presented are very similar with minor differences in the final definition of .

<span class="mw-page-title-main">Watts–Strogatz model</span> Method of generating random small-world graphs

The Watts–Strogatz model is a random graph generation model that produces graphs with small-world properties, including short average path lengths and high clustering. It was proposed by Duncan J. Watts and Steven Strogatz in their article published in 1998 in the Nature scientific journal. The model also became known as the (Watts) beta model after Watts used to formulate it in his popular science book Six Degrees.

Cooperative diversity is a cooperative multiple antenna technique for improving or maximising total network channel capacities for any given set of bandwidths which exploits user diversity by decoding the combined signal of the relayed signal and the direct signal in wireless multihop networks. A conventional single hop system uses direct transmission where a receiver decodes the information only based on the direct signal while regarding the relayed signal as interference, whereas the cooperative diversity considers the other signal as contribution. That is, cooperative diversity decodes the information from the combination of two signals. Hence, it can be seen that cooperative diversity is an antenna diversity that uses distributed antennas belonging to each node in a wireless network. Note that user cooperation is another definition of cooperative diversity. User cooperation considers an additional fact that each user relays the other user's signal while cooperative diversity can be also achieved by multi-hop relay networking systems.

<span class="mw-page-title-main">Network science</span> Academic field

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

Random walk closeness centrality is a measure of centrality in a network, which describes the average speed with which randomly walking processes reach a node from other nodes of the network. It is similar to the closeness centrality except that the farness is measured by the expected length of a random walk rather than by the shortest path.

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

Inductive probability attempts to give the probability of future events based on past events. It is the basis for inductive reasoning, and gives the mathematical basis for learning and the perception of patterns. It is a source of knowledge about the world.

In queueing theory, the Engset formula is used to determine the blocking probability of an M/M/c/c/N queue.

Maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen accordingly to the principle of maximum entropy, which says that the probability distribution which best represents the current state of knowledge is the one with largest entropy. While standard random walk chooses for every vertex uniform probability distribution among its outgoing edges, locally maximizing entropy rate, MERW maximizes it globally by assuming uniform probability distribution among all paths in a given graph.

<span class="mw-page-title-main">Configuration model</span>

In network science, the configuration model is a method for generating random networks from a given degree sequence. It is widely used as a reference model for real-life social networks, because it allows the modeler to incorporate arbitrary degree distributions.

<span class="mw-page-title-main">Graph neural network</span> Class of artificial neural networks

A graph neural network (GNN) is a class of artificial neural networks for processing data that can be represented as graphs.

Quantum secret sharing (QSS) is a quantum cryptographic scheme for secure communication that extends beyond simple quantum key distribution. It modifies the classical secret sharing (CSS) scheme by using quantum information and the no-cloning theorem to attain the ultimate security for communications.

References

  1. Fischer-Hubner, Simone (2001) IT-Security and Privacy: Design and use of privacy-enhancing security mechanisms , Springer, p. 134-5
  2. "Enabling Anonymity for the Mobile Internet Using the mCrowds System". Engineering. 2004.

Further reading

  1. ^ Claudia Diaz and Stefaan Seys and Joris Claessens and Bart Preneel (April 2002). Roger Dingledine and Paul Syverson (ed.). "Towards measuring anonymity". Proceedings of Privacy Enhancing Technologies Workshop (PET 2002). Springer-Verlag, LNCS 2482. Archived from the original on 2006-07-10. Retrieved 2005-11-10.
  2. ^ Michael Reiter and Aviel Rubin (June 1998). "Crowds: Anonymity for Web Transactions" (PDF). ACM Transactions on Information and System Security. 1 (1). Archived from the original (PDF) on 2005-12-12. Retrieved 2005-11-23.
  3. ^ Matthew K. Wright and Micah Adler and Brian Neil Levine and Clay Shields (2004). "The predecessor attack: An analysis of a threat to anonymous communications systems" (PDF). ACM Transactions on Information and System Security. ACM Press. 7 (4): 489–522. doi:10.1145/1042031.1042032. S2CID   7711031. Archived from the original (PDF) on 2005-09-24. Retrieved 2005-11-23.
  4. ^ Matthew Wright and Micah Adler and Brian Neil Levine and Clay Shields (February 2002). "An Analysis of the Degradation of Anonymous Protocols" (PDF). Proceedings of the Network and Distributed Security Symposium - NDSS '02. IEEE. Archived from the original (PDF) on 2006-02-19. Retrieved 2005-11-23.