Private set intersection

Last updated
Private set intersection
General
Related to homomorphic encryption

Private set intersection is a secure multiparty computation cryptographic technique [1] that allows two parties holding sets to compare encrypted versions of these sets in order to compute the intersection. In this scenario, neither party reveals anything to the counterparty except for the elements in the intersection.

Example PSI diagram depicting basic security requirements Example PSI diagram depicting basic security requirements.png
Example PSI diagram depicting basic security requirements

Other variants of this exist, such as the server-client scenario, in which only the client learns the intersection of her set with the set of the server, without the server learning intersection of his set with the clients. [2]

For the comparison of data sets by cryptographic hashes on a small or predictable domain, precautions should be taken to prevent dictionary attacks. [3]

Apple uses this technique in Password Monitoring. [4] It has proposed using the technology for its announced Expanded Protections for Children [5]

In general, PSI protocols can be categorized into two broad categories: (1) traditional PSI and (2) delegated PSI. In the traditional PSI category, the data owners interact directly with each other and need to have a copy of their set at the time of the computation, e.g.,. [6] In the delegated PSI the computation of PSI and/or the storage of sets can be delegated to a third-party server (that is itself might be a passive or active adversary). The delegated PSI category can be further divided into two classes: (a) those that support one-off delegation, and (b) those that support repeated delegation. The PSI protocols that support one-off delegation require the data owner to re-encode its data and send the encoded data to the server for each computation, e.g.,. [7] Those that support repeated delegation allow the data owners to upload their (encrypted) data to the server only once, and then re-use it many times for each computation carried out but the server, e.g., [8]

Recently, researchers have proposed a variant of PSI protocol (in both traditional and delegated categories) that support data update, e.g., . [9] [10] This type of PSI protocol lets data owners insert/delete set elements into/from their data with low overheads and in a privacy-preserving manner.

Educational example

This educational example demonstrated the key idea of PSI, but does not provide real-world cryptographic security (hence should not be used with real-world data).

# Example setsparty_a_set={'apple','banana','cherry'}party_b_set={'banana','orange','apple'}# Hashing the elements in both setshashed_party_a_set={hash(e)foreinparty_a_set}hashed_party_b_set={hash(e)foreinparty_b_set}# Finding the intersection of the hashed setsintersection=hashed_party_a_set.intersection(hashed_party_b_set)# Printing hashed intersection for demonstrationprint(intersection)

Related Research Articles

In cryptography and computer security, a man-in-the-middle (MITM) attack is a cyberattack where the attacker secretly relays and possibly alters the communications between two parties who believe that they are directly communicating with each other, as the attacker has inserted themselves between the two parties.

Transport Layer Security (TLS) is a cryptographic protocol designed to provide communications security over a computer network. The protocol is widely used in applications such as email, instant messaging, and voice over IP, but its use in securing HTTPS remains the most publicly visible.

Articles related to cryptography include:

In cryptography, an oblivious transfer (OT) protocol is a type of protocol in which a sender transfers one of potentially many pieces of information to a receiver, but remains oblivious as to what piece has been transferred.

Secure multi-party computation is a subfield of cryptography with the goal of creating methods for parties to jointly compute a function over their inputs while keeping those inputs private. Unlike traditional cryptographic tasks, where cryptography assures security and integrity of communication or storage and the adversary is outside the system of participants, the cryptography in this model protects participants' privacy from each other.

In cryptography, a private information retrieval (PIR) protocol is a protocol that allows a user to retrieve an item from a server in possession of a database without revealing which item is retrieved. PIR is a weaker version of 1-out-of-n oblivious transfer, where it is also required that the user should not get information about other database items.

Proof of work (PoW) is a form of cryptographic proof in which one party proves to others that a certain amount of a specific computational effort has been expended. Verifiers can subsequently confirm this expenditure with minimal effort on their part. The concept was invented by Moni Naor and Cynthia Dwork in 1993 as a way to deter denial-of-service attacks and other service abuses such as spam on a network by requiring some work from a service requester, usually meaning processing time by a computer. The term "proof of work" was first coined and formalized in a 1999 paper by Markus Jakobsson and Ari Juels.

Off-the-Record Messaging (OTR) is a cryptographic protocol that provides encryption for instant messaging conversations. OTR uses a combination of AES symmetric-key algorithm with 128 bits key length, the Diffie–Hellman key exchange with 1536 bits group size, and the SHA-1 hash function. In addition to authentication and encryption, OTR provides forward secrecy and malleable encryption.

Venti is a network storage system that permanently stores data blocks. A 160-bit SHA-1 hash of the data acts as the address of the data. This enforces a write-once policy since no other data block can be found with the same address: the addresses of multiple writes of the same data are identical, so duplicate data is easily identified and the data block is stored only once. Data blocks cannot be removed, making it ideal for permanent or backup storage. Venti is typically used with Fossil to provide a file system with permanent snapshots.

In cryptography, CRAM-MD5 is a challenge–response authentication mechanism (CRAM) based on the HMAC-MD5 algorithm. As one of the mechanisms supported by the Simple Authentication and Security Layer (SASL), it is often used in email software as part of SMTP Authentication and for the authentication of POP and IMAP users, as well as in applications implementing LDAP, XMPP, BEEP, and other protocols.

In a Windows network, NT LAN Manager (NTLM) is a suite of Microsoft security protocols intended to provide authentication, integrity, and confidentiality to users. NTLM is the successor to the authentication protocol in Microsoft LAN Manager (LANMAN), an older Microsoft product. The NTLM protocol suite is implemented in a Security Support Provider, which combines the LAN Manager authentication protocol, NTLMv1, NTLMv2 and NTLM2 Session protocols in a single package. Whether these protocols are used or can be used on a system which is governed by Group Policy settings, for which different versions of Windows have different default settings.

Cryptographic primitives are well-established, low-level cryptographic algorithms that are frequently used to build cryptographic protocols for computer security systems. These routines include, but are not limited to, one-way hash functions and encryption functions.

Secure two-party computation (2PC) a.k.a. Secure function evaluation is sub-problem of secure multi-party computation (MPC) that has received special attention by researchers because of its close relation to many cryptographic tasks. The goal of 2PC is to create a generic protocol that allows two parties to jointly compute an arbitrary function on their inputs without sharing the value of their inputs with the opposing party. One of the most well known examples of 2PC is Yao's Millionaires' problem, in which two parties, Alice and Bob, are millionaires who wish to determine who is wealthier without revealing their wealth. Formally, Alice has wealth , Bob has wealth , and they wish to compute without revealing the values or .

Non-interactive zero-knowledge proofs are cryptographic primitives, where information between a prover and a verifier can be authenticated by the prover, without revealing any of the specific information beyond the validity of the statement itself. This function of encryption makes direct communication between the prover and verifier unnecessary, effectively removing any intermediaries. The core trustless cryptography "proofing" involves a hash function generation of a random number, constrained within mathematical parameters determined by the prover and verifier.

Trusted timestamping is the process of securely keeping track of the creation and modification time of a document. Security here means that no one—not even the owner of the document—should be able to change it once it has been recorded provided that the timestamper's integrity is never compromised.

<span class="mw-page-title-main">Cryptography</span> Practice and study of secure communication techniques

Cryptography, or cryptology, is the practice and study of techniques for secure communication in the presence of adversarial behavior. More generally, cryptography is about constructing and analyzing protocols that prevent third parties or the public from reading private messages. Modern cryptography exists at the intersection of the disciplines of mathematics, computer science, information security, electrical engineering, digital signal processing, physics, and others. Core concepts related to information security are also central to cryptography. Practical applications of cryptography include electronic commerce, chip-based payment cards, digital currencies, computer passwords, and military communications.

Guided tour puzzle (GTP) protocol is a cryptographic protocol for mitigating application layer denial of service attacks. It aims to overcome the shortcoming of computation-based puzzle protocols, in which clients are required to compute hard CPU or memory-bound puzzles that favor clients with abundant computational resources. Guided tour puzzle protocol can be seen as a form of proof-of-work (POW) protocol.

In cryptography, server-based signatures are digital signatures in which a publicly available server participates in the signature creation process. This is in contrast to conventional digital signatures that are based on public-key cryptography and public-key infrastructure. With that, they assume that signers use their personal trusted computing bases for generating signatures without any communication with servers.

<span class="mw-page-title-main">Yehuda Lindell</span>

Yehuda Lindell is a professor in the Department of Computer Science at Bar-Ilan University where he conducts research on cryptography with a focus on the theory of secure computation and its application in practice. Lindell currently leads the cryptography team at Coinbase.

Application Layer Transport Security (ALTS) is a Google-developed authentication and transport encryption system used for securing Remote Procedure Call (RPC) within Google machines. Google started its development in 2007, as a tailored modification of TLS.

References

  1. Chen, Hao; Laine, Kim; Rindal, Peter (2018-05-16). Fast Private Set Intersection from Homomorphic Encryption. Association for Computing Machinery. ISBN   9781450349468.
  2. Pinkas, Benny. Private Set Intersection (PDF). Open Access logo PLoS transparent.svg
  3. Ihle, Cornelius; Schubotz, Moritz; Meuschke, Norman; Gipp, Bela (2020-08-02). "A First Step Towards Content Protecting Plagiarism Detection". Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020. Virtual Event China: ACM. pp. 341–344. arXiv: 2005.11504 . doi: 10.1145/3383583.3398620 . ISBN   978-1-4503-7585-6. Open Access logo PLoS transparent.svg
  4. "Password Monitoring" . Retrieved 8 August 2021.
  5. "Child Safety" . Retrieved 8 August 2021.
  6. Freedman, Michael J; Nissim, Kobbi; Pinkas, Benny (2004). "Efficient private matching and set intersection" (PDF). International Conference on the Theory and Applications of Cryptographic Techniques'04: Proceedings. Lecture Notes in Computer Science. 3027: 1–19. doi:10.1007/978-3-540-24676-3_1. ISBN   978-3-540-21935-4. S2CID   10184294.
  7. Kamara, Seny; Mohassel, Payman; Raykova, Mariana; Sadeghian, Saeed (2014). "Scaling private set intersection to billion-element sets" (PDF). International Conference on Financial Cryptography and Data Security'14: Proceedings: 195–215.
  8. Abadi, Aydin; Terzis, Sotirios; Dong, Changyu (2016). "VD-PSI: verifiable delegated private set intersection on outsourced private datasets" (PDF). International Conference on Financial Cryptography and Data Security'16: Proceedings: 149–168.
  9. Abadi, Aydin; Dong, Changyu; Murdoch, Steven J; Terzis, Sotirios (2022). "Multi-party Updatable Delegated Private Set Intersection" (PDF). International Conference on Financial Cryptography and Data Security'22: Proceedings.
  10. Badrinarayanan, Saikrishna; Miao, Peihan; Xie, Tiancheng (2022). "Updatable Private Set Intersection" (PDF). Privacy Enhancing Technologies'22:Proceedings. 2022 (2): 378–406. doi:10.2478/popets-2022-0051. S2CID   239000070.