Implementations of differentially private analyses

Since the advent of differential privacy, a number of systems supporting differentially private data analyses have been implemented and deployed. This article tracks real-world deployments, production software packages, and research prototypes.

Real-world deployments

Name | Organization | Year Introduced | Notes | Still in use?
OnTheMap: interactive tool for exploration of US income and commute patterns [1] [2] | US Census Bureau | 2008 | First deployment of differential privacy | Yes
RAPPOR in the Chrome browser, collecting security metrics [3] [4] | Google | 2014 | First widespread use of local differential privacy (see the sketch after this table) | No
Device analytics to improve QuickType and emoji suggestions, Spotlight deep link suggestions, and Lookup Hints in Notes; emoji and health type usage estimates; Safari energy drain statistics and Autoplay intent detection [5] | Apple | 2017 | | Yes
Application telemetry [6] | Microsoft | 2017 | Application usage statistics in Microsoft Windows | Yes
Flex: a SQL-based system developed for internal Uber analytics [7] [8] | Uber | 2017 | | Unknown
2020 Census [9] | US Census Bureau | 2018 | | Yes
Audience Engagement API [10] | LinkedIn | 2020 | | Yes
Labor Market Insights [11] | LinkedIn | 2020 | | Yes
COVID-19 Community Mobility Reports [12] | Google | 2020 | | Unknown
Advertiser Queries [13] | LinkedIn | 2020 | |
U.S. Broadband Coverage Data Set [14] | Microsoft | 2021 | | Unknown
College Scorecard Website | IRS and Dept. of Education | 2021 | | Unknown
Ohm Connect [15] | Recurve | 2021 | |
Live Birth Dataset [16] [17] | Israeli Ministry of Health | 2024 | | Yes
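
The "local" model used in RAPPOR-style deployments randomizes each user's data on the device before it is collected. The sketch below is a toy version of the underlying randomized-response idea, not Google's actual RAPPOR encoding (which uses Bloom filters and two rounds of randomization); the ε value and population are made up for illustration.

```python
# Toy randomized response for a single bit: each client reports its true bit
# with probability e^eps / (1 + e^eps) and flips it otherwise, so the collector
# never sees raw data but can still estimate the population rate.
import math
import random

def randomize_bit(true_bit, epsilon):
    """Report the true bit with probability e^eps / (1 + e^eps)."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if random.random() < keep_prob else 1 - true_bit

def estimate_rate(reports, epsilon):
    """Unbiased server-side estimate of the fraction of true 1s."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

random.seed(0)
true_bits = [1] * 300 + [0] * 700          # 30% of clients have the property
reports = [randomize_bit(b, epsilon=1.0) for b in true_bits]
print(estimate_rate(reports, epsilon=1.0))  # roughly 0.3
```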

Production software packages

These software packages purport to be usable in production systems. They fall into two categories: those focused on answering statistical queries with differential privacy, and those focused on training machine learning models with differential privacy. A brief illustrative sketch of the underlying technique follows each table.

Statistical analyses

Name | Developer | Year Introduced | Notes | Still maintained?
Google's differential privacy libraries [18] | Google | 2019 | Building block libraries in Go, C++, and Java; end-to-end framework in Go. [19] | Yes
OpenDP [20] | Harvard, Microsoft | 2020 | Core library in Rust, [21] SDK in Python with an SQL interface. | Yes
Tumult Analytics [22] | Tumult Labs [23] | 2022 | Python library, running on Apache Spark. | Yes
PipelineDP [24] | Google, OpenMined [25] | 2022 | Python library, running on Apache Spark, Apache Beam, or locally. | Yes
PSI (Ψ): A Private data Sharing Interface | Harvard University Privacy Tools Project [26] | 2016 | | No
TopDown Algorithm [27] | United States Census Bureau | 2020 | Production code used in the 2020 US Census. | No
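
At their core, these statistical-query libraries answer aggregate queries by adding noise calibrated to the query's sensitivity and the privacy budget ε. The sketch below shows that primitive for a counting query using the Laplace mechanism; it is a minimal illustration, not the API of any package listed above, and the dataset, predicate, and ε value are made up.

```python
# Minimal sketch of the core primitive behind statistical-query DP libraries:
# a counting query answered with the Laplace mechanism.
import numpy as np

def noisy_count(values, predicate, epsilon, rng=None):
    """Release the number of records satisfying `predicate` with epsilon-DP.

    A counting query has sensitivity 1 (adding or removing one record changes
    the count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Example: how many incomes in a toy dataset exceed 50,000, with epsilon = 0.5.
incomes = [32_000, 58_000, 71_500, 45_200, 90_000]
print(noisy_count(incomes, lambda x: x > 50_000, epsilon=0.5))
```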

Machine learning

Name | Developer | Year Introduced | Notes | Still maintained?
Diffprivlib [28] | IBM [29] | 2019 | Python library. | Yes
TensorFlow Privacy [30] [31] | Google | 2019 | Differentially private training in TensorFlow. | Yes
Opacus [32] | Meta | 2020 | Differentially private training in PyTorch. | Yes
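
Opacus and TensorFlow Privacy center on differentially private stochastic gradient descent (DP-SGD): each example's gradient is clipped to a fixed norm, the clipped gradients are averaged, and calibrated Gaussian noise is added. The sketch below is a plain-NumPy illustration of that training step, not the API of any library in the table; the model, clipping norm, and noise multiplier are arbitrary, and real libraries additionally track the cumulative privacy cost across steps.

```python
# Plain-NumPy sketch of one DP-SGD step for linear regression with squared loss.
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip per-example gradients, average them, add Gaussian noise, and update."""
    rng = rng or np.random.default_rng()
    # Per-example gradients of 0.5 * (x.w - y)^2 with respect to w.
    residuals = X @ w - y                       # shape (n,)
    per_example_grads = residuals[:, None] * X  # shape (n, d)
    # Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    per_example_grads *= np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Sum, add noise scaled to the clipping bound, and average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_grad = (per_example_grads.sum(axis=0) + noise) / len(X)
    return w - lr * noisy_grad

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 3)), rng.normal(size=64)
w = np.zeros(3)
for _ in range(100):
    w = dp_sgd_step(w, X, y)
print(w)
```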

Research projects and prototypes

Name | Citation | Year Published | Notes
PINQ | [33] | 2010 | An API implemented in C#.
Airavat | [34] | 2010 | A MapReduce-based system implemented in Java, hardened with SELinux-like access control.
Fuzz | [35] | 2011 | Time-constant implementation in Caml Light of a domain-specific language.
GUPT | [36] | 2012 | Implementation of the sample-and-aggregate framework.
KTELO | [37] | 2018 | A framework and system for answering linear counting queries.

Attacks on implementations

In addition to the standard defects of software artifacts that can be identified through testing or fuzzing, implementations of differentially private mechanisms may suffer from the following vulnerabilities:

- Subtle algorithmic or analytical mistakes, such as incorrect sensitivity or budget accounting and flawed variants of the sparse vector technique. [38] [39]
- Leakage through floating-point arithmetic: differentially private algorithms are specified over real-valued probability distributions, and naive floating-point implementations of samplers such as the Laplace mechanism can reveal information about the underlying data. [40]
- Timing channels through floating-point arithmetic: floating-point operations, especially on subnormal values, can take input-dependent time, which can leak the very values a mechanism is designed to hide. [41] [42] [43]
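
The floating-point issue can be made concrete with a toy example. The sketch below is a simplified illustration of the leaky abstraction discussed in [40], not a reproduction of that attack: because double-precision values near 2^50 are spaced 0.125 apart or more, a noisy release of a large true value always lands on that coarse grid, while a release of a small true value almost never does, so the released value carries detectable artifacts of the unnoised input.

```python
# Toy illustration (not the attack from [40] itself): the low-order bits of a
# floating-point "noisy" release depend on the magnitude of the true value.
import numpy as np

rng = np.random.default_rng(0)
big, small = 2.0 ** 50, 0.0
big_releases   = [big   + rng.laplace(0.0, 1.0) for _ in range(5)]
small_releases = [small + rng.laplace(0.0, 1.0) for _ in range(5)]
print([float(r % 0.125) for r in big_releases])    # all exactly 0.0
print([float(r % 0.125) for r in small_releases])  # essentially never 0.0
```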

References

  1. "OnTheMap". onthemap.ces.census.gov. Retrieved 29 March 2023.
  2. Machanavajjhala, Ashwin; Kifer, Daniel; Abowd, John; Gehrke, Johannes; Vilhuber, Lars (April 2008). "Privacy: Theory meets Practice on the Map". 2008 IEEE 24th International Conference on Data Engineering. pp. 277–286. doi:10.1109/ICDE.2008.4497436. ISBN   978-1-4244-1836-7. S2CID   5812674.
  3. Erlingsson, Úlfar. "Learning statistics with privacy, aided by the flip of a coin".
  4. Erlingsson, Úlfar; Pihur, Vasyl; Korolova, Aleksandra (November 2014). "RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response". Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. pp. 1054–1067. arXiv: 1407.6981 . Bibcode:2014arXiv1407.6981E. doi:10.1145/2660267.2660348. ISBN   9781450329576. S2CID   6855746.
  5. Differential Privacy Team (December 2017). "Learning with Privacy at Scale". Apple Machine Learning Journal. 1 (8).
  6. Ding, Bolin; Kulkarni, Janardhan; Yekhanin, Sergey (December 2017). "Collecting Telemetry Data Privately". 31st Conference on Neural Information Processing Systems: 3574–3583. arXiv: 1712.01524 . Bibcode:2017arXiv171201524D.
  7. Tezapsidis, Katie (Jul 13, 2017). "Uber Releases Open Source Project for Differential Privacy".
  8. Johnson, Noah; Near, Joseph P.; Song, Dawn (January 2018). "Towards Practical Differential Privacy for SQL Queries". Proceedings of the VLDB Endowment. 11 (5): 526–539. arXiv: 1706.09479 . doi:10.1145/3187009.3177733.
  9. Abowd, John M. (August 2018). "The U.S. Census Bureau Adopts Differential Privacy". Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. p. 2867. doi:10.1145/3219819.3226070. hdl: 1813/60392 . ISBN   9781450355520. S2CID   51711121.
  10. Rogers, Ryan; Subramaniam, Subbu; Peng, Sean; Durfee, David; Lee, Seunghyun; Santosh Kumar Kancha; Sahay, Shraddha; Ahammad, Parvez (2020). "LinkedIn's Audience Engagements API: A Privacy Preserving Data Analytics System at Scale". arXiv: 2002.05839 [cs.CR].
  11. Rogers, Ryan; Adrian Rivera Cardoso; Mancuhan, Koray; Kaura, Akash; Gahlawat, Nikhil; Jain, Neha; Ko, Paul; Ahammad, Parvez (2020). "A Members First Approach to Enabling LinkedIn's Labor Market Insights at Scale". arXiv: 2010.13981 [cs.CR].
  12. Aktay, Ahmet; Bavadekar, Shailesh; Cossoul, Gwen; Davis, John; Desfontaines, Damien; Fabrikant, Alex; Gabrilovich, Evgeniy; Gadepalli, Krishna; Gipson, Bryant; Guevara, Miguel; Kamath, Chaitanya; Kansal, Mansi; Lange, Ali; Mandayam, Chinmoy; Oplinger, Andrew; Pluntke, Christopher; Roessler, Thomas; Schlosberg, Arran; Shekel, Tomer; Vispute, Swapnil; Vu, Mia; Wellenius, Gregory; Williams, Brian; Royce J Wilson (2020). "Google COVID-19 Community Mobility Reports: Anonymization Process Description (Version 1.1)". arXiv: 2004.04145 [cs.CR].
  13. Rogers, Ryan; Subbu, Subramaniam; Peng, Sean; Durfee, David; Lee, Seunghyun; Kancha, Santosh Kumar; Sahay, Shraddha; Ahammad, Parvez (2020). "LinkedIn's Audience Engagements API: A Privacy Preserving Data Analytics System at Scale". arXiv: 2002.05839 [cs.CR].
  14. Pereira, Mayana; Kim, Allen; Allen, Joshua; White, Kevin; Juan Lavista Ferres; Dodhia, Rahul (2021). "U.S. Broadband Coverage Data Set: A Differentially Private Data Release". arXiv: 2103.14035 [cs.CR].
  15. "EDP". EDP. Retrieved 29 March 2023.
  16. Hod, Shlomi; Canetti, Ran (2024). "Differentially Private Release of Israel's National Registry of Live Births". arXiv: 2405.00267 [cs.CR].
  17. "Live Birth Dataset (Hebrew)". data.gov.il. Retrieved 2 May 2024.
  18. "Google's differential privacy libraries". GitHub . 3 February 2023.
  19. "Differential-privacy/Privacy-on-beam at main · google/Differential-privacy". GitHub .
  20. "OpenDP". opendp.org. Retrieved 29 March 2023.
  21. "OpenDP Library". GitHub .
  22. "Tumult Analytics". www.tmlt.dev. Retrieved 29 March 2023.
  23. "Tumult Labs | Privacy Protection Redefined". www.tmlt.io. Retrieved 29 March 2023.
  24. "PipelineDP". pipelinedp.io. Retrieved 29 March 2023.
  25. "OpenMined". www.openmined.org. Retrieved 29 March 2023.
  26. Gaboardi, Marco; Honaker, James; King, Gary; Nissim, Kobbi; Ullman, Jonathan; Vadhan, Salil; Murtagh, Jack (June 2016). "PSI (Ψ): a Private data Sharing Interface".
  27. "DAS 2020 Redistricting Production Code Release". GitHub . 22 June 2022.
  28. "Diffprivlib v0.5". GitHub . 17 October 2022.
  29. Holohan, Naoise; Braghin, Stefano; Pól Mac Aonghusa; Levacher, Killian (2019). "Diffprivlib: The IBM Differential Privacy Library". arXiv: 1907.02444 [cs.CR].
  30. Radebaugh, Carey; Erlingsson, Ulfar (March 6, 2019). "Introducing TensorFlow Privacy: Learning with Differential Privacy for Training Data".
  31. "TensorFlow Privacy". GitHub . 2019-08-09.
  32. "Opacus · Train PyTorch models with Differential Privacy". opacus.ai. Retrieved 29 March 2023.
  33. McSherry, Frank (1 September 2010). "Privacy integrated queries" (PDF). Communications of the ACM. 53 (9): 89–97. doi:10.1145/1810891.1810916. S2CID   52898716.
  34. Roy, Indrajit; Setty, Srinath T.V.; Kilzer, Ann; Shmatikov, Vitaly; Witchel, Emmett (April 2010). "Airavat: Security and Privacy for MapReduce" (PDF). Proceedings of the 7th Usenix Symposium on Networked Systems Design and Implementation (NSDI).
  35. Haeberlen, Andreas; Pierce, Benjamin C.; Narayan, Arjun (2011). "Differential Privacy Under Fire". 20th USENIX Security Symposium.
  36. Mohan, Prashanth; Thakurta, Abhradeep; Shi, Elaine; Song, Dawn; Culler, David E. "GUPT: Privacy Preserving Data Analysis Made Easy". Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. pp. 349–360. doi:10.1145/2213836.2213876. S2CID   2135755.
  37. Zhang, Dan; McKenna, Ryan; Kotsogiannis, Ios; Hay, Michael; Machanavajjhala, Ashwin; Miklau, Gerome (June 2018). "EKTELO: A Framework for Defining Differentially-Private Computations". Proceedings of the 2018 International Conference on Management of Data. pp. 115–130. arXiv: 1808.03555 . doi:10.1145/3183713.3196921. ISBN   9781450347037. S2CID   5033862.
  38. McSherry, Frank (25 February 2018). "Uber's differential privacy .. probably isn't". GitHub .
  39. Lyu, Min; Su, Dong; Li, Ninghui (1 February 2017). "Understanding the sparse vector technique for differential privacy". Proceedings of the VLDB Endowment. 10 (6): 637–648. arXiv: 1603.01699 . doi:10.14778/3055330.3055331. S2CID   5449336.
  40. Mironov, Ilya (October 2012). "On significance of the least significant bits for differential privacy". Proceedings of the 2012 ACM conference on Computer and communications security (PDF). ACM. pp. 650–661. doi:10.1145/2382196.2382264. ISBN   9781450316514. S2CID   3421585.
  41. Andrysco, Marc; Kohlbrenner, David; Mowery, Keaton; Jhala, Ranjit; Lerner, Sorin; Shacham, Hovav (May 2015). "On Subnormal Floating Point and Abnormal Timing". 2015 IEEE Symposium on Security and Privacy. pp. 623–639. doi:10.1109/SP.2015.44. ISBN   978-1-4673-6949-7. S2CID   1903469.
  42. Kohlbrenner, David; Shacham, Hovav (August 2017). "On the Effectiveness of Mitigations Against Floating-point Timing Channels". Proceedings of the 26th USENIX Conference on Security Symposium. USENIX Association: 69–81.
  43. Dooley, Isaac; Kale, Laxmikant (September 2006). "Quantifying the interference caused by subnormal floating-point values" (PDF). Proceedings of the Workshop on Operating System Interference in High Performance Applications.