Data re-identification

Last updated February 01, 2025

Data re-identification or de-anonymization is the practice of matching anonymous data (also known as de-identified data) with publicly available information, or auxiliary data, in order to discover the person to whom the data belongs.^[1] This is a concern because companies with privacy policies, health care providers, and financial institutions may release the data they collect after the data has gone through the de-identification process.

The de-identification process involves masking, generalizing or deleting both direct and indirect identifiers; the definition of this process is not universal. Information in the public domain, even when seemingly anonymized, may thus be re-identified by combining it with other available data and applying basic computer science techniques. The agencies that follow the Protection of Human Subjects policy (the 'Common Rule'), a group of U.S. federal agencies and departments including the U.S. Department of Health and Human Services, warn that re-identification is becoming gradually easier because of "big data": the abundance and constant collection and analysis of information, along with the evolution of technologies and advances in algorithms. However, others have claimed that de-identification is a safe and effective data liberation tool and do not view re-identification as a concern.^[2]

More and more data are becoming publicly available over the Internet. These data are released after anonymization techniques have been applied, such as removing personally identifiable information (PII) like names, addresses and social security numbers, to protect the sources' privacy. This assurance of privacy allows the government to legally share limited data sets with third parties without requiring written permission. Such data have proved to be very valuable for researchers, particularly in health care.

GDPR-compliant pseudonymization seeks to reduce the risk of re-identification through the use of separately kept "additional information". The approach is based on an expert evaluation of a dataset to designate some identifiers as "direct" and some as "indirect". Proponents of this approach argue that re-identification can be avoided by limiting access to the "additional information" that is kept separately by the controller. The theory is that, because access to this separately kept "additional information" is required for re-identification, the controller can limit attribution of data to a specific data subject to lawful purposes only. This approach is controversial, as it fails if there are additional datasets that can be used for re-identification. Such additional datasets may be unknown to those certifying the GDPR-compliant pseudonymization, or may not exist at the time of the pseudonymization but come into existence at some point in the future.
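The separation described above can be illustrated with a minimal sketch. The record layout, the field names and the `pseudonymize` helper are hypothetical and chosen only for illustration; they are not taken from the GDPR or from any particular tool. Direct identifiers are replaced by a random token, and the token-to-identity mapping is returned separately as the "additional information".

```python
import secrets

def pseudonymize(records, direct_fields):
    """Replace direct identifiers with random tokens.

    Returns (released_records, lookup_table). The lookup table plays the
    role of the separately kept "additional information" and is not
    meant to be released.
    """
    released, lookup = [], {}
    for record in records:
        token = secrets.token_hex(8)
        # Store the removed direct identifiers under the token.
        lookup[token] = {f: record[f] for f in direct_fields}
        # Keep every other field, including indirect identifiers.
        kept = {k: v for k, v in record.items() if k not in direct_fields}
        kept["pseudonym"] = token
        released.append(kept)
    return released, lookup

# Made-up example records.
records = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1970, "diagnosis": "asthma"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1985, "diagnosis": "flu"},
]
released, lookup = pseudonymize(records, direct_fields={"name"})
```

In this sketch the released records still carry the ZIP code and birth year, which is the weakness the critics point to: anyone holding another dataset with those fields may not need the lookup table at all.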

Legal protections of data in the United States

Existing privacy regulations typically protect information that has been modified, so that the data is deemed anonymized, or de-identified. For financial information, the Federal Trade Commission permits its circulation if it is de-identified and aggregated.^[3] The Gramm–Leach–Bliley Act (GLBA), which requires financial institutions to give consumers the opportunity to opt out of having their information shared with third parties, does not cover de-identified data if the information is aggregated and does not contain personal identifiers, since this data is not treated as personally identifiable information.^[3]

Educational records

Regarding university records, authorities at both the state and federal levels have shown an awareness of privacy issues in education and a distaste for institutions' disclosure of information. The U.S. Department of Education has provided guidance about data disclosure and identification, instructing educational institutions to be sensitive to the risk of re-identification of anonymous data by cross-referencing with auxiliary data, to minimize the amount of data in the public domain by decreasing publication of directory information about students and institutional personnel, and to be consistent in the processes of de-identification.^[4]

Medical records

Patients' medical information is becoming increasingly available on the Internet, on free and publicly accessible platforms such as HealthData.gov and PatientsLikeMe, encouraged by government open data policies and data sharing initiatives spearheaded by the private sector. While this level of accessibility yields many benefits, concerns regarding discrimination and privacy have been raised.^[5] Protections on medical records and consumer data from pharmacies are stronger than those for other kinds of consumer data. The Health Insurance Portability and Accountability Act (HIPAA) protects the privacy of identifiable data about health, but authorizes the release of information to third parties if it is de-identified. In addition, it mandates that patients receive breach notifications should there be more than a low probability that the patient's information was inappropriately disclosed or utilized without sufficient mitigation of the harm to him or her.^[6] The likelihood of re-identification is a factor in determining the probability that the patient's information has been compromised. Commonly, pharmacies sell de-identified information to data mining companies, which in turn sell it to pharmaceutical companies.^[3]

State laws have been enacted to ban data mining of medical information, but they were struck down by federal courts in Maine and New Hampshire on First Amendment grounds. In a separate case, another federal court described concerns about patient privacy as "illusive" and did not recognize the risks of re-identification.^[3]

Biospecimen

The Notice of Proposed Rulemaking, published by the Common Rule agencies in September 2015, expanded the umbrella term of "human subject" in research to include biospecimens, or materials taken from the human body, such as blood, urine and tissue. This mandates that researchers using biospecimens follow the stricter requirements of doing research with human subjects. The rationale for this is the increased risk of re-identification of biospecimens.^[7] The final revisions affirmed this regulation.^[8]

Re-identification efforts

There have been a sizable number of successful re-identification attempts in different fields. Even if it is not easy for a layperson to break anonymity, once the steps to do so are disclosed and learned, no higher-level knowledge is needed to access information in a database. Sometimes technical expertise is not needed at all, if individuals in a population have a unique combination of identifiers.^[3]

Health records

In the mid-1990s, a government agency in Massachusetts called the Group Insurance Commission (GIC), which purchased health insurance for employees of the state, decided to release records of hospital visits to any researcher who requested the data, at no cost. GIC gave assurances that patient privacy was not a concern, since it had removed identifiers such as names, addresses and social security numbers. However, information such as zip codes, birth dates and sex remained untouched. The GIC assurance was reinforced by the then governor of Massachusetts, William Weld. Latanya Sweeney, a graduate student at the time, set out to pick out the governor's records in the GIC data. By combining the GIC data with the voter database of the city of Cambridge, which she purchased for 20 dollars, she discovered Governor Weld's record with ease.^[9]
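The kind of linkage described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not Sweeney's actual procedure: the `link` function, the field names and all of the sample rows are invented. It joins a de-identified table to a named table on the shared quasi-identifiers and keeps only the records for which exactly one named person matches.

```python
def link(deidentified, voter_roll, keys=("zip", "birth_date", "sex")):
    """Return (record, voter) pairs where the voter is the only person
    on the roll sharing the record's quasi-identifier values."""
    index = {}
    for voter in voter_roll:
        index.setdefault(tuple(voter[k] for k in keys), []).append(voter)
    matches = []
    for record in deidentified:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:
            matches.append((record, candidates[0]))
    return matches

# Made-up data: no names in the hospital table, names in the voter roll.
hospital = [
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "A"},
    {"zip": "02139", "birth_date": "1980-05-05", "sex": "F", "diagnosis": "B"},
]
voters = [
    {"name": "Person One", "zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Person Two", "zip": "02139", "birth_date": "1980-05-05", "sex": "F"},
    {"name": "Person Three", "zip": "02139", "birth_date": "1980-05-05", "sex": "F"},
]
matches = link(hospital, voters)
```

Here the first hospital record is attributed to "Person One", while the second stays ambiguous because two voters share the same zip code, birth date and sex.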

In 1997, a researcher successfully de-anonymized medical records using voter databases.^[3]

In 2011, Professor Latanya Sweeney again used anonymized hospital visit records and voting records in the state of Washington and successfully matched individual persons 43% of the time.^[10]

Algorithms also exist that re-identify patients using prescription drug information.^[3]

Consumer habits and practices

Two researchers at the University of Texas, Arvind Narayanan and Professor Vitaly Shmatikov, were able to re-identify some portion of anonymized Netflix movie-rating data with individual consumers on the streaming website.^[11]^[12]^[13] The data was released by Netflix in 2006 after de-identification, which consisted of replacing individual names with random numbers and moving around personal details. The two researchers de-anonymized some of the data by comparing it with non-anonymous IMDb (Internet Movie Database) users' movie ratings. Very little information from the database, it was found, was needed to identify a subscriber.^[3] The resulting research paper showed how easy it is to re-identify Netflix users. For example, knowing only two movies a user has reviewed, including the precise rating and the date of the rating give or take three days, allows for 68% re-identification success.^[9]
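A much simplified sketch of this style of matching follows. It is not the scoring algorithm from the Narayanan and Shmatikov paper; it only assumes exact rating matches and a date tolerance of three days, and the `candidates` function, the data layout and the sample ratings are all hypothetical.

```python
from datetime import date

def candidates(aux, dataset, max_days=3):
    """Return IDs of anonymized users consistent with every auxiliary
    observation (movie, rating, date), allowing the rating date to be
    off by up to max_days."""
    result = []
    for user_id, ratings in dataset.items():
        consistent = all(
            movie in ratings
            and ratings[movie][0] == rating
            and abs((ratings[movie][1] - when).days) <= max_days
            for movie, rating, when in aux
        )
        if consistent:
            result.append(user_id)
    return result

# Made-up "anonymized" data: user ID -> {movie: (rating, rating date)}.
dataset = {
    1001: {"Movie A": (5, date(2005, 3, 1)), "Movie B": (2, date(2005, 3, 9))},
    1002: {"Movie A": (5, date(2005, 3, 2)), "Movie C": (4, date(2005, 4, 1))},
    1003: {"Movie A": (3, date(2005, 3, 1)), "Movie B": (2, date(2005, 3, 10))},
}
# What an observer might learn from a public profile elsewhere.
aux = [("Movie A", 5, date(2005, 3, 3)), ("Movie B", 2, date(2005, 3, 10))]
matched = candidates(aux, dataset)
```

With one observation, two of the three made-up users remain candidates; adding the second observation narrows the set to a single user, which is the effect the 68% figure describes at scale.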

In 2006, AOL published its users' search queries, which had been anonymized prior to the public release, and reporters at The New York Times successfully re-identified individuals by examining groups of searches made by anonymized users.^[3] AOL had attempted to suppress identifying information, including usernames and IP addresses, but had replaced these with unique identification numbers to preserve the utility of the data for researchers. After the release, bloggers pored over the data, either trying to identify specific users from this content or pointing out entertaining, depressing, or shocking search queries, such as "how to kill your wife", "depression and medical leave", and "car crash photos". Two reporters, Michael Barbaro and Tom Zeller, were able to track down a 62-year-old widow named Thelma Arnold by recognizing clues to her identity in the search history of User 4417749. Arnold acknowledged that she was the author of the searches, confirming that re-identification is possible.^[9]

Location data

Location data - series of geographical positions in time that describe a person's whereabouts and movements - is a class of personal data that is especially hard to keep anonymous. Location data shows recurring visits to frequently attended places of everyday life such as home, workplace, shopping, healthcare or specific spare-time patterns.^[14] Removing only a person's identity from location data will not remove identifiable patterns such as commuting rhythms, sleeping places, or work places. By mapping coordinates onto addresses, location data is easily re-identified^[15] or correlated with a person's private life contexts. Streams of location information play an important role in the reconstruction of personal identifiers from smartphone data accessed by apps.^[16]
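As a rough illustration of why sleeping places survive the removal of names, the sketch below guesses a home location as the most frequent coordinate cell seen at night. The `likely_home` function, the night-hour window, the rounding precision and the sample trace are all assumptions made for this example; mapping the resulting coordinates onto an address is left out.

```python
from collections import Counter
from datetime import datetime

def likely_home(trace, night_start=22, night_end=6, precision=3):
    """Guess a home location as the most frequent rounded coordinate
    observed during night hours. Returns None if there are no night
    observations."""
    cells = Counter()
    for timestamp, lat, lon in trace:
        if timestamp.hour >= night_start or timestamp.hour < night_end:
            cells[(round(lat, precision), round(lon, precision))] += 1
    return cells.most_common(1)[0][0] if cells else None

# Made-up trace of (timestamp, latitude, longitude) with no name attached.
trace = [
    (datetime(2024, 1, 1, 23, 0), 47.3769, 8.5417),
    (datetime(2024, 1, 2, 2, 0), 47.3771, 8.5419),
    (datetime(2024, 1, 2, 10, 0), 47.4100, 8.5500),
    (datetime(2024, 1, 2, 23, 30), 47.3768, 8.5421),
    (datetime(2024, 1, 3, 1, 0), 47.3900, 8.5100),
]
home = likely_home(trace)
```

Three of the four night-time points fall into the same rounded cell, so that cell is returned as the probable home; the same counting applied to working hours would suggest a workplace.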

Court decisions

In 2019, Professor Kerstin Noëlle Vokinger and Dr. Urs Jakob Mühlematter, two researchers at the University of Zurich, analyzed cases of the Federal Supreme Court of Switzerland to assess which pharmaceutical companies and which medical drugs were involved in legal actions against the Federal Office of Public Health (FOPH) regarding pricing decisions of medical drugs. In general, the private parties involved (such as pharmaceutical companies) and information that would reveal the private party (for example, drug names) are anonymized in Swiss judgments. The researchers were able to re-identify 84% of the relevant anonymized cases of the Federal Supreme Court of Switzerland by linking information from publicly accessible databases.^[17]^[18] This achievement was covered by the media and started a debate over whether and how court cases should be anonymized.^[19]^[20]

Concern and consequences

In 1997, Latanya Sweeney found from a study of Census records that up to 87 percent of the U.S. population can be identified using a combination of their 5-digit zip code, gender, and date of birth.^[21]^[22]
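The quantity behind such a figure is the share of people whose combination of quasi-identifier values is unique. A minimal sketch of that calculation follows; the `unique_fraction` function and the four sample records are invented for illustration and have nothing to do with the Census data used in the study.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination appears
    exactly once in the data set."""
    if not records:
        return 0.0

    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    unique = sum(1 for r in records if counts[key(r)] == 1)
    return unique / len(records)

# Made-up population.
people = [
    {"zip": "02138", "sex": "F", "dob": "1970-01-01"},
    {"zip": "02138", "sex": "F", "dob": "1970-01-02"},
    {"zip": "02138", "sex": "M", "dob": "1970-01-01"},
    {"zip": "02139", "sex": "F", "dob": "1970-01-01"},
]
by_zip = unique_fraction(people, ["zip"])                 # 0.25
by_zip_sex = unique_fraction(people, ["zip", "sex"])      # 0.5
by_all = unique_fraction(people, ["zip", "sex", "dob"])   # 1.0
```

Each added attribute splits the population into smaller groups, so the unique share in this toy data rises from a quarter to everyone once all three fields are combined.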

Unauthorized re-identification on the basis of such combinations does not require access to separately kept "additional information" that is under the control of the data controller, as is now required for GDPR-compliant pseudonymization.

Individuals whose data is re-identified are also at risk of having their information, with their identity attached to it, sold to organizations that they would not want to hold private information about their finances, health or preferences. The release of this data may cause anxiety, shame or embarrassment. Once an individual's privacy has been breached as a result of re-identification, future breaches become much easier: once a link is made between one piece of data and a person's real identity, any association between the data and an anonymous identity breaks the anonymity of the person.^[3]

Re-identification may expose companies and institutions which have pledged to assure anonymity to increased tort liability and cause them to violate their internal policies, public privacy policies, and state and federal laws, such as laws concerning financial confidentiality or medical privacy, by having released information to third parties that can identify users after re-identification.^[3]

Remedies

To address the risks of re-identification, several proposals have been suggested:

Higher standards and a uniform definition of de-identification that retains data utility: the definition of de-identification should balance privacy protections that reduce re-identification risk against the refusal of companies to delete data^[23]
Heightened privacy protections of anonymized information^[3]
Tighter security for databases that store anonymized information^[3]
A strong ban on malicious re-identification, the passing of broader anti-discrimination and privacy legislation that ensures privacy protections and encourages participation in data sharing projects and endeavors, and the establishment of uniform data protection standards in academic communities, such as the scientific community, in order to minimize privacy violations^[24]
Creation of data-release policies: making sure de-identification rhetoric is accurate, drawing up contracts that prohibit re-identification attempts and dissemination of sensitive information, establishing data enclaves, and utilizing data-based strategies to match required protection standards to the level of risk^[25]
Implementation of differential privacy on requested data sets
Generation of synthetic data that exhibits the statistical properties of the raw data without allowing real individuals to be identified
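For the differential privacy item in the list above, a common building block is the Laplace mechanism applied to a counting query. The sketch below is a minimal illustration under stated assumptions: the function names, the `epsilon` value and the sample records are hypothetical, and a production system would also need to track the privacy budget across queries.

```python
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng=random):
    """Answer a counting query with Laplace noise. Adding or removing
    one person changes a count by at most 1 (sensitivity 1), so the
    noise scale is 1 / epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

# Made-up records.
patients = [
    {"zip": "02138", "diagnosis": "flu"},
    {"zip": "02138", "diagnosis": "asthma"},
    {"zip": "02139", "diagnosis": "flu"},
    {"zip": "02139", "diagnosis": "flu"},
]
noisy = private_count(patients, lambda r: r["diagnosis"] == "flu", epsilon=0.5)
```

A smaller `epsilon` means more noise and stronger protection; the returned value is deliberately not the exact count, so a single person's presence or absence is hard to infer from it.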

While a complete ban on re-identification has been urged, enforcement would be difficult. There are, however, ways for lawmakers to combat and punish re-identification efforts, if and when they are exposed: pair a ban with harsher penalties and stronger enforcement by the Federal Trade Commission and the Federal Bureau of Investigation; grant victims of re-identification a right of action against those who re-identify them; and mandate software audit trails for people who utilize and analyze anonymized data. A small-scale re-identification ban may also be imposed on trusted recipients of particular databases, such as government data miners or researchers. This ban would be much easier to enforce and may discourage re-identification.^[9]

Examples of de-anonymization

"Researchers at MIT and the Université catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, was enough to uniquely identify 95 percent of them. In other words, to extract the complete location information for a single person from an "anonymized" data set of more than a million people, all you would need to do is place him or her within a couple of hundred yards of a cellphone transmitter, sometime over the course of an hour, four times in one year. A few Twitter posts would probably provide all the information you needed, if they contained specific information about the person's whereabouts."^[26]
"Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target."^[27]
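The first quotation above describes what is often called unicity: the share of users singled out by a handful of known points. A rough sketch of how such a share could be estimated is given here; it is not the method of the cited study, and the `unicity` function, the trace representation as sets of (cell, hour) pairs, and the sample traces are all assumptions.

```python
import random

def unicity(traces, points=4, trials=200, rng=random):
    """Estimate the share of users uniquely pinned down by `points`
    randomly chosen (cell, hour) observations from their own trace."""
    users = [u for u, t in traces.items() if len(t) >= points]
    if not users:
        return 0.0
    unique = 0
    for _ in range(trials):
        user = rng.choice(users)
        # Pick the observations an outside observer is assumed to know.
        known = set(rng.sample(sorted(traces[user]), points))
        # Every user whose trace contains all known observations.
        matching = [u for u, t in traces.items() if known <= t]
        if matching == [user]:
            unique += 1
    return unique / trials

# Made-up traces: user -> set of (antenna cell, hour of day).
traces = {
    "u1": {("A", 8), ("B", 9), ("C", 18), ("D", 22)},
    "u2": {("A", 8), ("B", 9), ("C", 18), ("E", 22)},
    "u3": {("A", 8), ("F", 9)},
}
four_points = unicity(traces, points=4)
one_point = unicity(traces, points=1)
```

In this toy data every user with four known points is unique, while a single known point is often shared with someone else, which mirrors the direction of the finding in the quotation.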

References

[1] Pedersen, Torben (2005). "HTTPS, Secure HTTPS". Encyclopedia of Cryptography and Security. pp. 268–269. doi:10.1007/0-387-23483-7_189. ISBN 978-0-387-23473-1.
[2] Richardson, Victor; Milam, Sallie; Chrysler, Denise (April 2015). "Is Sharing De-Identified Data Legal? The State of Public Health Confidentiality Laws and Their Interplay with Statistical Disclosure Limitation Techniques". The Journal of Law, Medicine & Ethics. 43 (1_suppl): 83–86. doi:10.1111/jlme.12224. hdl:2027.42/111074AA. ISSN 1073-1105. PMID 25846173. S2CID 9384220.
[3] Porter, Christine (2008). "Constitutional and Regulatory: De-Identified Data and Third Party Data Mining: The Risk of Re-Identification of Personal Information". Shidler Journal of Law, Commerce & Technology. 5 (1).
[4] Peltz, Richard (2009). "From the Ivory Tower to the Glass House: Access to "De-Identified" Public University Admission Records to Study Affirmative Action" (PDF). Harvard BlackLetter Law Journal. 25: 181–197. SSRN 1495788.
[5] Hoffman, Sharona (2015). "Citizen Science: The Law and Ethics of Public Access to Medical Big Data". Berkeley Technology Law Journal. doi:10.15779/Z385Z78.
[6] Greenberg, Yelena (2016). "Recent Case Developments: Increasing Recognition of "Risk of Harm" as an Injury Sufficient to Warrant Standing in Class Action Medical Data Breach Cases". American Journal of Law & Medicine. 42 (1): 210–4. doi:10.1177/0098858816644723. PMID 27263268. S2CID 77790820.
[7] Groden, Samantha; Martin, Summer; Merrill, Rebecca (2016). "Proposed Changes to the Common Rule: A Standoff Between Patient Rights and Scientific Advances?". Journal of Health & Life Sciences Law. 9 (3).
[8] 24 C.F.R. § .104 2017.
[9] Ohm, Paul (August 2010). "Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization". UCLA Law Review. 57 (6): 1701–1777. ISSN 0041-5650. OCLC 670569859 – via EBSCO.
[10] Sweeney, Latanya (28 September 2015). "Only You, Your Doctor, and Many Others May Know". Technology Science. 2015092903. Retrieved 12 July 2024.
[11] Rouse, Margaret. "de-anonymization (deanonymization)". WhatIs.com. Retrieved 19 January 2014.
[12] Narayanan, Arvind; Shmatikov, Vitaly. "Robust De-anonymization of Large Sparse Datasets" (PDF). Retrieved 19 January 2014.
[13] Narayanan, Arvind; Shmatikov, Vitaly (22 November 2007). "How To Break Anonymity of the Netflix Prize Dataset". arXiv:cs/0610105.
[14] Fritsch, Lothar (2008). "Profiling and Location-Based Services (LBS)". Profiling the European Citizen. Springer Netherlands. pp. 147–168. doi:10.1007/978-1-4020-6914-7_8. ISBN 978-1-4020-6913-0.
[15] Rocher, Luc; Hendrickx, Julien M.; de Montjoye, Yves-Alexandre (23 July 2019). "Estimating the success of re-identifications in incomplete datasets using generative models". Nature Communications. 10 (1): 3069. Bibcode:2019NatCo..10.3069R. doi:10.1038/s41467-019-10933-3. ISSN 2041-1723. PMC 6650473. PMID 31337762.
[16] Fritsch, Lothar; Momen, Nurul (2017). Derived Partial Identities Generated from App Permissions. Gesellschaft für Informatik, Bonn. ISBN 978-3-88579-671-8.
[17] Vokinger, Kerstin Noëlle; Mühlematter, Urs Jakob (2 September 2019). "Identifikation von Gerichtsurteilen durch "Linkage" von Daten(banken)". Jusletter (990).
[18] Vokinger, Kerstin Noëlle; Mühlematter, Urs Jakob. "Re-Identifikation von Gerichtsurteilen durch "Linkage" von Daten(banken)".
[19] Chandler, Simon (4 September 2019). "Researchers Use Big Data And AI To Remove Legal Confidentiality". Forbes. Retrieved 10 December 2019.
[20] "SRF Tagesschau". SRF Swiss Radio and Television. 2 September 2019. Retrieved 10 December 2019.
[21] "How Unique am I?". Data Privacy Lab, Harvard University. Retrieved 22 July 2021.
[22] Sweeney, Latanya. "Simple Demographics Often Identify People Uniquely" (PDF). Carnegie Mellon University, Data Privacy Working Paper 3. Retrieved 22 July 2021.
[23] Lagos, Yianni (2014). "Taking the Personal Out of Data: Making Sense of De-identification" (PDF). Indiana Law Review. 48: 187–203. ISSN 2169-320X. OCLC 56050778.
[24] Sejin, Ahn (Summer 2015). "Whose Genome Is It Anyway?: Re-Identification and Privacy Protection in Public and Participatory Genomics". San Diego Law Review. 52 (3): 751–806. ISSN 2994-9599. OCLC 47865544.
[25] Rubinstein, Ira S.; Hartzog, Woodrow (June 2016). "Anonymization and Risk". Washington Law Review. 91 (2): 703–760. ISSN 0043-0617. OCLC 3899779 – via EBSCO.
[26] Hardesty, Larry (27 March 2013). "How hard is it to 'de-anonymize' cellphone data?". MIT news. Retrieved 14 January 2015.
[27] Gymrek, Melissa; McGuire, Amy L.; Golan, David; Halperin, Eran; Erlich, Yaniv (18 January 2013). "Identifying personal genomes by surname inference". Science. 339 (6117): 321–4. Bibcode:2013Sci...339..321G. doi:10.1126/SCIENCE.1229566. ISSN 0036-8075. PMID 23329047. Wikidata Q29619963.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
