Pseudonymization

Pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms. [1] A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis and data processing.
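As a minimal sketch of this idea, the following Python snippet replaces an identifying field with a keyed pseudonym. The field names, the secret key, and the `pseudonym` helper are all illustrative, not part of any standard:

```python
import hmac
import hashlib

# Secret key held by the data controller, kept separate from the data set
# (illustrative value).
SECRET_KEY = b"controller-held-secret"

def pseudonym(value: str) -> str:
    """Deterministically map an identifying value to an artificial identifier."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return "PSEUDO-" + digest[:12]

record = {"name": "Alice Example", "diagnosis": "J45", "visit_year": 2023}
record["name"] = pseudonym(record["name"])  # identifying field replaced
print(record)
```

Because the mapping is deterministic, all records belonging to the same person receive the same pseudonym, so the data remains linkable for analysis while the name itself is no longer present.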

Pseudonymization (or pseudonymisation, the spelling under European guidelines) is one way to comply with the European Union's General Data Protection Regulation (GDPR) requirements for secure storage of personal information. [2] Pseudonymized data can be restored to its original state with the addition of information which allows individuals to be re-identified. In contrast, anonymization is intended to prevent re-identification of individuals within the dataset: Clause 18, Module Four, footnote 2 of Commission Implementing Decision (EU) 2021/914 “requires rendering the data anonymous in such a way that the individual is no longer identifiable by anyone ... and that this process is irreversible.” [3]

Impact of Schrems II Ruling

The European Data Protection Supervisor (EDPS) on 9 December 2021 highlighted pseudonymization as the top technical supplementary measure for Schrems II compliance. [4] Less than two weeks later, the European Commission highlighted pseudonymization as an essential element of the adequacy decision for South Korea, the status that the United States lost under the Schrems II ruling by the Court of Justice of the European Union (CJEU). [5]

The importance of GDPR-compliant pseudonymization increased dramatically in June 2021 when the European Data Protection Board (EDPB) and the European Commission highlighted GDPR-compliant pseudonymisation as the state-of-the-art technical supplementary measure for the ongoing lawful use of EU personal data when using third country (i.e., non-EU) cloud processors or remote service providers under the "Schrems II" ruling by the CJEU. [6] Under the GDPR and the final EDPB Schrems II guidance, [7] the term pseudonymization requires a new protected “state” of data, producing a protected outcome that:

(1) Protects direct, indirect, and quasi-identifiers, together with characteristics and behaviors;

(2) Protects at the record and data set level versus only the field level so that the protection travels wherever the data goes, including when it is in use; and

(3) Protects against unauthorized re-identification via the Mosaic Effect by generating high entropy (uncertainty) levels by dynamically assigning different tokens at different times for various purposes.

The combination of these protections is necessary to prevent the re-identification of data subjects without the use of additional information kept separately, as required under GDPR Article 4(5) and as further underscored by paragraph 85(4) of the final EDPB Schrems II guidance:

GDPR-compliant pseudonymization requires that data is “anonymous” in the strictest EU sense of the word – globally anonymous – but for the additional information held separately and made available under controlled conditions as authorized by the data controller for permitted re-identification of individual data subjects. Clause 18, Module Four, footnote 2 of Commission Implementing Decision (EU) 2021/914 “requires rendering the data anonymous in such a way that the individual is no longer identifiable by anyone, in line with recital 26 of Regulation (EU) 2016/679, and that this process is irreversible.” [3]
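Requirement (3) above, dynamically assigning different tokens at different times for different purposes, can be sketched as follows; the `purpose_token` helper, the key, and the purpose labels are illustrative assumptions, not a certified compliance mechanism:

```python
import hmac
import hashlib

SECRET_KEY = b"controller-held-secret"  # held separately by the controller

def purpose_token(value: str, purpose: str) -> str:
    """Derive a different pseudonym for each processing purpose, so tokens
    issued for different data releases cannot be linked to each other."""
    msg = purpose.encode() + b"|" + value.encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:16]

# The same person receives unlinkable tokens in two different releases.
t_research = purpose_token("patient-123", "research-2024")
t_billing = purpose_token("patient-123", "billing-2024")
print(t_research, t_billing)
```

Because each release sees a different token, joining two releases on the token column yields nothing, which is the high-entropy property the guidance describes.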

Before the Schrems II ruling, pseudonymization was a technique used by security experts or government officials to hide personally identifiable information while maintaining the structure of the data and the privacy of the information. Common examples of sensitive information include names, postal codes, locations, race, and gender.

After the Schrems II ruling, GDPR-compliant pseudonymization must satisfy the above-noted elements as an "outcome" versus merely a technique.

Data fields

The choice of which data fields are to be pseudonymized is partly subjective. Less selective fields, such as birth date or postal code, are often also pseudonymized because they are usually available from other sources and therefore make a record easier to identify. Pseudonymizing these less identifying fields removes most of their analytic value, so it is normally accompanied by the introduction of new, less identifying derived forms, such as year of birth or a larger postal code region.
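A sketch of this generalization step, assuming hypothetical field names (`birth_date`, `postal_code`) and simple string truncation as the derivation rule:

```python
def generalize(record: dict) -> dict:
    """Replace quasi-identifier fields with coarser, less identifying
    derived forms (year of birth, larger postal region)."""
    out = {k: v for k, v in record.items()
           if k not in ("birth_date", "postal_code")}
    out["birth_year"] = record["birth_date"][:4]      # "1980-04-17" -> "1980"
    out["postal_region"] = record["postal_code"][:2]  # "90210" -> "90"
    return out

print(generalize({"birth_date": "1980-04-17", "postal_code": "90210", "score": 7}))
```

The derived fields retain some analytic value (age cohorts, regional aggregates) while being shared by many more individuals than the originals.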

Data fields that are less identifying, such as date of attendance, are usually not pseudonymized. This is because too much statistical utility is lost in doing so, not because the data cannot be re-identified. For example, given prior knowledge of a few attendance dates, it is easy to identify someone's data in a pseudonymized dataset by selecting only the people with that pattern of dates. This is an example of an inference attack.
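The date-pattern inference attack described above can be demonstrated on a toy dataset (all pseudonyms and dates are invented for illustration):

```python
# Pseudonymized data set: names replaced, attendance dates kept for analysis.
dataset = {
    "PSEUDO-a1": ["2024-01-03", "2024-02-14", "2024-03-01"],
    "PSEUDO-b2": ["2024-01-10", "2024-02-14"],
    "PSEUDO-c3": ["2024-01-03", "2024-03-01"],
}

# An attacker who knows two dates the target attended can filter the records.
known_dates = {"2024-01-10", "2024-02-14"}
candidates = [p for p, dates in dataset.items() if known_dates <= set(dates)]
print(candidates)  # ['PSEUDO-b2']
```

Two known dates are enough here to single out one pseudonym, after which every other field attached to that pseudonym is exposed.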

The vulnerability of pre-GDPR pseudonymized data to inference attacks is commonly overlooked. A famous example is the AOL search data scandal, in which unauthorized re-identification did not require access to separately kept “additional information” under the control of the data controller, as is now required for GDPR-compliant pseudonymisation, outlined below under "New Definition for Pseudonymization Under GDPR".

Protecting statistically useful pseudonymized data from re-identification requires:

  1. a sound information security base
  2. controlling the risk that the analysts, researchers or other data workers cause a privacy breach

The pseudonym allows tracking of data back to its origins, which distinguishes pseudonymization from anonymization, [9] where all person-related data that could allow backtracking has been purged. Pseudonymization is relevant, for example, to patient data that must be passed securely between clinical centers.
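One common way to make pseudonyms reversible is a lookup table kept separately from the dataset. This sketch, with illustrative helper names, shows how a controller could track a pseudonym back to its origin under controlled conditions:

```python
import secrets

# Pseudonym table kept separately under the data controller's control.
_forward: dict = {}   # identity -> pseudonym
_reverse: dict = {}   # pseudonym -> identity

def pseudonymize(identity: str) -> str:
    """Issue a random token for an identity, recording the mapping."""
    if identity not in _forward:
        token = "P-" + secrets.token_hex(8)
        _forward[identity] = token
        _reverse[token] = identity
    return _forward[identity]

def reidentify(token: str) -> str:
    """Authorized re-identification using the separately kept table."""
    return _reverse[token]

t = pseudonymize("alice@example.com")
print(t, "->", reidentify(t))
```

Only a party holding the table (the "additional information" in GDPR terms) can reverse the mapping; the dataset itself contains only random tokens.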

The application of pseudonymization to e-health aims to preserve the patient's privacy and data confidentiality. It allows primary use of medical records by authorized health care providers and privacy-preserving secondary use by researchers. [10] In the US, HIPAA provides guidelines on how health care data must be handled, and data de-identification or pseudonymization is one way to simplify HIPAA compliance. However, plain pseudonymization for privacy preservation often reaches its limits when genetic data are involved (see also genetic privacy). Due to the identifying nature of genetic data, depersonalization is often not sufficient to conceal the identity of the corresponding person. A potential solution is to combine pseudonymization with fragmentation and encryption.
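The combination of fragmentation and encryption mentioned above might look roughly like this sketch; the one-time-pad XOR is a standard-library stand-in for a real cipher such as AES-GCM, and all names and data are illustrative:

```python
import secrets

def split_and_encrypt(record: bytes, parts: int = 2):
    """Fragment a sensitive record and encrypt each fragment with its own
    one-time key; fragments and keys would be stored by separate parties."""
    step = (len(record) + parts - 1) // parts
    fragments = [record[i:i + step] for i in range(0, len(record), step)]
    encrypted = []
    for frag in fragments:
        key = secrets.token_bytes(len(frag))          # key kept separately
        ct = bytes(a ^ b for a, b in zip(frag, key))  # XOR one-time pad
        encrypted.append((ct, key))
    return encrypted

def decrypt_and_join(encrypted) -> bytes:
    """Recombine the record, given both the fragments and their keys."""
    return b"".join(bytes(a ^ b for a, b in zip(ct, key))
                    for ct, key in encrypted)

enc = split_and_encrypt(b"ACGT-ACGT-sample-sequence", parts=2)
print(decrypt_and_join(enc))
```

No single fragment or key on its own reveals the record; an adversary must compromise every storage location to reconstruct the genetic data.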

An example application of the pseudonymization procedure is the creation of datasets for de-identification research, in which identifying words are replaced with words from the same category (e.g., a name is replaced with a random name from a names dictionary). [11] [12] [13] In this case, however, it is generally not possible to track the data back to its origins.
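A sketch of this category-preserving surrogate replacement, with invented substitution dictionaries and a fixed random seed so the example is reproducible:

```python
import random

# Illustrative substitution dictionaries (assumed, not from any real corpus).
NAMES = ["Jordan Smith", "Maria Garcia", "Wei Chen"]
CITIES = ["Springfield", "Riverton", "Lakeview"]

def surrogate(text: str, replacements: dict) -> str:
    """Replace each identifying phrase with a random surrogate from the
    same category, preserving the text's structure for research use."""
    rng = random.Random(42)  # fixed seed for a reproducible example
    for phrase, category in replacements.items():
        text = text.replace(phrase, rng.choice(category))
    return text

note = "Patient John Doe of Boston reported improvement."
print(surrogate(note, {"John Doe": NAMES, "Boston": CITIES}))
```

Because the surrogates are drawn at random rather than derived from the originals, there is no mapping to invert, which is why this variant cannot be tracked back to its origins.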

New Definition for Pseudonymization Under GDPR

Effective as of May 25, 2018, the EU General Data Protection Regulation (GDPR) defines pseudonymization for the first time at the EU level, in Article 4(5). Under Article 4(5), data is pseudonymized if it cannot be attributed to a specific data subject without the use of separately kept “additional information”. Pseudonymized data embodies the state of the art in Data Protection by Design and by Default [14] because it requires protection of both direct and indirect identifiers (not just direct), so that personal data is not cross-referenceable (or re-identifiable) via the "Mosaic Effect" [15] without access to “additional information” that is kept separately by the controller. Because access to that separately kept “additional information” is required for re-identification, attribution of data to a specific data subject can be limited by the controller to support lawful purposes only.

GDPR Article 25(1) identifies pseudonymization as an “appropriate technical and organizational measure” and Article 25(2) requires controllers to:

“…implement appropriate technical and organizational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed. That obligation applies to the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility. In particular, such measures shall ensure that by default personal data are not made accessible without the individual's intervention to an indefinite number of natural persons.”

A central core of Data Protection by Design and by Default under GDPR Article 25 is enforcement of technology controls that support appropriate uses and demonstrate that an organization can, in fact, keep its promises. Technologies like pseudonymization that enforce Data Protection by Design and by Default show individual data subjects that, in addition to finding new ways to derive value from data, organizations are pursuing equally innovative technical approaches to protecting data privacy, an especially sensitive and topical issue given the epidemic of data security breaches around the globe.

Vibrant and growing areas of economic activity—the “trust economy,” life sciences research, personalized medicine/education, the Internet of Things, personalization of goods and services—are based on individuals trusting that their data is private, protected, and used only for appropriate purposes that bring them and society maximum value. This trust cannot be maintained using outdated approaches to data protection. Pseudonymisation, as newly defined under the GDPR, is a means of helping to achieve Data Protection by Design and by Default to earn and maintain trust and more effectively serve businesses, researchers, healthcare providers, and everyone who relies on the integrity of data.

GDPR-compliant pseudonymization not only enables greater privacy-respectful use of data in today's "big data" world of data sharing and combining, but also enables data controllers and processors to reap explicit benefits under the GDPR for correctly pseudonymized data. The benefits of properly pseudonymized data are highlighted in multiple GDPR Articles.


References

  1. "General Data Protection Regulation", Article 4(5).
  2. Skiera, Bernd (2022). The impact of the GDPR on the online advertising market. Klaus Miller, Yuxi Jin, Lennart Kraft, René Laub, Julia Schmitt. Frankfurt am Main. ISBN 978-3-9824173-0-1. OCLC 1303894344.
  3. "Commission Implementing Decision (EU) 2021/914". Official Journal of the European Union. 7 June 2021. Retrieved 5 January 2024.
  4. "IPEN webinar 2021: Pseudonymous data: processing personal data while mitigating risks". European Data Protection Supervisor. 9 December 2021. Retrieved 4 January 2024.
  5. "Commission Implementing Decision 2022/254". Official Journal of the European Union. 24 February 2022. Retrieved 4 January 2024.
  6. "Press Release No 91/20" (PDF). Court of Justice of the European Union. 16 July 2020. Retrieved 4 January 2024.
  7. "Recommendations" (PDF). European Data Protection Board. 18 June 2021. Retrieved 5 January 2024.
  8. "Article 4 GDPR Definitions". Intersoft Consulting. 25 May 2018. Retrieved 5 January 2024.
  9. "Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management – A Consolidated Proposal for Terminology". http://dud.inf.tu-dresden.de/literatur/Anon_Terminology_v0.31.pdf
  10. Neubauer, T; Heurix, J (March 2011). "A methodology for the pseudonymization of medical data". Int J Med Inform. 80 (3): 190–204. doi:10.1016/j.ijmedinf.2010.10.016. PMID 21075676.
  11. Neamatullah, Ishna; Douglass, Margaret M; Lehman, Li-wei H; Reisner, Andrew; Villarroel, Mauricio; Long, William J; Szolovits, Peter; Moody, George B; Mark, Roger G; Clifford, Gari D (2008). "Automated de-identification of free-text medical records". BMC Medical Informatics and Decision Making. 8: 32. doi:10.1186/1472-6947-8-32. PMC 2526997. PMID 18652655.
  12. Neamatullah, Ishna (5 September 2006). "Automated De-Identification of Free-Text Medical Records" (PDF). PhysioNet. Retrieved 4 January 2024.
  13. Deleger, L; et al. (2014). "Preparing an annotated gold standard corpus to share with extramural investigators for de-identification research". J Biomed Inform. 50: 173–183. doi:10.1016/j.jbi.2014.01.014. PMC 4125487. PMID 24556292.
  14. "What does data protection 'by design' and 'by default' mean?". European Commission. Retrieved 22 January 2023.
  15. Vijayan, Jaikumar (15 March 2004). "Sidebar: The Mosaic Effect". Computerworld. Retrieved 26 January 2021.