Five safes

The Five Safes is a framework for making decisions about the effective use of confidential or sensitive data. It is mainly used to describe or design research access to statistical data held by government and health agencies, and by data archives such as the UK Data Service. [1]

Two of the Five Safes refer to statistical disclosure control, and so the Five Safes is usually used to contrast statistical and non-statistical controls when comparing data management options.

Concept

The Five Safes proposes that data management decisions be considered as solving problems in five 'dimensions': projects, people, settings, data and outputs. The combination of the controls leads to 'safe use'. These are most commonly expressed as questions, for example: [2] [3]

Safe projects: Is this use of the data appropriate?
Safe people: Can the users be trusted to use it in an appropriate manner?
Safe settings: Does the access facility limit unauthorised use?
Safe data: Is there a disclosure risk in the data itself?
Safe outputs: Are the statistical results non-disclosive?

These dimensions are scales, not limits. That is, solutions can have a mix of more or fewer controls in each dimension, but the overall aim of 'safe use' is independent of the particular mix. For example, a public use file available for open download cannot control who uses it, where or for what purpose, and so all the control (protection) must be in the data itself. In contrast, a file which is only accessed through a secure environment with certified users can contain very sensitive information: the non-statistical controls allow the data to be 'unsafe'. One academic likened the process to a graphic equalizer, [4] where bass and treble can be combined independently to produce a sound the listener likes.
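The two contrasting examples can be sketched as a small data structure. This is only an illustration: the framework does not define numerical scores, so the AccessSolution class, the 0.0 to 1.0 scale and the specific levels below are all assumptions chosen to show where the control sits in each case.

```python
from dataclasses import dataclass

# The five dimensions named in the framework.
DIMENSIONS = ("projects", "people", "settings", "data", "outputs")


@dataclass(frozen=True)
class AccessSolution:
    """A hypothetical profile of how much control sits in each dimension."""
    name: str
    controls: dict  # dimension -> illustrative level, 0.0 (none) to 1.0 (strict)

    def __post_init__(self):
        if set(self.controls) != set(DIMENSIONS):
            raise ValueError("controls must cover exactly the five dimensions")
        if any(not 0.0 <= level <= 1.0 for level in self.controls.values()):
            raise ValueError("control levels must lie between 0.0 and 1.0")

    def controlled_dimensions(self):
        """Return the dimensions that carry any control at all."""
        return [d for d in DIMENSIONS if self.controls[d] > 0.0]


# Open download: no control over who, where or why, so protection is in the data.
public_use_file = AccessSolution(
    "public use file",
    {"projects": 0.0, "people": 0.0, "settings": 0.0, "data": 1.0, "outputs": 0.0},
)

# Secure environment with certified users: the data itself can stay detailed.
# The levels (including output checking) are assumed for illustration.
secure_environment = AccessSolution(
    "secure environment",
    {"projects": 0.9, "people": 0.9, "settings": 0.9, "data": 0.0, "outputs": 0.9},
)

for solution in (public_use_file, secure_environment):
    print(solution.name, "->", solution.controlled_dimensions())
```

Both profiles aim at the same 'safe use'; they differ only in which dimensions carry the controls.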

There is no 'order' to the Five Safes, in that none is necessarily more important than the others. However, Ritchie [5] argued that the 'managerial' controls (projects, people, setting) should be addressed before the 'statistical' controls (data, output).

The Five Safes concept is associated with other topics which developed from the same programme at ONS, although these are not necessarily implemented. Safe people is associated with 'active researcher management', [6] while safe outputs is linked with principles-based output statistical disclosure control.

The Five Safes is a positive framework: it describes what is and is not, rather than what should be. The EDRU ('evidence-based, default-open, risk-managed, user-centred') attitudinal model [7] is sometimes used to give a normative context.

The 'data access spectrum'

From 2003 the Five Safes was also represented in a simpler form as a 'Data Access Spectrum'. [8] The non-data controls (project, people, setting, outputs) tend to work together, in that organisations often see these as a complementary set of restrictions on access. These can then be contrasted with choices about data anonymisation to present a linear representation of data access options. This presentation is consistent with the idea of 'data as a residual', [5] as well as data protection laws of the time which often characterised data simply as anonymous or not anonymous.

A similar idea, the 'continuum of access', had already been developed independently in 2001 by Chuck Humphrey of the Canadian RDC network. [9] More recently, the Open Data Institute has developed a 'Data Spectrum toolkit' [10] which includes industry-specific examples.

History and terminology

The Five Safes was devised in the winter of 2002/2003 by Felix Ritchie at the UK Office for National Statistics (ONS) to describe its secure remote-access Virtual Microdata Laboratory (VML). [11] It was described at this time as the 'VML Security Model'. This was adopted by the NORC data enclave, [12] and more widely in the US, as the 'portfolio model' (although this is now also used to refer to a slightly different legal/statistical/educational breakdown). [13] In 2012 the framework was still being referred to as the 'VML security model', [14] but its increasing use among non-UK organisations led to the adoption of the more general and informative phrase 'Five Safes'. [2]

The original framework only had four safes (projects, people, settings and outputs): the framework was used to describe highly detailed data access through a secure environment, and so the 'data' dimension was irrelevant. From 2007 onwards, 'safe data' was included as the framework was used to describe a wider range of ONS activities. As the US version was based upon the 2005 specification, some US iterations have the original four dimensions (e.g. [12]).

Some discussions, such as the OECD's, [15] use the term 'secure' instead of 'safe'. However, the use of both these terms can cause presentational problems: less control in a particular dimension could be seen to imply 'unsafe users' or 'insecure settings', for example, which distracts from the main message. Hence, the Australian government uses the term "five data sharing principles". [16]

The 'Anonymisation Decision-Making Framework' [17] uses a framework based on the Five Safes but relabelling "projects", "people", and "settings" as "governance", "agency" and "infrastructure", respectively; "Output" is omitted, and "safe use" becomes "functional anonymisation". There is no reference to the Five Safes or any associated literature. The Australian version [18] was required to include references to the Five Safes, and presented it as an alternative without comment.

Application

The framework has had three uses: pedagogical, descriptive, and design. Since 2016, it has also been used, directly and indirectly, in legislation. See [19] for more detailed examples.

Pedagogy

The first significant use of the framework, other than internal administrative use, was to structure researcher training courses at the UK Office for National Statistics from 2003. UK Data Archive, Administrative Data Research Network, Eurostat, Statistics New Zealand, the Mexican National Institute of Statistics and Geography, NORC, Statistics Canada and the Australian Bureau of Statistics, amongst others, have also used this framework. Most of these courses are for researchers using restricted-access facilities; the Eurostat courses [20] are unusual in that they are designed for all users of sensitive data.

Description

The framework is often used to describe existing data access solutions (e.g. UK HMRC Data Lab, [21] UK Data Service, [22] Statistics New Zealand [23]) or planned/conceptualised ones (e.g. Eurostat in 2011 [24]). An early use [25] was to help identify areas where ONS still had 'irreducible risks' in its provision of secure remote access.

The framework is mostly used for confidential social science data. To date it appears to have made little impact on medical research planning, [26] although it is now included in the revised guidelines on implementing HIPAA regulations [27] in the US, and by Cancer Research UK and the Health Foundation in the UK. [28] It has also been used to describe a security model for the Scottish Health Informatics Programme. [29]

Design

In general the Five Safes has been used to describe solutions post-factum, and to explain/justify choices made, but an increasing number of organisations have used the framework to design data access solutions. For example, the Hellenic Statistical Agency developed a data strategy built around the Five Safes in 2016; the UK Health Foundation used the Five Safes to design its data management and training programmes. [28] Use in the private sector is less common but some organisations have incorporated the Five Safes into consulting services.

In 2015 the UK Data Service organized a workshop [22] to encourage data users from the academic and private sectors to think about how to manage confidential research data, using the Five Safes to demonstrate alternative options and best practice.

Early adopters for strategic design use were in Australia: both the Australian Bureau of Statistics and the Australian Department of Social Services used the Five Safes as an ex ante design tool. [3] [7] In 2017 the Australian Productivity Commission recommended [30] adopting a version of the framework to support cross-government data sharing and re-use. This underwent extensive consultation [16] and culminated in the Data Availability and Transparency Act 2022.

Since 2020 the Five Safes has been the overriding framework for the design of new secure facilities and data sharing arrangements in the UK for public health and social sciences. This has been promoted by the Office for Statistics Regulation, the UK Statistics Authority, NHS Digital, and the research funding bodies Administrative Data Research UK and DARE UK.

Regulation and legislation

Three laws have incorporated the Five Safes. The framework is explicit in the South Australian Public Sector (Data Sharing) Act 2016, and implicit in the research provisions of the UK Digital Economy Act 2017. The Australian Data Availability and Transparency Act 2022 renames the Five Safes as the Five Data Sharing Principles.

Public engagement

The UK Data Service has produced a blog [31] and video [32] for the general public about the use of Five Safes in re-using administrative data. Statistics New Zealand produced a non-technical description, [33] as did ONS for Data Privacy Day 2017. [34] The Australian Federal Government has produced several videos on data sharing, including the Data Sharing Principles. [35]

Criticism

In the 2020 paper, "Not fit for Purpose: A critical analysis of the ‘Five Safes’", [36] the authors argue that Five Safes is fundamentally flawed due to its disconnection from existing legal protections, its appropriation of safety notions without strong technical measures, and its static view of disclosure risk. Others have argued that the Five Safes has too little content to be useful, or is a box-ticking exercise, or that more 'safes' are needed. Green and Ritchie (2023) [19] provide an extensive review of these critiques and proposals.

References

  1. "What is the Five Safes framework?". www.ukdataservice.ac.uk. UK Data Service. Retrieved 2017-01-25.
  2. Desai, Tanvi; Ritchie, Felix; Welpton, Richard (2016). "Five Safes: designing data access for research" (PDF). Bristol Business School Working Papers in Economics: Footnote 1.
  3. "1015.0 - Information Paper: Transforming Statistics for the Future". www.abs.gov.au. Australian Bureau of Statistics. 2016. Retrieved 2017-01-25.
  4. McEachern, Steve (2015). "Implementation of the Trusted Access Model" (PDF). Australian Data Archive.
  5. Ritchie, Felix (2017). The 'Five Safes': a framework for planning, designing and evaluating data access solutions. Data for Policy.
  6. Desai, Tanvi; Ritchie, Felix (2009). "Effective Researcher Management" (PDF). www.unece.org. Eurostat. Retrieved 2017-01-25.
  7. Green, Elizabeth; Ritchie, Felix (2016). "Department of Social Services data access project final report. Project Report".
  8. Ritchie, Felix (2009). "Designing a national model for data access". Comparative Analysis of Enterprise (Micro)Data 2009. Retrieved 16 April 2020.
  9. Humphrey, Charles (Chuck) (2001). "The Data Liberation Initiative Orientation Session".
  10. "ODI Data Spectrum". Open Data Institute.
  11. Ritchie, Felix (2008). "Secure access to confidential microdata: four years of the Virtual Microdata Laboratory" (PDF). Economic and Labour Market Statistics. 2 (5): 29–34. doi:10.1057/elmr.2008.73. S2CID 154673912.
  12. Lane, Julia; Bowie, Chet; Scheuren, Fritz; Mulcahy, Tim (2009). "NORC Data Enclave: Providing Secure Remote Access to Sensitive Microdata". UNECE/EU Workshop on Statistical Confidentiality 2009.
  13. Lane, Julia; Heus, Pascal; Mulcahy, Tim (2008). "Data Access in a Cyber World: Making Use of Cyberinfrastructure". Transactions in Data Privacy: 2–16. S2CID 16923006.
  14. Ritchie, Felix (2013-01-01). "International access to restricted data: A principles-based standards approach". Statistical Journal of the IAOS. 29 (4). doi:10.3233/sji-130780. ISSN 1874-7655.
  15. Volkow, Natalia. "OECD Expert Group For International Collaboration On Microdata Access, Chapter 6. Standardised Application Process For Microdata Access" (PDF). www.oecd.org. OECD. pp. 73–79. Retrieved 2017-01-25.
  16. Office of the National Data Commissioner (2019). "Data sharing and release: legislative reforms" (PDF). ONDC website. Retrieved 16 April 2020.
  17. Elliot, Mark; Mackey, Elaine; O'Hara, Kieran; Tudor, Caroline (2016). Anonymisation Decision-Making Framework (PDF). University of Manchester.
  18. O'Keefe, Christine; Otorepec, Stephanie; Elliot, Mark; Mackay, Elaine; O'Hara, Kieran (2017). De-identification decision-making framework. CSIRO.
  19. Green, Elizabeth; Ritchie, Felix (2023-11-30). "The present and future of the Five Safes framework". Journal of Privacy and Confidentiality. 13 (2). doi:10.29012/jpc.831. ISSN 2575-8527.
  20. "Self-study material for the users of European microdatasets". ec.europa.eu. European Commission. Retrieved 2017-01-25.
  21. Hawkins, Mike (2011). "The HMRC Datalab". slideserve.com. Retrieved 2017-01-25.
  22. "The 5 safes of access to confidential data". www.ukdataservice.ac.uk. UK Data Service. Retrieved 2017-01-25.
  23. Camden, Mike (2011). "Confidentiality for integrated data" (PDF). www.unece.org. Eurostat. Retrieved 2017-01-25.
  24. Bujnowska, Aleksandra; Museux, Jean-Marc (2011). "The Future of Access to European Confidential Data for Scientific Purposes" (PDF). www.unece.org. Eurostat. Retrieved 2017-01-25.
  25. Ritchie, Felix (2005). "Access to business microdata in the UK: dealing with the irreducible risks" (PDF). UNECE/Eurostat Workshop on Statistical Data Confidentiality 2005.
  26. Green, Elizabeth; et al. (2015). "Enabling data linkage to maximise the value of public health research data" (PDF). Public Health Research Data Forum Commissioned Reports. Wellcome Trust.
  27. National Research Council (2014-01-09). Proposed Revisions to the Common Rule for the Protection of Human Subjects in the Behavioral and Social Sciences. doi:10.17226/18614. ISBN 9780309298063. PMID 25032406.
  28. Wolters, Arne (2015). "Governance and the HSCIC's IG toolkit" (PDF). ukdataservice.ac.uk. Retrieved 2017-01-25.
  29. Sullivan, Frank. "The Scottish Health Informatics Programme". www.rss.org.uk. Retrieved 2017-01-25.
  30. Data Availability and Use: Australian Productivity Commission Inquiry Report. Productivity Commission. 2017. ISBN 978-1-74037-617-4.
  31. Welpton, Richard; Corti, Louise. "Access to sensitive data for research: the five safes". blog.ukdataservice.ac.uk. UK Data Service. Retrieved 2017-01-25.
  32. "Five Safes video". www.youtube.com. UK Data Service. Retrieved 2017-01-25.
  33. "How we keep IDI data safe". www.stats.govt.nz. Statistics New Zealand. Retrieved 2017-01-25.
  34. Stokes, Pete (2017). "The Five Safes: data privacy at ONS". blog.ons.gov.uk. Office for National Statistics. Retrieved 2017-01-28.
  35. Office of the National Data Commissioner (2022). "Sharing data safely". ONDC.
  36. Culnane, Chris; Rubinstein, Benjamin I. P.; Watts, David (2020). "Not fit for Purpose: A critical analysis of the 'Five Safes'". Retrieved 2024-02-28.