Big data ethics

On closer inspection, datasets often reveal details that are not superficially visible. In one well-known example, corneal reflections in the eye of a photographed person provided information about bystanders, including the photographer. Data ethics considers the implications of such incidental disclosures.

Big data ethics, also known simply as data ethics, refers to systematizing, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data. [1] Since the dawn of the Internet, the sheer quantity and quality of data have dramatically increased and continue to do so exponentially. Big data describes data so voluminous and complex that traditional data processing application software is inadequate to deal with it. Recent innovations in medical research and healthcare, such as high-throughput genome sequencing, high-resolution imaging, electronic medical patient records, and a plethora of internet-connected health devices, have triggered a data deluge that will reach the exabyte range in the near future. Data ethics becomes more relevant as the quantity of data grows, because the scale of its impact grows with it.

Big data ethics is different from information ethics: information ethics is more concerned with issues of intellectual property and with librarians, archivists, and information professionals, while big data ethics is more concerned with collectors and disseminators of structured or unstructured data, such as data brokers, governments, and large corporations. However, since artificial intelligence and machine learning systems are regularly built using big data sets, discussions of data ethics are often intertwined with those in the ethics of artificial intelligence. [2] More recently, issues of big data ethics have also been researched in relation to other areas of technology and science ethics, including ethics in mathematics and engineering ethics, as many areas of applied mathematics and engineering use increasingly large data sets.

Principles

Data ethics is concerned with the following principles: [3]

Ownership

Ownership of data involves determining rights and duties over property, such as the ability to exercise control over and limit the sharing of personal data comprising one's digital identity. The question of ownership arises when one person records their observations of another person: the observer and the observed both stake a claim to the data. Questions also arise as to the responsibilities that the observer and the observed have in relation to each other. These questions have become increasingly relevant as the Internet has magnified the scale and systematization of observing people and their thoughts. The question of personal data ownership relates to questions of corporate ownership, intellectual property, and slavery.[citation needed]

In the European Union, the General Data Protection Regulation indicates that individuals own their personal data. [4]

Transaction transparency

Concerns have been raised around how biases can be integrated into algorithm design resulting in systematic oppression. [5]

In terms of governance, big data ethics is concerned with which types of inferences and predictions should be made using big data technologies such as algorithms. [6]

Anticipatory governance is the practice of using predictive analytics to assess possible future behaviors. [7] This has ethical implications because it affords the ability to target particular groups and places, which can encourage prejudice and discrimination. [7] For example, predictive policing highlights certain groups or neighborhoods to be watched more closely than others, which leads to more sanctions in these areas and closer surveillance of those who fit the same profiles as those who are sanctioned. [8]

The term "control creep" refers to data that has been generated with a particular purpose in mind but which is repurposed. [7] This practice is seen with airline industry data which has been repurposed for profiling and managing security risks at airports. [7]

Privacy

Privacy has been presented as a limitation to data usage which could also be considered unethical. [9] For example, the sharing of healthcare data can shed light on the causes of diseases, the effects of treatments, and can allow for tailored analyses based on individuals' needs. [9] This is of ethical significance in the big data ethics field because while many value privacy, the affordances of data sharing are also quite valuable, although they may contradict one's conception of privacy. Attitudes against data sharing may be based in a perceived loss of control over data and a fear of the exploitation of personal data. [9] However, it is possible to extract the value of data without compromising privacy.
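The claim that value can be extracted from data without compromising privacy is often illustrated with techniques such as differential privacy, which the text does not name and which serves here only as one example. The sketch below (with hypothetical record fields) adds calibrated Laplace noise to an aggregate count, so a useful statistic can be shared while masking the contribution of any single record:

```python
import math
import random

def noisy_count(records, predicate, epsilon=1.0):
    """Return a differentially private count: the true count plus
    Laplace(1/epsilon) noise, so no single record shifts the result much."""
    true_count = sum(1 for r in records if predicate(r))
    # Sample Laplace noise via the inverse CDF of Laplace(0, 1/epsilon).
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Hypothetical patient records (illustrative field names, not a real schema)
patients = [{"diagnosis": "flu"}, {"diagnosis": "asthma"}, {"diagnosis": "flu"}]
shared = noisy_count(patients, lambda p: p["diagnosis"] == "flu")
```

A smaller epsilon adds more noise and gives stronger privacy at the cost of accuracy; the shared value supports analysis of disease prevalence without exposing any individual's record.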

Some scholars, such as Jonathan H. King and Neil M. Richards, are redefining the traditional meaning of privacy, while others question whether privacy still exists. [6] In a 2014 article for the Wake Forest Law Review, King and Richards argue that privacy in the digital age can be understood not in terms of secrecy but in terms of the regulations that govern and control the use of personal information. [6] In the European Union, the right to be forgotten allows individuals to request the removal or de-linking of personal data from databases if the information is deemed irrelevant or out of date. [10] According to Andrew Hoskins, this law demonstrates the moral panic of EU members over the perceived loss of privacy and of the ability to govern personal data in the digital age. [11] In the United States, citizens have the right to delete voluntarily submitted data. [10] This is very different from the right to be forgotten, because much of the data produced using big data technologies and platforms is not voluntarily submitted. [10] While traditional notions of privacy are under scrutiny, these differing legal frameworks in the EU and US illustrate how countries are grappling with privacy regulation in the context of big data. [12]

How much data is worth

The difference between the value of the services that tech companies provide and the equity value of those companies reflects the gap between the exchange rate offered to the citizen and the "market rate" of the value of their data. This rudimentary calculation has many holes: the financial figures of tax-evading companies are unreliable; revenue or profit might be a more appropriate basis than equity value; the definition of a user is unclear; data typically becomes valuable only when aggregated across large numbers of individuals; and prices may be tiered for people in different countries. Although such calculations are crude, they make the monetary value of data more tangible. Another approach is to find the data trading rates on the black market; RSA publishes a yearly cybersecurity shopping list that takes this approach. [13]
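The rudimentary calculation described above can be made concrete. Using entirely hypothetical figures (the equity value and user count below are illustrative assumptions, not data from the source), dividing a platform's equity value by its user count yields a crude per-person "market rate":

```python
def implied_value_per_user(equity_value, users):
    """Crude 'market rate' for one person's data: a platform's equity
    value divided by its number of users, ignoring every caveat above."""
    return equity_value / users

# Entirely hypothetical: a platform worth $500 billion with 2 billion users
rate = implied_value_per_user(500e9, 2e9)
print(f"Implied value: ${rate:.2f} per user")  # → Implied value: $250.00 per user
```

Comparing such a figure against what a user receives in free services is one rough way to frame whether the implicit exchange is fair.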

This raises the economic question of whether free tech services in exchange for personal data are a worthwhile implicit exchange for the consumer. In the personal data trading model, rather than companies selling data, an owner can sell their personal data and keep the profit. [14]

Openness

The idea of open data is centered around the argument that data should be freely available and should not have restrictions that would prohibit its use, such as copyright laws. As of 2014, many governments had begun to move towards publishing open datasets for the purpose of transparency and accountability. [15] This movement has gained traction via "open data activists" who have called for governments to make datasets available so that citizens can extract meaning from the data themselves and perform their own checks and balances. [15] [6] King and Richards have argued that this call for transparency creates a tension between openness and secrecy. [6]

Activists and scholars have also argued that because this open-sourced model of data evaluation is based on voluntary participation, the availability of open datasets has a democratizing effect on a society, allowing any citizen to participate. [16] To some, the availability of certain types of data is seen as a right and an essential part of a citizen's agency. [16]

The Open Knowledge Foundation (OKF) lists several dataset types that it argues governments should provide for their data to be truly open. [17] OKF maintains a tool called the Global Open Data Index (GODI), a crowd-sourced survey for measuring the openness of governments, [17] based on its Open Definition. GODI aims to provide feedback to governments about the quality of their open datasets. [18]

Willingness to share data varies from person to person. Preliminary studies have been conducted into the determinants of the willingness to share data. For example, some have suggested that baby boomers are less willing to share data than millennials. [19]

Footnotes

  1. Kitchin, Rob (August 18, 2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. SAGE. p. 27. ISBN 9781473908253.
  2. Floridi, Luciano; Taddeo, Mariarosaria (December 28, 2016). "What is data ethics?". Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 374 (2083): 20160360. Bibcode:2016RSPTA.37460360F. doi:10.1098/rsta.2016.0360. ISSN 1364-503X. PMC 5124072. PMID 28336805.
  3. Cote, Catherine (March 16, 2021). "5 Principles of Data Ethics for Business". Harvard Business School Online. Retrieved September 7, 2022.
  4. van Ooijen, I.; Vrabec, Helena U. (December 11, 2018). "Does the GDPR Enhance Consumers' Control over Personal Data? An Analysis from a Behavioural Perspective". Journal of Consumer Policy. 42 (1): 91–107. doi:10.1007/s10603-018-9399-7. hdl:2066/216801. ISSN 0168-7034. S2CID 158945891.
  5. O'Neil, Cathy (2016). Weapons of Math Destruction. Crown Books. ISBN 978-0553418811.
  6. Richards, Neil M.; King, Jonathan H. (2014). "Big data ethics". Wake Forest Law Review. 49: 393–432. SSRN 2384174.
  7. Kitchin, Rob (2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. SAGE Publications. pp. 178–179.
  8. Zwitter, A. (2014). "Big Data Ethics". Big Data & Society. 1 (2): 4. doi:10.1177/2053951714559253.
  9. Kostkova, Patty; Brewer, Helen; de Lusignan, Simon; Fottrell, Edward; Goldacre, Ben; Hart, Graham; Koczan, Phil; Knight, Peter; Marsolier, Corinne; McKendry, Rachel A.; Ross, Emma; Sasse, Angela; Sullivan, Ralph; Chaytor, Sarah; Stevenson, Olivia; Velho, Raquel; Tooke, John (February 17, 2016). "Who Owns the Data? Open Data for Healthcare". Frontiers in Public Health. 4: 7. doi:10.3389/fpubh.2016.00007. PMC 4756607. PMID 26925395.
  10. Walker, R. K. (2012). "The Right to be Forgotten". Hastings Law Journal. 64: 257–261.
  11. Hoskins, Andrew (November 4, 2014). "Digital Memory Studies". memorystudies-frankfurt.com. Retrieved November 28, 2017.
  12. "ERRATUM". Ethics & Human Research. 44 (1): 17. January 2022. doi:10.1002/eahr.500113. ISSN 2578-2355. PMID 34910377.
  13. RSA (2018). "2018 Cybersecurity Shopping List" (PDF).
  14. László, Mitzi (November 1, 2017). "Personal Data Trading Application to the New Shape Prize of the Global Challenges Foundation". Global Challenges Foundation. p. 27. Archived from the original on June 20, 2018. Retrieved June 20, 2018.
  15. Kalin, Ian (2014). "Open Data Policy Improves Democracy". SAIS Review of International Affairs. 34 (1): 59–70. doi:10.1353/sais.2014.0006. S2CID 154068669.
  16. Baack, Stefan (December 27, 2015). "Datafication and empowerment: How the open data movement re-articulates notions of democracy, participation, and journalism". Big Data & Society. 2 (2): 205395171559463. doi:10.1177/2053951715594634. S2CID 55542891.
  17. Open Knowledge. "Methodology - Global Open Data Index". index.okfn.org. Archived from the original on March 8, 2021. Retrieved November 23, 2017.
  18. Open Knowledge. "About - Global Open Data Index". index.okfn.org. Archived from the original on April 21, 2021. Retrieved November 23, 2017.
  19. Emerce. "Babyboomers willen gegevens niet delen" [Baby boomers do not want to share data]. emerce.nl (in Dutch). Retrieved May 12, 2016.
