Big data ethics, also known simply as data ethics, refers to systemizing, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data. [1] Since the dawn of the Internet, the sheer quantity and quality of data has increased dramatically and continues to do so exponentially. Big data describes data so voluminous and complex that traditional data processing application software is inadequate to deal with it. Recent innovations in medical research and healthcare, such as high-throughput genome sequencing, high-resolution imaging, electronic medical patient records and a plethora of internet-connected health devices, have triggered a data deluge that will reach the exabyte range in the near future. Data ethics is of increasing relevance as the quantity of data increases because of the scale of the impact.
Big data ethics is different from information ethics: information ethics is more concerned with issues of intellectual property and concerns relating to librarians, archivists, and information professionals, while big data ethics is more concerned with collectors and disseminators of structured or unstructured data such as data brokers, governments, and large corporations. However, since artificial intelligence or machine learning systems are regularly built using big data sets, the discussions surrounding data ethics are often intertwined with those in the ethics of artificial intelligence. [2] More recently, issues of big data ethics have also been researched in relation to other areas of technology and science ethics, including ethics in mathematics and engineering ethics, as many areas of applied mathematics and engineering use increasingly large data sets.
Data ethics is concerned with the following principles: [3]
Ownership of data involves determining rights and duties over property, such as the ability to exercise control over and limit the sharing of personal data comprising one's digital identity. The question of ownership arises when one person records their observations on another person. The observer and the observed both state a claim to the data. Questions also arise as to the responsibilities that the observer and the observed have in relation to each other. These questions have become increasingly relevant with the Internet magnifying the scale and systematization of observing people and their thoughts. The question of personal data ownership relates to questions of corporate ownership, intellectual property, and slavery.[ citation needed ]
In the European Union, the General Data Protection Regulation indicates that individuals own their personal data. [4]
Concerns have been raised around how biases can be integrated into algorithm design resulting in systematic oppression. [5]
In terms of governance, big data ethics is concerned with which types of inferences and predictions should be made using big data technologies such as algorithms. [6]
Anticipatory governance is the practice of using predictive analytics to assess possible future behaviors. [7] This has ethical implications because it affords the ability to target particular groups and places, which can encourage prejudice and discrimination. [7] For example, predictive policing highlights certain groups or neighborhoods which should be watched more closely than others, which leads to more sanctions in these areas and closer surveillance for those who fit the same profiles as those who are sanctioned. [8]
The term "control creep" refers to data that has been generated with a particular purpose in mind but which is repurposed. [7] This practice is seen with airline industry data which has been repurposed for profiling and managing security risks at airports. [7]
Privacy has been presented as a limitation to data usage which could also be considered unethical. [9] For example, the sharing of healthcare data can shed light on the causes of diseases and the effects of treatments, and can allow for tailored analyses based on individuals' needs. [9] This is of ethical significance in the big data ethics field because while many value privacy, the affordances of data sharing are also quite valuable, although they may contradict one's conception of privacy. Attitudes against data sharing may be based in a perceived loss of control over data and a fear of the exploitation of personal data. [9] However, it is possible to extract the value of data without compromising privacy.
Some scholars such as Jonathan H. King and Neil M. Richards are redefining the traditional meaning of privacy, and others question whether or not privacy still exists. [6] In a 2014 article for the Wake Forest Law Review, King and Richards argue that privacy in the digital age can be understood not in terms of secrecy but in terms of regulations which govern and control the use of personal information. [6] In the European Union, the right to be forgotten entitles EU countries to force the removal or de-linking of personal data from databases at an individual's request if the information is deemed irrelevant or out of date. [10] According to Andrew Hoskins, this law demonstrates the moral panic of EU members over the perceived loss of privacy and the ability to govern personal data in the digital age. [11] In the United States, citizens have the right to delete voluntarily submitted data. [10] This is very different from the right to be forgotten because much of the data produced using big data technologies and platforms is not voluntarily submitted. [10] While traditional notions of privacy are under scrutiny, different legal frameworks related to privacy in the EU and US demonstrate how countries are grappling with these concerns in the context of big data. For example, the "right to be forgotten" in the EU and the right to delete voluntarily submitted data in the US illustrate the varying approaches to privacy regulation in the digital age. [12]
The difference in value between the services facilitated by tech companies and the equity value of these tech companies is the difference in the exchange rate offered to the citizen and the "market rate" of the value of their data. Scientifically there are many holes in this rudimentary calculation: the financial figures of tax-evading companies are unreliable, either revenue or profit could be more appropriate, how a user is defined, a large number of individuals are needed for the data to be valuable, possible tiered prices for different people in different countries, etc. Although these calculations are crude, they serve to make the monetary value of data more tangible. Another approach is to find the data trading rates in the black market. RSA publishes a yearly cybersecurity shopping list that takes this approach. [13]
This raises the economic question of whether free tech services in exchange for personal data is a worthwhile implicit exchange for the consumer. In the personal data trading model, rather than companies selling data, an owner can sell their personal data and keep the profit. [14]
The idea of open data is centered around the argument that data should be freely available and should not have restrictions that would prohibit its use, such as copyright laws. As of 2014, many governments had begun to move towards publishing open datasets for the purpose of transparency and accountability. [15] This movement has gained traction via "open data activists" who have called for governments to make datasets available so that citizens can extract meaning from the data and perform checks and balances themselves. [15] [6] King and Richards have argued that this call for transparency includes a tension between openness and secrecy. [6]
Activists and scholars have also argued that because this open-sourced model of data evaluation is based on voluntary participation, the availability of open datasets has a democratizing effect on a society, allowing any citizen to participate. [16] To some, the availability of certain types of data is seen as a right and an essential part of a citizen's agency. [16]
Open Knowledge Foundation (OKF) lists several dataset types it argues should be provided by governments for them to be truly open. [17] OKF has a tool called the Global Open Data Index (GODI), a crowd-sourced survey for measuring the openness of governments, [17] based on its Open Definition. GODI aims to be a tool for providing feedback to governments about the quality of their open datasets. [18]
Willingness to share data varies from person to person. Preliminary studies have been conducted into the determinants of the willingness to share data. For example, some have suggested that baby boomers are less willing to share data than millennials. [19]
Privacy is the ability of an individual or group to seclude themselves or information about themselves, and thereby express themselves selectively.
Information privacy is the relationship between the collection and dissemination of data, technology, the public expectation of privacy, contextual information norms, and the legal and political issues surrounding them. It is also known as data privacy or data protection.
The right to privacy is an element of various legal traditions that intends to restrain governmental and private actions that threaten the privacy of individuals. Over 150 national constitutions mention the right to privacy. On 10 December 1948, the United Nations General Assembly adopted the Universal Declaration of Human Rights (UDHR), originally written to guarantee individual rights of everyone everywhere; while right to privacy does not appear in the document, many interpret this through Article 12, which states: "No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks."
Media ethics is the subdivision dealing with the specific ethical principles and standards of media, including broadcast media, film, theatre, the arts, print media and the internet. The field covers many varied and highly controversial topics, ranging from war journalism to Benetton ad campaigns.
Data portability is a concept to protect users from having their data stored in "silos" or "walled gardens" that are incompatible with one another, i.e. closed platforms, thus subjecting them to vendor lock-in and making the creation of data backups or moving accounts between services difficult.
Cyber ethics is the philosophic study of ethics pertaining to computers, encompassing user behavior and what computers are programmed to do, and how this affects individuals and society. For years, various governments have enacted regulations while organizations have defined policies about cyberethics.
Communication privacy management (CPM), originally known as communication boundary management, is a systematic research theory developed by Sandra Petronio in 1991. CPM theory aims to develop an evidence-based understanding of the way people make decisions about revealing and concealing private information. It suggests that individuals maintain and coordinate privacy boundaries with various communication partners depending on the perceived benefits and costs of information disclosure. Petronio believes disclosing private information will strengthen one's connections with others, and that we can better understand the rules for disclosure in relationships through negotiating privacy boundaries.
Digital privacy is often used in contexts that promote advocacy on behalf of individual and consumer privacy rights in e-services and is typically used in opposition to the business practices of many e-marketers, businesses, and companies to collect and use such information and data. Digital privacy can be defined under three sub-related categories: information privacy, communication privacy, and individual privacy.
Privacy for research participants is a concept in research ethics which states that a person in human subject research has a right to privacy when participating in research. Typical scenarios include, for example, a surveyor doing social research who conducts an interview with a participant, or a medical researcher in a clinical trial who asks for a blood sample from a participant to see if there is a relationship between something which can be measured in blood and a person's health. In both cases, the ideal outcome is that any participant can join the study and neither the researcher nor the study design nor the publication of the study results would ever identify any participant in the study. Thus, the privacy rights of these individuals can be preserved.
De-identification is the process used to prevent someone's personal identity from being revealed. For example, data produced during human subject research might be de-identified to preserve the privacy of research participants. Biological data may be de-identified in order to comply with HIPAA regulations that define and stipulate patient privacy laws.
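As a rough illustration of what de-identifying a single record can involve, the following Python sketch drops direct identifiers, replaces a record key with a keyed hash, and coarsens a date of birth to a year. The field names (`name`, `email`, `phone`, `patient_id`, `birth_date`), the sample record, and the key are all hypothetical, and the sketch is not a description of any regulatory standard; on its own it would not establish compliance with HIPAA or any other rule.

```python
import hashlib
import hmac

# Hypothetical field names, chosen only for this illustration.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 hash; the same input and key give the same pseudonym."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with identifying fields removed or coarsened."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if field == "patient_id":
            # replace the stable ID with a pseudonym that still allows linking
            cleaned["pseudonym"] = pseudonymize(str(value), secret_key)
        elif field == "birth_date":
            # keep only the year (assumes an ISO "YYYY-MM-DD" string)
            cleaned["birth_year"] = str(value)[:4]
        else:
            cleaned[field] = value
    return cleaned


# Fictional example record and key.
record = {
    "patient_id": "P-001",
    "name": "Jane Example",
    "email": "jane@example.org",
    "birth_date": "1980-05-17",
    "diagnosis": "asthma",
}
key = b"demo-key-not-for-production"
result = deidentify(record, key)
```

A keyed hash is used here instead of a plain hash so that someone without the key cannot rebuild the pseudonym by hashing guessed IDs. The remaining fields (year of birth, diagnosis) may still act as quasi-identifiers, so a record processed this way is not necessarily anonymous.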
The right to be forgotten (RTBF) is the right to have private information about a person be removed from Internet searches and other directories under some circumstances. The concept has been discussed and put into practice in several jurisdictions, including Argentina, the European Union (EU), and the Philippines. The issue has arisen from desires of individuals to "determine the development of their life in an autonomous way, without being perpetually or periodically stigmatized as a consequence of a specific action performed in the past".
Google Spain SL, Google Inc. v Agencia Española de Protección de Datos, Mario Costeja González (2014) is a decision by the Court of Justice of the European Union (CJEU). It held that an Internet search engine operator is responsible for the processing that it carries out of personal information which appears on web pages published by third parties.
Data re-identification or de-anonymization is the practice of matching anonymous data with publicly available information, or auxiliary data, in order to discover the person the data belong to. This is a concern because companies with privacy policies, health care providers, and financial institutions may release the data they collect after the data has gone through the de-identification process.
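The matching step can be pictured as a join between the released data and an auxiliary dataset on attributes that both share. The Python sketch below is a minimal illustration under stated assumptions: the quasi-identifier fields (`zip_code`, `birth_year`, `sex`), the function name `link_records`, and all rows are invented for the example and do not describe any particular real-world attack.

```python
# Hypothetical quasi-identifiers assumed to appear in both datasets.
QUASI_IDENTIFIERS = ("zip_code", "birth_year", "sex")


def link_records(anonymous_rows, auxiliary_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, anonymous_row) pairs where the quasi-identifiers
    match exactly one person in the auxiliary data."""
    # Index the auxiliary (named) data by its quasi-identifier values.
    index = {}
    for aux in auxiliary_rows:
        index.setdefault(tuple(aux[k] for k in keys), []).append(aux["name"])

    matches = []
    for row in anonymous_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches.append((candidates[0], row))
    return matches


# Fictional "de-identified" release and fictional public auxiliary data.
released = [
    {"zip_code": "00001", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip_code": "00002", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
public = [
    {"name": "Jane Example", "zip_code": "00001", "birth_year": 1980, "sex": "F"},
    {"name": "John Sample", "zip_code": "00002", "birth_year": 1975, "sex": "M"},
    {"name": "Jim Sample", "zip_code": "00002", "birth_year": 1975, "sex": "M"},
]
matches = link_records(released, public)
```

In this toy data the first released row is linked to a single name, while the second stays ambiguous because two people share the same attribute combination. The more attribute combinations are unique in a population, the more rows can be linked this way.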
Critical data studies is the exploration of and engagement with social, cultural, and ethical challenges that arise when working with big data. It is through various unique perspectives and taking a critical approach that this form of study can be practiced. As its name implies, critical data studies draws heavily on the influence of critical theory, which has a strong focus on addressing the organization of power structures. This idea is then applied to the study of data.
Data politics encompasses the political aspects of data, including topics such as data activism, open data and open government. The ways in which data is collected, accessed, and used have changed in contemporary society due to a number of factors surrounding issues of politics. An issue that arises from political data is often how disconnected people are from their own data, rarely gaining access to the data they produce. Large platforms like Google have a "better to ask forgiveness than permission" stance on data collection, of which the greater population is largely ignorant, leading to movements within data activism.
DNA encryption is the process of hiding or perplexing genetic information by a computational method in order to improve genetic privacy in DNA sequencing processes. The human genome is complex and long, but it is very possible to interpret important, and identifying, information from smaller variabilities, rather than reading the entire genome. A whole human genome is a string of 3.2 billion base paired nucleotides, the building blocks of life, but between individuals the genetic variation differs only by 0.5%, an important 0.5% that accounts for all of human diversity, the pathology of different diseases, and ancestral story. Emerging strategies incorporate different methods, such as randomization algorithms and cryptographic approaches, to de-identify the genetic sequence from the individual, and fundamentally, isolate only the necessary information while protecting the rest of the genome from unnecessary inquiry. The priority now is to ascertain which methods are robust, and how policy should ensure the ongoing protection of genetic privacy.
Data shadows refer to the information that a person leaves behind unintentionally while taking part in daily activities such as checking their e-mails, scrolling through social media or even by using their debit or credit card.
Digital self-determination is a multidisciplinary concept derived from the legal concept of self-determination and applied to the digital sphere, to address the unique challenges to individual and collective agency and autonomy arising with increasing digitalization of many aspects of society and daily life.
A data ecosystem is the complex environment of co-dependent networks and actors that contribute to data collection, transfer and use. Data ecosystems can span sectors, such as healthcare or finance, to inform one another's practices. A data ecosystem often consists of numerous data assemblages. Research into data ecosystems has developed in response to the rapid proliferation and availability of information through the web, which has contributed to the commodification of data.
Data care refers to treating people and their private information fairly and with dignity. Data has become progressively more utilized in societies all over the world, whether in securely storing a medical patient's data, an employee's data, or a citizen's private data. The concept of data care emerged from this increase in data usage; it describes the act of treating people and their data with care and respect. The concept holds that caring for people's data is the responsibility of those who govern data, for example businesses and policy makers, and it addresses how to care for data ethically while keeping in mind the people to whom the data belongs. It also draws on the concept of 'slow computing' and how it can help create and maintain proper data care.