Big data ethics, also known simply as data ethics, refers to systemizing, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data. [1] Since the dawn of the Internet, the sheer quantity and quality of data have increased dramatically and continue to do so exponentially. Big data describes data sets so voluminous and complex that traditional data processing application software is inadequate to deal with them. Recent innovations in medical research and healthcare, such as high-throughput genome sequencing, high-resolution imaging, electronic medical patient records, and a plethora of internet-connected health devices, have triggered a data deluge that will reach the exabyte range in the near future. Data ethics is of increasing relevance as the quantity of data increases, because of the scale of the impact.
Big data ethics differs from information ethics in that information ethics is more concerned with issues of intellectual property and with concerns relating to librarians, archivists, and information professionals, while big data ethics is more concerned with the collectors and disseminators of structured or unstructured data, such as data brokers, governments, and large corporations. However, since artificial intelligence and machine learning systems are regularly built using big data sets, discussions of data ethics are often intertwined with those in the ethics of artificial intelligence. [2] More recently, issues of big data ethics have also been researched in relation to other areas of technology and science ethics, including ethics in mathematics and engineering ethics, as many areas of applied mathematics and engineering use increasingly large data sets.
Data ethics is concerned with principles such as ownership, transaction transparency, consent, currency, privacy, and openness, discussed in turn below. [3]
Ownership of data involves determining rights and duties over property, such as the ability to exercise individual control over (including limiting the sharing of) personal data comprising one's digital identity. The question of data ownership arises when someone records observations about an individual person. The observer and the observed both lay claim to the data. Questions also arise as to the responsibilities that the observer and the observed have in relation to each other. These questions have become increasingly relevant as the Internet has magnified the scale and systematization of observing people and their thoughts. The question of personal data ownership relates to questions of corporate ownership and intellectual property. [4]
In the European Union, some people argue that the General Data Protection Regulation indicates that individuals own their personal data, although this is contested. [5]
Concerns have been raised about how biases can be integrated into algorithm design, resulting in systematic oppression, whether consciously or unconsciously. [6] These biases often stem from the data, from the design of the algorithm, or from the underlying goals of the organization deploying it. One major cause of algorithmic bias is that algorithms learn from historical data, which may perpetuate existing inequities. In many cases, algorithms exhibit reduced accuracy when applied to individuals from marginalized or underrepresented communities. A notable example is pulse oximetry, which has shown reduced reliability for certain demographic groups due to a lack of sufficient testing or information on these populations. [7] Additionally, many algorithms are designed to maximize specific metrics, such as engagement or profit, without adequately considering ethical implications. For instance, companies like Facebook and Twitter have been criticized for providing anonymity to harassers and for allowing racist content disguised as humor to proliferate, as such content often increases engagement. [8] These challenges are compounded by the fact that many algorithms operate as "black boxes" for proprietary reasons, meaning that the reasoning behind their outputs is not fully understood by users. This opacity makes it more difficult to identify and address algorithmic bias.
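How historical bias propagates into a trained model can be illustrated with a small, entirely synthetic example. The sketch below is an illustration only, not drawn from any system discussed in this article: it trains a simple logistic regression on labels that encode a historical disadvantage for one group, and the learned model then reproduces that disparity for equally qualified individuals.

```python
# Illustrative sketch (synthetic data): a model trained on historically biased
# labels reproduces that bias, even though true qualification is independent of group.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)             # 0 = majority group, 1 = marginalized group
ability = rng.normal(0, 1, n)             # true, group-independent qualification

# Historical decisions encode bias: equally qualified members of group 1
# were approved less often than members of group 0.
historical_label = (ability - 0.8 * group + rng.normal(0, 0.5, n)) > 0

model = LogisticRegression().fit(np.column_stack([ability, group]), historical_label)

# At identical ability (fixed at 0), the learned model still assigns a lower
# approval probability to group 1, perpetuating the historical inequity.
for g in (0, 1):
    probe = np.column_stack([np.zeros(1000), np.full(1000, g)])
    print(f"group {g}: mean predicted approval = {model.predict_proba(probe)[:, 1].mean():.2f}")
```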
In terms of governance, big data ethics is concerned with which types of inferences and predictions should be made using big data technologies such as algorithms. [9]
Anticipatory governance is the practice of using predictive analytics to assess possible future behaviors. [10] This has ethical implications because it affords the ability to target particular groups and places, which can encourage prejudice and discrimination. [10] For example, predictive policing highlights certain groups or neighborhoods that should be watched more closely than others, which leads to more sanctions in these areas and closer surveillance of those who fit the same profiles as those who are sanctioned. [11]
The term "control creep" refers to data that has been generated with a particular purpose in mind but which is repurposed. [10] This practice is seen with airline industry data which has been repurposed for profiling and managing security risks at airports. [10]
Privacy has been presented as a limitation to data usage which could also be considered unethical. [12] For example, the sharing of healthcare data can shed light on the causes of diseases and the effects of treatments, and can allow for tailored analyses based on individuals' needs. [12] This is of ethical significance in the big data ethics field because, while many value privacy, the affordances of data sharing are also quite valuable, although they may contradict one's conception of privacy. Attitudes against data sharing may be based in a perceived loss of control over data and a fear of the exploitation of personal data. [12] However, it is possible to extract the value of data without compromising privacy.
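The article does not name a specific method for extracting value without compromising privacy, but one widely discussed family of techniques is differential privacy. The following is only an illustrative sketch using synthetic data: calibrated noise is added to an aggregate query so that the result remains useful while any single record's contribution is obscured.

```python
# Illustrative sketch (synthetic data): a differentially private count.
import numpy as np

rng = np.random.default_rng(1)
has_condition = rng.integers(0, 2, 5_000)      # hypothetical patient-level flags

def private_count(values, epsilon=0.5):
    """Return a count with Laplace noise calibrated to a sensitivity of 1."""
    return values.sum() + rng.laplace(scale=1.0 / epsilon)

print("true count:            ", int(has_condition.sum()))
print("differentially private:", int(private_count(has_condition)))
```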
Government surveillance of big data has the potential to undermine individual privacy by collecting and storing data on phone calls, internet activity, and geolocation, among other things. For example, the NSA’s collection of metadata exposed in global surveillance disclosures raised concerns about whether privacy was adequately protected, even when the content of communications was not analyzed. The right to privacy is often complicated by legal frameworks that grant governments broad authority over data collection for “national security” purposes. In the United States, the Supreme Court has not recognized a general right to "informational privacy," or control over personal information, though legislators have addressed the issue selectively through specific statutes. [13] From an equity perspective, government surveillance and privacy violations tend to disproportionately harm marginalized communities. Historically, activists involved in the Civil rights movement were frequently targets of government surveillance as they were perceived as subversive elements. Programs such as COINTELPRO exemplified this pattern, involving espionage against civil rights leaders. This pattern persists today, with evidence of ongoing surveillance of activists and organizations. [14]
Additionally, the use of algorithms by governments to act on data obtained without consent introduces significant concerns about algorithmic bias. Predictive policing tools, for example, utilize historical crime data to predict “risky” areas or individuals, but these tools have been shown to disproportionately target minority communities. [15] One such tool, the COMPAS system, is a notable example; Black defendants are twice as likely to be misclassified as high risk compared to white defendants, and Hispanic defendants are similarly more likely to be classified as high risk than their white counterparts. [16] Marginalized communities often lack the resources or education needed to challenge these privacy violations or protect their data from nonconsensual use. Furthermore, there is a psychological toll, known as the “chilling effect,” where the constant awareness of being surveilled disproportionately impacts communities already facing societal discrimination. This effect can deter individuals from engaging in legal but potentially "risky" activities, such as protesting or seeking legal assistance, further limiting their freedoms and exacerbating existing inequities.
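Disparities like those reported for COMPAS are typically measured by comparing error rates across groups. The sketch below uses synthetic data and a hypothetical biased risk score, not the actual COMPAS data or model, to show how such an audit computes group-wise false positive rates.

```python
# Illustrative sketch (synthetic data, not the COMPAS dataset): auditing a risk
# score by comparing false positive rates across two demographic groups.
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
group = rng.integers(0, 2, n)                  # two demographic groups, 0 and 1
reoffended = rng.random(n) < 0.3               # synthetic ground-truth outcome
# Hypothetical biased score: group 1 systematically receives higher risk scores.
risk_score = rng.random(n) + 0.25 * group + 0.3 * reoffended
flagged_high_risk = risk_score > 0.9

for g in (0, 1):
    did_not_reoffend = (group == g) & ~reoffended
    false_positive_rate = flagged_high_risk[did_not_reoffend].mean()
    print(f"group {g}: false positive rate among non-reoffenders = {false_positive_rate:.2f}")
```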
Some scholars, such as Jonathan H. King and Neil M. Richards, are redefining the traditional meaning of privacy, and others question whether or not privacy still exists. [9] In a 2014 article for the Wake Forest Law Review, King and Richards argue that privacy in the digital age can be understood not in terms of secrecy but in terms of the regulations which govern and control the use of personal information. [9] In the European Union, the right to be forgotten entitles EU countries to force the removal or de-linking of personal data from databases at an individual's request if the information is deemed irrelevant or out of date. [17] According to Andrew Hoskins, this law demonstrates the moral panic of EU members over the perceived loss of privacy and the ability to govern personal data in the digital age. [18] In the United States, citizens have the right to delete voluntarily submitted data. [17] This is very different from the right to be forgotten because much of the data produced using big data technologies and platforms is not voluntarily submitted. [17] While traditional notions of privacy are under scrutiny, the different legal frameworks related to privacy in the EU and US demonstrate how countries are grappling with these concerns in the context of big data. For example, the "right to be forgotten" in the EU and the right to delete voluntarily submitted data in the US illustrate the varying approaches to privacy regulation in the digital age. [19]
The difference in value between the services facilitated by tech companies and the equity value of these tech companies is the difference between the exchange rate offered to the citizen and the "market rate" of the value of their data. There are many holes in this rudimentary calculation: the financial figures of tax-evading companies are unreliable; either revenue or profit could be the more appropriate measure; it is unclear how a user should be defined; a large number of individuals are needed for the data to be valuable; and prices might be tiered for different people in different countries. Although these calculations are crude, they serve to make the monetary value of data more tangible. Another approach is to find the rates at which data is traded on the black market; RSA publishes a yearly cybersecurity shopping list that takes this approach. [20]
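As a rough illustration of such a calculation, the snippet below divides assumed company-level figures by an assumed user count; every number is hypothetical and chosen only to make the arithmetic concrete.

```python
# Hypothetical back-of-the-envelope calculation; all figures below are invented
# purely for illustration.
market_cap_usd = 800e9          # assumed equity value of a large platform
annual_revenue_usd = 120e9      # assumed annual revenue
monthly_active_users = 3e9      # assumed user count

print(f"equity value per user:   ${market_cap_usd / monthly_active_users:,.0f}")
print(f"annual revenue per user: ${annual_revenue_usd / monthly_active_users:,.0f}")
```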
This raises the economic question of whether free tech services in exchange for personal data is a worthwhile implicit exchange for the consumer. In the personal data trading model, rather than companies selling data, an owner can sell their personal data and keep the profit. [21]
The idea of open data is centered around the argument that data should be freely available and should not have restrictions that would prohibit its use, such as copyright laws. As of 2014, many governments had begun to move towards publishing open datasets for the purposes of transparency and accountability. [22] This movement has gained traction via "open data activists" who have called for governments to make datasets available so that citizens can themselves extract meaning from the data and perform checks and balances. [22] [9] King and Richards have argued that this call for transparency includes a tension between openness and secrecy. [9]
Activists and scholars have also argued that because this open-sourced model of data evaluation is based on voluntary participation, the availability of open datasets has a democratizing effect on a society, allowing any citizen to participate. [23] To some, the availability of certain types of data is seen as a right and an essential part of a citizen's agency. [23]
Open Knowledge Foundation (OKF) lists several dataset types it argues should be provided by governments for them to be truly open. [24] OKF has a tool called the Global Open Data Index (GODI), a crowd-sourced survey for measuring the openness of governments, [24] based on its Open Definition. GODI aims to be a tool for providing feedback to governments about the quality of their open datasets. [25]
Willingness to share data varies from person to person. Preliminary studies have been conducted into the determinants of the willingness to share data. For example, some have suggested that baby boomers are less willing to share data than millennials. [26]
The fallout from Edward Snowden's disclosures in 2013 significantly reshaped public discourse around data collection and the privacy principle of big data ethics. The case revealed that governments controlled and possessed far more information about civilians than previously understood, violating the principle of ownership, particularly in ways that disproportionately affected disadvantaged communities. For instance, activists were frequently targeted, including members of movements such as Occupy Wall Street and Black Lives Matter. [14] This revelation prompted governments and organizations to revisit data collection and storage practices to better protect individual privacy while also addressing national security concerns. The case also exposed widespread online surveillance of other countries and their citizens, raising important questions about data sovereignty and ownership. In response, some countries, such as Brazil and Germany, took action to push back against these practices. [14] However, many developing nations lacked the necessary technological independence, or were too dependent on the nations surveilling them, to resist such surveillance, leaving them at a disadvantage in addressing these concerns.
The Cambridge Analytica scandal highlighted significant ethical concerns in the use of big data. Data was harvested from approximately 87 million Facebook users without their explicit consent and used to display targeted political advertisements. This violated the currency principle of big data ethics, as individuals were initially unaware of how their data was being exploited. The scandal revealed how data collected for one purpose could be repurposed for entirely different uses, bypassing users' consent and emphasizing the need for explicit and informed consent in data usage. [27] Additionally, the algorithms used for ad delivery were opaque, challenging the principles of transaction transparency and openness. In some cases, the political ads spread misinformation, [27] often disproportionately targeting disadvantaged groups and contributing to knowledge gaps. Marginalized communities and individuals with lower digital literacy were disproportionately affected as they were less likely to recognize or act against exploitation. In contrast, users with more resources or digital literacy could better safeguard their data, exacerbating existing power imbalances.
Privacy is the ability of an individual or group to seclude themselves or information about themselves, and thereby express themselves selectively.
The right to privacy is an element of various legal traditions that intends to restrain governmental and private actions that threaten the privacy of individuals. Over 185 national constitutions mention the right to privacy. On 10 December 1948, the United Nations General Assembly adopted the Universal Declaration of Human Rights (UDHR); while the right to privacy does not appear in the document, many interpret this through Article 12, which states: "No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks."
A facial recognition system is a technology potentially capable of matching a human face from a digital image or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, and works by pinpointing and measuring facial features from a given image.
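A minimal sketch of the matching step, under the common assumption that faces are compared via numeric embeddings rather than raw pixels, might look like the following; the embeddings, names, and threshold here are synthetic and purely illustrative.

```python
# Illustrative sketch (synthetic embeddings): matching a probe face against a
# database by comparing embedding vectors and applying an acceptance threshold.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(3)
database = {name: rng.normal(size=128) for name in ("alice", "bob", "carol")}
probe = database["bob"] + rng.normal(scale=0.1, size=128)   # a new image of "bob"

best_name, best_score = max(
    ((name, cosine_similarity(probe, emb)) for name, emb in database.items()),
    key=lambda pair: pair[1],
)
THRESHOLD = 0.8   # hypothetical acceptance threshold
print(best_name if best_score >= THRESHOLD else "no match", f"(score {best_score:.2f})")
```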
The ethics of technology is a sub-field of ethics addressing ethical questions specific to the technology age, the transitional shift in society wherein personal computers and subsequent devices provide for the quick and easy transfer of information. Technology ethics is the application of ethical thinking to growing concerns as new technologies continue to rise in prominence.
Media ethics is the subdivision of applied ethics dealing with the specific ethical principles and standards of media, including broadcast media, film, theatre, the arts, print media and the internet. The field covers many varied and highly controversial topics, ranging from war journalism to Benetton ad campaigns.
Information ethics has been defined as "the branch of ethics that focuses on the relationship between the creation, organization, dissemination, and use of information, and the ethical standards and moral codes governing human conduct in society". It examines the morality that arises from information as a resource, a product, or a target. It provides a critical framework for considering moral issues concerning informational privacy, moral agency, new environmental issues, and problems arising from the life-cycle of information. Librarians, archivists, and information professionals, among others, must understand the importance of disseminating accurate information and of acting responsibly when handling it.
Privacy-enhancing technologies (PET) are technologies that embody fundamental data protection principles by minimizing personal data use, maximizing data security, and empowering individuals. PETs allow online users to protect the privacy of their personally identifiable information (PII), which is often provided to and handled by services or applications. PETs use techniques to minimize an information system's possession of personal data without losing functionality. Generally speaking, PETs can be categorized as either hard or soft privacy technologies.
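As a concrete, minimal example of data minimization, the sketch below pseudonymizes a direct identifier with a keyed hash so that records can still be linked for analysis without the raw identifier being stored; the key and data are hypothetical, and this is only one simple technique among many PETs.

```python
# Illustrative sketch (hypothetical key and data): pseudonymization replaces a
# direct identifier with a keyed hash, so records remain linkable for analysis
# without the raw identifier ever being stored.
import hashlib
import hmac

SECRET_KEY = b"assumed-secret-key"   # would be stored separately from the data

def pseudonymize(identifier: str) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"user_id": pseudonymize("alice@example.com"), "visits": 12}
print(record)   # the raw email address never appears in the stored record
```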
Data portability is a concept to protect users from having their data stored in "silos" or "walled gardens" that are incompatible with one another, i.e. closed platforms, thus subjecting them to vendor lock-in and making the creation of data backups or moving accounts between services difficult.
Cyberethics is "a branch of ethics concerned with behavior in an online environment". In another definition, it is the "exploration of the entire range of ethical and moral issues that arise in cyberspace" while cyberspace is understood to be "the electronic worlds made visible by the Internet." For years, various governments have enacted regulations while organizations have defined policies about cyberethics.
Digital privacy is often used in contexts that promote advocacy on behalf of individual and consumer privacy rights in e-services and is typically used in opposition to the business practices of many e-marketers, businesses, and companies to collect and use such information and data. Digital privacy, a crucial aspect of modern online interactions and services, can be defined under three sub-related categories: information privacy, communication privacy, and individual privacy.
DeepFace is a deep learning facial recognition system created by a research group at Facebook. It identifies human faces in digital images. The program employs a nine-layer neural network with over 120 million connection weights and was trained on four million images uploaded by Facebook users. The Facebook Research team has stated that the DeepFace method reaches an accuracy of 97.35% ± 0.25% on the Labeled Faces in the Wild (LFW) data set, on which human beings score 97.53%. This means that DeepFace is sometimes more successful than human beings. As a result of growing societal concerns, Meta announced that it plans to shut down the Facebook facial recognition system, deleting the face-scan data of more than one billion users. This change will represent one of the largest shifts in facial recognition usage in the technology's history. Facebook planned to delete more than one billion facial recognition templates, which are digital scans of facial features, by December 2021. However, it did not plan to eliminate DeepFace, the software that powers the facial recognition system. The company has also not ruled out incorporating facial recognition technology into future products, according to a Meta spokesperson.
Smart cities seek to implement information and communication technologies (ICT) to improve the efficiency and sustainability of urban spaces while reducing costs and resource consumption. In the context of surveillance, smart cities monitor citizens through strategically placed sensors around the urban landscape, which collect data regarding many different factors of urban living. From these sensors, data is transmitted, aggregated, and analyzed by governments and other local authorities to extrapolate information about the challenges the city faces in sectors such as crime prevention, traffic management, energy use and waste reduction. This serves to facilitate better urban planning and allows governments to tailor their services to the local population.
Critical data studies is the exploration of and engagement with social, cultural, and ethical challenges that arise when working with big data. It is through various unique perspectives and taking a critical approach that this form of study can be practiced. As its name implies, critical data studies draws heavily on the influence of critical theory, which has a strong focus on addressing the organization of power structures. This idea is then applied to the study of data.
Privacy in education refers to the broad area of ideologies, practices, and legislation that involve the privacy rights of individuals in the education system. Concepts that are commonly associated with privacy in education include the expectation of privacy, the Family Educational Rights and Privacy Act (FERPA), the Fourth Amendment, and the Health Insurance Portability and Accountability Act of 1996 (HIPAA). Most privacy-in-education concerns relate to the protection of student data and the privacy of medical records. Many scholars are engaging in an academic discussion that covers the scope of students' privacy rights, from students in K-12 through higher education, and the management of student data in an age of rapid access and dissemination of information.
Internet universality is a concept and framework adopted by UNESCO in 2015 to summarize their position on the internet. The concept recognizes that "the Internet is much more than infrastructure and applications; it is a network of economic and social interactions and relationships, which has the potential to enable human rights, empower individuals and communities, and facilitate sustainable development." The concept is based on four principles agreed upon by UNESCO member states: human rights, openness, accessibility, and multi-stakeholder participation, abbreviated as the R-O-A-M principles.
Algorithmic bias describes systematic and repeatable errors in a computer system that create "unfair" outcomes, such as "privileging" one category over another in ways different from the intended function of the algorithm.
Computational politics is the intersection of computer science and political science. The area involves the use of computational methods, such as analysis tools and prediction methods, to address questions in political science. Researchers in this area use large data sets to study user behavior. Common examples of such work are building a classifier to predict users' political bias on social media or detecting political bias in news coverage. This discipline is closely related to digital sociology; however, the main focus of computational politics is on politics-related problems and analysis.
Digital self-determination is a multidisciplinary concept derived from the legal concept of self-determination and applied to the digital sphere, to address the unique challenges to individual and collective agency and autonomy arising with increasing digitalization of many aspects of society and daily life.
Automated decision-making (ADM) involves the use of data, machines and algorithms to make decisions in a range of contexts, including public administration, business, health, education, law, employment, transport, media and entertainment, with varying degrees of human oversight or intervention. ADM involves large-scale data from a range of sources, such as databases, text, social media, sensors, images or speech, that is processed using various technologies including computer software, algorithms, machine learning, natural language processing, artificial intelligence, augmented intelligence and robotics. The increasing use of automated decision-making systems (ADMS) across a range of contexts presents many benefits and challenges to human society, requiring consideration of the technical, legal, ethical, societal, educational, economic and health consequences.
The Black Box Society: The Secret Algorithms That Control Money and Information is a 2016 academic book authored by law professor Frank Pasquale that interrogates the use of opaque algorithms—referred to as black boxes—that increasingly control decision-making in the realms of search, finance, and reputation.