Critical data studies is the exploration of and engagement with the social, cultural, and ethical challenges that arise when working with big data. It is practiced by taking a critical approach to data from a variety of perspectives. [1] As its name implies, critical data studies draws heavily on critical theory, with its strong focus on the organization of power structures, and applies that lens to the study of data.
Interest in this field began in 2011, when the scholars danah boyd and Kate Crawford posed a series of questions for the critical study of big data and recognized its potentially threatening impacts on society and culture. [2] It was not until 2014, after further exploration and conversation, that the term 'critical data studies' was coined by Craig Dalton and Jim Thatcher. [1] They placed a large emphasis on understanding the context of big data in order to approach it more critically. Researchers such as Daniel Ribes, Robert Soden, Seyram Avle, Sarah E. Fox, and Phoebe Sengers focus on understanding data as a historical artifact and take an interdisciplinary approach to critical data studies. [3] Other key scholars in the discipline include Rob Kitchin and Tracey P. Lauriault, who focus on reevaluating data through different spheres. [4]
Critical frameworks that can be applied to analyze big data include feminist, anti-racist, queer, Indigenous, decolonial, and anti-ableist approaches, as well as symbolic and synthetic data science. These frameworks help make sense of data by addressing the concerns about power, bias, privacy, consent, and underrepresentation or misrepresentation that exist in data, and by suggesting how to approach and analyze data with a more equitable mindset.
In the article in which they coin the term 'critical data studies,' Dalton and Thatcher also provide several justifications for why data studies is a discipline worthy of a critical approach. [5] First, 'big data' is an important aspect of twenty-first-century society, and analyzing it allows for a deeper understanding of what is happening and for what reasons. [1] Big data is central to critical data studies because it is the type of data used within the field. 'Big data' does not necessarily mean a data set with millions of rows: a smaller data set with a wide variety and expansive scope of data can also qualify, as can data sets that cover whole populations rather than mere samples. Furthermore, big data as a technological tool, and the information it yields, are not neutral, according to Dalton and Thatcher, making it worthy of critical analysis in order to identify and address its biases. Building on this idea, another justification for a critical approach is that the relationship between big data and society is an important one, and therefore worthy of study. [1]
Ribes et al. argue that the need for an interdisciplinary understanding of data as a historical artifact is a motivating concern of critical data studies. The overarching consensus in the Computer-Supported Cooperative Work (CSCW) field is that people should speak for the data rather than letting the data speak for itself.
The relationship between the sources of big data and their varied metadata can be complicated, which leads to data disorder and a need for ethical analysis. [6] Additionally, Iliadis and Russo (2016) have called for studying data assemblages: [6] that is, data have innate technological, political, social, and economic histories that should be taken into consideration. Kitchin argues that data is almost never 'raw' and almost always 'cooked,' meaning that it is always spoken for by the data scientists utilizing it. Big data should therefore be open to a variety of perspectives, especially those of a cultural and philosophical nature. Further, data contains hidden histories, ideologies, and philosophies. [6]
Big data technology can cause significant changes in society's structure and in the everyday lives of people, [1] and, being a product of society, big data technology is worthy of sociological investigation. [1] Moreover, data sets are almost never free of influence: data are shaped by the vision and goals of those gathering them, and during the collection process certain things are quantified, stored, sorted, or even discarded by the research team. [7] A critical approach is thus necessary to understand and reveal the intent behind the information being presented.

One such critical approach is feminist data studies, which applies feminist principles to critical studies and to data collection and analysis, with the goal of addressing power imbalances in data science and society. According to Catherine D'Ignazio and Lauren F. Klein, a power analysis can be performed by examining power, challenging power, evaluating emotion and embodiment, rethinking binaries and hierarchies, embracing pluralism, considering context, and making labor visible. [8] Feminist data studies is part of the movement toward making data benefit everyone rather than deepening existing inequalities. Moreover, data alone cannot speak for themselves; to possess any concrete meaning, data must be accompanied by theoretical insight or by additional quantitative or qualitative research. [1] [9]

Critical data studies also addresses social issues in data through lenses such as anti-racist data studies, which uses classification approaches to give representation to the communities concerned. Desmond Upton Patton and others built their own classification system in communities of Chicago to help target and reduce violence among young teens on Twitter. They had students from those communities help them decipher the teens' terminology and emojis in order to identify the language in tweets that preceded violence outside the computer screen. [10] This is one real-world example of critical data studies in application.

Dalton and Thatcher argue that if one were to think of data only in terms of its exploitative power, there would be no possibility of using data for revolutionary, liberatory purposes. [1] Finally, they propose that a critical approach allows 'big data' to be combined with older, 'small data,' creating more thorough research and opening up more opportunities, questions, and topics to be explored. [1] [11]
Data plays a pivotal role in the emerging knowledge economy, driving productivity, competitiveness, efficiency, sustainability, and capital accumulation. The ethical, political, and economic dimensions of data dynamically evolve across space and time, influenced by changing regimes, technologies, and priorities. Technically, the focus lies on handling, storing, and analyzing vast data sets, utilizing machine learning-based data mining and analytics. This technological advancement raises concerns about data quality, encompassing validity, reliability, authenticity, usability, and lineage. [12]
The use of data in modern society brings about new ways of understanding and measuring the world, but also brings with it certain concerns or issues. [13] Data scholars attempt to bring some of these issues to light in their quest to be critical of data.
Technical and organizational issues include the scope of the data set: having too little or too much data to work with can lead to inaccurate results, so it becomes crucial for critical data scholars to consider whether the volume of data is adequate for their analyses.
The quality of the data itself is another facet of concern. A data set may be of poor quality, for instance incomplete or messy, with missing or inaccurate values. Addressing these issues requires scholars to make edits and assumptions about the data to ensure its reliability and relevance, as the sketch below illustrates.
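Even a simple cleaning pass makes those assumptions concrete. The following is a minimal sketch, assuming Python with pandas; the survey data, column names, and imputation choices are hypothetical and purely illustrative, not a prescribed method:

```python
import numpy as np
import pandas as pd

# A hypothetical messy survey: one implausible age, two missing values.
survey = pd.DataFrame({
    "age":    [34, np.nan, 29, 41, 200],
    "income": [52000, 48000, np.nan, 61000, 58000],
})

# Assumption 1: ages above 120 are data-entry errors, not real values.
survey.loc[survey["age"] > 120, "age"] = np.nan

# Assumption 2: a missing value is best replaced by the median of the rest.
# Mean imputation, or dropping the rows entirely, would give different results.
survey["age"] = survey["age"].fillna(survey["age"].median())
survey["income"] = survey["income"].fillna(survey["income"].median())

print(survey)
```

Each of these steps quietly shapes the eventual analysis, which is why critical data scholars insist that such editorial decisions be documented rather than treated as neutral housekeeping.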
Data scientists may also lack adequate access to the actual data set, limiting their ability to analyze it. Linnet Taylor explains how gaps in data can arise when people at varying levels of power hold different rights over data sources: those in power can control what data is collected, how it is displayed, and how it is analyzed. [14]
The capabilities of the research team also play a crucial role in the quality of data analytics. A team with inadequate skills or organizational capabilities can bias the analytics performed on a data set. This can also lead to ecological fallacies, in which an assumption is made about an individual based on data or results from a larger group of people. [13]
These technical and organizational challenges highlight the complexity of working with data and emphasize the need for scholars to navigate a landscape where issues related to data scope, quality, access, and team capabilities are intricately interwoven.
Some of the normative and ethical concerns addressed by Kitchin include surveillance through one's data, or dataveillance; [7] the privacy of one's data, which the National Cybersecurity Alliance frames in terms of data rapidly becoming a necessity as companies recognize it as an asset and realize the potential value in collecting, using, and sharing it (National Cyber Security Alliance); the ownership of one's data, on which Scassa writes that debates over ownership rights in data have been heating up, with policymakers in Europe raising the possibility of creating sui generis ownership rights in data (Data Ownership); the security of one's data, since data breaches pose a threat to both individuals and organizations (Data Security Breach); anticipatory or corporate governance, where corporate data, unlike information, is a raw form without proper meaning or usefulness until it is processed and transformed into meaningful forms (Corporate Data and Information); and the profiling of individuals by their data. [5] Profiling is heavily emphasized in work on data colonialism (Data Colonialism), which encourages data sovereignty for individuals who are being harmed, because sovereignty can be a powerful tool for those whom the data represents. A common theme across these approaches to data sovereignty is deciding when and how to collect, protect, and share data, and sharing it only with those who have a legitimate or appropriate need to access it. As Vallor puts it: "The labels that we attach to the data are always going to be cruder and less representative of what they describe than what we would like them to be. Treating candidates under a single label, whether it's a gender label, whether it's an age group, whether it's consumers of a particular product, or whether it's people suffering from a particular disease, can cause people to be treated as interchangeable and fungible data points. Every one of those individuals with that label is unique and has the right to be respected as a person" (Vallor: Data Ethics). All of these concerns must be taken into account by scholars of data in their objective to be critical.
Following in the tradition of critical urban studies, [15] other scholars have raised similar concerns around data and digital information technologies in the urban context. [16] [17] [18] For example, Joe Shaw and Mark Graham have examined these in light of Henri Lefebvre's 'right to the city'. [19]
One of the most practical and pressing applications of critical data studies is the intersection of ethics and privacy. Tendler, Hong, Kane, Kopaczynski, Terry, and Emanuel explain that in an age when private institutions use customer data to market, to research customer wants and needs, and more, it is vital to protect the data collected. In the field of medical studies, one small step toward protecting participants is informed consent. [20]
Algorithmic bias and discrimination pervade data. Many scholars emphasize their importance in the healthcare field because of the gravity of data-driven decisions for patient care, and because of questions about how such data is used and why it is collected. Institutions and companies can work toward fairness and against systemic racism by using critical data studies to highlight algorithmic bias in data-driven decision making. Nong explains that a well-known example involves insurance algorithms and access to healthcare: insurance companies use algorithms to allocate care resources across clients, and the algorithms in question demonstrated "a clear racial bias against Black patients," which caused estimated "health expenditures [to be] based on historical data structured by systemic racism and perpetuating that bias in access to care management." [21] The audit method behind such a finding is sketched below.
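That style of audit can be illustrated with a short sketch. Everything below is synthetic and hypothetical; it only demonstrates the general method of checking whether, at the same algorithmic risk score, one group shows greater measured health need than another:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000
patients = pd.DataFrame({
    "group":      rng.choice(["A", "B"], size=n),
    "risk_score": rng.uniform(0, 100, size=n),
})
# Synthetic "true need": group B is sicker than group A at every score level.
patients["chronic_conditions"] = (
    patients["risk_score"] / 20
    + (patients["group"] == "B") * 1.5
    + rng.normal(0, 1, size=n)
)

# Bin patients by risk-score decile, then compare mean need across groups
# within each bin. A persistent gap means the score under-serves one group.
patients["decile"] = pd.qcut(patients["risk_score"], 10, labels=False)
audit = (patients.groupby(["decile", "group"])["chronic_conditions"]
                 .mean().unstack())
print(audit)
```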
Many trained machine learning and artificial intelligence models have no standard reporting procedure to properly document their performance characteristics. [22] When these models are applied to real-life scenarios, the consequences can be significant, most notably in healthcare, education, and law enforcement. Timnit Gebru explains that the lack of sufficient documentation for these models makes it challenging for users to assess their suitability for specific contexts; this is where model cards come into play. Model cards are short records that accompany machine learning models and provide information about a model's characteristics, intended uses, potential biases, and measures of performance. They aim to give users important information about the capabilities and limitations of machine learning systems and to promote fair and inclusive outcomes from the use of machine learning technology. [23]
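As a rough illustration, a model card can be represented as a simple structured record. This is a minimal sketch loosely following the sections described above; the field names and the example model are hypothetical, not a standard API:

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    model_name: str
    intended_use: str
    out_of_scope_uses: str
    training_data: str
    evaluation_data: str
    metrics: dict = field(default_factory=dict)  # overall and per-group scores
    ethical_considerations: str = ""

# A hypothetical card for an imagined clinical risk model.
card = ModelCard(
    model_name="readmission-risk-v2",
    intended_use="Flag patients for follow-up outreach; human review required.",
    out_of_scope_uses="Denying or rationing care based on the score alone.",
    training_data="2015-2020 records from one hospital network.",
    evaluation_data="Held-out 2021 records, disaggregated by age, sex, and race.",
    metrics={"AUC_overall": 0.81, "AUC_group_A": 0.83, "AUC_group_B": 0.74},
    ethical_considerations="Performance gap between groups A and B under review.",
)
print(card)
```

Even this toy card surfaces the kind of disaggregated information, such as the per-group performance gap, that an overall accuracy number would hide.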
The data feminism framework promotes thinking about data and ethics guided by ideas of intersectional feminism. Data feminism highlights practices in which data science reinforces power inequalities in the world, and shows how users can instead employ data to challenge existing power and commit to creating more balanced data. According to D'Ignazio and Klein, the intersectionality of data feminism acknowledges that data must account for intersecting factors such as identity, race, and class to provide a complete and accurate representation of individuals' experiences. The framework also stresses ethical considerations, advocating informed consent, privacy, and the responsibility data collectors have toward the individuals from whom data is collected. [8]
Dataveillance is the monitoring of people through their online data. Unlike surveillance, dataveillance goes far beyond monitoring people for specific reasons: it infiltrates people's lives with constant tracking for blanket, generalized purposes. According to Raley, it has become the preferred way of monitoring people across their various online presences. This framework offers ways to approach and understand how data is collected, processed, and used, emphasizing ethical perspectives and the protection of individuals' information. [24] Datafication focuses on understanding the processes associated with the emergence and use of big data. According to José van Dijck, it highlights the transformation of social actions into digital data, allowing real-time tracking and predictive analysis. Datafication emphasizes the interest-driven nature of data collection, since social activities change while their transformation into data does not. It also examines how society changes as digital data becomes more prevalent in everyday life. Datafication stresses the complicated relationship between data and society and goes hand in hand with dataveillance. [25]
The algorithmic bias framework addresses systematic and unjust biases against certain groups or outcomes in algorithmic decision-making processes. Häußler notes that this work focuses on how algorithms can produce discriminatory outcomes, particularly with respect to race, gender, age, and other characteristics, and can reinforce social inequities and unjust practices. The framework generally comprises several key components: bias identification, data quality, impact assessment, fairness and equity, transparency, remediation, and implications. [26]
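One basic bias-identification check from this framework, demographic parity, can be sketched in a few lines. The data, groups, and decision threshold below are synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
group = rng.choice(["A", "B"], size=5_000)
# Synthetic scores: group A receives a small systematic boost.
score = rng.normal(0.5, 0.15, size=5_000) + (group == "A") * 0.05
approved = score > 0.5              # the decision rule being audited

rate_a = approved[group == "A"].mean()
rate_b = approved[group == "B"].mean()
print(f"approval rate A: {rate_a:.2%}, B: {rate_b:.2%}")
print(f"demographic parity difference: {abs(rate_a - rate_b):.2%}")
```

Demographic parity is only one of several competing fairness definitions; a large gap does not settle the question but flags the decision rule for the impact-assessment and remediation steps named above.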
The ethics of artificial intelligence covers a broad range of topics within the field that are considered to have particular ethical stakes. This includes algorithmic biases, fairness, automated decision-making, accountability, privacy, and regulation. It also covers various emerging or potential future challenges such as machine ethics, lethal autonomous weapon systems, arms race dynamics, AI safety and alignment, technological unemployment, AI-enabled misinformation, how to treat certain AI systems if they have a moral status, artificial superintelligence and existential risks.
Imaging informatics, also known as radiology informatics or medical imaging informatics, is a subspecialty of biomedical informatics that aims to improve the efficiency, accuracy, usability and reliability of medical imaging services within the healthcare enterprise. It is devoted to the study of how information about and contained within medical images is retrieved, analyzed, enhanced, and exchanged throughout the medical enterprise.
Artificial intelligence marketing (AIM) is a form of marketing that uses artificial intelligence concepts and models such as machine learning, natural language processing, and Bayesian networks to achieve marketing goals. The main difference between AIM and traditional forms of marketing resides in the reasoning, which is performed by a computer algorithm rather than a human.
In news media and social media, an echo chamber is an environment or ecosystem in which participants encounter beliefs that amplify or reinforce their preexisting beliefs by communication and repetition inside a closed system and insulated from rebuttal. An echo chamber circulates existing views without encountering opposing views, potentially resulting in confirmation bias. Echo chambers may increase social and political polarization and extremism. On social media, it is thought that echo chambers limit exposure to diverse perspectives, and favor and reinforce presupposed narratives and ideologies.
Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many entries (rows) offer greater statistical power, while data with higher complexity may lead to a higher false discovery rate. Though the term is sometimes used loosely, partly due to a lack of formal definition, the best interpretation is that it is a large body of information that cannot be comprehended when used in small amounts only.
Machine ethics is a part of the ethics of artificial intelligence concerned with adding or ensuring moral behaviors of man-made machines that use artificial intelligence, otherwise known as artificial intelligent agents. Machine ethics differs from other ethical fields related to engineering and technology. It should not be confused with computer ethics, which focuses on human use of computers. It should also be distinguished from the philosophy of technology, which concerns itself with technology's grander social effects.
Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processes, scientific visualization, algorithms and systems to extract or extrapolate knowledge and insights from potentially noisy, structured, or unstructured data.
Mirko Tobias Schäfer is a media scholar at Utrecht University. He is an Associate Professor for AI, Data & Society at the Department for Information & Computing Sciences and Scientific Lead of the Data School.
Datafication is a technological trend turning many aspects of our life into data which is subsequently transferred into information realised as a new form of value. Kenneth Cukier and Viktor Mayer-Schönberger introduced the term datafication to the broader lexicon in 2013. Up until this time, datafication had been associated with the analysis of representations of our lives captured through data, but not on the present scale. This change was primarily due to the impact of big data and the computational opportunities afforded to predictive analytics.
Datafication is not the same as digitization, which takes analog content—books, films, photographs—and converts it into digital information, a sequence of ones and zeros that computers can read. Datafication is a far broader activity: taking all aspects of life and turning them into data [...] Once we datafy things, we can transform their purpose and turn the information into new forms of value
Dataveillance is the practice of monitoring and collecting online data as well as metadata. The word is a portmanteau of data and surveillance. Dataveillance is concerned with the continuous monitoring of users' communications and actions across various platforms. For instance, dataveillance refers to the monitoring of data resulting from credit card transactions, GPS coordinates, emails, social networks, etc. Using digital media often leaves traces of data and creates a digital footprint of our activity. Unlike sousveillance, this type of surveillance is not often known and happens discreetly. Dataveillance may involve the surveillance of groups of individuals. There exist three types of dataveillance: personal dataveillance, mass dataveillance, and facilitative mechanisms.
Data politics encompasses the political aspects of data, including topics ranging from data activism to open data and open government. The ways in which data is collected and accessed, and what we do with that data, have changed in contemporary society due to a number of factors surrounding issues of politics. An issue that arises from political data is how disconnected people often are from their own data, rarely gaining access to the data they produce. Large platforms like Google have a "better to ask forgiveness than permission" stance on data collection of which the greater population is largely ignorant, leading to movements within data activism.
Big data ethics, also known simply as data ethics, refers to systemizing, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data. Since the dawn of the Internet the sheer quantity and quality of data has dramatically increased and is continuing to do so exponentially. Big data describes this large amount of data that is so voluminous and complex that traditional data processing application software is inadequate to deal with them. Recent innovations in medical research and healthcare, such as high-throughput genome sequencing, high-resolution imaging, electronic medical patient records and a plethora of internet-connected health devices have triggered a data deluge that will reach the exabyte range in the near future. Data ethics is of increasing relevance as the quantity of data increases because of the scale of the impact.
Data shadows refer to the information that a person leaves behind unintentionally while taking part in daily activities such as checking their e-mails, scrolling through social media or even by using their debit or credit card.
Algorithmic bias describes systematic and repeatable errors in a computer system that create "unfair" outcomes, such as "privileging" one category over another in ways different from the intended function of the algorithm.
Ethics of quantification is the study of the ethical issues associated with different visible or invisible forms of quantification. These could include algorithms, metrics/indicators, and statistical and mathematical modelling, as noted in a review of various aspects of the sociology of quantification.
Himabindu "Hima" Lakkaraju is an Indian-American computer scientist who works on machine learning, artificial intelligence, algorithmic bias, and AI accountability. She is currently an Assistant Professor at the Harvard Business School and is also affiliated with the Department of Computer Science at Harvard University. Lakkaraju is known for her work on explainable machine learning. More broadly, her research focuses on developing machine learning models and algorithms that are interpretable, transparent, fair, and reliable. She also investigates the practical and ethical implications of deploying machine learning models in domains involving high-stakes decisions such as healthcare, criminal justice, business, and education. Lakkaraju was named as one of the world's top Innovators Under 35 by both Vanity Fair and the MIT Technology Review.
Automated decision-making (ADM) involves the use of data, machines and algorithms to make decisions in a range of contexts, including public administration, business, health, education, law, employment, transport, media and entertainment, with varying degrees of human oversight or intervention. ADM involves large-scale data from a range of sources, such as databases, text, social media, sensors, images or speech, that is processed using various technologies including computer software, algorithms, machine learning, natural language processing, artificial intelligence, augmented intelligence and robotics. The increasing use of automated decision-making systems (ADMS) across a range of contexts presents many benefits and challenges to human society requiring consideration of the technical, legal, ethical, societal, educational, economic and health consequences.
Data imaginaries are a form of cultural imaginary related to social conceptions of data, a concept that comes from the field of critical data studies. A data imaginary is a particular framing of data that defines what data are and what can be done with them. Imaginaries are produced by social institutions and practices and they influence how people understand and use the object of the imaginary, in this case data.
A data ecosystem is the complex environment of co-dependent networks and actors that contribute to data collection, transfer, and use. It can span multiple sectors, such as healthcare or finance, allowing them to inform one another's practices. A data ecosystem often consists of numerous data assemblages. Research into data ecosystems has developed in response to the rapid proliferation and availability of information through the web, which has contributed to the commodification of data.
Data universalism is an epistemological framework that assumes a single universal narrative for any dataset, without consideration of geographical borders and social contexts. This assumption is enabled by a generalized approach to data collection. Data are used in universal endeavours across the social, political, and physical sciences, unrestricted by their local source and people. Data are gathered and transformed into a mutual understanding of knowing the world, which forms theories of knowledge. One strand of critical data studies explores the geologies and histories of data by investigating data assemblages and tracing data lineage, unfolding data histories and geographies (p. 35). This reveals intersections of data politics, praxes, and powers at play, challenging data universalism as a misguided concept.