Abeba Birhane

Alma mater: Bahir Dar University (BSc, BA); University College Dublin (MSc, PhD)
Known for: Algorithmic bias; critical race theory; computer vision
Scientific career
Fields: Cognitive science; computer science
Institutions: University College Dublin; DeepMind

Abeba Birhane is an Ethiopian-born cognitive scientist who works at the intersection of complex adaptive systems, machine learning, algorithmic bias, and critical race studies. Birhane's work with Vinay Prabhu uncovered that large-scale image datasets commonly used to develop AI systems, including ImageNet and 80 Million Tiny Images, carried racist and misogynistic labels and offensive images. [1] [2] [3] She has been recognized by VentureBeat as a top innovator in computer vision [4] and was named one of the 100 most influential people in AI in 2023 by TIME magazine. [5]

Early life and education

Birhane was born in Ethiopia. [6] She received a Bachelor of Science in Psychology and a Bachelor of Arts in Philosophy from The Open University. [7] In 2015, she completed her Master of Science in Cognitive Science and, in 2021, her Ph.D. at the Complex Software Lab in the School of Computer Science at University College Dublin. [8] [9]

Career and research

Birhane studied the impacts of emerging AI technologies and how they shape individuals and local communities. She found that AI algorithms tend to disproportionately impact vulnerable groups such as older workers, trans people, immigrants, and children. Her research on relational ethics won the best paper award at NeurIPS’s Black in AI workshop in 2019. [10] She has also studied and written about algorithmic colonization. [11] Her work on decolonizing computational sciences addressed the oppressions inherited by current systems, especially those directed at women of color.

In 2020, Birhane and Vinay Prabhu, principal machine learning scientist at UnifyID, published a paper examining the problematic data collection, labelling, classification, and consequences of large image datasets. [12] These datasets, including ImageNet and MIT's 80 Million Tiny Images, have been used to develop thousands of AI algorithms and systems. Birhane and Prabhu found that they contained many racist and misogynistic labels and slurs as well as offensive images. [13] This resulted in MIT voluntarily and formally taking down the 80 Million Tiny Images dataset. [14] [15] [16]

More recently, Birhane has worked with Rediet Abebe, George Obaido, and Sekou Remy to research the barriers to data sharing in Africa. They found that power imbalances play a significant role in the data sharing process, even when the data comes from Africa. Their research was published at the ACM Conference on Fairness, Accountability, and Transparency. [17]

Selected awards

References

  1. Song, Victoria (2 July 2020). "MIT Takes Down Popular AI Dataset Due to Racist, Misogynistic Content". Gizmodo. Retrieved 2021-03-07.
  2. "UCD student's research takes down an 80-million image artificial intelligence database". independent. Retrieved 2021-03-05.
  3. "From data strikes to data poisoning, how consumers can take back control from corporations". Fortune. Retrieved 2021-03-05.
  4. "Announcing the AI Innovation Awards winners at Transform 2020". VentureBeat. 2020-07-16. Retrieved 2021-03-05.
  5. "TIME100 AI 2023: Abeba Birhane". Time. 2023-09-07. Retrieved 2023-09-19.
  6. O'Connell, Claire. "Getting to the heart of machine learning and complex humans". The Irish Times. Retrieved 2021-03-07.
  7. "TCD student wins 2018 Mary Mulvihill Award – Mary Mulvihill Award". marymulvihillaward.ie. Retrieved 2021-03-07.
  8. "Abeba Birhane | Lero". lero.ie. Retrieved 2021-03-05.
  9. @Abebab (December 6, 2021). "i'm a dr" (Tweet) via Twitter.
  10. Gibney, Elizabeth (2020-01-24). "The battle for ethical AI at the world's biggest machine-learning conference". Nature. 577 (7792): 609. Bibcode:2020Natur.577..609G. doi:10.1038/d41586-020-00160-y. PMID 31992885. S2CID 210938518.
  11. "Building Wakanda". VentureBeat. 2020-09-07. Retrieved 2021-03-07.
  12. Prabhu, Vinay Uday; Birhane, Abeba (2020-07-23). "Large image datasets: A pyrrhic win for computer vision?". arXiv:2006.16923 [cs.CY].
  13. Gorey, Colm (2020-07-13). "80m images used to train AI pulled after researchers find string of racist terms". Silicon Republic. Retrieved 2021-03-05.
  14. "80 Million Tiny Images". groups.csail.mit.edu. Retrieved 2021-03-07.
  15. "MIT takes down 80 Million Tiny Images data set due to racist and offensive content". VentureBeat. 2020-07-02. Retrieved 2021-03-07.
  16. Ustik, Georgina (2020-07-01). "MIT removes huge dataset that teaches AI systems to use racist, misogynistic slurs". Neural | The Next Web. Retrieved 2021-03-07.
  17. "AI researchers detail obstacles to data sharing in Africa". VentureBeat. 2021-02-23. Retrieved 2021-03-05.
  18. "AI experts urge machine learning researchers to tackle climate change". VentureBeat. 2019-12-16. Retrieved 2021-03-05.
  19. "Hall of Fame". 100 Brilliant Women in AI Ethics. Retrieved 2021-02-27.
  20. "Seven researchers from Irish universities take home Lero prizes". siliconrepublic.com. 6 September 2022.
  21. "The 100 Most Influential People in AI 2023". Time. Retrieved 2023-09-19.