Emily M. Bender | |
---|---|
Born | 1973 (age 50–51) |
Known for | Research on the risks of large language models and ethics of NLP; coining the term 'stochastic parrot'; research on the use of Head-driven phrase structure grammar in computational linguistics |
Spouse | Vijay Menon [1] |
Mother | Sheila Bender [2] |
Academic background | |
Alma mater | UC Berkeley and Stanford University [3] [4] |
Thesis | Syntactic variation and linguistic competence: The case of AAVE copula absence (2000) [3] [4] |
Doctoral advisor | Tom Wasow, Penelope Eckert [4] |
Academic work | |
Discipline | Linguistics |
Sub-discipline | Syntax, computational linguistics |
Institutions | University of Washington |
Emily Menon Bender (born 1973) is an American linguist who is a professor at the University of Washington. She specializes in computational linguistics and natural language processing. She is also the director of the University of Washington's Computational Linguistics Laboratory. [5] [6] She has published several papers on the risks of large language models and on ethics in natural language processing. [7]
Bender earned an AB in Linguistics from UC Berkeley in 1995. She received her MA from Stanford University in 1997 and her PhD from Stanford in 2000 for her research on syntactic variation and linguistic competence in African American Vernacular English (AAVE). [8] [3] She was supervised by Tom Wasow and Penelope Eckert. [4]
Before working at the University of Washington, Bender held positions at Stanford University and UC Berkeley and worked in industry at YY Technologies. [9] She has been on the faculty of the University of Washington since 2003, where she holds several positions, including professor in the Department of Linguistics, adjunct professor in the Department of Computer Science and Engineering, faculty director of the Master of Science in Computational Linguistics, [10] and director of the Computational Linguistics Laboratory. [11] Bender is the current holder of the Howard and Frances Nostrand Endowed Professorship. [12] [13]
Bender was elected VP-elect of the Association for Computational Linguistics in 2021. [14] She served as VP-elect in 2022 and as Vice-President in 2023, is serving as President through 2024, [15] [16] and will serve as Past President in 2025. Bender was elected a Fellow of the American Association for the Advancement of Science in 2022. [17]
Bender has published research papers on the linguistic structures of Japanese, Chintang, Mandarin, Wambaya, American Sign Language, and English. [18]
Bender constructed the LinGO Grammar Matrix, an open-source starter kit for the development of broad-coverage precision HPSG grammars. [19] [20] In 2013, she published Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax, and in 2019 she published Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics with Alex Lascarides; both books explain basic linguistic principles in a way that makes them accessible to NLP practitioners. [citation needed]
In 2021, Bender presented "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜", a paper co-authored with Google researcher Timnit Gebru and others, at the ACM Conference on Fairness, Accountability, and Transparency. [21] Google tried to block the paper from publication, part of a sequence of events leading to Gebru's departure from Google, the details of which are disputed. [22] The paper concerned ethical issues in building natural language processing systems using machine learning from large text corpora. [23] Since then, she has worked to popularize AI ethics and has taken a stand against hype over large language models. [24] [25]
The Bender Rule, which originated from a question Bender repeatedly asked at research talks, is research advice for computational scholars to "always name the language you're working with". [1]
She draws a distinction between linguistic form and linguistic meaning. [1] Form refers to the structure of language (e.g., syntax), whereas meaning refers to the ideas that language represents. In a 2020 paper, she argued that machine learning models for natural language processing that are trained only on form, without connection to meaning, cannot meaningfully understand language. [26] She has therefore argued that tools like ChatGPT have no way to meaningfully understand the text that they process or the text that they generate. [citation needed]
Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others.
Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related to information retrieval, knowledge representation and computational linguistics, a subfield of linguistics. Typically data is collected in text corpora, using either rule-based, statistical or neural-based approaches in machine learning and deep learning.
In linguistics, syntax is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituency), agreement, the nature of crosslinguistic variation, and the relationship between form and meaning (semantics). There are numerous approaches to syntax that differ in their central assumptions and goals.
Head-driven phrase structure grammar (HPSG) is a highly lexicalized, constraint-based grammar developed by Carl Pollard and Ivan Sag. It is a type of phrase structure grammar, as opposed to a dependency grammar, and it is the immediate successor to generalized phrase structure grammar. HPSG draws from other fields such as computer science and uses Ferdinand de Saussure's notion of the sign. It uses a uniform formalism and is organized in a modular way which makes it attractive for natural language processing.
Generative grammar is a research tradition in linguistics that aims to explain the cognitive basis of language by formulating and testing explicit models of humans' subconscious grammatical knowledge. Generative linguists, or generativists, tend to share certain working assumptions such as the competence–performance distinction and the notion that some domain-specific aspects of grammar are partly innate in humans. These assumptions are rejected in non-generative approaches such as usage-based models of language. Generative linguistics includes work in core areas such as syntax, semantics, phonology, psycholinguistics, and language acquisition, with additional extensions to topics including biolinguistics and music cognition.
Ivan Andrew Sag was an American linguist and cognitive scientist. He did research in areas of syntax and semantics as well as work in computational linguistics.
Paul Smolensky is Krieger-Eisenhower Professor of Cognitive Science at the Johns Hopkins University and a Senior Principal Researcher at Microsoft Research in Redmond, Washington.
Joan Wanda Bresnan FBA is Sadie Dernham Patek Professor in Humanities Emerita at Stanford University. She is best known as one of the architects of the theoretical framework of lexical functional grammar.
Eva Hajičová [ˈɛva ˈɦajɪt͡ʃovaː] is a Czech linguist, specializing in topic–focus articulation and corpus linguistics. In 2006, she was awarded the Association for Computational Linguistics (ACL) Lifetime Achievement Award. She was named a fellow of the ACL in 2011.
Linguistics is the scientific study of language. The areas of linguistic analysis are syntax, semantics (meaning), morphology, phonetics, phonology, and pragmatics. Subdisciplines such as biolinguistics and psycholinguistics bridge many of these divisions.
Lauri Juhani Karttunen was an adjunct professor in linguistics at Stanford and an ACL Fellow.
The following outline is provided as an overview of and topical guide to natural-language processing.
Google Brain was a deep learning artificial intelligence research team that served as the sole AI branch of Google before being incorporated under the newer umbrella of Google AI, a research division at Google dedicated to artificial intelligence. Formed in 2011, it combined open-ended machine learning research with information systems and large-scale computing resources. It created tools such as TensorFlow, which allow neural networks to be used by the public, and multiple internal AI research projects, and aimed to create research opportunities in machine learning and natural language processing. It was merged into former Google sister company DeepMind to form Google DeepMind in April 2023.
Georgia M. Green is an American linguist and academic. She is an emeritus professor at the University of Illinois at Urbana-Champaign. Her research has focused on pragmatics, speaker intention, word order and meaning. She has been an advisory editor for several linguistics journals or publishers and she serves on the usage committee for the American Heritage Dictionary.
Dynamic Syntax (DS) is a grammar formalism and linguistic theory whose overall aim is to explain the real-time processes of language understanding and production, and describe linguistic structures as happening step-by-step over time. Under the DS approach, syntactic knowledge is understood as the ability to incrementally analyse the structure and content of spoken and written language in context and in real-time. While it posits representations similar to those used in Combinatory categorial grammars (CCG), it builds those representations left-to-right going word-by-word. Thus it differs from other syntactic models which generally abstract away from features of everyday conversation such as interruption, backtracking, and self-correction. Moreover, it differs from other approaches in that it does not postulate an independent level of syntactic structure over words.
Timnit Gebru is an Eritrean Ethiopian-born computer scientist who works in the fields of artificial intelligence (AI), algorithmic bias and data mining. She is a co-founder of Black in AI, an advocacy group that has pushed for more Black roles in AI development and research. She is the founder of the Distributed Artificial Intelligence Research Institute (DAIR).
Bonnie Jean Dorr is an American computer scientist specializing in natural language processing, machine translation, automatic summarization, social computing, and explainable artificial intelligence. She is a professor and director of the Natural Language Processing Research Laboratory in the Department of Computer & Information Science & Engineering at the University of Florida in Gainesville, Florida. She is professor emerita of computer science and linguistics and former dean at the University of Maryland, College Park, former associate director at the Florida Institute for Human and Machine Cognition, and former president of the Association for Computational Linguistics.
Margaret Mitchell is a computer scientist who works on algorithmic bias and fairness in machine learning. She is most well known for her work on automatically removing undesired biases concerning demographic groups from machine learning models, as well as more transparent reporting of their intended use.
Mona Talat Diab is a computer science professor and director of Carnegie Mellon University's Language Technologies Institute. Previously, she was a professor at George Washington University and a research scientist with Facebook AI. Her research focuses on natural language processing, computational linguistics, cross lingual/multilingual processing, computational socio-pragmatics, Arabic language processing, and applied machine learning.
In machine learning, the term stochastic parrot is a metaphor to describe the theory that large language models, though able to generate plausible language, do not understand the meaning of the language they process. The term was coined by Emily M. Bender in the 2021 artificial intelligence research paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" by Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell.