Julia Hirschberg | |
---|---|
Born | |
Nationality | American |
Alma mater | University of Pennsylvania; University of Michigan |
Known for | Natural Language Processing |
Awards | American Academy of Arts and Sciences (2018); IEEE Fellow (2017); National Academy of Engineering (2017); ACM Fellow (2015); ACL Fellow (2011); AAAI Fellow (1994); International Speech Communication Association Fellow (2011); Honorary Doctorate (Hedersdoktor), KTH (2007); Columbia Engineering School Alumni Association Distinguished Faculty Teaching Award (2009); IEEE James L. Flanagan Speech and Audio Processing Award (2011); ISCA Medal for Scientific Achievement (2011) |
Scientific career | |
Fields | Computer Science |
Institutions | |
Thesis | A Theory of Scalar Implicature (1985) |
Website | www |
Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing. [1]
Hirschberg was elected a member of the National Academy of Engineering in 2017 for contributions to the use of prosody in text-to-speech and spoken dialogue systems, and to audio browsing and retrieval.
She is currently the Percy K. and Vida L. W. Hudson Professor of Computer Science at Columbia University.
Julia Linn Bell Hirschberg received her first Ph.D. degree, in History (16th-century Mexico), from the University of Michigan in 1976. She served on the History faculty of Smith College from 1974 to 1982. She subsequently shifted to computer science, receiving her M.S. in Computer and Information Science from the University of Pennsylvania in 1982 and a Ph.D. in Computer and Information Science from the University of Pennsylvania in 1985.
Upon graduation from the University of Pennsylvania in 1985, Hirschberg joined AT&T Bell Labs as a Member of Technical Staff in the Linguistics Research Department, where she worked on improving prosody assignment for Text-to-Speech Synthesis (TTS) in the Bell Labs TTS system. She was promoted to Department Head in 1994, when she created a new Human-Computer Interface Research Lab. She and her department remained at Bell Labs until 1996, when they moved to AT&T Labs Research as part of a corporate reorganization. In 2002, she joined the Columbia University faculty as a Professor in the Department of Computer Science. She served as Chair of the Computer Science Department from 2012 to 2018.
Hirschberg's research has included prosody, discourse structure, spoken dialogue systems, speech search, and, more recently, the analysis of deceptive speech. [2] Hirschberg was among the first to combine Natural Language Processing (NLP) approaches to discourse and dialogue with speech research. She pioneered techniques in text analysis for prosody assignment in Text-to-Speech synthesis at Bell Laboratories in the 1980s and 1990s, developing corpus-based statistical models, based upon syntactic and discourse information, that are in general use today in TTS systems. [3] [4] With Janet Pierrehumbert, she developed a theoretical model of intonational meaning. [5] She was a leader in the development of the ToBI conventions for intonational description, which have been extended to numerous languages and are today the most widely used standard for intonational annotation. [6]
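To illustrate what a corpus-based statistical model for prosody assignment can look like in practice, the sketch below trains a toy classifier that decides whether a word should carry a pitch accent from simple syntactic and discourse features. The features, toy data, and model choice are assumptions for illustration only, not a reconstruction of the Bell Labs models.

```python
# Illustrative sketch: a toy corpus-based classifier that predicts whether a word
# should carry a pitch accent from simple syntactic/discourse features.
# Features, data, and model choice are assumptions, not Hirschberg's actual models.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Each word is described by its part of speech, discourse status (given vs. new),
# and whether it is a function word; the label says whether it was accented.
train_features = [
    {"pos": "NOUN", "info_status": "new",   "function_word": False},
    {"pos": "DET",  "info_status": "given", "function_word": True},
    {"pos": "VERB", "info_status": "new",   "function_word": False},
    {"pos": "PRON", "info_status": "given", "function_word": True},
    {"pos": "NOUN", "info_status": "given", "function_word": False},
]
train_labels = ["accent", "none", "accent", "none", "none"]

vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(train_features)
clf = DecisionTreeClassifier().fit(X, train_labels)

# Predict accent placement for a new word.
test = vectorizer.transform([{"pos": "NOUN", "info_status": "new", "function_word": False}])
print(clf.predict(test))  # e.g. ['accent']
```

In a real TTS front end, predictions like these would be made for every word in the input text and passed to the synthesizer to shape its pitch contour.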
Together with Gregory Ward, Hirschberg pioneered much experimental work on intonational sources of meaning and how these interact with pragmatic phenomena, particularly the meaning of accented (intonationally prominent) items and the meaning of intonational contours. [7] [8] She has also innovated in numerous other areas involving prosody and meaning, including the role of grammatical function and surface position in pitch accent location, [9] the use of prosody in disambiguating cue phrases (discourse markers) with Diane Litman, [10] the role of prosody in disambiguation in English, Italian, and Spanish with Cinzia Avesani and Pilar Prieto, [11] and the automatic identification of speech recognition errors using prosodic information. [12] At AT&T Labs she worked with Fernando Pereira, Steve Whittaker, and others on speech search [13] and on new interfaces for speech navigation. [14] At Columbia, she and her students have continued and extended research on spoken dialogue systems (automatically detecting speech recognition errors [15] and inappropriate system queries, [16] modeling turn-taking behavior, [17] dialogue entrainment, [18] and modeling and generating clarification dialogues [19]); on the automatic classification of trust, charisma, [20] deception, [21] and emotion [22] from speech; on speech summarization; [23] and on prosody translation, hedging behavior in text and speech, [24] text-to-speech synthesis, and speech search in low-resource languages. [25] She also holds several patents in TTS and in speech search. Corpora she and her collaborators have collected include the Boston Directions Corpus, the Columbia SRI Colorado Deception Corpus, and the Columbia Games Corpus.
She has served on numerous technical boards and editorial committees, is now on the Computing Research Association (CRA) Board of Directors, and serves as co-chair of CRA-W. [26] She is also noted for her leadership in broadening participation in computing, having served as a member of the CRA Committee on the Status of Women in Computing Research (CRA-W) since 2010.
Hirschberg's notable awards include:

- AAAI Fellow (1994)
- Honorary Doctorate (Hedersdoktor), KTH (2007)
- Columbia Engineering School Alumni Association Distinguished Faculty Teaching Award (2009)
- ACL Fellow (2011)
- International Speech Communication Association Fellow (2011)
- IEEE James L. Flanagan Speech and Audio Processing Award (2011)
- ISCA Medal for Scientific Achievement (2011)
- ACM Fellow (2015)
- IEEE Fellow (2017)
- National Academy of Engineering (2017)
- American Academy of Arts and Sciences (2018)
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.
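As a minimal sketch of text-to-speech conversion, the example below uses the pyttsx3 library as an assumed off-the-shelf engine; any TTS system plays the same role, and a platform speech backend (such as SAPI5, NSSpeechSynthesizer, or eSpeak) is required.

```python
# Minimal text-to-speech sketch using pyttsx3 (an assumed off-the-shelf engine).
# Requires a platform speech backend (SAPI5, NSSpeechSynthesizer, or eSpeak).
import pyttsx3

engine = pyttsx3.init()                      # initialize the synthesizer
engine.say("Speech synthesis converts text into audible speech.")
engine.runAndWait()                          # block until playback finishes
```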
In spoken language analysis, an utterance is a continuous piece of speech by one person, generally, but not always, bounded by silence. Utterances do not exist in written language; only their representations do, and they can be represented and delineated in written language in many ways.
In linguistics, prosody is the study of elements of speech that are not individual phonetic segments but are properties of syllables and larger units of speech, including linguistic functions such as intonation, stress, and rhythm. Such elements are known as suprasegmentals.
In linguistics, a segment is "any discrete unit that can be identified, either physically or auditorily, in the stream of speech". The term is most used in phonetics and phonology to refer to the smallest elements in a language, and this usage can be synonymous with the term phone.
A dialogue system, or conversational agent (CA), is a computer system intended to converse with a human. Dialogue systems employ one or more of text, speech, graphics, haptics, gestures, and other modes for communication on both the input and output channels.
In linguistics, intonation is the variation in pitch used to indicate the speaker's attitudes and emotions, to highlight or focus an expression, to signal the illocutionary act performed by a sentence, or to regulate the flow of discourse. For example, the English question "Does Maria speak Spanish or French?" is interpreted as a yes-or-no question when it is uttered with a single rising intonation contour, but is interpreted as an alternative question when uttered with a rising contour on "Spanish" and a falling contour on "French". Although intonation is primarily a matter of pitch variation, its effects almost always work hand-in-hand with other prosodic features. Intonation is distinct from tone, the phenomenon where pitch is used to distinguish words or to mark grammatical features.
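The distinction between rising and falling contours can be inspected directly from a recording's fundamental-frequency (F0) track. The sketch below uses librosa's pYIN pitch tracker; the file name "question.wav" and the crude start-versus-end comparison are illustrative assumptions, not a standard analysis procedure.

```python
# Sketch: estimate the F0 contour of an utterance so a rising vs. falling final
# contour can be inspected. "question.wav" is a hypothetical recording.
import librosa
import numpy as np

y, sr = librosa.load("question.wav", sr=None)            # hypothetical audio file
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)

# Compare pitch at the start and end of the final voiced stretch as a crude cue.
voiced_f0 = f0[~np.isnan(f0)]
tail = voiced_f0[-20:]                                    # last ~20 voiced frames
print("rising" if tail[-1] > tail[0] else "falling")
```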
Janet Pierrehumbert is Professor of Language Modelling in the Oxford e-Research Centre at the University of Oxford and a senior research fellow of Trinity College, Oxford. She developed an intonational model which includes a grammar of intonation patterns and an explicit algorithm for calculating pitch contours in speech, as well as an account of intonational meaning. It has been widely influential in speech technology, psycholinguistics, and theories of language form and meaning. Pierrehumbert is also affiliated with the New Zealand Institute of Language Brain and Behaviour at the University of Canterbury.
Automatic pronunciation assessment is the use of speech recognition to verify the correctness of pronounced speech, as distinguished from manual assessment by an instructor or proctor. Also called speech verification, pronunciation evaluation, and pronunciation scoring, its main application is computer-aided pronunciation teaching (CAPT) when combined with computer-aided instruction for computer-assisted language learning (CALL), speech remediation, or accent reduction. Pronunciation assessment does not determine unknown speech; instead, knowing the expected word(s) in advance, it attempts to verify the correctness of the learner's pronunciation and, ideally, their intelligibility to listeners, sometimes along with prosodic features such as intonation, pitch, tempo, rhythm, and stress. Pronunciation assessment is also used in reading tutoring, for example in products such as Microsoft Teams and from Amira Learning. Automatic pronunciation assessment can also be used to help diagnose and treat speech disorders such as apraxia.
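One common family of scores for this task averages the log posterior probability of the expected phone over the frames aligned to it (a goodness-of-pronunciation style score). The sketch below is a simplified illustration with random numbers standing in for acoustic-model output; the alignment, phone index, and threshold are assumptions, not any particular system's values.

```python
# Simplified goodness-of-pronunciation (GOP) style score: given frame-level phone
# posteriors from an acoustic model (random stand-ins here) and the frames aligned
# to the expected phone, average the log posterior of that phone.
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_phones = 12, 40
posteriors = rng.dirichlet(np.ones(n_phones), size=n_frames)  # stand-in for model output

expected_phone = 17                  # index of the phone the learner should produce
aligned_frames = range(3, 9)         # frames aligned to that phone (from forced alignment)

gop = np.mean([np.log(posteriors[t, expected_phone]) for t in aligned_frames])
print(f"GOP score: {gop:.2f}", "(likely mispronounced)" if gop < -4.0 else "(acceptable)")
```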
A spoken dialog system (SDS) is a computer system able to converse with a human by voice. It has two essential components that a written-text dialog system does not: a speech recognizer and a text-to-speech module. It can be further distinguished from command-and-control speech systems, which can respond to requests but do not attempt to maintain continuity over time.
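A minimal sketch of the loop such a system runs is shown below. The recognizer and synthesizer are stubbed with console input and output, and the toy dialogue manager is an assumption for illustration; a real SDS would plug in an ASR engine and a TTS module at those two points.

```python
# Minimal spoken dialogue system loop. Recognition and synthesis are stubbed;
# a real system would substitute an ASR engine and a TTS module here.
def recognize_speech() -> str:
    return input("user (typed stand-in for ASR)> ")   # placeholder for a speech recognizer

def synthesize(text: str) -> None:
    print(f"system (stand-in for TTS)> {text}")       # placeholder for a TTS module

def dialogue_manager(utterance: str, state: dict) -> str:
    # Toy policy: count turns and handle two simple cases.
    state["turns"] = state.get("turns", 0) + 1
    if utterance.lower() in {"bye", "quit"}:
        state["done"] = True
        return "Goodbye."
    if "time" in utterance.lower():
        return "Sorry, I do not have access to a clock."
    return "Could you rephrase that?"

state: dict = {}
while not state.get("done"):
    reply = dialogue_manager(recognize_speech(), state)
    synthesize(reply)
```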
In linguistics, a prosodic unit is a segment of speech that occurs with specific prosodic properties. These properties can be those of stress, intonation, or tonal patterns.
ToBI is a set of conventions for transcribing and annotating the prosody of speech. The term "ToBI" is sometimes used to refer to the conventions used for describing American English specifically, which was the first ToBI system, developed by Mary Beckman and Janet Pierrehumbert, among others. Other ToBI systems have been defined for a number of languages; for example, J-ToBI refers to the ToBI conventions for Tokyo Japanese, and an adaptation of ToBI to describe Dutch intonation was developed by Carlos Gussenhoven, and called ToDI. Another variation of ToBI, called IViE, was established in 1998 to enable comparison between several dialects of British English.
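A ToBI transcription pairs a tones tier (pitch accents, phrase accents, and boundary tones) with a break-index tier (0 through 4) aligned to the words. The sketch below encodes one such annotation as a simple data structure; the utterance and the specific labels are illustrative, not an official reference transcription.

```python
# Illustrative ToBI-style annotation of "Marianna made the marmalade", pairing an
# orthographic tier with a tones tier and a break-index tier (labels are illustrative).
tobi_annotation = {
    "words":         ["Marianna", "made", "the", "marmalade"],
    "tones":         ["H*",       None,   None,  "H* L-L%"],   # accents + boundary tone
    "break_indices": [1,          1,      1,     4],           # 4 = intonational phrase boundary
}

for word, tone, bi in zip(*tobi_annotation.values()):
    print(f"{word:10s} tone={tone or '-':10s} break={bi}")
```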
Jennifer Sandra Cole is a professor of linguistics and Director of the Prosody and Speech Dynamics Lab at Northwestern University. Her research uses experimental and computational methods to study the sound structure of language. She was the founding General Editor of Laboratory Phonology (2009–2015) and a founding member of the Association for Laboratory Phonology.
Laboratory phonology is an approach to phonology that emphasizes the synergy between phonological theory and scientific experiments, including laboratory studies of human speech and experiments on the acquisition and productivity of phonological patterns. The central goal of laboratory phonology is "gaining an understanding of the relationship between the cognitive and physical aspects of human speech" through an interdisciplinary approach that promotes scholarly exchange across disciplines, bridging linguistics with psychology, electrical engineering, computer science, and other fields. Although spoken language has been the major area of research, the investigation of sign languages and manual signs as encoding elements is also included in laboratory phonology. Important antecedents of the field include work by Kenneth N. Stevens and Gunnar Fant on the acoustic theory of speech production, Ilse Lehiste's work on prosody and intonation, and Peter Ladefoged's work on typological variation and methods for data capture. Current research in laboratory phonology draws heavily on the theories of metrical phonology and autosegmental phonology, which are tested with the help of experimental procedures in laboratory settings or through linguistic data collection at field sites, and through evaluation with statistical methods such as exploratory data analysis.
The Spoken English Corpus (SEC) is a speech corpus collection of recordings of spoken British English compiled during 1984–1987. The corpus manual can be found on ICAME.
Jacqueline Vaissière is a French phonetician.
Klaus J. Kohler is a German phonetician.
Barbara J. Grosz CorrFRSE is an American computer scientist and Higgins Professor of Natural Sciences at Harvard University. She has made seminal contributions to the fields of natural language processing and multi-agent systems. With Alison Simmons, she is co-founder of the Embedded EthiCS programme at Harvard, which embeds ethics lessons into computer science courses.
Peter John Roach is a British retired phonetician. He taught at the Universities of Leeds and Reading, and is best known for his work on the pronunciation of British English.
Dafydd Gibbon is a British emeritus professor of English and General Linguistics at Bielefeld University in Germany, specialising in computational linguistics, the lexicography of spoken languages, applied phonetics and phonology. He is particularly concerned with endangered languages and has received awards from the Ivory Coast, Nigeria and Poland.
Ani Nenkova is Principal Scientist at Adobe Research, currently on leave from her position as an Associate Professor of Computer and Information Science at the University of Pennsylvania. Her research focuses on computational linguistics and artificial intelligence, with an emphasis on developing computational methods for analysis of text quality and style, discourse, affect recognition, and summarization.