Naomi Sager

Naomi Sager
Born: 1927 (age 93–94), Chicago, Illinois
Nationality: American
Occupation: Professor of computational linguistics
Known for: Natural language processing for computers

Naomi Sager (born 1927) is an American computational linguistics research scientist and a retired research professor at New York University. [1] She is a pioneer in the development of natural language processing for computers. [2]

Early life and education

Sager was born in Chicago, Illinois in 1927. In 1946 she earned a Bachelor of Philosophy degree from the University of Chicago, and in 1953 a Bachelor of Science in electrical engineering from Columbia University. [1]

Career

After graduating from Columbia, Sager worked for five years as an electronics engineer in the Biophysics Department of the Sloan-Kettering Institute for Cancer Research in New York City. [1] In 1959 she moved to the University of Pennsylvania, where she worked on natural language computer processing. She was part of the team that developed the first English-language parsing program, running on the UNIVAC I. [3] Sager developed an algorithm to deal with syntactic ambiguity (where a sentence's structure allows it to be interpreted in several ways) and to convert sublanguage texts into data formats suitable for retrieval. [1] [4] This was "one of the first major practical applications of sublanguage analysis." [5] This work formed the basis for a PhD thesis, and in 1968 she was awarded a PhD in linguistics from the University of Pennsylvania. [1]
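To illustrate the kind of syntactic ambiguity referred to here, the sketch below uses a toy context-free grammar and the NLTK chart parser (assumptions of this example, not part of Sager's algorithm or system) to show a single sentence receiving two structural analyses.

```python
# A minimal sketch of syntactic ambiguity: one sentence, two parse trees.
# The grammar is a toy example; NLTK is assumed to be installed.
import nltk

grammar = nltk.CFG.fromstring("""
S   -> NP VP
VP  -> V NP | VP PP
PP  -> P NP
NP  -> 'I' | Det N | NP PP
Det -> 'the'
N   -> 'man' | 'telescope'
V   -> 'saw'
P   -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "I saw the man with the telescope".split()

# Two parses: the PP "with the telescope" attaches either to the NP
# ("the man with the telescope") or to the VP (the seeing was done with
# a telescope). A practical system must choose between such readings.
for tree in parser.parse(sentence):
    print(tree)
```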

Her work in linguistics led her to New York University, where she collaborated with James Morris and Morris Salkoff to develop a natural language parsing program. In 1965 NYU launched the Linguistic String Project under Sager's leadership, aimed at developing computer methods, grounded in linguistic principles, for accessing information in the scientific and technical literature. In particular, the team drew on Zellig Harris's discourse analysis methodology to develop a system for computer analysis of natural language. [6] Sager managed the project for 30 years until her retirement in 1995. [1]

At NYU she taught classes in natural language processing and advised doctoral students, many of whom (such as Jerry Hobbs and Carol Friedman) are now leaders in the field of natural language processing. [1]

Selected publications

Related Research Articles

Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others.

Corpus linguistics is the study of language as a language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that reliable language analysis is more feasible with corpora collected in the field in its natural context ("realia"), and with minimal experimental interference.

Zellig Harris

Zellig Sabbettai Harris was an influential American linguist, mathematical syntactician, and methodologist of science. Originally a Semiticist, he is best known for his work in structural linguistics and discourse analysis and for the discovery of transformational structure in language. These developments from the first 10 years of his career were published within the first 25 years. His contributions in the subsequent 35 years of his career included transfer grammar, string analysis, elementary sentence-differences, algebraic structures in language, operator grammar, sublanguage grammar, a theory of linguistic information, and a principled account of the nature and origin of language.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).
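As an illustration, the sketch below is a hand-written recursive-descent parser for a toy arithmetic grammar; the grammar and token handling are invented for this example only.

```python
# A minimal sketch of parsing against a formal grammar:
#   Expr   -> Term (('+'|'-') Term)*
#   Term   -> Factor (('*'|'/') Factor)*
#   Factor -> NUMBER | '(' Expr ')'
class Parser:
    def __init__(self, text):
        self.tokens = text.replace("(", " ( ").replace(")", " ) ").split()
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, tok=None):
        cur = self.peek()
        if cur is None or (tok is not None and cur != tok):
            raise SyntaxError(f"expected {tok!r}, got {cur!r}")
        self.pos += 1
        return cur

    def parse_expr(self):
        node = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            node = (op, node, self.parse_term())
        return node

    def parse_term(self):
        node = self.parse_factor()
        while self.peek() in ("*", "/"):
            op = self.eat()
            node = (op, node, self.parse_factor())
        return node

    def parse_factor(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.parse_expr()
            self.eat(")")
            return node
        return float(self.eat())

print(Parser("2 * (3 + 4)").parse_expr())   # ('*', 2.0, ('+', 3.0, 4.0))
```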

Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees.
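The sketch below illustrates the adjunction operation on trees represented as nested Python dicts; the representation and helper names are assumptions made for this example, not part of any actual TAG implementation.

```python
# A minimal sketch of TAG adjunction: an auxiliary tree (whose root and foot
# node share a label) is spliced into a target tree at a matching node, and
# the displaced subtree is re-attached at the foot node.
import copy

def make(label, *children, foot=False):
    return {"label": label, "children": list(children), "foot": foot}

def _replace_foot(node, subtree):
    for i, child in enumerate(node["children"]):
        if child.get("foot"):
            node["children"][i] = subtree
            return True
        if _replace_foot(child, subtree):
            return True
    return False

def adjoin(target, aux, label):
    """Adjoin auxiliary tree `aux` at the first node of `target` labelled `label`."""
    def splice(node):
        if node["label"] == label:
            detached = copy.deepcopy(node)      # subtree being displaced
            new_sub = copy.deepcopy(aux)
            _replace_foot(new_sub, detached)    # re-attach it at the foot
            return new_sub, True
        for i, child in enumerate(node["children"]):
            new_child, done = splice(child)
            if done:
                node["children"][i] = new_child
                return node, True
        return node, False

    result, done = splice(copy.deepcopy(target))
    if not done:
        raise ValueError(f"no node labelled {label!r} found")
    return result

def show(node, depth=0):
    print("  " * depth + node["label"])
    for child in node["children"]:
        show(child, depth + 1)

# Initial tree for "Harry likes peanuts"; the auxiliary tree for "really"
# adjoins at VP, yielding the structure for "Harry really likes peanuts".
initial = make("S",
               make("NP", make("Harry")),
               make("VP", make("V", make("likes")),
                          make("NP", make("peanuts"))))
aux = make("VP", make("Adv", make("really")), make("VP", foot=True))

show(adjoin(initial, aux, "VP"))
```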

A sublanguage is a subset of a language. Sublanguages occur in natural language, computer programming language, and relational databases.

Discourse analysis

Discourse analysis (DA), or discourse studies, is an approach to the analysis of written, vocal, or sign language use, or any significant semiotic event.

Syntactic Structures

Syntactic Structures is an influential work in linguistics by American linguist Noam Chomsky, originally published in 1957. It is an elaboration of the model of transformational generative grammar of his teacher, Zellig Harris. A short monograph of about a hundred pages, it is recognized as one of the most significant studies of the 20th century. It contains the now-famous sentence "Colorless green ideas sleep furiously", which Chomsky offered as an example of a grammatically correct sentence that has no discernible meaning. Thus, Chomsky argued for the independence of syntax from semantics.

Treebank

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data. The exploitation of treebank data has been important ever since the first large-scale treebank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of treebanks is becoming more widely appreciated in linguistics research as a whole. For example, annotated treebank data has been crucial in syntactic research to test linguistic theories of sentence structure against large quantities of naturally occurring examples.
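For illustration, the sketch below reads a single Penn Treebank-style bracketed parse into a nested Python structure; the sentence and the reader are invented for this example and are not tied to the Penn Treebank's own tools or data.

```python
# A minimal sketch of reading one bracketed treebank annotation.
def parse_bracketed(s):
    """Turn a bracketed tree string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()

tree = parse_bracketed(
    "(S (NP (NNP Sager)) (VP (VBD developed) (NP (DT a) (NN parser))))")
print(tree)
# ('S', [('NP', [('NNP', ['Sager'])]), ('VP', ...)])
```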

Linguistic categories include lexical categories (parts of speech) such as noun and verb, syntactic categories assigned by a grammar, and grammatical categories such as tense and number.

LOLITA is a natural language processing system developed by Durham University between 1986 and 2000. The name is an acronym for "Large-scale, Object-based, Linguistic Interactor, Translator and Analyzer".

Formalism (linguistics)

In linguistics, formalism is a theoretical approach characterized by the idea that human language can be defined as a formal language like the language of mathematics and programming languages. It is contrasted with linguistic functionalism approaches like cognitive linguistics and usage-based linguistics.

Deep linguistic processing is a natural language processing framework which draws on theoretical and descriptive linguistics. It models language predominantly by way of theoretical syntactic/semantic theory. Deep linguistic processing approaches differ from "shallower" methods in that they yield more expressive and structural representations which directly capture long-distance dependencies and underlying predicate-argument structures.
The knowledge-intensive approach of deep linguistic processing requires considerable computational power, and has in the past sometimes been judged intractable. However, research in the early 2000s made considerable advances in the efficiency of deep processing, and today efficiency is no longer a major obstacle for applications using deep linguistic processing.

Carol Friedman is a scientist and biomedical informatician. She is among the pioneers of the use of expert systems in medical language processing and of the explicit medical concept representation underpinning entity–attribute–value modeling in electronic medical records.

The following outline is provided as an overview of and topical guide to natural language processing:

Maurice Gross

Maurice Gross was a French linguist and scholar of Romance languages. Beginning in the late 1960s he developed Lexicon-Grammar, a method of formal description of languages with practical applications.

NooJ is a linguistic development environment and corpus processor constructed by Max Silberztein. NooJ allows linguists to construct the four classes of the Chomsky–Schützenberger hierarchy of generative grammars: finite-state grammars, context-free grammars, context-sensitive grammars, and unrestricted grammars, using either a text editor or a graph editor.
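To illustrate the difference between two of those grammar classes (independently of NooJ's own editors), the sketch below recognizes a regular language with a finite-state pattern and the context-free language a^n b^n with a simple counting check; both examples are invented for illustration.

```python
# A minimal sketch contrasting a regular (finite-state) language with a
# context-free one that no finite automaton can recognize.
import re

# Regular language: one or more 'a's followed by one or more 'b's.
finite_state = re.compile(r"^a+b+$")

def context_free_anbn(s):
    """Recognize the context-free language { a^n b^n : n >= 1 }."""
    n = len(s)
    if n == 0 or n % 2 != 0:
        return False
    half = n // 2
    return s[:half] == "a" * half and s[half:] == "b" * half

print(bool(finite_state.match("aaabb")))   # True: in the regular language
print(context_free_anbn("aaabb"))          # False: counts of a's and b's differ
print(context_free_anbn("aaabbb"))         # True: a^3 b^3
```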

Native-language identification (NLI) is the task of determining an author's native language (L1) based only on their writings in a second language (L2). NLI works through identifying language-usage patterns that are common to specific L1 groups and then applying this knowledge to predict the native language of previously unseen texts. This is motivated in part by applications in second-language acquisition, language teaching and forensic linguistics, amongst others.
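A minimal sketch of this setup is shown below, using character n-gram features and a linear classifier from scikit-learn; the tiny corpus and native-language labels are invented for illustration, and a real system would train on a large L2 corpus.

```python
# A minimal sketch of native-language identification: predict an author's L1
# from character n-gram patterns in their L2 (English) writing.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I am agree with this opinion because ...",
    "He explained me the problem very detailed.",
    "I look forward to hear from you soon.",
    "She suggested me to take the earlier train.",
]
train_l1 = ["es", "de", "fr", "es"]  # invented native-language labels

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_l1)

print(model.predict(["I am agree that we should to go."]))
```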

Distributionalism was a general theory of language and a discovery procedure for establishing elements and structures of language based on observed usage. It can be seen as an elaboration of structuralism but takes a more computational approach. Originally mostly applied to understanding phonological processes and phonotactics, distributional methods were also applied to work on lexical semantics and provide the basis for the distributional hypothesis for meaning. Current computational approaches to learn the semantics of words from text in the form of word embeddings using machine learning are based on distributional theory.
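The sketch below illustrates the distributional idea with raw co-occurrence counts over a toy corpus and cosine similarity between the resulting count vectors; the corpus and window size are arbitrary choices for this example.

```python
# A minimal sketch of distributional similarity: words appearing in similar
# contexts receive similar co-occurrence vectors.
from collections import Counter, defaultdict
from math import sqrt

corpus = [
    "the doctor examined the patient",
    "the nurse examined the patient",
    "the doctor treated the patient",
    "the nurse treated the patient",
    "the parser analyzed the sentence",
]

window = 2
cooc = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                cooc[w][words[j]] += 1

def cosine(u, v):
    common = set(u) & set(v)
    num = sum(u[k] * v[k] for k in common)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

# "doctor" and "nurse" occur in identical contexts here, so their similarity
# is higher than that between "doctor" and "parser".
print(cosine(cooc["doctor"], cooc["nurse"]))
print(cosine(cooc["doctor"], cooc["parser"]))
```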

Lexicon-Grammar is a method and a praxis of formalized description of human languages, developed by Maurice Gross from the end of the 1960s.

References

  1. "Naomi Sager". New York University. Retrieved 4 March 2015.
  2. Shortliffe, Edward H.; Cimino, James J., eds. (2014). Biomedical Informatics: Computer Applications in Health Care and Biomedicine (4th ed.). Springer. p. 257. ISBN 978-1-4471-4473-1.
  3. Kornai, Andras, ed. (1999). Extended Finite State Models of Language, Volume 1. Cambridge University Press. p. 15. ISBN 978-0-521-63198-3.
  4. Aspects of Automated Natural Language Generation: 6th International Workshop on Natural Language Generation, Trento, Italy, April 5–7, 1992. Springer Science & Business Media. 1992. p. 297. ISBN 978-3-540-55399-1.
  5. Kittredge, Richard; Lehrberger, John, eds. (1982). Sublanguage: Studies of Language in Restricted Semantic Domains. Walter de Gruyter & Co. p. 2. ISBN 3-11-008244-6.
  6. Sager, Naomi; Nhan, Ngo Thanh (2002). "The computability of strings, transformations, and sublanguage". In Nevin, Bruce; Johnson, Stephen M. (eds.). The Legacy of Zellig Harris, Vol. 2. John Benjamins Publishing Co. pp. 78–120.