Natural Language Engineering


Related Research Articles

Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others.

Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference.

Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious and automatic, but it can come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other language-processing tasks, such as discourse analysis, improving the relevance of search engines, anaphora resolution, coherence, and inference.
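As a rough illustration of what choosing a sense from context can look like, the sketch below follows the idea behind the simplified Lesk approach: pick the sense whose dictionary gloss shares the most words with the surrounding context. The function name `choose_sense`, the sense labels, and the glosses are hypothetical and written only for this example; a practical system would draw glosses from a real lexical resource and would at least filter out stopwords.

```python
def choose_sense(context, sense_glosses):
    """Return the sense label whose gloss overlaps most with the context.

    `sense_glosses` maps a (hypothetical) sense label to a short gloss.
    Ties are resolved in favour of the sense listed first.
    """
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        # Overlap = number of distinct words shared by context and gloss.
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Illustrative glosses for two senses of "bank" (made up for this sketch).
glosses = {
    "bank_financial": "an institution that accepts deposits and lends money",
    "bank_river": "the sloping land beside a river or stream",
}

river_sense = choose_sense(
    "she sat on the bank of the river and watched the water", glosses
)
money_sense = choose_sense("the bank lends money and accepts deposits", glosses)
```

Word overlap is a deliberately crude signal: function words such as "the" and "and" count toward the score here, which is one reason this is only a sketch and not a usable disambiguator.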

Language engineering involves the creation of natural language processing systems whose cost and outputs are measurable and predictable. It is a field distinct from natural language processing and computational linguistics. A recent trend in language engineering is the use of Semantic Web technologies for the creation, archiving, processing, and retrieval of machine-processable language data.

Association for Computational Linguistics: Professional organization devoted to linguistics

The Association for Computational Linguistics (ACL) is a scientific and professional organization for people working on natural language processing. Its namesake conference is one of the primary high-impact conferences for natural language processing research, along with EMNLP. The conference is held each summer in locations where significant computational linguistics research is carried out.

Computational semantics is the study of how to automate the process of constructing and reasoning with meaning representations of natural language expressions. It consequently plays an important role in natural-language processing and computational linguistics.

The Centre for Excellence in Computational Engineering and Networking (CEN) at Amrita Vishwa Vidyapeetham, a research university in India, is a research and teaching centre that works on technologies for solving computational problems that can be applied in real-world projects. The centre is involved in research projects funded by organizations such as ISRO, NPOL, the Indian Ministry of Electronics and Information Technology, and the Department of Science and Technology.

Geoffrey Sampson is Professor of Natural Language Computing in the Department of Informatics, University of Sussex. He produces annotation standards for compiling corpora (databases) of ordinary usage of the English language. His work has been applied in automatic language-understanding software, and in writing-skills training. He has also analysed Ronald Coase's "theory of the firm" and the economic and political implications of e-business.

Karen Spärck Jones

Karen Spärck Jones was a self-taught programmer and a pioneering British computer scientist responsible for the concept of inverse document frequency (IDF), a technology that underlies most modern search engines. She was an advocate for women in the field of computer science and coined the slogan “Computing is too important to be left to men.” In 2019, The New York Times published her belated obituary in its series Overlooked, calling her "a pioneer of computer science for work combining statistics and linguistics, and an advocate for women in the field." Since 2008, the Karen Spärck Jones Award has recognized her achievements in information retrieval (IR) and natural language processing (NLP); it is given to a new recipient for outstanding research in one or both of these fields.
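To make the idea of inverse document frequency concrete, here is a minimal sketch using one common textbook form, idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The function name, the whitespace tokenization, and the toy documents are assumptions made for illustration; search engines typically use smoothed or otherwise adjusted variants.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Map each term to log(N / df), where df is its document frequency."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Use a set so each term is counted at most once per document.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Toy collection, invented for this example.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
idf = inverse_document_frequency(docs)
```

A term that appears in every document ("the") gets a weight of zero, while a term confined to a single document ("bird") gets the largest weight, which is the intuition behind using IDF to rank terms by how well they discriminate between documents.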

Informatics is the study of computational systems. According to the ACM Europe Council and Informatics Europe, informatics is synonymous with computer science and computing as a profession, in which the central notion is transformation of information. In other countries, the term "informatics" is used with a different meaning in the context of library science, in which case it is synonymous with data storage and retrieval.

Jun'ichi Tsujii is a Japanese computer scientist specializing in natural language processing and text mining, particularly in the field of biology and bioinformatics.

The following outline is provided as an overview of and topical guide to natural-language processing.

Walter Daelemans is professor in computational linguistics at the University of Antwerp. He is also a research director of the Computational Linguistics and Psycholinguistics Research Center (CLiPS). Daelemans holds a Ph.D. from the Katholieke Universiteit Leuven.

Kathleen McKeown: American computer scientist

Kathleen R. McKeown is an American computer scientist, specializing in natural language processing. She is currently the Henry and Gertrude Rothschild Professor of Computer Science and is the Founding Director of the Institute for Data Sciences and Engineering at Columbia University.

Georgia M. Green is an American linguist and academic. She is an emeritus professor at the University of Illinois at Urbana-Champaign. Her research has focused on pragmatics, speaker intention, word order and meaning. She has been an advisory editor for several linguistics journals or publishers and she serves on the usage committee for the American Heritage Dictionary.

Ann Alicia Copestake is professor of computational linguistics and head of the Department of Computer Science and Technology at the University of Cambridge and a fellow of Wolfson College, Cambridge.

Ruslan Mitkov is a professor at Lancaster University and a researcher in natural language processing and computational linguistics. He completed his PhD at Technical University of Dresden under the supervision of Nikolaus Joachim Lehmann. He has published more than 240 refereed papers and is best known for his contributions to anaphora resolution and his seminal work in computer-aided generation of multiple-choice tests, among others.

Lillian Lee is a computer scientist whose research involves natural language processing, sentiment analysis, and computational social science. She is a professor of computer science and information science at Cornell University, and co-editor-in-chief of the journal Transactions of the Association for Computational Linguistics.

Alice Geraldine Baltina ter Meulen is a Dutch linguist, logician, and philosopher of language whose research topics include genericity in linguistics, intensional logic, generalized quantifiers, discourse representation theory, and the linguistic representation of time. She is a professor emerita at the University of Geneva.
