Language technology

Language technology, often called human language technology (HLT), studies methods by which computer programs or electronic devices can analyze, produce, modify or respond to human text and speech. [1] Working with language technology often requires broad knowledge not only of linguistics but also of computer science. It comprises natural language processing (NLP) and computational linguistics (CL), along with many application-oriented aspects of these, on the one hand, and lower-level aspects such as character encoding and speech technology on the other.

Note that these elementary aspects are normally not considered to be within the scope of related terms such as natural language processing and (applied) computational linguistics, which are otherwise near-synonyms. For example, for many of the world's lesser-known languages, the foundation of language technology is providing communities with fonts and keyboard layouts so their languages can be written on computers or mobile devices. [2]

Related Research Articles

Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others.

The following outline is provided as an overview of and topical guide to linguistics:

Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.
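The contrast between rule-based and probabilistic approaches mentioned above can be sketched with a toy sentiment-labelling task. The word lists and the tiny labelled corpus below are invented for illustration, not drawn from any real resource:

```python
import math
from collections import Counter

# Rule-based: a hand-written lexicon decides the label directly.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}

def rule_based_sentiment(text):
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Probabilistic: per-class word frequencies from a tiny labelled corpus
# drive a naive-Bayes-style comparison with add-one smoothing.
TRAINING = [
    ("a great excellent film", "positive"),
    ("good plot and great cast", "positive"),
    ("a terrible poor effort", "negative"),
    ("bad acting and a bad script", "negative"),
]

counts = {"positive": Counter(), "negative": Counter()}
for text, label in TRAINING:
    counts[label].update(text.split())
vocab_size = len({w for c in counts.values() for w in c})

def probabilistic_sentiment(text):
    def log_likelihood(label):
        total = sum(counts[label].values())
        return sum(math.log((counts[label][t] + 1) / (total + vocab_size))
                   for t in text.lower().split())
    return max(counts, key=log_likelihood)

print(rule_based_sentiment("a great film"))       # positive
print(probabilistic_sentiment("a great film"))    # positive
```

The rule-based path encodes a linguist's knowledge directly; the probabilistic path learns word statistics from annotated data, which is why NLP work typically depends on text corpora.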

In linguistics and natural language processing, a corpus or text corpus is a dataset of language resources, either annotated or unannotated, consisting of natively digital material as well as older, digitized resources.

Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious and automatic, but it can come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other language-processing tasks, such as discourse analysis, improving the relevance of search engines, anaphora resolution, coherence, and inference.
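One classic baseline for word-sense disambiguation is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the surrounding context. A minimal sketch, using a hypothetical two-sense inventory for "bank" (the senses and glosses below are invented for illustration):

```python
def simplified_lesk(context_words, sense_inventory):
    """Return the sense whose gloss overlaps most with the context words."""
    context = {w.lower() for w in context_words}
    best, best_overlap = None, -1
    for sense, gloss in sense_inventory.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

# Hypothetical sense inventory for the ambiguous word "bank".
BANK_SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river",
}

sentence = "she sat on the bank of the river and watched the water".split()
print(simplified_lesk(sentence, BANK_SENSES))  # bank/river
```

Real systems use far richer sense inventories and statistical or neural models, but the gloss-overlap idea shows why context is the key signal for disambiguation.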

Natural-language understanding (NLU) or natural-language interpretation (NLI) is a subset of natural-language processing in artificial intelligence that deals with machine reading comprehension. Natural-language understanding is considered an AI-hard problem.

Theoretical computer science (TCS) is a subfield of computer science and mathematics that focuses on mathematical aspects of computer science, such as the theory of computation (TOC), formal language theory, the lambda calculus and type theory.

Computational semiotics is an interdisciplinary field that applies, conducts, and draws on research in logic, mathematics, the theory and practice of computation, formal and natural language studies, the cognitive sciences generally, and semiotics proper. The term encompasses both the application of semiotics to computer hardware and software design and, conversely, the use of computation for performing semiotic analysis. The former focuses on what semiotics can bring to computation; the latter on what computation can bring to semiotics.

In the field of human–computer interaction, a Wizard of Oz experiment is a research experiment in which subjects interact with a computer system that subjects believe to be autonomous, but which is actually being operated or partially operated by an unseen human being.

Martin Kay was a computer scientist, known especially for his work in computational linguistics.

The Centre for Excellence in Computational Engineering and Networking (CEN) at Amrita Vishwa Vidyapeetham, a research university in India, is a research and teaching centre that works on technologies for solving computational problems in real-world projects. The centre is involved in research projects funded by organizations such as ISRO, NPOL, the Indian Ministry of Electronics and Information Technology, and the Department of Science and Technology. Its areas of work include artificial intelligence, cyber security, computer networks, computational linguistics, data science and natural language processing. A translation project is underway to develop tools to translate web content from English to Indian languages. Research is also ongoing in the area of speech translation.

Linguistic categories include

Mobile translation is any electronic device or software application that provides audio translation. The concept includes any handheld electronic device that is specifically designed for audio translation. It also includes any machine translation service or software application for hand-held devices, including mobile telephones, Pocket PCs, and PDAs. Mobile translation gives hand-held device users instantaneous and non-mediated translation from one human language to another, usually for a service fee that is nevertheless significantly lower than what a human translator charges.

Linguistics is the scientific study of language. Linguistics is based on a theoretical as well as a descriptive study of language and is also interlinked with the applied fields of language studies and language learning, which entails the study of specific languages. Before the 20th century, linguistics evolved in conjunction with literary study and did not employ scientific methods. Modern-day linguistics is considered a science because it entails a comprehensive, systematic, objective, and precise analysis of all aspects of language – i.e., the cognitive, the social, the cultural, the psychological, the environmental, the biological, the literary, the grammatical, the paleographical, and the structural.

Digital infinity is a technical term in theoretical linguistics. Alternative formulations are "discrete infinity" and "the infinite use of finite means". The idea is that all human languages follow a simple logical principle, according to which a limited set of digits—irreducible atomic sound elements—are combined to produce an infinite range of potentially meaningful expressions.
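The "infinite use of finite means" can be made concrete with a toy recursive grammar: a finite rule set licenses sentences of unbounded length because one rule reinvokes the sentence category inside itself. The grammar below is an invented illustration, not a claim about any particular natural language:

```python
# Toy recursive grammar with two finite rules and a four-word lexicon:
#   S -> "mice run"
#   S -> "cats know that" S
# The second rule embeds S inside itself, so the finite rule set
# generates infinitely many distinct sentences.
def sentence(depth):
    """Return the sentence produced by applying the recursive rule `depth` times."""
    if depth == 0:
        return "mice run"
    return "cats know that " + sentence(depth - 1)

for d in range(3):
    print(sentence(d))
# mice run
# cats know that mice run
# cats know that cats know that mice run
```

Each extra application of the recursive rule yields a new, longer sentence, which is the sense in which a discrete, finite system has an infinite expressive range.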

The following outline is provided as an overview of and topical guide to natural-language processing:

Pascale Fung (馮雁) is a professor in the Department of Electronic & Computer Engineering and the Department of Computer Science & Engineering at the Hong Kong University of Science & Technology (HKUST). She is the director of the newly established, multidisciplinary Centre for AI Research (CAiRE) at HKUST. She is an elected Fellow of the Institute of Electrical and Electronics Engineers (IEEE) for her "contributions to human-machine interactions", an elected Fellow of the International Speech Communication Association for "fundamental contributions to the interdisciplinary area of spoken language human-machine interactions", and an elected Fellow of the Association for Computational Linguistics (ACL) for her "significant contributions toward statistical NLP, comparable corpora, and building intelligent systems that can understand and empathize with humans".

Madeleine Ashcraft Bates is a researcher in natural language processing who worked at BBN Technologies in Cambridge, Massachusetts from the early 1970s to the late 1990s. She was president of the Association for Computational Linguistics in 1985, and co-editor of the book Challenges in Natural Language Processing (1993).

Vera Demberg is a German computational linguist and professor of computer science and computational linguistics at Saarland University.

References

  1. Uszkoreit, Hans. "DFKI-LT - What is Language Technology". Retrieved 16 November 2018.
  2. "SIL Writing Systems Technology". sil.org. Retrieved 9 December 2019.