Sign language recognition

Sign language recognition (commonly abbreviated SLR) is a computational task that involves recognizing actions from sign languages.[1] It is an important problem in the digital world because solving it helps bridge the communication gap faced by people with hearing impairments.

Solving the problem usually requires annotated color (RGB) data. Other modalities, such as depth and sensor information, are also useful.

Isolated sign language recognition

Isolated sign language recognition (ISLR), also known as word-level SLR, is the task of recognizing individual signs, or tokens called glosses, from a given segment of a signing video. When the input videos are already isolated, this is commonly treated as a classification problem. Real-time applications additionally require steps such as video segmentation to locate each sign before it can be classified.
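The classification view can be illustrated with a minimal sketch. It assumes each isolated clip has already been turned into a list of per-frame feature vectors, and it uses a nearest-centroid classifier purely for illustration; the helper names (`pool_clip`, `fit_centroids`, `classify_clip`) and the toy data are hypothetical, and practical systems typically rely on learned video models instead.

```python
from collections import defaultdict
from math import dist


def pool_clip(vectors):
    """Average a list of equal-length vectors into a single vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_centroids(clips, glosses):
    """Compute one mean clip-level vector (centroid) per gloss label."""
    grouped = defaultdict(list)
    for frames, gloss in zip(clips, glosses):
        grouped[gloss].append(pool_clip(frames))
    return {gloss: pool_clip(vecs) for gloss, vecs in grouped.items()}


def classify_clip(frames, centroids):
    """Return the gloss whose centroid is closest to the pooled clip."""
    pooled = pool_clip(frames)
    return min(centroids, key=lambda gloss: dist(pooled, centroids[gloss]))


# Toy 2-D features (made up): each clip is a list of per-frame vectors.
train_clips = [
    [[0.0, 1.0], [0.2, 0.8]],  # labelled HELLO
    [[0.1, 0.9], [0.1, 1.1]],  # labelled HELLO
    [[1.0, 0.0], [0.8, 0.2]],  # labelled THANKS
]
train_glosses = ["HELLO", "HELLO", "THANKS"]

centroids = fit_centroids(train_clips, train_glosses)
prediction = classify_clip([[0.9, 0.1], [1.1, 0.0]], centroids)
print(prediction)
```

Each clip receives exactly one gloss label, which is what makes the isolated setting a plain classification problem.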

Continuous sign language recognition

In continuous sign language recognition (CSLR), also known as sign language transcription, the task is to predict all the signs (or glosses) in a given sign language video sequence. This makes it more suitable than ISLR for real-world transcription of sign languages. Depending on how it is solved, CSLR can sometimes be seen as an extension of the ISLR task.
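One way to read CSLR as an extension of ISLR is to run an isolated-sign classifier over sliding windows of the video and merge repeated predictions. The sketch below is only an illustration under that assumption: `toy_classifier` is a hypothetical stand-in for a real ISLR model, frames are reduced to single numbers, and the `"<none>"` label for windows without a sign is a convention introduced here.

```python
def sliding_windows(frames, size, stride):
    """Yield consecutive fixed-length windows over a frame sequence."""
    for start in range(0, len(frames) - size + 1, stride):
        yield frames[start:start + size]


def collapse_repeats(labels, blank="<none>"):
    """Merge runs of identical labels and drop the 'no sign' label."""
    glosses = []
    previous = None
    for label in labels:
        if label != previous and label != blank:
            glosses.append(label)
        previous = label
    return glosses


def recognize_continuous(frames, classify_window, size=2, stride=1):
    """Classify every window, then collapse the labels into a gloss sequence."""
    per_window = [classify_window(w) for w in sliding_windows(frames, size, stride)]
    return collapse_repeats(per_window)


def toy_classifier(window):
    """Hypothetical isolated-sign classifier over 1-D frame features."""
    weakest = min(window)  # a sign must be present in every frame of the window
    if weakest < 0.5:
        return "<none>"
    return "HELLO" if weakest < 1.5 else "THANKS"


# Made-up frame features: 0 = no sign, 1 = HELLO, 2 = THANKS.
frames = [0, 1, 1, 1, 0, 0, 2, 2, 0]
glosses = recognize_continuous(frames, toy_classifier)
print(glosses)
```

Because windows without a sign separate the runs, a sign that is performed twice with a pause in between is emitted twice rather than merged into one gloss.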

Continuous sign language translation

Sign language translation refers to the problem of translating a sequence of signs (called glosses) into a target spoken language. It is generally modeled as an extension of the CSLR problem.

References

  1. Cooper, Helen; Holt, Brian; Bowden, Richard (2011). "Sign Language Recognition". Visual Analysis of Humans. Springer. pp. 539–562. doi:10.1007/978-0-85729-997-0_27. ISBN 978-0-85729-996-3. S2CID 1297591.