Janus Recognition Toolkit

Last updated October 17, 2022

Janus Recognition Toolkit (JRTk), sometimes referred to as Janus, is a general-purpose speech recognition toolkit developed and maintained by the Interactive Systems Laboratories at Carnegie Mellon University and Karlsruhe Institute of Technology. It is useful for both research and application development and is part of the JANUS speech-to-speech translation system.^[1]

The JRTk provides a flexible Tcl/Tk script-based environment that enables researchers to build state-of-the-art speech recognizers and to develop, implement, and evaluate new methods. It takes an object-oriented approach: unlike other toolkits, it is not a set of libraries and precompiled modules but a programmable shell with transparent yet efficient objects.

Since version 5, JRTk has featured the IBIS decoder, a one-pass decoder based on a re-entrant single pronunciation prefix tree that makes use of the concept of linguistic context polymorphism. It is therefore able to incorporate full linguistic knowledge at an early stage. The same engine can decode in one pass with a statistical n-gram language model as well as with context-free grammars. The decoder can also be used to rescore lattices very efficiently.
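The idea of a pronunciation prefix tree can be illustrated with a minimal sketch. This is not the IBIS implementation or any JRTk API: the toy lexicon, the phoneme symbols, and the names `Node`, `build_prefix_tree`, `count_nodes`, and `lookup` are all assumptions made for illustration. It only shows how words that share initial phonemes share nodes, which is what keeps the search space compact.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the prefix tree; each edge is labelled with a phoneme."""
    children: dict = field(default_factory=dict)
    words: list = field(default_factory=list)  # words whose pronunciation ends here


def build_prefix_tree(lexicon):
    """Insert every pronunciation so that shared phoneme prefixes share nodes."""
    root = Node()
    for word, phonemes in lexicon.items():
        node = root
        for ph in phonemes:
            node = node.children.setdefault(ph, Node())
        node.words.append(word)
    return root


def count_nodes(node):
    """Count the nodes below the given node (the node itself is excluded)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def lookup(root, phonemes):
    """Return the words whose full pronunciation is exactly this phoneme sequence."""
    node = root
    for ph in phonemes:
        node = node.children.get(ph)
        if node is None:
            return []
    return node.words


# Hypothetical toy lexicon; the phoneme symbols are illustrative only.
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "speak": ["S", "P", "IY", "K"],
    "spin": ["S", "P", "IH", "N"],
    "tree": ["T", "R", "IY"],
}

tree = build_prefix_tree(LEXICON)
total_phonemes = sum(len(p) for p in LEXICON.values())
print(total_phonemes, "phonemes stored in", count_nodes(tree), "tree nodes")
print(lookup(tree, ["S", "P", "IY", "K"]))
```

In this toy case the three words starting with "S P" share their first two nodes, so the tree needs fewer nodes than a flat list of pronunciations would need phoneme slots.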

JRTk uses Hidden Markov Models (HMMs) for acoustic modeling and offers many state-of-the-art techniques for acoustic pre-processing, acoustic model training, and speech decoding. Its object-oriented architecture allows all components to be configured (e.g., pre-processing steps to execute, HMM topology, training sequence, algorithm parameters, adaptation sequences) without the need to modify source code or recompile.
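As a rough illustration of what it means to treat HMM topology as configuration rather than code, the following sketch describes a three-state left-to-right HMM purely as data and decodes it with the standard Viterbi algorithm. It is written in Python rather than JRTk's Tcl shell and does not reflect JRTk's actual objects; the state names, probabilities, and discrete observation symbols are hypothetical.

```python
import math


def log(p):
    """Logarithm that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for the observation sequence."""
    # best[t][s] is the best log score of any path that ends in state s at time t.
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical left-to-right topology: each state may loop or move one step right.
STATES = ["begin", "middle", "end"]
START = {"begin": 1.0, "middle": 0.0, "end": 0.0}
TRANS = {
    "begin": {"begin": 0.5, "middle": 0.5, "end": 0.0},
    "middle": {"begin": 0.0, "middle": 0.5, "end": 0.5},
    "end": {"begin": 0.0, "middle": 0.0, "end": 1.0},
}
EMIT = {
    "begin": {"a": 0.8, "b": 0.1, "c": 0.1},
    "middle": {"a": 0.1, "b": 0.8, "c": 0.1},
    "end": {"a": 0.1, "b": 0.1, "c": 0.8},
}

log_start = {s: log(p) for s, p in START.items()}
log_trans = {s: {n: log(p) for n, p in row.items()} for s, row in TRANS.items()}
log_emit = {s: {o: log(p) for o, p in row.items()} for s, row in EMIT.items()}

path = viterbi(["a", "a", "b", "c", "c"], STATES, log_start, log_trans, log_emit)
print(path)
```

Changing the topology here means editing the `TRANS` table only; the decoding function stays untouched, which is the property the paragraph above describes.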

JRTk has been used by the Interactive Systems Laboratories in many speech recognition projects, such as:

EU-BRIDGE^[2]
EVEIL-3D^[3]
BABEL^[4]
Quaero^[5]
SFB 588^[6]
TC-STAR^[7]
FAME^[8]
Verbmobil
NESPOLE!^[9]

Related Research Articles

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers with the main benefit of searchability. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process, call it $X$, with unobservable ("hidden") states. As part of the definition, an HMM requires that there be an observable process $Y$ whose outcomes are "influenced" by the outcomes of $X$ in a known way. Since $X$ cannot be observed directly, the goal is to learn about $X$ by observing $Y$. An HMM has the additional requirement that the outcome of $Y$ at time $t = t_0$ must be "influenced" exclusively by the outcome of $X$ at $t = t_0$, and that the outcomes of $X$ and $Y$ at $t < t_0$ must not affect the outcome of $Y$ at $t = t_0$.
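For a discrete-time process, the final requirement is commonly written as a conditional independence statement. In the formulation below, $X_n$ denotes the hidden process and $Y_n$ the observable process at step $n$, and $A$ is any measurable set of outcomes; the indexing convention is an assumption of this sketch.

```latex
\[
P\bigl(Y_n \in A \mid X_1 = x_1, \ldots, X_n = x_n\bigr)
  = P\bigl(Y_n \in A \mid X_n = x_n\bigr)
\]
```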

Keyword spotting is a problem that was historically first defined in the context of speech processing. In speech processing, keyword spotting deals with the identification of keywords in utterances.

Quaero was a European research and development program with the goal of developing multimedia and multilingual indexing and management tools for professional and general public applications. The European Commission approved the aid granted by France on 11 March 2008.

The Framework Programmes for Research and Technological Development, also called Framework Programmes or abbreviated FP1 to FP9, are funding programmes created by the European Union/European Commission to support and foster research in the European Research Area (ERA). Starting in 2014, the funding programmes were named Horizon.

Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrasts with the rule-based approaches to machine translation as well as with example-based machine translation, and has more recently been superseded by neural machine translation in many applications.

CMU Sphinx, also called Sphinx for short, is the general term to describe a group of speech recognition systems developed at Carnegie Mellon University. These include a series of speech recognizers and an acoustic model trainer (SphinxTrain).

Julius is a speech recognition engine, specifically a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers. It can perform almost real-time decoding on most current personal computers (PCs) in a 60k-word dictation task using a word trigram (3-gram) language model and context-dependent hidden Markov models (HMMs). Major search methods are fully incorporated.

As of the early 2000s, several speech recognition (SR) software packages exist for Linux. Some of them are free and open-source software and others are proprietary software. Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language. Voice control may refer to software used for communicating operational commands to a computer.

The Intelligence Advanced Research Projects Activity (IARPA) is an organization within the Office of the Director of National Intelligence responsible for leading research to overcome difficult challenges relevant to the United States Intelligence Community. IARPA characterizes its mission as follows: "To envision and lead high-risk, high-payoff research that delivers innovative technology for future overwhelming intelligence advantage."

Time delay neural network (TDNN) is a multilayer artificial neural network architecture whose purpose is to 1) classify patterns with shift-invariance, and 2) model context at each layer of the network.
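Both properties can be seen in a minimal sketch of a single time-delay unit. The function name, the weights, and the input frames are hypothetical, and a ReLU is used for simplicity where an actual TDNN may use a different nonlinearity. The unit applies the same weights to every window of consecutive frames, so a pattern produces the same response wherever it occurs, and taking the maximum over time makes the result independent of the shift.

```python
def time_delay_layer(frames, weights, bias=0.0):
    """Apply one time-delay unit with shared weights to every window of frames.

    frames  : list of feature vectors, one per time step
    weights : list of weight vectors, one per delay (the context width)
    """
    width = len(weights)
    outputs = []
    for t in range(len(frames) - width + 1):
        total = bias
        for d in range(width):
            total += sum(w * x for w, x in zip(weights[d], frames[t + d]))
        outputs.append(max(0.0, total))  # ReLU, chosen here only for simplicity
    return outputs


# Hypothetical one-dimensional features: the same pattern at two time offsets.
weights = [[1.0], [2.0], [1.0]]
early = [[0.0], [1.0], [2.0], [1.0], [0.0], [0.0], [0.0]]
late = [[0.0], [0.0], [0.0], [1.0], [2.0], [1.0], [0.0]]

out_early = time_delay_layer(early, weights)
out_late = time_delay_layer(late, weights)
print(out_early, out_late)
print(max(out_early) == max(out_late))
```

The peak response moves with the pattern, while its value stays the same.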

The NECOBELAC Project is a network of collaboration between Europe and Latin American and Caribbean (LAC) countries to spread know-how in scientific writing and provide the best tools to exploit open access information for the safeguard of public health.

Alexander Waibel is a professor of Computer Science at Carnegie Mellon University and Karlsruhe Institute of Technology. Waibel's research interests focus on speech recognition and translation and on human communication signals and systems. Waibel is known for the time delay neural network (TDNN), which he developed. It is the first convolutional neural network (CNN) trained by gradient descent, using the backpropagation algorithm. Waibel introduced the TDNN in 1987 at ATR in Japan.

The HPC-Europa programmes are European Union (EU) funded research initiatives in the field of high-performance computing (HPC). The programmes concentrate on the development of a European Research Area, and in particular, improving the ability of European researchers to access the European supercomputing infrastructure provided by the programmes' partners. The programme is currently in its third iteration, known as "HPC-Europa3" or "HPCE3", and fully titled the "Transnational Access Programme for a Pan-European Network of HPC Research Infrastructures and Laboratories for scientific computing".

Professor Nelson Morgan is the former director of the International Computer Science Institute (ICSI), where he was also the Speech Group leader. He is also a professor in residence (emeritus) of electrical engineering and computer science at the University of California, Berkeley. He recently has focused on campaign reform through empowering volunteerism. In that work, he co-founded UpRise Campaigns with Antonia Scatton, and later co-founded Neighbors Forward AZ with Alison Porter.

Kaldi is an open-source speech recognition toolkit written in C++ for speech recognition and signal processing, freely available under the Apache License v2.0.

In signal processing, Feature space Maximum Likelihood Linear Regression (fMLLR) is a global feature transform that is typically applied in a speaker-adaptive way: fMLLR transforms acoustic features into speaker-adapted features by multiplying them with a transformation matrix. In some literature, fMLLR is also known as Constrained Maximum Likelihood Linear Regression (cMLLR).
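The application step can be sketched as follows, assuming the common affine form in which each feature vector x becomes A x + b. The matrix `A`, the offset `b`, and the feature frames are made-up values; estimating the transform for a speaker by maximum likelihood is the harder part and is not shown.

```python
def apply_fmllr(features, A, b):
    """Transform every frame x into A x + b using plain Python lists."""
    transformed = []
    for x in features:
        row = [
            sum(a_ij * x_j for a_ij, x_j in zip(a_i, x)) + b_i
            for a_i, b_i in zip(A, b)
        ]
        transformed.append(row)
    return transformed


# Hypothetical speaker transform and two 2-dimensional feature frames.
A = [[2.0, 0.0], [0.0, 0.5]]
b = [1.0, -1.0]
frames = [[1.0, 2.0], [3.0, 4.0]]

adapted = apply_fmllr(frames, A, b)
print(adapted)
```

Because the same `A` and `b` are applied to every frame of a speaker, the transform is global in the sense described above.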

EuroMatrixPlus was a project that ran from March 2009 to February 2012. It succeeded a project called EuroMatrix and continued the development and improvement of machine translation (MT) systems for languages of the European Union (EU).

The IARPA Babel program developed speech recognition technology for noisy telephone conversations. The main goal of the program was to improve the performance of keyword search on languages with very little transcribed data, i.e. low-resource languages. Data from 26 languages was collected, with certain languages held out as "surprise" languages to test the ability of the teams to rapidly build a system for a new language.

References

[1] Sebastian Stüker. "KIT - Janus Recognition Toolkit". Isl.ira.uka.de. Retrieved 2012-04-23.
[2] Patricia Lichtblau (2012-02-01). "EU-Bridge - Homepage". Eu-bridge.eu. Retrieved 2012-04-23.
[3] Esben Eidevik (2012-04-15). "EVEIL-3D-Startseite". Eveil-3d.eu. Retrieved 2012-04-23.
[4] "IARPA - Solicitations - Office of Incisive Analysis, Babel Program". Iarpa.gov. Archived from the original on 2012-04-22. Retrieved 2012-04-23.
[5] "Bienvenue sur le site de Quaero". Quaero.org. Retrieved 2012-04-23.
[6] "SFB 588 - Humanoid Robots". Sfb588.uni-karlsruhe.de. Retrieved 2012-04-23.
[7] "European Commission : CORDIS : Projects & Results Service : Technology and corpora for speech to speech translation". Cordis.europa.eu. Retrieved 2016-07-16.
[8] "European Commission : CORDIS : Projects & Results Service : Facilitating Agent in Multicultural Exchange". Cordis.europa.eu. Retrieved 2016-07-16.
[9] "European Commission : CORDIS : Projects & Results Service : NEgotiating through SPOken Language in E-commerce". Cordis.europa.eu. Retrieved 2016-07-16.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
