HFST

Helsinki Finite-State Technology
Developer(s) HFST team
Initial release 2008
Stable release 3.15.4 / February 13, 2021 [1]
Repository
Written in C++, Prolog, Python
Operating system Cross-platform: Linux, Mac OS X, Windows
Platform x86
Available in English
Type Finite-state toolkit
License GPLv3, part Apache
Website hfst.github.io

Helsinki Finite-State Technology (HFST) is a computer programming library and set of utilities for natural language processing with finite-state automata and finite-state transducers. It is free and open-source software, released under a mix of the GNU General Public License version 3 (GPLv3) and the Apache License.

Features

The library acts as a common interface to several interchangeable backends, such as OpenFST, foma and SFST. The utilities comprise various compilers, such as hfst-twolc (a compiler for morphological two-level rules), [2] hfst-lexc (a compiler for lexicon definitions) and hfst-regexp2fst (a regular expression compiler). Functions from Xerox's proprietary scripting language xfst are duplicated in hfst-xfst, and the pattern matching utility pmatch is duplicated in hfst-pmatch, which goes beyond the finite-state formalism by supporting recursive transition networks (RTNs).
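To give a rough idea of what a lexicon compiler produces, the following is a minimal pure-Python sketch that turns a list of analysis/surface pairs into a trie-shaped transducer and then uses it for analysis. It is not HFST code and does not use the HFST API; the names `compile_lexicon`, `analyse` and `LEXICON`, the tag symbols and the two-entry lexicon are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

EPS = ""  # epsilon: no symbol is consumed or produced on that side


def compile_lexicon(entries):
    """Build a trie-shaped transducer from lexicon entries.

    Each entry is a list of (upper, lower) symbol pairs, where the upper
    side spells the analysis and the lower side spells the surface form.
    State 0 is the start state. Returns (transitions, finals), where
    transitions[state] maps (upper, lower) to the next state.
    """
    transitions = defaultdict(dict)
    finals = set()
    next_state = 1
    for pairs in entries:
        state = 0
        for pair in pairs:
            if pair not in transitions[state]:
                transitions[state][pair] = next_state
                next_state += 1
            state = transitions[state][pair]
        finals.add(state)
    return transitions, finals


def analyse(transitions, finals, surface):
    """Return every upper-side string whose lower side spells `surface`."""
    results = []
    stack = [(0, 0, "")]  # (state, position in surface, analysis so far)
    while stack:
        state, pos, analysis = stack.pop()
        if pos == len(surface) and state in finals:
            results.append(analysis)
        for (upper, lower), target in transitions.get(state, {}).items():
            if lower == EPS:
                # Tag-only transition: emits an upper symbol, reads nothing.
                stack.append((target, pos, analysis + upper))
            elif surface.startswith(lower, pos):
                stack.append((target, pos + len(lower), analysis + upper))
    return sorted(results)


# Hypothetical two-entry lexicon: cat+N+Sg:cat and cat+N+Pl:cats.
LEXICON = [
    [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPS), ("+Sg", EPS)],
    [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPS), ("+Pl", "s")],
]
TRANSITIONS, FINALS = compile_lexicon(LEXICON)

print(analyse(TRANSITIONS, FINALS, "cats"))  # ['cat+N+Pl']
print(analyse(TRANSITIONS, FINALS, "cat"))   # ['cat+N+Sg']
```

Because the two entries share the prefix `cat+N`, they share states in the trie. A real compiler would typically go further, for example by minimising the resulting network, which this sketch does not attempt.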

The library and utilities are written in C++, with an interface to the library in Python and a utility for looking up results from transducers ported to Java and Python.

Whether transducers in HFST can carry weights depends on the backend. Among the backends used for performing FST operations, only OpenFST currently supports weights. HFST also provides two native backends, one designed for fast lookup (hfst-optimized-lookup) and the other for format interchange; both can be weighted.
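The role of weights during lookup can be sketched in a few lines of pure Python. The example assumes tropical-semiring-style weights, where weights are added along a path and a lower total means a better result. The function `lookup`, the data layout and the example weights are hypothetical and are not part of the HFST API.

```python
def lookup(arcs, finals, start, text):
    """Return all (output, total_weight) pairs for `text`, best first.

    arcs:   dict mapping state -> list of (in_sym, out_sym, weight, target)
    finals: dict mapping final state -> final weight
    Assumes the transducer has no cycles of empty-input arcs.
    """
    results = []
    stack = [(start, 0, "", 0.0)]  # (state, input position, output, weight)
    while stack:
        state, pos, output, weight = stack.pop()
        if pos == len(text) and state in finals:
            results.append((output, weight + finals[state]))
        for in_sym, out_sym, arc_weight, target in arcs.get(state, []):
            if in_sym == "":
                # Empty-input arc: produces output without reading input.
                stack.append((target, pos, output + out_sym, weight + arc_weight))
            elif text.startswith(in_sym, pos):
                stack.append(
                    (target, pos + len(in_sym), output + out_sym, weight + arc_weight)
                )
    # Lowest total weight first; ties are broken alphabetically by output.
    return sorted(results, key=lambda item: (item[1], item[0]))


# Hypothetical ambiguous word: the verb reading is given the lower weight.
ARCS = {
    0: [
        ("saw", "see+V+Past", 0.5, 1),
        ("saw", "saw+N+Sg", 1.5, 1),
    ],
}
FINAL_WEIGHTS = {1: 0.0}

print(lookup(ARCS, FINAL_WEIGHTS, 0, "saw"))
# [('see+V+Past', 0.5), ('saw+N+Sg', 1.5)]
```

An unweighted transducer would return the same two analyses with no way to rank them; the weights are what make a "best" reading well defined.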

Uses

HFST has been used for writing various linguistic tools, such as spell-checkers, hyphenators, and morphologies. [3] [4] Morphological dictionaries written in other formalisms have also been converted to HFST's formats. [5]

Notes

  1. "Releases · hfst/hfst". github.com. Retrieved 2021-04-12.
  2. "A Short History of Two-Level Morphology".
  3. "GitHub - flammie/Omorfi: Open morphology for Finnish". GitHub. 2019-02-23.
  4. "How to Configure and Optimise Spellers".
  5. "Helsinki Finite-State Technology - Browse /Resources at SourceForge.net".

References

Lindén, Krister; Axelson, Erik; Drobac, Senka; Hardwick, Sam; Kuokkala, Juha; Niemi, Jyrki; Pirinen, Tommi; Silfverberg, Miikka (2013). "HFST - A System for Creating NLP Tools". In Mahlow, Cerstin; Piotrowski, Michael (eds.). Systems and Frameworks for Computational Morphology. Communications in Computer and Information Science. Vol. 380. Humboldt-Universität in Berlin: Springer. pp. 53–71.