Foma (software)

Last updated
foma
Developer(s) Måns Huldén
Initial release2010, 1314 years ago
Stable release
0.9.16alpha / December 13, 2011 (2011-12-13)
Repository
Written in C
Operating system Cross-platform
Type Finite State Toolkit
License Apache License 2.0
Website github.com/mhulden/foma/

Foma is a free and open source finite-state toolkit created and maintained by Mans Hulden. It includes a compiler, programming language, and C library for constructing finite-state automata and transducers (FST's) for various uses, most typically Natural Language Processing uses such as morphological analysis.

Contents

Foma can replace the proprietary Xerox Finite State Toolkit for compiling and running FST's written in the lexc and xfst formalisms. The speed is comparable with the Xerox tools for most lexicons, although Foma can be 3 or 4 times slower for very large lexicons (e.g. >100,000 words). [1] Foma is also one of the possible backends of the free and open source Helsinki Finite State Toolkit (where other backends provide support for further formalisms).

There are several FOSS morphologies written in lexc/xfst compatible with foma, e.g. for the Sámi, Cornish, Faroese, Finnish, Komi, Mari, Udmurt, Buriat, Greenlandic language and Iñupiaq languages. [2]

See also

Notes

  1. Hulden 2009.
  2. "Welcome to the Sámi language technology pages". Archived from the original on 2011-08-22. Retrieved 2011-08-22.

Related Research Articles

<span class="mw-page-title-main">Finite-state machine</span> Mathematical model of computation

A finite-state machine (FSM) or finite-state automaton, finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number of states at any given time. The FSM can change from one state to another in response to some inputs; the change from one state to another is called a transition. An FSM is defined by a list of its states, its initial state, and the inputs that trigger each transition. Finite-state machines are of two types—deterministic finite-state machines and non-deterministic finite-state machines. For any non-deterministic finite-state machine, an equivalent deterministic one can be constructed.

A lexicon is the vocabulary of a language or branch of knowledge. In linguistics, a lexicon is a language's inventory of lexemes. The word lexicon derives from Greek word λεξικόν, neuter of λεξικός meaning 'of or for words'.

The following outline is provided as an overview and topical guide to linguistics:

Foma may refer to:

A finite-state transducer (FST) is a finite-state machine with two memory tapes, following the terminology for Turing machines: an input tape and an output tape. This contrasts with an ordinary finite-state automaton, which has a single tape. An FST is a type of finite-state automaton (FSA) that maps between two sets of symbols. An FST is more general than an FSA. An FSA defines a formal language by defining a set of accepted strings, while an FST defines a relation between sets of strings.

<span class="mw-page-title-main">Fred Karlsson</span> Linguistics professor at the University of Helsinki (born 1946)

Fred Göran Karlsson is a professor emeritus of general linguistics at the University of Helsinki.

<span class="mw-page-title-main">Kimmo Koskenniemi</span>

Kimmo Matti Koskenniemi is the inventor of finite-state two-level models for computational phonology and morphology. He was a professor of Computational Linguistics at the University of Helsinki, Finland. In the early 1980s Koskenniemi's work became accessible by early adopters such as Lauri Karttunen, Ronald M. Kaplan and Martin Kay, first at the University of Texas Austin, later at the Xerox Palo Alto Research Center.

<span class="mw-page-title-main">Apertium</span> Open-source rule-based machine translation platform

Apertium is a free/open-source rule-based machine translation platform. It is free software and released under the terms of the GNU General Public License.

Martin Kay was a computer scientist, known especially for his work in computational linguistics.

Constraint grammar (CG) is a methodological paradigm for natural language processing (NLP). Linguist-written, context-dependent rules are compiled into a grammar that assigns grammatical tags ("readings") to words or other tokens in running text. Typical tags address lemmatisation, inflexion, derivation, syntactic function, dependency, valency, case roles, semantic type etc. Each rule either adds, removes, selects or replaces a tag or a set of grammatical tags in a given sentence context. Context conditions can be linked to any tag or tag set of any word anywhere in the sentence, either locally or globally. Context conditions in the same rule may be linked, i.e. conditioned upon each other, negated, or blocked by interfering words or tags. Typical CGs consist of thousands of rules, that are applied set-wise in progressive steps, covering ever more advanced levels of analysis. Within each level, safe rules are used before heuristic rules, and no rule is allowed to remove the last reading of a given kind, thus providing a high degree of robustness.

Moses is a free software, statistical machine translation engine that can be used to train statistical models of text translation from a source language to a target language, developed by the University of Edinburgh. Moses then allows new source-language text to be decoded using these models to produce automatic translations in the target language. Training requires a parallel corpus of passages in the two languages, typically manually translated sentence pairs. Moses is released under the LGPL licence and available both as source code and binaries for Windows and Linux. Its development is primarily supported by the EuroMatrix project, with funding by the European Commission.

In the fields of computational linguistics and applied linguistics, a morphological dictionary is a linguistic resource that contains correspondences between surface form and lexical forms of words. Surface forms of words are those found in natural language text. The corresponding lexical form of a surface form is the lemma followed by grammatical information. In English give, gives, giving, gave and given are surface forms of the verb give. The lexical form would be "give", verb. There are two kinds of morphological dictionaries: morpheme-aligned dictionaries and full-form (non-aligned) dictionaries.

<span class="mw-page-title-main">FEniCS Project</span>

The FEniCS Project is a collection of free and open-source software components with the common goal to enable automated solution of differential equations. The components provide scientific computing tools for working with computational meshes, finite-element variational formulations of ordinary and partial differential equations, and numerical linear algebra.

Linguistics is the scientific study of language. Linguistics is based on a theoretical as well as a descriptive study of language and is also interlinked with the applied fields of language studies and language learning, which entails the study of specific languages. Before the 20th century, linguistics evolved in conjunction with literary study and did not employ scientific methods. Modern-day linguistics is considered a science because it entails a comprehensive, systematic, objective, and precise analysis of all aspects of language – i.e., the cognitive, the social, the cultural, the psychological, the environmental, the biological, the literary, the grammatical, the paleographical, and the structural.

Lauri Juhani Karttunen was an adjunct professor in linguistics at Stanford and an ACL Fellow. He died in 2022.

Mans Hulden is a researcher in computational linguistics currently holding the title of Assistant Professor at the Department of Linguistics of the University of Colorado Boulder. He teaches courses in computational linguistics, phonology, and phonetics, and is the creator and maintainer of the free and open source finite-state toolkit Foma.

<span class="mw-page-title-main">HFST</span>

Helsinki Finite-State Technology (HFST) is a computer programming library and set of utilities for natural language processing with finite-state automata and finite-state transducers. It is free and open-source software, released under a mix of the GNU General Public License version 3 (GPLv3) and the Apache License.

References