LanguageTool

LanguageTool
Developer(s) Daniel Naber and Marcin Miłkowski
Initial release 15 August 2005
Stable release 6.2 [1] / 2 July 2023; 6 October 2023
Repository
Written in Java
Platform Java SE
Size
  • Desktop app: 156 MB [2]
  • n-gram data: 8.34 GB [3]
Type Grammar checker
License GNU LGPL v2.1+
Website languagetool.org

LanguageTool is a free and open-source grammar, style, and spell checker, and all its features are available for download. [4] Following the open-core model, the LanguageTool website connects to a proprietary sister project, LanguageTool Premium (formerly LanguageTool Plus), which provides improved error detection for English and German and easier revision of longer texts.

Overview

LanguageTool was started by Daniel Naber for his diploma thesis [5] in 2003, when it was written in Python. It now supports 31 languages, each developed by volunteer maintainers, usually native speakers of that language. [6] Rules are created from error detection patterns and then tested against a given text. The core app is free and open-source and can be downloaded for offline use. For some additional detections, some languages use n-gram data, [7] which is massive and requires considerable processing power and I/O speed. For this reason, LanguageTool is also offered as a web service that processes the n-gram data on the server side. LanguageTool Premium also uses n-grams as part of its freemium business model.
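The role of n-gram data can be illustrated with a minimal Python sketch. It is not LanguageTool's implementation: the trigram counts are toy values invented for this example, and the names `TRIGRAM_COUNTS`, `CONFUSION_PAIRS`, `suggest`, and the `factor` threshold are hypothetical. The idea shown is only that a word is flagged when a commonly confused alternative is far more frequent in the same context.

```python
# Toy trigram counts, invented for illustration only (not real n-gram data).
TRIGRAM_COUNTS = {
    ("over", "there", "is"): 900,
    ("over", "their", "is"): 3,
    ("lost", "their", "keys"): 700,
    ("lost", "there", "keys"): 1,
}

# Hypothetical set of easily confused words.
CONFUSION_PAIRS = {"their": "there", "there": "their"}


def suggest(tokens, counts=TRIGRAM_COUNTS, pairs=CONFUSION_PAIRS, factor=10):
    """Return (index, word, alternative) for words whose alternative
    is at least `factor` times more frequent in the same trigram context."""
    suggestions = []
    for i in range(1, len(tokens) - 1):
        word = tokens[i]
        alternative = pairs.get(word)
        if alternative is None:
            continue
        seen = counts.get((tokens[i - 1], word, tokens[i + 1]), 0)
        alt_seen = counts.get((tokens[i - 1], alternative, tokens[i + 1]), 0)
        # Treat unseen trigrams as count 1 so the comparison stays meaningful.
        if alt_seen > factor * max(seen, 1):
            suggestions.append((i, word, alternative))
    return suggestions


if __name__ == "__main__":
    print(suggest("the book over their is mine".split()))
```

Real n-gram data covers vastly more contexts than this toy table, which is why looking it up is costly enough to favor server-side processing.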

The LanguageTool web service can be used through a web interface in a browser or through specialized client-side plug-ins for Microsoft Office, LibreOffice, Apache OpenOffice, Vim, Emacs, Firefox, Thunderbird, and Google Chrome.
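A client of the web service might look roughly like the following Python sketch. The article does not describe the HTTP interface, so the endpoint URL, the `text` and `language` form parameters, and the response fields (`matches`, `offset`, `length`, `message`, `replacements`) are assumptions about the public API and should be checked against the official API documentation. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; a self-hosted server would use its own URL.
API_URL = "https://api.languagetool.org/v2/check"


def build_request(text, language="en-US", url=API_URL):
    """Build a form-encoded POST request without sending it."""
    data = urllib.parse.urlencode({"text": text, "language": language})
    return urllib.request.Request(url, data=data.encode("utf-8"), method="POST")


def summarize(response_json):
    """Reduce an (assumed) response body to (offset, length, message, replacements)."""
    result = []
    for match in response_json.get("matches", []):
        replacements = [r["value"] for r in match.get("replacements", [])]
        result.append((match["offset"], match["length"], match["message"], replacements))
    return result


def check_text(text, language="en-US"):
    """Send the text to the service and return the summarized matches."""
    with urllib.request.urlopen(build_request(text, language), timeout=10) as resp:
        return summarize(json.load(resp))
```

Calling `check_text("This are a test.")` requires network access. Separating request construction from sending keeps the sketch testable offline.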

LanguageTool does not check whether a sentence is grammatically correct, only whether it contains typical errors. It is therefore easy to invent ungrammatical sentences that LanguageTool will still accept. Errors are detected by a variety of rules, either defined in XML or written in Java. [8] XML-based rules can be created using an online form. [9]
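The pattern-based approach can be sketched in a few lines of Python. This is a simplified illustration, not LanguageTool's rule format or engine: the `Rule` class, the tokenizer, and the two sample rules are hypothetical. Each rule lists the tokens accepted at each position, and the checker reports every place where a rule's pattern occurs.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: tuple  # one set of accepted lowercase tokens per position
    message: str


# Hypothetical sample rules describing typical errors.
RULES = [
    Rule("MODAL_OF", ({"could", "should", "would"}, {"of"}),
         "Did you mean 'have' instead of 'of'?"),
    Rule("THE_THE", ({"the"}, {"the"}), "Possible repeated word."),
]


def tokenize(text):
    """Return (token, start_offset) pairs for word tokens."""
    return [(m.group(0), m.start()) for m in re.finditer(r"[A-Za-z']+", text)]


def check(text, rules=RULES):
    """Return (rule_id, start, end, message) for every pattern occurrence."""
    tokens = tokenize(text)
    matches = []
    for rule in rules:
        width = len(rule.pattern)
        for i in range(len(tokens) - width + 1):
            window = tokens[i:i + width]
            if all(tok.lower() in allowed
                   for (tok, _), allowed in zip(window, rule.pattern)):
                start = window[0][1]
                end = window[-1][1] + len(window[-1][0])
                matches.append((rule.rule_id, start, end, rule.message))
    return matches


if __name__ == "__main__":
    print(check("I could of gone."))
    print(check("Me goes store yesterday."))  # ungrammatical, yet no rule fires
```

The second call shows the limitation described above: a sentence that matches no known error pattern is accepted, however ungrammatical it is.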

More recent developments rely on large n-gram libraries that, with the help of artificial neural networks, offer suggestions for correcting misspellings. [10]

References

  1. "Release 6.2". 2 July 2023. Retrieved 8 July 2023.
  2. "Index of /download/". languagetool.org.
  3. "Index of /download/ngram-data/". languagetool.org.
  4. "LanguageTool - Spell and Grammar Checker". LanguageTool.
  5. Daniel Naber. "A Rule-Based Style and Grammar Checker" (PDF). Daniel Naber.de. Retrieved 30 June 2018.
  6. "Supported languages". 28 December 2016. Retrieved 29 December 2016.
  7. "N-Gram Data Download Page". languagetool.org. 30 March 2019. Retrieved 30 March 2019.
  8. "Linux Administration", Pro Oracle Database 10g RAC on Linux, Berkeley, CA: Apress, pp. 385–400, 2006, doi:10.1007/978-1-4302-0214-1_15, ISBN 978-1-59059-524-4. Retrieved 23 February 2022.
  9. "Create a new LanguageTool rule". community.languagetool.org. Retrieved 26 October 2023.
  10. SKILL 2018: Fachwissenschaftlicher Informatik-Kongress, Studierendenkonferenz Informatik, 26.-27. September 2018, Berlin. Gesellschaft für Informatik. [Bonn]. 2018. ISBN 978-3-88579-448-6. OCLC 1066024545.