Spell checker


In software, a spell checker (or spelling checker or spell check) is a software feature that checks for misspellings in a text. Spell-checking features are often embedded in software or services, such as a word processor, email client, electronic dictionary, or search engine.


Eye have a spelling chequer,
It came with my Pea Sea.
It plane lee marks four my revue
Miss Steaks I can knot sea.

Eye strike the quays and type a whirred
And weight four it two say
Weather eye am write oar wrong
It tells me straight a weigh.

Eye ran this poem threw it,
Your shore real glad two no.
Its vary polished in its weigh.
My chequer tolled me sew.

A chequer is a bless thing,
It freeze yew lodes of thyme.
It helps me right all stiles of righting,
And aides me when eye rime.

Each frays come posed up on my screen
Eye trussed too bee a joule.
The chequer pours o'er every word
Two cheque sum spelling rule.

The original version of this poem was written by Jerrold H. Zar in 1992. An unsophisticated spell checker will find little or no fault with this poem because it checks words in isolation. A more sophisticated spell checker will make use of a language model to consider the context in which a word occurs.
Google Chrome's spell checker applied to the poem above, with the word "chequer" marked as unrecognized

Design

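A basic spell checker carries out the following processes: it scans the text and extracts the words it contains, then compares each word with a known list of correctly spelled words (a dictionary). The dictionary may be a bare word list, or it may also hold additional information such as hyphenation points or lexical and grammatical attributes.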
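Scanning text for words and comparing each one against a word list can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's implementation: the tiny `DICTIONARY` set and the `WORD_RE` tokenizer (runs of letters with an optional internal apostrophe) are assumptions, and real checkers load far larger word lists and treat numbers, hyphens and non-ASCII text more carefully.

```python
import re

# Hypothetical stand-in dictionary; a real checker loads tens of thousands of entries.
DICTIONARY = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

# Assumed tokenizer: runs of ASCII letters, optionally with one internal apostrophe.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def find_misspellings(text: str, dictionary: set[str]) -> list[tuple[int, str]]:
    """Return (offset, word) pairs for every word that is not in the dictionary."""
    flagged = []
    for match in WORD_RE.finditer(text):
        word = match.group()
        # Compare case-insensitively so sentence-initial capitals are accepted.
        if word.lower() not in dictionary:
            flagged.append((match.start(), word))
    return flagged


if __name__ == "__main__":
    print(find_misspellings("The quikc brown fox jumsp over the lazy dog.", DICTIONARY))
```

Returning character offsets rather than bare words lets a user interface highlight each flagged word in place.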

It is unclear whether morphological analysis, which allows for many forms of a word depending on its grammatical role, provides a significant benefit for English, though its benefits for highly synthetic languages such as German, Hungarian, or Turkish are clear.
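To make morphological analysis concrete, the following toy suffix-stripping check accepts an inflected form when removing one suffix yields a known stem. The `SUFFIX_RULES` and `STEMS` tables are invented for illustration. Real affix systems (Hunspell's affix files, for instance) attach conditions to each rule, whereas this naive version would wrongly accept a form such as "flys".

```python
# Hypothetical suffix rules: (suffix to strip, text to append to recover the stem).
SUFFIX_RULES = [
    ("ies", "y"),  # flies -> fly
    ("es", ""),    # boxes -> box
    ("s", ""),     # cats -> cat
    ("ed", ""),    # walked -> walk
    ("ing", ""),   # walking -> walk
]

# Hypothetical stem list.
STEMS = {"fly", "box", "cat", "walk"}


def is_known(word: str, stems: set[str]) -> bool:
    """Accept a word that is a stem, or that becomes one after stripping a single suffix."""
    word = word.lower()
    if word in stems:
        return True
    for suffix, replacement in SUFFIX_RULES:
        # Require at least one character to remain before the suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            if word[: -len(suffix)] + replacement in stems:
                return True
    return False
```

Storing stems plus rules instead of every inflected form keeps the word list small, which matters far more for agglutinative languages than for English.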

As an adjunct to these components, the program's user interface allows users to approve or reject replacements and modify the program's operation.

Spell checkers can use approximate string matching algorithms such as Levenshtein distance to find correct spellings of misspelled words. [1] An alternative type of spell checker uses only statistical information, such as n-grams, to recognize errors rather than correctly spelled words. This approach usually requires considerable effort to obtain sufficient statistical information. Its key advantages are that it needs less runtime storage and can correct errors in words that are not included in a dictionary. [2]
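A sketch of Levenshtein-based suggestion, assuming a small in-memory dictionary: the distance is computed with the standard row-by-row dynamic programme, and every dictionary word within a cut-off is returned, closest first. The `max_distance=2` default is an arbitrary choice, and scanning the whole dictionary is only reasonable for toy word lists; practical checkers often narrow the candidates first.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    # prev[j] holds the distance between a[:i-1] and b[:j]; rows are built one at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: set[str], max_distance: int = 2) -> list[str]:
    """Dictionary words within max_distance of word, closest first, ties broken alphabetically."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]


if __name__ == "__main__":
    print(suggest("speling", {"spelling", "spewing", "peeling", "apple"}))
```

Here "spelling" and "spewing" are both one edit away, which is why real checkers usually add a second ranking signal such as word frequency.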

In some cases, spell checkers use a fixed list of misspellings and suggestions for those misspellings; this less flexible approach is often used in paper-based correction methods, such as the "see also" entries of encyclopedias.
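A fixed-list corrector amounts to a table lookup. In this sketch the `COMMON_MISSPELLINGS` table is a hypothetical sample; its inflexibility is visible in that any error not listed in the table simply gets no suggestion.

```python
# Hypothetical table of known misspellings and their suggested replacements.
COMMON_MISSPELLINGS = {
    "recieve": ["receive"],
    "teh": ["the"],
    "seperate": ["separate"],
    "definately": ["definitely"],
}


def lookup_fixed(word: str) -> list[str]:
    """Return the stored suggestions for a known misspelling, or an empty list."""
    suggestions = COMMON_MISSPELLINGS.get(word.lower(), [])
    # Preserve an initial capital so that "Teh" suggests "The".
    if word[:1].isupper():
        return [s.capitalize() for s in suggestions]
    return list(suggestions)
```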

Clustering algorithms have also been used for spell checking, [3] including in combination with phonetic information. [4]
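As a generic illustration of phonetic information only (not the cited authors' clustering method), the sketch below computes American Soundex codes, a classic phonetic key, and groups words that share a code. A checker could, for example, look for suggestions inside the misspelled word's group first; `bucket_by_sound` is a hypothetical helper name.

```python
from collections import defaultdict

# Soundex digit for each coded consonant; vowels, y, h and w have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Skip uncoded letters and repeats of the previous code.
        if code and code != prev:
            result += code
        # h and w do not separate equal codes; vowels do (they reset prev to "").
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


def bucket_by_sound(words: list[str]) -> dict[str, list[str]]:
    """Group words by their Soundex code."""
    buckets = defaultdict(list)
    for w in words:
        buckets[soundex(w)].append(w)
    return dict(buckets)
```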

History

Pre-PC

In 1961, Les Earnest, who headed research on this budding technology, found it necessary to include the first spell checker, which accessed a list of 10,000 acceptable words. [5] Ralph Gorin, a graduate student under Earnest at the time, created the first true spelling checker program written as an applications program (rather than research) for general English text: SPELL for the DEC PDP-10 at Stanford University's Artificial Intelligence Laboratory, in February 1971. [6] Gorin wrote SPELL in assembly language for speed. He also made the first spelling corrector, which searched the word list for plausible correct spellings that differ by a single letter or by a transposition of adjacent letters and presented them to the user. Gorin made SPELL publicly accessible, as was done with most SAIL (Stanford Artificial Intelligence Laboratory) programs, and it soon spread around the world via the new ARPANET, about ten years before personal computers came into general use. [7] SPELL, together with its algorithms and data structures, inspired the Unix ispell program.
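The search for near misses can be sketched as generating every string one edit away from the unknown word and keeping those that appear in the word list. This is a modern Python illustration, not SPELL's assembly code; it reads "differ by a single letter" as one letter substituted, inserted or deleted, and assumes lowercase ASCII words.

```python
import string

ALPHABET = string.ascii_lowercase


def near_misses(word: str, dictionary: set[str]) -> list[str]:
    """Dictionary words one substitution, insertion, deletion or adjacent swap away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    swaps = {left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1}
    replaces = {left + ch + right[1:] for left, right in splits if right for ch in ALPHABET}
    inserts = {left + ch + right for left, right in splits for ch in ALPHABET}
    candidates = (deletes | swaps | replaces | inserts) - {word}
    return sorted(candidates & dictionary)
```

Generating candidates and testing membership costs roughly 54n + 25 lookups for an n-letter word, independent of dictionary size, which suits a large word list better than comparing against every entry.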

Spell checkers became widely available on mainframe computers in the late 1970s. A group of six linguists from Georgetown University developed the first spell-check system for IBM. [8]

Henry Kučera developed one for Digital Equipment Corporation's VAX machines in 1981. [9]

Unix

The International Ispell program commonly used in Unix is based on R. E. Gorin's SPELL. It was converted to C by Pace Willisson at MIT. [10]

The GNU project's spell checker is GNU Aspell. Its main improvement over Ispell is that it can more accurately suggest correct alternatives for misspelled English words. [11]

Because traditional spell checkers could not handle words in languages with complex inflection, the Hungarian developer László Németh created Hunspell, a spell checker that supports agglutinative languages and complex compound words. Hunspell also uses Unicode in its dictionaries. [12] Hunspell replaced MySpell in OpenOffice.org as of version 2.0.2.

Enchant is a general spell-checking library that originated in the AbiWord project. Its goal is to bring programs supporting different languages, such as Aspell, Hunspell, Nuspell, Hspell (Hebrew), Voikko (Finnish), Zemberek (Turkish) and AppleSpell, under one interface. [13]
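The "one interface over many engines" design can be sketched as a broker that maps language tags to interchangeable backends. Everything here is hypothetical and is not Enchant's actual API: `SpellBackend`, `WordListBackend`, `SpellBroker` and `dictionary_for` are illustrative names, and the toy backend stands in for engines such as Hunspell or Voikko.

```python
from typing import Protocol


class SpellBackend(Protocol):
    """Hypothetical interface that every wrapped engine must provide."""

    def check(self, word: str) -> bool: ...

    def suggest(self, word: str) -> list[str]: ...


class WordListBackend:
    """Toy backend built from a plain word list; stands in for a real engine."""

    def __init__(self, words: list[str]):
        self.words = {w.lower() for w in words}

    def check(self, word: str) -> bool:
        return word.lower() in self.words

    def suggest(self, word: str) -> list[str]:
        return []  # a real engine would compute suggestions here


class SpellBroker:
    """Routes requests to the backend registered for a language tag."""

    def __init__(self) -> None:
        # language tag -> backends in order of preference
        self._backends: dict[str, list[SpellBackend]] = {}

    def register(self, language: str, backend: SpellBackend) -> None:
        self._backends.setdefault(language, []).append(backend)

    def dictionary_for(self, language: str) -> SpellBackend:
        backends = self._backends.get(language)
        if not backends:
            raise LookupError(f"no spell-checking backend for {language!r}")
        return backends[0]  # the first registered backend wins
```

Because callers depend only on `check` and `suggest`, an application can add support for a new language by registering another backend without changing its own code.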

PCs

The first spell checkers for personal computers appeared in 1980. One example is "WordCheck" for Commodore systems, released in late 1980 in time for advertisements to go to print in January 1981. [14] Developers such as Maria Mariani [8] and Random House [15] rushed OEM packages or end-user products into the rapidly expanding software market. On pre-Windows PCs, these spell checkers were standalone programs, many of which could be run in terminate-and-stay-resident mode from within word-processing packages on PCs with sufficient memory.

However, the market for standalone packages was short-lived: by the mid-1980s, developers of popular word-processing packages like WordStar and WordPerfect had incorporated spell checkers, mostly licensed from the above companies, which quickly expanded support from English to many European and eventually even Asian languages. This required increasingly sophisticated morphology routines, particularly for heavily agglutinative languages like Hungarian and Finnish. Although the size of the word-processing market in a country like Iceland might not have justified the investment of implementing a spell checker, companies like WordPerfect nonetheless strove to localize their software for as many national markets as possible as part of their global marketing strategy.

When Apple developed "a system-wide spelling checker" for Mac OS X so that "the operating system took over spelling fixes," [16] it was a first: one "didn't have to maintain a separate spelling checker for each" program. [17] Mac OS X's spell-check coverage includes virtually all bundled and third-party applications.

Visual Tools' VT Speller, introduced in 1994, was "designed for developers of applications that support Windows." [18] [19] It came with a dictionary and could also build and use secondary dictionaries. [20]

Browsers

Web browsers such as Firefox and Google Chrome offer spell-checking support using Hunspell. Before adopting Hunspell, Firefox and Chrome used MySpell and GNU Aspell, respectively. [21]

Specialties

Some spell checkers have separate support for medical dictionaries to help prevent medical errors. [22] [23] [24]

Functionality

The first spell checkers were "verifiers" rather than "correctors": they offered no suggestions for incorrectly spelled words. This was helpful for typos but less so for logical or phonetic errors. The challenge developers faced was offering useful suggestions for misspelled words, which requires reducing words to a skeletal form and applying pattern-matching algorithms.
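One simple way to reduce words to a skeletal form is sketched below. The particular skeleton (keep the first letter, drop vowels and y, and drop any consonant that repeats the previously kept one) is an illustrative assumption, not a specific published scheme. It is enough to match common doubled-letter and vowel errors to their intended words.

```python
VOWELS = set("aeiouy")


def skeleton(word: str) -> str:
    """Hypothetical skeletal form: first letter, then consonants, skipping repeats of the last kept one."""
    word = word.lower()
    if not word:
        return ""
    key = word[0]
    for ch in word[1:]:
        if ch in VOWELS:
            continue
        if ch != key[-1]:
            key += ch
    return key


def skeleton_matches(word: str, dictionary: set[str]) -> list[str]:
    """Dictionary words whose skeleton equals that of word."""
    target = skeleton(word)
    return sorted(w for w in dictionary if skeleton(w) == target)
```

Because many words share a skeleton, the matches would normally be ranked afterwards, for example by edit distance or frequency.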

It might seem logical that, where spell-checking dictionaries are concerned, "the bigger, the better," so that correct words are not marked as incorrect. In practice, however, an optimal size for English appears to be around 90,000 entries. With more entries than this, misspellings may go unflagged because they coincide with rare but valid words. For example, a linguist might determine on the basis of corpus linguistics that the word baht is more frequently a misspelling of bath or bat than a reference to the Thai currency. Hence, it would typically be more useful to slightly inconvenience the few people who write about Thai currency than to overlook the spelling errors of the many more people who discuss baths.
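A sketch of the baht/bath trade-off: instead of removing a rare word outright, a checker could flag it when a much more frequent word lies one edit away. The frequencies in `FREQUENCY` are invented for illustration, and the `ratio=20.0` threshold is an arbitrary assumption. Optimal string alignment distance is used so that the transposition "baht"/"bath" counts as one edit.

```python
# Hypothetical corpus frequencies (per million words), invented for illustration.
FREQUENCY = {"bath": 40.0, "bat": 15.0, "baht": 0.5}


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Count swapping two adjacent letters as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def suspicious(word: str, frequency: dict[str, float], ratio: float = 20.0) -> list[str]:
    """Much more frequent words one edit from word, most frequent first; empty if none."""
    own = frequency.get(word, 0.0)
    rivals = [w for w, f in frequency.items()
              if w != word and f >= ratio * own and osa_distance(word, w) == 1]
    return sorted(rivals, key=lambda w: frequency[w], reverse=True)
```

With these made-up counts, "baht" would be flagged with "bath" and "bat" as likely intended words, while "bath" itself passes.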

A screenshot of Enchant, the AbiWord spell checker

The first MS-DOS spell checkers were mostly used in proofing mode from within word-processing packages: after preparing a document, a user scanned the text looking for misspellings. Later, batch processing, offered in packages such as Oracle's short-lived CoAuthor, allowed a user to view the results after a document was processed and correct only the words known to be wrong. Once memory and processing power became abundant, spell checking could be performed interactively in the background, as in Sector Software's Spellbound program, released in 1987, and in Microsoft Word since Word 95.

Spell checkers have become increasingly sophisticated, and some can now recognize grammatical errors. However, even at their best, they rarely catch all the errors in a text (such as homophone errors) and will flag neologisms and foreign words as misspellings. Nonetheless, spell checkers can be considered a type of foreign language writing aid that non-native language learners can rely on to detect and correct their misspellings in the target language. [25]

Spell-checking for languages other than English

English is unusual in that most words used in formal writing have a single spelling that can be found in a typical dictionary, with the exception of some jargon and modified words. In many languages, words are often concatenated into new combinations of words. In German, compound nouns are frequently coined from other existing nouns. Some scripts do not clearly separate one word from another, requiring word-splitting algorithms. Each of these presents unique challenges to non-English language spell checkers.
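Splitting a compound or an unsegmented string into known words can be done with dynamic programming over suffixes. In this sketch the tiny German `LEXICON` is an assumption, and preferring the fewest pieces is one simple heuristic among several. Real compound handling also needs rules about which words may combine and about linking elements such as the German Fugen-s.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical lexicon for illustration.
LEXICON = {"fuß", "ball", "fußball", "spiel", "spieler", "er"}


def segment(text: str, lexicon: set[str]) -> Optional[list[str]]:
    """Split text into lexicon words using as few pieces as possible; None if no split exists."""

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[tuple[str, ...]]:
        # Best segmentation of text[start:], memoised so each suffix is solved once.
        if start == len(text):
            return ()
        result = None
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in lexicon:
                rest = best(end)
                if rest is not None and (result is None or 1 + len(rest) < len(result)):
                    result = (piece,) + rest
        return result

    found = best(0)
    return list(found) if found is not None else None
```

For "fußballspieler", the split "fuß" + "ball" + "spieler" is also valid, but the fewest-pieces rule prefers "fußball" + "spieler".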

Context-sensitive spell checkers

There has been research on developing algorithms that can recognize a misspelled word, even if the word itself is in the vocabulary, based on the context of the surrounding words. Not only does this allow errors such as those in the poem above to be caught, but it also mitigates the detrimental effect of enlarging dictionaries, allowing more words to be recognized. For example, baht in the same paragraph as Thai or Thailand would not be recognized as a misspelling of bath. The most common errors caught by such a system are homophone errors, such as the words Their, too, sea, its and reel in the following sentence (intended as "They're coming to see if it's real"):

Their coming too sea if its reel.

The most successful algorithm to date is Andrew Golding and Dan Roth's "Winnow-based spelling correction algorithm", [26] published in 1999, which is able to recognize about 96% of context-sensitive spelling errors, in addition to ordinary non-word spelling errors. Context-sensitive spell checkers appeared in the now-defunct applications Microsoft Office 2007 [27] and Google Wave. [28]
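The sketch below shows the core of a Winnow learner applied to a confusion set such as {their, there}. It is heavily simplified and is not Golding and Roth's system, which uses richer features and combines several learners. Here each candidate word gets one Winnow unit, the features are only the neighbouring words, `alpha=2.0` and `theta=2.0` are arbitrary choices, and the four training sentences are invented.

```python
from collections import defaultdict


def features(tokens: list[str], i: int) -> set[str]:
    """Hypothetical feature set: the words immediately before and after position i."""
    prev_word = tokens[i - 1] if i > 0 else "<s>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return {"prev=" + prev_word, "next=" + next_word}


class WinnowUnit:
    """Linear threshold unit whose weights are multiplied or divided by alpha on mistakes."""

    def __init__(self, alpha: float = 2.0, theta: float = 2.0):
        self.alpha, self.theta = alpha, theta
        self.weights = defaultdict(lambda: 1.0)  # unseen features start at weight 1

    def score(self, active: set[str]) -> float:
        return sum(self.weights[f] for f in active)

    def update(self, active: set[str], is_positive: bool) -> None:
        predicted = self.score(active) >= self.theta
        if is_positive and not predicted:    # missed a positive: promote active features
            for f in active:
                self.weights[f] *= self.alpha
        elif predicted and not is_positive:  # false alarm: demote active features
            for f in active:
                self.weights[f] /= self.alpha


class ConfusionSetChooser:
    """Chooses among confusable words, using one Winnow unit per word."""

    def __init__(self, words: list[str]):
        self.units = {w: WinnowUnit() for w in words}

    def train(self, tokens: list[str], i: int) -> None:
        active = features(tokens, i)
        for word, unit in self.units.items():
            unit.update(active, is_positive=(tokens[i] == word))

    def choose(self, tokens: list[str], i: int) -> str:
        # The word at position i is ignored; only its context decides.
        active = features(tokens, i)
        return max(self.units, key=lambda w: self.units[w].score(active))


def build_their_there_chooser() -> ConfusionSetChooser:
    # Invented training sentences, used only for illustration.
    training = ["they lost their keys", "put it over there please",
                "is there a cat", "read their book"]
    chooser = ConfusionSetChooser(["their", "there"])
    for sentence in training:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok in chooser.units:
                chooser.train(tokens, i)
    return chooser
```

After training, `build_their_there_chooser().choose("we found there keys".split(), 2)` prefers "their", because the unit for "there" was demoted on the feature `next=keys`. Winnow's multiplicative updates suit this task because they cope well with many features, only a few of which are relevant to any one example.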

Grammar checkers attempt to fix problems with grammar beyond spelling errors, including incorrect choice of words.



References

  1. Perner, Petra (2010-07-05). Advances in Data Mining: Applications and Theoretical Aspects: 10th Industrial Conference, ICDM 2010, Berlin, Germany, July 12-14, 2010. Proceedings. Springer Science & Business Media. ISBN 978-3-642-14399-1.
  2. U.S. Patent 6618697, Method for rule-based correction of spelling and grammar errors
  3. de Amorim, R.C.; Zampieri, M. (2013) Effective Spell Checking Methods Using Clustering Algorithms. Archived 2017-08-17 at the Wayback Machine. Proceedings of Recent Advances in Natural Language Processing (RANLP2013). Hissar, Bulgaria. pp. 172-178.
  4. Zampieri, M.; de Amorim, R.C. (2014) Between Sound and Spelling: Combining Phonetics and Clustering Algorithms to Improve Target Word Recovery. Proceedings of the 9th International Conference on Natural Language Processing (PolTAL). Lecture Notes in Computer Science (LNCS). Springer. pp. 438-449.
  5. Earnest, Les. "The First Three Spelling Checkers" (PDF). Stanford University. Archived from the original (PDF) on 22 October 2012. Retrieved 10 October 2011.
  6. Peterson, James (December 1980). Computer Programs for Detecting and Correcting Spelling Errors (PDF). Retrieved 2011-02-18.
  7. Earnest, Les. Visible Legacies for Y3K (PDF). Archived from the original (PDF) on 2011-07-20. Retrieved 2011-02-18.
  8. "Georgetown U Faculty & Staff: The Center for Language, Education & Development". Archived from the original on 2009-02-05. Retrieved 2008-12-18. Citation: "Maria Mariani... was one of a group of six linguists from Georgetown University who developed the first spell-check system for the IBM corporation."
  9. Harvey, Charlotte Bruce (May–June 2010). "Teaching Computers to Spell (obituary for Henry Kučera)". Brown Alumni Magazine. p. 79.
  10. "International Ispell". www.cs.hmc.edu. Retrieved 2023-02-19.
  11. "GNU Aspell". aspell.net. Retrieved 2023-02-19.
  12. "Hunspell: About". hunspell.github.io. Retrieved 2023-02-19.
  13. AbiWord/enchant. AbiWord. 2023-02-13. Retrieved 2023-02-19.
  14. Advertisement (January 1981). "Micro Computer Industries, Ltd" (PDF). Compute! Magazine, Issue 8, Vol. 3, No. 1. p. 119.
  15. Advertisement (November 1982). "The Spelling Bee Is Over". PC Magazine. p. 165. Retrieved 21 October 2013.
  16. David Pogue (2009). Mac OS X Snow Leopard: The Missing Manual.
  17. David Pogue (2015). Switching to the Mac: The Missing Manual. O'Reilly Media, Inc. ISBN 9781491948125.
  18. "VisualTools VT-Speller". Computerworld. February 21, 1994. p. 68.
  19. "Browse September 27, 1993". VT-SPELLER
  20. Peter G. Aitken (November 8, 1994). "Spell-Checking for your Apps". PC Magazine. p. 299.
  21. "Aspell and Hunspell: A Tale of Two Spell Checkers". battlepenguin.com.
  22. "Medical Spell Checker for Firefox and Thunderbird". e-MedTools. 2017. Archived from the original on 2019-05-04. Retrieved 2018-08-29.
  23. Quathamer, Dr. Tobias (2016). "German medical dictionary words". Dr. Tobias Quathamer. Retrieved 2018-08-29.
  24. Friedman, Richard A., M.D. (2003). "CASES; Do Spelling and Penmanship Count? In Medicine, You Bet". The New York Times. Retrieved 2018-08-29.
  25. Banks, T. (2008). Foreign Language Learning Difficulties and Teaching Strategies. (pp. 29). Master's Thesis, Dominican University of California. Retrieved 19 March 2012.
  26. Golding, Andrew R.; Roth, Dan (1999). "A Winnow-Based Approach to Context-Sensitive Spelling Correction". Machine Learning. 34. Springer: 107-130. doi:10.1023/A:1007545901558. S2CID 12283016.
  27. Walt Mossberg (4 January 2007). "Review". Wall Street Journal. Retrieved 24 September 2010.
  28. "Google's Context-Sensitive Spell Checker". Google Operating System (googlesystem.blogspot.com). 29 May 2009. Retrieved 25 September 2010.