Grammar checker

Image: AbiWord checks English grammar using link grammar (Abiword grammar.jpg).

A grammar checker, in computing terms, is a program, or part of a program, that attempts to verify written text for grammatical correctness. Grammar checkers are most often implemented as a feature of a larger program, such as a word processor, but are also available as a stand-alone application that can be activated from within programs that work with editable text.

The implementation of a grammar checker makes use of natural language processing.[1][2]

History

The earliest "grammar checkers" were programs that checked for punctuation and style inconsistencies, rather than a complete range of possible grammatical errors. The first system was called Writer's Workbench, and was a set of writing tools included with Unix systems as far back as the 1970s.[3][4] The whole Writer's Workbench package included several separate tools to check for various writing problems. The "diction" tool checked for wordy, trite, clichéd, or misused phrases in a text. The tool would output a list of questionable phrases, and provide suggestions for improving the writing. The "style" tool analyzed the writing style of a given text. It performed a number of readability tests on the text and output the results, and gave some statistical information about the sentences of the text.

Aspen Software of Albuquerque, New Mexico released the earliest version of a diction and style checker for personal computers, Grammatik, in 1981. Grammatik was first available for the Radio Shack TRS-80, and soon had versions for CP/M and the IBM PC. Reference Software International of San Francisco, California, acquired Grammatik in 1985. Development of Grammatik continued, and it became an actual grammar checker that could detect writing errors beyond simple style checking.

Other early diction and style checking programs included Punctuation & Style, Correct Grammar, RightWriter, and PowerEdit.[5] While the earliest programs started out as simple diction and style checkers, all eventually added various levels of language processing and developed some true grammar-checking capability.

Until 1992, grammar checkers were sold as add-on programs. There were a large number of different word processing programs available at that time, with WordPerfect and Microsoft Word the top two in market share. In 1992, Microsoft decided to add grammar checking as a feature of Word, and licensed CorrecText, a grammar checker from Houghton Mifflin that had not yet been marketed as a standalone product. WordPerfect answered Microsoft's move by acquiring Reference Software, and the direct descendant of Grammatik is still included with WordPerfect.

As of 2019, grammar checkers are built into systems like Google Docs and Sapling.ai,[6] browser extensions like Grammarly and Qordoba, desktop applications like Ginger, free and open-source software like LanguageTool,[7] and text editor plugins like those available from WebSpellChecker Software.

Technical issues

The earliest writing style programs checked for wordy, trite, clichéd, or misused phrases in a text. This process was based on simple pattern matching. The heart of the program was a list of many hundreds or thousands of phrases that are considered poor writing by many experts. The list of questionable phrases included alternative wording for each phrase. The checking program would simply break text into sentences, check for any matches in the phrase dictionary, flag suspect phrases and show an alternative. These programs could also perform some mechanical checks. For example, they would typically flag doubled words, doubled punctuation, some capitalization errors, and other simple mechanical mistakes.
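
The approach can be sketched in a few lines of Python. The phrase dictionary, suggestions, and doubled-word check below are invented for illustration and are not drawn from any actual product:

```python
import re

# Hypothetical phrase dictionary: questionable phrase -> suggested rewording.
PHRASE_SUGGESTIONS = {
    "at this point in time": "now",
    "due to the fact that": "because",
    "in order to": "to",
}

def check_style(text):
    """Break text into sentences, flag phrases found in the dictionary,
    and run a simple mechanical check for doubled words."""
    findings = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for phrase, suggestion in PHRASE_SUGGESTIONS.items():
            if phrase in lowered:
                findings.append((sentence, phrase, suggestion))
        for match in re.finditer(r"\b(\w+)\s+\1\b", lowered):
            # Doubled word such as "the the"; suggest keeping one copy.
            findings.append((sentence, match.group(0), match.group(1)))
    return findings

text = "In order to decide, we met at this point in time. The the vote failed."
for sentence, found, suggestion in check_style(text):
    print(f"In {sentence!r}: consider replacing {found!r} with {suggestion!r}")
```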

True grammar checking is more complex. While a programming language has a very specific syntax and grammar, this is not so for natural languages. One can write a somewhat complete formal grammar for a natural language, but there are usually so many exceptions in real usage that a formal grammar is of minimal help in writing a grammar checker. One of the most important parts of a natural language grammar checker is a dictionary of all the words in the language, along with the part of speech of each word. The fact that a natural word may be used as any one of several parts of speech (such as "free" being used as an adjective, adverb, noun, or verb) greatly increases the complexity of any grammar checker.
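
The ambiguity problem can be illustrated with a toy fragment of such a dictionary; the entries below are invented for the example, and a real checker's lexicon would cover the entire language:

```python
# Invented fragment of a part-of-speech lexicon.
POS_LEXICON = {
    "the": {"determiner"},
    "free": {"adjective", "adverb", "noun", "verb"},
    "press": {"noun", "verb"},
    "releases": {"noun", "verb"},
}

def tag_candidates(sentence):
    """List every part-of-speech option for each word; the number of
    combinations a parser must consider grows multiplicatively."""
    return [(word, sorted(POS_LEXICON.get(word.lower(), {"unknown"})))
            for word in sentence.split()]

for word, tags in tag_candidates("The free press releases"):
    print(word, tags)
```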

A grammar checker will find each sentence in a text, look up each word in the dictionary, and then attempt to parse the sentence into a form that matches a grammar. Using various rules, the program can then detect various errors, such as faulty agreement in tense or number, incorrect word order, and so on. It is also possible to detect some stylistic problems with the text. For example, some popular style guides such as The Elements of Style deprecate excessive use of the passive voice. Grammar checkers may attempt to identify passive sentences and suggest an active-voice alternative.
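
A real checker identifies passive sentences from a full parse, but the shape of such a rule can be suggested with a crude pattern: a form of "to be" followed, possibly after an adverb, by a word that looks like a past participle. The participle test below is deliberately naive and invented for illustration:

```python
import re

BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"
# Naive participle test: regular "-ed" forms plus a few irregulars.
PARTICIPLE = r"(?:\w+ed|written|given|taken|done|seen)"
PASSIVE_RE = re.compile(
    rf"\b{BE_FORMS}\b(?:\s+\w+ly)?\s+{PARTICIPLE}\b",
    re.IGNORECASE,
)

def flag_passive(sentence):
    """Print a warning if the sentence matches the passive-voice pattern."""
    match = PASSIVE_RE.search(sentence)
    if match:
        print(f"Possible passive voice in {sentence!r}: {match.group(0)!r}")

flag_passive("The report was quietly rejected by the committee.")  # flagged
flag_passive("The committee rejected the report.")                 # silent
```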

The software elements required for grammar checking are closely related to some of the issues that arise in speech recognition software. In voice recognition, parsing can be used to help predict which word is most likely intended, based on part of speech and position in the sentence. In grammar checking, the parsing is used to detect words that fail to follow accepted grammar usage.

More recent research has focused on developing algorithms that can recognize grammar errors based on the context of the surrounding words.
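
One way such context-based checking can work is sketched below: given counts of how often word pairs occur in a reference corpus, a word that rarely follows its left neighbor can be flagged as suspicious. The counts are fabricated for illustration; real systems derive their statistics from large corpora:

```python
# Fabricated bigram counts standing in for statistics gathered from a
# large reference corpus.
BIGRAM_COUNTS = {
    ("their", "house"): 900,
    ("there", "house"): 3,
    ("over", "there"): 700,
}

def looks_suspicious(prev_word, word, threshold=10):
    """Flag a word that rarely follows its left neighbor in the corpus."""
    return BIGRAM_COUNTS.get((prev_word, word), 0) < threshold

print(looks_suspicious("there", "house"))  # True: "there house" is rare
print(looks_suspicious("their", "house"))  # False: a common pairing
```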

Criticism

Grammar checkers are considered a type of foreign language writing aid that non-native speakers can use to proofread their writing, since such programs endeavor to identify syntactic errors.[8] However, as with other computerized writing aids such as spell checkers, popular grammar checkers are often criticized when they fail to spot errors and incorrectly flag correct text as erroneous. The linguist Geoffrey K. Pullum argued in 2007 that they were generally so inaccurate as to do more harm than good: "for the most part, accepting the advice of a computer grammar checker on your prose will make it much worse, sometimes hilariously incoherent."[9]

Related Research Articles

Lint is the computer science term for a static code analysis tool used to flag programming errors, bugs, stylistic errors and suspicious constructs. The term originates from a Unix utility that examined C language source code. A program which performs this function is also known as a "linter".

Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. To this end, natural language processing often borrows ideas from theoretical linguistics. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Lexical tokenization is the conversion of a text into meaningful lexical tokens belonging to categories defined by a "lexer" program. In the case of a natural language, those categories include nouns, verbs, adjectives, punctuation, etc. In the case of a programming language, the categories include identifiers, operators, grouping symbols, and data types. Lexical tokenization is related to the type of tokenization used in large language models (LLMs), but with two differences. First, lexical tokenization is usually based on a lexical grammar, whereas LLM tokenizers are usually probability-based. Second, LLM tokenizers perform a second step that converts the tokens into numerical values.

Proofreading is an iterative process of comparing galley proofs against the original manuscripts or graphic artworks to identify transcription errors in the typesetting process. In the past, proofreaders would place corrections or proofreading marks along the margins. In modern publishing, material is generally provided in electronic form, traditional typesetting is no longer used and thus this kind of transcription no longer occurs.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

Copy editing is the process of revising written material ("copy") to improve quality and readability, as well as ensuring that a text is free of errors in grammar, style, and accuracy. The Chicago Manual of Style states that manuscript editing encompasses "simple mechanical corrections through sentence-level interventions to substantial remedial work on literary style and clarity, disorganized passages, baggy prose, muddled tables and figures, and the like". In the context of print publication, copy editing is done before typesetting and again before proofreading. Outside traditional book and journal publishing, the term "copy editing" is used more broadly, and is sometimes referred to as proofreading; the term sometimes encompasses additional tasks.

A spell checker is a software feature that checks for misspellings in a text. Spell-checking features are often embedded in software or services, such as a word processor, email client, electronic dictionary, or search engine.

Computer-aided translation (CAT), also referred to as computer-assisted translation or computer-aided human translation (CAHT), is the use of software to assist a human translator in the translation process. The translation is created by a human, and certain aspects of the process are facilitated by software; this is in contrast with machine translation (MT), in which the translation is created by a computer, optionally with some human intervention.

In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data.

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental processes used by humans when reading text, and to artificial processes implemented in computers, which are the subject of natural language processing. The problem is non-trivial, because while some written languages have explicit word boundary markers, such as the word spaces of written English and the distinctive initial, medial and final letter shapes of Arabic, such signals are sometimes ambiguous and not present in all written languages.

Constraint grammar (CG) is a methodological paradigm for natural language processing (NLP). Linguist-written, context-dependent rules are compiled into a grammar that assigns grammatical tags ("readings") to words or other tokens in running text. Typical tags address lemmatisation, inflexion, derivation, syntactic function, dependency, valency, case roles, semantic type, etc. Each rule either adds, removes, selects, or replaces a tag or a set of grammatical tags in a given sentence context. Context conditions can be linked to any tag or tag set of any word anywhere in the sentence, either locally or globally. Context conditions in the same rule may be linked, i.e. conditioned upon each other, negated, or blocked by interfering words or tags. Typical CGs consist of thousands of rules that are applied set-wise in progressive steps, covering ever more advanced levels of analysis. Within each level, safe rules are used before heuristic rules, and no rule is allowed to remove the last reading of a given kind, thus providing a high degree of robustness.

A foreign language writing aid is a computer program or any other instrument that assists a non-native language user in writing decently in their target language. Assistive operations can be classified into two categories: on-the-fly prompts and post-writing checks. Assisted aspects of writing include: lexical, syntactic, lexical semantic and idiomatic expression transfer, etc. Different types of foreign language writing aids include automated proofreading applications, text corpora, dictionaries, translation aids and orthography aids.

Grammatik was the first grammar checking program developed for home computer systems. Aspen Software of Albuquerque, NM, released the earliest version of this diction and style checker for personal computers. It was first released no later than 1981, and was inspired by the Writer's Workbench.

Ginger Software is an American and Israeli start-up specializing in natural language processing and AI. The main products are tools aimed at improving written communication, developing English speaking skills, and boosting productivity. The company was founded in 2008 by Yael Karov and Avner Zangvil. Ginger Software uses the context of complete sentences to suggest corrections. In December 2011, Ginger Software was one of nine projects approved by the Board of Governors of the Israel-U.S. Binational Industrial Research and Development Foundation for funding of $8.1 million. The company also raised $3 million from private Israeli and US investors in 2009.

The following outline is provided as an overview of and topical guide to natural-language processing.

LanguageTool is a free and open-source grammar, style, and spell checker, and all its features are available for download. The LanguageTool website connects to a proprietary sister project called LanguageTool Premium, which provides improved error detection for English and German, as well as easier revision of longer texts, following the open-core model.

The Writer's Workbench (wwb) is a grammar checker created by Lorinda Cherry and Nina Macdonald of Bell Labs. It is perhaps the earliest grammar checker to receive wide usage on Unix systems.


Microsoft Editor is a closed-source, AI-powered writing assistant available for Word, Outlook, and as a Chromium browser extension, as part of Office 365. It includes the essentials of a writing assistant, such as a grammar and spell checker. Microsoft provides a basic version of Editor for free, but users need to have a Microsoft account.

Reference Software International, Inc. (RSI), was an American software developer active from 1985 to 1993 and based in Albuquerque, New Mexico, and San Francisco, California. The company released several productivity and reference software packages during its lifespan, including the highly popular Grammatik grammar checker, for IBM PCs and compatibles running DOS. The company was acquired by WordPerfect Corporation in 1993.

References

  1. Vikrant Bhateja; João Manuel R. S. Tavares; B. Padmaja Rani; V. Kamakshi Prasad; K. Srujan Raju (23 July 2018). Proceedings of the Second International Conference on Computational Intelligence and Informatics: ICCII 2017. Springer. ISBN 978-981-10-8228-3.
  2. Robert Dale; Hermann Moisl; Harold Somers (25 July 2000). Handbook of Natural Language Processing. CRC Press. ISBN 978-0-8247-9000-4.
  3. "Ideas - O'Reilly Media". www.linuxdevcenter.com.
  4. "The Linux Cookbook: Tips and Techniques for Everyday Use - Grammar and Reference". dsl.org.
  5. InfoWorld Media Group (28 October 1991). InfoWorld. p. 68 – via Internet Archive.
  6. "Sapling | AI Writing Assistant for Customer-Facing Teams". sapling.ai.
  7. "How Google Docs grammar check compares to its alternatives". TechRepublic. 4 April 2019.
  8. Ramírez Bustamante, Flora; Sánchez León, Fernando (5 August 1996). "GramCheck: A grammar and style checker" (PDF). Coling '96: 175–181. arXiv:cmp-lg/9607001. doi:10.3115/992628.992661. S2CID 12829285.
  9. Geoffrey K. Pullum (26 October 2007). "Monkeys will check your grammar". Language Log. Retrieved 8 March 2010.