WordSmith (software)

Last updated
WordSmith Tools
Developer(s) Mike Scott, Oxford University Press
Initial release1996;26 years ago (1996)
Stable release
8.0.0.62 / December 19, 2021;11 months ago (2021-12-19) [1]
Operating system Windows
Type Concordancer, Corpus manager
License Proprietary
Website lexically.net/wordsmith/

WordSmith Tools is a software package primarily for linguists, in particular for work in the field of corpus linguistics. It is a collection of modules for searching patterns in a language. The software handles many languages.

Contents

Development and acquisition

The program suite was developed by the British linguist Mike Scott at the University of Liverpool and released as version 1.0 in 1996. It was based on MicroConcord co-developed by Mike Scott and Tim Johns, published by Oxford University Press in 1993. Versions 1.0 through 4.0 were sold exclusively by Oxford University Press, the current version 8.0 and previous versions are now also distributed by Lexical Analysis Software Limited. The software runs under Windows. WordSmith is a download-only product which is registered by entering a code costing 50 pounds sterling for a single user license. However, WordSmith 4.0 can now be downloaded and used free.

Functionality and applications

The core areas of the software package includes three modules:

Each of the modules offers a number of other features in relation to the text corpus or text being analysed. Thus, for example, collocation and dispersion plots are computed with a concordance search. In addition, there are a number of additional modules that are useful for the preparation, clean-up and format the text corpus. WordSmith Tools can be used in 80 different languages. WordSmith Tools is - along with several other software products similar in nature - an internationally popular program for the work based on corpus-linguistic methodology. It is used by investigators in assorted fields as can be seen in the list below of works using the software.

Papers which use WordSmith

Related Research Articles

Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference.

In linguistics, a corpus or text corpus is a language resource consisting of a large and structured set of texts. In corpus linguistics, they are used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory.

<span class="mw-page-title-main">Parallel text</span> Text placed alongside its translation or translations

A parallel text is a text placed alongside its translation or translations. Parallel text alignment is the identification of the corresponding sentences in both halves of the parallel text. The Loeb Classical Library and the Clay Sanskrit Library are two examples of dual-language series of texts. Reference Bibles may contain the original languages and a translation, or several translations by themselves, for ease of comparison and study; Origen's Hexapla placed six versions of the Old Testament side by side. A famous example is the Rosetta Stone, whose discovery allowed the Ancient Egyptian language to begin being deciphered.

Semantic prosody, also discourse prosody, describes the way in which certain seemingly neutral words can be perceived with positive or negative associations through frequent occurrences with particular collocations. Coined in analogy to linguistic prosody, popularised by Bill Louw.

The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently, the ANC includes a range of genres, including emerging genres such as email, tweets, and web data that are not included in earlier corpora such as the British National Corpus. It is annotated for part of speech and lemma, shallow parse, and named entities.

<span class="mw-page-title-main">Concordance (publishing)</span> List of words or terms in a published book

A concordance is an alphabetical list of the principal words used in a book or body of work, listing every instance of each word with its immediate context. Concordances have been compiled only for works of special importance, such as the Vedas, Bible, Qur'an or the works of Shakespeare, James Joyce or classical Latin and Greek authors, because of the time, difficulty, and expense involved in creating a concordance in the pre-computer era.

John McHardy Sinclair was a Professor of Modern English Language at Birmingham University from 1965 to 2000. He pioneered work in corpus linguistics, discourse analysis, lexicography, and language teaching.

<span class="mw-page-title-main">Treebank</span>

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data.

The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time. It is used in corpus linguistic for analysis of corpora

Data-driven learning (DDL) is an approach to foreign language learning. Whereas most language learning is guided by teachers and textbooks, data-driven learning treats language as data and students as researchers undertaking guided discovery tasks. Underpinning this pedagogical approach is the data - information - knowledge paradigm. It is informed by a pattern-based approach to grammar and vocabulary, and a lexicogrammatical approach to language in general. Thus the basic task in DDL is to identify patterns at all levels of language. From their findings, foreign language students can see how an aspect of language is typically used, which in turn informs how they can use it in their own speaking and writing. Learning how to frame language questions and use the resources to obtain data and interpret it is fundamental to learner autonomy. When students arrive at their own conclusions through such procedures, they use their higher order thinking skills and are creating knowledge.

The Corpus of Contemporary American English (COCA) is a one-billion-word corpus of contemporary American English. It was created by Mark Davies, retired professor of corpus linguistics at Brigham Young University (BYU).

Corpus-assisted discourse studies is related historically and methodologically to the discipline of corpus linguistics. The principal endeavor of corpus-assisted discourse studies is the investigation, and comparison of features of particular discourse types, integrating into the analysis the techniques and tools developed within corpus linguistics. These include the compilation of specialised corpora and analyses of word and word-cluster frequency lists, comparative keyword lists and, above all, concordances.

The Survey of English Usage was the first research centre in Europe to carry out research with corpora. The Survey is based in the Department of English Language and Literature at University College London.

Ultralingua is a single-click and drag-and-drop multilingual translation dictionary, thesaurus, and language reference utility. The full suite of Ultralingua language tools is available free online without the need for download and installation. As well as its online products, the developer offers premium downloadable language software with extended features and content for Macintosh and Windows computer platforms, smartphones, and other hand held devices.

Corpora in Translation Studies Gradually the translator’s workplace has changed over the last ten years. Personal computers now have the capacity to process information easier and quicker than ever before, and so today's computer could be considered an important or even essential tool in translation. However, problems arise in the use of computers in translation, as the computer is no substitute for traditional tools such as monolingual and bilingual dictionaries, terminologies and encyclopaedias on paper or in digital format and although we can easily access a large amount of information, we need to find the right and reliable information.

<span class="mw-page-title-main">Sketch Engine</span> Corpus manager and text analysis software

Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing CZ s.r.o. since 2003. Its purpose is to enable people studying language behaviour to search large text collections according to complex and linguistically motivated queries. Sketch Engine gained its name after one of the key features, word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour. Currently, it supports and provides corpora in 90+ languages.

The International Computer Archive of Modern and Medieval English (ICAME) is an international group of linguists and data scientists working in corpus linguistics to digitise English texts. The organisation was founded in Oslo, Norway in 1977 as the International Computer Archive of Modern English, before being renamed to its current title.

A corpus manager is a tool for multilingual corpus analysis, which allows effective searching in corpora.

<span class="mw-page-title-main">Adam Kilgarriff</span>

Adam Kilgarriff was a corpus linguist, lexicographer, and co-author of Sketch Engine.

<span class="mw-page-title-main">SkELL</span>

SkELL is a free corpus-based web tool that allows language learners and teachers find authentic sentences for specific target word(s). For any word or a phrase, SkELL displays a concordance that lists example sentences drawn from a special text corpus crawled from the World Wide Web, which has been cleaned of spam and includes only high-quality texts covering everyday, standard, formal, and professional language. There are versions of SkELL for English, Russian, German, Italian, Czech and Estonian.

References

  1. "Change log, WordSmith version 8.0" . Retrieved 28 November 2022.