Thesaurus

Historical Thesaurus of the Oxford English Dictionary, two-volume set

In general usage, a thesaurus is a reference work that lists words grouped together according to similarity of meaning (containing synonyms and sometimes antonyms), in contrast to a dictionary, which provides definitions for words and generally lists them in alphabetical order. The main purpose of such reference works is to help users "to find the word, or words, by which [an] idea may be most fitly and aptly expressed," in the words of Peter Mark Roget, author of Roget's Thesaurus. [1]

Reference work Publication to which one can refer for confirmed facts

A reference work is a book or periodical to which one can refer for information. The information is intended to be found quickly when needed. Reference works are usually referred to for particular pieces of information, rather than read beginning to end. The writing style used in these works is informative; the authors avoid use of the first person, and emphasize facts. Many reference works are compiled by a team of contributors whose work is coordinated by one or more editors rather than by an individual author. Indices are commonly provided in many types of reference work. Updated editions are usually published as needed, in some cases annually. Reference works include dictionaries, thesauruses, encyclopedias, almanacs, bibliographies, and catalogs. Many reference works are available in electronic form and can be obtained as application software, CD-ROMs, DVDs, or online through the Internet.

Dictionary collection of words and their meanings

A dictionary, sometimes known as a wordbook, is a collection of words in one or more specific languages, often arranged alphabetically, which may include information on definitions, usage, etymologies, pronunciations, translation, etc. or a book of words in one language with their equivalents in another, sometimes known as a lexicon. It is a lexicographical reference that shows inter-relationships among the data.

Peter Mark Roget British physician, philologist

Peter Mark Roget was a British physician, natural theologian and lexicographer. He is best known for publishing, in 1852, the Thesaurus of English Words and Phrases, a classified collection of related words.


Although a thesaurus lists synonyms, it should not be taken as a complete list of all the synonyms for a particular word. Its entries are also designed to draw distinctions between similar words and to assist in choosing exactly the right word. Unlike a dictionary entry, a thesaurus entry does not define words.

In library science and information science, thesauri have been widely used to specify domain models. Recently, thesauri have been implemented with Simple Knowledge Organization System (SKOS). [2]

Library science is an interdisciplinary or multidisciplinary field that applies the practices, perspectives, and tools of management, information technology, education, and other areas to libraries; the collection, organization, preservation, and dissemination of information resources; and the political economy of information. Martin Schrettinger, a Bavarian librarian, coined the discipline within his work (1808–1828) Versuch eines vollständigen Lehrbuchs der Bibliothek-Wissenschaft oder Anleitung zur vollkommenen Geschäftsführung eines Bibliothekars. Rather than classifying information based on nature-oriented elements, as was previously done in his Bavarian library, Schrettinger organized books in alphabetical order. The first American school for library science was founded by Melvil Dewey at Columbia University in 1887.

Information science field primarily concerned with the analysis, collection, classification, manipulation, storage, retrieval and dissemination of information

Information science is a field primarily concerned with the analysis, collection, classification, manipulation, storage, retrieval, movement, dissemination, and protection of information. Practitioners within and outside the field study application and usage of knowledge in organizations along with the interaction between people, organizations, and any existing information systems with the aim of creating, replacing, improving, or understanding information systems. Historically, information science is associated with computer science, psychology, technology and intelligence agencies. However, information science also incorporates aspects of diverse fields such as archival science, cognitive science, commerce, law, linguistics, museology, management, mathematics, philosophy, public policy, and social sciences.

Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data.
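The kind of structure SKOS describes can be sketched in a few lines of Python. This is a minimal illustration, not an implementation of the recommendation: the `Concept` class, the `SCHEME` data, the concept identifiers and the `ex:` prefix are all made up for the example. Only the property names `skos:prefLabel`, `skos:altLabel` and `skos:broader` come from SKOS itself.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus concept, loosely following SKOS labelling properties."""
    pref_label: str                                  # cf. skos:prefLabel
    alt_labels: list = field(default_factory=list)   # cf. skos:altLabel
    broader: list = field(default_factory=list)      # cf. skos:broader (concept ids)


# Hypothetical three-concept vocabulary, keyed by made-up identifiers.
SCHEME = {
    "c1": Concept("reference works"),
    "c2": Concept("dictionaries", ["wordbooks", "lexicons"], ["c1"]),
    "c3": Concept("historical dictionaries", [], ["c2"]),
}


def ancestors(concept_id, scheme=SCHEME):
    """Return the preferred labels of all broader concepts, nearest first."""
    found = []
    queue = list(scheme[concept_id].broader)
    while queue:
        current = queue.pop(0)
        label = scheme[current].pref_label
        if label not in found:
            found.append(label)
            queue.extend(scheme[current].broader)
    return found


def to_triples(concept_id, scheme=SCHEME):
    """Render one concept as Turtle-style statements (prefixes assumed declared)."""
    c = scheme[concept_id]
    lines = [f'ex:{concept_id} skos:prefLabel "{c.pref_label}"@en .']
    lines += [f'ex:{concept_id} skos:altLabel "{a}"@en .' for a in c.alt_labels]
    lines += [f"ex:{concept_id} skos:broader ex:{b} ." for b in c.broader]
    return lines
```

Calling `ancestors("c3")` walks the broader links upward, and `to_triples("c2")` shows how the same record might be published as linked data.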

Etymology

The word "thesaurus" is derived from 16th-century New Latin, in turn from Latin thēsaurus, which is the Latinisation of the Greek θησαυρός (thēsauros), "treasure, treasury, storehouse". [3] The word thēsauros is of uncertain etymology. Douglas Harper derives it from the root of the Greek verb τιθέναι tithenai, "to put, to place." [3] Robert Beekes rejected an Indo-European derivation and suggested a Pre-Greek suffix *-arwo-. [4]

New Latin Form of the Latin language between c. 1375 and c. 1900

New Latin was a revival in the use of Latin in original, scholarly, and scientific works between c. 1375 and c. 1900. Modern scholarly and technical nomenclature, such as in zoological and botanical taxonomy and international scientific vocabulary, draws extensively from New Latin vocabulary. In such use, New Latin is subject to new word formation. As a language for full expression in prose or poetry, however, it is often distinguished from its successor, Contemporary Latin.

Latin Indo-European language of the Italic family

Latin is a classical language belonging to the Italic branch of the Indo-European languages. The Latin alphabet is derived from the Etruscan and Greek alphabets and ultimately from the Phoenician alphabet.

Ancient Greek Version of the Greek language used from roughly the 9th century BC to the 6th century AD

The ancient Greek language includes the forms of Greek used in Ancient Greece and the ancient world from around the 9th century BC to the 6th century AD. It is often roughly divided into the Archaic period, Classical period, and Hellenistic period. It is antedated in the second millennium BC by Mycenaean Greek and succeeded by Medieval Greek.

From the 16th to the 19th centuries, the term "thesaurus" was applied to any dictionary or encyclopedia, as in the Thesaurus Linguae Latinae (Dictionary of the Latin Language, 1532) and the Thesaurus Linguae Graecae (Dictionary of the Greek Language, 1572). The meaning "collection of words arranged according to sense" is first attested in 1852, in Roget's title. The related form thesaurer is attested in Middle English for "treasurer". [3]

Encyclopedia type of reference work

An encyclopedia or encyclopaedia is a reference work or compendium providing summaries of knowledge either from all branches or from a particular field or discipline. Encyclopedias are divided into articles or entries that are often arranged alphabetically by article name and sometimes by thematic categories. Encyclopedia entries are longer and more detailed than those in most dictionaries. Generally speaking, unlike dictionary entries—which focus on linguistic information about words, such as their etymology, meaning, pronunciation, use, and grammatical forms—encyclopedia articles focus on factual information concerning the subject named in the article's title.

Thesaurus Linguae Latinae organization

The Thesaurus Linguae Latinae is a monumental dictionary of Latin founded on historical principles. It encompasses the Latin language from the time of its origin to the time of Isidore of Seville.

The Thesaurus Linguae Graecae (TLG) is a research center at the University of California, Irvine. The TLG was founded in 1972 by Marianne McDonald with the goal of creating a comprehensive digital collection of all surviving texts written in Greek from antiquity to the present era. Since 1972, the TLG has collected and digitized most surviving literary texts written in Greek from Homer to the fall of Constantinople in 1453 CE, and beyond. Theodore Brunner (1934–2007) directed the project from 1972 until his retirement from the University of California in 1998. Maria Pantelia, also a classics professor at UC Irvine, succeeded him in 1998 and has directed the TLG since. The TLG shares its name with its online database, the full title of which is Thesaurus Linguae Graecae: A Digital Library of Greek Literature.

History

Peter Mark Roget, author of the first modern thesaurus.

In antiquity, Philo of Byblos authored the first text that could now be called a thesaurus. In Sanskrit, the Amarakosha is a thesaurus in verse form, written in the 4th century. The Amarakosha mentions 18 prior works, but they have all been lost. [citation needed]

Philo of Byblos, also known as Herennius Philon, was an antiquarian writer of grammatical, lexical and historical works in Greek. He is chiefly known for his Phoenician history assembled from the writings of Sanchuniathon.

Sanskrit language of ancient Indian subcontinent

Sanskrit is a language of ancient India with a 3,500-year history. It is the primary liturgical language of Hinduism and the predominant language of most works of Hindu philosophy as well as some of the principal texts of Buddhism and Jainism. Sanskrit, in its variants and numerous dialects, was the lingua franca of ancient and medieval India. In the early 1st millennium CE, along with Buddhism and Hinduism, Sanskrit migrated to Southeast Asia, parts of East Asia and Central Asia, emerging as a language of high culture and of local ruling elites in these regions.

Amarakosha thesaurus of Sanskrit written by the ancient Indian scholar Amarasimha

The Amarakosha is the popular name for the Namalinganushasanam, a thesaurus in Sanskrit written by the ancient Indian scholar Amarasimha. It may be the oldest extant kosha. The author himself mentions 18 prior works, but they have all been lost. There have been more than 40 commentaries on the Amarakosha.

The first modern thesaurus was Roget's Thesaurus, first compiled in 1805 by Peter Mark Roget and published in 1852. Since its publication, it has never been out of print and is still a widely used work across the English-speaking world. [5] Entries in Roget's Thesaurus are listed conceptually rather than alphabetically. Roget described his thesaurus in the foreword to the first edition:

It is now nearly fifty years since I first projected a system of verbal classification similar to that on which the present work is founded. Conceiving that such a compilation might help to supply my own deficiencies, I had, in the year 1805, completed a classed catalogue of words on a small scale, but on the same principle, and nearly in the same form, as the Thesaurus now published. [6]

Thesauri have been used to perform automatic word-sense disambiguation [7] and text simplification for machine translation systems. [8]
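As a rough illustration of how thesaurus categories can drive word-sense disambiguation, the sketch below picks the category whose cue words overlap most with the surrounding context. It is a deliberately simplified overlap count, not the statistical model of the cited work. The `CATEGORIES` table, its cue words and the function name are hypothetical.

```python
# Hypothetical mini-thesaurus: each sense of "crane" is a category with cue words.
CATEGORIES = {
    "ANIMAL": {"bird", "nest", "wings", "feathers", "migrate"},
    "MACHINE": {"lift", "construction", "steel", "load", "operator"},
}


def disambiguate(context, categories=CATEGORIES):
    """Pick the category whose cue words overlap most with the context words.

    Returns None when no cue word appears in the context. Ties go to the
    category listed first.
    """
    words = set(context.lower().split())
    scores = {name: len(words & cues) for name, cues in categories.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `disambiguate("the crane will lift the steel load")` selects the machine sense because three of its cue words occur in the sentence. A realistic system would weight cue words by how strongly they indicate each category rather than counting them equally.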


Related Research Articles

Roget's Thesaurus is a widely used English-language thesaurus, created in 1805 by Peter Mark Roget (1779–1869), British physician, natural theologian and lexicographer. It was released to the public on 29 April 1852. The original edition had 15,000 words, and each new edition has been larger. Roget was inspired by the Utilitarian teachings of Jeremy Bentham and wished to help "those who are painfully groping their way and struggling with the difficulties of composition [...] this work professes to hold out a helping hand". The Karpeles Library Museum houses the original manuscript in its collection.

WordNet computational lexicon of English

WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members. WordNet can thus be seen as a combination of dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD style license and are freely available for download from the WordNet website. Both the lexicographic data and the compiler for producing the distributed database are available.
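The synset idea can be illustrated with a toy structure that answers both dictionary-style and thesaurus-style queries. This is only a sketch under stated assumptions: the `SYNSETS` data and the function names are invented for the example, and it does not use the actual WordNet database or its tools.

```python
# Hypothetical toy data; real WordNet synsets and glosses differ.
SYNSETS = [
    {"words": {"begin", "start", "commence"}, "gloss": "set in motion"},
    {"words": {"start", "jump"}, "gloss": "move suddenly from surprise"},
    {"words": {"long", "extended"}, "gloss": "lasting a great amount of time"},
]


def senses(word):
    """Dictionary-style view: the glosses of every synset containing the word."""
    return [s["gloss"] for s in SYNSETS if word in s["words"]]


def synonyms(word):
    """Thesaurus-style view: other words sharing at least one synset."""
    found = set()
    for s in SYNSETS:
        if word in s["words"]:
            found |= s["words"] - {word}
    return sorted(found)
```

Here `senses("start")` returns one gloss per synset the word belongs to, while `synonyms("start")` merges the members of those synsets, which is why the same data can serve as both dictionary and thesaurus.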

In computer science and information science, an ontology encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities that substantiate one, many or all domains of discourse.

In computational linguistics, word-sense disambiguation (WSD) is an open problem concerned with identifying which sense of a word is used in a sentence. The solution to this problem impacts other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.

Synonym Words or phrases having the same meaning

A synonym is a word or phrase that means exactly or nearly the same as another word or phrase in the same language. Words that are synonyms are said to be synonymous, and the state of being a synonym is called synonymy. For example, the words begin, start, commence, and initiate are all synonyms of one another. Words are typically synonymous in one particular sense: for example, long and extended in the context long time or extended time are synonymous, but long cannot be used in the phrase extended family. Synonyms with exactly the same meaning share a seme or denotational sememe, whereas those with inexactly similar meanings share a broader denotational or connotational sememe and thus overlap within a semantic field. The former are sometimes called cognitive synonyms and the latter, near-synonyms, plesionyms or poecilonyms.

Glossary Alphabetical list of terms relevant to a certain field of study or action

A glossary, also known as a vocabulary or clavis, is an alphabetical list of terms in a particular domain of knowledge with the definitions for those terms. Traditionally, a glossary appears at the end of a book and includes terms within that book that are either newly introduced, uncommon, or specialized. While glossaries are most commonly associated with non-fiction books, in some cases, fiction novels may come with a glossary for unfamiliar terms.

Wiktionary Free online dictionary that anyone can edit

Wiktionary is a multilingual, web-based project to create a free content dictionary of terms in all natural languages and a number of artificial languages. These entries may contain definitions, pronunciation guides, inflections, usage examples, related terms, images for illustration, among other features. It is collaboratively edited via a wiki. Its name is a portmanteau of the words wiki and dictionary. It is available in 171 languages and in Simple English. Like its sister project Wikipedia, Wiktionary is run by the Wikimedia Foundation, and is written collaboratively by volunteers, dubbed "Wiktionarians". Its wiki software, MediaWiki, allows almost anyone with access to the website to create and edit entries.

Gloss (annotation) Brief marginal notation of the meaning of a word or wording in a text

A gloss is a brief notation, especially a marginal one or an interlinear one, of the meaning of a word or wording in a text. It may be in the language of the text, or in the reader's language if that is different.

Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, authorised terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction.

The semantic spectrum is a series of increasingly precise or rather semantically expressive definitions for data elements in knowledge representations, especially for machine use.

This article lists some attested vocabulary of Vulgar Latin, which developed from standard Latin into all the various Romance languages. Apart from attested, typically formal vocabulary in Standard Latin, the distinctive vocabulary of Vulgar Latin came from several sources. Much of the vocabulary came through the influence of substrate languages associated with peoples either conquered by, trading with or invading the Roman Empire; many of whom came to speak forms of Latin. Other vocabulary came from novel innovation; grammaticalized and productive lexical processes and innovations.

A conceptual dictionary is a dictionary that groups words by concept or semantic relation instead of arranging them in alphabetical order. Examples of conceptual dictionaries are picture dictionaries, thesauri, and visual dictionaries. Onelook.com and Diccionario Ideológico de la Lengua Española are specific online and print examples.

The New World of English Words, or, a General Dictionary is a dictionary compiled by Edward Phillips and first published in London in 1658. It was the first folio English dictionary.

ISO 25964 is the international standard for thesauri, published in two parts as follows:

ISO 25964 Information and documentation - Thesauri and interoperability with other vocabularies
Part 1: Thesauri for information retrieval [published August 2011]
Part 2: Interoperability with other vocabularies [published March 2013]

Knowledge organization system (KOS), concept system, or concept scheme is a generic term used in knowledge organization for authority files, classification schemes, thesauri, topic maps, ontologies, and similar schemes.

BabelNet multilingual semantic network and encyclopedic dictionary

BabelNet is a multilingual lexicalized semantic network and ontology developed at the Sapienza University of Rome, in the Linguistic Computing Laboratory of the Department of Computer Science. BabelNet was automatically created by linking Wikipedia to the most popular computational lexicon of the English language, WordNet. The integration is done using an automatic mapping and by filling in lexical gaps in resource-poor languages by using statistical machine translation. The result is an "encyclopedic dictionary" that provides concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations. Additional lexicalizations and definitions are added by linking to free-license wordnets, OmegaWiki, the English Wiktionary, Wikidata, FrameNet, VerbNet and others. Similarly to WordNet, BabelNet groups words in different languages into sets of synonyms, called Babel synsets. For each Babel synset, BabelNet provides short definitions in many languages harvested from both WordNet and Wikipedia.

In the context of information retrieval, a thesaurus is a form of controlled vocabulary that seeks to dictate semantic manifestations of metadata in the indexing of content objects. A thesaurus serves to minimise semantic ambiguity by ensuring uniformity and consistency in the storage and retrieval of the manifestations of content objects. ANSI/NISO Z39.19-2005 defines a content object as "any item that is to be described for inclusion in an information retrieval system, website, or other source of information". The thesaurus aids the assignment of preferred terms to convey semantic metadata associated with the content object.
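The assignment of preferred terms can be sketched as a simple lookup. The example below is a minimal illustration and is not drawn from ANSI/NISO Z39.19. The `USE` table, its entries and the `index_terms` function are hypothetical.

```python
# Hypothetical entry vocabulary: non-preferred term -> preferred term.
USE = {
    "automobiles": "cars",
    "motor cars": "cars",
    "movies": "films",
    "motion pictures": "films",
}
PREFERRED = set(USE.values())


def index_terms(raw_terms):
    """Map indexer-supplied terms to preferred terms; collect unknown ones.

    Returns (accepted, rejected): accepted holds each preferred term once,
    in first-seen order; rejected holds terms outside the vocabulary.
    """
    accepted, rejected = [], []
    for term in raw_terms:
        key = term.strip().lower()
        preferred = key if key in PREFERRED else USE.get(key)
        if preferred is None:
            rejected.append(term)
        elif preferred not in accepted:
            accepted.append(preferred)
    return accepted, rejected
```

Because "Automobiles" and "cars" both resolve to the same preferred term, a content object indexed under either is stored and retrieved under one consistent heading, and out-of-vocabulary terms are flagged rather than silently stored.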

A historical dictionary or dictionary on historical principles is a type of dictionary which deals not only with the present-day meanings of words but also the historical development of their forms and meanings. It may also describe the vocabulary of an earlier stage of a language's development without covering present-day usage at all. A historical dictionary is primarily of interest to scholars of language, but may also be used as a general dictionary or by those who are casually interested in understanding a word's development over time.

References

  1. Roget, Peter. 1852. Thesaurus of English Words and Phrases.
  2. Miles, Alistair; Bechhofer, Sean (2009). "SKOS simple knowledge organization system reference". W3C recommendation. 18: W3C.
  3. "thesaurus". Online Etymology Dictionary.
  4. R. S. P. Beekes, Etymological Dictionary of Greek, Brill, 2009, p. 548.
  5. "Introduction - Oxford Scholarship". oxfordscholarship.com. doi:10.1093/acprof:oso/9780199254729.001.0001/acprof-9780199254729-chapter-1. Retrieved 26 March 2018.
  6. Lloyd 1982, p. xix [full citation needed]
  7. Yarowsky, David. "Word-sense disambiguation using statistical models of Roget's categories trained on large corpora." Proceedings of the 14th conference on Computational linguistics-Volume 2. Association for Computational Linguistics, 1992.
  8. Siddharthan, Advaith. "An architecture for a text simplification system." Language Engineering Conference, 2002. Proceedings. IEEE, 2002.