Concordance (publishing)


A concordance is an alphabetical list of the principal words used in a book or body of work, listing every instance of each word with its immediate context. Historically, concordances have been compiled only for works of special importance, such as the Vedas, [1] Bible, Qur'an or the works of Shakespeare, James Joyce or classical Latin and Greek authors, [2] because of the time, difficulty, and expense involved in creating a concordance in the pre-computer era.


Mordecai Nathan's Hebrew-Latin Concordance of the Bible

A concordance is more than an index: it includes additional material such as commentary, definitions and topical cross-indexing, which makes producing one a labor-intensive process even when assisted by computers.

In the pre-computer era, search technology was unavailable, and a concordance offered readers of long works such as the Bible something comparable to search results for every word that they would have been likely to search for. Today, the ability to combine the results of queries concerning multiple terms (such as searching for words near other words) has reduced interest in concordance publishing. In addition, mathematical techniques such as latent semantic indexing have been proposed as a means of automatically identifying linguistic information based on word context.
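The proximity query mentioned above (searching for words near other words) can be sketched in a few lines of Python. This is only an illustration: the function name `near`, the whitespace tokenization and the default distance of five tokens are assumptions made for the example, not the behaviour of any particular search engine.

```python
def near(tokens, first, second, max_distance=5):
    """Return (i, j) index pairs where `first` occurs at i and `second` at j,
    no more than `max_distance` tokens apart (case-insensitive)."""
    lowered = [t.lower() for t in tokens]
    first_pos = [i for i, t in enumerate(lowered) if t == first.lower()]
    second_pos = [j for j, t in enumerate(lowered) if t == second.lower()]
    return [
        (i, j)
        for i in first_pos
        for j in second_pos
        if i != j and abs(i - j) <= max_distance
    ]


tokens = "in the beginning God created the heaven and the earth".split()
hits = near(tokens, "heaven", "earth", max_distance=3)
```

A printed concordance answers only the single-word half of such a query; the reader has to compare two entries by hand to find passages where both words occur together.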

A bilingual concordance is a concordance based on aligned parallel text.

A topical concordance is a list of subjects that a book covers (usually the Bible), with the immediate context of the coverage of those subjects. Unlike a traditional concordance, the indexed word does not have to appear in the verse. The best-known topical concordance is Nave's Topical Bible.

The first Bible concordance was compiled for the Vulgate Bible by Hugh of St Cher (d. 1262), who employed 500 friars to assist him. In 1448, Rabbi Mordecai Nathan completed a concordance to the Hebrew Bible, which took him ten years. A concordance to the Greek New Testament was published in 1546 by Sixt Birck, and one to the Septuagint was produced by Conrad Kircher in 1607. The first concordance to the English Bible was published in 1550 by Mr Marbeck. According to Cruden, it did not employ the verse numbers devised by Robert Stephens in 1545, but "the pretty large concordance" of Mr Cotton did. Then followed Cruden's Concordance and Strong's Concordance.

Use in linguistics

Concordances are frequently used in linguistics when studying a text.

Concordancing techniques are widely used in national text corpora such as the American National Corpus (ANC), the British National Corpus (BNC), and the Corpus of Contemporary American English (COCA), which are available online. Stand-alone applications that employ concordancing techniques are known as concordancers [3] or, in more advanced forms, corpus managers. Some of them have integrated part-of-speech taggers (POS taggers) and enable users to create their own POS-annotated corpora and to conduct the various types of searches used in corpus linguistics. [4]

Inversion

The reconstruction of the text of some of the Dead Sea Scrolls involved a concordance.

Access to some of the scrolls was governed by a "secrecy rule" that allowed only the original International Team or their designates to view the original materials. After the death of Roland de Vaux in 1971, his successors repeatedly refused to allow even the publication of photographs for other scholars. In 1991 Martin Abegg circumvented this restriction by using a computer to "invert" a concordance of the missing documents. The concordance had been made in the 1950s and had come into the hands of scholars outside the International Team, and inverting it yielded an approximate reconstruction of the original text of 17 of the documents. [5] [6] This was soon followed by the release of the original text of the scrolls.
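The idea of inverting a concordance can be illustrated with a toy example. The sketch below assumes a simplified concordance that records, for each word, the numeric positions at which it occurs. The real concordance was organised differently, so the function names and the data layout here are hypothetical and say nothing about the method actually used.

```python
def build_concordance(tokens):
    """Map each word to the list of positions where it occurs, in alphabetical order."""
    concordance = {}
    for position, word in enumerate(tokens):
        concordance.setdefault(word, []).append(position)
    return dict(sorted(concordance.items()))


def invert_concordance(concordance, gap="[...]"):
    """Rebuild the token sequence by putting every word back at its recorded positions.

    Positions that no entry accounts for are filled with `gap`.
    """
    placed = {}
    for word, positions in concordance.items():
        for position in positions:
            placed[position] = word
    length = max(placed) + 1 if placed else 0
    return [placed.get(i, gap) for i in range(length)]


original = "the sons of light and the sons of darkness".split()
concordance = build_concordance(original)
reconstructed = invert_concordance(concordance)
```

If an entry is missing from the concordance (for example, a common word that the compilers left out), the reconstruction shows a gap at those positions, which is one reason such a reconstruction can only be approximate.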


Related Research Articles

Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field, the natural context ("realia") of that language, with minimal experimental interference. The large collections of text allow linguists to run quantitative analyses on linguistic concepts that are otherwise harder to quantify.

Key Word in Context: Common format for concordance lines

Key Word In Context (KWIC) is the most common format for concordance lines. The term KWIC was coined by Hans Peter Luhn. The system was based on a concept called keyword in titles, which was first proposed for Manchester libraries in 1864 by Andrea Crestadoro.
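A minimal sketch of how KWIC lines might be generated is given below. It assumes a simple regular-expression tokenizer and a context window measured in words. The names `kwic` and `format_kwic` are illustrative and are not taken from any existing concordancer.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, keyword, right) tuples for every occurrence of `keyword`,
    with up to `width` words of context on each side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def format_kwic(lines):
    """Right-align the left contexts so that the keywords line up in one column."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return [f"{left.rjust(pad)}  {kw}  {right}" for left, kw, right in lines]


sample = (
    "In the beginning God created the heaven and the earth. "
    "And the earth was without form."
)
rows = kwic(sample, "earth", width=2)
for line in format_kwic(rows):
    print(line)
```

Aligning the keyword in a central column is what lets a reader scan the surrounding words for recurring patterns.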

In linguistics and natural language processing, a corpus or text corpus is a dataset, consisting of natively digital and older, digitalized, language resources, either annotated or unannotated.

Hapax legomenon: Word that only appears once in a given text or record

In corpus linguistics, a hapax legomenon is a word or an expression that occurs only once within a context: either in the written record of an entire language, in the works of an author, or in a single text. The term is sometimes incorrectly used to describe a word that occurs in just one of an author's works but more than once in that particular work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "being said once".
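For a single text, the words that occur exactly once can be found by counting word frequencies. The following is a rough sketch. The tokenizer (lower-cased runs of letters and apostrophes) is an assumption, and a real study would also have to decide how to treat inflected forms and spelling variants.

```python
import re
from collections import Counter


def hapax_legomena(text):
    """Return, in alphabetical order, the words that occur exactly once in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n == 1)


hapaxes = hapax_legomena("the cat saw the dog and the dog saw a bird")
```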

Parallel text: Text placed alongside its translation or translations

A parallel text is a text placed alongside its translation or translations. Parallel text alignment is the identification of the corresponding sentences in both halves of the parallel text. The Loeb Classical Library and the Clay Sanskrit Library are two examples of dual-language series of texts. Reference Bibles may contain the original languages and a translation, or several translations by themselves, for ease of comparison and study; Origen's Hexapla placed six versions of the Old Testament side by side. A famous example is the Rosetta Stone, whose discovery allowed the Ancient Egyptian language to begin being deciphered.

Stop words are the words in a stop list which are filtered out before or after processing of natural language data (text) because they are insignificant. There is no single universal list of stop words used by all natural language processing tools, nor any agreed upon rules for identifying stop words, and indeed not all tools even use such a list. Therefore, any group of words can be chosen as the stop words for a given purpose. The "general trend in [information retrieval] systems over time has been from standard use of quite large stop lists to very small stop lists to no stop list whatsoever".
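Stop-word filtering itself is a simple operation, as the sketch below suggests. The stop list shown is an arbitrary illustration chosen for this example. As noted above, there is no standard list, so both `EXAMPLE_STOP_LIST` and the function name are assumptions.

```python
# Arbitrary illustrative stop list; real tools use their own lists or none at all.
EXAMPLE_STOP_LIST = {"the", "of", "and", "a", "in", "to"}


def filter_stop_words(tokens, stop_words):
    """Return `tokens` without any word found in `stop_words` (case-insensitive)."""
    stop = {word.lower() for word in stop_words}
    return [token for token in tokens if token.lower() not in stop]


tokens = "The sons of light and the sons of darkness".split()
content_words = filter_stop_words(tokens, EXAMPLE_STOP_LIST)
```

Passing an empty stop list leaves the tokens unchanged, which corresponds to the "no stop list whatsoever" end of the trend described above.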

Strong's Concordance: Bible concordance, constructed under the direction of James Strong

The Exhaustive Concordance of the Bible, generally known as Strong's Concordance, is a Bible concordance, an index of every word in the King James Version (KJV), constructed under the direction of James Strong. Strong first published his Concordance in 1890, while professor of exegetical theology at Drew Theological Seminary.

Brown Corpus: Data set of American English in 1961

The Brown University Standard Corpus of Present-Day American English is an electronic collection of text samples of American English, the first major structured corpus of varied genres. This corpus first set the bar for the scientific study of the frequency and distribution of word categories in everyday language use. Compiled by Henry Kučera and W. Nelson Francis at Brown University, in Rhode Island, it is a general language corpus containing 500 samples of English, totaling roughly one million words, compiled from works published in the United States in 1961.

In linguistics, the term lexis designates the complete set of all possible words in a language, or a particular subset of words that are grouped by some specific linguistic criteria. For example, the general term English lexis refers to all words of the English language, while the more specific term English religious lexis refers to a particular subset within English lexis, encompassing only words that are semantically related to the religious sphere of life.

The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time. It is used in corpus linguistics for analysis of corpora.

Data-driven learning (DDL) is an approach to foreign language learning. Whereas most language learning is guided by teachers and textbooks, data-driven learning treats language as data and students as researchers undertaking guided discovery tasks. Underpinning this pedagogical approach is the data-information-knowledge paradigm. It is informed by a pattern-based approach to grammar and vocabulary, and a lexicogrammatical approach to language in general. Thus the basic task in DDL is to identify patterns at all levels of language. From their findings, foreign language students can see how an aspect of language is typically used, which in turn informs how they can use it in their own speaking and writing. Learning how to frame language questions and use the resources to obtain data and interpret it is fundamental to learner autonomy. When students arrive at their own conclusions through such procedures, they use their higher order thinking skills and are creating knowledge.

Young's Analytical Concordance to the Bible is a Bible concordance to the King James Version compiled by Robert Young. First published in 1879, it contains "about 311,000 references subdivided under the Hebrew and Greek originals with the literal meaning and pronunciation of each."

The International Corpus of English (ICE) is a set of corpora representing varieties of English from around the world. Over twenty countries or groups of countries where English is the first language or an official second language are included.

The Corpus of Contemporary American English (COCA) is a one-billion-word corpus of contemporary American English. It was created by Mark Davies, retired professor of corpus linguistics at Brigham Young University (BYU).

Bible concordance: Verbal index to the Bible

A Bible concordance is a concordance, or verbal index, to the Bible. A simple form lists Biblical words alphabetically, with indications to enable the inquirer to find the passages of the Bible where the words occur.

WordSmith Tools is a software package primarily for linguists, in particular for work in the field of corpus linguistics. It is a collection of modules for searching patterns in a language. The software handles many languages.

Sketch Engine: Corpus manager and text analysis software

Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing CZ s.r.o. since 2003. Its purpose is to enable people studying language behaviour to search large text collections according to complex and linguistically motivated queries. Sketch Engine gained its name after one of the key features, word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour. Currently, it supports and provides corpora in 90+ languages.

A corpus manager is a tool for multilingual corpus analysis, which allows effective searching in corpora.

The Logos Complete Study Bible is a study Bible published in 1972 by Logos International. It is based upon The Cross-Reference Bible, published in 1910.

References

  1. Bloomfield, Maurice (1990). A Vedic Concordance. Motilal Banarsidass Publ. ISBN 81-208-0654-9.
  2. Wisbey, Roy (April 1962). "Concordance Making by Electronic Computer: Some Experiences with the Wiener Genesis". The Modern Language Review. Modern Humanities Research Association. 57 (2): 161–172. doi:10.2307/3720960. JSTOR 3720960.
  3. "Introduction to WordSmith". lexically.net. Retrieved 2021-01-20.
  4. "Linguistic Toolbox". Yatsko.zohosites.com. Retrieved 2019-08-26.
  5. Hawrysch, George (2002-08-04). "Dr. George Hawrysch's speech on concordance book launch". The Ukrainian Weekly, No. 31, Vol. LXX. Ukrainian National Association. Archived from the original on 2008-12-04. Retrieved 2008-06-19.
  6. Jillette, Penn. "You May Already be a "Computer Expert"". Archived from the original on 2008-03-03. Retrieved 2008-06-14.