Transcription (linguistics)


Transcription in the linguistic sense is the systematic representation of spoken language in written form. The source can be either utterances (speech or sign language) or preexisting text in another writing system.


Transcription should not be confused with translation, which means representing the meaning of a source-language text in a target language (e.g. rendering Los Angeles as The Angels), or with transliteration, which means representing the spelling of a text from one script in another.

In the academic discipline of linguistics, transcription is an essential part of the methodologies of (among others) phonetics, conversation analysis, dialectology, and sociolinguistics. It also plays an important role in several subfields of speech technology. Common examples of transcription outside academia are the proceedings of a court hearing such as a criminal trial (by a court reporter) or a physician's recorded voice notes (medical transcription). This article focuses on transcription in linguistics.

Phonetic vs. orthographic transcription

Broadly speaking, there are two possible approaches to linguistic transcription. Phonetic transcription focuses on phonetic and phonological properties of spoken language. Systems for phonetic transcription thus furnish rules for mapping individual sounds or phones to written symbols. Systems for orthographic transcription, by contrast, consist of rules for mapping spoken words onto written forms as prescribed by the orthography of a given language. Phonetic transcription operates with specially defined character sets, usually the International Phonetic Alphabet.

Which type of transcription is chosen depends mostly on the research interests pursued. Since phonetic transcription strictly foregrounds the phonetic nature of language, it is most useful for phonetic or phonological analyses. Orthographic transcription, on the other hand, has a morphological and a lexical component alongside the phonetic component (which aspect is represented to which degree depends on the language and orthography in question). It is thus more convenient wherever meaning-related aspects of spoken language are investigated. Phonetic transcription is undoubtedly more systematic in a scientific sense, but it is also harder to learn, more time-consuming to carry out, and less widely applicable than orthographic transcription.

As theory

Mapping spoken language onto written symbols is not as straightforward a process as it may seem at first glance. Written language is an idealization, made up of a limited set of clearly distinct and discrete symbols. Spoken language, on the other hand, is a continuous (as opposed to discrete) phenomenon, made up of a potentially unlimited number of components. There is no predetermined system for distinguishing and classifying these components and, consequently, no preset way of mapping these components onto written symbols.

The literature is relatively consistent in pointing out that transcription practices are not neutral. There is not, and cannot be, a neutral transcription system: knowledge of social culture enters directly into the making of a transcript and is captured in its texture (Baker, 2005).

Transcription systems

Transcription systems are sets of rules which define how spoken language is to be represented in written symbols. Most phonetic transcription systems are based on the International Phonetic Alphabet or, especially in speech technology, on its derivative SAMPA.
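As a rough illustration of what such a rule set can look like in practice, the following Python sketch maps a handful of SAMPA symbols for English to their IPA counterparts. The table is a small illustrative subset, not the full SAMPA inventory, and the names `SAMPA_TO_IPA` and `sampa_to_ipa` are hypothetical. Symbols that are not in the table (for example the lower-case letters that SAMPA shares with IPA) are passed through unchanged.

```python
# Illustrative subset of the English SAMPA-to-IPA correspondences.
# This is an assumption-laden sketch, not a complete SAMPA implementation.
SAMPA_TO_IPA = {
    "T": "θ", "D": "ð", "S": "ʃ", "Z": "ʒ", "N": "ŋ",   # consonants
    "I": "ɪ", "E": "ɛ", "{": "æ", "V": "ʌ", "U": "ʊ",   # short vowels
    "@": "ə",                                            # schwa
    ":": "ː", '"': "ˈ", "%": "ˌ",                        # length and stress marks
}


def sampa_to_ipa(sampa: str) -> str:
    """Convert a SAMPA string to IPA one character at a time.

    Characters without an entry in the table are kept as they are.
    """
    return "".join(SAMPA_TO_IPA.get(ch, ch) for ch in sampa)


print(sampa_to_ipa("TIN"))     # θɪŋ  ("thing")
print(sampa_to_ipa("dZVdZ"))   # dʒʌdʒ ("judge")
```

A character-by-character lookup is enough for this subset because each symbol in the table is a single ASCII character; a notation with multi-character symbols would need a tokenizer in front of the lookup.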

Examples of orthographic transcription systems (all from the field of conversation analysis or related fields) are:

CA (conversation analysis)

Arguably the first system of its kind, originally sketched in (Sacks et al. 1978) and later adapted for use in computer-readable corpora as CA-CHAT by (MacWhinney 2000). The field of conversation analysis itself includes a number of distinct approaches to transcription and sets of transcription conventions, among them Jefferson Notation.

To analyze conversation, recorded data is typically transcribed into a written form that is acceptable to analysts. There are two common approaches. The first, called narrow transcription, captures the details of conversational interaction, such as which particular words are stressed, which words are spoken with increased loudness, the points at which turns-at-talk overlap, and how particular words are articulated. If such detail is less important, perhaps because the analyst is more concerned with the overall structure of the conversation or the relative distribution of turns-at-talk among the participants, then a second type of transcription, known as broad transcription, may be sufficient (Williamson, 2009).
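The relationship between the two approaches can be sketched in code: a broad transcript is roughly what remains when the interactional detail is stripped from a narrow one. The Python sketch below assumes a few Jefferson-style conventions that are not spelled out in this article (square brackets for overlap, parenthesized numbers for timed pauses, `(.)` for a micropause, colons for stretched sounds, `=` for latching). The function name `narrow_to_broad` and the sample line are hypothetical, and the input is assumed to be utterance text without a speaker label.

```python
import re

# Timed pauses such as (0.5), (.5) or (2), and the micropause (.)
# (assumed Jefferson-style conventions).
PAUSE = re.compile(r"\((?:\d*\.?\d+|\.)\)")

# Overlap brackets, sound-stretch colons and latching marks
# (again an assumed, partial inventory).
MARKUP = re.compile(r"[\[\]:=]")


def narrow_to_broad(line: str) -> str:
    """Strip a small subset of narrow-transcription markup from one line."""
    text = PAUSE.sub(" ", line)       # drop pauses, keep a word boundary
    text = MARKUP.sub("", text)       # drop in-word and turn-level symbols
    return " ".join(text.split())     # normalize whitespace


print(narrow_to_broad("[we:::ll] (0.5) I don't (.) =know"))
# well I don't know
```

The sketch goes in one direction only: the detail removed here (pause lengths, overlap, stretching) cannot be recovered from the broad form, which is why the choice between the two approaches has to be made with the research question in mind.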

Jefferson Transcription System

The Jefferson Transcription System is a set of symbols, developed by Gail Jefferson, which is used for transcribing talk. Having had some previous experience in transcribing when she was hired in 1963 as a clerk typist at the UCLA Department of Public Health to transcribe sensitivity-training sessions for prison guards, Jefferson began transcribing some of the recordings that served as the materials out of which Harvey Sacks' earliest lectures were developed. Over four decades, for the majority of which she held no university position and was unsalaried, Jefferson's research into talk-in-interaction has set the standard for what became known as conversation analysis (CA). Her work has greatly influenced the sociological study of interaction, but also disciplines beyond, especially linguistics, communication, and anthropology. [1] This system is employed universally by those working from the CA perspective and is regarded as having become a near-globalized set of instructions for transcription. [2]

DT (discourse transcription)

A system described in (DuBois et al. 1992), used for transcription of the Santa Barbara Corpus of Spoken American English (SBCSAE), later developed further into DT2.

GAT (Gesprächsanalytisches Transkriptionssystem – Conversation analytic transcription system)

A system described in (Selting et al. 1998), later developed further into GAT2 (Selting et al. 2009), widely used in German-speaking countries for prosodically oriented conversation analysis and interactional linguistics. [3] [4]

HIAT (Halbinterpretative Arbeitstranskriptionen – Semiinterpretative working transcriptions)

Arguably the first system of its kind, originally described in (Ehlich and Rehbein 1976; see Ehlich 1992 for an English reference), adapted for use in computer-readable corpora in (Rehbein et al. 2004), and widely used in functional pragmatics. [5] [6] [7]


Software

Transcription was originally a process carried out manually, i.e. with pencil and paper, using an analogue sound recording stored on, e.g., a Compact Cassette. Nowadays, most transcription is done on computers. Recordings are usually digital audio files or video files, and transcriptions are electronic documents. Specialized computer software exists to assist the transcriber in efficiently creating a digital transcription from a digital recording.

Two types of transcription software can assist the process: software that facilitates manual transcription and software that automates it. With the former, the work is still done by a human transcriber who listens to a recording and types what is heard into a computer; this type of software is often a multimedia player with functions such as playback control and speed adjustment. With the latter, transcription is performed by a speech-to-text engine that converts audio or video files into electronic text. Some software also supports annotation. [8]



References

  1. "Gail Jefferson Obituary • 1938-2008 • Quotes from authorities, colleagues, friends". Archived from the original on 2015-02-09.
  2. Davidson, C. (2007). Independent writing in current approaches to writing instruction: What have we overlooked? English Teaching: Practice and Critique. Volume 6, Number 1.
  3. Selting, Margret / Auer, Peter / Barden, Birgit / Bergmann, Jörg / Couper-Kuhlen, Elizabeth / Günthner, Susanne / Meier, Christoph / Quasthoff, Uta / Schlobinski, Peter / Uhmann, Susanne (1998): Gesprächsanalytisches Transkriptionssystem (GAT). In: Linguistische Berichte 173, 91-122.
  4. Selting, M., Auer, P., Barth-Weingarten, D., Bergmann, J., Bergmann, P., Birkner, K., Couper-Kuhlen, E., Deppermann, A., Gilles, P., Günthner, S., Hartung, M., Kern, F., Mertzlufft, C., Meyer, C., Morek, M., Oberzaucher, F., Peters, J., Quasthoff, U., Schütte, W., Stukenbrock, A., Uhmann, S. (2009): Gesprächsanalytisches Transkriptionssystem 2 (GAT 2). In: Gesprächsforschung (10), 353-402.
  5. Ehlich, K. (1992). HIAT - a Transcription System for Discourse Data. In: Edwards, Jane / Lampert, Martin (eds.): Talking Data – Transcription and Coding in Discourse Research. Hillsdale: Erlbaum, 123-148.
  6. Ehlich, K. & Rehbein, J. (1976) Halbinterpretative Arbeitstranskriptionen (HIAT). In: Linguistische Berichte (45), 21-41.
  7. Rehbein, J.; Schmidt, T.; Meyer, B.; Watzke, F. & Herkenrath, A. (2004) Handbuch für das computergestützte Transkribieren nach HIAT. In: Arbeiten zur Mehrsprachigkeit, Folge B (56).
  8. Chen, Yu-Hua; Bruncak, Radovan (2019). "Transcribear – Introducing a secure online transcription and annotation tool". Digital Scholarship in the Humanities. 35 (2): 265–275. doi:10.1093/llc/fqz016.
