ELAN software

ELAN
Developer(s): The Language Archive
Initial release: 2000
Stable release: 6.1 / March 12, 2021 [1]
Written in: Java
Operating system: Windows, macOS, Linux
Platform: IA-32, x86-64
Available in: English
Type: Language documentation, qualitative data analysis
License: GPLv3
Website: archive.mpi.nl/tla/elan

ELAN is a professional software tool for manually and semi-automatically annotating and transcribing audio and video recordings. [2] It has a tier-based data model that supports multi-level, multi-participant annotation of time-based media. It is used in humanities and social sciences research, for example in language documentation and in sign language and gesture research, both for documentation and for qualitative and quantitative analysis. [3] It is distributed as free and open-source software under the GNU General Public License, version 3.
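The tier-based model can be illustrated with a small Python sketch. It is not ELAN's internal implementation (which is written in Java): the names `Annotation`, `Tier`, and `overlapping` are hypothetical, and the sketch assumes that each tier belongs to one participant and holds time-aligned annotations with millisecond boundaries. Querying every tier for annotations that overlap a time window shows how several levels (speech, gesture) and several participants can be annotated against the same recording.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    start_ms: int  # start time in milliseconds (assumed unit)
    end_ms: int    # end time in milliseconds
    value: str     # transcription, gloss, gesture label, etc.


@dataclass
class Tier:
    # Hypothetical tier: one named layer of annotation for one participant.
    tier_id: str
    participant: str
    annotations: list[Annotation] = field(default_factory=list)

    def add(self, start_ms: int, end_ms: int, value: str) -> None:
        """Add a time-aligned annotation, keeping the tier sorted by start time."""
        if end_ms <= start_ms:
            raise ValueError("annotation must have a positive duration")
        self.annotations.append(Annotation(start_ms, end_ms, value))
        self.annotations.sort(key=lambda a: a.start_ms)


def overlapping(tiers: list[Tier], start_ms: int, end_ms: int) -> list[tuple[str, str, str]]:
    """Return (tier_id, participant, value) for every annotation on any tier
    that overlaps the half-open interval [start_ms, end_ms)."""
    hits = []
    for tier in tiers:
        for ann in tier.annotations:
            if ann.start_ms < end_ms and ann.end_ms > start_ms:
                hits.append((tier.tier_id, tier.participant, ann.value))
    return hits
```

For example, with a speech tier and a gesture tier for participant A and a speech tier for participant B, asking for the window from 500 ms to 1000 ms returns only A's utterance and the gesture produced during it. Keeping tiers separate rather than merging everything into one timeline is what lets independent layers overlap freely in time.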


ELAN is well-established, professional-grade software and is widely used in academia. [4] [5] [6] It has been well received in a range of academic disciplines, including psychology, medicine, psychiatry, education, and behavioral studies, on topics such as human-computer interaction, [7] sign language and conversation analysis, [8] [9] [10] group interactions, [11] music therapy, [12] bilingualism and child language acquisition, [13] non-verbal communication and gesture analysis, [14] and animal behavior. [15]

Several third-party tools have been developed to enrich and analyse ELAN data and corpora. [16] [17] [18] [19]
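ELAN saves annotations as XML in .eaf (ELAN Annotation Format) files, so one way an external tool might read ELAN data is with an ordinary XML parser. The Python sketch below uses only the standard library and assumes a heavily simplified, hand-written fragment of that format: time slots listed in a `TIME_ORDER` element and time-aligned `ALIGNABLE_ANNOTATION` elements inside `TIER` elements. Real EAF files contain more (a header with media descriptors, linguistic types, and reference annotations that point to parent annotations instead of time slots), all of which this sketch ignores; the function name `read_aligned_annotations` is hypothetical and is not taken from any of the tools cited above.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified EAF-like fragment for illustration only; a real
# .eaf file also has a HEADER, LINGUISTIC_TYPE elements, and more.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="speech-A" PARTICIPANT="A">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello there</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="speech-B" PARTICIPANT="B">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>hi</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_aligned_annotations(eaf_xml: str) -> list[tuple[str, str, int, int, str]]:
    """Return (tier_id, participant, start_ms, end_ms, value) rows for every
    time-aligned annotation whose two time slots both carry a time value."""
    root = ET.fromstring(eaf_xml)
    # Map time-slot IDs to milliseconds; slots without TIME_VALUE are skipped.
    times = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = times.get(ann.get("TIME_SLOT_REF1"))
            end = times.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # unaligned slot: no absolute time to report
            value = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier.get("TIER_ID"), tier.get("PARTICIPANT", ""), start, end, value))
    return rows
```

Calling `read_aligned_annotations(SAMPLE_EAF)` yields one flat row per annotation, a shape that is easy to hand to a spreadsheet or statistics package for further analysis.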

Features

Its features include:

History

ELAN is developed by the Max Planck Institute for Psycholinguistics in Nijmegen. The first version was released around 2000 under the name EAT (Eudico Annotation Tool), and it was renamed ELAN in 2002. Since then, two to three new versions have been released each year. It is written in Java, with interfaces to platform-native media frameworks written in C, C++, and Objective-C.

See also


References

  1. "Release notes - The Language Archive". archive.mpi.nl. Retrieved 2021-04-04.
  2. Twilhaar, Jan Nijen; Bogaerde, Beppie van den (2016). Concise Lexicon for Sign Linguistics. John Benjamins Publishing Company. p. 63.
  3. Wittenburg, Peter; Brugman, Hennie; Russel, Albert; Klassmann, Alex; Sloetjes, Han (2006). ELAN: a Professional Framework for Multimodality Research. Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06).
  4. Kong, Anthony Pak Hin (2016). "Multi-linear Transcription and Analysis of Oral Discourse". In Kong, Anthony Pak Hin (ed.). Analysis of Neurogenic Disordered Discourse Production: From Theory to Practice. Routledge. doi:10.4324/9781315639376. ISBN 978-1-315-63937-6.
  5. Orfanidou, Eleni; Woll, Bencie; Morgan, Gary (2015). Research Methods in Sign Language Studies: A Practical Guide. John Wiley & Sons. p. 274.
  6. Ruhi, Şükriye; Haugh, Michael; Schmidt, Thomas (2014). Best Practices for Spoken Corpora in Linguistic Research. Cambridge Scholars Publishing. p. 188.
  7. Giuliani, Manuel; Mirnig, Nicole; Stollnberger, Gerald; Stadler, Susanne; Buchner, Roland; Tscheligi, Manfred (8 July 2015). "Systematic analysis of video data from different human–robot interaction studies: a categorization of social signals during error situations". Frontiers in Psychology. 6: 931. doi:10.3389/fpsyg.2015.00931. PMC 4495306. PMID 26217266.
  8. Zahedi, M.; Dreuw, P.; Rybach, D.; Bungeroth, J.; Ney, H. (2006). Continuous Sign Language Recognition – Approaches from Speech Recognition and Available Data Resources. CiteSeerX 10.1.1.413.2233.
  9. Crasborn, O. A.; Bank, R. An annotation scheme for the linguistic study of mouth actions in sign languages. hdl:2066/132960.
  10. Manrique, Elizabeth; Enfield, N. J. (15 September 2015). "Suspending the next turn as a form of repair initiation: evidence from Argentine Sign Language". Frontiers in Psychology. 6: 1326. doi:10.3389/fpsyg.2015.01326. PMC 4569752. PMID 26441710.
  11. Orfanos, Stavros; Akther, Syeda Ferhana; Abdul-Basit, Muhammad; McCabe, Rosemarie; Priebe, Stefan (10 February 2017). "Using video-annotation software to identify interactions in group therapies for schizophrenia: assessing reliability and associations with outcomes". BMC Psychiatry. 17 (1): 65. doi:10.1186/s12888-017-1217-2. PMC 5301334. PMID 28183293.
  12. Spiro, Neta; Himberg, Tommi (5 May 2016). "Analysing change in music therapy interactions of children with communication difficulties". Philosophical Transactions of the Royal Society B: Biological Sciences. 371 (1693): 20150374. doi:10.1098/rstb.2015.0374. PMC 4843612. PMID 27069051.
  13. Chen Pichler, Deborah; Hochgesang, Julie A.; Lillo-Martin, Diane; Müller de Quadros, Ronice (2010). "Conventions for sign and speech transcription of child bimodal bilingual corpora in ELAN". Acquiring Sign Language as a First Language. 1 (1): 11–40. doi:10.1075/lia.1.1.03che. PMC 3102315. PMID 21625371.
  14. Kong, Anthony Pak-Hin; Law, Sam-Po; Kwan, Connie Ching-Yin; Lai, Christy; Lam, Vivian (2015). "A Coding System with Independent Annotations of Gesture Forms and Functions during Verbal Communication: Development of a Database of Speech and GEsture (DoSaGE)". Journal of Nonverbal Behavior. 39 (1): 93–111. doi:10.1007/s10919-014-0200-6. PMC 4319117. PMID 25667563.
  15. Ravignani, Andrea; Olivera, Vicente; Gingras, Bruno; Hofer, Riccardo; Hernández, Carlos; Sonnweber, Ruth-Sophie; Fitch, W. (31 July 2013). "Primate Drum Kit: A System for Studying Acoustic Pattern Production by Non-Human Primates Using Acceleration and Strain Sensors". Sensors. 13 (8): 9790–9820. Bibcode:2013Senso..13.9790R. doi:10.3390/s130809790. PMC 3812580. PMID 23912427.
  16. Andersson, Richard; Sandgren, Olof (23 February 2016). "ELAN Analysis Companion (EAC): A Software Tool for Time-course Analysis of ELAN-annotated Data". Journal of Eye Movement Research. 9 (3). doi:10.16910/jemr.9.3.1.
  17. Holle, Henning; Rein, Robert (9 August 2014). "EasyDIAg: A tool for easy determination of interrater agreement". Behavior Research Methods. 47 (3): 837–847. doi:10.3758/s13428-014-0506-7. PMID 25106813. S2CID 43570421.
  18. Kousidis, S.; Pfeiffer, T.; Schlangen, D. (2013). MINT.tools: Tools and adaptors supporting acquisition, annotation and analysis of multimodal corpora. Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech. pp. 2649–2653.
  19. Berez, A.; Cox, C. (2009). CuPED (Customizable Presentation of ELAN Documents). http://sweet.artsrn.ualberta.ca/cdcox/cuped/ (accessed 2017-03-21).
