Voyant Tools

Developer(s): Stéfan Sinclair & Geoffrey Rockwell
Initial release: 2003
Stable release: 2.2
Repository: https://github.com/voyanttools/Voyant
Operating system: Cross-platform
Available in: 10 languages
Type: Text analysis, statistical analysis, data mining
Licence: Creative Commons Attribution 4.0 International (web application); GPLv3 (code)
Website: http://voyant-tools.org

Voyant Tools is an open-source, web-based application for performing text analysis. It supports the scholarly reading and interpretation of texts and corpora, particularly by scholars in the digital humanities, but also by students and the general public. It can be used to analyze texts found online or uploaded by users.[1] Voyant has a large, international user base: in October 2016 alone, Voyant's main server had 81,686 page views originating from 156 countries, invoking the tool 1,173,252 times.[2]

Voyant "was conceived to enhance reading through lightweight text analytics such as word frequency lists, frequency distribution plots, and KWIC displays." [3] Its interface is composed of panels which perform these varied analytical tasks. These panels can also be embedded in external web texts (e.g. a web article could include a Voyant panel that creates a word cloud from it). The book Hermeneutica: Computer-Assisted Interpretation in the Humanities demonstrates different approaches to text analysis using Voyant. [4]

History

Voyant Tools was developed by Stéfan Sinclair (McGill University) and Geoffrey Rockwell (University of Alberta) and continues to be updated. It grew out of earlier text analysis tools, including HyperPo, Taporware, and TACT. Contributors have included Andrew MacDonald, Cyril Briquet, Lisa Goddard, and Mark Turcato.[1]

Range of Uses

Researchers have used Voyant Tools to analyze texts in a wide range of contexts including literature,[5] language teaching,[6] healthcare,[7][8] and system architecture.[9] Describing approaches to studying the internet using web scraping, Black has noted that "the Voyant Tools project is an excellent source to learn about the kinds of data that humanists can extract from Internet sources because it already supports text extraction from webpages."[10]
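
The kind of webpage text extraction Black describes can be sketched in a few lines of Python. This is a generic illustration, not part of Voyant or of Black's study; it assumes the third-party requests and beautifulsoup4 packages, and the URL and the page_text helper are placeholders.

```python
# Minimal sketch of extracting analyzable plain text from a webpage,
# in the spirit of the web-scraping workflow Black describes.
# `requests` and `beautifulsoup4` are third-party packages; the URL
# below is a placeholder, not an endpoint documented by Voyant.
import requests
from bs4 import BeautifulSoup

def page_text(url):
    """Download a page and strip its markup, leaving plain text."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style elements, which contain no readable prose.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Collapse whitespace so the result is one analyzable string.
    return " ".join(soup.get_text().split())

if __name__ == "__main__":
    print(page_text("https://example.com")[:200])
```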

A number of international digital humanities projects are running Voyant on their own servers. These include the French Huma-Num project, the Italian CNR ILC, and the German DARIAH-DE project.[2]

Related Research Articles

Transcription in the linguistic sense is the systematic representation of spoken language in written form. The source can either be utterances or preexisting text in another writing system.

Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by devising patterns and trends by means such as statistical pattern learning. According to Hotho et al. (2005) we can distinguish between three different perspectives of text mining: information extraction, data mining, and a knowledge discovery in databases (KDD) process. Text mining usually involves the process of structuring the input text, deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.
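
To make one of the listed tasks concrete, here is a minimal text-categorization sketch. It assumes scikit-learn, which is not a tool named in this article, and the training documents and labels are invented for the example.

```python
# Minimal text-categorization sketch with scikit-learn (illustrative;
# the tiny training corpus below is invented for the example).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "the striker scored a late goal",        # sports
    "the team won the championship match",   # sports
    "parliament passed the budget bill",     # politics
    "the senate debated the new law",        # politics
]
labels = ["sports", "sports", "politics", "politics"]

# Turn documents into term-count vectors, then fit a naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)
print(model.predict(["the coach praised the goal"]))  # -> ['sports']
```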

<span class="mw-page-title-main">Content analysis</span> Research method for studying documents and communication artifacts

Content analysis is the study of documents and communication artifacts, which might be texts of various formats, pictures, audio or video. Social scientists use content analysis to examine patterns in communication in a replicable and systematic manner. One of the key advantages of using content analysis to analyse social phenomena is their non-invasive nature, in contrast to simulating social experiences or collecting survey answers.

<span class="mw-page-title-main">Analytics</span> Discovery, interpretation, and communication of meaningful patterns in data

Analytics is the systematic computational analysis of data or statistics. It is used for the discovery, interpretation, and communication of meaningful patterns in data. It also entails applying data patterns toward effective decision-making. It can be valuable in areas rich with recorded information; analytics relies on the simultaneous application of statistics, computer programming, and operations research to quantify performance.

<span class="mw-page-title-main">Data management</span> Disciplines related to managing data as a resource

Data management comprises all disciplines related to handling data as a valuable resource; it is the practice of managing an organization's data so that it can be analyzed for decision-making.

<span class="mw-page-title-main">Biocomplexity Institute of Virginia Tech</span> Research organization for computational biology and related fields

The Biocomplexity Institute of Virginia Tech is a research organization specializing in bioinformatics, computational biology, and systems biology. The institute has more than 250 personnel, including over 50 tenured and research faculty. Research at the institute involves collaboration in diverse disciplines such as mathematics, computer science, biology, plant pathology, biochemistry, systems biology, statistics, economics, synthetic biology and medicine. The institute develops -omic and bioinformatic tools and databases that can be applied to the study of human, animal and plant diseases as well as the discovery of new vaccine, drug and diagnostic targets.

<span class="mw-page-title-main">Lev Manovich</span>

Lev Manovich is an artist, an author and a theorist of digital culture. He is a Distinguished Professor at the Graduate Center of the City University of New York. Manovich played a key role in creating four new research fields: new media studies (1991-), software studies (2001-), cultural analytics (2007-) and AI aesthetics (2018-). Manovich's current research focuses on generative media, AI culture, digital art, and media theory.

<span class="mw-page-title-main">Business analyst</span> Person who analyses and documents a business

A business analyst (BA) is a person who processes, interprets and documents business processes, products, services and software through analysis of data. The role of a business analyst is to ensure that business efficiency increases through knowledge of both IT and business functions.

Web analytics is the measurement, collection, analysis, and reporting of web data to understand and optimize web usage. Web analytics is not just a process for measuring web traffic: it can also be used as a tool for business and market research and to assess and improve website effectiveness. Web analytics applications can also help companies measure the results of traditional print or broadcast advertising campaigns, for example by estimating how traffic to a website changes after a new advertising campaign is launched. Web analytics provides information about the number of visitors to a website and the number of page views, and can create user behavior profiles. It helps gauge traffic and popularity trends, which is useful for market research.

Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated in documents.

<span class="mw-page-title-main">Digital humanities</span> Area of scholarly activity

Digital humanities (DH) is an area of scholarly activity at the intersection of computing or digital technologies and the disciplines of the humanities. It includes the systematic use of digital resources in the humanities, as well as the analysis of their application. DH can be defined as new ways of doing scholarship that involve collaborative, transdisciplinary, and computationally engaged research, teaching, and publishing. It brings digital tools and methods to the study of the humanities with the recognition that the printed word is no longer the main medium for knowledge production and distribution.

TAPoR is a gateway that highlights tools and code snippets usable for textual criticism of all types. The project is housed at the University of Alberta, and is currently led by Geoffrey Rockwell, Stéfan Sinclair, Kirsten C. Uszkalo, and Milena Radzikowska. Users of the portal explore tools to use in their research, and can rate, review, and comment on tools, browse curated lists of recommended tools, and add tags to tools. Tool pages on TAPoR consist of a short description, authorial information, a screenshot of the tool, tags, suggested related tools, and user ratings and comments. Code snippet pages also contain an excerpt of code and a link to the full code's location online.

Applied Research in Patacriticism (ARP) was a digital humanities lab based at the University of Virginia, founded and run by Jerome McGann and Johanna Drucker. ARP's open-source tools include Juxta, IVANHOE, and Collex. Collex is the social software and faceted browsing backbone of the NINES federation. ARP was funded by the Mellon Foundation.

Digital history is the use of digital media to further historical analysis, presentation, and research. It is a branch of the digital humanities and an extension of quantitative history, cliometrics, and computing. Digital history commonly takes the form either of digital public history, concerned primarily with engaging online audiences with historical content, or of digital research methods that further academic research. Digital history outputs include digital archives, online presentations, data visualizations, interactive maps, timelines, audio files, and virtual worlds that make history more accessible to the user. Recent digital history projects focus on creativity, collaboration, and technical innovation, employing text mining, corpus linguistics, network analysis, 3D modeling, and big data analysis. By utilizing these resources, the user can rapidly develop new analyses that link to, extend, and bring to life existing histories.

Cultural analytics refers to the use of computational, visualization, and big data methods for the exploration of contemporary and historical cultures. While digital humanities research has focused on text data, cultural analytics has a particular focus on massive cultural data sets of visual material – both digitized visual artifacts and contemporary visual and interactive media. Taking on the challenge of how best to explore large collections of rich cultural content, cultural analytics researchers developed new methods and intuitive visual techniques that rely on high-resolution visualization and digital image processing. These methods are used to address existing research questions in the humanities, to explore new questions, and to develop new theoretical concepts that fit the mega-scale of digital culture in the early 21st century.

Computer-assisted qualitative data analysis software (CAQDAS) offers tools that assist with qualitative research such as transcription analysis, coding and text interpretation, recursive abstraction, content analysis, discourse analysis, grounded theory methodology, etc.

Computational musicology is an interdisciplinary research area between musicology and computer science. Computational musicology includes any disciplines that use computation in order to study music. It includes sub-disciplines such as mathematical music theory, computer music, systematic musicology, music information retrieval, digital musicology, sound and music computing, and music informatics. As this area of research is defined by the tools that it uses and its subject matter, research in computational musicology intersects with both the humanities and the sciences. The use of computers in order to study and analyze music generally began in the 1960s, although musicians have been using computers to assist them in the composition of music beginning in the 1950s. Today, computational musicology encompasses a wide range of research topics dealing with the multiple ways music can be represented.

The Maryland Institute for Technology in the Humanities (MITH) is an international research center that works with humanities in the 21st century. A collaboration among the University of Maryland College of Arts and Humanities, Libraries, and Office of Information Technology, MITH cultivates research agendas clustered around digital tools, text mining and visualization, and the creation and preservation of electronic literature, digital games and virtual worlds.

Index Thomisticus

The Index Thomisticus was a digital humanities project, begun in the 1940s, that created a concordance to 179 texts centered on Thomas Aquinas. Led by Roberto Busa, the project indexed 10,631,980 words over the course of 34 years, initially onto punched cards. It is considered a pioneering project in the field of digital humanities.

References

  1. 1 2 "Voyant Tools Help". voyant-tools.org. Retrieved 2016-11-24.
  2. 1 2 Sinclair, Stéfan; Rockwell, Geoffrey (2016). "Voyant Facts". Hermeneuti.ca: Computer-Assisted Interpretation in the Humanities. Stéfan Sinclair & Geoffrey Rockwell. Retrieved 2016-12-20.
  3. Klein, Lauren F.; Eisenstein, Jacob; Sun, Iris (2015). "Exploratory Thematic Analysis for Digitized Archival Collections". Digital Scholarship in the Humanities. 30 (Supp. 1): i138. doi: 10.1093/llc/fqv052 .
  4. Rockwell, Geoffrey; Sinclair, Stéfan (2016). Hermeneutica: Computer-Assisted Interpretation in the Humanities. Cambridge: MIT Press. ISBN   9780262332057.
  5. Rambsy, Kenton (2016). "Text-Mining Short Fiction by Zora Neale Hurston and Richard Wright using Voyant Tools". CLA Journal. 59 (3): 251–258.
  6. McIlroy, Tara (2013). "Exploring Poetry and Identity in a Language Learning Environment". Studies in Linguistics and Language Teaching. 24: 31–45.
  7. De Caro, W.; Mitello, L.; Marucci, A.R.; Lancia, L.; Sansoni, J. (2016). "Textual Analysis and Data Mining: An Interpreting Research on Nursing". Studies in Health Technology and Informatics. 225: 948. PMID   27332424.
  8. Maramba, Inocencio Daniel; et al. (2015). "Web-based textual analysis of free-text patient experience comments from a survey in primary care". JMIR Medical Informatics. 3 (2): e20. doi: 10.2196/medinform.3783 . PMC   4439523 . PMID   25947632.
  9. Moullec, Marie-Lise; Jankovic, Marija; Eckert, Claudia (2016). "Selecting system architecture: What a single industrial experiment can tell us about the traps to avoid when choosing selection criteria". System Architecture Design. 30 (3): 250–262.
  10. Black, Michael L. (2016). "The World Wide Web as Complex Data Set: Expanding the Digital Humanities into the Twentieth Century and Beyond through Internet Research". International Journal of Humanities and Arts Computing . 10 (1): 106. doi:10.3366/ijhac.2016.0162.