Transkribus

Transkribus is a platform for the text recognition, image analysis and structure recognition of historical documents.

The platform was created in the context of two EU projects, "tranScriptorium" (2013–2015) [1] and "READ" (Recognition and Enrichment of Archival Documents, 2016–2019) [2], and was developed by the University of Innsbruck. Since July 1, 2019, the platform has been managed and further developed by READ-COOP. [3]

The platform integrates tools developed by research groups throughout Europe, including the Pattern Recognition and Human Language Technology (PRHLT) group of the Technical University of Valencia and the Computational Intelligence Technology Lab (CITlab) group of the University of Rostock.

Comparable programs include eScriptorium and OCR4All. [4]

References

  1. "TranSkriptorium". transkriptorium.com. Retrieved 24 February 2022.
  2. "tranScriptorium Project Page". cordis.europa.eu. Retrieved 1 February 2020.
  3. "Read-Coop". readcoop.eu. Retrieved 1 February 2020.
  4. "OCR4all | forTEXT". Retrieved 20 June 2023.