Transkribus

Transkribus is a platform for text recognition, image analysis and structure recognition of historical documents.

The platform was created within two EU-funded projects, "tranScriptorium" (2013–2015) [1] and "READ" (Recognition and Enrichment of Archival Documents, 2016–2019). [2] It was developed by the University of Innsbruck. Since July 1, 2019, the platform has been managed and further developed by the READ-COOP. [3]

The platform integrates tools developed by research groups across Europe, including the Pattern Recognition and Human Language Technology (PRHLT) group of the Technical University of Valencia and the Computational Intelligence Technology Lab (CITlab) group of the University of Rostock.

Comparable programs offering similar functions include eScriptorium and OCR4all. [4]

References

  1. "TranSkriptorium". transkriptorium.com. Retrieved 24 February 2022.
  2. "tranScriptorium Project Page". cordis.europa.eu. Retrieved 1 February 2020.
  3. "Read-Coop". readcoop.eu. Retrieved 1 February 2020.
  4. "OCR4all | forTEXT". Retrieved 20 June 2023.

Further reading