Medieval Nordic Text Archive (Menota) is a network of leading Nordic archives, libraries and research departments working with medieval texts and manuscript facsimiles. The aim of Menota is to preserve and publish medieval texts in digital form and to adapt and develop encoding standards necessary for this work.
Menota was established in 2001 and, at the time of writing (June 2015), offers 20 texts with a total of approx. 1 million words. Most texts are rendered on the diplomatic level (i.e. following the manuscripts in most matters of orthography). Some are also rendered on the facsimile level, a very close rendering that keeps abbreviations as such along with some allographic variation, and others on a normalised level, in which the orthography corresponds to that found in grammars, dictionaries and text series such as Íslenzk fornrit.
In addition to the archive of texts, Menota also offers a handbook on XML text encoding, The Menota handbook. It is based on the Guidelines of the Text Encoding Initiative and discusses a number of encoding questions relating to vernacular manuscripts. The handbook is published digitally on the Menota site, and it offers a full TEI-style Document Type Definition and a RELAX NG schema for anyone who wants to encode Medieval Nordic manuscripts.
Menota welcomes transcriptions of all kinds of Medieval Nordic primary sources, i.e. transcriptions made directly from the manuscript itself or from a good facsimile of it, as long as the transcription has been proofread to an acceptable level and is delivered as an XML file that is valid according to the schema available on the Menota site.
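The three levels of rendering can be pictured as parallel readings of the same word. The following Python sketch parses a small XML fragment and collects one reading per level. The namespace URIs, the element names (facs, dipl, norm inside a choice element) and the sample word are assumptions made for illustration; the actual encoding should be checked against The Menota handbook.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs and element names, for illustration only.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ME_NS = "http://www.menota.org/ns/1.0"

# One word with three parallel readings. The facsimile reading keeps
# the long s, while the other two levels use the round s.
SAMPLE = f"""
<w xmlns="{TEI_NS}" xmlns:me="{ME_NS}">
  <choice>
    <me:facs>\u017fem</me:facs>
    <me:dipl>sem</me:dipl>
    <me:norm>sem</me:norm>
  </choice>
</w>
""".strip()


def levels_of(word_xml):
    """Map each level name (facs, dipl, norm) to the text of that reading."""
    word = ET.fromstring(word_xml)
    result = {}
    for level in ("facs", "dipl", "norm"):
        node = word.find(f".//{{{ME_NS}}}{level}")
        if node is not None:
            result[level] = "".join(node.itertext())
    return result


readings = levels_of(SAMPLE)
print(readings)
```

Keeping the readings side by side in one element means a display tool can switch between levels without needing separate copies of the text.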
Menota follows the recommendations of the Medieval Unicode Font Initiative with respect to the encoding and display of special characters. On the normalised level of text rendering, all necessary characters will be found in the official part of the Unicode Standard, but some characters on a diplomatic level and several on a facsimile level can only be displayed by using characters in the Private Use Area of Unicode. MUFI offers several free or low-cost fonts for this use.
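Whether a transcription depends on Private Use Area characters can be checked mechanically. The sketch below uses Python's standard unicodedata module, in which private-use code points have the general category "Co". The function name and the sample string are hypothetical, and U+E000 is only a placeholder, not a specific MUFI assignment.

```python
import unicodedata


def private_use_chars(text):
    """Return the private-use code points in text as 'U+XXXX' strings."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Co"  # "Co" = Other, private use
    ]


# U+E000 is the first code point of the Private Use Area (placeholder only).
sample = "k\uE000n"
print(private_use_chars(sample))   # ['U+E000']
print(private_use_chars("h\u01EBnd"))  # [] since U+01EB is a standard character
```

A text that returns an empty list here can be displayed with any font covering the relevant standard characters; a non-empty list signals that a font with matching private-use glyphs is needed.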
Unicode, formally The Unicode Standard, is a text encoding standard maintained by the Unicode Consortium and designed to support the use of text written in all of the world's major writing systems. Version 15.1 of the standard defines 149,813 characters and 161 scripts used in various ordinary, literary, academic, and technical contexts.
Web pages authored using HyperText Markup Language (HTML) may contain multilingual text represented with the Unicode universal character set. Key to the relationship between Unicode and HTML is the relationship between the "document character set", which defines the set of characters that may be present in an HTML document and assigns numbers to them, and the "external character encoding", or "charset", used to encode a given document as a sequence of bytes.
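The distinction between the character and the bytes used to store it can be illustrated in Python. In this sketch the same word is serialised twice: once directly as UTF-8, and once in a charset that lacks the character, so that it has to be written as a numeric character reference. The sample word is an illustrative choice.

```python
text = "h\u01EBnd"  # contains U+01EB, o with ogonek

# Same characters, two different byte sequences.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1", errors="xmlcharrefreplace")

print(utf8_bytes)    # b'h\xc7\xabnd'
print(latin1_bytes)  # b'h&#491;nd'
```

In both cases the character's number in the document character set is the same (491 decimal, 1EB hexadecimal); only the external encoding differs.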
Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.
The ogonek is a diacritic hook placed under the lower right corner of a vowel in the Latin alphabet used in several European languages, and directly under a vowel in several Native American languages. It is also placed on the lower right corner of consonants in some Latin transcriptions of various indigenous languages of the Caucasus mountains.
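In Unicode, a letter with ogonek can be stored either as a single precomposed character or as a base letter followed by U+0328 COMBINING OGONEK. As a minimal sketch using Python's standard unicodedata module, normalisation to NFC turns the two-character sequence into the precomposed form:

```python
import unicodedata

decomposed = "o\u0328"  # base letter o + COMBINING OGONEK
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(unicodedata.name(composed))      # LATIN SMALL LETTER O WITH OGONEK
```

Normalising text to one form before comparing or searching avoids treating the two spellings as different strings.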
Mojibake is the garbled or gibberish text that is the result of text being decoded using an unintended character encoding. The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system.
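The effect is easy to reproduce. In the following Python sketch, text encoded as UTF-8 is decoded as Latin-1, which turns each non-ASCII letter into two unrelated characters; reversing the two steps recovers the original. The sample letters are an illustrative choice.

```python
original = "\u00e6 \u00f0 \u00fe"  # the letters ae-ligature, eth and thorn

# Decode UTF-8 bytes with the wrong encoding to produce mojibake.
garbled = original.encode("utf-8").decode("latin-1")

# The damage is reversible as long as no bytes were lost along the way.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(repaired == original)  # True
```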
In writing and typography, a ligature occurs where two or more graphemes or letters are joined to form a single glyph. Examples are the characters ⟨æ⟩ and ⟨œ⟩ used in English and French, in which the letters ⟨a⟩ and ⟨e⟩ are joined for the first ligature and the letters ⟨o⟩ and ⟨e⟩ are joined for the second ligature. For stylistic and legibility reasons, ⟨f⟩ and ⟨i⟩ are often merged to create ⟨ﬁ⟩; the same is true of ⟨s⟩ and ⟨t⟩ to create ⟨ﬆ⟩. The common ampersand, ⟨&⟩, developed from a ligature in which the handwritten Latin letters ⟨e⟩ and ⟨t⟩ were combined.
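Unicode treats these cases differently: purely stylistic ligatures such as the f-i ligature (U+FB01) carry a compatibility decomposition into their component letters, whereas ⟨æ⟩ is encoded as a letter in its own right. A short Python sketch with the standard unicodedata module shows the difference under NFKC normalisation:

```python
import unicodedata

fi_ligature = "\uFB01"  # LATIN SMALL LIGATURE FI
st_ligature = "\uFB06"  # LATIN SMALL LIGATURE ST
ae_letter = "\u00E6"    # LATIN SMALL LETTER AE

# Compatibility normalisation splits the stylistic ligatures ...
fi_plain = unicodedata.normalize("NFKC", fi_ligature)
st_plain = unicodedata.normalize("NFKC", st_ligature)
# ... but leaves the letter ae unchanged.
ae_plain = unicodedata.normalize("NFKC", ae_letter)

print(fi_plain, st_plain, ae_plain == ae_letter)  # fi st True
```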
The Text Encoding Initiative (TEI) is a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s. The community currently runs a mailing list, meetings and conference series, and maintains the TEI technical standard, a journal, a wiki, a GitHub repository and a toolchain.

The r rotunda ⟨ꝛ⟩, "rounded r", is a historical calligraphic variant of the minuscule (lowercase) letter Latin r used in full script-like typefaces, especially blackletters.
Insular script is a medieval script system originating from Ireland that spread to England and continental Europe under the influence of Irish Christianity. Irish missionaries took the script to continental Europe, where they founded monasteries such as Bobbio. The scripts were also used in monasteries like Fulda, which were influenced by English missionaries. They are associated with Insular art, of which most surviving examples are illuminated manuscripts. Insular script greatly influenced modern Gaelic type and handwriting.
The Avestan alphabet is a writing system developed during Iran's Sasanian era (226–651 CE) to render the Avestan language.
In digital typography, the Medieval Unicode Font Initiative (MUFI) is a project which aims to coordinate the encoding and display of special characters in medieval texts written in the Latin alphabet or in runes, which are not otherwise encoded as part of Unicode.
Unicode has subscripted and superscripted versions of a number of characters including a full set of Arabic numerals. These characters allow any polynomial, chemical and certain other equations to be represented in plain text without using any form of markup like HTML or TeX.
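As a minimal sketch of how such plain-text formulas can be produced, the Python code below maps ordinary digits to their superscript and subscript counterparts with str.translate. The helper names are hypothetical.

```python
# Translation tables from ASCII digits to superscript and subscript digits.
SUPERSCRIPT = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)
SUBSCRIPT = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)


def power(base, exponent):
    """Write base raised to exponent using superscript digits."""
    return base + str(exponent).translate(SUPERSCRIPT)


def with_count(symbol, count):
    """Write a chemical symbol followed by a subscript atom count."""
    return symbol + str(count).translate(SUBSCRIPT)


polynomial = power("x", 2) + " + " + power("y", 10)
water = with_count("H", 2) + "O"
print(polynomial)
print(water)
```

Note that the superscripts 1, 2 and 3 sit in the Latin-1 Supplement block rather than next to the other superscript digits, which is why the table is spelled out explicitly.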
A whitespace character is a character data element that represents white space when text is rendered for display by a computer.
PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archiving and long-term preservation of electronic documents. PDF/A differs from PDF by prohibiting features unsuitable for long-term archiving, such as font linking and encryption. The ISO requirements for PDF/A file viewers include color management guidelines, support for embedded fonts, and a user interface for reading embedded annotations.
L, or l, is the twelfth letter of the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is el, plural els.
Siddhaṃ, also known in its later evolved form as Siddhamātṛkā, is a medieval Brahmic abugida, derived from the Gupta script and ancestral to the Nāgarī, Eastern Nagari, Tirhuta, Odia and Nepalese scripts.
Record type is a family of typefaces designed to allow medieval manuscripts to be published as near-facsimiles of the originals. The typefaces include many special characters intended to replicate the various scribal abbreviations and other unusual glyphs typically found in such manuscripts. They were used in the publication of archival texts between 1774 and 1900.
The Music Encoding Initiative (MEI) is an open-source effort to create a system for representation of musical documents in a machine-readable structure. MEI closely mirrors work done by text scholars in the Text Encoding Initiative (TEI) and while the two encoding initiatives are not formally related, they share many common characteristics and development practices. The term "MEI", like "TEI", describes the governing organization and the markup language. The MEI community solicits input and development directions from specialists in various music research communities, including technologists, librarians, historians, and theorists in a common effort to discuss and define best practices for representing a broad range of musical documents and structures. The results of these discussions are then formalized into the MEI schema, a core set of rules for recording physical and intellectual characteristics of music notation documents. This schema is expressed in an XML schema language, with RELAX NG being the preferred format. The MEI schema is developed using the One-Document-Does-it-all (ODD) format, a literate programming XML format developed by the Text Encoding Initiative.
Menotec was an infrastructure project funded by the Norwegian Research Council (2010–2012) with the aim of transcribing and annotating a text corpus of Old Norwegian texts.