Open Scripture Information Standard

Filename extension: .xml
Latest release: 2.1.1 (2006)
Website: crosswire.org/osis/

Open Scripture Information Standard (OSIS) is an XML application (or schema) that defines tags for marking up Bibles, theological commentaries, and other related literature.

XML: markup language developed by the W3C for encoding data

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

Markup language: modern system for annotating a document

In computer text processing, a markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. The idea and terminology evolved from the "marking up" of paper manuscripts, that is, the revision instructions that editors traditionally wrote with a red or blue pencil on authors' manuscripts. In digital media, this "blue pencil instruction text" was replaced by tags, which indicate what the parts of the document are rather than how they might be shown on some display. This lets authors avoid formatting every instance of the same kind of thing redundantly. It also avoids specifying fonts and dimensions that may not apply to many users.

Description

The schema closely resembles that of the Text Encoding Initiative, but it is much simpler, omitting many constructs it does not need, and it adds much more detailed metadata together with a formal canonical reference system for identifying books, chapters, verses, and particular locations within verses.
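The canonical reference system can be illustrated with a small parser. This is only a sketch under stated assumptions: it assumes a simplified reference of the form `work:Book.chapter.verse@grain`, where the work prefix and the grain suffix are both optional, and it treats the grain as an opaque string. The names `OsisRef` and `parse_osis_ref` are hypothetical, and the real OSIS reference syntax has more cases than are handled here.

```python
from typing import NamedTuple, Optional, Tuple


class OsisRef(NamedTuple):
    work: Optional[str]      # optional work prefix, e.g. "Bible.KJV"
    parts: Tuple[str, ...]   # hierarchy, e.g. ("Matt", "5", "3")
    grain: Optional[str]     # optional location within the verse (kept opaque)


def parse_osis_ref(ref: str) -> OsisRef:
    """Split a simplified reference into work, hierarchy, and grain."""
    # Everything before the first ":" names the work, if a ":" is present.
    work, sep, rest = ref.partition(":")
    if not sep:
        work, rest = None, ref
    # Everything after the first "@" is the finer-grained location.
    body, sep, grain = rest.partition("@")
    if not sep:
        grain = None
    # The remaining body is a dot-separated path: book, chapter, verse.
    parts = tuple(body.split("."))
    if not all(parts):
        raise ValueError(f"empty component in reference: {ref!r}")
    return OsisRef(work, parts, grain)
```

For example, `parse_osis_ref("Bible.KJV:John.3.16")` yields the work `"Bible.KJV"` and the parts `("John", "3", "16")`, while `parse_osis_ref("Matt.5")` yields no work and the parts `("Matt", "5")`.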

Text Encoding Initiative: an academic community concerned with practices for semantic markup of texts

The Text Encoding Initiative (TEI) is a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s. The community currently runs a mailing list, meetings and conference series, and maintains an eponymous technical standard, a journal, a wiki, a GitHub repository and a toolchain.

Metadata: data about data

Metadata is "data that provides information about other data", or in short, data about data. Many distinct types of metadata exist, including descriptive metadata, structural metadata, administrative metadata, reference metadata and statistical metadata.

The metadata includes a "work declaration" for the work itself, and for each work it references. A work declaration provides basic catalog information based on the Dublin Core standard, and assigns a local short name for the work (similar to XML namespace declarations).
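A work declaration can be pictured as a mapping from a local short name to catalog fields. The sketch below is an assumption-laden illustration, not the schema itself: the sample header omits the OSIS namespace, the `osisWork` attribute and the `title` and `identifier` child elements are modeled on Dublin Core element names, and the sample values and the function name `work_declarations` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free header with two work declarations.
SAMPLE_HEADER = """
<header>
  <work osisWork="KJV">
    <title>King James Version</title>
    <identifier type="OSIS">Bible.KJV</identifier>
  </work>
  <work osisWork="Notes">
    <title>Sample Commentary</title>
    <identifier type="OSIS">Commentary.Sample</identifier>
  </work>
</header>
"""


def work_declarations(header_xml: str) -> dict:
    """Map each local short name to the catalog fields declared for it."""
    root = ET.fromstring(header_xml)
    works = {}
    for work in root.findall("work"):
        short_name = work.get("osisWork")
        # Collect the child elements (title, identifier, ...) as plain text.
        works[short_name] = {
            child.tag: (child.text or "").strip() for child in work
        }
    return works
```

Calling `work_declarations(SAMPLE_HEADER)` returns a dictionary keyed by `"KJV"` and `"Notes"`. This mirrors how a namespace prefix stands in for a longer identifier elsewhere in the document.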

Dublin Core: vocabulary terms that can be used to describe web resources

The Dublin Core Schema is a small set of vocabulary terms that can be used to describe digital resources, as well as physical resources such as books or CDs, and objects like artworks. The full set of Dublin Core metadata terms can be found on the Dublin Core Metadata Initiative (DCMI) website. The original set of 15 classic metadata terms, known as the Dublin Core Metadata Element Set (DCMES), is endorsed in several standards documents.

XML namespaces are used for providing uniquely named elements and attributes in an XML document. They are defined in a W3C recommendation. An XML instance may contain element or attribute names from more than one XML vocabulary. If each vocabulary is given a namespace, the ambiguity between identically named elements or attributes can be resolved.

Significant Features

OSIS gives particular attention to encoding overlapping markup, because Bibles exhibit such markup frequently: for example, verses cross paragraph boundaries and paragraphs cross verse boundaries. The OSIS schema introduced a method for encoding overlap in XML, known as Trojan milestones, or "Clix".[1][2]

In markup languages and the digital humanities, overlap occurs when a document has two or more structures that interact in a non-hierarchical manner. A document with overlapping markup cannot be represented as a tree. This is also known as concurrent markup. Overlap happens, for instance, in poetry, where there may be a metrical structure of feet and lines; a linguistic structure of sentences and quotations; and a physical structure of volumes and pages and editorial annotations.
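The milestone approach to overlap can be sketched as follows. In this illustration, a verse is delimited by a pair of empty elements, one marking its start and one marking its end, so a verse can begin in one paragraph and finish in the next while the document remains well-formed XML. The sample document, the placeholder identifiers `X.1.1` and `X.1.2`, and the function name `verses_from_milestones` are assumptions made for the example; the `sID` and `eID` attribute names follow the start and end milestone convention but should be checked against the schema before being relied on.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second verse crosses a paragraph boundary.
SAMPLE = (
    '<div>'
    '<p><verse sID="X.1.1"/>First verse. '
    '<verse eID="X.1.1"/><verse sID="X.1.2"/>Second verse starts here</p>'
    '<p>and ends in the next paragraph.<verse eID="X.1.2"/></p>'
    '</div>'
)


def verses_from_milestones(xml_text: str) -> dict:
    """Collect the text between each start/end milestone pair."""
    root = ET.fromstring(xml_text)
    verses = {}     # verse id -> list of text chunks
    open_ids = []   # verses whose start milestone has been seen

    def add(text):
        # Append a text chunk to every verse that is currently open.
        if text:
            for verse_id in open_ids:
                verses[verse_id].append(text)

    def walk(elem):
        if elem.tag == "verse":
            if "sID" in elem.attrib:
                open_ids.append(elem.get("sID"))
                verses[elem.get("sID")] = []
            elif "eID" in elem.attrib:
                open_ids.remove(elem.get("eID"))
        add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)  # text that follows the child, in document order

    walk(root)
    # Join chunks with a space and normalize whitespace; a simplification
    # that would split words broken by inline elements.
    return {vid: " ".join(" ".join(chunks).split()) for vid, chunks in verses.items()}
```

The walk visits text in document order and ignores the paragraph hierarchy when assigning text to verses, which is the point of the technique: the tree carries one structure (paragraphs) and the milestones carry the other (verses).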

Development

The OSIS schema was developed by the Bible Technologies Group, a joint committee sponsored by the American Bible Society and the Society of Biblical Literature. Other participants in the standards work are the United Bible Societies, SIL International, and various national Bible societies, along with individual expert volunteers.

The American Bible Society (ABS) is a United States–based nondenominational Bible society which publishes, distributes and translates the Bible and provides study aids and other tools to help people engage with the Bible. Founded on May 11, 1816, in New York City, it is probably best known for its Good News Translation of the Bible, with its contemporary vernacular. It also publishes the Contemporary English Version. The American Bible Society is a member of the Forum of Bible Agencies International. ABS's headquarters relocated from 1865 Broadway in New York City to Philadelphia in August 2015.

The Society of Biblical Literature (SBL), founded in 1880 as the Society of Biblical Literature and Exegesis, is an American-based learned society dedicated to the academic study of the Bible and related ancient literature. Its current stated mission is to "foster biblical scholarship". Membership is open to the public, and consists of over 8,500 individuals from over 80 countries. As a scholarly organization, SBL has been a constituent society of the American Council of Learned Societies since 1929.

The United Bible Societies (UBS) is a worldwide federation of Bible societies. In 1946, delegates from 13 countries formed the UBS in an effort to coordinate the activities of the Bible societies. The first headquarters were in London and Geneva. The current General Secretary of the United Bible Societies is Michael Perreau. The United Bible Societies is also a collaborating agency with the Forum of Bible Agencies International.

The officers include Steven DeRose (chair), Kees DeBlois (vice-chair), and Patrick Durusau (editor). As of mid-2006, the current version is 2.1.1.

Steven J DeRose is a computer scientist noted for his contributions to Computational Linguistics and to key standards related to document processing, mostly around ISO's Standard Generalized Markup Language (SGML) and W3C's Extensible Markup Language (XML).

See also

Related Research Articles

Geography Markup Language: used to describe geographical features

The Geography Markup Language (GML) is the XML grammar defined by the Open Geospatial Consortium (OGC) to express geographical features. GML serves as a modeling language for geographic systems as well as an open interchange format for geographic transactions on the Internet. Key to GML's utility is its ability to integrate all forms of geographic information, including not only conventional "vector" or discrete objects, but coverages and sensor data.

JSON: text-based open standard designed for human-readable data interchange

In computing, JavaScript Object Notation (JSON, pronounced "Jason") is an open-standard file format that uses human-readable text to transmit data objects consisting of attribute–value pairs and array data types. It is a very common data format with a diverse range of applications, such as serving as a replacement for XML in AJAX systems.

The Theological Markup Language (ThML) is a "royalty-free" XML-based format created in 1998 by the Christian Classics Ethereal Library (CCEL) to create electronic theological texts. Other formats such as STEP and Logos Library System (LLS) were found unacceptable by CCEL as they are proprietary, prompting the creation of the new language. The ThML format borrowed elements from a somewhat similar format, the Text Encoding Initiative (TEI).

Michael Sperberg-McQueen: American computer programmer

C. Michael Sperberg-McQueen is an American markup language specialist. He was co-editor of the Extensible Markup Language (XML) 1.0 spec (1998), and chair of the XML Schema working group.

Data exchange is the process of taking data structured under a source schema and transforming it into data structured under a target schema, so that the target data is an accurate representation of the source data. Data exchange allows data to be shared between different computer programs.

eXtensible HyperText Markup Language (XHTML) is part of the family of XML markup languages. It mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated.

A structured document is an electronic document in which some method, such as markup or embedded coding, is used to identify the whole and the parts of the document as having various meanings beyond their formatting. For example, a structured document might identify a certain portion as a "chapter title" rather than as "Helvetica bold 24" or "indented Courier". Such portions are commonly called "components" or "elements" of a document.

MECS is the Multi-Element Code System, a markup system developed by the Wittgenstein Archives at the University of Bergen. It is very similar to SGML and XML except that it allows elements to overlap.

The Music Encoding Initiative (MEI) is an open-source effort to create a system for representation of musical documents in a machine-readable structure. MEI closely mirrors work done by text scholars in the Text Encoding Initiative (TEI) and while the two encoding initiatives are not formally related, they share many common characteristics and development practices. The term "MEI", like "TEI", describes the governing organization and the markup language. The MEI community solicits input and development directions from specialists in various music research communities, including technologists, librarians, historians, and theorists in a common effort to discuss and define best practices for representing a broad range of musical documents and structures. The results of these discussions are then formalized into the MEI schema, a core set of rules for recording physical and intellectual characteristics of music notation documents. This schema is expressed in an XML Schema Language, with RelaxNG being the preferred format. The MEI schema is developed using the One-Document-Does-it-all (ODD) format, a literate programming XML format developed by the Text Encoding Initiative.

The Office Open XML file formats are a set of file formats that can be used to represent electronic office documents. There are formats for word processing documents, spreadsheets and presentations as well as specific formats for material such as mathematical formulae, graphics, bibliographies etc.

The Publishing Requirements for Industry Standard Metadata (PRISM) specification defines a set of XML metadata vocabularies for syndicating, aggregating, post-processing and multi-purposing content. PRISM provides a framework for the interchange and preservation of content and metadata, a collection of elements to describe that content, and a set of controlled vocabularies listing the values for those elements. PRISM can be XML, RDF/XML, or XMP and incorporates Dublin Core elements. PRISM can be thought of as a set of XML tags used to contain the metadata of articles and even tag article content.

XHTML+RDFa is an extended version of the XHTML markup language for supporting RDF through a collection of attributes and processing rules in the form of well-formed XML documents. XHTML+RDFa is one of the techniques used to develop Semantic Web content by embedding rich semantic markup. Version 1.1 of the language is a superset of XHTML 1.1, integrating the attributes according to RDFa Core 1.1. In other words, it provides RDFa support through XHTML Modularization.

CLIX, a method of using valid XML for overlapping markup.

References

  1. Steven DeRose, "Markup Overlap: A Review and a Horse", Proceedings of Extreme Markup Languages (2004).
  2. Syd Bauman, "TEI HORSEing Around", Proceedings of Extreme Markup Languages (2005). Archived 2016-08-11 at the Wayback Machine. Abstract: http://en.scientificcommons.org/43599936