Scribe (markup language)

Developed by: Brian Reid, Scribe Systems
Type of format: Markup language
Extended to: Texinfo
Open format: Yes

Scribe is a markup language and word processing system that pioneered the use of descriptive markup. [1] [2] Scribe was revolutionary when it was proposed because, for the first time, it cleanly separated presentation from content. [3] [4] [5]


History

Beginnings

Scribe was designed and developed by Brian Reid of Carnegie Mellon University. It formed the subject of his 1980 doctoral dissertation, for which he received the Association for Computing Machinery's Grace Murray Hopper Award in 1982. [1]

In 1981, Reid presented a paper describing Scribe in the same conference session in which Charles Goldfarb presented GML (developed in 1969), [6] the immediate predecessor of SGML.

Scribe sold to Unilogic

In 1979, at the end of his graduate-student career, Reid sold Scribe to Unilogic (later renamed Scribe Systems [7] ), a Pittsburgh-area software company founded by Michael Shamos, another Carnegie Mellon computer scientist, to market the program. Reid said he was simply looking for a way to unload the program on developers that would keep it from going into the public domain.

Michael Shamos was embroiled in a dispute with Carnegie Mellon administrators over the intellectual-property rights to Scribe. The dispute with the administration was settled out of court, and the university conceded it had no claim to Scribe. [8]

Time-bomb

Reid agreed to insert a set of time-dependent functions (called "time bombs") that would deactivate freely copied versions of the program after a 90-day expiration date. To avoid deactivation, users paid the software company, which then issued a code that defused the internal time-bomb feature.

Richard Stallman saw this as a betrayal of the programmer ethos. Instead of honoring the notion of "share-and-share alike", Reid had inserted a way for companies to compel programmers to pay for information access. [9]

Stallman's Texinfo is "loosely based on Brian Reid's Scribe and other formatting languages of the time". [10]

Using the Scribe word processor

Using Scribe involved a two-phase process: writing the document in the Scribe markup language, then processing it with the Scribe compiler.

The Scribe markup language defined the words, lines, pages, spacing, headings, footings, footnotes, numbering, tables of contents, etc. in a way similar to HTML. The Scribe compiler used a database of Styles (containing document format definitions), which defined the rules for formatting a document in a particular style.

Because the content (structure) of the document was separated from its style (format), writers did not need to concern themselves with the details of formatting. In this respect, Scribe resembles the LaTeX document preparation system by Leslie Lamport.
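The separation of content from style can be illustrated with a minimal Python sketch. This is not how Scribe was implemented: the `DOCUMENT` list, the `STYLES` table, and the `render` function are hypothetical names chosen for illustration, with `STYLES` standing in for the database of Styles. The same content is rendered under two styles without being edited.

```python
# Hypothetical sketch: one document, two styles. None of these names come from Scribe.

# Content: (element, text) pairs that say what each piece is, not how it looks.
DOCUMENT = [
    ("heading", "The Beginning"),
    ("paragraph", "Let's start at the very beginning."),
]

# A stand-in for a style database: each style maps an element to a formatting rule.
STYLES = {
    "report": {
        "heading": lambda text: text.upper(),
        "paragraph": lambda text: "    " + text,
    },
    "memo": {
        "heading": lambda text: "* " + text + " *",
        "paragraph": lambda text: text,
    },
}


def render(document, style_name):
    """Format every element using the rules of the chosen style."""
    rules = STYLES[style_name]
    return "\n".join(rules[element](text) for element, text in document)


report = render(DOCUMENT, "report")
memo = render(DOCUMENT, "memo")
print(report)
print(memo)
```

Changing the look of the document means choosing a different entry in the style table; the content list itself is never touched.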

The markup language

The idea of using a markup language, in which meta-information about the document and its formatting is contained within the document itself, first saw widespread use in a program called RUNOFF; Scribe contained the first robust implementation of a declarative markup language. [11]

In Scribe, markup was introduced with an @ sign, followed either by a Begin-End block or by a direct token invocation:

@Heading(The Beginning)
@Begin(Quotation)
    Let's start at the very beginning, a very good place to start
@End(Quotation)

It was also possible to pass parameters:

@MakeSection(tag=beginning, title="The Beginning")
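To make the syntax above concrete, here is a small Python sketch that recognises the `@Name(argument)` form and splits a parameter list into keys and values. It is a simplified illustration, not Scribe's actual grammar: it assumes parentheses are the only delimiters and that arguments contain no nested parentheses. The function names `parse_commands` and `parse_parameters` are hypothetical.

```python
import re

# Simplifying assumption: @Name(argument) with parentheses only,
# and no parentheses nested inside the argument.
COMMAND = re.compile(r"@(\w+)\(([^()]*)\)")


def parse_commands(source):
    """Return (name, argument) pairs for each @Name(argument) found in source."""
    return [(m.group(1), m.group(2)) for m in COMMAND.finditer(source)]


def parse_parameters(argument):
    """Split 'key=value, key="quoted value"' into a dict (simplified)."""
    pairs = re.findall(r'(\w+)\s*=\s*(?:"([^"]*)"|([^,\s]+))', argument)
    # Each match yields (key, quoted_value, bare_value); one of the two values is empty.
    return {key: quoted or bare for key, quoted, bare in pairs}


text = '@Heading(The Beginning) @MakeSection(tag=beginning, title="The Beginning")'
commands = parse_commands(text)
params = parse_parameters(commands[1][1])
print(commands)
print(params)
```

Under this sketch a Begin-End block appears as two ordinary commands, `("Begin", "Quotation")` and `("End", "Quotation")`; pairing them up would be a separate step.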

Typically, large documents were composed of chapters, with each chapter in a separate file. These files were then referenced by a master document file, thereby concatenating numerous components into a single large source document. The master file typically also defined styles (such as fonts and margins) and declared macros like MakeSection shown above; macros had limited programmatic features. From that single concatenated source, Scribe computed chapter numbers, page numbers, and cross-references.
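The numbering and cross-referencing step can be sketched in Python as a two-pass process over the concatenated source. This is an illustration under stated assumptions, not Scribe's algorithm: the commands `@Chapter`, `@Label`, and `@Ref` are hypothetical markup invented for this example, page numbers are not modelled, and the `build` function is a made-up name.

```python
import re

# Hypothetical markup for this sketch only: @Chapter(title), @Label(name), @Ref(name).
CHAPTER = re.compile(r"@Chapter\(([^()]*)\)")
LABEL = re.compile(r"@Label\(([^()]*)\)")
REF = re.compile(r"@Ref\(([^()]*)\)")


def build(chapter_sources):
    """Concatenate chapter sources, number the chapters, and resolve references."""
    source = "\n".join(chapter_sources)

    # Pass 1: walk the source in order, counting chapters and recording
    # the chapter number in effect at each label.
    labels = {}
    number = 0
    for line in source.splitlines():
        number += len(CHAPTER.findall(line))
        for match in LABEL.finditer(line):
            labels[match.group(1)] = number

    # Pass 2: emit output, replacing each command with its computed text.
    counter = iter(range(1, number + 1))
    out = []
    for line in source.splitlines():
        line = CHAPTER.sub(lambda m: f"Chapter {next(counter)}: {m.group(1)}", line)
        line = LABEL.sub("", line)
        line = REF.sub(lambda m: f"chapter {labels[m.group(1)]}", line)
        out.append(line.rstrip())
    return "\n".join(out)


chapters = [
    "@Chapter(The Beginning)@Label(start)\nSee @Ref(end) for the conclusion.",
    "@Chapter(The End)@Label(end)\nThis follows from @Ref(start).",
]
result = build(chapters)
print(result)
```

Two passes are used so that a forward reference, such as the first chapter pointing at a label defined in the second, can be resolved once every label has been seen.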

These processes anticipated features of later markup languages like HTML. Placing styles in a separate file gave some of the advantages of Cascading Style Sheets, and programmed macros presaged the document manipulation aspects of JavaScript.

The FinalWord word processor from Mark of the Unicorn, which became Borland's Sprint, featured a markup language which resembled a simplified version of Scribe's. Before being packaged as FinalWord, earlier versions of the editor and formatter had been sold separately as MINCE ("MINCE Is Not Complete Emacs") and Scribble respectively.

LaTeX extends TeX with the descriptive markup ideas of Scribe.



References

  1. "1982 – Brian K. Reid". Grace Murray Hopper Award. Retrieved 2009-02-24. For his contributions in the area of computerized text-production and typesetting systems, specifically Scribe which represents a major advance in this area. It embodies several innovations based on computer science research in programming language design, knowledge-based systems, computer document processing, and typography.
  2. "Scribe(ID:2481/scr010) - Text-formatting language". Online Historical Encyclopaedia of Programming Languages (hopl.info). Retrieved 2009-02-24. Brian Reid. Ground-breaking text-formatting language. Reason for Reid getting a Hopper Medal in 1982.
  3. "Markup Technologies '98 Conference. Agenda and Schedule". xml.coverpages.org. November 1998. Retrieved 2009-02-24. Brian Reid's work with markup systems began in the 1970s. He independently invented and implemented descriptive markup and developed its theory. His Scribe system may have been the cleanest separation of structure and format ever built. His dissertation on it was already complete in 1981, the year he presented in Lausanne in the same session where Charles Goldfarb publicly presented GML; SGML was proposed about a year later.
  4. "XML Linking". xml.indelv.com. November 1998. Retrieved 2009-02-24. "Generalized", "generic", or "descriptive" markup has been discovered several times, apparently independently. Scribe [Reid 1981] is an early formatter based on structure rather than formatting commands.
  5. Brian K. Reid, "A high-level approach to computer document formatting", Proceedings of the 7th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '80), pp. 24-31. doi:10.1145/567446.567449
  6. see GML Wiki article
  7. PostScript Printer Driver Optimization Case Study, Adobe Systems, Technical Note #5042, 31 March 1992. Page 5.
  8. The Chronicle: August 10, 2001: 2 Scholars Face Off in Copyright Clash
  9. Williams, Sam (March 2002). "Free as in Freedom - Richard Stallman's Crusade for Free Software". O'Reilly. Retrieved 2008-09-26. For Reid, the deal was a win-win. Scribe didn't fall into the public domain, and Unilogic recouped on its investment. For Stallman, it was a betrayal of the programmer ethos, pure and simple. Instead of honoring the notion of share-and-share alike, Reid had inserted a way for companies to compel programmers to pay for information access.
  10. TexInfo
  11. Crockford, Douglas (2007-06-28). "Scribe". Retrieved 2010-04-12.