SGMLguid

SGMLguid, also known as "CERN SGML", [1] "Waterloo based SGML", [2] and "Waterloo SGML", [3] was an early SGML application developed and used at CERN between 1986 and 1990. It served as a model for the earliest HTML specifications.

History

In 1984, CERN started the CERNDOC project, a document filing and retrieval system that would standardize CERN's manifold and mutually incompatible documentation practices. [4] The project adapted an earlier documentation system developed at the Rutherford Laboratory, a British particle physics research facility. [5] Written in the Rexx programming language, installed on an IBM 3090-200 mainframe computer, and running on the VM/CMS operating system, [4] the system stored tens of thousands of documents in a hierarchical structure. It offered keyword searching and was able to display documents on a screen or send them to a printer. [6]

CERNDOC supported two markup systems: a GML application named CERNPAPER, developed locally in 1985, [7] [8] and an SGML application created in 1986 by Anders Berglund, who was at the time responsible for text processing in the CERN data handling division. Berglund mapped a Waterloo SCRIPT macro set onto SGML, basing his application on the document type defined in Annex E of ISO 8879 [1] and on AAP DTD, the Association of American Publishers' document type. [9] [5] Prior art also includes the IBM GML starter set. [10] [11] [12] The application features an extensive tag set for preparing foils, memos, letters, scientific papers, and manuals, amongst other use cases. [8]

In 1990, when Eric van Herwijnen acted as head of text processing in the CERN Administrative Services Department, CERN replaced CERNDOC with the IBM Document Composition Facility (DCF), rendering both CERNPAPER and SGMLguid obsolete. [2] To replace these applications, van Herwijnen and Michel Goossens mapped IBM's BookMaster macro sets onto a number of DTDs. [3] [13]

CERN discontinued its use of mainframe computing in 1994. [14]

Relevance for HTML

Tim Berners-Lee, who was working as a CERN contractor when he created the Web, encountered SGMLguid in October 1987, when CERN's Online Computing Group started to maintain its documentation in CERNDOC. Berners-Lee found CERNDOC's hierarchical structure highly limiting. [6]

For HTML, Berners-Lee adopted SGML syntax and a subset of the tags specified in CERN's SGMLguid. [5]

Related Research Articles

A document type definition (DTD) is a specification file that contains a set of markup declarations that define a document type for an SGML-family markup language. The DTD specification file can be used to validate documents.

HTML: HyperText Markup Language

Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript.

Markup language: Modern system for annotating a document

A markup language is a text-encoding system which specifies the structure and formatting of a document and potentially the relationships among its parts. Markup can control the display of a document or enrich its content to facilitate automated processing.

Standard Generalized Markup Language: Markup language

The Standard Generalized Markup Language is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates":

XML: Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

DocBook is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of documentation.

In computing, AAP DTD is a set of three SGML Document Type Definitions for scientific documents, defined by the Association of American Publishers. It was ratified as a U.S. standard under the name ANSI/NISO Z39.59 in 1988, and evolved into the international ISO 12083 standard in 1993. It was supplanted as a U.S. standard by ANSI/ISO 12083 in 1995.

Typesetting: Composition of text by means of arranging physical types or digital equivalents

Typesetting is the composition of text for publication, display, or distribution by means of arranging physical type in mechanical systems or glyphs in digital systems representing characters. Stored types are retrieved and ordered according to a language's orthography for visual display. Typesetting requires one or more fonts. One significant effect of typesetting was that authorship of works could be spotted more easily, making it difficult for copiers who have not gained permission.

Charles Goldfarb: Co-inventor of the concept of markup languages, including GML, which led to SGML, HTML, and XML

Charles F. Goldfarb is known as the father of Standard Generalized Markup Language (SGML) and grandfather of HTML and the World Wide Web, also referred to as WWW, W3, or the Web. He co-invented the concept of markup languages.

Generalized Markup Language (GML) is a set of macros that implement intent-based (procedural) markup tags for the IBM text formatter, SCRIPT. SCRIPT/VS is the main component of IBM's Document Composition Facility (DCF). A starter set of tags in GML is provided with the DCF product.

In the Standard Generalized Markup Language (SGML), an entity is a primitive data type, which associates a string with either a unique alias or an SGML reserved word. Entities are foundational to the organizational structure and definition of SGML documents. The SGML specification defines numerous entity types, which are distinguished by keyword qualifiers and context. An entity string value may variously consist of plain text, SGML tags, and/or references to previously defined entities. Certain entity types may also invoke external documents. Entities are called by reference.

History of the World Wide Web: Information system running in the Internet and its history

The World Wide Web is a global information medium that users can access via computers connected to the Internet. The term is often mistakenly used as a synonym for the Internet, but the Web is a service that operates over the Internet, just as email and Usenet do. The history of the Internet and the history of hypertext date back significantly further than that of the World Wide Web.

SCRIPT is any of a series of text markup languages, starting with Script under Control Program-67/Cambridge Monitor System (CP-67/CMS) and Script/370 under Virtual Machine Facility/370 (VM/370) and the Time Sharing Option (TSO) of OS/VS2. The current version, SCRIPT/VS, is part of IBM's Document Composition Facility (DCF) for IBM z/VM and z/OS systems. SCRIPT was developed for CP-67/CMS by Stuart Madnick at MIT, succeeding CTSS RUNOFF.

Document Content Architecture, or DCA for short, is a standard developed by IBM for text documents in the early 1980s. DCA was used on mainframe and IBM i systems and formed the basis of DisplayWrite's file format. DCA was later extended as MO:DCA, which added embedded data files.

A Formal Public Identifier (FPI) is a short piece of text with a particular structure that may be used to uniquely identify a product, specification or document. FPIs were introduced as part of Standard Generalized Markup Language (SGML), and serve particular purposes in formats historically derived from SGML. Some of their most common uses are as part of document type declarations (DOCTYPEs) and document type definitions (DTDs) in SGML, XML and historically HTML, but they are also used in the vCard and iCalendar file formats to identify the software product which generated the file.
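The structure of an FPI can be illustrated with a minimal Python sketch. It assumes the common four-field form with fields separated by `//` (registration indicator, owner, text class with description, language), as seen in the HTML 4.01 identifier; ISO-owned identifiers and the optional display version are not handled. The names `FormalPublicIdentifier` and `parse_fpi` are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class FormalPublicIdentifier(NamedTuple):
    registration: str  # "+" for a registered owner, "-" for an unregistered one
    owner: str         # e.g. "W3C"
    text_class: str    # e.g. "DTD" or "ENTITIES"
    description: str   # free-text description of the identified object
    language: str      # language code of the object, e.g. "EN"


def parse_fpi(fpi: str) -> FormalPublicIdentifier:
    """Split a four-field FPI into its parts (illustrative only)."""
    parts = fpi.split("//")
    if len(parts) != 4 or parts[0] not in ("+", "-"):
        raise ValueError(f"not a four-field FPI: {fpi!r}")
    registration, owner, class_and_description, language = parts
    # The text class is the first word; the rest is the description.
    text_class, _, description = class_and_description.partition(" ")
    return FormalPublicIdentifier(
        registration, owner, text_class, description, language
    )


if __name__ == "__main__":
    print(parse_fpi("-//W3C//DTD HTML 4.01//EN"))
```

Applied to the identifier used in HTML 4.01 document type declarations, the sketch separates the owner (`W3C`), the text class (`DTD`), the description (`HTML 4.01`), and the language (`EN`).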

Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages which mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated.

History of hypertext

Hypertext is text displayed on a computer or other electronic device with references (hyperlinks) to other text that the reader can immediately access, usually by a mouse click or keypress sequence. Early conceptions of hypertext defined it as text that could be connected by a linking system to a range of other documents that were stored outside that text. In 1934, Belgian bibliographer Paul Otlet developed a blueprint for links that telescoped out from hypertext electrically to allow readers to access documents, books, photographs, and so on, stored anywhere in the world.

A document type declaration, or DOCTYPE, is an instruction that associates a particular XML or SGML document with a document type definition (DTD). In the serialized form of the document, it manifests as a short string of markup that conforms to a particular syntax.

ISO 12083 is an international SGML standard for document interchange between authors and publishers. It features separate Document Type Definitions for books, serials, articles, and math. Derived from AAP DTD, it was first published in 1993, revised in 1994, and last confirmed in 2016.

References

  1. Berglund, Anders (1986-10-27), CERN SGML User's Guide (PDF), CERN, p. v
  2. van Herwijnen, Eric (January 1990). "Text Processing Policy" (PDF). CERN Computer Newsletter. No. 198. pp. 16–17.
  3. Goossens, Michel (January 1990). "SGML/Bookmaster on VM/CMS" (PDF). CERN Computer Newsletter. No. 198. pp. 17–19.
  4. Esteveny, L.; Van Herwijnen, Eric (1987-10-01). "CERNDOC: A Document Filing and Retrieval System" (PDF). CERN Document Server: a document filing and retrieval system. SHARE Conference. Chicago. Retrieved 2017-09-03.
  5. Hopgood, Bob (2001). "History of the Web". W3.org. Retrieved 2017-08-24.
  6. Gillies, James; Cailliau, Robert (2000). How the Web was Born: The Story of the World Wide Web. Oxford: Oxford University Press. p. 178. ISBN 978-0-19-286207-5.
  7. van Herwijnen, Eric (May 1985). "CERNPAPER Users guide". CERN Internal US Note DD/US/50. Geneva: CERN.
  8. Goossens, Michel (2013-06-14). "Michel Goossens - Interview" (Interview). Interviewed by Dave Walden. Retrieved 2017-09-03.
  9. Berners-Lee, Tim (1992). "HTML Tags". W3.org. Retrieved 2017-08-24.
  10. Document Composition Facility: Generalized Markup Language Starter Set Reference, SG20-9187-3, IBM, 1985
  11. Document Composition Facility: Generalized Markup Language Starter Set User's Guide, SH20-9186, IBM, 1985
  12. DeRose, S. J. (1998). The SGML FAQ Book: Understanding the Foundation of HTML and XML. Dordrecht: Kluwer. p. 37. ISBN 978-0-585-34049-4.
  13. Goossens, Michel (1990). The SGML/BookMaster System at CERN: User's Guide. Geneva: CERN. Retrieved 2017-09-01.
  14. Williams, David (April 1994). "Computing – Moving Away from the Mainframe" (PDF). CERN Courier. Vol. 34, no. 3. pp. 16–17.