Document Structuring Conventions

Last updated

Document Structuring Conventions, or DSC, is a set of standards for PostScript, based on the use of comments, that specifies a way to structure a PostScript file and a way to expose that structure in a machine-readable way. A PostScript file that conforms to DSC is called a conforming document.

Contents

The need for a structuring convention arises since PostScript is a Turing-complete programming language. There is thus no guaranteed method — short of actually printing the document — to do things like determining how many pages long a given document is or how large a given page is, or how to skip to a particular page. The addition of structure, with DSC comments exposing that structure, helps provide a way for, e.g., an intelligent print spooler to have the ability to rearrange the pages for printing, or for a page layout program to find the bounding box of a PostScript file used as a graphic image. Collectively, any such program that takes PostScript files as input data is called a document manager.

In order for a PostScript print file to properly distill to PDF using Adobe tools, it should conform to basic DSC standards.

Some DSC comments serve a second function, specifying a way to tell the document manager to do certain things, like inserting a font or other PostScript code (collectively called resources) into the file. DSC comments that serve this second function are more akin to preprocessing directives and are not purely comments. Documents using those kinds of DSC comments require a functioning document manager to come out as intended; sending them directly to a printer will not work.

DSC is the basis for Encapsulated PostScript (EPS): EPS files are documents that conform to the DSC standards with further restrictions.

The set of DSC comments can be expanded by a mechanism called the Open Structuring Conventions, which, together with the EPS specification, form the basis of early versions of the Adobe Illustrator Artwork file format.

DSC at a glance

The basic premise of DSC is the separation of prolog (static definitions) and script (code that affects job-specific printed output), plus the disallowing of certain PostScript operators deemed inappropriate for page descriptions. This ensures a basic level of predictability in the PostScript code, thus forming the basis of document manageability.

An optional, additional layer of document manageability is provided by separating the script into a document setup section, zero or more functionally independent pages, and an optional trailer (cleanup code). (“Zero pages” in DSC usually means “one page without the use of the PostScript ‘showpage’ operator.) The functional independence between pages, plus the disallowing of more PostScript operators in the pages section, form the basis for page independence, which allows pages to be reordered, and independently and randomly accessed.

This imposed structure is then exposed by delimiting the PostScript file with DSC comments, which normally begin with two percent signs followed by a keyword. Some keywords need to be followed by a colon, an optional space character, and then a series of arguments.

Finally, the document is marked as conforming by starting it with a comment starting with “%!PS-Adobe-” followed by the DSC version number.

Sections of reusable PostScript code can be modularized into procsets (procedure sets, corresponding to function libraries in other programming languages), in order to ease the generation of PostScript code. Procsets and other PostScript resources (for example, fonts) can be omitted from the PostScript file itself, and externally referenced by a directive-like DSC comment; such external referencing, however, can only work with a document manager that understands such DSC comments.

DSC version 3.0 was released on September 25, 1992. The specification states, "Even though the DSC comments are a layer of communication beyond the PostScript language and do not affect the final output, their use is considered to be good PostScript language programming style." Thus, most PostScript-producing programs output DSC-conformant comments along with the code, although some such programs do not actually produce conforming documents.

Example

A DSC-conforming document (this one generated by dvips) might begin:

%!PS-Adobe-2.0%%Creator: dvips(k) 5.95a Copyright 2005 Radical Eye Software%%Title: texput.dvi%%Pages: 1%%PageOrder: Ascend%%BoundingBox: 0 0 612 792%%DocumentPaperSizes: Letter%%EndComments

which has the following meaning:

  1. marks the document as conforming to version 2.0 of the DSC
  2. identifies the PostScript-producing program as dvips 5.95a
  3. identifies the document title
  4. tells the document manager that the document consists of one page
  5. tells the document manager that pages are independent (i.e., not in Special ordering) and appear in ascending order in the document; in this example, since the document only consists of one page, this information is not usually relevant, but will be needed if additional pages are to be inserted by a document manager
  6. tells the document manager the coordinates, measured in PostScript points, of the bounding box for all the pages taken together; 0 0 612 792 is the coordinates of a US Letter–sized page
  7. tells the document manager what kind of paper sizes are used in the whole document; in this example only one size is used, namely the US Letter size
  8. marks the end of the prolog

See also

Related Research Articles

<span class="mw-page-title-main">PDF</span> Portable Document Format, a digital file format

Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images and other information needed to display it. PDF has its roots in "The Camelot Project" initiated by Adobe co-founder John Warnock in 1991. PDF was standardized as ISO 32000 in 2008. The last edition as ISO 32000-2:2020 was published in December 2020.

<span class="mw-page-title-main">PostScript</span> File format and programming language

PostScript is a page description language and dynamically typed, stack-based programming language. It is most commonly used in the electronic publishing and desktop publishing realm, but as a Turing complete programming language, it can be used for many other purposes as well. PostScript was created at Adobe Systems by John Warnock, Charles Geschke, Doug Brotz, Ed Taft and Bill Paxton from 1982 to 1984. The most recent version, PostScript 3, was released in 1997.

SVG is an XML-based vector image format for defining two-dimensional graphics, having support for interactivity and animation. The SVG specification is an open standard developed by the World Wide Web Consortium since 1999.

Encapsulated PostScript (EPS) is a Document Structuring Convention (DSC) conforming PostScript document format usable as a graphics file format. The format was developed as early as 1987 by John Warnock and Chuck Geschke, the founders of Adobe, together with Aldus. The basis of early versions of the Adobe Illustrator Artwork file format is formed by EPS together with the DSC Open Structuring Conventions.

<span class="mw-page-title-main">Adobe Illustrator</span> Vector graphics editor from Adobe Inc.

Adobe Illustrator is a vector graphics editor and design program developed and marketed by Adobe Inc. Originally designed for the Apple Macintosh, development of Adobe Illustrator began in 1985. Along with Creative Cloud, Illustrator CC was released. The latest version, Illustrator 2024, was released on October 10, 2023, and is the 28th generation in the product line. Adobe Illustrator was reviewed as the best vector graphics editing program in 2021 by PC Magazine.

<span class="mw-page-title-main">Device independent file format</span> Typesetting file format

The device independent file format (DVI) is the output file format of the TeX typesetting program, designed by David R. Fuchs and implemented by Donald E. Knuth in 1982. Unlike the TeX markup files used to generate them, DVI files are not intended to be human-readable; they consist of binary data describing the visual layout of a document in a manner not reliant on any specific image format, display hardware or printer. DVI files are typically used as input to a second program which translates DVI files to graphical data. For example, most TeX software packages include a program for previewing DVI files on a user's computer display; this program is a driver. Drivers are also used to convert from DVI to popular page description languages and for printing.

An HTML element is a type of HTML document component, one of several types of HTML nodes. The first used version of HTML was written by Tim Berners-Lee in 1993 and there have since been many versions of HTML. The current de facto standard is governed by the industry group WHATWG and is known as the HTML Living Standard.

YAML(see § History and name) is a human-readable data serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML targets many of the same communications applications as Extensible Markup Language (XML) but has a minimal syntax which intentionally differs from Standard Generalized Markup Language (SGML). It uses both Python-style indentation to indicate nesting, and a more compact format that uses [...] for lists and {...} for maps but forbids tab characters to use as indentation thus only some JSON files are valid YAML 1.2.

<span class="mw-page-title-main">Adobe ColdFusion</span> Rapid Web app development platform

Adobe ColdFusion is a commercial rapid web-application development computing platform created by J. J. Allaire in 1995. ColdFusion was originally designed to make it easier to connect simple HTML pages to a database. By version 2 (1996) it had become a full platform that included an IDE in addition to a full scripting language.

<span class="mw-page-title-main">ActionScript</span> Object-oriented programming language created for the Flash multimedia platform

ActionScript is an object-oriented programming language originally developed by Macromedia Inc.. It is influenced by HyperTalk, the scripting language for HyperCard. It is now an implementation of ECMAScript, though it originally arose as a sibling, both being influenced by HyperTalk. ActionScript code is usually converted to byte-code format by a compiler.

XFA stands for XML Forms Architecture, a family of proprietary XML specifications that was suggested and developed by JetForm to enhance the processing of web forms. It can be also used in PDF files starting with the PDF 1.5 specification. The XFA specification is referenced as an external specification necessary for full application of the ISO 32000-1 specification. The XML Forms Architecture was not standardized as an ISO standard, and has been deprecated in PDF 2.0.

An image file format is a file format for a digital image. There are many formats that can be used, such as JPEG, PNG, and GIF. Most formats up until 2022 were for storing 2D images, not 3D ones. The data stored in an image file format may be compressed or uncompressed. If the data is compressed, it may be done so using lossy compression or lossless compression. For graphic design applications, vector formats are often used. Some image file formats support transparency.

Open XML Paper Specification is an open specification for a page description language and a fixed-document format. Microsoft developed it as the XML Paper Specification (XPS). In June 2009, Ecma International adopted it as international standard ECMA-388.

Design rule for Camera File system (DCF) is a JEITA specification which defines a file system for digital cameras, including the directory structure, file naming method, character set, file format, and metadata format. It is currently the de facto industry standard for digital still cameras. The file format of DCF conforms to the Exif specification, but the DCF specification also allows use of any other file formats. As of 2021, the latest version of the standard was 2.0, issued in 2010.

PDF/UA, formally ISO 14289, is an International Organization for Standardization (ISO) standard for accessible PDF technology. A technical specification intended for developers implementing PDF writing and processing software, PDF/UA provides definitive terms and requirements for accessibility in PDF documents and applications. For those equipped with appropriate software, conformance with PDF/UA ensures accessibility for people with disabilities who use assistive technology such as screen readers, screen magnifiers, joysticks and other technologies to navigate and read electronic content.

PostScript fonts are font files encoded in outline font specifications developed by Adobe Systems for professional digital typesetting. This system uses PostScript file format to encode font information.

Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data. The method involves an interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.

A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free.

The Portable Document Format (PDF) was created by Adobe Systems, introduced at the Windows and OS/2 Conference in January 1993 and remained a proprietary format until it was released as an open standard in 2008. Since then, it has been under the control of an International Organization for Standardization (ISO) committee of volunteer industry experts.