Document Content Architecture

DCA
Developed by: IBM
Type of format: Document file format
Extended to: MO:DCA

Document Content Architecture, or DCA for short, is a standard for text documents that IBM developed in the early 1980s. DCA was used on mainframe and IBM i systems and formed the basis of DisplayWrite's file format. DCA was later extended as MO:DCA (Mixed Object Document Content Architecture), which added the ability to embed other types of objects, such as images, graphics, and bar codes.


The original purpose of DCA was to provide a common document format that could be used across multiple IBM word processing platforms, such as the IBM PC, IBM mainframes, the Displaywriter System, and the IBM 5520 Administrative System. [1]

DCA defines two types of documents: Revisable-Form Text (RFT), which can be edited, and Final-Form Text (FFT), which is output-only. [2] [3]

Description

DCA defines a data stream representing a document.

Documents may contain fonts, overlays and other resource objects required at presentation time to present the data properly. They may also contain other objects, such as a document index and tagging elements that support searching and navigating document data for a variety of application purposes. [4] :2

MO:DCA is the wrapper, or container, for the various objects that can make up a document. Each object type is defined by its own subordinate architecture. [4]

Each architecture uses a series of binary structured fields to describe its corresponding object.
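Because every object is carried in the same kind of binary structured field, a reader can walk a MO:DCA stream without understanding each object's content. The following is a minimal sketch, not IBM code. It assumes the introducer layout commonly given in IBM's MO:DCA reference: an optional X'5A' carriage-control byte, then an 8-byte introducer made up of a 2-byte big-endian length (which counts the introducer and payload but not the X'5A'), a 3-byte field identifier, a 1-byte flags field, and 2 reserved bytes. The function names and the `make_field` helper are hypothetical. Extension, segmentation, and padding flags are not interpreted. The two identifiers in the demo are given as Begin Document and End Document from memory and should be checked against the reference.

```python
import struct
from typing import Iterator, NamedTuple


class StructuredField(NamedTuple):
    sf_id: bytes  # 3-byte structured field identifier
    flags: int    # 1-byte flag field (left uninterpreted here)
    data: bytes   # payload following the 8-byte introducer


def iter_structured_fields(stream: bytes) -> Iterator[StructuredField]:
    """Yield the structured fields in a MO:DCA-style byte stream, in order.

    Assumed layout: [X'5A'] length(2, big-endian) id(3) flags(1) reserved(2) data.
    """
    pos = 0
    while pos < len(stream):
        if stream[pos] == 0x5A:  # optional carriage-control prefix
            pos += 1
        if pos + 8 > len(stream):
            raise ValueError(f"truncated introducer at offset {pos}")
        # The length counts the introducer itself plus the payload.
        (length,) = struct.unpack_from(">H", stream, pos)
        if length < 8 or pos + length > len(stream):
            raise ValueError(f"bad field length {length} at offset {pos}")
        yield StructuredField(
            sf_id=stream[pos + 2:pos + 5],
            flags=stream[pos + 5],
            data=stream[pos + 8:pos + length],
        )
        pos += length


def make_field(sf_id: bytes, data: bytes = b"") -> bytes:
    """Hypothetical helper: build one X'5A'-prefixed field with zero flags."""
    if len(sf_id) != 3:
        raise ValueError("identifier must be 3 bytes")
    length = 8 + len(data)
    # Flags byte (0) followed by two reserved bytes (0), then the payload.
    return b"\x5A" + struct.pack(">H", length) + sf_id + b"\x00\x00\x00" + data


if __name__ == "__main__":
    BDT = bytes.fromhex("D3A8A8")  # Begin Document (recalled; verify)
    EDT = bytes.fromhex("D3A9A8")  # End Document (recalled; verify)
    doc = make_field(BDT, b"DOC00001") + make_field(EDT)
    for sf in iter_structured_fields(doc):
        print(sf.sf_id.hex().upper(), len(sf.data))
    # D3A8A8 8
    # D3A9A8 0
```

Treating any leading X'5A' byte as the carriage-control prefix is a simplification. A prefix-free stream whose length field happens to begin with byte X'5A' would be misread, so a fuller reader would need to know in advance whether the prefix is present.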

Revisable-Form Text

Developed by: IBM
Type of format: Document file format

Revisable-Form Text (abbreviated RFT or RFT-DCA) is part of DCA. It is sometimes referred to as Revisable Format Text. It was used by the IBM DisplayWrite 4 and 5 word processors, on System/360 and System/370 mainframe computers, and by OfficeVision/400 to allow formatted documents to be transferred to other systems. [1]

RFT has a counterpart, Final-Form Text (abbreviated FFT or FFT-DCA), which was output-only and not intended to be edited.

History

The drive to establish international standards for the DCAs began in 1980 at the IBM Rochester facility. A team of two MO:DCA architects, an RTOCA architect, and a PTOCA architect was assembled. These architects were responsible for building consensus within IBM on the design of the data streams and for taking the work into the international standards arena. There was a concerted effort to bring the international community into the development, a decision based in part on the experience gained when GML was accepted as the basis of the international SGML standard. SGML [5] standardization had taken many years, so to avoid a similarly long delay in creating the architecture, the team wanted to involve everyone early. IBM's work on document content had been driven by the needs of mainframe computers, where GML and DCA were in use, but that experience pointed to a need for standardized component architectures, in particular for revisable and non-revisable text.

In 1981, shortly after its inception, the group was moved along with the IBM 5280 Distributed Data System to IBM Austin near Round Rock, Texas, where the work continued with mixed success. As the architectures became more firmly established on the international stage, the team was moved again in 1987 to the IBM Dallas Programming Center. There, in 1998, the team was disbanded and work on the DCA architectures was discontinued, mainly because the PC community had, out of necessity, gone in a different direction. After 18 years, the DCA architectures were fully complete but still not fully agreed upon, and no active implementations were in sight. [1]

The PC world had settled on HTML (originally defined as an application of the SGML international standard) and used portions of it for its own purposes. Microsoft Word eventually used a similar data stream internally to store editable content. Although the SGML standard was available, implementing a full SGML parser was impractical, so a subset of SGML became the de facto standard for revisable text still used in the PC arena.

At about the same time, Adobe Systems designed and produced PDF, a printable document encoding that has become the standard for printable documents produced on PCs. PDF became an international standard (ISO 32000) in 2008, after users had already adopted it in great numbers. Adoption was driven by the need for such a product, and the resulting solution proved far more acceptable than anything the standards committees had designed. Over 10 years of committee work had not produced an acceptable method, while the PC computing community created what it needed in less time. [3]

The attempt to reach a consensus document data stream was quickly outflanked by the usable formats offered by companies that did not try to share their designs with others, but instead created workable solutions and sold them successfully to users. Output from word-processing software is now commonly 'printed' to PDF, the format of the most widely used presentation product. For example, Windows provides a 'Microsoft Print to PDF' printer that Microsoft Word can select to produce PDF output. A similar method could have been used to produce documents in an international DCA standard, had one ever been agreed.

When IBM disbanded its Dallas Programming Center in 1998, the entire staff of architects retired and left the company, except for the manager, who was reassigned. This ended IBM's DCA architecture project for the foreseeable future. [1]


References

  1. Henkel, Tom (21 May 1984). "IBM taking the standardization route to DPP". Computerworld. Vol. 18, no. 21. IDG Enterprise. p. 7. ISSN 0010-4841.
  2. "PC Magazine Encyclopedia". Retrieved July 25, 2012.
  3. de la Beaujardière, Jean Marie (1988). "Well-established document interchange formats". Document Manipulation and Typography: Proceedings of the International Conference on Electronic Publishing, Document Manipulation and Typography, Nice (France), April 20-22, 1988. CUP Archive. p. 83. ISBN 978-0-521-36294-8.
  4. IBM Corporation (May 2006). Mixed Object Document Content Architecture Reference (PDF). Retrieved February 7, 2020.
  5. "Home". sgmlsource.com.