XSL Formatting Objects

Last updated
XSL Formatting Objects
Filename extension
.xml, .fo
Internet media type application/xml,
text/xml (deprecated)
Uniform Type Identifier (UTI) public.xml
Developed by World Wide Web Consortium (W3C)
Latest release
1.1
December 5, 2006;15 years ago (2006-12-05)
Type of format Markup language
Contained by XML
Open format?Yes
Website www.w3.org/TR/xsl11/

XSL-FO (XSL Formatting Objects) is a markup language for XML document formatting that is most often used to generate PDF files. XSL-FO is part of XSL (Extensible Stylesheet Language), a set of W3C technologies designed for the transformation and formatting of XML data. The other parts of XSL are XSLT and XPath. Version 1.1 of XSL-FO was published in 2006.

Contents

XSL-FO is considered feature complete by W3C: [1] the last update for the Working Draft was in January 2012, and its Working Group closed in November 2013. [2]

Basics

Unlike the combination of HTML and CSS, XSL-FO is a unified presentational language. It has no semantic markup as this term is used in HTML. And, unlike CSS which modifies the default presentation of an external XML or HTML document, it stores all of the document's data within itself.

The general idea behind XSL-FO's use is that the user writes a document, not in FO, but in an XML language. XHTML, DocBook, and TEI are all possible examples. Then, the user obtains an XSLT transform, either by writing one themselves or by finding one for the document type in question. This XSLT transform converts the XML into XSL-FO.

Once the XSL-FO document is generated, it is then passed to an application called an FO processor. FO processors convert the XSL-FO document into something that is readable, printable or both. The most common output of XSL-FO is a PDF file or as PostScript, but some FO processors can output to other formats like RTF files or even just a window in the user's GUI displaying the sequence of pages and their contents.

The XSLT language itself was originally conceived only for this purpose; it is now in widespread use for more general XML transformations. This transformation step is taken so much for granted in XSL-FO that it is not uncommon for people to call the XSLT that turns XML into XSL-FO the actual XSL-FO document itself. Even tutorials on XSL-FO tend to be written with XSLT commands around the FO processing instructions.

The XSLT transformation step is exceptionally powerful. It allows for the automatic generation of a table of contents, linked references, an index, and various other possibilities.

An XSL-FO document is not like a PDF or a PostScript document. It does not definitively describe the layout of the text on various pages. Instead, it describes what the pages look like and where the various contents go. From there, an FO processor determines how to position the text within the boundaries described by the FO document. The XSL-FO specification even allows different FO processors to have varying responses with regard to the resultant generated pages.

For example, some FO processors can hyphenate words to minimize space when breaking a line, while others choose not to. Different processors may even use different hyphenation algorithms, ranging from very simple to more complex hyphenation algorithms that take into account whether the previous or next line also is hyphenated. These will change, in some borderline cases quite substantially, the layout of the various pages. There are other cases where the XSL-FO specification explicitly allows FO processors some degree of choice with regard to layout.

This differentiation between FO processors, creating inconsistent results between processors is often not a concern. This is because the general purpose behind XSL-FO is to generate paged, printed media. XSL-FO documents themselves are usually used as intermediaries, mostly to generate either PDF files or a printed document as the final form to be distributed. This is as opposed to how HTML is generated and distributed as a final form directly to the user. Distributing the final PDF rather than the formatting language input (whether HTML/CSS or XSL-FO) means on the one hand that recipients aren't affected by the unpredictability resulting from differences among formatting language interpreters, while on the other hand means that the document cannot easily adapt to different recipient needs, such as different page size or preferred font size, or tailoring for on-screen versus on-paper versus audio presentation.

Language concepts

The XSL-FO language was designed for paged media; as such, the concept of pages is an integral part of XSL-FO's structure.

FO works best for what could be called "content-driven" design. This is the standard method of layout for books, articles, legal documents, and so forth. It involves a single flowing span of fairly contiguous text, with various repeating information built into the margins of a page. This is as opposed to "layout-driven" design, which is used in newspapers or magazines. If content in those documents does not fit in the required space, some of it is trimmed away until it does fit. XSL-FO does not easily handle the tight restrictions of magazine layout; indeed, in many cases, it lacks the ability to express some forms of said layout.

Despite the basic nature of the language's design, it is capable of a great deal of expressiveness. Tables, lists, side floats, and a variety of other features are available. These features are comparable to CSS's layout features, though some of those features are expected to be built by the XSLT.

Document structure

XSL-FO documents are XML documents, but they do not have to conform to any DTD or schema. Instead, they conform to a syntax defined in the XSL-FO specification.

XSL-FO documents contain two required sections. The first section details a list of named page layouts. The second section is a list of document data, with markup, that uses the various page layouts to determine how the content fills the various pages.

Page layouts define the properties of the page. They can define the directions for the flow of text, so as to match the conventions for the language in question. They define the size of a page as well as the margins of that page. More importantly, they can define sequences of pages that allow for effects where the odd and even pages look different. For example, one can define a page layout sequence that gives extra space to the inner margins for printing purposes; this allows more space to be given to the margin where the book will be bound.

The document data portion is broken up into a sequence of flows, where each flow is attached to a page layout. The flows contain a list of blocks which, in turn, each contain a list of text data, inline markup elements, or a combination of the two. Content may also be added to the margins of the document, for page numbers, chapter headings and the like.

Blocks and inline elements function in much the same way as for CSS, though some of the rules for padding and margins differ between FO and CSS. The direction, relative to the page orientation, for the progression of blocks and inlines can be fully specified, thus allowing FO documents to function under languages that are read different from English. The language of the FO specification, unlike that of CSS 2.1, uses direction-neutral terms like start and end rather than left and right when describing these directions.

XSL-FO's basic content markup is derived from CSS and its cascading rules. As such, many attributes in XSL-FO propagate into the child elements unless explicitly overridden.

Capabilities of XSL-FO v1.0

XSL-FO is capable of a great deal of textual layout functionality. In addition to the information as specified above, XSL-FO's language allows for the specification of the following.

Multiple columns

A page can be defined to have multiple columns. When this is the case, blocks flow from one column into the next by default. Individual blocks can be set to span all columns, creating a textual break in the page. The columns above this break will flow into each other, as will the columns below the break. But no text is allowed to flow from the above section to the below section.

Because of the nature of XSL-FO's page specification, multiple pages may actually have different numbers and widths of columns. As such, text can flow from a 3 column page to a 5 column page to a 1 column page quite easily.

All FO features work within the restrictions of a multi-column page.

We can span multiple columns by specifying two attributes i.e.,. span, padding-after .

Lists

An XSL-FO list is, essentially, two sets of blocks stacked side by side. An entry consists of a block on the "left", or start inline direction, and a block sequence on the "right", or end inline direction. The block on the left is conceptually what would be the number or bullet in a list. However, it could just as easily be a string of text, as one might see in a glossary entry. The block on the right works as expected. Both of these blocks can be block containers, or have multiple blocks in a single list entry.

Numbering of XSL-FO lists, when they are numbered, is expected to be done by the XSLT, or whatever other process, that generated the XSL-FO document. As such, number lists are to be explicitly numbered in XSL-FO.

Pagination controls

The user can specify Widow and Orphan for blocks or for the flow itself, and allow the attributes to cascade into child blocks. Additionally, blocks can be specified to be kept together on a single page. For example, an image block and the description of that image can be set to never be separated. The FO processor will do its best to adhere to these commands, even if it requires creating a great deal of empty space on a page.

Footnotes

The user can create footnotes that appear at the bottom of a page. The footnote is written, in the FO document, in the regular flow of text at the point where it is referenced. The reference is represented as an inline definition, though it is not required. The body is one or more blocks that are placed by the FO processor to the bottom of the page. The FO processor guarantees that wherever the reference is, the footnote cited by that reference will begin on the same page. This will be so even if it means creating extra empty space on a page.

Tables

An FO table functions much like an HTML/CSS table. The user specifies rows of data for each individual cell. The user can, also, specify some styling information for each column, such as background color. Additionally, the user can specify the first row as a table header row, with its own separate styling information.

The FO processor can be told exactly how much space to give each column, or it can be told to auto-fit the text in the table.

Text orientation controls

FO has extensive controls for orienting blocks of text. One can, in the middle of a page, designate a block of text to be oriented in a different orientation. These oriented blocks can be used for languages in a different orientation from the rest of the document, or simply if one needs to orient the text for layout purposes. These blocks can contain virtually any kind of content, from tables to lists or even other blocks of reoriented text.

Miscellaneous

Capabilities of XSL-FO v1.1

Version 1.1 of XSL-FO adds a number of new features to version 1.0.

Multiple flows and flow mapping

XSL-FO 1.0 was fairly restrictive about what text was allowed to go in what areas of a page. Version 1.1 loosens these restrictions significantly, allowing flowing text to be mapped into multiple explicit regions on a page. This allows for more newspaper-like typesetting.

Bookmarks

Many output formats for XSL-FO processors, specifically PDF, have bookmarking features. These allow the format to specify a string of text in a separate window that can be selected by the user. When selected, the document window scrolls immediately to a specific region of the document.

XSL-FO v1.1 now provides the ability to create named bookmarks in XSL-FO, thus allowing the processor to pass this on to an output format that supports it.

Indexing

XSL-FO 1.1 has features that support the generation of an index that might be found at the back of a book. This is done through referencing of properly marked-up elements in the FO document.

Last page citation

The last page can be generated without providing an explicit in-document reference to a specific anchor in the FO document. The definition of "last page" can be restricted to within a specific set of pages or to cover the entire document. This allows the user to specify something like, "Page 2 out of 15", where page 15 is the page number of a last page definition.

Table markers

Table markers allow the user to create dynamic content within table headers and footers, such as running totals at the bottom of each page of a table or "table continued" indicators.

Inside/outside floats

XSL-FO 1.1 adds the keywords "inside" and "outside" for side floats, which makes it possible to achieve page layouts with marginalia positioned on the outside or inside edges of pages. Inside refers to the side of the page towards the book binding, and outside refers to the side of a page away from the book binding.

Refined graphic sizing

XSL-FO 1.1 refines the functionality for sizing of graphics to fit, with the ability to shrink to fit (but not grow to fit), as well as the ability to define specific scaling steps. In addition, the resulting scaling factor can be referenced for display (for example, to say in a figure caption, "image shown is 50% actual size").

Advantages

Drawbacks

When trying to decide whether or not XSL-FO will work for a given document, the following typographic and layout requirements usually indicate that XSL-FO will not work (although some of these may be satisfied by proprietary extensions):

Replacement

XML and HTML standards, with the CSS standard, since CSS2 (paged media module) starts to supply basic features to printed media. With the CSS Paged Media Module Level 3, W3C is completing the formulation of an integrated standard for document formatting and to generate PDFs. So, since 2013, [2] CSS3-paged is a W3C proposal for an XSL-FO replacement.

Design notes for a Version 2.0 of XSL Formatting Objects were first published in 2009 and last updated in 2012. [3]

See also

Related Research Articles

XML Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

In computing, the term Extensible Stylesheet Language (XSL) is used to refer to a family of languages used to transform and render XML documents.

XSLT is a language for transforming XML documents into other XML documents, or other formats such as HTML for web pages, plain text or XSL Formatting Objects, which may subsequently be converted to other formats, such as PDF, PostScript and PNG. XSLT 1.0 is widely supported in modern web browsers.

In computing, the Java API for XML Processing, or JAXP, one of the Java XML Application programming interfaces, provides the capability of validating and parsing XML documents. It has three basic parsing interfaces:

DocBook is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of documentation.

Typesetting Composition of text by means of arranging physical types or digital equivalents

Typesetting is the composition of text by means of arranging physical type or its digital equivalents. Stored letters and other symbols are retrieved and ordered according to a language's orthography for visual display. Typesetting requires one or more fonts. One significant effect of typesetting was that authorship of works could be spotted more easily, making it difficult for copiers who have not gained permission.

An HTML element is a type of HTML document component, one of several types of HTML nodes. HTML document is composed of a tree of simple HTML nodes, such as text nodes, and HTML elements, which add semantics and formatting to parts of document. Each element can have HTML attributes specified. Elements can also have content, including other elements and text.

An XML editor is a markup language editor with added functionality to facilitate the editing of XML. This can be done using a plain text editor, with all the code visible, but XML editors have added facilities like tag completion and menus and buttons for tasks that are common in XML editing, based on data supplied with document type definition (DTD) or the XML tree.

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

XFA stands for XML Forms Architecture, a family of proprietary XML specifications that was suggested and developed by JetForm to enhance the processing of web forms. It can be also used in PDF files starting with the PDF 1.5 specification. The XFA specification is referenced as an external specification necessary for full application of the ISO 32000-1 specification. The XML Forms Architecture was not standardized as an ISO standard, and has been deprecated in PDF 2.0.

A web style sheet is a form of separation of presentation and content for web design in which the markup of a webpage contains the page's semantic content and structure, but does not define its visual layout (style). Instead, the style is defined in an external style sheet file using a style sheet language such as CSS or XSLT. This design approach is identified as a "separation" because it largely supersedes the antecedent methodology in which a page's markup defined both style and structure.

The following tables compare XML compatibility and support for a number of browser engines.

In computing, the two primary stylesheet languages are Cascading Style Sheets (CSS) and the Extensible Stylesheet Language (XSL). While they are both called stylesheet languages, they have very different purposes and ways of going about their tasks.

Oxygen XML Editor

The Oxygen XML Editor is a multi-platform XML editor, XSLT/XQuery debugger and profiler with Unicode support. It is a Java application, so it can run in Windows, Mac OS X, and Linux. It also has a version that can run as an Eclipse plugin.

The DocBook XSL stylesheets are a set of XSLT stylesheets for the XML-based DocBook language.

XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) and can be used to compute values from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.

XQuery is a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendor-specific extensions for other data formats. The language is developed by the XML Query working group of the W3C. The work is closely coordinated with the development of XSLT by the XSL Working Group; the two groups share responsibility for XPath, which is a subset of XQuery.

A Processing Instruction (PI) is an SGML and XML node type, which may occur anywhere in the document, intended to carry instructions to the application.

Diazo, previously named xdv, is a general-purpose, open source website theming tool. It is written in Python and generates XSLT. Diazo creates a separation between theme pages and transformation rules, allowing web designers to work on templates in plain HTML, without knowledge of XSLT or special template-related codes.

Antenna House Formatter is a proprietary software program that uses either XSL-FO or Cascading Style Sheets (CSS) to convert XML and HTML documents into PDF, SVG, PostScript, XPS, text, and Microsoft Word formats. It supports 30 scripts and over 80 languages.

References

  1. "XSL-FO Current Status - W3C". www.w3.org. Retrieved 2016-08-05.
  2. 1 2 Quin, Liam (November 2, 2013). "Re: [xsl] xsl 2.0?". XSL-List – Open Forum on XSL (Mailing list).{{cite mailing list}}: External link in |mailing-list= (help)
  3. "XSL-FO 2.0 Design Notes Published - W3C". www.w3.org. Retrieved 2018-11-09.