Document type declaration

A document type declaration, or DOCTYPE, is an instruction that associates a particular XML or SGML document (for example, a web page) with a document type definition (DTD) (for example, the formal definition of a particular version of HTML, from HTML 2.0 through 4.0). [1] In the serialized form of the document, it manifests as a short string of markup that conforms to a particular syntax.

The HTML layout engines in modern web browsers perform DOCTYPE "sniffing" or "switching", wherein the DOCTYPE in a document served as text/html determines a layout mode, such as "quirks mode" or "standards mode". The text/html serialization of HTML5, which is not SGML-based, uses the DOCTYPE only for mode selection. Since web browsers are implemented with special-purpose HTML parsers, rather than general-purpose DTD-based parsers, they do not use DTDs and never access them even if a URL is provided. The DOCTYPE is retained in HTML5 as a "mostly useless, but required" header only to trigger "standards mode" in common browsers. [2]
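The mode selection described above can be illustrated with a deliberately incomplete sketch. It assumes the DOCTYPE has already been split into a name, a public identifier and a system identifier, and it covers only the HTML 4.01, XHTML 1.0 and HTML5 declarations shown in this article; real browsers consult a much longer list of legacy identifiers. The function name `layout_mode` and the mode labels ("quirks", "limited-quirks", and "no-quirks" for standards mode) are chosen for illustration.

```python
# Public identifier prefixes, lower-cased for case-insensitive comparison.
XHTML10_LOOSE_PREFIXES = (
    "-//w3c//dtd xhtml 1.0 frameset//",
    "-//w3c//dtd xhtml 1.0 transitional//",
)
HTML401_LOOSE_PREFIXES = (
    "-//w3c//dtd html 4.01 frameset//",
    "-//w3c//dtd html 4.01 transitional//",
)


def layout_mode(name=None, public_id=None, system_id=None):
    """Return a layout mode for a parsed DOCTYPE (simplified sketch)."""
    # A missing DOCTYPE, or one that does not name "html", selects quirks mode.
    if name is None or name.lower() != "html":
        return "quirks"
    pub = (public_id or "").lower()
    # HTML 4.01 Transitional/Frameset: the result depends on the system identifier.
    if pub.startswith(HTML401_LOOSE_PREFIXES):
        return "quirks" if system_id is None else "limited-quirks"
    if pub.startswith(XHTML10_LOOSE_PREFIXES):
        return "limited-quirks"
    # Everything else this sketch knows about, including <!DOCTYPE html>.
    return "no-quirks"


print(layout_mode())        # no DOCTYPE at all
print(layout_mode("html"))  # <!DOCTYPE html>
```

Note that the sketch never fetches the DTD: only the strings in the declaration are compared, which matches the observation that browsers do not access DTDs.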

Syntax

The general syntax for a document type declaration is:

<!DOCTYPE root-element PUBLIC "/quotedFPI/" "/quotedURI/" [ <!-- internal subset declarations --> ]>

or

<!DOCTYPE root-element SYSTEM "/quotedURI/" [ <!-- internal subset declarations --> ]>
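As a rough illustration of this syntax, the following Python sketch splits a declaration into its parts with a regular expression. It is not a conforming SGML or XML parser: it assumes upper-case keywords (as XML requires), ignores comments used as separators, and treats the internal subset as everything up to the first closing bracket. The names `DOCTYPE_RE` and `parse_doctype` are illustrative.

```python
import re

# name, then an optional PUBLIC/SYSTEM external identifier,
# then an optional internal subset in square brackets.
DOCTYPE_RE = re.compile(
    r"""<!DOCTYPE\s+(?P<name>[^\s\[>]+)
        (?:\s+(?:
            PUBLIC\s+(?P<q1>["'])(?P<public>.*?)(?P=q1)
            (?:\s+(?P<q2>["'])(?P<system_pub>.*?)(?P=q2))?
          |
            SYSTEM(?:\s+(?P<q3>["'])(?P<system>.*?)(?P=q3))?
        ))?
        \s*(?:\[(?P<subset>.*?)\])?\s*>""",
    re.VERBOSE | re.DOTALL,
)


def parse_doctype(text):
    """Return the parts of the first DOCTYPE found in text, or None."""
    m = DOCTYPE_RE.search(text)
    if m is None:
        return None
    # The system identifier may follow either a public identifier or SYSTEM.
    system = m.group("system_pub")
    if system is None:
        system = m.group("system")
    return {
        "name": m.group("name"),
        "public": m.group("public"),
        "system": system,
        "subset": m.group("subset"),
    }


print(parse_doctype('<!DOCTYPE note SYSTEM "note.dtd" [<!ENTITY a "b">]>'))
```

Parts that are absent from the declaration are reported as `None`, so a bare `<!DOCTYPE html>` yields only a name.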

Document type name

The opening <!DOCTYPE syntax is followed by separating syntax [3] :403–404 (such as spaces, [3] :297–298,372 or (except in XML) comments opened and closed by a doubled ASCII hyphen), [3] :372,391 followed by a document type name [3] :403–404 (i.e. the name of the root element of the document trees to which the DTD applies). In XML, the root element that represents the document is the first element in the document. For example, in XHTML, the root element is <html>, being the first element opened (after the doctype declaration) and the last closed.

Since the external identifier and the internal subset are both optional, [3] :403–404 the document type name is the only mandatory information in a DOCTYPE declaration.

External identifier

The DOCTYPE declaration can optionally contain an external identifier, following the root element name (and separating syntax such as spaces), but before any internal subset. [3] :403–404 This begins with either the keyword SYSTEM or the keyword PUBLIC. [3] :379 The keyword PUBLIC indicates that the DTD is identified by a public identifier, which marks it as a public text, i.e. one shared between multiple computer systems (regardless of whether it is an available public text, accessible to the general public, or an unavailable public text, shared only within an organisation). [3] :180–182 If the PUBLIC keyword is used, it is followed by the public identifier enclosed in double or single ASCII quotation marks. The public identifier does not point to a storage location, but is rather a unique fixed string intended to be looked up in a table (such as an SGML catalog); [3] :180 however, in some (but not all) SGML profiles, the public identifier must be constructed using a particular syntax called Formal Public Identifier (FPI), which specifies the owner as well as whether it is available to the general public. [3] :182–183

The public identifier (if present) or SYSTEM keyword (otherwise) may (and, in XML, must) [4] be followed by a "system identifier" that is likewise enclosed in quotation marks. Although the interpretation of system identifiers in general SGML is entirely system-dependent (and might be a filename, database key, offset, or something else), [3] :378 XML requires that they be URIs. [5] For example, the FPI for XHTML 1.1 is "-//W3C//DTD XHTML 1.1//EN", and there are three possible system identifiers available for XHTML 1.1, depending on the needs. One of them is the URL reference "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd". It means that the XML parser must locate the DTD in a system-specific fashion, in this case by means of a URL reference to the DTD enclosed in double quotation marks.

In XHTML documents, the doctype declaration must always explicitly specify a system identifier. In SGML-based documents like HTML, on the other hand, the appropriate system identifier may automatically be inferred from the given public identifier. This association might e.g. be performed by means of a catalog file resolving the FPI to a system identifier. [6] The SYSTEM keyword can (except in XML) also be used without a system identifier following, indicating that a DTD exists but should be inferred from the document type name. [3] :378
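The catalog lookup mentioned above can be sketched as a simple table from public identifiers to system identifiers. This is a minimal illustration, not the SGML or XML catalog file format; the table `CATALOG` and the function `resolve_dtd` are hypothetical names, and the entries reuse identifier pairs that appear in this article.

```python
# Hypothetical in-memory catalog: public identifier -> system identifier.
CATALOG = {
    "-//W3C//DTD HTML 4.01//EN":
        "http://www.w3.org/TR/html4/strict.dtd",
    "-//W3C//DTD XHTML 1.0 Strict//EN":
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    "-//W3C//DTD XHTML 1.1//EN":
        "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
}


def resolve_dtd(public_id=None, system_id=None, catalog=CATALOG):
    """Prefer a catalog entry for the public identifier, if one exists;
    otherwise fall back to the system identifier given in the DOCTYPE."""
    if public_id is not None and public_id in catalog:
        return catalog[public_id]
    return system_id


# An SGML-style declaration with a public identifier only:
print(resolve_dtd("-//W3C//DTD HTML 4.01//EN"))
```

If neither a catalog entry nor a system identifier is available, the sketch returns `None`, leaving it to the caller to decide how to proceed.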

Internal subset

The last, optional, part of a DOCTYPE declaration is surrounded by literal square brackets ([]) and is called an internal subset. It can be used to add or override declarations, such as those for entities, elements and attributes. [7] It is possible, but uncommon, to include the entire DTD in-line in the document, within the internal subset, rather than referencing it from an external file. [3] :402 Conversely, the internal subset is sometimes forbidden within simple SGML profiles, notably those for basic HTML parsers that don't implement a full SGML parser.

If both an internal DTD subset and an external identifier are included in a DOCTYPE declaration, the internal subset is processed first, and the external DTD subset is treated as if it were transcluded at the end of the internal subset. Since earlier definitions take precedence over later definitions in a DTD, this allows the internal subset to override definitions in the external subset. [3] :402–403
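The precedence rule can be observed with a small Python sketch that uses only an internal subset, declaring the same entity twice; under the rule above, the earlier declaration is the one expected to take effect. The document and the entity name `greeting` are made up for illustration, and no external DTD is involved (the standard-library parser used here does not fetch one).

```python
import xml.etree.ElementTree as ET

# The same entity is declared twice inside the internal subset.
DOC = """<!DOCTYPE note [
  <!ENTITY greeting "first definition">
  <!ENTITY greeting "second definition">
]>
<note>&greeting;</note>"""

root = ET.fromstring(DOC)
# The text of <note> is the expansion of &greeting; chosen by the parser.
print(root.text)
```

Because an external subset is treated as if it followed the internal one, the same mechanism is what lets a declaration in the internal subset override one in the external DTD.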

Example

The first line of a World Wide Web page may read as follows:

<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="ar" dir="ltr" xmlns="http://www.w3.org/1999/xhtml">

This document type declaration for XHTML includes by reference a DTD, whose public identifier is -//W3C//DTD XHTML 1.0 Transitional//EN and whose system identifier is http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd. An entity resolver may use either identifier for locating the referenced external entity. No internal subset has been indicated in this example or the next ones. The root element is declared to be html and, therefore, it is the first tag to be opened after the end of the doctype declaration in this example and the next ones, too. The HTML tag is not part of the doctype declaration but has been included in the examples for orientation purposes.

Common DTDs

Some common DTDs have been put into lists. W3C has produced a list of DTDs commonly used on the web, which contains the "bare" HTML5 DTD, older XHTML/HTML DTDs, DTDs of common embedded XML-based formats like MathML and SVG, as well as "compound" documents that combine those formats. [8] Both W3C HTML5 and its corresponding WHATWG version recommend that browsers only accept XHTML DTDs with certain FPIs and prefer internal logic over fetching external DTD files. The specification further defines an "internal DTD" for XHTML, which is merely a list of HTML entity names. [9] :§13.2

HTML 4.01 DTDs

The Strict DTD does not allow presentational markup, on the argument that Cascading Style Sheets should be used for that instead. This is how a DOCTYPE referencing the Strict DTD looks:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>

The Transitional DTD allows some older elements and attributes that have been deprecated:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd">
<html>

If frames are used, the Frameset DTD must be used instead, like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
   "http://www.w3.org/TR/html4/frameset.dtd">
<html>

XHTML 1.0 DTDs

XHTML's DTDs are also Strict, Transitional and Frameset.

XHTML Strict DTD. No deprecated tags are supported, and the code must be written correctly according to the XML specification.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

XHTML Transitional DTD is like the XHTML Strict DTD, but deprecated tags are allowed.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

XHTML Frameset DTD is the only XHTML DTD that supports framesets. A DOCTYPE referencing it is shown below.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

XHTML 1.1 DTD

XHTML 1.1 is the most current finalized revision of XHTML, introducing support for XHTML Modularization. XHTML 1.1 has the stringency of XHTML 1.0 Strict.

<!DOCTYPE html PUBLIC  "-//W3C//DTD XHTML 1.1//EN"  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

XHTML Basic DTDs

XHTML Basic 1.0

<!DOCTYPE html PUBLIC  "-//W3C//DTD XHTML Basic 1.0//EN"  "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">

XHTML Basic 1.1

<!DOCTYPE html PUBLIC  "-//W3C//DTD XHTML Basic 1.1//EN"  "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd">

HTML5 DTD-less DOCTYPE

HTML5 uses a DOCTYPE declaration which is very short, due to its lack of references to a DTD in the form of a URL or FPI. All it contains is the tag name of the root element of the document, HTML. [10] In the words of the specification draft itself:

<!DOCTYPE html>, case-insensitively.

With the exception of the lack of a URI or the FPI string (the FPI string is treated case-sensitively by validators), this format (a case-insensitive match of the string !DOCTYPE HTML) is the same as found in the syntax of the SGML-based HTML 4.01 DOCTYPE. In both HTML4 and HTML5, the formal syntax is defined in upper-case letters, even though lower case and mixtures of lower and upper case are also treated as valid.

In XHTML5 the DOCTYPE must be a case-sensitive match of the string "<!DOCTYPE html>". This is because in XHTML syntax all HTML element names are required to be in lower case, including the root element referenced inside the HTML5 DOCTYPE.
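The difference between the two matching rules can be sketched in Python as follows. The checks are simplified assumptions: they consider only the short `<!DOCTYPE html>` form, ignore any other DOCTYPE variants the specification may permit, and use illustrative function names.

```python
import re

# HTML syntax: "DOCTYPE" and "html" are matched case-insensitively,
# with whitespace between them and optional whitespace before ">".
HTML5_DOCTYPE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def is_html5_doctype(text):
    """Case-insensitive check for the text/html serialization."""
    return HTML5_DOCTYPE.fullmatch(text.strip()) is not None


def is_xhtml5_doctype(text):
    """Case-sensitive check against the exact string, as stated for XHTML5."""
    return text.strip() == "<!DOCTYPE html>"


for candidate in ("<!DOCTYPE html>", "<!doctype HTML>"):
    print(candidate, is_html5_doctype(candidate), is_xhtml5_doctype(candidate))
```

A declaration such as `<!doctype HTML>` therefore passes the HTML check but fails the XHTML5 one.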

The DOCTYPE is optional in XHTML5 and may simply be omitted. [11] However, if the markup is to be processed as both XML and HTML, a DOCTYPE should be used. [12]

References

  1. HTML2 HTML3 HTML4
  2. "The HTML syntax ― HTML5". Retrieved 2011-06-05.
  3. Goldfarb, Charles F. (1990). The SGML Handbook. Oxford: Clarendon Press. ISBN 0-19-853737-9.
  4. Walsh, Norman (2001-08-06). "XML Catalogs". The Organization for the Advancement of Structured Information Standards (OASIS).
  5. Clark, James (1997-12-15). "Comparison of SGML and XML". W3C. NOTE-sgml-xml-971215.
  6. "The DOCTYPE Declaration". Archived from the original on 2011-08-14. Retrieved 2011-09-09.
  7. "DOCTYPE Declaration". msdn.microsoft.com.
  8. "W3C QA - Recommended list of Doctype declarations you can use in your Web document". www.w3.org. Retrieved 22 March 2019.
  9. "HTML Standard". html.spec.whatwg.org. Retrieved 22 March 2019.
  10. "The HTML syntax ― HTML5". Web Hypertext Application Technology Working Group. Retrieved 2011-06-05. 3. A string that is an ASCII case-insensitive match for the string "DOCTYPE". 5. A string that is an ASCII case-insensitive match for the string "HTML".
  11. "The XHTML syntax ― HTML5". Web Hypertext Application Technology Working Group. Archived from the original on 2012-06-18. Retrieved 2009-09-01.
  12. "Polyglot Markup: HTML-Compatible XHTML Documents". World Wide Web Consortium. Retrieved 2012-01-17.