XML Signature

Last updated

XML Signature (also called XMLDSig, XML-DSig, XML-Sig) defines an XML syntax for digital signatures and is defined in the W3C recommendation XML Signature Syntax and Processing. Functionally, it has much in common with PKCS #7 but is more extensible and geared towards signing XML documents. It is used by various Web technologies such as SOAP, SAML, and others.

Contents

XML signatures can be used to sign dataa resourceof any type, typically XML documents, but anything that is accessible via a URL can be signed. An XML signature used to sign a resource outside its containing XML document is called a detached signature; if it is used to sign some part of its containing document, it is called an enveloped signature; [1] if it contains the signed data within itself it is called an enveloping signature. [2]

Structure

An XML Signature consists of a Signature element in the http://www.w3.org/2000/09/xmldsig# namespace. The basic structure is as follows:

<Signature><SignedInfo><CanonicalizationMethod/><SignatureMethod/><Reference><Transforms/><DigestMethod/><DigestValue/></Reference><Reference/>etc. </SignedInfo><SignatureValue/><KeyInfo/><Object/></Signature>

Validation and security considerations

When validating an XML Signature, a procedure called Core Validation is followed.

  1. Reference Validation: Each Reference's digest is verified by retrieving the corresponding resource and applying any transforms and then the specified digest method to it. The result is compared to the recorded DigestValue; if they do not match, validation fails.
  2. Signature Validation: The SignedInfo element is serialized using the canonicalization method specified in CanonicalizationMethod, the key data is retrieved using KeyInfo or by other means, and the signature is verified using the method specified in SignatureMethod.

This procedure establishes whether the resources were really signed by the alleged party. However, because of the extensibility of the canonicalization and transform methods, the verifying party must also make sure that what was actually signed or digested is really what was present in the original data, in other words, that the algorithms used there can be trusted not to change the meaning of the signed data.

Because the signed document's structure can be tampered with leading to "signature wrapping" attacks, the validation process should also cover XML document structure. Signed element and signature element should be selected using absolute XPath expression, not getElementByName methods. [4]

XML canonicalization

The creation of XML Signatures is substantially more complex than the creation of an ordinary digital signature because a given XML Document (an "Infoset", in common usage among XML developers) may have more than one legal serialized representation. For example, whitespace inside an XML Element is not syntactically significant, so that <Elem > is syntactically identical to <Elem>.

Since the digital signature ensures data integrity, a single-byte difference would cause the signature to vary. Moreover, if an XML document is transferred from computer to computer, the line terminator may be changed from CR to LF to CR LF, etc. A program that digests and validates an XML document may later render the XML document in a different way, e.g. adding excess space between attribute definitions with an element definition, or using relative (vs. absolute) URLs, or by reordering namespace definitions. Canonical XML is especially important when an XML Signature refers to a remote document, which may be rendered in time-varying ways by an errant remote server.

To avoid these problems and guarantee that logically-identical XML documents give identical digital signatures, an XML canonicalization transform (frequently abbreviated C14n) is employed when signing XML documents (for signing the SignedInfo, a canonicalization is mandatory). These algorithms guarantee that semantically-identical documents produce exactly identical serialized representations.

Another complication arises because of the way that the default canonicalization algorithm handles namespace declarations; frequently a signed XML document needs to be embedded in another document; in this case the original canonicalization algorithm will not yield the same result as if the document is treated alone. For this reason, the so-called Exclusive Canonicalization, which serializes XML namespace declarations independently of the surrounding XML, was created.

Benefits

XML Signature is more flexible than other forms of digital signatures such as Pretty Good Privacy and Cryptographic Message Syntax, because it does not operate on binary data, but on the XML Infoset, allowing to work on subsets of the data (this is also possible with binary data in non-standard ways, for example encoding blocks of binary data in base64 ASCII), having various ways to bind the signature and signed information, and perform transformations. Another core concept is canonicalization, that is to sign only the "essence", eliminating meaningless differences like whitespace and line endings.

Issues

There are criticisms directed at the architecture of XML security in general, [5] and at the suitability of XML canonicalization in particular as a front end to signing and encrypting XML data due to its complexity, inherent processing requirement, and poor performance characteristics. [6] [7] [8] The argument is that performing XML canonicalization causes excessive latency that is simply too much to overcome for transactional, performance sensitive SOA applications.

These issues are being addressed in the XML Security Working Group. [9] [10]

Without proper policy and implementation [4] the use of XML Dsig in SOAP and WS-Security can lead to vulnerabilities, [11] such as XML signature wrapping. [12]

Applications

An example of applications of XML Signatures:

See also

Related Research Articles

A document type definition (DTD) is a specification file that contains set of markup declarations that define a document type for an SGML-family markup language. The DTD specification file can be used to validate documents.

A Uniform Resource Identifier (URI) is a unique sequence of characters that identifies a logical or physical resource used by web technologies. URIs may be used to identify anything, including real-world objects, such as people and places, concepts, or information resources such as web pages and books. Some URIs provide a means of locating and retrieving information resources on a network ; these are Uniform Resource Locators (URLs). A URL provides the location of the resource. A URI identifies the resource by name at the specified location or URL. Other URIs provide only a unique name, without a means of locating or retrieving the resource or information about it, these are Uniform Resource Names (URNs). The web technologies that use URIs are not limited to web browsers. URIs are used to identify anything described using the Resource Description Framework (RDF), for example, concepts that are part of an ontology defined using the Web Ontology Language (OWL), and people who are described using the Friend of a Friend vocabulary would each have an individual URI.

<span class="mw-page-title-main">XML</span> Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

<span class="mw-page-title-main">S-expression</span> Data serialization format

In computer programming, an S-expression is an expression in a like-named notation for nested list (tree-structured) data. S-expressions were invented for and popularized by the programming language Lisp, which uses them for source code as well as data.

XSD, a recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language (XML) document. It can be used by programmers to verify each piece of item content in a document, to assure it adheres to the description of the element it is placed in.

In cryptography, X.509 is an International Telecommunication Union (ITU) standard defining the format of public key certificates. X.509 certificates are used in many Internet protocols, including TLS/SSL, which is the basis for HTTPS, the secure protocol for browsing the web. They are also used in offline applications, like electronic signatures.

Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath languages. In many implementations, the Schematron XML is processed into XSLT code for deployment anywhere that XSLT can be used.

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

Security Assertion Markup Language is an open standard for exchanging authentication and authorization data between parties, in particular, between an identity provider and a service provider. SAML is an XML-based markup language for security assertions. SAML is also:

XPath 2.0 is a version of the XPath language defined by the World Wide Web Consortium, W3C. It became a recommendation on 23 January 2007. As a W3C Recommendation it was superseded by XPath 3.0 on 10 April 2014.

In computer science, canonicalization is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.

In computer hypertext, a URI fragment is a string of characters that refers to a resource that is subordinate to another, primary resource. The primary resource is identified by a Uniform Resource Identifier (URI), and the fragment identifier points to the subordinate resource.

XML Information Set is a W3C specification describing an abstract data model of an XML document in terms of a set of information items. The definitions in the XML Information Set specification are meant to be used in other specifications that need to refer to the information in a well-formed XML document.

XML documents have a hierarchical structure and can conceptually be interpreted as a tree structure, called an XML tree.

Canonical XML is a normal form of XML, intended to allow relatively simple comparison of pairs of XML documents for equivalence; for this purpose, the Canonical XML transformation removes non-meaningful differences between the documents. Any XML document can be converted to Canonical XML.

Security Assertion Markup Language 2.0 (SAML 2.0) is a version of the SAML standard for exchanging authentication and authorization identities between security domains. SAML 2.0 is an XML-based protocol that uses security tokens containing assertions to pass information about a principal between a SAML authority, named an Identity Provider, and a SAML consumer, named a Service Provider. SAML 2.0 enables web-based, cross-domain single sign-on (SSO), which helps reduce the administrative overhead of distributing multiple authentication tokens to the user. SAML 2.0 was ratified as an OASIS Standard in March 2005, replacing SAML 1.1. The critical aspects of SAML 2.0 are covered in detail in the official documents SAMLCore, SAMLBind, SAMLProf, and SAMLMeta.

XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) in 1999, and can be used to compute values from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.

Content Assembly Mechanism (CAM) is an XML-based standard for creating and managing information exchanges that are interoperable and deterministic descriptions of machine-processable information content flows into and out of XML structures. CAM is a product of the OASIS Content Assembly Technical Committee.

Virtual Token Descriptor for eXtensible Markup Language (VTD-XML) refers to a collection of cross-platform XML processing technologies centered on a non-extractive XML, "document-centric" parsing technique called Virtual Token Descriptor (VTD). Depending on the perspective, VTD-XML can be viewed as one of the following:

References

  1. "XML Signature Syntax and Processing Version 1.1".
  2. "XML Signature Syntax and Processing Version 1.1".
  3. XML-Signature XPath Filter 2.0
  4. 1 2 Pawel Krawczyk (2013). "Secure SAML validation to prevent XML signature wrapping attacks".
  5. "Why XML Security is Broken".
  6. Performance of Web Services Security
  7. Performance Comparison of Security Mechanisms for Grid Services
  8. Zhang, Jimmy (January 9, 2007). "Accelerate WSS applications with VTD-XML". JavaWorld . Retrieved 2020-07-24.
  9. W3C Workshop on Next Steps for XML Signature and XML Encryption, 2007
  10. "XML Security 2.0 Requirements and Design Considerations".
  11. "XML Signature Element Wrapping Attacks and Countermeasures" (PDF). IBM Research Division. Retrieved 2023-09-07.
  12. Juraj Somorovsky; Andreas Mayer; Jorg Schwenk; Marco Kampmann; Meiko Jensen (2012). "On Breaking SAML: Be Whoever You Want to Be" (PDF).
  13. "SBR Assurance" . Retrieved 2023-09-07.