X.690

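X.690 is an ITU-T standard specifying several ASN.1 encoding formats: the Basic Encoding Rules (BER), the Canonical Encoding Rules (CER), and the Distinguished Encoding Rules (DER).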

The Basic Encoding Rules (BER) were the original rules laid out by the ASN.1 standard for encoding data into a binary format. The rules, collectively referred to as a transfer syntax in ASN.1 parlance, specify the exact octets (8-bit bytes) used to encode data.

X.680 defines a syntax for declaring data types, for example: booleans, numbers, strings, and compound structures. Each type definition also includes an identifying number. X.680 defines several primitive data types, for example: BooleanType, IntegerType, OctetStringType. (ASN.1 also provides for constructed types built from other types.) Types are associated with a class. For example, the primitive types are part of the universal class. The three other classes (application, private, and context-specific) are essentially different scopes to support customization for specific applications. Combined, the class and type form a tag, which therefore corresponds to a unique data definition. X.690 includes rules for encoding those tags, data values (content), and the lengths of that encoded data.

BER, along with two subsets of BER (the Canonical Encoding Rules and the Distinguished Encoding Rules), is defined by the ITU-T's X.690 standards document, which is part of the ASN.1 document series.

BER encoding

The Basic Encoding Rules specify, in general terms, a partially self-describing and self-delimiting protocol for encoding ASN.1 data structures. Each data element is encoded as a type identifier, a length description, the actual data elements, and, where necessary, an end-of-content marker. Encodings of this kind are commonly called type–length–value (TLV) encodings; in BER's own terminology the three parts are identifier, length, and contents.

This type of format would allow a receiver to decode the ASN.1 information from an incomplete stream, without requiring any pre-knowledge of the size, content, or semantic meaning of the data, though some specifics of the protocol would need to be provided or reverse-engineered from representative samples of traffic or software. [1]

Data encoding consists of three or four components, in the following order:

Component                 Role
Identifier octets         Type
Length octets             Length
Contents octets           Value
End-of-Contents octets    (only if indefinite form)

Note that if a Length is zero, then there are no Contents octets, e.g. the NULL type. The End-of-Contents octets are only used for the indefinite form of Length.
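As an illustration of the identifier-length-contents layout, the following Python sketch splits a byte string into its first element and the octets that follow it. It is a simplification that assumes a single identifier octet and a short-form length; the function name split_tlv is hypothetical and does not come from X.690.

```python
def split_tlv(data: bytes):
    """Split off the first element of `data`.

    Sketch only: assumes one identifier octet and a short-form length.
    Returns (identifier octet, length, contents octets, remaining octets).
    """
    if len(data) < 2:
        raise ValueError("need at least an identifier octet and a length octet")
    identifier = data[0]
    length = data[1]
    if length & 0x80:
        raise ValueError("long or indefinite length is not handled in this sketch")
    contents = data[2:2 + length]
    if len(contents) != length:
        raise ValueError("contents octets are truncated")
    return identifier, length, contents, data[2 + length:]


# A BOOLEAN (tag 1) with one contents octet, followed by a NULL (tag 5)
# that has length zero and therefore no contents octets.
stream = bytes([0x01, 0x01, 0xFF, 0x05, 0x00])
boolean_element = split_tlv(stream)
null_element = split_tlv(boolean_element[3])
```

Here null_element carries an empty contents field, matching the note above about zero-length elements.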

Identifier octets

The BER identifier octets encode the ASN.1 tags. The list of Universal Class tags can be found at Rec. ITU-T X.680, clause 8, table 1. [2] The following tags are native to ASN.1:

Types, universal class
Name                      Permitted construction  Tag (decimal)  Tag (hexadecimal)
End-of-Content (EOC)      Primitive               0              0
BOOLEAN                   Primitive               1              1
INTEGER                   Primitive               2              2
BIT STRING                Both                    3              3
OCTET STRING              Both                    4              4
NULL                      Primitive               5              5
OBJECT IDENTIFIER         Primitive               6              6
Object Descriptor         Both                    7              7
EXTERNAL                  Constructed             8              8
REAL (float)              Primitive               9              9
ENUMERATED                Primitive               10             A
EMBEDDED PDV              Constructed             11             B
UTF8String                Both                    12             C
RELATIVE-OID              Primitive               13             D
TIME                      Primitive               14             E
Reserved                                          15             F
SEQUENCE and SEQUENCE OF  Constructed             16             10
SET and SET OF            Constructed             17             11
NumericString             Both                    18             12
PrintableString           Both                    19             13
T61String                 Both                    20             14
VideotexString            Both                    21             15
IA5String                 Both                    22             16
UTCTime                   Both                    23             17
GeneralizedTime           Both                    24             18
GraphicString             Both                    25             19
VisibleString             Both                    26             1A
GeneralString             Both                    27             1B
UniversalString           Both                    28             1C
CHARACTER STRING          Constructed             29             1D
BMPString                 Both                    30             1E
DATE                      Primitive               31             1F
TIME-OF-DAY               Primitive               32             20
DATE-TIME                 Primitive               33             21
DURATION                  Primitive               34             22
OID-IRI                   Primitive               35             23
RELATIVE-OID-IRI          Primitive               36             24

Encoding

The identifier octets encode the ASN.1 tag's class number and type number. They also encode whether the contents octets represent a constructed or primitive value. The identifier spans one or more octets.

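Identifier octet layout
Octet 1
  Bits 8–7    Tag class
  Bit 6       P/C (primitive or constructed)
  Bits 5–1    Tag type if it is 0–30 (decimal); the value 31 (decimal) signals the long form
Octets 2 ... n (only if tag type > 30 decimal)
  Bit 8       1 = more octets follow
  Bits 7–1    7 bits of the tag type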

In the initial octet, bits 7–8 encode the tag's class, bit 6 encodes whether the type is primitive or constructed, and bits 1–5 encode the tag's type. The following values are possible:

Class             Value  Description
Universal         0      The type is native to ASN.1
Application       1      The type is only valid for one specific application
Context-specific  2      Meaning of this type depends on the context (such as within a sequence, set or choice)
Private           3      Defined in private specifications

P/C               Value  Description
Primitive (P)     0      The contents octets directly encode the value.
Constructed (C)   1      The contents octets contain 0, 1, or more encodings.

If the tag's type fits in the 5 bits (0–30 decimal), the identifier spans just one octet: the short form. If the tag's type is too large for the 5-bit field (greater than 30 decimal), it has to be encoded in further octets: the long form.

In the long form, the initial octet encodes the class and primitive/constructed as before, and bits 1–5 are all 1. The tag number is encoded in the following octets, where bit 8 of each is 1 if more octets follow, and bits 1–7 carry 7 bits of the tag number. Concatenated in big-endian order, these 7-bit groups form the tag number. The fewest possible following octets should be used; that is, bits 1–7 of the first following octet should not all be 0.
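The short and long identifier forms described above can be sketched in Python as follows. The function names encode_identifier and decode_identifier are hypothetical, and the decoder does no validation beyond what is needed to read the octets (for example, it does not reject a non-minimal long form).

```python
def encode_identifier(tag_class: int, constructed: bool, tag_number: int) -> bytes:
    """Encode class, P/C bit and tag number as BER identifier octets."""
    if not 0 <= tag_class <= 3:
        raise ValueError("tag class must be 0..3")
    if tag_number < 0:
        raise ValueError("tag number must be non-negative")
    # Bits 8-7: class, bit 6: primitive (0) or constructed (1).
    first = (tag_class << 6) | (0x20 if constructed else 0x00)
    if tag_number <= 30:
        return bytes([first | tag_number])          # short form
    # Long form: split the tag number into 7-bit groups, most significant first.
    groups = []
    remaining = tag_number
    while True:
        groups.append(remaining & 0x7F)
        remaining >>= 7
        if remaining == 0:
            break
    groups.reverse()
    # Bit 8 is set on every following octet except the last.
    following = [g | 0x80 for g in groups[:-1]] + [groups[-1]]
    return bytes([first | 0x1F] + following)


def decode_identifier(data: bytes):
    """Return (class, constructed, tag number, number of octets consumed)."""
    first = data[0]
    tag_class = first >> 6
    constructed = bool(first & 0x20)
    number = first & 0x1F
    used = 1
    if number == 0x1F:                              # long form
        number = 0
        while True:
            octet = data[used]
            used += 1
            number = (number << 7) | (octet & 0x7F)
            if not octet & 0x80:
                break
    return tag_class, constructed, number, used


sequence_id = encode_identifier(0, True, 16)        # universal, constructed, SEQUENCE
long_form_id = encode_identifier(2, True, 200)      # context-specific tag 200
```

With these inputs, sequence_id is the single octet 0x30, while long_form_id needs three octets because 200 does not fit in the 5-bit field.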

Length octets

There are two forms of the length octets: The definite form and the indefinite form.

First length octet
Form             Bit 8  Bits 7–1
Definite, short  0      Length (0–127)
Indefinite       1      0
Definite, long   1      Number of following octets (1–126)
Reserved         1      127

Definite form

This encodes the number of contents octets. It is always used if the encoding is primitive, or if it is constructed and the data are immediately available. There is a short form and a long form, which can encode different ranges of lengths. The length is encoded as an unsigned integer whose least significant bit is bit 1, the rightmost bit.

The short form consists of a single octet in which bit 8 is 0, and bits 1–7 encode the length (which may be 0) as a number of octets.

The long form consists of 1 initial octet followed by 1 or more subsequent octets, containing the length. In the initial octet, bit 8 is 1, and bits 1–7 (excluding the values 0 and 127) encode the number of octets that follow. [1] The following octets encode, as big-endian, the length (which may be 0) as a number of octets.

Long form example, length 435
Octet 1     Octet 2     Octet 3
10000010    00000001    10110011
Octet 1: long form, 2 length octets follow
Octets 2–3: 110110011 (binary) = 435 (decimal) contents octets
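A minimal Python sketch of the definite form follows, choosing the short form where possible and otherwise the shortest long form. The function names are hypothetical; decode_length returns None for the indefinite marker rather than handling it.

```python
def encode_definite_length(n: int) -> bytes:
    """Encode a contents length in the shortest definite form."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n <= 127:
        return bytes([n])                           # short form: bit 8 is 0
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(body) > 126:
        raise ValueError("length needs more than 126 octets")
    return bytes([0x80 | len(body)]) + body         # long form: bit 8 is 1


def decode_length(data: bytes):
    """Return (length or None for indefinite, number of length octets used)."""
    first = data[0]
    if first < 0x80:
        return first, 1                             # short form
    count = first & 0x7F
    if count == 0:
        return None, 1                              # indefinite form marker
    if count == 127:
        raise ValueError("reserved value in first length octet")
    body = data[1:1 + count]
    if len(body) != count:
        raise ValueError("length octets are truncated")
    return int.from_bytes(body, "big"), 1 + count


example = encode_definite_length(435)               # the three octets 82 01 B3
```

The value of example reproduces the three octets of the length 435 example above.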

Indefinite form

This does not encode the length at all; instead, it signals that the contents octets are terminated by end-of-contents marker octets. It applies to constructed types and is typically used if the content is not immediately available at encoding time.

It consists of a single octet, in which bit 8 is 1, and bits 1–7 are 0. Then, two end-of-contents octets must terminate the content octets.
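As a sketch of the indefinite form, the following Python function wraps already-encoded child elements in a constructed element whose length octet is 0x80 and which ends with two end-of-contents octets. The function name wrap_indefinite is hypothetical, and the example assumes a single identifier octet.

```python
EOC = b"\x00\x00"   # End-of-Contents: tag 0 followed by length 0


def wrap_indefinite(identifier_octet: int, *encoded_children: bytes) -> bytes:
    """Build a constructed element in the indefinite length form."""
    if not identifier_octet & 0x20:
        raise ValueError("the indefinite form applies to constructed encodings only")
    return bytes([identifier_octet, 0x80]) + b"".join(encoded_children) + EOC


# A constructed OCTET STRING (0x24) made of two primitive OCTET STRING
# segments, each of which uses the definite short form.
segment_1 = bytes([0x04, 0x02, 0xAB, 0xCD])
segment_2 = bytes([0x04, 0x01, 0xEF])
encoded = wrap_indefinite(0x24, segment_1, segment_2)
```

The encoder can emit segment_1 before segment_2 exists, which is the case the indefinite form is meant for.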

Contents octets

The contents octets encode the element data value. [1]

Note that there may be no contents octets (hence, the element has a length of 0) if only the existence of the ASN.1 object, or its emptiness, is to be noted. For example, this is the case for an ASN.1 NULL value.

CER encoding

CER (Canonical Encoding Rules) is a restricted variant of BER for producing unequivocal transfer syntax for data structures described by ASN.1. Whereas BER gives choices as to how data values may be encoded, CER (together with DER) selects just one encoding from those allowed by the basic encoding rules, eliminating the rest of the options. CER is useful when the encodings must be preserved; e.g., in security exchanges.

DER encoding

DER (Distinguished Encoding Rules) is a restricted variant of BER for producing unequivocal transfer syntax for data structures described by ASN.1. Like CER, DER encodings are valid BER encodings. DER is the same as BER with all but one of the sender's options removed.

DER is a subset of BER providing for exactly one way to encode an ASN.1 value. DER is intended for situations when a unique encoding is needed, such as in cryptography, and ensures that a data structure that needs to be digitally signed produces a unique serialized representation. DER can be considered a canonical form of BER. For example, in BER a Boolean value of true can be encoded as any of 255 non-zero byte values, while in DER there is one way to encode a boolean value of true.
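The Boolean example can be sketched in Python as below. The function name decode_boolean is hypothetical. The sketch relies on the X.690 rule that FALSE is the single octet 0x00 and that DER restricts TRUE to the single octet 0xFF; that specific value comes from the standard rather than from the text above.

```python
def decode_boolean(contents: bytes, der: bool = False) -> bool:
    """Decode the contents octets of a BOOLEAN.

    With der=False any non-zero octet is accepted as TRUE (BER).
    With der=True only 0x00 and 0xFF are accepted.
    """
    if len(contents) != 1:
        raise ValueError("BOOLEAN contents must be exactly one octet")
    octet = contents[0]
    if der and octet not in (0x00, 0xFF):
        raise ValueError("not a DER encoding of BOOLEAN")
    return octet != 0x00


ber_true = decode_boolean(b"\x01")             # accepted under BER
der_true = decode_boolean(b"\xff", der=True)   # the only DER form of TRUE
```

Calling decode_boolean(b"\x01", der=True) raises ValueError, because that octet is a valid BER encoding of true but not the DER one.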

The most significant DER encoding constraints are:

  1. Length encoding must use the definite form
    • Additionally, the shortest possible length encoding must be used
  2. BIT STRING, OCTET STRING, and restricted character string values must use the primitive encoding
  3. Elements of a Set are encoded in sorted order, based on their tag value
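The first constraint can be checked mechanically. The following Python sketch tests whether a sequence of length octets is the definite form in its shortest possible encoding; the function name is_der_length is hypothetical, and the check covers the length octets only, not the other constraints in the list.

```python
def is_der_length(octets: bytes) -> bool:
    """Return True if `octets` is a shortest-possible definite length."""
    if not octets:
        return False
    first = octets[0]
    if first < 0x80:
        return len(octets) == 1                # short form is always minimal
    count = first & 0x7F
    if count == 0 or count == 127:
        return False                           # indefinite form or reserved value
    body = octets[1:]
    if len(body) != count:
        return False                           # wrong number of following octets
    if body[0] == 0x00:
        return False                           # leading zero octet is not minimal
    # Lengths up to 127 must use the short form instead.
    return int.from_bytes(body, "big") > 127
```

For example, 0x81 0x05 is a legal BER encoding of the length 5, but this check rejects it because the short form 0x05 is shorter.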

DER is widely used for digital certificates such as X.509.

BER, CER and DER compared

The key difference between the BER format and the CER or DER formats is the flexibility provided by the Basic Encoding Rules. BER, as explained above, is the basic set of encoding rules given by ITU-T X.690 for the transfer of ASN.1 data structures. It gives senders clear rules for encoding data structures they want to send, but also leaves senders some encoding choices. As stated in the X.690 standard, "Alternative encodings are permitted by the basic encoding rules as a sender's option. Receivers who claim conformance to the basic encoding rules shall support all alternatives". [1]

A receiver must be prepared to accept all legal encodings in order to legitimately claim BER-compliance. By contrast, both CER and DER restrict the available length specifications to a single option. As such, CER and DER are restricted forms of BER and serve to disambiguate the BER standard.

CER and DER differ in the set of restrictions that they place on the sender. The basic difference between CER and DER is that DER uses the definite length form, while CER uses the indefinite length form in some precisely defined cases. That is, DER always has leading length information, while CER in those cases uses end-of-contents octets instead of providing the length of the encoded data. Because of this, CER requires less metadata for large encoded values, while DER requires less for small ones.
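The metadata trade-off can be made concrete with a little arithmetic. The Python sketch below counts only the length-related octets of a single element: the definite length octets on one side, and the 0x80 marker plus two end-of-contents octets on the other. It is a simplification that ignores any other CER rules affecting the encoded size, and the names used are hypothetical.

```python
def definite_overhead(n: int) -> int:
    """Number of length octets needed for n contents octets (shortest form)."""
    if n <= 127:
        return 1
    return 1 + (n.bit_length() + 7) // 8


# One 0x80 length octet plus two end-of-contents octets.
INDEFINITE_OVERHEAD = 3

overheads = {n: definite_overhead(n) for n in (100, 435, 65_535, 65_536, 2**24)}
```

Under these assumptions the definite form is cheaper below 256 contents octets, the two forms tie from 256 to 65,535, and the indefinite form is cheaper from 65,536 upwards.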

In order to facilitate a choice between encoding rules, the X.690 standards document provides the following guidance:

The distinguished encoding rules is more suitable than the canonical encoding rules if the encoded value is small enough to fit into the available memory and there is a need to rapidly skip over some nested values. The canonical encoding rules is more suitable than the distinguished encoding rules if there is a need to encode values that are so large that they cannot readily fit into the available memory or it is necessary to encode and transmit a part of a value before the entire value is available. The basic encoding rules is more suitable than the canonical or distinguished encoding rules if the encoding contains a set value or set-of value and there is no need for the restrictions that the canonical and distinguished encoding rules impose.

Criticisms of BER encoding

There is a common perception of BER as being "inefficient" compared to alternative encoding rules. It has been argued by some that this perception is primarily due to poor implementations, not necessarily any inherent flaw in the encoding rules. [3] These implementations rely on the flexibility that BER provides to use encoding logic that is easier to implement, but results in a larger encoded data stream than necessary. Whether this inefficiency is reality or perception, it has led to a number of alternative encoding schemes, such as the Packed Encoding Rules, which attempt to improve on BER performance and size.

Other alternative formatting rules, which still provide the flexibility of BER but use alternative encoding schemes, are also being developed. The most popular of these are XML-based alternatives, such as the XML Encoding Rules and ASN.1 SOAP. [4] In addition, there is a standard mapping to convert an XML Schema to an ASN.1 schema, which can then be encoded using BER. [5]

Usage

Despite its perceived problems, BER is a popular format for transmitting data, particularly in systems with different native data encodings.

By comparison, the stricter DER encoding is widely used to transfer digital certificates such as X.509.

References

  1. Information technology – ASN.1 encoding rules: Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER), ITU-T X.690, 07/2002
  2. "ITU-T Recommendation database".
  3. Lin, Huai-An. “Estimation of the Optimal Performance of ASN.1/BER Transfer Syntax”. ACM Computer Communication Review. July 93, 45 - 58.
  4. ITU-T Rec. X.892, ISO/IEC 24824-2
  5. ITU-T X.694, ISO/IEC 8825-5