Overhead (engineering)

Last updated

In engineering, some methods or components make special demands on the system. The extra design features necessary to meet these demands are called overhead. For instance, in electrical engineering, a particular integrated circuit might draw large current, requiring a robust power delivery circuit and a heat-dissipation mechanism.

Example

An example from software engineering is the encoding of information and data. The date and time "2011-07-12 07:18:47" can be expressed as Unix time with the 32-bit signed integer 1310447927, consuming only 4 bytes. Represented as ISO 8601 formatted UTF-8 encoded string 2011-07-12 07:18:47 the date would consume 19 bytes, a size overhead of 375% over the binary integer representation. As XML this date can be written as follows with an overhead of 218 characters, while adding the semantic context that it is a CHANGEDATE with index 1.

<?xml version="1.0" encoding="UTF-8"?><DATETIMEqualifier="CHANGEDATE"index="1"><YEAR>2011</YEAR><MONTH>07</MONTH><DAY>12</DAY><HOUR>07</HOUR><MINUTE>18</MINUTE><SECOND>47</SECOND></DATETIME>

The 349 bytes resulting from the UTF-8 encoded XML correspond to a size overhead of 8625% over the original integer representation.

See also

Related Research Articles

Binary-coded decimal

In computing and electronic systems, binary-coded decimal (BCD) is a class of binary encodings of decimal numbers where each digit is represented by a fixed number of bits, usually four or eight. Sometimes, special bit patterns are used for a sign or other indications.

HTML has been in use since 1991, but HTML 4.0 was the first standardized version where international characters were given reasonably complete treatment. When an HTML document includes special characters outside the range of seven-bit ASCII, two goals are worth considering: the information's integrity, and universal browser display.

Unicode Character encoding standard

Unicode is an information technology (IT) standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard is maintained by the Unicode Consortium, and as of March 2020, there is a total of 143,859 characters, with Unicode 13.0 covering 154 modern and historic scripts, as well as multiple symbol sets and emoji. The character repertoire of the Unicode Standard is synchronized with ISO/IEC 10646, and both are code-for-code identical.

Web pages authored using hypertext markup language (HTML) may contain multilingual text represented with the Unicode universal character set. Key to the relationship between Unicode and HTML is the relationship between the "document character set" which defines the set of characters that may be present in a HTML document and assigns numbers to them and the "external character encoding" or "charset" used to encode a given document as a sequence of bytes.

UTF-8 is a variable-width character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from UnicodeTransformation Format – 8-bit.

The byte order mark (BOM) is a particular usage of the special Unicode character, U+FEFFBYTE ORDER MARK, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:

UTF-32 (32-bit Unicode Transformation Format) is a fixed-length encoding used to encode Unicode code points that uses exactly 32 bits (four bytes) per code point (but a number of leading bits must be zero as there are far fewer than 232 Unicode code points). UTF-32 is a fixed-length encoding, in contrast to all other Unicode transformation formats, which are variable-length encodings. Each 32-bit value in UTF-32 represents one Unicode code point and is exactly equal to that code point's numerical value.

Unary coding, or the unary numeral system and also sometimes called thermometer code, is an entropy encoding that represents a natural number, n, with n ones followed by a zero or with n − 1 ones followed by a zero. For example 5 is represented as 111110 or 11110. Some representations use n or n − 1 zeros followed by a one. The ones and zeros are interchangeable without loss of generality. Unary coding is both a prefix-free code and a self-synchronizing code.

UTF-7 is an obsolete variable-length character encoding for representing Unicode text using a stream of ASCII characters. It was originally intended to provide a means of encoding Unicode text for use in Internet E-mail messages that was more efficient than the combination of UTF-8 with quoted-printable.

In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character. Alternative names are C string, which refers to the C programming language and ASCIIZ.

A Java class file is a file containing Java bytecode that can be executed on the Java Virtual Machine (JVM). A Java class file is usually produced by a Java compiler from Java programming language source files containing Java classes. If a source file has more than one class, each class is compiled into a separate class file.

This article compares Unicode encodings. Two situations are considered: 8-bit-clean environments, and environments that forbid use of byte values that have the high bit set. Originally such prohibitions were to allow for links that used only seven data bits, but they remain in the standards and so software must generate messages that comply with the restrictions. Standard Compression Scheme for Unicode and Binary Ordered Compression for Unicode are excluded from the comparison tables because it is difficult to simply quantify their size.

In computer science, overhead is any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to perform a specific task. It is a special case of engineering overhead. Overhead can be a deciding factor in software design, with regard to structure, error correction, and feature inclusion. Examples of computing overhead may be found in functional programming, data transfer, and data structures.

A Canonical S-expression is a binary encoding form of a subset of general S-expression. It was designed for use in SPKI to retain the power of S-expressions and ensure canonical form for applications such as digital signatures while achieving the compactness of a binary form and maximizing the speed of parsing.

Action Message Format (AMF) is a binary format used to serialize object graphs such as ActionScript objects and XML, or send messages between an Adobe Flash client and a remote service, usually a Flash Media Server or third party alternatives. The Actionscript 3 language provides classes for encoding and decoding from the AMF format.

BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for "Binary JSON". It is a binary form for representing simple or complex data structures including associative arrays, integer indexed arrays, and a suite of fundamental scalar types. BSON originated in 2009 at MongoDB. Several scalar data types are of specific interest to MongoDB and the format is used both as a data storage and network transfer format for the MongoDB database, but it can be used independently outside of MongoDB. Implementations are available in a variety of languages such as C, C++, C#, D, Delphi, Erlang, Go, Haskell, Java, JavaScript, Julia, Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.

LEB128 or Little Endian Base 128 is a form of variable-length code compression used to store an arbitrarily large integer in a small number of bytes. LEB128 is used in the DWARF debug file format and the WebAssembly binary encoding for all integer literals.

Virtual Token Descriptor for eXtensible Markup Language (VTD-XML) refers to a collection of cross-platform XML processing technologies centered on a non-extractive XML, "document-centric" parsing technique called Virtual Token Descriptor (VTD). Depending on the perspective, VTD-XML can be viewed as one of the following:

A character literal is a type of literal in programming for the representation of a single character's value within the source code of a computer program.

C++ is a general-purpose programming language with imperative, object-oriented, and generic programming features. Many criticisms have been leveled at C++'s design by well-known software developers including Linus Torvalds, Richard Stallman, Joshua Bloch, Rob Pike, Ken Thompson, and Donald Knuth.