This article needs additional citations for verification .(February 2018) |
Overhead in computer systems consists of shared functions that benefit all users or processes but are not directly attributable to any specific task. It is thus similar to overhead in organizations. Computer system overhead shows up as slower processing, less memory, less storage capacity, less network bandwidth, or bigger latency than would be expected from reading the system specifications. [1] It is a special case of engineering overhead. Overhead can be a deciding factor in software design, with regard to structure, error correction, and feature inclusion. Examples of computing overhead may be found in object-oriented programming (OOP), functional programming,[ citation needed ] data transfer, data structures, and file systems on data storage devices.
A programmer/software engineer may have a choice of several algorithms, encodings, data types or data structures, each of which have known characteristics. When choosing among them, their respective overhead should also be considered.
In software engineering, overhead can influence the decision whether or not to include features in new products, or indeed whether to fix bugs. A feature that has a high overhead may not be included – or needs a big financial incentive to do so. Often, even though software providers are well aware of bugs in their products, the payoff of fixing them is not worth the reward, because of the overhead.
For example, an implicit data structure or succinct data structure may provide low space overhead, but at the cost of slow performance (space/time tradeoff).
Algorithmic complexity is generally specified using Big O notation. This makes no comment on how long something takes to run or how much memory it uses, but how its increase depends on the size of the input. Overhead is deliberately not part of this calculation, since it varies from one machine to another, whereas the fundamental running time of an algorithm does not.
This should be contrasted with algorithmic efficiency, which takes into account all kinds of resources – a combination (though not a trivial one) of complexity and overhead.
Invoking a function introduces a small run-time overhead. [2] Sometimes the compiler can minimize this overhead by inlining some of these function calls. [3]
In a CPU cache, the "cache size" (or capacity) refers to how much data a cache stores. For instance, a "4 KB cache" is a cache that holds 4 KB of data. The "4 KB" in this example excludes overhead bits such as frame, address, and tag information. [4]
Reliably sending a payload of data over a communications network requires sending more than just the payload itself. It also involves sending various control and signalling data (TCP) required to reach the destination. This creates a so-called protocol overhead as the additional data does not contribute to the intrinsic meaning of the message. [5] [6]
In telephony, number dialing and call set-up time are overheads. In two-way (but half-duplex) radios, the use of "over" and other signaling needed to avoid collisions is an overhead.
Protocol overhead can be expressed as a percentage of non-application bytes (protocol and frame synchronization) divided by the total number of bytes in the message.
The encoding of information and data introduces overhead too. The date and time "2011-07-12 07:18:47" can be expressed as Unix time with the 32-bit signed integer 1310447927
, consuming only 4 bytes. Represented as ISO 8601 formatted UTF-8 encoded string 2011-07-12 07:18:47
the date would consume 19 bytes, a size overhead of 375% over the binary integer representation. As XML this date can be written as follows with an overhead of 218 characters, while adding the semantic context that it is a CHANGEDATE with index 1.
<?xml version="1.0" encoding="UTF-8"?><datetimequalifier="changedate"index="1"><year>2011</year><month>07</month><day>12</day><hour>07</hour><minute>18</minute><second>47</second></datetime>
The 349 bytes, resulting from the UTF-8 encoded XML, correlates to a size overhead of 8625% over the original integer representation.
Besides the files themselves, computer file systems take a portion of the space to store directory names and listings, file names, files' sector locations, attributes such as the date and time of the last modification and creation, how the files are fragmented, written and free parts of the space, and a journal on some file systems.
Many small files create more overhead than a low number of large files.
Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical values that make up a character encoding are known as code points and collectively comprise a code space, a code page, or character map.
In computing, plain text is a loose term for data that represent only characters of readable material but not its graphical representation nor other objects. It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters. Plain text is different from formatted text, where style information is included; from structured text, where structural parts of the document such as paragraphs, sections, and the like are identified; and from binary files in which some portions must be interpreted as binary objects.
In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.
Web pages authored using HyperText Markup Language (HTML) may contain multilingual text represented with the Unicode universal character set. Key to the relationship between Unicode and HTML is the relationship between the "document character set", which defines the set of characters that may be present in an HTML document and assigns numbers to them, and the "external character encoding", or "charset", used to encode a given document as a sequence of bytes.
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format – 8-bit. Almost every webpage is stored in UTF-8.
NT File System (NTFS) is a proprietary journaling file system developed by Microsoft in the 1990s.
UTF-16 (16-bit Unicode Transformation Format) is a character encoding method capable of encoding all 1,112,064 valid code points of Unicode. The encoding is variable-length as code points are encoded with one or two 16-bitcode units. UTF-16 arose from an earlier obsolete fixed-width 16-bit encoding now known as UCS-2 (for 2-byte Universal Character Set), once it became clear that more than 216 (65,536) code points were needed, including most emoji and important CJK characters such as for personal and place names.
Abstract Syntax Notation One (ASN.1) is a standard interface description language (IDL) for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networking, and especially in cryptography.
The byte-order mark (BOM) is a particular usage of the special Unicode character code, U+FEFFZERO WIDTH NO-BREAK SPACE, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:
A text file is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system.
In computer programming, a magic number is any of the following:
In computer science, canonicalization is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.
In computing, a Personal Storage Table (.pst) is an open proprietary file format used to store copies of messages, calendar events, and other items within Microsoft software such as Microsoft Exchange Client, Windows Messaging, and Microsoft Outlook. The open format is controlled by Microsoft who provide free specifications and free irrevocable technology licensing.
JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of name–value pairs and arrays. It is a commonly used data format with diverse uses in electronic data interchange, including that of web applications with servers.
This article compares Unicode encodings in two types of environments: 8-bit clean environments, and environments that forbid the use of byte values with the high bit set. Originally, such prohibitions allowed for links that used only seven data bits, but they remain in some standards and so some standard-conforming software must generate messages that comply with the restrictions. The Standard Compression Scheme for Unicode and the Binary Ordered Compression for Unicode are excluded from the comparison tables because it is difficult to simply quantify their size.
In engineering, some methods or components make special demands on the system. The extra design features necessary to meet these demands are called overhead. For instance, in electrical engineering, a particular integrated circuit might draw large current, requiring a robust power delivery circuit and a heat-dissipation mechanism.
A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the communication channel does not allow binary data or is not 8-bit clean. PGP documentation uses the term "ASCII armor" for binary-to-text encoding when referring to Base64.
In computer programming, a manifest file is a file containing metadata for a group of accompanying files that are part of a set or coherent unit. For example, the files of a computer program may have a manifest describing the name, version number, license and the constituent files of the program. The term "manifest" is borrowed from a cargo shipping procedure, where a ship manifest would list the crew and/or cargo of a vessel.
Action Message Format (AMF) is a binary format used to serialize object graphs such as ActionScript objects and XML, or send messages between an Adobe Flash client and a remote service, usually a Flash Media Server or third party alternatives. The Actionscript 3 language provides classes for encoding and decoding from the AMF format.
BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for "Binary JSON". It is a binary form for representing simple or complex data structures including associative arrays, integer indexed arrays, and a suite of fundamental scalar types. BSON originated in 2009 at MongoDB. Several scalar data types are of specific interest to MongoDB and the format is used both as a data storage and network transfer format for the MongoDB database, but it can be used independently outside of MongoDB. Implementations are available in a variety of languages such as C, C++, C#, D, Delphi, Erlang, Go, Haskell, Java, JavaScript, Julia, Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.