Overhead (computing)

Last updated

In computer science, overhead is any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to perform a specific task. It is a special case of engineering overhead. Overhead can be a deciding factor in software design, with regard to structure, error correction, and feature inclusion. Examples of computing overhead may be found in Object Oriented Programming (OOP), functional programming,[ citation needed ] data transfer, and data structures.

Contents

Software design

Choice of implementation

A programmer/software engineer may have a choice of several algorithms, encodings, data types or data structures, each of which have known characteristics. When choosing among them, their respective overhead should also be considered.

Tradeoffs

In software engineering, overhead can influence the decision whether or not to include features in new products, or indeed whether to fix bugs. A feature that has a high overhead may not be included – or needs a big financial incentive to do so. Often, even though software providers are well aware of bugs in their products, the payoff of fixing them is not worth the reward, because of the overhead.

For example, an implicit data structure or succinct data structure may provide low space overhead, but at the cost of slow performance (space/time tradeoff).

Run-time complexity of software

Algorithmic complexity is generally specified using Big O notation. This makes no comment on how long something takes to run or how much memory it uses, but how its increase depends on the size of the input. Overhead is deliberately not part of this calculation, since it varies from one machine to another, whereas the fundamental running time of an algorithm does not.

This should be contrasted with algorithmic efficiency, which takes into account all kinds of resources – a combination (though not a trivial one) of complexity and overhead.

Examples

Computer programming (run-time and computational overhead)

Invoking a function introduces a small run-time overhead. [1] Sometimes the compiler can minimize this overhead by inlining some of these function calls. [2]

CPU caches

In a CPU cache, the "cache size" (or capacity) refers to how much data a cache stores. For instance, a "4 KB cache" is a cache that holds 4 KB of data. The "4 KB" in this example excludes overhead bits such as frame, address, and tag information. [3]

Communications (data transfer overhead)

Reliably sending a payload of data over a communications network requires sending more than just payload itself. It also involves sending various control and signalling data (TCP) required to reach the destination. This creates a so-called protocol overhead as the additional data does not contribute to the intrinsic meaning of the message. [4] [5]

In telephony, number dialing and call set-up time are overheads. In two-way (but half-duplex) radios, the use of "over" and other signaling needed to avoid collisions is an overhead.

Protocol overhead can be expressed as a percentage of non-application bytes (protocol and frame synchronization) divided by the total number of bytes in the message.

Encodings and data structures (size overhead)

The encoding of information and data introduces overhead too. The date and time "2011-07-12 07:18:47" can be expressed as Unix time with the 32-bit signed integer 1310447927, consuming only 4 bytes. Represented as ISO 8601 formatted UTF-8 encoded string 2011-07-12 07:18:47 the date would consume 19 bytes, a size overhead of 375% over the binary integer representation. As XML this date can be written as follows with an overhead of 218 characters, while adding the semantic context that it is a CHANGEDATE with index 1.

<?xml version="1.0" encoding="UTF-8"?><datetimequalifier="changedate"index="1"><year>2011</year><month>07</month><day>12</day><hour>07</hour><minute>18</minute><second>47</second></datetime>

The 349 bytes, resulting from the UTF-8 encoded XML, correlates to a size overhead of 8625% over the original integer representation.

File systems

Besides the files themselves, computer file systems take a portion of the space to store directory names and listings, file names, files' sector locations, attributes such as the date and time of the last modification and creation, how the files are fragmented, written and free parts of the space, and a journal on some file systems.

Many small files create more overhead than a low number of large files.

See also

Related Research Articles

<span class="mw-page-title-main">Character encoding</span> Using numbers to represent text characters

Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values that make up a character encoding are known as "code points" and collectively comprise a "code space", a "code page", or a "character map".

<span class="mw-page-title-main">String (computer science)</span> Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

UTF-8 is a variable-length character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format – 8-bit.

New Technology File System (NTFS) is a proprietary journaling file system developed by Microsoft. Starting with Windows NT 3.1, it is the default file system of the Windows NT family. It superseded File Allocation Table (FAT) as the preferred filesystem on Windows and is supported in Linux and BSD as well. NTFS reading and writing support is provided using a free and open-source kernel implementation known as NTFS3 in Linux and the NTFS-3G driver in BSD. By using the convert command, Windows can convert FAT32/16/12 into NTFS without the need to rewrite all files. NTFS uses several files typically hidden from the user to store metadata about other files stored on the drive which can help improve speed and performance when reading data. Unlike FAT and High Performance File System (HPFS), NTFS supports access control lists (ACLs), filesystem encryption, transparent compression, sparse files and file system journaling. NTFS also supports shadow copy to allow backups of a system while it is running, but the functionality of the shadow copies varies between different versions of Windows.

<span class="mw-page-title-main">UTF-16</span> Variable-width encoding of Unicode, using one or two 16-bit code units

UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact this number of code points is dictated by the design of UTF-16). The encoding is variable-length, as code points are encoded with one or two 16-bit code units. UTF-16 arose from an earlier obsolete fixed-width 16-bit encoding now known as "UCS-2" (for 2-byte Universal Character Set), once it became clear that more than 216 (65,536) code points were needed, including most emoji and important CJK characters such as for personal and place names.

Abstract Syntax Notation One (ASN.1) is a standard interface description language (IDL) for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networking, and especially in cryptography.

The byte-order mark (BOM) is a particular usage of the special Unicode character code, U+FEFFZERO WIDTH NO-BREAK SPACE, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:

A text file is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operating systems such as CP/M and DOS, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file (EOF) marker, as padding after the last line in a text file. On modern operating systems such as Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. Most text files need to have end-of-line delimiters, which are done in a few different ways depending on operating system. Some operating systems with record-orientated file systems may not use new line delimiters and will primarily store text files with lines separated as fixed or variable length records.

In computer programming, a magic number is any of the following:

The Lempel–Ziv–Markov chain algorithm (LZMA) is an algorithm used to perform lossless data compression. It has been under development since either 1996 or 1998 by Igor Pavlov and was first used in the 7z format of the 7-Zip archiver. This algorithm uses a dictionary compression scheme somewhat similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977 and features a high compression ratio and a variable compression-dictionary size, while still maintaining decompression speed similar to other commonly used compression algorithms.

HFS Plus or HFS+ is a journaling file system developed by Apple Inc. It replaced the Hierarchical File System (HFS) as the primary file system of Apple computers with the 1998 release of Mac OS 8.1. HFS+ continued as the primary Mac OS X file system until it was itself replaced with the Apple File System (APFS), released with macOS High Sierra in 2017. HFS+ is also one of the formats supported by the iPod digital music player.

In computing, a Personal Storage Table (.pst) is an open proprietary file format used to store copies of messages, calendar events, and other items within Microsoft software such as Microsoft Exchange Client, Windows Messaging, and Microsoft Outlook. The open format is controlled by Microsoft who provide free specifications and free irrevocable technology licensing.

This article compares Unicode encodings. Two situations are considered: 8-bit-clean environments, and environments that forbid use of byte values that have the high bit set. Originally such prohibitions were to allow for links that used only seven data bits, but they remain in some standards and so some standard-conforming software must generate messages that comply with the restrictions. Standard Compression Scheme for Unicode and Binary Ordered Compression for Unicode are excluded from the comparison tables because it is difficult to simply quantify their size.

In engineering, some methods or components make special demands on the system. The extra design features necessary to meet these demands are called overhead. For instance, in electrical engineering, a particular integrated circuit might draw large current, requiring a robust power delivery circuit and a heat-dissipation mechanism.

<span class="mw-page-title-main">HTTP compression</span> Capability that can be built into web servers and web clients

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.

Action Message Format (AMF) is a binary format used to serialize object graphs such as ActionScript objects and XML, or send messages between an Adobe Flash client and a remote service, usually a Flash Media Server or third party alternatives. The Actionscript 3 language provides classes for encoding and decoding from the AMF format.

BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for "Binary JSON". It is a binary form for representing simple or complex data structures including associative arrays, integer indexed arrays, and a suite of fundamental scalar types. BSON originated in 2009 at MongoDB. Several scalar data types are of specific interest to MongoDB and the format is used both as a data storage and network transfer format for the MongoDB database, but it can be used independently outside of MongoDB. Implementations are available in a variety of languages such as C, C++, C#, D, Delphi, Erlang, Go, Haskell, Java, JavaScript, Julia, Lua, OCaml, Perl, PHP, Python, Ruby, Rust, Scala, Smalltalk, and Swift.

LEB128 or Little Endian Base 128 is a variable-length code compression used to store arbitrarily large integers in a small number of bytes. LEB128 is used in the DWARF debug file format and the WebAssembly binary encoding for all integer literals.

Lempel–Ziv–Stac is a lossless data compression algorithm that uses a combination of the LZ77 sliding-window compression algorithm and fixed Huffman coding. It was originally developed by Stac Electronics for tape compression, and subsequently adapted for hard disk compression and sold as the Stacker disk compression software. It was later specified as a compression algorithm for various network protocols. LZS is specified in the Cisco IOS stack.

Universal Binary JSON (UBJSON) is a computer data interchange format. It is a binary form directly imitating JSON, but requiring fewer bytes of data. It aims to achieve the generality of JSON, combined with being much easier to process than JSON.

References

  1. "Inline functions (C++)". Microsoft Learn. Microsoft. Retrieved 22 March 2024.
  2. Mahaffey, Terry (24 July 2019). "Inlining Decisions in Visual Studio". C++ Team Blog. Microsoft.
  3. Sorin, Daniel J. (2009). "Caches and Memory Hierarchies" (PDF). Retrieved March 13, 2019. Presentation for course in Computer Architecture.
  4. Common Performance Issues in Network Applications Part 1: Interactive Applications, Windows XP Technical Articles, Microsoft
  5. Protocol Overhead in IP/ATM Networks, Minnesota Supercomputer Center