Units of information |
Information-theoretic |
---|
Data storage |
Quantum information |
The bit is the most basic unit of information in computing and digital communication. The name is a portmanteau of binary digit. [1] The bit represents a logical state with one of two possible values. These values are most commonly represented as either "1" or "0", but other representations such as true/false, yes/no, on/off, or +/− are also widely used.
The relation between these values and the physical states of the underlying storage or device is a matter of convention, and different assignments may be used even within the same device or program. It may be physically implemented with a two-state device.
A contiguous group of binary digits is commonly called a bit string , a bit vector, or a single-dimensional (or multi-dimensional) bit array . A group of eight bits is called one byte , but historically the size of the byte is not strictly defined. [2] Frequently, half, full, double and quadruple words consist of a number of bytes which is a low power of two. A string of four bits is usually a nibble .
In information theory, one bit is the information entropy of a random binary variable that is 0 or 1 with equal probability, [3] or the information that is gained when the value of such a variable becomes known. [4] [5] As a unit of information or negentropy, the bit is also known as a shannon , [6] named after Claude E. Shannon. As a measure of the length of a digital string that is encoded as symbols over a 0-1 (binary) alphabet, the bit has been called a binit, [7] but this usage is now rare. [8]
In data compression, the goal is to find a shorter representation for a string, so that it requires fewer bits of storage -- but it must be "compressed" before storage and then (generally) "decompressed" before it is used in a computation. The field of Algorithmic Information Theory is devoted to the study of the "irreducible information content" of a string (i.e. its shortest-possible representation length, in bits), under the assumption that the receiver has minimal a priori knowledge of the method used to compress the string.
The symbol for the binary digit is either "bit", per the IEC 80000-13:2008 standard, or the lowercase character "b", per the IEEE 1541-2002 standard. Use of the latter may create confusion with the capital "B" which is the international standard symbol for the byte.
Ralph Hartley suggested the use of a logarithmic measure of information in 1928. [9] Claude E. Shannon first used the word "bit" in his seminal 1948 paper "A Mathematical Theory of Communication". [10] [11] [12] He attributed its origin to John W. Tukey, who had written a Bell Labs memo on 9 January 1947 in which he contracted "binary information digit" to simply "bit". [10]
A bit can be stored by a digital device or other physical system that exists in either of two possible distinct states. These may be the two stable states of a flip-flop, two positions of an electrical switch, two distinct voltage or current levels allowed by a circuit, two distinct levels of light intensity, two directions of magnetization or polarization, the orientation of reversible double stranded DNA, etc.
Perhaps the earliest example of a binary storage device was the punched card invented by Basile Bouchon and Jean-Baptiste Falcon (1732), developed by Joseph Marie Jacquard (1804), and later adopted by Semyon Korsakov, Charles Babbage, Herman Hollerith, and early computer manufacturers like IBM. A variant of that idea was the perforated paper tape. In all those systems, the medium (card or tape) conceptually carried an array of hole positions; each position could be either punched through or not, thus carrying one bit of information. The encoding of text by bits was also used in Morse code (1844) and early digital communications machines such as teletypes and stock ticker machines (1870).
The first electrical devices for discrete logic (such as elevator and traffic light control circuits, telephone switches, and Konrad Zuse's computer) represented bits as the states of electrical relays which could be either "open" or "closed". When relays were replaced by vacuum tubes, starting in the 1940s, computer builders experimented with a variety of storage methods, such as pressure pulses traveling down a mercury delay line, charges stored on the inside surface of a cathode-ray tube, or opaque spots printed on glass discs by photolithographic techniques.
In the 1950s and 1960s, these methods were largely supplanted by magnetic storage devices such as magnetic-core memory, magnetic tapes, drums, and disks, where a bit was represented by the polarity of magnetization of a certain area of a ferromagnetic film, or by a change in polarity from one direction to the other. The same principle was later used in the magnetic bubble memory developed in the 1980s, and is still found in various magnetic strip items such as metro tickets and some credit cards.
In modern semiconductor memory, such as dynamic random-access memory or a solid-state drive, the two values of a bit are represented by two levels of electric charge stored in a capacitor or a floating-gate MOSFET. In certain types of programmable logic arrays and read-only memory, a bit may be represented by the presence or absence of a conducting path at a certain point of a circuit. In optical discs, a bit is encoded as the presence or absence of a microscopic pit on a reflective surface. In one-dimensional bar codes and two-dimensional QR codes, bits are encoded as lines or squares which may be either black or white.
In modern digital computing, bits are transformed in Boolean logic gates.
Bits are transmitted one at a time in serial transmission. By contrast, multiple bits are transmitted simultaneously in a parallel transmission. A serial computer processes information in either a bit-serial or a byte-serial fashion. From the standpoint of data communications, a byte-serial transmission is an 8-way parallel transmission with binary signalling.
In programming languages such as C, a bitwise operation operates on binary strings as though they are vectors of bits, rather than interpreting them as binary numbers.
Data transfer rates are usually measured in decimal SI multiples. For example, a channel capacity may be specified as 8 kbit/s = 8 kb/s = 1 kB/s.
File sizes are often measured in (binary) IEC multiples of bytes, for example 1 KiB = 1024 bytes = 8192 bits. Confusion may arise in cases where (for historic reasons) filesizes are specified with binary multipliers using the ambiguous prefixes K, M, and G rather than the IEC standard prefixes Ki, Mi, and Gi. [13]
Mass storage devices are usually measured in decimal SI multiples, for example 1 TB = bytes.
Confusingly, the storage capacity of a directly-addressable memory device, such as a DRAM chip, or an assemblage of such chips on a memory module, is specified as a binary multiple -- using the ambiguous prefix G rather than the IEC recommended Gi prefix. For example, a DRAM chip that is specified (and advertised) as having "1 GB" of capacity has bytes of capacity. As at 2022, the difference between the popular understanding of a memory system with "8 GB" of capacity, and the SI-correct meaning of "8 GB" was still causing difficulty to software designers. [14]
The bit is not defined in the International System of Units (SI). However, the International Electrotechnical Commission issued standard IEC 60027, which specifies that the symbol for binary digit should be 'bit', and this should be used in all multiples, such as 'kbit', for kilobit. [15] However, the lower-case letter 'b' is widely used as well and was recommended by the IEEE 1541 Standard (2002). In contrast, the upper case letter 'B' is the standard and customary symbol for byte.
|
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Orders of magnitude of data |
Multiple bits may be expressed and represented in several ways. For convenience of representing commonly reoccurring groups of bits in information technology, several units of information have traditionally been used. The most common is the unit byte, coined by Werner Buchholz in June 1956, which historically was used to represent the group of bits used to encode a single character of text (until UTF-8 multibyte encoding took over) in a computer [2] [16] [17] [18] [19] and for this reason it was used as the basic addressable element in many computer architectures. By 1993, the trend in hardware design had converged on the 8-bit byte. [20] However, because of the ambiguity of relying on the underlying hardware design, the unit octet was defined to explicitly denote a sequence of eight bits.
Computers usually manipulate bits in groups of a fixed size, conventionally named "words". Like the byte, the number of bits in a word also varies with the hardware design, and is typically between 8 and 80 bits, or even more in some specialized computers. In the early 21st century, retail personal or server computers have a word size of 32 or 64 bits.
The International System of Units defines a series of decimal prefixes for multiples of standardized units which are commonly also used with the bit and the byte. The prefixes kilo (103) through yotta (1024) increment by multiples of one thousand, and the corresponding units are the kilobit (kbit) through the yottabit (Ybit).
The byte is a unit of digital information that most commonly consists of eight bits. 1 byte (B) = 8 bits (bit). Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures. To disambiguate arbitrarily sized bytes from the common 8-bit definition, network protocol documents such as the Internet Protocol refer to an 8-bit byte as an octet. Those bits in an octet are usually counted with numbering from 0 to 7 or 7 to 0 depending on the bit endianness.
In computing and electronic systems, binary-coded decimal (BCD) is a class of binary encodings of decimal numbers where each digit is represented by a fixed number of bits, usually four or eight. Sometimes, special bit patterns are used for a sign or other indications.
A binary prefix is a unit prefix that indicates a multiple of a unit of measurement by an integer power of two. The most commonly used binary prefixes are kibi (symbol Ki, meaning 210 = 1024), mebi (Mi, 220 = 1048576), and gibi (Gi, 230 = 1073741824). They are most often used in information technology as multipliers of bit and byte, when expressing the capacity of storage devices or the size of computer files.
The gigabyte is a multiple of the unit byte for digital information. The prefix giga means 109 in the International System of Units (SI). Therefore, one gigabyte is one billion bytes. The unit symbol for the gigabyte is GB.
Hexadecimal is a positional numeral system that represents numbers using a radix (base) of sixteen. Unlike the decimal system representing numbers using ten symbols, hexadecimal uses sixteen distinct symbols, most often the symbols "0"–"9" to represent values 0 to 9 and "A"–"F" to represent values from ten to fifteen.
In computer science, an integer is a datum of integral data type, a data type that represents some range of mathematical integers. Integral data types may be of different sizes and may or may not be allowed to contain negative values. Integers are commonly represented in a computer as a group of binary digits (bits). The size of the grouping varies so the set of integer sizes available varies between different types of computers. Computer hardware nearly always provides a way to represent a processor register or memory address as an integer.
The kilobyte is a multiple of the unit byte for digital information.
The megabyte is a multiple of the unit byte for digital information. Its recommended unit symbol is MB. The unit prefix mega is a multiplier of 1000000 (106) in the International System of Units (SI). Therefore, one megabyte is one million bytes of information. This definition has been incorporated into the International System of Quantities.
In computing and telecommunications, a character is a unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language.
In computing, a block, sometimes called a physical record, is a sequence of bytes or bits, usually containing some whole number of records, having a maximum length; a block size. Data thus structured are said to be blocked. The process of putting data into blocks is called blocking, while deblocking is the process of extracting data from blocks. Blocked data is normally stored in a data buffer, and read or written a whole block at a time. Blocking reduces the overhead and speeds up the handling of the data stream. For some devices, such as magnetic tape and CKD disk devices, blocking reduces the amount of external storage required for the data. Blocking is almost universally employed when storing data to 9-track magnetic tape, NAND flash memory, and rotating media such as floppy disks, hard disks, and optical discs.
In computing, a memory address is a reference to a specific memory location in memory used by both software and hardware. These addresses are fixed-length sequences of digits, typically displayed and handled as unsigned integers. This numerical representation is based on the features of CPU, as well programming language constructs that treat the memory like an array.
In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits in a word is an important characteristic of any specific processor design or computer architecture.
IEEE 1541-2002 is a standard issued in 2002 by the Institute of Electrical and Electronics Engineers (IEEE) concerning the use of prefixes for binary multiples of units of measurement related to digital electronics and computing. IEEE 1541-2021 revises and supersedes IEEE 1541–2002, which is 'inactive'.
A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the communication channel does not allow binary data or is not 8-bit clean. PGP documentation uses the term "ASCII armor" for binary-to-text encoding when referring to Base64.
The octet is a unit of digital information in computing and telecommunications that consists of eight bits. The term is often used when the term byte might be ambiguous, as the byte has historically been used for storage units of a variety of sizes.
ISO/IEC 80000, Quantities and units, is an international standard describing the International System of Quantities (ISQ). It was developed and promulgated jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). It serves as a style guide for using physical quantities and units of measurement, formulas involving them, and their corresponding units, in scientific and educational documents for worldwide use. The ISO/IEC 80000 family of standards was completed with the publication of the first edition of Part 1 in November 2009.
The JEDEC memory standards are the specifications for semiconductor memory circuits and similar storage devices promulgated by the Joint Electron Device Engineering Council (JEDEC) Solid State Technology Association, a semiconductor trade and engineering standardization organization.
This timeline of binary prefixes lists events in the history of the evolution, development, and use of units of measure that are germane to the definition of the binary prefixes by the International Electrotechnical Commission (IEC) in 1998, used primarily with units of information such as the bit and the byte.
A unit of information is any unit of measure of digital data size. In digital computing, a unit of information is used to describe the capacity of a digital data storage device. In telecommunications, a unit of information is used to describe the throughput of a communication channel. In information theory, a unit of information is used to measure information contained in messages and the entropy of random variables.
Werner Buchholz was a German-American computer scientist. After growing up in Europe, Buchholz moved to Canada and then to the United States. He worked for International Business Machines (IBM) in New York. In June 1956, he coined the term "byte" for a unit of digital information. In 1990, he was recognized as a computer pioneer by the Institute of Electrical and Electronics Engineers.
[…] With IBM's STRETCH computer as background, handling 64-character words divisible into groups of 8 (I designed the character set for it, under the guidance of Dr. Werner Buchholz, the man who DID coin the term "byte" for an 8-bit grouping). […] The IBM 360 used 8-bit characters, although not ASCII directly. Thus Buchholz's "byte" caught on everywhere. I myself did not like the name for many reasons. […]
The choice of a logarithmic base corresponds to the choice of a unit for measuring information. If the base 2 is used the resulting units may be called binary digits, or more briefly bits, a word suggested by J. W. Tukey.
[…] Most important, from the point of view of editing, will be the ability to handle any characters or digits, from 1 to 6 bits long […] the Shift Matrix to be used to convert a 60-bit word, coming from Memory in parallel, into characters, or "bytes" as we have called them, to be sent to the Adder serially. The 60 bits are dumped into magnetic cores on six different levels. Thus, if a 1 comes out of position 9, it appears in all six cores underneath. […] The Adder may accept all or only some of the bits. […] Assume that it is desired to operate on 4 bit decimal digits, starting at the right. The 0-diagonal is pulsed first, sending out the six bits 0 to 5, of which the Adder accepts only the first four (0-3). Bits 4 and 5 are ignored. Next, the 4 diagonal is pulsed. This sends out bits 4 to 9, of which the last two are again ignored, and so on. […] It is just as easy to use all six bits in alphanumeric work, or to handle bytes of only one bit for logical analysis, or to offset the bytes by any number of bits. […]
[…] The first reference found in the files was contained in an internal memo written in June 1956 during the early days of developing Stretch. A byte was described as consisting of any number of parallel bits from one to six. Thus a byte was assumed to have a length appropriate for the occasion. Its first use was in the context of the input-output equipment of the 1950s, which handled six bits at a time. The possibility of going to 8 bit bytes was considered in August 1956 and incorporated in the design of Stretch shortly thereafter. The first published reference to the term occurred in 1959 in a paper "Processing Data in Bits and Pieces" by G A Blaauw, F P Brooks Jr and W Buchholz in the IRE Transactions on Electronic Computers , June 1959, page 121. The notions of that paper were elaborated in Chapter 4 of Planning a Computer System (Project Stretch) , edited by W Buchholz, McGraw-Hill Book Company (1962). The rationale for coining the term was explained there on page 40 as follows:
Byte denotes a group of bits used to encode a character, or the number of bits transmitted in parallel to and from input-output units. A term other than character is used here because a given character may be represented in different applications by more than one code, and different codes may use different numbers of bits (ie, different byte sizes). In input-output transmission the grouping of bits may be completely arbitrary and have no relation to actual characters. (The term is coined from bite , but respelled to avoid accidental mutation to bit.)
System/360 took over many of the Stretch concepts, including the basic byte and word sizes, which are powers of 2. For economy, however, the byte size was fixed at the 8 bit maximum, and addressing at the bit level was replaced by byte addressing. […]