Floating-point formats |
---|
IEEE 754 |
|
Other |
Alternatives |
In computing, Microsoft Binary Format (MBF) is a format for floating-point numbers which was used in Microsoft's BASIC languages, including MBASIC, GW-BASIC and QuickBASIC prior to version 4.00. [1] [2] [3] [4] [5] [6] [7]
There are two main versions of the format. The original version was designed for memory-constrained systems and stored numbers in 32 bits (4 bytes), with a 23-bit mantissa, 1-bit sign, and an 8-bit exponent. Extended (12k) BASIC included a double-precision type with 64 bits.
During the period when it was being ported from the Intel 8080 platform to the MOS 6502 processor, computers were beginning to ship with more memory as a standard feature. This version was offered with the original 32-bit format or an optional expanded 40-bit (5-byte) format. The 40-bit format was used by most home computers of the 1970s and 1980s. These two versions are sometimes known as "6-digit" and "9-digit", respectively. [8]
On PCs with x86 processor, QuickBASIC, prior to version 4, reintroduced the double-precision format using a 55-bit mantissa in a 64-bit (8-byte) format. MBF was abandoned during the move to QuickBASIC 4, which used the standard IEEE 754 format, introduced a few years earlier.
Bill Gates and Paul Allen were working on Altair BASIC in 1975. They were developing the software at Harvard University on a DEC PDP-10 running their Altair emulator. [9] One thing they lacked was code to handle floating-point numbers, required to support calculations with very big and very small numbers, [9] which would be particularly useful for science and engineering. [10] [11] One of the proposed uses of the Altair was as a scientific calculator. [12]
At a dinner at Currier House, an undergraduate residential house at Harvard, Gates and Allen complained to their dinner companions that they had to write this code [9] and one of them, Monte Davidoff, told them that he had written floating-point routines before and convinced Gates and Allen that he was capable of writing the Altair BASIC floating-point code. [9] At the time, while IBM had introduced their own programs, there was no standard for floating-point numbers, so Davidoff had to come up with his own. He decided that 32 bits would allow enough range and precision. [13] When Allen had to demonstrate it to MITS, it was the first time it ran on an actual Altair. [14] But it worked, and when he entered ‘PRINT 2+2’, Davidoff's adding routine gave the correct answer. [9]
A copy of the source code for Altair BASIC resurfaced in 1999. In the late 1970s, Gates's former tutor and dean Harry Lewis had found it behind some furniture in an office in Aiden, and put it in a file cabinet. After more or less forgetting about its existence for a long time, Lewis eventually came up with the idea of displaying the listing in the lobby. Instead, it was decided on preserving the original listing and producing several copies for display and preservation, after librarian and conservator Janice Merrill-Oldham pointed out its importance. [15] [16] A comment in the source credits Davidoff as the writer of Altair BASIC's math package. [15] [16]
Altair BASIC took off, and soon most early home computers ran some form of Microsoft BASIC. [17] [18] The BASIC port for the 6502 CPU, such as used in the Commodore PET, took up more space due to the lower code density of the 6502. Because of this it would likely not fit in a single ROM chip together with the machine-specific input and output code. Since an extra chip was necessary, extra space was available, and this was used in part to extend the floating-point format from 32 to 40 bits. [8] This extended format was not only provided by Commodore BASIC 1 & 2, but was also supported by Applesoft BASIC I & II since version 1.1 (1977), KIM-1 BASIC since version 1.1a (1977), and MicroTAN BASIC since version 2b (1980). [8] Not long afterwards, the Z80 ports, such as Level II BASIC for the TRS-80 (1978), introduced the 64-bit, double-precision format as a separate data type from 32-bit, single-precision. [19] [20] [21] Microsoft used the same floating-point formats in their implementation of Fortran [22] and for their macro assembler MASM, [23] although their spreadsheet Multiplan [24] [25] and their COBOL implementation used binary-coded decimal (BCD) floating point. [26] Even so, for a while MBF became the de facto floating-point format on home computers, to the point where people still occasionally encounter legacy files and file formats using it. [27] [28] [29] [30] [31] [32]
In a parallel development, Intel had started the development of a floating-point coprocessor in 1976. [33] [34] William Morton Kahan, as a consultant to Intel, suggested that Intel use the floating point of Digital Equipment Corporation's (DEC) VAX. The first VAX, the VAX-11/780 had just come out in late 1977, and its floating point was highly regarded. VAX's floating-point formats differed from MBF only in that it had the sign in the most significant bit. [35] [36] However, seeking to market their chip to the broadest possible market, Kahan was asked to draw up specifications. [33] When rumours of Intel's new chip reached its competitors, they started a standardization effort, called IEEE 754, to prevent Intel from gaining too much ground. As an 8-bit exponent was not wide enough for some operations desired for double-precision numbers, e.g. to store the product of two 32-bit numbers, [1] Intel's proposal and a counter-proposal from DEC used 11 bits, like the time-tested 60-bit floating-point format of the CDC 6600 from 1965. [34] [37] [38] Kahan's proposal also provided for infinities, which are useful when dealing with division-by-zero conditions; not-a-number values, which are useful when dealing with invalid operations; denormal numbers, which help mitigate problems caused by underflow; [37] [39] [40] and a better balanced exponent bias, which could help avoid overflow and underflow when taking the reciprocal of a number. [41] [42]
By the time QuickBASIC 4.00 was released,[ when? ] the IEEE 754 standard had become widely adopted—for example, it was incorporated into Intel's 387 coprocessor and every x86 processor from the 486 on. QuickBASIC versions 4.0 and 4.5 use IEEE 754 floating-point variables by default, but (at least in version 4.5) there is a command-line option /MBF for the IDE and the compiler that switches from IEEE to MBF floating-point numbers, to support earlier-written programs that rely on details of the MBF data formats. Visual Basic also uses the IEEE 754 format instead of MBF.
MBF numbers consist of an 8-bit base-2 exponent, a sign bit (positive mantissa: s = 0; negative mantissa: s = 1) and a 23-, [43] [8] 31- [8] or 55-bit [43] mantissa of the significand. There is always a 1-bit implied to the left of the explicit mantissa, and the radix point is located before this assumed bit. The exponent is encoded with a bias of 128[ citation needed ], so that exponents −127…−1[ citation needed ] are represented by x = 1…127 (01h…7Fh)[ citation needed ], exponents 0…127[ citation needed ] are represented by x = 128…255 (80h…FFh)[ citation needed ], with a special case for x = 0 (00h) representing the whole number being zero.
The MBF double-precision format provides less scale than the IEEE 754 format, and although the format itself provides almost one extra decimal digit of precision, in practice the stored values are less accurate because IEEE calculations use 80-bit intermediate results, and MBF doesn't. [1] [3] [43] [44] Unlike IEEE floating point, MBF doesn't support denormal numbers, infinities or NaNs. [45]
MBF single-precision format (32 bits, "6-digit BASIC"): [43] [8]
Exponent | Sign | Significand |
---|---|---|
Bit 31...24 (8 bit) | Bit 23 (1 bit) | Bit 22...0 (23 bit) |
xxxxxxxx | s | mmmmmmm mmmmmmmm mmmmmmmm |
MBF extended-precision format (40 bits, "9-digit BASIC"): [8]
Exponent | Sign | Significand |
---|---|---|
Bit 39...32 (8 bit) | Bit 31 (1 bit) | Bit 30...0 (31 bit) |
xxxxxxxx | s | mmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm |
MBF double-precision format (64 bits): [43] [1]
Exponent | Sign | Significand |
---|---|---|
Bit 63...56 (8 bit) | Bit 55 (1 bit) | Bit 54...0 (55 bit) |
xxxxxxxx | s | mmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm mmmmmmmm |
In computing, floating-point arithmetic (FP) is arithmetic that represents subsets of real numbers using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. Numbers of this form are called floating-point numbers. For example, 12.345 is a floating-point number in base ten with five digits of precision:
IEEE 754-1985 is a historic industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008, and then again in 2019 by minor revision IEEE 754-2019. During its 23 years, it was the most widely used format for floating-point computation. It was implemented in software, in the form of floating-point libraries, and in hardware, in the instructions of many CPUs and FPUs. The first integrated circuit to implement the draft of what was to become IEEE 754-1985 was the Intel 8087.
Double-precision floating-point format is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide range of numeric values by using a floating radix point.
In computer science, subnormal numbers are the subset of denormalized numbers that fill the underflow gap around zero in floating-point arithmetic. Any non-zero number with magnitude smaller than the smallest positive normal number is subnormal, while denormal can also refer to numbers outside that range.
Microsoft BASIC is the foundation software product of the Microsoft company and evolved into a line of BASIC interpreters and compiler(s) adapted for many different microcomputers. It first appeared in 1975 as Altair BASIC, which was the first version of BASIC published by Microsoft as well as the first high-level programming language available for the Altair 8800 microcomputer.
The IEEE Standard for Floating-Point Arithmetic is a technical standard for floating-point arithmetic originally established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many problems found in the diverse floating-point implementations that made them difficult to use reliably and portably. Many hardware floating-point units use the IEEE 754 standard.
The significand is the first (left) part of a number in scientific notation or related concepts in floating-point representation, consisting of its significant digits. Depending on the interpretation of the exponent, the significand may represent an integer or a fractional number.
Hexadecimal floating point is a format for encoding floating-point numbers first introduced on the IBM System/360 computers, and supported on subsequent machines based on that architecture, as well as machines which were intended to be application-compatible with System/360.
In C and related programming languages, long double
refers to a floating-point data type that is often more precise than double precision though the language standard only requires it to be at least as precise as double
. As with C's other floating-point types, it may not necessarily map to an IEEE format.
In computing, minifloats are floating-point values represented with very few bits. This reduced precision makes them ill-suited for general-purpose numerical calculations, but they are useful for special purposes such as:
Extended precision refers to floating-point number formats that provide greater precision than the basic floating-point formats. Extended precision formats support a basic format by minimizing roundoff and overflow errors in intermediate values of expressions on the base format. In contrast to extended precision, arbitrary-precision arithmetic refers to implementations of much larger numeric types using special software.
RGBE or Radiance HDR is an image format invented by Gregory Ward Larson for the Radiance rendering system. It stores pixels as one byte each for RGB values with a one byte shared exponent. Thus it stores four bytes per pixel.
Offset binary, also referred to as excess-K, excess-N, excess-e, excess code or biased representation, is a method for signed number representation where a signed number n is represented by the bit pattern corresponding to the unsigned number n+K, K being the biasing value or offset. There is no standard for offset binary, but most often the K for an n-bit binary word is K = 2n−1 (for example, the offset for a four-digit binary number would be 23=8). This has the consequence that the minimal negative value is represented by all-zeros, the "zero" value is represented by a 1 in the most significant bit and zero in all other bits, and the maximal positive value is represented by all-ones (conveniently, this is the same as using two's complement but with the most significant bit inverted). It also has the consequence that in a logical comparison operation, one gets the same result as with a true form numerical comparison operation, whereas, in two's complement notation a logical comparison will agree with true form numerical comparison operation if and only if the numbers being compared have the same sign. Otherwise the sense of the comparison will be inverted, with all negative values being taken as being larger than all positive values.
In computing, half precision is a binary floating-point computer number format that occupies 16 bits in computer memory. It is intended for storage of floating-point values in applications where higher precision is not essential, in particular image processing and neural networks.
In computing, quadruple precision is a binary floating-point–based computer number format that occupies 16 bytes with precision at least twice the 53-bit double precision.
Single-precision floating-point format is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.
The bfloat16 floating-point format is a computer number format occupying 16 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. This format is a shortened (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format (binary32) with the intent of accelerating machine learning and near-sensor computing. It preserves the approximate dynamic range of 32-bit floating-point numbers by retaining 8 exponent bits, but supports only an 8-bit precision rather than the 24-bit significand of the binary32 format. More so than single-precision 32-bit floating-point numbers, bfloat16 numbers are unsuitable for integer calculations, but this is not their intended use. Bfloat16 is used to reduce the storage requirements and increase the calculation speed of machine learning algorithms.
SCELBAL, short for SCientific ELementary BAsic Language, is a version of the BASIC programming language released in 1976 for the SCELBI and other early Intel 8008 and 8080-based microcomputers like the Mark-8. Later add-ons to the language included an extended math package and string handling. The original version required 8 kB of RAM, while the additions demanded at least 12 kB.
A BASIC interpreter is an interpreter that enables users to enter and run programs in the BASIC language and was, for the first part of the microcomputer era, the default application that computers would launch. Users were expected to use the BASIC interpreter to type in programs or to load programs from storage.
TensorFloat-32 or TF32 is a numeric floating point format designed for Tensor Core running on certain Nvidia GPUs.
{{cite book}}
: |website=
ignored (help){{cite book}}
: |website=
ignored (help){{cite book}}
: |website=
ignored (help)[…] This document describes the abandoned CompuTrac data format, which until recently was actively used by Equis' MetaStock charting software. […]
{{cite web}}
: CS1 maint: numeric names: authors list (link)[…] _fmsbintoieee(float *src4, float *dest4) […] MS Binary Format […] byte order => m3 | m2 | m1 | exponent […] m1 is most significant byte => sbbb|bbbb […] m3 is the least significant byte […] m = mantissa byte […] s = sign bit […] b = bit […] MBF is bias 128 and IEEE is bias 127. […] MBF places the decimal point before the assumed bit, while IEEE places the decimal point after the assumed bit. […] ieee_exp = msbin[3] - 2; /* actually, msbin[3]-1-128+127 */ […] _dmsbintoieee(double *src8, double *dest8) […] MS Binary Format […] byte order => m7 | m6 | m5 | m4 | m3 | m2 | m1 | exponent […] m1 is most significant byte => smmm|mmmm […] m7 is the least significant byte […] MBF is bias 128 and IEEE is bias 1023. […] MBF places the decimal point before the assumed bit, while IEEE places the decimal point after the assumed bit. […] ieee_exp = msbin[7] - 128 - 1 + 1023; […]
[…] IEEE 754 Single format […] The exponent is biased by 127. There is an assumed 1 bit before the radix point (so the assumed mantissa is 1.ffff… where f's are the fraction bits) […] Microsoft Binary Format (single precision) […] The exponent is biased by 128. There is an assumed 1 bit after the radix point (so the assumed mantissa is 0.1ffff… where f's are the fraction bits) […] the IEEE mantissa is twice the MBF mantissa. […] to convert from MBF to IEEE single […] subtract 2 from the exponent (one for the bias change, one for the mantissa factor), and then rearrange the sign and exponent bits. The fraction does not change. To convert from IEEE single to MBF, […] add 2 to the exponent (one for the bias change, one for the mantissa factor), and then rearrange the sign and exponent bits. The fraction does not change. […]