Long double

In C and related programming languages, long double refers to a floating-point data type that is often more precise than double precision, though the language standard only requires it to be at least as precise as double. As with C's other floating-point types, it does not necessarily map to an IEEE format.

History

The long double type was present in the original 1989 C standard, [1] but support was improved by the 1999 revision of the C standard, or C99, which extended the standard library to include functions operating on long double such as sinl() and strtold().

Long double constants are floating-point constants suffixed with "L" or "l" (lower-case L), e.g., 0.333333333333333333L. Without a suffix, the evaluation depends on FLT_EVAL_METHOD.

Implementations

On the x86 architecture, most C compilers implement long double as the 80-bit extended precision type supported by x86 hardware (generally stored as 12 or 16 bytes to maintain data structure alignment), as specified in Annex F (IEC 60559 floating-point arithmetic) of the C99 / C11 standards. An exception is Microsoft Visual C++ for x86, which makes long double a synonym for double. [2] The Intel C++ compiler on Microsoft Windows supports extended precision, but requires the /Qlongdouble switch for long double to correspond to the hardware's extended precision format. [3]

Compilers may also use long double for the IEEE 754 quadruple-precision binary floating-point format (binary128). This is the case on HP-UX, [4] Solaris/SPARC, [5] MIPS with the 64-bit or n32 ABI, [6] 64-bit ARM (AArch64) [7] (on operating systems using the standard AAPCS calling conventions, such as Linux), and z/OS with FLOAT(IEEE). [8] [9] [10] Most implementations are in software, but some processors have hardware support.

On some PowerPC and SPARCv9 machines, [citation needed] long double is implemented as double-double arithmetic, where a long double value is regarded as the exact sum of two double-precision values, giving at least 106 bits of precision; with such a format, the long double type does not conform to the IEEE floating-point standard. Otherwise, long double is simply a synonym for double (double precision), e.g. on 32-bit ARM, [11] on 64-bit ARM (AArch64) under Windows [12] and macOS, [13] and on 32-bit MIPS [14] (old ABI, a.k.a. o32).

With the GNU C Compiler, long double is 80-bit extended precision on x86 processors regardless of the physical storage used for the type (which can be either 96 or 128 bits). [15] On some other architectures, long double can be double-double (e.g. on PowerPC [16] [17] [18] ) or 128-bit quadruple precision (e.g. on SPARC [19] ). As of gcc 4.3, quadruple precision is also supported on x86, but as the nonstandard type __float128 rather than long double. [20]

Although the x86 architecture, and specifically the x87 floating-point instructions on x86, supports 80-bit extended-precision operations, it is possible to configure the processor to automatically round operations to double (or even single) precision. Conversely, in extended-precision mode, extended precision may be used for intermediate compiler-generated calculations even when the final results are stored at a lower precision (i.e. FLT_EVAL_METHOD == 2). With gcc on Linux, 80-bit extended precision is the default; on several BSD operating systems (FreeBSD and OpenBSD), double-precision mode is the default, and long double operations are effectively reduced to double precision. [21] NetBSD 7.0 and later, however, default to 80-bit extended precision. [22] The default can be overridden within an individual program via the FLDCW "floating-point load control-word" instruction. [21] On x86_64 the BSDs default to 80-bit extended precision. Microsoft Windows with Visual C++ also sets the processor in double-precision mode by default, but this can again be overridden within an individual program (e.g. by the _controlfp_s function in Visual C++ [23] ). The Intel C++ Compiler for x86, on the other hand, enables extended-precision mode by default. [24] On IA-32 OS X, long double is 80-bit extended precision. [25]

Other specifications

In CORBA (as of the 3.0 specification, which uses "ANSI/IEEE Standard 754-1985" as its reference), "the long double data type represents an IEEE double-extended floating-point number, which has an exponent of at least 15 bits in length and a signed fraction of at least 64 bits". GIOP/IIOP CDR, whose floating-point types "exactly follow the IEEE standard formats for floating point numbers", marshals this type as what appears to be IEEE 754-2008 binary128 (quadruple precision), without using that name.

References

  1. ANSI/ISO 9899-1990 American National Standard for Programming Languages - C, section 6.1.2.5.
  2. MSDN homepage, about Visual C++ compiler
  3. Intel Developer Site
  4. Hewlett Packard (1992). "Porting C Programs". HP-UX Portability Guide - HP 9000 Computers (PDF) (2nd ed.). pp. 5-3 and 5-37.
  5. Sun Numerical Computation Guide, Chapter 2: IEEE Arithmetic
  6. "MIPSpro™ N32 ABI Handbook" (PDF). 1999. Retrieved 2020-05-26.
  7. "Procedure Call Standard for the Arm® 64-bit Architecture (AArch64)". 2020-10-01. Archived (PDF) from the original on 2020-10-02.
  8. "Floating-point types". 2020-10-09. Retrieved 2020-10-09.
  9. Schwarz, Eric (June 22, 2015). "The IBM z13 SIMD Accelerators for Integer, String, and Floating-Point" (PDF). Retrieved July 13, 2015.
  10. Schwarz, E. M.; Krygowski, C. A. (September 1999). "The S/390 G5 floating-point unit". IBM Journal of Research and Development. 43 (5/6): 707–721. doi:10.1147/rd.435.0707 . Retrieved October 10, 2020.
  11. "ARM® Compiler toolchain Compiler Reference, Version 5.03" (PDF). 2013. Section 6.3 Basic data types. Retrieved 2019-11-08.
  12. "llvm/llvm-project". GitHub. Retrieved 2020-09-03.
  13. "llvm/llvm-project". GitHub. Retrieved 2020-09-03.
  14. "System V Application Binary Interface: MIPS(r) Processor Supplement" (PDF) (3rd ed.). 1996. Retrieved 2020-05-26.
  15. Using the GNU Compiler Collection, x86 Options.
  16. Using the GNU Compiler Collection, RS/6000 and PowerPC Options
  17. Inside Macintosh - PowerPC Numerics Archived 2012-10-09 at the Wayback Machine
  18. 128-bit long double support routines for Darwin
  19. SPARC Options
  20. GCC 4.3 Release Notes
  21. Brian J. Gough and Richard M. Stallman, An Introduction to GCC, section 8.6 Floating-point issues (Network Theory Ltd., 2004).
  22. "Significant changes from NetBSD 6.0 to 7.0".
  23. _controlfp_s, Microsoft Developer Network (2/25/2011).
  24. Intel C++ Compiler Documentation, Using the -fp-model (/fp) Option.
  25. https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/LowLevelABI/130-IA-32_Function_Calling_Conventions/IA32.html