Intrinsic function

Last updated

In computer software, in compiler theory, an intrinsic function, also called built-in function or builtin function, is a function (subroutine) available for use in a given programming language whose implementation is handled specially by the compiler. Typically, it may substitute a sequence of automatically generated instructions for the original function call, similar to an inline function. [1] Unlike an inline function, the compiler has an intimate knowledge of an intrinsic function and can thus better integrate and optimize it for a given situation.

Contents

Compilers that implement intrinsic functions may enable them only when a program requests optimization, otherwise falling back to a default implementation provided by the language runtime system (environment).

Vectorization and parallelization

Intrinsic functions are often used to explicitly implement vectorization and parallelization in languages which do not address such constructs. Some application programming interfaces (API), for example, AltiVec and OpenMP, use intrinsic functions to declare, respectively, vectorizable and multiprocessing-aware operations during compiling. The compiler parses the intrinsic functions and converts them into vector math or multiprocessing object code appropriate for the target platform. Some intrinsics are used to provide additional constraints to the optimizer, such as values a variable cannot assume. [2]

By programming language

C and C++

Compilers for C and C++, of Microsoft, [3] Intel, [1] and the GNU Compiler Collection (GCC) [4] implement intrinsics that map directly to the x86 single instruction, multiple data (SIMD) instructions (MMX, Streaming SIMD Extensions (SSE), SSE2, SSE3, SSSE3, SSE4, AVX, AVX2, AVX512, FMA, ...). The Microsoft Visual C++ compiler of Microsoft Visual Studio does not support inline assembly for x86-64. [5] [6] [7] [8] To compensate for this, new intrinsics have been added that map to standard assembly instructions that are not normally accessible through C/C++, e.g., bit scan.

Some C and C++ compilers provide non-portable platform-specific intrinsics. Other intrinsics (such as GNU built-ins) are slightly more abstracted, approximating the abilities of several contemporary platforms, with portable fall back implementations on platforms with no appropriate instructions. [9] It is common for C++ libraries, such as glm or Sony's vector maths libraries, [10] to achieve portability via conditional compilation (based on platform specific compiler flags), providing fully portable high-level primitives (e.g., a four-element floating-point vector type) mapped onto the appropriate low level programming language implementations, while still benefiting from the C++ type system and inlining; hence the advantage over linking to hand-written assembly object files, using the C application binary interface (ABI).

Examples

uint64_t__rdtsc();// return internal CPU clock counteruint64_t__popcnt64(uint64_tn);// count of bits set in nuint64_t_umul128(uint64_tFactor1,uint64_tFactor2,uint64_t*HighProduct);// 64 bit * 64 bit => 128 bit multiplication__m512_mm512_add_ps(__m512a,__m512b);// calculates a + b for two vectors of 16 floats__m512_mm512_fmadd_ps(__m512a,__m512b,__m512c);// calculates a*b + c for three vectors of 16 floats

Java

The HotSpot Java virtual machine's (JVM) just-in-time compiler also has intrinsics for specific Java APIs. [11] Hotspot intrinsics are standard Java APIs which may have one or more optimized implementation on some platforms.

PL/I

ANSI/ISO PL/I defines nearly 90 builtin functions. [12] These are conventionally grouped as follows: [13] :337–338

Individual compilers have added additional builtins specific to a machine architecture or operating system.

A builtin function is identified by leaving its name undeclared and allowing it to default, or by declaring it BUILTIN. A user-supplied function of the same name can be substituted by declaring it as ENTRY.

Related Research Articles

<span class="mw-page-title-main">Single instruction, multiple data</span> Type of parallel processing

Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

AltiVec is a single-precision floating point and integer SIMD instruction set designed and owned by Apple, IBM, and Freescale Semiconductor — the AIM alliance. It is implemented on versions of the PowerPC processor architecture, including Motorola's G4, IBM's G5 and POWER6 processors, and P.A. Semi's PWRficient PA6T. AltiVec is a trademark owned solely by Freescale, so the system is also referred to as Velocity Engine by Apple and VMX by IBM and P.A. Semi.

Visual Instruction Set, or VIS, is a SIMD instruction set extension for SPARC V9 microprocessors developed by Sun Microsystems. There are five versions of VIS: VIS 1, VIS 2, VIS 2+, VIS 3 and VIS 4.

SSE2 is one of the Intel SIMD processor supplementary instruction sets introduced by Intel with the initial version of the Pentium 4 in 2000. It extends the earlier SSE instruction set, and is intended to fully replace MMX. Intel extended SSE2 to create SSE3 in 2004. SSE2 added 144 new instructions to SSE, which has 70 instructions. Competing chip-maker AMD added support for SSE2 with the introduction of their Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.

In the C and C++ programming languages, an inline function is one qualified with the keyword inline; this serves two purposes:

  1. It serves as a compiler directive that suggests that the compiler substitute the body of the function inline by performing inline expansion, i.e. by inserting the function code at the address of each function call, thereby saving the overhead of a function call. In this respect it is analogous to the register storage class specifier, which similarly provides an optimization hint.
  2. The second purpose of inline is to change linkage behavior; the details of this are complicated. This is necessary due to the C/C++ separate compilation + linkage model, specifically because the definition (body) of the function must be duplicated in all translation units where it is used, to allow inlining during compiling, which, if the function has external linkage, causes a collision during linking. C and C++ resolve this in different ways.

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform.

<span class="mw-page-title-main">C99</span> C programming language standard, 1999 revision

C99 is an informal name for ISO/IEC 9899:1999, a past version of the C programming language standard. It extends the previous version (C90) with new features for the language and the standard library, and helps implementations make better use of available computer hardware, such as IEEE 754-1985 floating-point arithmetic, and compiler technology. The C11 version of the C programming language standard, published in 2011, updates C99.

Buffer overflow protection is any of various techniques used during software development to enhance the security of executable programs by detecting buffer overflows on stack-allocated variables, and preventing them from causing program misbehavior or from becoming serious security vulnerabilities. A stack buffer overflow occurs when a program writes to a memory address on the program's call stack outside of the intended data structure, which is usually a fixed-length buffer. Stack buffer overflow bugs are caused when a program writes more data to a buffer located on the stack than what is actually allocated for that buffer. This almost always results in corruption of adjacent data on the stack, which could lead to program crashes, incorrect operation, or security issues.

<span class="mw-page-title-main">LLVM</span> Compiler backend for multiple programming languages

LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine, though the project has expanded and the name is no longer officially an initialism.

In computer programming, an inline assembler is a feature of some compilers that allows low-level code written in assembly language to be embedded within a program, among code that otherwise has been compiled from a higher-level language such as C or Ada.

The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used. It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in the string, or the digit sum of the binary representation of a given number and the ₁ norm of a bit vector. In this binary case, it is also called the population count, popcount, sideways sum, or bit summation.

In C and related programming languages, long double refers to a floating-point data type that is often more precise than double precision though the language standard only requires it to be at least as precise as double. As with C's other floating-point types, it may not necessarily map to an IEEE format.

This article describes the calling conventions used when programming x86 architecture microprocessors.

Advanced Vector Extensions are SIMD extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They were proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions, and a new coding scheme.

Open Watcom Assembler or WASM is an x86 assembler produced by Watcom, based on the Watcom Assembler found in Watcom C/C++ compiler and Watcom FORTRAN 77. Further development is being done on the 32- and 64-bit JWASM project, which more closely matches the syntax of Microsoft's assembler.

In computer software and hardware, find first set (ffs) or find first one is a bit operation that, given an unsigned machine word, designates the index or position of the least significant bit set to one in the word counting from the least significant bit position. A nearly equivalent operation is count trailing zeros (ctz) or number of trailing zeros (ntz), which counts the number of zero bits following the least significant one bit. The complementary operation that finds the index or position of the most significant set bit is log base 2, so called because it computes the binary logarithm ⌊log2(x)⌋. This is closely related to count leading zeros (clz) or number of leading zeros (nlz), which counts the number of zero bits preceding the most significant one bit. There are two common variants of find first set, the POSIX definition which starts indexing of bits at 1, herein labelled ffs, and the variant which starts indexing of bits at zero, which is equivalent to ctz and so will be called by that name.

Intel Advisor is a design assistance and analysis tool for SIMD vectorization, threading, memory use, and GPU offload optimization. The tool supports C, C++, Data Parallel C++ (DPC++), Fortran and Python languages. It is available on Windows and Linux operating systems in form of Standalone GUI tool, Microsoft Visual Studio plug-in or command line interface. It supports OpenMP. Intel Advisor user interface is also available on macOS.

AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and first implemented in the 2016 Intel Xeon Phi x200, and then later in a number of AMD and other Intel CPUs. AVX-512 consists of multiple extensions that may be implemented independently. This policy is a departure from the historical requirement of implementing the entire instruction block. Only the core extension AVX-512F is required by all AVX-512 implementations.

Permute instructions, part of bit manipulation as well as vector processing, copy unaltered contents from a source array to a destination array, where the indices are specified by a second source array. The size (bitwidth) of the source elements is not restricted but remains the same as the destination size.

References

  1. 1 2 "Intel® C++ Compiler 19.1 Developer Guide and Reference". Intel® C++ Compiler Documentation. 16 December 2019. Retrieved 2020-01-17.
  2. The Clang Team (2020). "Clang Language Extensions". Clang 11 documentation. Retrieved 2020-01-17. Builtin Functions
  3. MSDN. "Compiler Intrinsics". Microsoft . Retrieved 2012-06-20.
  4. GCC documentation. "Built-in Functions Specific to Particular Target Machines". Free Software Foundation . Retrieved 2012-06-20.
  5. MSDN. "Intrinsics and Inline Assembly". Microsoft. Archived from the original on 2018-01-02. Retrieved 2010-04-16.
  6. MSDN. "Intrinsics and Inline Assembly". Microsoft . Retrieved 2011-09-28.
  7. MSDN. "Intrinsics and Inline Assembly". Microsoft . Retrieved 2011-09-28.
  8. MSDN. "Intrinsics and Inline Assembly". Microsoft . Retrieved 2011-09-28.
  9. "Vector Extensions". Using the GNU Compiler Collection (GCC). Retrieved 2020-01-16.
  10. "Sony open sources Vector Math and SIMD math libraries (Cell PPU/SPU/other platforms)". Beyond3D Forum. Retrieved 2020-01-17.
  11. Mok, Kris (25 February 2013). "Intrinsic Methods in HotSpot VM". Slideshare. Retrieved 2014-12-20.
  12. ANSI X3 Committee (1976). American National Standard programming language PL/I.{{cite book}}: CS1 maint: numeric names: authors list (link)
  13. IBM Corporation (1995). IBM PL/I for MVS & VM Language Reference.