Inline assembler

In computer programming, an inline assembler is a feature of some compilers that allows low-level code written in assembly language to be embedded within a program, among code that otherwise has been compiled from a higher-level language such as C or Ada.

Motivation and alternatives

The embedding of assembly language code is usually done for one of these reasons: [1]

On the other hand, inline assembler poses a direct problem for the compiler itself as it complicates the analysis of what is done to each variable, a key part of register allocation. [2] This means the performance might actually decrease. Inline assembler also complicates future porting and maintenance of a program. [1]

Alternative facilities are often provided as a way to simplify the work for both the compiler and the programmer. Intrinsic functions for special instructions are provided by most compilers and C-function wrappers for arbitrary system calls are available on every Unix platform.
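As a rough illustration of these alternatives, the sketch below avoids inline assembler in two ways. It assumes GCC or Clang on Linux; `count_bits` and `raw_getpid` are hypothetical names chosen for this example, while `__builtin_popcount` is a compiler builtin and `syscall` is the C library's generic system call wrapper.

```c
#include <sys/syscall.h>     /* SYS_getpid */
#include <unistd.h>          /* syscall(), getpid() */

/* Intrinsic: the compiler picks a population-count instruction
 * (or a software fallback) without any hand-written assembly. */
static unsigned count_bits(unsigned x)
{
    return (unsigned)__builtin_popcount(x);
}

/* Library wrapper: syscall() issues the system call whose number
 * is given, so no trap instruction appears in the source. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Because the compiler understands both calls as ordinary functions, it can still reason about the variables around them, which is the property inline assembler tends to take away.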

Syntax

In language standards

The ISO C++ standard and the ISO C standard (Annex J) specify a conditionally supported syntax for inline assembler:

An asm declaration has the form
 asm-declaration:
 asm ( string-literal ) ;
The asm declaration is conditionally-supported; its meaning is implementation-defined. [3]

This definition, however, is rarely used in actual C, as it is simultaneously too liberal (in the interpretation) and too restricted (in the use of one string literal only).
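A minimal sketch of that single-string form follows. It assumes GCC or Clang, where the keyword can also be spelled `__asm__` so that it stays available in strict ISO C modes, and a target whose assembler accepts a `nop` mnemonic; `emit_nop` is a hypothetical name. Since the declaration consists of one string literal only, there is nowhere to name a C variable, which is the restriction described above.

```c
/* Emits one no-op instruction. The string is handed to the assembler
 * as-is; the compiler does not interpret its contents. */
static int emit_nop(void)
{
    __asm__("nop");
    return 0;   /* the asm itself produces no value */
}
```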

In actual compilers

In practical use, inline assembly that operates on values is rarely standalone, free-floating code. Since the programmer cannot predict which register a variable is assigned to, compilers typically provide, as an extension, a way to substitute variables into the assembly.

There are, in general, two types of inline assembly supported by C/C++ compilers: the GCC form and the MSVC form.

The two families of extensions represent different understandings of the division of labor in processing inline assembly. The GCC form preserves the overall syntax of the language and compartmentalizes what the compiler needs to know: what is needed and what is changed. It does not explicitly require the compiler to understand instruction names, as the compiler is only required to substitute its register assignments, plus a few mov operations, to handle the input requirements. However, the user is prone to specifying clobbered registers incorrectly. The MSVC form of an embedded domain-specific language provides ease of writing, but it requires the compiler itself to know about opcode names and their clobbering properties, demanding extra attention in maintenance and porting. [7] It is still possible to check GCC-style assembly for clobber mistakes with knowledge of the instruction set. [8]
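The GCC-style division of labor can be sketched as follows. This is an illustrative example rather than code from any cited source: `add_ints` is a hypothetical name, the assembly path assumes GCC or Clang on x86, and other targets fall back to plain C so that the sketch still compiles.

```c
/* Adds b to a. On x86 the addition is done by inline assembly;
 * elsewhere plain C is used. */
static int add_ints(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %1, %0"   /* AT&T operand order: %0 = %0 + %1 */
            : "+r"(a)       /* %0: read and written, any general register */
            : "r"(b)        /* %1: read only, any general register */
            : "cc");        /* addl changes the condition codes */
    return a;
#else
    return a + b;
#endif
}
```

The compiler never parses `addl`; it only replaces `%0` and `%1` with the registers it chose and trusts the operand and clobber lists. Leaving out an entry from those lists is the kind of mistake the paragraph above refers to.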

GNAT (the Ada language frontend of the GCC suite) and LLVM use the GCC syntax. [9] [10] The D programming language officially uses a DSL similar to the MSVC extension for x86_64, [11] but the LLVM-based LDC also provides the GCC-style syntax on every architecture. [12] MSVC only supports inline assembler on 32-bit x86. [5]

The Rust language has since migrated to a syntax abstracting away inline assembly options further than the LLVM (GCC-style) version. It provides enough information to allow transforming the block into an externally-assembled function if the backend could not handle embedded assembly. [7]

Examples

A system call in GCC

Calling an operating system directly is generally not possible under a system using protected memory. The OS runs at a more privileged level (kernel mode) than the user (user mode); a (software) interrupt is used to make requests to the operating system. This is rarely a feature in a higher-level language, and so wrapper functions for system calls are written using inline assembler.

The following C code example shows an x86 system call wrapper in AT&T assembler syntax, using the GNU Assembler. Such calls are normally written with the aid of macros; the full code is included for clarity. In this particular case, the wrapper performs a system call of a number given by the caller with three operands, returning the result. [13]

To recap, GCC supports both basic and extended assembly. The former simply passes text verbatim to the assembler, while the latter performs some substitutions for register locations. [4]

    #include <errno.h>

    int syscall3(int num, int arg1, int arg2, int arg3)
    {
        int res;
        __asm__ volatile (        /* volatile: the request has side effects */
            "int $0x80"           /* make the request to the OS */
            : "=a" (res),         /* return result in eax ("a") */
              "+b" (arg1),        /* pass arg1 in ebx ("b") [as a "+" output because the syscall may change it] */
              "+c" (arg2),        /* pass arg2 in ecx ("c") [ditto] */
              "+d" (arg3)         /* pass arg3 in edx ("d") [ditto] */
            : "a"  (num)          /* pass system call number in eax ("a") */
            : "memory", "cc",     /* announce to the compiler that the memory and condition codes have been modified */
              "esi", "edi", "ebp");  /* these registers are clobbered [changed by the syscall] too */

        /* The operating system will return a negative value on error;
         * wrappers return -1 on error and set the errno global variable */
        if (-125 <= res && res < 0) {
            errno = -res;
            res = -1;
        }
        return res;
    }
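The wrapper above targets 32-bit x86. As a hedged sketch of how the same idea might look on 64-bit x86 Linux, the version below uses the `syscall` instruction with the register assignments of that platform's system call convention. `syscall3_64` is a hypothetical name, and on any other target the sketch simply defers to the C library's `syscall` function.

```c
#include <errno.h>
#include <sys/syscall.h>   /* SYS_* numbers for callers */
#include <unistd.h>        /* syscall(), used by the fallback path */

static long syscall3_64(long num, long arg1, long arg2, long arg3)
{
    long res;
#if defined(__x86_64__) && defined(__linux__)
    __asm__ volatile (
        "syscall"
        : "=a" (res)            /* result comes back in rax */
        : "a" (num),            /* system call number in rax */
          "D" (arg1),           /* first argument in rdi */
          "S" (arg2),           /* second argument in rsi */
          "d" (arg3)            /* third argument in rdx */
        : "rcx", "r11",         /* overwritten by the syscall instruction */
          "memory", "cc");
    if (res < 0 && res >= -4095) {   /* the kernel returns -errno on failure */
        errno = (int)-res;
        res = -1;
    }
#else
    res = syscall(num, arg1, arg2, arg3);   /* libc sets errno itself */
#endif
    return res;
}
```

A call such as `syscall3_64(SYS_getpid, 0, 0, 0)` would then be expected to return the same value as `getpid()`.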

Processor-specific instruction in D

This example of inline assembly from the D programming language shows code that computes the tangent of x using the x86's FPU (x87) instructions.

    // Compute the tangent of x
    real tan(real x)
    {
        asm
        {
            fld     x[EBP]      ; // load x
            fxam                ; // test for oddball values
            fstsw   AX          ;
            sahf                ;
            jc      trigerr     ; // C0 = 1: x is NAN, infinity, or empty
                                  // 387's can handle denormals
    SC18:   fptan               ;
            fstp    ST(0)       ; // dump X, which is always 1
            fstsw   AX          ;
            sahf                ; // if (!(fp_status & 0x20)) goto Lret
            jnp     Lret        ; // C2 = 1: x is out of range, do argument reduction
            fldpi               ; // load pi
            fxch                ;
    SC17:   fprem1              ; // remainder (partial)
            fstsw   AX          ;
            sahf                ;
            jp      SC17        ; // C2 = 1: partial remainder, need to loop
            fstp    ST(1)       ; // remove pi from stack
            jmp     SC18        ;
        }
    trigerr:
        return real.nan;
    Lret:   // No need to manually return anything as the value is already on FP stack
        ;
    }

For readers unfamiliar with x87 programming, the idiom of fstsw and sahf followed by a conditional jump is used to access the x87 FPU status word bits C0 and C2. fstsw stores the status word in a general-purpose register (AX); sahf loads the upper 8 bits of that register (AH) into the low byte of the FLAGS register; and the conditional jump tests whichever flag bit happens to correspond to the FPU status bit of interest. [14]
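The bit correspondence behind this idiom can be modelled in plain C. The sketch below is only a software model, not something the D example depends on; the enum and function names are hypothetical. It shows why `jc` ends up testing C0 (which lands in the carry flag) and why `jp` and `jnp` end up testing C2 (which lands in the parity flag).

```c
#include <stdint.h>

/* Condition-code bits of the x87 status word. */
enum { X87_C0 = 1 << 8, X87_C2 = 1 << 10, X87_C3 = 1 << 14 };

/* FLAGS bits that sahf loads from AH. */
enum {
    FLAG_CF = 1 << 0, FLAG_PF = 1 << 2, FLAG_AF = 1 << 4,
    FLAG_ZF = 1 << 6, FLAG_SF = 1 << 7
};

/* Models "fstsw AX; sahf": returns the flag bits that sahf would load.
 * Only the five loaded bits are modelled; the rest of FLAGS is ignored. */
static uint8_t flags_after_fstsw_sahf(uint16_t status_word)
{
    uint8_t ah = (uint8_t)(status_word >> 8);   /* AH is the high byte of AX */
    return (uint8_t)(ah & (FLAG_SF | FLAG_ZF | FLAG_AF | FLAG_PF | FLAG_CF));
}
```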

References

  1. "DontUseInlineAsm". GCC Wiki. Retrieved 21 January 2020.
  2. Striegel, Ben (13 January 2020). "To a compiler, a blob of inline assembly is like a slap in the face." Reddit. Retrieved 15 January 2020.
  3. C++, [dcl.asm]
  4. "Extended Asm - Assembler Instructions with C Expression Operands". Using the GNU C Compiler. Retrieved 15 January 2020.
  5. "Inline Assembler". docs.microsoft.com.
  6. "Migration and Compatibility Guide: Inline assembly with Arm Compiler 6".
  7. d'Antras, Amanieu (13 December 2019). "Rust RFC-2873: stable inline asm". Retrieved 15 January 2020. "However it is possible to implement support for inline assembly without support from the compiler backend by using an external assembler instead." Pull Request for status tracking
  8. "D54891 [RFC] Checking inline assembly for validity". reviews.llvm.org.
  9. "LLVM Language Reference: Inline assembly expressions". LLVM Documentation. Retrieved 15 January 2020.
  10. "Inline Assembly". Rust Documentation (1.0.0). Retrieved 15 January 2020.
  11. "Inline Assembler". D programming language. Retrieved 15 January 2020.
  12. "LDC inline assembly expressions". D Wiki. Retrieved 15 January 2020.
  13. syscall(2) – Linux Programmer's Manual – System Calls
  14. "FSTSW/FNSTSW — Store x87 FPU Status Word". "The FNSTSW AX form of the instruction is used primarily in conditional branching..."