Calling convention


In computer science, a calling convention is an implementation-level (low-level) scheme for how subroutines or functions receive parameters from their caller and how they return a result. When some code calls a function, design choices have been made for where and how parameters are passed to that function, and where and how results are returned from it; these transfers are typically done via certain registers or within a stack frame on the call stack. There are also design choices for how the tasks of preparing for a function call and restoring the environment after the function has completed are divided between the caller and the callee. Every function is called according to some calling convention, and the correct convention must be used for every function call so that the whole program using these functions executes correctly and reliably.


Introduction

Calling conventions are usually considered part of the application binary interface (ABI).

The names or meanings of the parameters and return values are defined in the application programming interface (API, as opposed to ABI), which is a separate though related concept to ABI and calling convention. The names of members within passed structures and objects would also be considered part of the API, and not ABI. Sometimes APIs do include keywords to specify the calling convention for functions.

Calling conventions do not typically include information on handling lifespan of dynamically-allocated structures and objects. Other supplementary documentation may state where the responsibility for freeing up allocated memory lies.

Calling conventions are unlikely to specify the layout of items within structures and objects, such as byte ordering or structure packing.

For some languages, the calling convention includes details of error or exception handling (e.g. Go, Java); for others, it does not (e.g. C++).

For remote procedure calls, there is an analogous concept called marshalling.

Calling conventions may be related to a particular programming language's evaluation strategy, but most often are not considered part of it (or vice versa), as the evaluation strategy is usually defined on a higher abstraction level and seen as a part of the language rather than as a low-level implementation detail of a particular language's compiler.

Different calling conventions

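Calling conventions may differ in where parameters, return values, and return addresses are placed (in registers, on the call stack, or a mix of both); in the order in which arguments are passed; in how the work of setting up for and cleaning up after a call is divided between caller and callee; and in which registers the callee must preserve for the caller.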

Calling conventions within one platform

Sometimes multiple calling conventions appear on a single platform; a given platform and language implementation may offer a choice of calling conventions. Reasons for this include performance, adaptation of conventions of other popular languages, and restrictions or conventions imposed by various "computing platforms".

Many architectures have only one widely used calling convention, often suggested by the architect. For RISCs including SPARC, MIPS, and RISC-V, register names based on this calling convention are often used. For example, MIPS registers $4 through $7 have "ABI names" $a0 through $a3, reflecting their use for parameter passing in the standard calling convention. (RISC CPUs have many equivalent general-purpose registers, so there is typically no hardware reason for giving them names other than numbers.)

The calling convention of a given program's language may differ from the calling convention of the underlying platform, OS, or of some library being linked to. For example, on 32-bit Windows, operating system calls use the stdcall calling convention, whereas many C programs that run there use the cdecl calling convention. To accommodate these differences, compilers often permit platform-specific keywords in function declarations that specify the calling convention for a given function. When these are handled correctly, the compiler generates code that calls each function in the appropriate manner.
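One visible difference between cdecl and stdcall is who removes the arguments from the stack after the call: the caller under cdecl, the callee (via `ret imm16`) under stdcall. The Python sketch below is a toy model, not real machine behaviour: the stack is a list whose end is the top, the return address is not modelled, and `callee`, `call`, and `drop` are hypothetical names. It shows why caller and callee must agree on the convention.

```python
def drop(stack, n):
    """Remove n words from the top of the stack (like add esp, 4*n or ret 4*n)."""
    for _ in range(n):
        stack.pop()


def callee(stack, nargs, convention):
    """Hypothetical callee: sums its arguments; the first argument is on top."""
    result = sum(stack[-1 - i] for i in range(nargs))
    if convention == "stdcall":
        drop(stack, nargs)        # callee cleans up (ret imm16)
    return result


def call(stack, args, caller_conv, callee_conv):
    """Push args right to left, call, then clean up as the caller believes is required."""
    for value in reversed(args):
        stack.append(value)       # last argument is pushed first
    result = callee(stack, len(args), callee_conv)
    if caller_conv == "cdecl":
        drop(stack, len(args))    # caller cleans up (add esp, n)
    return result
```

When both sides agree, the stack is balanced after the call. Calling a stdcall routine as if it were cdecl removes the arguments twice and destroys three of the caller's own stack words; the opposite mismatch leaves the arguments on the stack.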

Some languages allow the calling convention for a function to be explicitly specified with that function; other languages will have some calling convention but it will be hidden from the users of that language, and therefore will not typically be a consideration for the programmer.

Architectures

x86 (32-bit)

The 32-bit version of the x86 architecture is used with many different calling conventions. Due to the small number of architectural registers, and a historical focus on simplicity and small code size, many x86 calling conventions pass arguments on the stack. The return value (or a pointer to it) is returned in a register. Some conventions use registers for the first few parameters, which may improve performance, especially for short, simple leaf routines (routines that do not call other routines) that are invoked very frequently.

Example call:

    push EAX            ; pass some register result
    push dword [EBP+20] ; pass some memory variable (FASM/TASM syntax)
    push 3              ; pass some constant
    call calc           ; the returned result is now in EAX

Typical callee structure (some or all of the instructions below, except ret, may be optimized away in simple procedures). Some conventions leave the parameter space allocated, using a plain ret instead of ret imm16. In that case, the caller could execute add esp,12 after the call in this example, or otherwise deal with the change to ESP.

    calc:
        push EBP            ; save old frame pointer
        mov EBP,ESP         ; get new frame pointer
        sub ESP,localsize   ; reserve stack space for locals
        .
        .                   ; perform calculations, leave result in EAX
        .
        mov ESP,EBP         ; free space for locals
        pop EBP             ; restore old frame pointer
        ret paramsize       ; free parameter space and return
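The frame layout produced by this prologue can be traced with a small simulation. The Python sketch below is a toy model, assuming 4-byte stack slots, a stack that grows toward lower addresses, and arbitrary values for the arguments (10, 20, 30), the return address (0x401000), the initial ESP (0x1000), and the caller's EBP (0xDEAD); `Machine`, `calc`, and `example` are hypothetical names. After `mov EBP,ESP`, the saved EBP sits at [EBP], the return address at [EBP+4], and the last-pushed argument at [EBP+8].

```python
class Machine:
    """Toy 32-bit x86 stack: memory is a dict keyed by address."""

    def __init__(self, esp=0x1000):
        self.mem = {}
        self.esp = esp
        self.ebp = 0xDEAD          # caller's frame pointer (arbitrary value)

    def push(self, value):
        self.esp -= 4              # the stack grows downward
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def calc(m, localsize=8, paramsize=12):
    """Callee following the prologue/epilogue above; sums its three arguments."""
    m.push(m.ebp)                  # push EBP
    m.ebp = m.esp                  # mov EBP,ESP
    m.esp -= localsize             # sub ESP,localsize
    a, b, c = [m.mem[m.ebp + off] for off in (8, 12, 16)]
    eax = a + b + c                # result left in "EAX"
    m.esp = m.ebp                  # mov ESP,EBP
    m.ebp = m.pop()                # pop EBP
    ret_addr = m.pop()             # ret paramsize: pop the return address...
    m.esp += paramsize             # ...then discard the 12 bytes of arguments
    return eax, ret_addr


def example():
    """Caller side: push three arguments, then 'call' calc."""
    m = Machine()
    start = m.esp
    for arg in (30, 20, 10):       # pushed last-to-first, so 10 ends up at [EBP+8]
        m.push(arg)
    m.push(0x401000)               # 'call' pushes the return address
    eax, ret_addr = calc(m)
    return eax, ret_addr, m.esp == start, m.ebp
```

Because `calc` ends with the equivalent of `ret 12`, ESP is back at its pre-call value without any `add esp,12` in the caller.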

x86-64

The 64-bit version of the x86 architecture, known as x86-64, AMD64, and Intel 64, has two calling sequences in common use. One calling sequence, defined by Microsoft, is used on Windows; the other calling sequence, specified in the AMD64 System V ABI, is used by Unix-like systems and, with some changes, by OpenVMS. As x86-64 has more general-purpose registers than 32-bit x86 (16 rather than 8), both conventions pass some arguments in registers.
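For integer and pointer arguments, the two conventions differ mainly in which registers are used and how many. The Python sketch below covers only that case, assuming the standard orders (RCX, RDX, R8, R9 on Windows; RDI, RSI, RDX, RCX, R8, R9 for System V). It ignores floating-point arguments (passed in XMM registers), structures, and the 32 bytes of "shadow space" that Windows callers reserve on the stack; the convention labels "windows" and "sysv" are names chosen for this sketch. Both conventions return integer results in RAX.

```python
# Integer/pointer argument registers, in order, for each convention.
INT_ARG_REGS = {
    "windows": ["RCX", "RDX", "R8", "R9"],
    "sysv": ["RDI", "RSI", "RDX", "RCX", "R8", "R9"],
}


def place_integer_args(n, abi):
    """Return the location of each of n integer or pointer arguments."""
    regs = INT_ARG_REGS[abi]  # raises KeyError for an unknown convention label
    return [regs[i] if i < len(regs) else "stack" for i in range(n)]
```

For example, a call with five integer arguments passes the fifth on the stack under Windows but in R8 under System V.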

ARM (A32)

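The standard 32-bit ARM calling convention allocates the 16 general-purpose registers as:

  r15: program counter.
  r14: link register.
  r13: stack pointer.
  r12: intra-procedure-call scratch register.
  r4 to r11: local variables; must be preserved by the callee.
  r0 to r3: argument values passed to a subroutine and results returned from a subroutine; not preserved.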

If the returned value is too large to fit in r0 to r3, or its size cannot be determined statically at compile time, then the caller must allocate space for that value at run time and pass a pointer to that space in r0.

Subroutines must preserve the contents of r4 to r11 and the stack pointer (perhaps by saving them to the stack in the function prologue, then using them as scratch space, then restoring them from the stack in the function epilogue). In particular, subroutines that call other subroutines must save the return address in the link register r14 to the stack before calling those other subroutines. However, such subroutines do not need to restore that value to r14; they merely need to load that value into r15, the program counter, to return.
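The preservation rule can be checked mechanically by comparing the callee-saved registers before and after a call. The Python sketch below is a toy model: registers are a dict, a Python list stands in for the stack memory below SP, and `add3` and `preserves_callee_saved` are hypothetical names. `add3` follows the pattern described above: save r4, r5, and the return address, use r4 and r5 as scratch, put the result in r0, then restore r4 and r5 and load the saved return address into r15.

```python
# Registers a subroutine must leave unchanged: r4-r11 and the stack pointer r13.
CALLEE_SAVED = [f"r{i}" for i in range(4, 12)] + ["r13"]


def add3(regs, memory):
    """Hypothetical subroutine computing r0 = r0 + r1 + r2 with r4/r5 as scratch."""
    # Prologue: save r4, r5 and the return address (like PUSH {r4, r5, lr}).
    memory.extend([regs["r4"], regs["r5"], regs["r14"]])
    regs["r13"] -= 12
    # Body: once saved, r4 and r5 may be used freely.
    regs["r4"] = regs["r0"] + regs["r1"]
    regs["r5"] = regs["r4"] + regs["r2"]
    regs["r0"] = regs["r5"]        # result goes in r0
    regs["r12"] = 0                # r12 is scratch and need not be preserved
    # Epilogue: restore r4, r5 and load the saved return address into the PC
    # (like POP {r4, r5, pc}).
    regs["r15"] = memory.pop()
    regs["r5"] = memory.pop()
    regs["r4"] = memory.pop()
    regs["r13"] += 12


def preserves_callee_saved(subroutine, regs):
    """Run subroutine and report whether r4-r11 and SP survived the call."""
    before = {r: regs[r] for r in CALLEE_SAVED}
    subroutine(regs, [])
    return all(regs[r] == before[r] for r in CALLEE_SAVED)
```

A routine that overwrote r4 without saving it first would fail this check, even if it computed the right result in r0.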

The ARM calling convention mandates using a full-descending stack. In addition, the stack pointer must always be 4-byte aligned, and must always be 8-byte aligned at a function call with a public interface. [1]

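This calling convention causes a "typical" ARM subroutine to:

  1. In the prologue, push the registers among r4 to r11 that it will use, together with the return address in r14, to the stack (this can be done with a single STM instruction).
  2. Copy any passed arguments (in r0 to r3) to the local scratch registers (r4 to r11).
  3. Allocate other local variables to the remaining local scratch registers (r4 to r11).
  4. Do calculations and call other subroutines as necessary using BL, assuming r0 to r3, r12, and r14 will not be preserved.
  5. Put the result in r0.
  6. In the epilogue, pull the saved registers among r4 to r11 from the stack, and pull the return address into the program counter r15 (this can be done with a single LDM instruction).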

ARM (A64)

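The 64-bit ARM (AArch64) calling convention allocates the 31 general-purpose registers as: [2]

  Register 31 (SP): stack pointer or a zero register, depending on context.
  x30 (LR): procedure link register, used to return from subroutines.
  x29 (FP): frame pointer.
  x19 to x28: callee-saved.
  x18 (PR): platform register; used for some operating-system-specific purpose, or as an additional caller-saved register.
  x16 (IP0) and x17 (IP1): intra-procedure-call scratch registers.
  x9 to x15: local variables, caller-saved.
  x8 (XR): indirect result location address.
  x0 to x7: argument values passed to a subroutine and results returned from a subroutine.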

All registers starting with x have a corresponding 32-bit register prefixed with w. Thus, the lower 32 bits of x0 are accessed as w0.

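Similarly, the 32 floating-point registers are allocated as: [3]

  v0 to v7: argument values passed to a subroutine and results returned from a subroutine.
  v8 to v15: callee-saved, but only the bottom 64 bits need to be preserved.
  v16 to v31: local variables, caller-saved.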
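Because general-purpose and floating-point argument registers are counted separately, an integer argument that follows a floating-point argument still receives the next free x register. The Python sketch below illustrates that allocation for scalar arguments only, assuming at most eight of each kind in registers; composite types, variadic calls, and platform-specific variants are ignored, and the type labels "int" and "float" are simplifications chosen for this sketch. (By contrast, on Windows x64 an argument's position decides its register slot, so the second argument uses RDX or XMM1 regardless of what the first was.)

```python
def place_args_aapcs64(types):
    """Assign each argument ('int' or 'float') to x0-x7, v0-v7, or the stack.

    The two register files are counted independently, as described above.
    """
    next_x = next_v = 0
    placement = []
    for t in types:
        if t == "float":
            placement.append(f"v{next_v}" if next_v < 8 else "stack")
            next_v += 1
        else:
            placement.append(f"x{next_x}" if next_x < 8 else "stack")
            next_x += 1
    return placement
```

For example, the argument list (int, float, int, float) is placed in x0, v0, x1, v1.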

RISC-V ISA

RISC-V has a defined calling convention with two flavors, with or without floating point. [4] It passes arguments in registers whenever possible.
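As with MIPS, RISC-V assembly normally uses ABI register names that reflect the calling convention: a0 to a7 carry arguments (a0 and a1 also carry return values), s0 to s11 are callee-saved, and t0 to t6 are temporaries. The Python sketch below builds that mapping for the 32 integer registers, assuming the standard RISC-V ABI names; the function name `riscv_abi_names` is hypothetical.

```python
def riscv_abi_names():
    """Map RISC-V integer register numbers x0-x31 to their standard ABI names."""
    names = ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"]
    names += [f"a{i}" for i in range(8)]        # x10-x17: arguments and return values
    names += [f"s{i}" for i in range(2, 12)]    # x18-x27: saved registers
    names += [f"t{i}" for i in range(3, 7)]     # x28-x31: temporaries
    return {f"x{i}": name for i, name in enumerate(names)}


# Registers the callee must preserve: the stack pointer and s0-s11.
CALLEE_SAVED = {"sp"} | {f"s{i}" for i in range(12)}
```

The split of the saved registers (s0 and s1 at x8 and x9, s2 to s11 at x18 to x27) is why the mapping cannot be expressed as a single range.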

POWER, PowerPC, and Power ISA

The POWER, PowerPC, and Power ISA architectures have a large number of registers so most functions can pass all arguments in registers for single level calls. Additional arguments are passed on the stack, and space for register-based arguments is also always allocated on the stack as a convenience to the called function in case multi-level calls are used (recursive or otherwise) and the registers must be saved. This is also of use in variadic functions, such as printf(), where the function's arguments need to be accessed as an array. A single calling convention is used for all procedural languages.

Branch-and-link instructions store the return address in a special link register separate from the general-purpose registers; a routine returns to its caller with a branch instruction that uses the link register as the destination address. Leaf routines do not need to save or restore the link register. Non-leaf routines must save the return address before calling another routine and restore it before returning. To save it, a routine uses the Move From Special Purpose Register instruction to copy the link register into a general-purpose register and, if necessary, then stores that register on the stack. To restore it, the routine reloads the saved value from the stack into a general-purpose register (if it was saved there) and then uses the Move To Special Purpose Register instruction to copy that register back into the link register.
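Why non-leaf routines must save the link register can be shown with a toy model. In the Python sketch below, `bl` records a return address in `lr` before running the target, and each routine returns the address its final branch-to-link-register would jump to. The routine names, the address labels such as "main+8", and the use of r0 as the intermediate register (as in many compiler-generated prologues, via the mflr and mtlr mnemonics) are assumptions of this sketch.

```python
class Cpu:
    """Toy processor state: a link register, some GPRs, and a stack."""

    def __init__(self):
        self.lr = None
        self.gpr = {}
        self.stack = []


def bl(cpu, target, return_address):
    """Branch and link: record the return address in LR, then run the target."""
    cpu.lr = return_address
    return target(cpu)


def leaf(cpu):
    return cpu.lr                      # blr: no need to save LR


def nonleaf_saving(cpu):
    cpu.gpr["r0"] = cpu.lr             # mflr r0
    cpu.stack.append(cpu.gpr["r0"])    # store it on the stack
    bl(cpu, leaf, "after-inner-call")  # the inner call overwrites LR
    cpu.gpr["r0"] = cpu.stack.pop()    # reload the saved value
    cpu.lr = cpu.gpr["r0"]             # mtlr r0
    return cpu.lr                      # blr: back to the original caller


def nonleaf_forgetful(cpu):
    bl(cpu, leaf, "after-inner-call")  # LR is clobbered and never restored
    return cpu.lr                      # blr jumps to the wrong place
```

The forgetful version "returns" to the instruction after its own inner call instead of to its caller.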

MIPS

The O32 [5] ABI is the most commonly used ABI, owing to its status as the original System V ABI for MIPS. [6] It relies heavily on the stack, with only four registers, $a0-$a3, available to pass arguments. This perceived slowness, along with an antiquated floating-point model with only 16 registers, has encouraged the proliferation of many other calling conventions. The ABI took shape in 1990 and has not been updated since 1994. It is only defined for 32-bit MIPS, but GCC has created a 64-bit variation called O64. [7]

For 64-bit, the N64 ABI (not related to Nintendo 64) by Silicon Graphics is most commonly used. The most important improvement is that eight registers are now available for argument passing; it also increases the number of floating-point registers to 32. There is also an ILP32 version called N32, which uses 32-bit pointers for smaller code, analogous to the x32 ABI. Both run under the 64-bit mode of the CPU. [7]
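The difference in argument registers can be sketched as follows, assuming word-sized integer arguments only (no floating-point, 64-bit-on-O32, or structure arguments). The sketch also encodes the O32 rule that the caller reserves at least 16 bytes of outgoing argument space on the stack, a home area for $a0-$a3, even when all arguments fit in registers. Function names are hypothetical.

```python
def mips_int_arg_slots(n, abi):
    """Return the location of each of n word-sized integer arguments."""
    nregs = 4 if abi == "o32" else 8   # O32: $a0-$a3; N32/N64: $a0-$a7
    return [f"$a{i}" if i < nregs else "stack" for i in range(n)]


def o32_outgoing_arg_area(n):
    """Bytes an O32 caller reserves for n word-sized arguments (minimum 16)."""
    return 4 * max(n, 4)
```

With six arguments, O32 passes two on the stack, while N32 and N64 keep all six in registers.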

A few attempts have been made to replace O32 with a 32-bit ABI that more closely resembles N32. A 1995 conference came up with MIPS EABI, for which the 32-bit version was quite similar. [8] EABI inspired MIPS Technologies to propose a more radical "NUBI" ABI that additionally reuses argument registers for the return value. [9] MIPS EABI is supported by GCC but not LLVM; neither supports NUBI.

For all of O32 and N32/N64, the return address is stored in the $ra register. It is set automatically by the JAL (jump and link) and JALR (jump and link register) instructions. The stack grows downwards.

SPARC

The SPARC architecture, unlike most RISC architectures, is built on register windows. There are 24 accessible registers in each register window: 8 are the "in" registers (%i0-%i7), 8 are the "local" registers (%l0-%l7), and 8 are the "out" registers (%o0-%o7). The "in" registers are used to pass arguments to the function being called, and any additional arguments need to be pushed onto the stack. However, space is always allocated by the called function to handle a potential register window overflow, local variables, and (on 32-bit SPARC) returning a struct by value. To call a function, one places the arguments for the function to be called in the "out" registers; when the function is called, the "out" registers become the "in" registers and the called function accesses the arguments in its "in" registers. When the called function completes, it places the return value in the first "in" register, which becomes the first "out" register when the called function returns.

The System V ABI, [10] which most modern Unix-like systems follow, passes the first six arguments in "in" registers %i0 through %i5, reserving %i6 for the frame pointer and %i7 for the return address.
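The overlap of "out" and "in" registers can be modelled with a circular register file. In the Python sketch below, each window owns 16 physical registers (8 "out" and 8 "local"), and a window's "in" registers are the "out" registers of the next window up; entering a callee decrements the current window pointer, as the SAVE instruction does, and returning increments it. Eight windows is an assumption (the number is implementation-dependent), and window overflow/underflow traps and the stack-pointer adjustment are not modelled.

```python
class RegisterWindows:
    """Toy SPARC windowed register file (globals not modelled)."""

    def __init__(self, nwindows=8):
        self.n = nwindows
        self.phys = [0] * (16 * nwindows)
        self.cwp = nwindows - 1          # current window pointer

    def _index(self, kind, i):
        if kind == "o":                  # %o0-%o7
            return self.cwp * 16 + i
        if kind == "l":                  # %l0-%l7
            return self.cwp * 16 + 8 + i
        if kind == "i":                  # %i0-%i7 = outs of the window above
            return ((self.cwp + 1) % self.n) * 16 + i
        raise ValueError(kind)

    def get(self, kind, i):
        return self.phys[self._index(kind, i)]

    def set(self, kind, i, value):
        self.phys[self._index(kind, i)] = value

    def save(self):                      # on entry to the called function
        self.cwp = (self.cwp - 1) % self.n

    def restore(self):                   # on return to the caller
        self.cwp = (self.cwp + 1) % self.n
```

A caller that writes arguments to %o0 and %o1 finds the callee's result in %o0 after the return, while its own locals are untouched because the callee's "local" registers are a different set.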

IBM System/360 and successors

The IBM System/360 is another architecture without a hardware stack. The examples below illustrate the calling convention used by OS/360 and successors prior to the introduction of 64-bit z/Architecture; other operating systems for System/360 might have different calling conventions.

Calling program:

             LA    1,ARGS        Load argument list address
             L     15,=A(SUB)    Load subroutine address
             BALR  14,15         Branch to called routine (1)
             ...
    ARGS     DC    A(FIRST)      Address of 1st argument
             DC    A(SECOND)
             ...
             DC    A(THIRD)+X'80000000'  Last argument (2)

Called program:

SUB  EQU *            This is the entry point of the subprogram

Standard entry sequence:

             USING *,15          (3)
             STM   14,12,12(13)  Save registers (4)
             ST    13,SAVE+4     Save caller's savearea addr
             LA    12,SAVE       Chain saveareas
             ST    12,8(13)
             LR    13,12
             ...

Standard return sequence:

             L     13,SAVE+4     (5)
             LM    14,12,12(13)
             L     15,RETVAL     (6)
             BR    14            Return to caller
    SAVE     DS    18F           Savearea (7)

Notes:

  1. The BALR instruction stores the address of the next instruction (the return address) in the register specified by its first operand, register 14, and branches to the address in the register specified by its second operand, register 15.
  2. The caller passes the address of a list of argument addresses in register 1. The last address has the high-order bit set to indicate the end of the list. This limits programs using this convention to 31-bit addressing.
  3. The address of the called routine is in register 15. Normally this is loaded into another register and register 15 is not used as a base register.
  4. The STM instruction saves registers 14, 15, and 0 through 12 in a 72-byte area, called a save area, that is provided by the caller and pointed to by register 13. The called routine provides its own save area for use by subroutines it calls; the address of this area is normally kept in register 13 throughout the routine. The instructions following STM update forward and backward chains linking this save area to the caller's save area.
  5. The return sequence restores the caller's registers.
  6. Register 15 is usually used to pass a return value.
  7. Declaring a savearea statically in the called routine makes it non-reentrant and non-recursive; a reentrant program uses a dynamic savearea, acquired either from the operating system and freed upon returning, or in storage passed by the calling program.
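The end-of-list convention in note 2 can be sketched as a pair of functions, one building the argument list and one walking it. This is a Python model under stated assumptions: each list entry is a 32-bit fullword holding a 31-bit address, and `build_arg_list` and `read_arg_list` are hypothetical names. Because the high-order bit doubles as the end marker, an address that needs that bit cannot be passed, which is the 31-bit limitation noted above.

```python
HIGH_BIT = 0x80000000  # marks the last entry of the argument list


def build_arg_list(addresses):
    """Caller side: one fullword per argument address, high-order bit set on the last."""
    if not addresses:
        raise ValueError("at least one argument address is required")
    if any(a >= HIGH_BIT for a in addresses):
        raise ValueError("addresses must fit in 31 bits")
    words = list(addresses)
    words[-1] |= HIGH_BIT
    return words


def read_arg_list(words):
    """Callee side: collect addresses up to and including the flagged entry."""
    addresses = []
    for word in words:
        addresses.append(word & 0x7FFFFFFF)
        if word & HIGH_BIT:
            break
    return addresses
```

The callee stops at the flagged entry, so whatever follows the list in storage is never mistaken for an argument.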

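In the System/390 ABI [11] and the z/Architecture ABI, [12] used in Linux:

  Registers 0 and 1 are volatile.
  Registers 2 and 3 are used for parameter passing and return values.
  Registers 4 and 5 are also used for parameter passing.
  Register 6 is used for parameter passing, and must be saved and restored by the callee.
  Registers 7 through 13 are for use by the callee, and must be saved and restored by it.
  Register 14 is used for the return address.
  Register 15 is used as the stack pointer.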

Additional arguments are passed on the stack.

SuperH

Register | Windows CE 5.0 | gcc | Renesas
R0 | Return values. Temporary for expanding assembly pseudo-instructions. Implicit source/destination for 8/16-bit operations. Not preserved. | Return value, caller saves. | Variables/temporary. Not guaranteed.
R1..R3 | Serve as temporary registers. Not preserved. | Caller-saved scratch. Structure address (caller saves, by default). | Variables/temporary. Not guaranteed.
R4..R7 | First four words of integer arguments. The argument build area provides space into which R4 through R7 holding arguments may spill. Not preserved. | Parameter passing, caller saves. | Arguments. Not guaranteed.
R8..R13 | Serve as permanent registers. Preserved. | Callee saves. | Variables/temporary. Guaranteed.
R14 | Default frame pointer. (R8-R13 may also serve as frame pointer and leaf routines may use R1–R3 as frame pointer.) Preserved. | Frame pointer (FP), callee saves. | Variables/temporary. Guaranteed.
R15 | Serves as stack pointer or as a permanent register. Preserved. | Stack pointer (SP), callee saves. | Stack pointer. Guaranteed.

Note: "preserved" refers to callee saving; the same goes for "guaranteed".

68k

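The most common calling convention for the Motorola 68000 series is: [13] [14] [15] [16]

  d0, d1, a0, and a1 are scratch registers.
  All other registers except a7 (the stack pointer) are callee-saved.
  a6 is the frame pointer, which can be disabled by a compiler option.
  Parameters are pushed onto the stack, from right to left.
  The return value is stored in d0.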

IBM 1130

The IBM 1130 was a small 16-bit word-addressable machine. It had only six registers plus condition indicators, and no stack. The registers are the Instruction Address Register (IAR), the Accumulator (ACC), the Accumulator Extension (EXT), and three index registers, X1 through X3. The calling program is responsible for saving ACC, EXT, X1, and X2. [17] There are two pseudo-operations for calling subroutines: CALL, to call non-relocatable subroutines directly linked with the main program, and LIBF, to call relocatable library subroutines through a transfer vector. [18] Both pseudo-ops resolve to a Branch and Store IAR (BSI) machine instruction, which stores the address of the next instruction at its effective address (EA) and branches to EA+1.

Arguments follow the BSI (usually these are one-word addresses of arguments), and the called routine must know how many arguments to expect so that it can skip over them on return. Alternatively, arguments can be passed in registers. Function routines returned the result in ACC for integer results, or for real results in a memory location referred to as the Real Number Pseudo-Accumulator (FAC). Arguments and the return address were addressed using an offset to the IAR value stored in the first location of the subroutine.

    *                   1130 subroutine example
          ENT  SUB      Declare "SUB" an external entry point
    SUB   DC   0        Reserved word at entry point, conventionally coded "DC *-*"
    *                   Subroutine code begins here
    *                   If there were arguments the addresses can be loaded indirectly from the return address
          LDX I 1 SUB   Load X1 with the address of the first argument (for example)
    ...
    *                   Return sequence
          LD      RES   Load integer result into ACC
    *                   If no arguments were provided, indirect branch to the stored return address
          B   I   SUB   If no arguments were provided
          END  SUB
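The BSI linkage can be traced in a small memory model. In the Python sketch below, memory is a list of words; `bsi` stores the address following the BSI at the subroutine's entry word and continues at the next word, and the hypothetical subroutine `sub_sum` reads the inline argument addresses through that stored value and returns past them. As a simplifying assumption, the BSI is treated as a one-word instruction (the real instruction has short and long forms), and all addresses and values are arbitrary.

```python
def bsi(mem, bsi_addr, entry):
    """Branch and Store IAR: store the next address at the entry word, continue at entry+1."""
    mem[entry] = bsi_addr + 1        # toy model: the BSI occupies one word
    return entry + 1                 # new IAR


def sub_sum(mem, entry, nargs):
    """Hypothetical subroutine: sum the values whose addresses follow the BSI."""
    ret = mem[entry]                 # address of the first inline argument word
    total = sum(mem[mem[ret + k]] for k in range(nargs))  # each word is an address
    return total, ret + nargs        # result (would go in ACC), resume address
```

For a BSI at address 10 with argument addresses at 11 and 12, the subroutine resumes the caller at address 13, skipping both argument words.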

Subroutines in IBM 1130, CDC 6600 and PDP-8 (all three computers were introduced in 1965) store the return address in the first location of a subroutine. [19]

Calling conventions outside machine architectures

Threaded code

Threaded code places all the responsibility for setting up for and cleaning up after a function call on the called code. The calling code does nothing but list the subroutines to be called. This puts all the function setup and clean-up code in one place—the prologue and epilogue of the function—rather than in the many places that function is called. This makes threaded code the most compact calling convention.

Threaded code passes all arguments on the stack. All return values are returned on the stack. This makes naive implementations slower than calling conventions that keep more values in registers. However, threaded code implementations that cache several of the top stack values in registers—in particular, the return address—are usually faster than subroutine calling conventions that always push and pop the return address to the stack. [20] [21] [22]
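The division of labour described above can be sketched as a tiny threaded interpreter. In this Python model, a "thread" is just a list of routines, every routine takes its arguments from and leaves its results on a shared data stack, and compound routines are themselves threads. It is an illustration under simplifying assumptions (Python closures stand in for code addresses, and no register caching is modelled), not a real Forth inner interpreter; `lit`, `colon`, and `run` are hypothetical names.

```python
def lit(value):
    """Hypothetical helper: a primitive that pushes a constant."""
    def push(stack):
        stack.append(value)
    return push


def dup(stack):
    stack.append(stack[-1])


def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


def colon(thread):
    """A compound routine whose body is itself just a list of routines to run."""
    def run_body(stack):
        for routine in thread:
            routine(stack)
    return run_body


def run(thread):
    """Run a top-level thread on an empty data stack and return the stack."""
    stack = []
    colon(thread)(stack)
    return stack


square = colon([dup, mul])
```

The caller of `square` only lists it in a thread; all argument handling happens inside `dup` and `mul`.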

PL/I

The default calling convention for programs written in the PL/I language passes all arguments by reference, although other conventions may optionally be specified. The arguments are handled differently for different compilers and platforms, but typically the argument addresses are passed via an argument list in memory. A final, hidden, address may be passed pointing to an area to contain the return value. Because of the wide variety of data types supported by PL/I a data descriptor may also be passed to define, for example, the lengths of character or bit strings, the dimension and bounds of arrays (dope vectors), or the layout and contents of a data structure. Dummy arguments are created for arguments which are constants or which do not agree with the type of argument the called procedure expects.
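The effect of passing by reference with dummy arguments can be sketched in Python, which has no direct pass-by-reference. In the model below, a `Cell` stands for a storage location, and a constant or a location of the wrong type is copied into a freshly converted temporary (the dummy argument) before the call, so the called procedure's changes do not reach the caller. Using a single expected type for all parameters, and the names `Cell`, `call_by_reference`, and `increment`, are assumptions of this sketch rather than PL/I semantics in full.

```python
class Cell:
    """A storage location that can be passed by reference."""

    def __init__(self, value):
        self.value = value


def call_by_reference(proc, param_type, *args):
    """Pass each argument by reference, creating a dummy argument when needed.

    A Cell whose value already has param_type is passed as-is. A constant, or a
    Cell holding a different type, is converted into a fresh temporary Cell.
    """
    refs = []
    for arg in args:
        if isinstance(arg, Cell) and type(arg.value) is param_type:
            refs.append(arg)
        else:
            value = arg.value if isinstance(arg, Cell) else arg
            refs.append(Cell(param_type(value)))   # dummy argument
    proc(*refs)


def increment(x):
    """Hypothetical procedure that modifies its parameter in place."""
    x.value += 1
```

An integer variable passed to `increment` is updated, while a floating-point variable passed where an integer is expected is left unchanged, because only its dummy copy was modified.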


References

  1. "Procedure Call Standard for the ARM Architecture". 2021.
  2. "Parameters in general-purpose registers". ARM Cortex-A Series Programmer’s Guide for ARMv8-A. Retrieved 12 November 2020.
  3. "Parameters in NEON and floating-point registers". developer.arm.com. Retrieved 13 November 2020.
  4. "RISC-V calling convention" (PDF).
  5. "MIPS32 Instruction Set Quick Reference".
  6. Sweetman, Dominic. See MIPS Run (2 ed.). Morgan Kaufmann Publishers. ISBN 0-12088-421-6.
  7. "MIPS ABI History".
  8. Christopher, Eric (11 June 2003). "mips eabi documentation". binutils@sources.redhat.com (Mailing list). Retrieved 19 June 2020.
  9. "NUBI".
  10. System V Application Binary Interface SPARC Processor Supplement (3 ed.).
  11. "S/390 ELF Application Binary Interface Supplement".
  12. "zSeries ELF Application Binary Interface Supplement".
  13. Smith, Dr. Mike. "SHARC (21k) and 68k Register Comparison".
  14. XGCC: The Gnu C/C++ Language System for Embedded Development (PDF). Embedded Support Tools Corporation. 2000. p. 59.
  15. "COLDFIRE/68K: ThreadX for the Freescale ColdFire Family". Archived from the original on 2015-10-02.
  16. Moshovos, Andreas. "Subroutines Continued: Passing Arguments, Returning Values and Allocating Local Variables". "All registers except d0, d1, a0, a1 and a7 should be preserved across a call."
  17. IBM Corporation (1967). IBM 1130 Disk Monitor System, Version 2 System Introduction (C26-3709-0) (PDF). p. 67. Retrieved 21 December 2014.
  18. IBM Corporation (1968). IBM 1130 Assembler Language (C26-5927-4) (PDF). pp. 24–25.
  19. Smotherman, Mark (2004). "Subroutine and procedure call support: Early history".
  20. Rodriguez, Brad. "Moving Forth, Part 1: Design Decisions in the Forth Kernel". "On the 6809 or Zilog Super8, DTC is faster than STC."
  21. Ertl, Anton. "Speed of various interpreter dispatch techniques".
  22. Zaleski, Mathew (2008). "Chapter 4: Design and Implementation of Efficient Interpretation". YETI: a graduallY Extensible Trace Interpreter. "Although direct-threaded interpreters are known to have poor branch prediction properties... the latency of a call and return may be greater than an indirect jump."