Literal pool

Last updated

In computer science, and specifically in compiler and assembler design, a literal pool is a lookup table used to hold literals during assembly and execution.

Multiple (local) literal pools are typically used only for computer architectures that lack branch instructions for long jumps, or have a set of instructions optimized for shorter jumps. Examples of such architectures include the IBM System/360 and its successors, which had a number of instructions which took 12-bit address offsets. In this case, the compiler would create a literal table on every 4K page; any branches whose target was less than 4K bytes away could be taken immediately; longer branches required an address lookup via the literal table. The entries in the literal pool are placed into the object relocation table during assembly, and are then resolved at link edit time.

The ARM architecture also makes use of multiple local pools, [1] as does AArch64, the 64-bit extension to the original ARM. [2]

Another architecture making use of multiple local pools is C-SKY, a 32-bit architecture designed for embedded SoCs. [3] [4]

In certain ways, a literal pool resembles a TOC or a global offset table (GOT), except that the implementation is considerably simpler, and there may be multiple literal tables per object.

Perhaps the most common type of literal pool are the literal pools used by the LDR Rd,=const pseudo-instruction in ARM assembly language [5] [6] and similar instructions in IBM System/360 assembly language, [7] which are compiled to a LOAD with a PC-relative addressing mode and the constant stored in the literal pool.

On the IBM S/390 and zSeries architecture, the GNU assembler, "as" (which is invoked during the gcc build process) will use general-purpose register R13 to store a pointer to the literal pool. [8] [9]

Often some particular constant value will be used multiple times in a program. Many linkers, by default, store each unique constant once, in a single combined literal pool; that improves code size. [10]

The Java virtual machine has a "string literal pool" and a "class constant pool". [11]

Related Research Articles

<span class="mw-page-title-main">Assembly language</span> Low-level programming language

In computer programming, assembly language, often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence between the instructions in the language and the architecture's machine code instructions. Assembly language usually has one statement per machine instruction (1:1), but constants, comments, assembler directives, symbolic labels of, e.g., memory locations, registers, and macros are generally also supported.

<span class="mw-page-title-main">GNU Compiler Collection</span> Free and open-source compiler for various programming languages

The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software under the GNU General Public License. GCC is a key component of the GNU toolchain and the standard compiler for most projects related to GNU and the Linux kernel. With roughly 15 million lines of code in 2019, GCC is one of the biggest free programs in existence. It has played an important role in the growth of free software, as both a tool and an example.

MMIX is a 64-bit reduced instruction set computing (RISC) architecture designed by Donald Knuth, with significant contributions by John L. Hennessy and Richard L. Sites. Knuth has said that,

MMIX is a computer intended to illustrate machine-level aspects of programming. In my books The Art of Computer Programming, it replaces MIX, the 1960s-style machine that formerly played such a role… I strove to design MMIX so that its machine language would be simple, elegant, and easy to learn. At the same time I was careful to include all of the complexities needed to achieve high performance in practice, so that MMIX could in principle be built and even perhaps be competitive with some of the fastest general-purpose computers in the marketplace."

In computing, endianness is the order or sequence of bytes of a word of digital data in computer memory or data communication which is identified by describing the impact of the "first" bytes, meaning at the smallest address or sent first. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most significant byte of a word at the smallest memory address and the least significant byte at the largest. A little-endian system, in contrast, stores the least-significant byte at the smallest address. Bi-endianness is a feature supported by numerous computer architectures that feature switchable endianness in data fetches and stores or for instruction fetches. Other orderings are generically called middle-endian or mixed-endian.

<span class="mw-page-title-main">Single instruction, multiple data</span> Type of parallel processing

Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

ARM is a family of RISC instruction set architectures (ISAs) for computer processors. Arm Ltd. develops the ISAs and licenses them to other companies, who build the physical devices that use the instruction set. It also designs and licenses cores that implement these ISAs.

In computer architecture, 64-bit integers, memory addresses, or other data units are those that are 64 bits wide. Also, 64-bit central processing units (CPU) and arithmetic logic units (ALU) are those that are based on processor registers, address buses, or data buses of that size. A computer that uses such a processor is a 64-bit computer.

<span class="mw-page-title-main">PIC microcontrollers</span> Line of single-chip microprocessors from Microchip Technology

PIC is a family of microcontrollers made by Microchip Technology, derived from the PIC1650 originally developed by General Instrument's Microelectronics Division. The name PIC initially referred to Peripheral Interface Controller, and is currently expanded as Programmable Intelligent Computer. The first parts of the family were available in 1976; by 2013 the company had shipped more than twelve billion individual parts, used in a wide variety of embedded systems.

A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running. For example, a compiler that runs on a PC but generates code that runs on an Android smartphone is a cross compiler.

<span class="mw-page-title-main">LLVM</span> Compiler backend for multiple programming languages

LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine, though the project has expanded and the name is no longer officially an initialism.

Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions in that architecture identify the operand(s) of each instruction. An addressing mode specifies how to calculate the effective memory address of an operand by using information held in registers and/or constants contained within a machine instruction or elsewhere.

The GNU Assembler, commonly known as gas or as, is the assembler developed by the GNU Project. It is the default back-end of GCC. It is used to assemble the GNU operating system and the Linux kernel, and various other software. It is a part of the GNU Binutils package.

In computer programming, an inline assembler is a feature of some compilers that allows low-level code written in assembly language to be embedded within a program, among code that otherwise has been compiled from a higher-level language such as C or Ada.

In computer programming, the block starting symbol is the portion of an object file, executable, or assembly language code that contains statically allocated variables that are declared but have not been assigned a value yet. It is often referred to as the "bss section" or "bss segment".

This is an incomplete list of assemblers: computer programs that translate assembly language source code into binary programs. Some assemblers are components of a compiler system for a high level language and may have limited or no usable functionality outside of the compiler system. Some assemblers are hosted on the target processor and operating system, while other assemblers (cross-assemblers) may run under an unrelated operating system or processor. For example, assemblers for embedded systems are not usually hosted on the target system since it would not have the storage and terminal I/O to permit entry of a program from a keyboard. An assembler may have a single target processor or may have options to support multiple processor types. Very simple assemblers may lack features, such as macros, present in more powerful versions.

In C and related programming languages, long double refers to a floating-point data type that is often more precise than double precision though the language standard only requires it to be at least as precise as double. As with C's other floating-point types, it may not necessarily map to an IEEE format.

The IBM Basic assembly language and successors is a series of assembly languages and assemblers made for the IBM System/360 mainframe system and its successors through the IBM Z.

In computing, quadruple precision is a binary floating point–based computer number format that occupies 16 bytes with precision at least twice the 53-bit double precision.

In computer software and hardware, find first set (ffs) or find first one is a bit operation that, given an unsigned machine word, designates the index or position of the least significant bit set to one in the word counting from the least significant bit position. A nearly equivalent operation is count trailing zeros (ctz) or number of trailing zeros (ntz), which counts the number of zero bits following the least significant one bit. The complementary operation that finds the index or position of the most significant set bit is log base 2, so called because it computes the binary logarithm ⌊log2(x)⌋. This is closely related to count leading zeros (clz) or number of leading zeros (nlz), which counts the number of zero bits preceding the most significant one bit. There are two common variants of find first set, the POSIX definition which starts indexing of bits at 1, herein labelled ffs, and the variant which starts indexing of bits at zero, which is equivalent to ctz and so will be called by that name.

<span class="mw-page-title-main">AArch64</span> 64-bit extension of the ARM architecture

AArch64 or ARM64 is the 64-bit extension of the ARM architecture family.

References

  1. "Using as, section 9.4 (ARM Dependent Features)".
  2. "Using as, section 9.1 (AArch64 Dependent Features)".
  3. "Using as, section 9.10.1 (C-SKY Dependent Features: Options)".
  4. Larabel, Michael. "C-SKY 32-Bit CPUs Aim For Initial Support In Linux 4.20~5.0". Archived from the original on 22 August 2022. Retrieved 22 August 2022.
  5. "Writing ARM Assembly Language > Literal pools". Archived from the original on 19 August 2022.
  6. "IAR Assembler User Guide for Arm Limited's ARM Cores (AARM-12)" (PDF). Archived (PDF) from the original on 19 January 2022. Retrieved 22 August 2022. Archived 22 August 2022 at archive.today
  7. "High Level Assembler for z/OS & z/VM & z/VSE V1R6 (HLASM V1R6) Language Reference: Literal pool". IBM . Archived from the original on 19 August 2022.
  8. Geiselhart, Gregory; Grahn, Andrea; Handoko, Frans; Hundertmark, Jörg; Krzymowski, Albert; Pomar, Eliuth (July 2002). Linux on IBM zSeries and S/390: Application Development (PDF). IBM Redbooks. IBM. Archived (PDF) from the original on 22 July 2021. Archived 19 August 2022 at archive.today
  9. "Using as (section 9.41.3.8)". Archived from the original on 19 August 2022.
  10. William von Hagen. "The Definitive Guide to GCC".
  11. Jacquie Barker. "Beginning Java Objects: From Concepts to Code": "literal pool".