Register window

Last updated November 01, 2025

In computer engineering, register windows are a feature which dedicates registers to a subroutine by dynamically aliasing a subset of internal registers to fixed, programmer-visible registers. Register windows are implemented to improve the performance of a processor by reducing the number of stack operations required for function calls and returns. One of the most influential features of the Berkeley RISC design, they were later implemented in instruction set architectures such as AMD Am29000, Intel i960, Sun Microsystems SPARC, and Intel Itanium.

General operation

Several sets of registers are provided for the different parts of the program. Registers are deliberately hidden from the programmer to force several subroutines to share processor resources.

Hiding the registers can be implemented efficiently because the CPU recognizes the movement from one part of the program to another during a procedure call. A call begins with one of a small number of instructions (the prologue) and ends with one of a similarly small set (the epilogue). In the Berkeley design, a call causes a new set of registers to be "swapped in" at that point, and that set is marked as "dead" (or "reusable") when the call ends.

Application in CPUs

In the Berkeley RISC design, only eight registers out of a total of 64 are visible to the programs. The complete set of registers is known as the register file, and any particular set of eight as a window. The file allows up to eight procedure calls to have their own register sets. As long as the program does not call down chains longer than eight calls deep, the registers never have to be spilled, i.e. saved out to main memory or cache, which is a slow process compared to register access.
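
The behaviour described above can be sketched as a toy model. This is only an illustration of the simplified scheme in this section (64 registers, eight non-overlapping windows of eight), not a description of the actual Berkeley hardware; the class name `WindowedRegisterFile`, its methods, and the use of a Python list as the "memory stack" are all assumptions made for the example.

```python
class WindowedRegisterFile:
    """Toy model (hypothetical): non-overlapping register windows with spill/fill."""

    def __init__(self, num_windows=8, window_size=8):
        self.num_windows = num_windows
        self.window_size = window_size
        self.physical = [0] * (num_windows * window_size)  # the register file
        self.depth = 0          # current call depth (0 = outermost procedure)
        self.resident = 1       # number of innermost frames held in registers
        self.memory_stack = []  # spilled windows, standing in for main memory
        self.spills = 0
        self.fills = 0

    def _base(self, depth):
        # Physical index of the first register of the window used at `depth`.
        return (depth % self.num_windows) * self.window_size

    def read(self, r):
        return self.physical[self._base(self.depth) + r]

    def write(self, r, value):
        self.physical[self._base(self.depth) + r] = value

    def call(self):
        """Procedure call: switch to the next window, spilling the oldest if full."""
        if self.resident == self.num_windows:
            victim = self._base(self.depth + 1)  # slot still held by the oldest frame
            self.memory_stack.append(
                self.physical[victim:victim + self.window_size])
            self.spills += 1
        else:
            self.resident += 1
        self.depth += 1

    def ret(self):
        """Procedure return: switch back, refilling the caller's window if needed."""
        self.depth -= 1
        self.resident -= 1
        if self.resident == 0:
            base = self._base(self.depth)
            self.physical[base:base + self.window_size] = self.memory_stack.pop()
            self.fills += 1
            self.resident = 1
```

In this sketch a chain of eight nested activations fits entirely in the register file; each additional level of nesting costs one spill on the way down and one fill on the way back up.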

By comparison, the Sun Microsystems SPARC architecture provides simultaneous visibility into four sets of eight registers each. Three sets of eight registers each are "windowed". Eight registers (i0 through i7) form the input registers to the current procedure level. Eight registers (L0 through L7) are local to the current procedure level, and eight registers (o0 through o7) are the outputs from the current procedure level to the next level called. When a procedure is called, the register window shifts by sixteen registers, hiding the old input registers and old local registers and making the old output registers the new input registers. The common registers (old output registers and new input registers) are used for parameter passing. Finally, eight registers (g0 through g7) are globally visible to all procedure levels.
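
The overlap between a caller's output registers and its callee's input registers can be illustrated with a small index calculation. This is a sketch under stated assumptions: the function name `physical_index`, the value `NWINDOWS = 8`, and the convention that a call increases the window number are choices made for the example. Real SPARC implementations differ in the number of windows and in the direction in which the window pointer moves.

```python
NWINDOWS = 8                      # assumed number of windows, for illustration only
WINDOWED_REGS = 16 * NWINDOWS     # each window adds 16 new registers (ins + locals)


def physical_index(window, name):
    """Map a register name such as 'o3' in a given window to a storage slot.

    Returns ('global', n) for g registers and ('windowed', slot) otherwise.
    """
    kind, n = name[0].lower(), int(name[1:])
    if not 0 <= n <= 7:
        raise ValueError("register number must be 0..7")
    if kind == "g":
        return ("global", n)      # globals do not depend on the window
    # Offsets within the 24 registers visible through one window.
    offset = {"i": 0, "l": 8, "o": 16}[kind]
    # A call shifts the window by 16, so the caller's outs (offset 16)
    # land exactly on the callee's ins (offset 0). The file is circular.
    return ("windowed", (16 * window + offset + n) % WINDOWED_REGS)
```

For example, `physical_index(0, "o3")` and `physical_index(1, "i3")` name the same slot, which is how a parameter written by the caller becomes visible to the callee without being copied.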

The AMD 29000 improved the design by allowing the windows to be of variable size, which helps utilization in the common case where fewer than eight registers are needed for a call. It also separated the registers into a global set of 64, and an additional 128 for the windows. Similarly, the IA-64 (Itanium) architecture used variable-sized windows, with 32 global registers and 96 for the windows.

In the Infineon C166 architecture, most registers are simply locations in internal RAM which have the additional property of being accessible as registers. Of these, the addresses of the 16 general-purpose registers (R0-R15) are not fixed. Instead, the R0 register is located at the address pointed to by the "Context Pointer" (CP) register, and the remaining 15 registers follow sequentially thereafter.^[1]
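
The context-pointer scheme can be sketched as follows. The class `ContextPointerCore` is hypothetical, and the details it relies on (16-bit registers stored little-endian at consecutive word addresses, so that Rn lives at CP + 2n, and a small toy RAM) are assumptions for illustration rather than statements taken from the cited manual.

```python
class ContextPointerCore:
    """Toy model (hypothetical): general-purpose registers located via a context pointer."""

    def __init__(self, ram_size=2048):
        self.ram = bytearray(ram_size)  # internal RAM holding the register banks
        self.cp = 0                     # context pointer: address of R0

    def reg_address(self, n):
        if not 0 <= n <= 15:
            raise ValueError("register number must be 0..15")
        return self.cp + 2 * n          # assumed: one 16-bit word per register

    def read_reg(self, n):
        a = self.reg_address(n)
        return int.from_bytes(self.ram[a:a + 2], "little")

    def write_reg(self, n, value):
        a = self.reg_address(n)
        self.ram[a:a + 2] = (value & 0xFFFF).to_bytes(2, "little")
```

Under these assumptions, switching to a different register bank is a single assignment to `cp`; no register contents need to be copied.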

Register windows also provide an easy upgrade path. Since the additional registers are invisible to the programs, additional windows can be added at any time. For instance, the use of object-oriented programming often results in a greater number of "smaller" calls, which can be accommodated by increasing the number of windows from eight to sixteen. This was the approach used in the SPARC, which has included more register windows with newer generations of the architecture. The end result is fewer slow register window spill and fill operations because the register windows overflow less often.
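
The effect of adding windows can be illustrated by counting spills and fills for a given sequence of calls and returns. The function `count_spills_and_fills` and its trace format (a string of `c` for call and `r` for return) are hypothetical, and the model assumes one window per procedure activation with the oldest window spilled on overflow.

```python
def count_spills_and_fills(trace, num_windows):
    """Count window overflows (spills) and underflows (fills) for a call trace.

    trace: string of 'c' (call) and 'r' (return) events.
    """
    depth = 0    # current call depth
    oldest = 0   # shallowest call depth whose window is still in registers
    spills = fills = 0
    for event in trace:
        if event == "c":
            depth += 1
            if depth - oldest + 1 > num_windows:
                oldest += 1          # evict the oldest window to memory
                spills += 1
        elif event == "r":
            if depth == 0:
                raise ValueError("return without matching call")
            depth -= 1
            if depth < oldest:
                oldest = depth       # reload the caller's window from memory
                fills += 1
        else:
            raise ValueError("unknown event: " + repr(event))
    return spills, fills
```

For a chain of twelve nested calls followed by twelve returns, this model reports five spills and five fills with eight windows and none with sixteen, while the program itself is unchanged.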

In GPUs

Register windows are also a nearly universally used mechanism in GPUs, particularly in the Execution Units tasked with running general-purpose compute threads. For instance, in Intel's GenX architecture, each of the many parallel-running Execution Units ("cores") has its own register file, containing (say) 256 physical general-purpose registers. Each thread spawned on a core then accesses its own register window with a zero-based index which is mapped by hardware to a separate set of contiguous physical registers in that core's register file. Each window may consist of a variable number of registers depending on the function that the thread implements, but is usually limited to no more than half the number of physical registers in the file. This is so that (at least) a second thread can immediately take over an otherwise idle core when the previously running thread must stall for some reason (usually to communicate with outside memory buffers).
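
The per-thread mapping can be sketched as a base-plus-index translation. The class `CoreRegisterFile`, its simple bump allocation of contiguous ranges, and the absence of any mechanism for freeing windows are simplifications assumed for this example; they are not a description of how any particular GPU allocates registers.

```python
class CoreRegisterFile:
    """Toy model (hypothetical): per-thread register windows in one core's register file."""

    def __init__(self, num_physical=256):
        self.num_physical = num_physical
        self.regs = [0] * num_physical
        self.windows = {}     # thread id -> (base, size)
        self.next_free = 0    # next unallocated physical register

    def spawn(self, thread_id, size):
        """Give a new thread a contiguous window of `size` registers."""
        if size > self.num_physical // 2:
            raise ValueError("window larger than half the register file")
        if self.next_free + size > self.num_physical:
            raise RuntimeError("no room for another window")
        self.windows[thread_id] = (self.next_free, size)
        self.next_free += size

    def physical(self, thread_id, r):
        """Translate a thread's zero-based register index to a physical index."""
        base, size = self.windows[thread_id]
        if not 0 <= r < size:
            raise IndexError("register index outside the thread's window")
        return base + r
```

Limiting each window to half the file, as in the check inside `spawn`, is the policy that leaves room for at least one more thread with a maximum-size window.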

Criticism

A drawback of register windows is that context switches require saving a large number of registers to memory. The SPARC implementation always advances the register window by sixteen registers, so on a context switch many of the saved registers will not even contain useful data. Some have criticized the original studies that led to the implementation of register windows for considering only programs in isolation and ignoring multitasking workloads.^[2]

Register windows are not the only way to improve register performance. The group at Stanford University designing the MIPS saw the Berkeley work and decided that the problem was not a shortage of registers, but poor utilization of the existing ones. They instead invested more time in their compiler's register allocation, making sure it wisely used the larger set available in MIPS. This resulted in reduced complexity of the chip, with one half the total number of registers, while offering potentially higher performance in those cases where a single procedure could make use of the larger visible register space. In the end, with modern compilers, MIPS makes better use of its register space even during procedure calls.^[citation needed]

References

1. "Infineon C166 Family Instruction Set Manual" (PDF). Keil. Retrieved 2020-03-12.
2. Magnusson, Peter (April 1997). "Understanding stacks and registers in the Sparc architecture(s)". CSE 131 - Compiler Construction. University of San Diego. Retrieved October 24, 2024. "The drawback is that upon interactions with the system the registers need to be flushed to the stack, necessitating a long sequence of writes to memory of data that is often mostly garbage. Register windows was a bad idea that was caused by simulation studies that considered only programs in isolation, as opposed to multitasking workloads, and by considering compilers with poor optimization."