Microarchitecture simulation

Last updated

Microarchitecture simulation is an important technique in computer architecture research and computer science education. It is a tool for modeling the design and behavior of a microprocessor and its components, such as the ALU, cache memory, control unit, and data path, among others. The simulation allows researchers to explore the design space as well as to evaluate the performance and efficiency of novel microarchitecture features. For example, several microarchitecture components, such as branch predictors, re-order buffer, and trace cache, went through numerous simulation cycles before they become common components in contemporary microprocessors of today. In addition, the simulation also enables educators to teach computer organization and architecture courses with hand-on experiences.

Contents

For system-level simulation of computer hardware, please refer to the full system simulation.

Classification

Microarchitecture simulation can be classified into multiple categories according to input types and level of details. Specifically, the input can be a trace collected from an execution of program on a real microprocessor (so called trace-driven simulation) or a program itself (so called execution-driven simulation).

A trace-driven simulation [1] reads a fixed sequence of trace records from a file as an input. These trace records usually represent memory references, branch outcomes, or specific machine instructions, among others. While a trace-driven simulation is known to be comparatively fast and its results are highly reproducible, it also requires a very large storage space. On the other hand, an execution-driven simulation [2] reads a program and simulates the execution of machine instructions on the fly. A program file is typically several magnitudes smaller than a trace file. However, the execution-driven simulation is much slower than the trace-driven simulation because it has to process each instruction one-by-one and update all statuses of the microarchitecture components involved. Thus, the selection of input types for simulation is a trade-off between space and time. In particular, a very detailed trace for a highly accurate simulation requires a very large storage space, whereas a very accurate execution-driven simulation takes a very long time to execute all instructions in the program.

Apart from input types, the level of details can also be used to classify the simulation. In particular, a piece of software that simulates a microprocessor executing a program on a cycle-by-cycle basis is known as cycle-accurate simulator, whereas instruction set simulator only models the execution of a program on a microprocessor through the eyes of an instruction scheduler along with a coarse timing of instruction execution. Most computer science classes in computer architecture with hand-on experiences adopt the instruction set simulators as tools for teaching, whereas the cycle-accurate simulators are deployed mostly for research projects due to both complexities and resource consumption.

Usages

Microarchitecture simulators are deployed for a variety of purposes. It allows researchers to evaluate their ideas without the need to fabricate a real microprocessor chip, which is both expensive and time consuming. For instance, simulating a microprocessor with thousand of cores along with multiple levels of cache memory incurs very little cost when comparing with the fabrication of a prototyping chip. The researchers can also play with several configurations of the cache hierarchy using different cache models in the simulator instead of having to fabricate a new chip every time they want to test something different.

Another usage of the microarchitecture simulator is in education. [3] Given that a course in computer architecture teaches students many different microprocessor's features and its architectures, the microarchitecture simulator is ideal for modeling and experimenting with different features and architectures over the course of a semester. For example, students may start with a microarchitecture simulator that models a simple microprocessor design at the beginning of a semester. As the semester progresses, additional features, such as instruction pipelining, register renaming, reservation stations, out-of-order execution, and scoreboarding, can be modeled and added to the simulator as they are introduced in the classroom. Microarchitecture simulator provides the flexibility of reconfiguration and testing with minimal costs.

Examples

Related Research Articles

Central processing unit Central computer component which executes instructions

A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program. This contrasts with external components such as main memory and I/O circuitry, and specialized processors such as graphics processing units (GPUs).

In processor design, microcode is a technique that interposes a layer of computer organization between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer. Microcode is a layer of hardware-level instructions that implement higher-level machine code instructions or internal finite-state machine sequencing in many digital processing elements. Microcode is used in general-purpose central processing units, although in current desktop CPUs, it is only a fallback path for cases that the faster hardwired control unit cannot handle.

Pentium (original) Intel microprocessor

The Pentium is a microprocessor that was introduced by Intel on March 22, 1993, as the first CPU in the Pentium brand. It was instruction set compatible with the 80486 but was a new and very different microarchitecture design. The P5 Pentium was the first superscalar x86 microarchitecture and the world's first superscalar microprocessor to be in mass production. It included dual integer pipelines, a faster floating-point unit, wider data bus, separate code and data caches, and many other techniques and features to enhance performance and support security, encryption, and multiprocessing, for workstations and servers.

Superscalar processor CPU that implements instruction-level parallelism within a single processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.

Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs.

Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better use the resources provided by modern processor architectures.

SPIM is a MIPS processor simulator, designed to run assembly language code for this architecture. The program simulates R2000 and R3000 processors, and was written by James R. Larus while a professor at the University of Wisconsin–Madison. The MIPS machine language is often taught in college-level assembly courses, especially those using the textbook Computer Organization and Design: The Hardware/Software Interface by David A. Patterson and John L. Hennessy (ISBN 1-55860-428-6).

Simics is a full-system simulator or virtual platform used to run unchanged production binaries of the target hardware. Simics was originally developed by the Swedish Institute of Computer Science (SICS), and then spun off to Virtutech for commercial development in 1998. Virtutech was acquired by Intel in 2010. Currently, Simics is provided by Intel in a public release and sold commercially by Wind River Systems, which was in the past a subsidiary of Intel.

In computer engineering, out-of-order execution is a paradigm used in most high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently.

Microarchitecture Component of computer engineering

In computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

In software engineering, profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.

An instruction set simulator (ISS) is a simulation model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables which represent the processor's registers.

In computer science, performance prediction means to estimate the execution time or other performance factors of a program on a given computer. It is being widely used for computer architects to evaluate new computer designs, for compiler writers to explore new optimizations, and also for advanced developers to tune their programs.

A computer architecture simulator is a program that simulates the execution of computer architecture.

Multithreading (computer architecture) Ability of a CPU to provide multiple threads of execution concurrently

In computer architecture, multithreading is the ability of a central processing unit (CPU) to provide multiple threads of execution concurrently, supported by the operating system. This approach differs from multiprocessing. In a multithreaded application, the threads share the resources of a single or multiple cores, which include the computing units, the CPU caches, and the translation lookaside buffer (TLB).

Simulation software is based on the process of modeling a real phenomenon with a set of mathematical formulas. It is, essentially, a program that allows the user to observe an operation through simulation without actually performing that operation. Simulation software is used widely to design equipment so that the final product will be as close to design specs as possible without expensive in process modification. Simulation software with real-time response is often used in gaming, but it also has important industrial applications. When the penalty for improper operation is costly, such as airplane pilots, nuclear power plant operators, or chemical plant operators, a mock up of the actual control panel is connected to a real-time simulation of the physical response, giving valuable training experience without fear of a disastrous outcome.

Emulator System allowing a device to imitate another

In computing, an emulator is hardware or software that enables one computer system to behave like another computer system. An emulator typically enables the host system to run software or use peripheral devices designed for the guest system. Emulation refers to the ability of a computer program in an electronic device to emulate another program or device.

The Alpha 21464 is an unfinished microprocessor that implements the Alpha instruction set architecture (ISA) developed by Digital Equipment Corporation and later by Compaq after it acquired Digital. The microprocessor was also known as EV8. Slated for a 2004 release, it was canceled on 25 June 2001 when Compaq announced that Alpha would be phased out in favor of Itanium by 2004. When it was canceled, the Alpha 21464 was at a late stage of development but had not been taped out.

MikroSim

MikroSim is an educational software computer program for hardware-non-specific explanation of the general functioning and behaviour of a virtual processor, running on the Microsoft Windows operating system. Devices like miniaturized calculators, microcontroller, microprocessors, and computer can be explained on custom-developed instruction code on a register transfer level controlled by sequences of micro instructions (microcode). Based on this it is possible to develop an instruction set to control a virtual application board at higher level of abstraction.

Latency oriented processor architecture is the microarchitecture of a microprocessor designed to serve a serial computing thread with a low latency. This is typical of most central processing units (CPU) being developed since the 1970s. These architectures, in general, aim to execute as many instructions as possible belonging to a single serial thread, in a given window of time; however, the time to execute a single instruction completely from fetch to retire stages may vary from a few cycles to even a few hundred cycles in some cases. Latency oriented processor architectures are the opposite of throughput-oriented processors which concern themselves more with the total throughput of the system, rather than the service latencies for all individual threads that they work on.

References

  1. Uhlig, R. A., & Mudge, T. N. (2004). Trace-Driven Memory Simulation: A Survey. ACM Computing Surveys, 29(2), 128-170.
  2. Burger, D., & Austin, T. M. (1997). The Simplescalar Tool Set Version 2.0. Computer Architecture News, 25(3), 13-25.
  3. Skadron, K. (1996). A Microprocessor Survey Course for Learning Advanced Computer Architecture. In Proceedings of the 2002 ACM SIGCSE Conference, 152-156.
  4. Cmelik, R. F., & Keppel, D. (1994). Shade: A Fast Instruction-Set Simulator for Execution Profiling. ACM SIGMETRICS Performance Evaluation Review, 22(1), 128-137.
  5. Austin, T., Larson, E., & Ernst, D. (2002). SimpleScalar: An Infrastructure for Computer System Modeling. IEEE Computer Magazine, 35(2), 59-67.
  6. D. Burger and T. M. Austin. The SimpleScalar Tool Set, Version 2.0. http://www.simplescalar.com, 1997.
  7. Patterson, D. A., & Hennessy, J. L. (2011). Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann.
  8. Tullsen, D. M. (1996). Simulation and Modeling of a Simultaneous Multithreading Processor. In Proceedings of the 22nd Annual Computer Measurement Group Conference.