Dhrystone is a synthetic computing benchmark program developed in 1984 by Reinhold P. Weicker and intended to be representative of system (integer) programming. Dhrystone grew to become representative of general processor (CPU) performance. The name "Dhrystone" is a pun on a different benchmark algorithm called Whetstone, which emphasizes floating-point performance.[1]
With Dhrystone, Weicker gathered metadata from a broad range of software, including programs written in FORTRAN, PL/1, SAL, ALGOL 68, and Pascal. He then characterized these programs in terms of various common constructs: procedure calls, pointer indirections, assignments, and so on. From this he wrote the Dhrystone benchmark to correspond to a representative mix. Dhrystone was published in Ada, with the C version for Unix developed by Rick Richardson ("version 1.1") greatly contributing to its popularity.
The Dhrystone benchmark contains no floating-point operations. The output from the benchmark is the number of Dhrystones per second: the number of iterations of the main code loop per second.
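As a rough illustration of the construct mix described above (assignments, procedure calls, pointer indirection, string operations) and of the figure of merit being loop iterations per second, here is a minimal C sketch; the types, names, and iteration count are invented for illustration, and this is not Dhrystone's actual source:

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* A record type and procedure standing in for the benchmark's mix of
 * constructs; everything here is illustrative, not Dhrystone's code. */
struct record {
    int value;
    struct record *next;
};

static void increment(struct record *r) {
    r->value += 1;
}

int main(void) {
    struct record a = {0, NULL};
    struct record b = {0, &a};
    char src[31] = "SOME THIRTY CHARACTER STRING..";
    char dst[31];
    const long iterations = 10000000L;

    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        b.next->value = (int)i;  /* pointer indirection + assignment */
        increment(&b);           /* procedure call */
        strcpy(dst, src);        /* string operation */
    }
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    /* Use the results so the optimizer cannot discard the loop, and
     * report the Dhrystone-style figure of merit: loops per second. */
    printf("%c %d %d: %.0f loops per second\n",
           dst[0], a.value, b.value, iterations / seconds);
    return 0;
}
```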
Both Whetstone and Dhrystone are synthetic benchmarks, meaning that they are simple programs that are carefully designed to statistically mimic the processor usage of some common set of programs. Whetstone, developed in 1972, originally strove to mimic typical Algol 60 programs based on measurements from 1970, but eventually became most popular in its Fortran version, reflecting the highly numerical orientation of computing in the 1960s.
Dhrystone's eventual importance as an indicator of general-purpose ("integer") performance of new computers made it a target for commercial compiler writers. Various modern compiler static code analysis techniques (such as elimination of dead code: for example, code which uses the processor but produces internal results that are never used or output) make the use and design of synthetic benchmarks more difficult. Version 2.0 of the benchmark, released by Weicker and Richardson in March 1988, had a number of changes intended to foil a range of compiler techniques, while being carefully crafted so as not to change the underlying workload. This effort to foil compilers was only partly successful. Dhrystone 2.1, released in May of the same year, had some minor changes and as of July 2010 remains the current definition of Dhrystone.
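To see why dead-code elimination threatens a synthetic benchmark, consider the following C sketch (illustrative, not taken from Dhrystone): built with optimization enabled, a compiler can delete the first loop entirely because its result is never observed, while the second loop's result escapes the function and so must be preserved.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: compiled with optimization (e.g. gcc -O2), the
 * compiler can prove that `sum` is never observed and remove the loop
 * entirely, so the "benchmark" measures nothing. */
static void naive_kernel(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;                /* dead: the result never escapes */
}

/* Making the result depend on a runtime value and escape the function
 * defeats dead-code elimination specifically (other transformations,
 * such as replacing the loop with a closed-form expression, may still
 * apply). Dhrystone 2.0 applied the same principle: computed values
 * feed into output or later work. */
static long live_kernel(int n) {
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;                  /* live: the result is observable */
}

int main(int argc, char **argv) {
    int n = (argc > 1) ? atoi(argv[1]) : 1000000;
    naive_kernel(n);
    printf("%ld\n", live_kernel(n));
    return 0;
}
```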
Beyond issues related to compiler optimization, various other problems have been cited with Dhrystone. Most of these, including the small code size and small data set size, were understood at the time of its publication in 1984. More subtle is the slight over-representation of string operations, which is largely language-related: both Ada and Pascal treat strings as ordinary variables, whereas C does not, so what was a simple variable assignment in the reference benchmark became a buffer copy operation in the C library. Another issue is that the reported score omits information that is critical when comparing systems, such as which compiler and optimization settings were used.
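The string-operation point can be made concrete with a small C sketch (identifiers and buffer sizes are illustrative): what is a plain assignment between string variables in Ada or Pascal must be expressed in C as a library buffer copy.

```c
#include <stdio.h>
#include <string.h>

/* In Ada or Pascal a fixed-length string is an ordinary variable, so
 *     Str_1 := Str_2;
 * is a simple assignment. C has no first-class string type, so the C
 * translation of the same statement becomes a library buffer copy,
 * which is why string operations weigh more heavily in the C version. */
int main(void) {
    char str_1[31];
    char str_2[31] = "DHRYSTONE-STYLE THIRTY CHARS..";
    strcpy(str_1, str_2);        /* the assignment becomes a memory copy */
    printf("%s\n", str_1);
    return 0;
}
```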
Dhrystone remains remarkably resilient as a simple benchmark, but its continuing value in establishing true performance is questionable. It is easy to use, well documented, fully self-contained, well understood, and can be made to work on almost any system. In particular, it has remained in broad use in the embedded computing world, though the recently developed EEMBC benchmark suite, the CoreMark standalone benchmark, HINT, Stream, and even Bytemark are widely quoted and used, as well as more specific benchmarks for the memory subsystem (Cachebench), TCP/IP (TTCP), and many others.
Dhrystone may represent a result more meaningfully than MIPS (million instructions per second) because instruction count comparisons between different instruction sets (e.g. RISC vs. CISC) can confound simple comparisons. For example, the same high-level task may require many more instructions on a RISC machine than on a CISC machine, yet the RISC machine may still complete the task in less time. Thus, the Dhrystone score counts only the number of program iteration completions per second, allowing individual machines to perform this calculation in a machine-specific way. Another common representation of the Dhrystone benchmark is DMIPS (Dhrystone MIPS), obtained when the Dhrystone score is divided by 1757 (the number of Dhrystones per second obtained on the VAX 11/780, nominally a 1 MIPS machine).
Another way to represent results is DMIPS/MHz, where the DMIPS result is further divided by the CPU clock frequency in MHz, to allow easier comparison of CPUs running at different clock rates.
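The two conversions can be summarized in a few lines of C; the measured score and clock rate below are hypothetical sample values, and 1757 is the VAX 11/780 figure quoted above:

```c
#include <stdio.h>

/* Convert a raw Dhrystone score into DMIPS and DMIPS/MHz as described
 * above: divide by 1757 (the VAX 11/780's Dhrystones per second), then
 * by the clock frequency in MHz. Sample values are hypothetical. */
int main(void) {
    double dhrystones_per_sec = 2200000.0;  /* hypothetical measured score */
    double clock_mhz = 1000.0;              /* hypothetical 1 GHz CPU */

    double dmips = dhrystones_per_sec / 1757.0;
    double dmips_per_mhz = dmips / clock_mhz;

    printf("DMIPS     = %.1f\n", dmips);
    printf("DMIPS/MHz = %.2f\n", dmips_per_mhz);
    return 0;
}
```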
Using Dhrystone as a benchmark has pitfalls, most of which follow from the issues described above:
- It is highly susceptible to compiler optimization: techniques such as dead-code elimination can remove much of the benchmark's work, so scores reflect the compiler as much as the processor.
- Its code and data set are small by modern standards and can fit entirely in a processor's caches, so it says little about memory-system performance.
- String operations are somewhat over-represented, particularly in the C version.
- Reported scores often omit which compiler and optimization settings were used, making comparisons between systems unreliable.
A complex instruction set computer (CISC) is a computer architecture in which single instructions can execute several low-level operations or are capable of multi-step operations or addressing modes within single instructions. The term was retroactively coined in contrast to reduced instruction set computer (RISC) and has therefore become something of an umbrella term for everything that is not RISC; the typical differentiating characteristics are that most RISC designs use a uniform instruction length for almost all instructions and employ strictly separate load and store instructions.
Instructions per second (IPS) is a measure of a computer's processor speed. For complex instruction set computers (CISCs), different instructions take different amounts of time, so the value measured depends on the instruction mix; even for comparing processors in the same family the IPS measurement can be problematic. Many reported IPS values have represented "peak" execution rates on artificial instruction sequences with few branches and no cache contention, whereas realistic workloads typically lead to significantly lower IPS values. Memory hierarchy also greatly affects processor performance, an issue barely considered in IPS calculations. Because of these problems, synthetic benchmarks such as Dhrystone are now generally used to estimate computer performance in commonly used applications, and raw IPS has fallen into disuse.
The Pentium is an x86 microprocessor introduced by Intel on March 22, 1993. It was the first CPU to use the Pentium brand. Considered the fifth generation in the 8086-compatible line of processors, its implementation and microarchitecture were internally called P5.
In electronics and computer science, a reduced instruction set computer (RISC) is a computer architecture designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a complex instruction set computer (CISC), a RISC computer might require more instructions to accomplish a task because each instruction performs a simpler operation. The goal is to offset the need to process more instructions by increasing the speed of each instruction, in particular by implementing an instruction pipeline, which may be simpler to achieve given simpler instructions.
An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory use, storage size, and power consumption. Optimization is generally implemented as a sequence of optimizing transformations, algorithms that transform code to produce semantically equivalent code optimized for some aspect. It is typically CPU and memory intensive. In practice, factors such as available memory and a programmer's willingness to wait for compilation limit the optimizations that a compiler might provide. Research indicates that some optimization problems are NP-complete, or even undecidable.
In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation of that ISA.
Very long instruction word (VLIW) refers to instruction set architectures that are designed to exploit instruction-level parallelism (ILP). A VLIW processor allows programs to explicitly specify instructions to execute in parallel, whereas conventional central processing units (CPUs) mostly allow programs to specify instructions to execute in sequence only. VLIW is intended to allow higher performance without the complexity inherent in some other designs.
The 88000 is a RISC instruction set architecture developed by Motorola during the 1980s. The MC88100 arrived on the market in 1988, some two years after the competing SPARC and MIPS. Due to the late start and extensive delays releasing the second-generation MC88110, the m88k achieved very limited success outside of the MVME platform and embedded controller environments. When Motorola joined the AIM alliance in 1991 to develop the PowerPC, further development of the 88000 ended.
The 801 was an experimental central processing unit (CPU) design developed by IBM during the 1970s. It is considered to be the first modern RISC design, relying on processor registers for all computations and eliminating the many variant addressing modes found in CISC designs. Originally developed as the processor for a telephone switch, it was later used as the basis for a minicomputer and a number of products for their mainframe line. The initial design was a 24-bit processor; that was soon replaced by 32-bit implementations of the same concepts and the original 24-bit 801 was used only into the early 1980s.
In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process.
The Whetstone benchmark is a synthetic benchmark for evaluating the performance of computers. It was first written in ALGOL 60 in 1972 at the Technical Support Unit of the Department of Trade and Industry in the United Kingdom. It was derived from statistics on program behaviour gathered on the KDF9 computer at the National Physical Laboratory (NPL), using a modified version of its Whetstone ALGOL 60 compiler. The workload on the machine was represented as a set of frequencies of execution of the 124 instructions of the Whetstone Code. The Whetstone Compiler was built at the Atomic Power Division of the English Electric Company in Whetstone, Leicestershire, England, hence its name. Dr. B.A. Wichman at NPL produced a set of 42 simple ALGOL 60 statements, which in a suitable combination matched the execution statistics.
KDF9 was an early British 48-bit computer designed and built by English Electric. The first machine came into service in 1964 and the last of 29 machines was decommissioned in 1980 at the National Physical Laboratory. The KDF9 was designed for, and used almost entirely in, the mathematical and scientific processing fields – in 1967, nine were in use in UK universities and technical colleges. The KDF8, developed in parallel, was aimed at commercial processing workloads.
The AT&T Hobbit is a microprocessor design developed by AT&T Corporation in the early 1990s. It was based on the company's CRISP design, which resembled the classic RISC pipeline and in turn grew out of the C Machine design by Bell Labs of the late 1980s. All were optimized for running code compiled from the C programming language. The design concentrates on fast instruction decoding, indexed array access, and procedure calls.
Berkeley RISC is one of two seminal research projects into reduced instruction set computer (RISC) based microprocessor design carried out under the Defense Advanced Research Projects Agency's VLSI Project. The Berkeley project was led by David Patterson at the University of California, Berkeley between 1980 and 1984; the other took place a short distance away at Stanford University under its MIPS effort, starting in 1981 and running until 1984.
In electronics, computer science and computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as μarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.
In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it.
AVR32 is a 32-bit RISC microcontroller architecture produced by Atmel. The microcontroller architecture was designed by a handful of people educated at the Norwegian University of Science and Technology, including lead designer Øyvind Strøm and CPU architect Erik Renno in Atmel's Norwegian design center.
The history of general-purpose CPUs is a continuation of the earlier history of computing hardware.
The Central Computer and Telecommunications Agency (CCTA) was a UK government agency providing computer and telecoms support to government departments.
CoreMark is a benchmark that measures the performance of central processing units (CPUs) used in embedded systems. It was developed in 2009 by Shay Gal-On at EEMBC and is intended to become an industry standard, replacing the Dhrystone benchmark. The code is written in C and contains implementations of the following algorithms: list processing, matrix manipulation, state machine, and CRC. The code is under the Apache License 2.0 and is free of cost to use, but ownership is retained by the Consortium and publication of modified versions under the CoreMark name is prohibited.
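As a taste of the kind of kernel CoreMark includes, here is a minimal generic bitwise CRC-16/CCITT sketch; it is not CoreMark's own code (which is distributed under its own terms), just an illustration of the algorithm class:

```c
#include <stdint.h>
#include <stdio.h>

/* Generic bitwise CRC-16/CCITT (poly 0x1021, init 0xFFFF, no bit
 * reflection); an example of the checksum workloads such suites use. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

int main(void) {
    const uint8_t msg[] = "123456789";
    /* The standard check value for this CRC variant is 0x29B1. */
    printf("0x%04X\n", crc16_ccitt(msg, sizeof msg - 1));
    return 0;
}
```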