Coremark

Last updated July 27, 2022

CoreMark is a benchmark that measures the performance of central processing units (CPU) used in embedded systems. It was developed in 2009^[1] by Shay Gal-On at EEMBC and is intended to become an industry standard, replacing the Dhrystone benchmark.^[2] The code is written in C and contains implementations of the following algorithms: list processing (find and sort), matrix manipulation (common matrix operations), state machine (determine if an input stream contains valid numbers), and CRC. The code is under the Apache License 2.0 and is free of cost to use, but ownership is retained by the Consortium and publication of modified versions under the CoreMark name prohibited.^[3]

Issues addressed by CoreMark

The CRC algorithm serves a dual function; it provides a workload commonly seen in embedded applications and ensures correct operation of the CoreMark benchmark, essentially providing a self-checking mechanism. Specifically, to verify correct operation, a 16-bit CRC is performed on the data contained in elements of the linked list.

To ensure compilers cannot pre-compute the results at compile time every operation in the benchmark derives a value that is not available at compile time. Furthermore, all code used within the timed portion of the benchmark is part of the benchmark itself (no library calls).

CoreMark versus Dhrystone

CoreMark draws on the strengths that made Dhrystone so resilient - it is small, portable, easy to understand, free, and displays a single number benchmark score. Unlike Dhrystone, CoreMark has specific run and reporting rules, and was designed to avoid the well understood issues that have been cited with Dhrystone.

Major portions of Dhrystone are susceptible to a compiler’s ability to optimize the work away; thus it is more a compiler benchmark than a hardware benchmark. This also makes it very difficult to compare results when different compilers/flags are used.

Library calls are made within the timed portion of Dhrystone. Typically, those library calls consume the majority of the time consumed by the benchmark. Since the library code is not part of the benchmark, it is difficult to compare results if different libraries are used. Guidelines exist on how to run Dhrystone but since results are not certified or verified, they are not enforced.^{[ citation needed ]} There is no standardization on how Dhrystone results should be reported, with various formats in use (DMIPS, Dhrystones per second, DMIPS/MHz)

Results

CoreMark results can be found on the CoreMark web site,^[4] and on processor data sheets. Results are in the following format:

CoreMark 1.0 : N / C / P / M

N Number of iterations per second (with seeds 0,0,0x66,size=2000)
C Compiler version and flags
P Parameters such as data and code allocation specifics
M – Type of Parallel algorithm execution (if used) and number of contexts

For example: CoreMark 1.0 : 128 / GCC 4.1.2 -O2 -fprofile-use / Heap in TCRAM / FORK:2

Related Research Articles

Dhrystone is a synthetic computing benchmark program developed in 1984 by Reinhold P. Weicker intended to be representative of system (integer) programming. The Dhrystone grew to become representative of general processor (CPU) performance. The name "Dhrystone" is a pun on a different benchmark algorithm called Whetstone, which emphasizes floating point performance.

Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs.

A system on a chip, also written as system-on-a-chip and system-on-chip, is an integrated circuit that integrates all or most components of a computer or other electronic system. These components almost always include a central processing unit (CPU), memory interfaces, on-chip input/output devices, input/output interfaces, and secondary storage interfaces, often alongside other components such as radio modems and a graphics processing unit (GPU) – all on a single substrate or microchip. It may contain digital, analog, mixed-signal, and often radio frequency signal processing functions.

In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. An algorithm must be analyzed to determine its resource usage, and the efficiency of an algorithm can be measured based on the usage of different resources. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process.

In computer science, program optimization, code optimization, or software optimization, is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be optimized so that it executes more rapidly, or to make it capable of operating with less memory storage or other resources, or draw less power.

Nios II is a 32-bit embedded processor architecture designed specifically for the Altera family of field-programmable gate array (FPGA) integrated circuits. Nios II incorporates many enhancements over the original Nios architecture, making it more suitable for a wider range of embedded computing applications, from digital signal processing (DSP) to system-control.

In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it.

V850 is a 32-bit RISC CPU architecture produced by Renesas Electronics for embedded microcontrollers. It was designed by NEC as a replacement for their earlier NEC V60 family, and introduced shortly before NEC sold their designs to Renesas in the early 1990s. It has continued to be developed by Renesas as of 2018.

The AVR32 is a 32-bit RISC microcontroller architecture produced by Atmel. The microcontroller architecture was designed by a handful of people educated at the Norwegian University of Science and Technology, including lead designer Øyvind Strøm and CPU architect Erik Renno in Atmel's Norwegian design center.

EEMBC, the Embedded Microprocessor Benchmark Consortium, is a non-profit, member-funded organization formed in 1997, focused on the creation of standard benchmarks for the hardware and software used in embedded systems. The goal of its members is to make EEMBC benchmarks an industry standard for evaluating the capabilities of embedded processors, compilers, and the associated embedded system implementations, according to objective, clearly defined, application-based criteria. EEMBC members may contribute to the development of benchmarks, vote at various stages before public distribution, and accelerate testing of their platforms through early access to benchmarks and associated specifications.

In software development, the programming language Java was historically considered slower than the fastest 3rd generation typed languages such as C and C++. The main reason being a different language design, where after compiling, Java programs run on a Java virtual machine (JVM) rather than directly on the computer's processor as native code, as do C and C++ programs. Performance was a matter of concern because much business software has been written in Java after the language quickly became popular in the late 1990s and early 2000s.

Data parallelism is parallelization across multiple processors in parallel computing environments. It focuses on distributing the data across different nodes, which operate on the data in parallel. It can be applied on regular data structures like arrays and matrices by working on each element in parallel. It contrasts to task parallelism as another form of parallelism.

oneAPI Threading Building Blocks, is a C++ template library developed by Intel for parallel programming on multi-core processors. Using TBB, a computation is broken down into tasks that can run in parallel. The library manages and schedules threads to execute these tasks.

Dalvik is a discontinued process virtual machine (VM) in Android operating system that executes applications written for Android. Dalvik was an integral part of the Android software stack in the Android versions 4.4 "KitKat" and earlier, which were commonly used on mobile devices such as mobile phones and tablet computers, and more in some devices such as smart TVs and wearables. Dalvik is open-source software, originally written by Dan Bornstein, who named it after the fishing village of Dalvík in Eyjafjörður, Iceland.

In computing, performance per watt is a measure of the energy efficiency of a particular computer architecture or computer hardware. Literally, it measures the rate of computation that can be delivered by a computer for every watt of power consumed. This rate is typically measured by performance on the LINPACK benchmark when trying to compare between computing systems: an example using this is the Green500 list of supercomputers. Performance per watt has been suggested to be a more sustainable measure of computing than Moore’s Law.

OpenRISC 1200 Open source microprocessor

The OpenRISC 1200 (OR1200) is an implementation of the open source OpenRISC 1000 RISC architecture.

The Infineon XC800 family is an 8-bit microcontroller family, first introduced in 2005, with a dual cycle optimized 8051 "E-Warp" core. The XC800 family is divided into two categories, the A-Family for Automotive and the I-Family for Industrial and multi-market applications.

The Amber processor core is an ARM architecture-compatible 32-bit reduced instruction set computing (RISC) processor. It is open source, hosted on the OpenCores website, and is part of a movement to develop a library of open source hardware projects.

The LINPACK Benchmarks are a measure of a system's floating-point computing power. Introduced by Jack Dongarra, they measure how fast a computer solves a dense n by n system of linear equations Ax = b, which is a common task in engineering.

SequenceL is a general purpose functional programming language and auto-parallelizing compiler and tool set, whose primary design objectives are performance on multi-core processor hardware, ease of programming, platform portability/optimization, and code clarity and readability. Its main advantage is that it can be used to write straightforward code that automatically takes full advantage of all the processing power available, without programmers needing to be concerned with identifying parallelisms, specifying vectorization, avoiding race conditions, and other challenges of manual directive-based programming approaches such as OpenMP.

References

↑ Pitcher, Graham (2009-06-08). "EEMBC launches MIPS busting benchmark". newelectronics.co.uk. Retrieved 2020-04-28.
↑ "ARM Announces Support For EEMBC CoreMark Benchmark". GISCafe. 2009-06-06. Retrieved 2020-04-28.
↑ "COREMARK® ACCEPTABLE USE AGREEMENT". GitHub . 2018-05-24. Retrieved 2020-04-28.
↑ "Scores". Coremark. Retrieved 2020-04-28.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Pitcher, Graham (2009-06-08). "EEMBC launches MIPS busting benchmark". newelectronics.co.uk. Retrieved 2020-04-28.

[2] "ARM Announces Support For EEMBC CoreMark Benchmark". GISCafe. 2009-06-06. Retrieved 2020-04-28.

[3] "COREMARK® ACCEPTABLE USE AGREEMENT". GitHub . 2018-05-24. Retrieved 2020-04-28.

[4] "Scores". Coremark. Retrieved 2020-04-28.

[1]

[2]

[3]

[4]