Data-oriented design

In computing, data-oriented design is a program optimization approach motivated by efficient usage of the CPU cache, often used in video game development.[1] The approach is to focus on the data layout, separating and sorting fields according to when they are needed, and to think about transformations of data. Proponents include Mike Acton,[2] Scott Meyers,[3] and Jonathan Blow.
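
A minimal illustrative sketch in C++ of "separating and sorting fields according to when they are needed" (the ParticleMotion/ParticleInfo names are invented for this example, not taken from the cited sources): frequently accessed "hot" fields are kept apart from rarely accessed "cold" ones, so a tight loop streams only the data it actually needs through the cache.

    #include <string>
    #include <vector>

    // Hot data: fields touched every frame by the simulation loop.
    struct ParticleMotion {
        float x, y, z;    // position
        float vx, vy, vz; // velocity
    };

    // Cold data: fields touched rarely (spawning, debugging, tooling),
    // kept in a separate array so they do not occupy cache lines
    // while the hot loop runs.
    struct ParticleInfo {
        std::string name;
        int spawnFrame;
    };

    // The per-frame update only streams through the hot array.
    void integrate(std::vector<ParticleMotion>& motion, float dt) {
        for (ParticleMotion& m : motion) {
            m.x += m.vx * dt;
            m.y += m.vy * dt;
            m.z += m.vz * dt;
        }
    }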

The parallel array (or structure of arrays) is the main example of data-oriented design. It is contrasted with the array of structures typical of object-oriented designs.
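
A hypothetical C++ sketch of the two layouts (names invented for illustration): in the array-of-structures form the fields of each record are interleaved in memory, while the structure-of-arrays form stores each field contiguously, which suits cache-line-sized streaming access and SIMD.

    #include <vector>

    // Array of structures (AoS): the fields of each record are interleaved.
    struct PointAoS { float x, y, z; };
    using PointsAoS = std::vector<PointAoS>;

    // Structure of arrays (SoA): each field lives in its own contiguous array
    // (a parallel array), so element i is (x[i], y[i], z[i]).
    struct PointsSoA {
        std::vector<float> x, y, z;
    };

    // A pass that only needs x benefits from the SoA layout: every byte
    // brought into the cache is used, instead of also fetching unused y and z.
    float sumX(const PointsSoA& p) {
        float total = 0.0f;
        for (float xi : p.x) total += xi;
        return total;
    }

Real code often blends the two forms (array of structures of arrays, AoSoA) to combine SIMD-friendly field grouping with cache-line locality.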

Whether data-oriented design constitutes a programming paradigm is contentious: many hold that it can be used side by side with other paradigms,[4] while its emphasis on data layout has also been argued to make it incompatible with most of them.[1]

Motives

These methods became especially popular in the mid to late 2000s, during the seventh generation of video game consoles that included the IBM PowerPC-based PlayStation 3 (PS3) and Xbox 360. Historically, game consoles have had relatively weak central processing units (CPUs) compared with top-of-the-line desktop computers, a design choice that devotes more of the power and transistor budget to the graphics processing unit (GPU). For example, the seventh-generation console CPUs did not feature modern out-of-order execution; they were in-order processors with high clock speeds and deep pipelines. In addition, in most computing systems main memory is located hundreds of clock cycles away from the processing elements. Furthermore, as CPUs have become faster and main memory capacity has grown, programs consume ever larger amounts of data, which increases the likelihood of cache misses and of saturating the shared bus between processor and memory, known as the von Neumann bottleneck. Consequently, improving locality of reference, chiefly by improving memory access patterns, became a key way to recover performance. Some of the software issues were also similar to those encountered on the Itanium, requiring loop unrolling for upfront scheduling.

Contrast with object orientation

The claim is that traditional object-oriented programming (OOP) design principles result in poor data locality,[5][6] more so if runtime polymorphism (dynamic dispatch) is used, which is especially problematic on some processors.[7][1] Although OOP appears to "organise code around data", it actually organises source code around data types rather than physically grouping individual fields and arrays in a format efficient for access by specific functions. Moreover, it often hides layout details under abstraction layers, whereas a data-oriented programmer wants to consider layout first and foremost.
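
An illustrative comparison (hypothetical C++, not drawn from the cited sources): a typical OOP update loop makes a virtual call per object scattered across the heap, while a data-oriented arrangement keeps each concrete type in its own contiguous array and updates it in a plain loop with no per-object dispatch.

    #include <memory>
    #include <vector>

    // OOP style: heterogeneous objects behind a common interface.
    // A loop such as
    //     for (auto& e : entities) e->update(dt);
    // over std::vector<std::unique_ptr<Entity>> makes an indirect (virtual)
    // call per object and chases a pointer to wherever that object was allocated.
    struct Entity {
        virtual void update(float dt) = 0;
        virtual ~Entity() = default;
    };

    // Data-oriented style: one contiguous array per concrete type, updated in
    // bulk with no dynamic dispatch inside the loop.
    struct Missile { float x, y, vx, vy; };

    void updateMissiles(std::vector<Missile>& missiles, float dt) {
        for (Missile& m : missiles) {
            m.x += m.vx * dt;
            m.y += m.vy * dt;
        }
    }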


References

  1. Llopis, Noel (December 4, 2009). "Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)". Retrieved April 17, 2020.
  2. "CppCon 2014: Mike Acton "Data-Oriented Design and C++"". YouTube.
  3. "code::dive conference 2014 - Scott Meyers: Cpu Caches and Why You Care". YouTube.
  4. Fabian, Richard (October 8, 2018). "Data-Oriented Design". www.dataorienteddesign.com. Retrieved December 20, 2023.
  5. "Improve Vectorization Efficiency Using Intel SIMD Data Layout Template (Intel SDLT)" (PDF). Intel HPC Developer Conference.
  6. Homann, Holger; Laenen, Francois (2018). "SoAx: A generic C++ Structure of Arrays for handling particles in HPC codes". Computer Physics Communications. 224: 325–332. arXiv:1710.03462. doi:10.1016/j.cpc.2017.11.015. S2CID 2878169.
  7. "What's wrong with Object-Oriented Design? Where's the harm in it?". Describes the problems with virtual function calls, e.g., i-cache misses.