Parallel array

Last updated December 18, 2024

In computing, a group of parallel arrays (also known as structure of arrays or SoA) is a form of implicit data structure that uses multiple arrays to represent a singular array of records. It keeps a separate, homogeneous data array for each field of the record, each having the same number of elements. Then, objects located at the same index in each array are implicitly the fields of a single record. Pointers from one object to another are replaced by array indices. This contrasts with the normal approach of storing all fields of each record together in memory (also known as array of structures or AoS). For example, one might declare an array of 100 names, each a string, and 100 ages, each an integer, associating each name with the age that has the same index.

Examples

An example in C using parallel arrays:

intages[]={0,17,2,52,25};char*names[]={"None","Mike","Billy","Tom","Stan"};intparent[]={0/*None*/,3/*Tom*/,1/*Mike*/,0/*None*/,3/*Tom*/};for(i=1;i<=4;i++){printf("Name: %s, Age: %d, Parent: %s \n",names[i],ages[i],names[parent[i]]);}

in Perl (using a hash of arrays to hold references to each array):

my%data=(first_name=>['Joe','Bob','Frank','Hans'],last_name=>['Smith','Seger','Sinatra','Schultze'],height_in_cm=>[169,158,201,199]);for$i(0..$#{$data{first_name}}){printf"Name: %s %s\n",$data{first_name}[$i],$data{last_name}[$i];printf"Height in CM: %i\n",$data{height_in_cm}[$i];}

Or, in Python:

first_names=["Joe","Bob","Frank","Hans"]last_names=["Smith","Seger","Sinatra","Schultze"]heights_in_cm=[169,158,201,199]foriinrange(len(first_names)):print("Name: %s%s"%(first_names[i],last_names[i]))print("Height in cm: %s"%heights_in_cm[i])# Using zip:forfirst_name,last_name,height_in_cminzip(first_names,last_names,heights_in_cm):print(f"Name: {first_name}{last_name}")print(f"Height in cm: {height_in_cm}")

Pros and cons

Parallel arrays have a number of practical advantages over the normal approach:

They can save a substantial amount of space in some cases by avoiding alignment issues. For example, some architectures work best if 4-byte integers are always stored beginning at memory locations that are multiple of 4. If the previous field was a single byte, 3 bytes might be wasted. Many modern compilers can automatically avoid such problems, though in the past some programmers would explicitly declare fields in order of decreasing alignment restrictions.
If the number of items is small, array indices can occupy significantly less space than full pointers, particularly on some architectures.
Sequentially examining a single field of each record in the array is very fast on modern machines, since this amounts to a linear traversal of a single array, exhibiting ideal locality of reference and cache behaviour.
They may allow efficient processing with SIMD instructions in certain instruction set architectures

Several of these advantages depend strongly on the particular programming language and implementation in use.

However, parallel arrays also have several strong disadvantages, which serves to explain why they are not generally preferred:

They have significantly worse locality of reference when visiting the records non-sequentially and examining multiple fields of each record, because the various arrays may be stored arbitrarily far apart.
They obscure the relationship between fields of a single record (e.g. no type information relates the index between them, an index may be used erroneously).
They have little direct language support (the language and its syntax typically express no relationship between the arrays in the parallel array, and cannot catch errors).
Since the bundle of fields is not a "thing", passing it around is tedious and error-prone. For example, rather than calling a function to do something to one record (or structure or object), the function must take the fields as separate arguments. When a new field is added or changed, many parameter lists must change, where passing objects as whole would avoid such changes entirely.
They are expensive to grow or shrink, since each of several arrays must be reallocated. Multi-level arrays can ameliorate this problem, but impacts performance due to the additional indirection needed to find the desired elements.
Perhaps worst of all, they greatly raise the possibility of errors. Any insertion, deletion, or move must always be applied consistently to all of the arrays, or the arrays will no longer be synchronized with each other, leading to bizarre outcomes.

The bad locality of reference can be alleviated in some cases: if a structure can be divided into groups of fields that are generally accessed together, an array can be constructed for each group, and its elements are records containing only these subsets of the larger structure's fields. (see data-oriented design). This is a valuable way of speeding up access to very large structures with many members, while keeping the portions of the structure tied together. An alternative to tying them together using array indexes is to use references to tie the portions together, but this can be less efficient in time and space.

Another alternative is to use a single array, where each entry is a record structure. Many language provide a way to declare actual records, and arrays of them. In other languages it may be feasible to simulate this by declaring an array of n*m size, where m is the size of all the fields together, packing the fields into what is effectively a record, even though the particular language lacks direct support for records. Some compiler optimizations, particularly for vector processors, are able to perform this transformation automatically when arrays of structures are created in the program.^{[ citation needed ]}

Related Research Articles

In computer science, an array is a data structure consisting of a collection of elements, of same memory size, each identified by at least one array index or key. An array is stored such that the position of each element can be computed from its index tuple by a mathematical formula. The simplest type of data structure is a linear array, also called a one-dimensional array.

C is a general-purpose programming language. It was created in the 1970s by Dennis Ritchie and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems code, device drivers, and protocol stacks, but its use in application software has been decreasing. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which together represent a sequence. In its most basic form, each node contains data, and a reference to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration. More complex variants add additional links, allowing more efficient insertion or removal of nodes at arbitrary positions. A drawback of linked lists is that data access time is linear in respect to the number of nodes in the list. Because nodes are serially linked, accessing any node requires that the prior node be accessed beforehand. Faster access, such as random access, is not feasible. Arrays have better cache locality compared to linked lists.

In computer programming, a dope vector is a data structure used to hold information about a data object, especially its memory layout.

In computer programming, the stride of an array is the number of locations in memory between beginnings of successive array elements, measured in bytes or in units of the size of the array's elements. The stride cannot be smaller than the element size but can be larger, indicating extra space between elements.

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer science, a union is a value that may have any of multiple representations or formats within the same area of memory; that consists of a variable that may hold such a data structure. Some programming languages support a union type for such a data type. In other words, a union type specifies the permitted types that may be stored in its instances, e.g., float and integer. In contrast with a record, which could be defined to contain both a float and an integer; a union would hold only one at a time.

A function pointer, also called a subroutine pointer or procedure pointer, is a pointer referencing executable code, rather than data. Dereferencing the function pointer yields the referenced function, which can be invoked and passed arguments just as in a normal function call. Such an invocation is also known as an "indirect" call, because the function is being invoked indirectly through a variable instead of directly through a fixed identifier or address.

In the C programming language, struct is the keyword used to define a composite, a.k.a. record, data type – a named set of values that occupy a block of memory. It allows for the different values to be accessed via a single identifier, often a pointer. A struct can contain other data types so is used for mixed-data-type records. For example a bank customer struct might contains fields: name, address, telephone, balance.

Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as space–time tradeoff. The transformation can be undertaken manually by the programmer or by an optimizing compiler. On modern processors, loop unrolling is often counterproductive, as the increased code size can cause more cache misses; cf. Duff's device.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

In the C programming language, data types constitute the semantics and characteristics of storage of data elements. They are expressed in the language syntax in form of declarations for memory locations or variables. Data types also determine the types of operations or methods of processing of data elements.

A class in C++ is a user-defined type or data structure declared with any of the keywords class, struct or union that has data and functions as its members whose access is governed by the three access specifiers private, protected or public. By default access to members of a C++ class declared with the keyword class is private. The private members are not accessible outside the class; they can be accessed only through member functions of the class. The public members form an interface to the class and are accessible outside the class.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

In several programming languages, such as Perl, brace notation is a faster way to extract bytes from a string variable.

This is an overview of Fortran 95 language features. Included are the additional features of TR-15581:Enhanced Data Type Facilities, which have been universally implemented. Old features that have been superseded by new ones are not described – few of those historic features are used in modern programs although most have been retained in the language to maintain backward compatibility. The additional features of subsequent standards, up to Fortran 2023, are described in the Fortran 2023 standard document, ISO/IEC 1539-1:2023. Many of its new features are still being implemented in compilers.

In computer programming, a branch table or jump table is a method of transferring program control (branching) to another part of a program using a table of branch or jump instructions. It is a form of multiway branch. The branch table construction is commonly used when programming in assembly language but may also be generated by compilers, especially when implementing optimized switch statements whose values are densely packed together.

PROMELA is a verification modeling language introduced by Gerard J. Holzmann. The language allows for the dynamic creation of concurrent processes to model, for example, distributed systems. In PROMELA models, communication via message channels can be defined to be synchronous, or asynchronous. PROMELA models can be analyzed with the SPIN model checker, to verify that the modeled system produces the desired behavior. An implementation verified with Isabelle/HOL is also available, as part of the Computer Aided Verification of Automata (CAVA) project. Files written in Promela traditionally have a .pml file extension.

In computer science, array is a data type that represents a collection of elements, each selected by one or more indices that can be computed at run time during program execution. Such a collection is usually called an array variable or array value. By analogy with the mathematical concepts vector and matrix, array types with one and two indices are often called vector type and matrix type, respectively. More generally, a multidimensional array type can be called a tensor type, by analogy with the physical concept, tensor.

References

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms , Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Page 209 of section 10.3: Implementing pointers and objects.
Skeet, Jon (3 June 2014). "Anti-pattern: parallel collections" . Retrieved 28 October 2014.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

Parallel array

Contents

Examples

Pros and cons

See also

Related Research Articles

References