Variable-length array

In computer programming, a variable-length array (VLA), also called variable-sized or runtime-sized, is an array data structure whose length is determined at run time (instead of at compile time). [1] In C, the VLA is said to have a variably modified type that depends on a value (see Dependent type).

The main purpose of VLAs is to simplify programming of numerical algorithms.

Programming languages that support VLAs include Ada, ALGOL 68 (for non-flexible rows), APL, C# (as unsafe-mode stack-allocated arrays), C99, COBOL, Fortran 90, J, and Object Pascal (the language used in Borland Delphi and in Lazarus, which uses FPC). C11 subsequently relegated VLAs to a conditional feature that implementations are not required to support; [2] [3] on some platforms, VLAs could previously be implemented with alloca() or similar functions.

Growable arrays (also called dynamic arrays) are generally more useful than VLAs because dynamic arrays can do everything VLAs can do and can also grow at run time. For this reason, many programming languages (JavaScript, Java, Python, R, etc.) support only growable arrays. Even in programming languages that do support variable-length arrays, it is often recommended to avoid (stack-based) variable-length arrays and to use (heap-based) dynamic arrays instead. [4]
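As a rough illustration of the difference, the following C sketch shows a heap-based growable array whose capacity doubles when it fills up. The names FloatVec, vec_push and vec_free are hypothetical and chosen only for this example; the doubling policy is one common choice, not the only one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of floats; all names are illustrative. */
typedef struct {
    float *data;
    size_t len;  /* elements in use */
    size_t cap;  /* elements allocated */
} FloatVec;

/* Appends x, doubling the capacity when the array is full.
   Returns 0 on success, -1 on allocation failure (v is left unchanged). */
int vec_push(FloatVec *v, float x)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        float *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}

/* Releases the storage and resets the array to the empty state. */
void vec_free(FloatVec *v)
{
    assert(v != NULL);
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```

A caller would start from a zero-initialized FloatVec (for example, FloatVec v = {0};) and call vec_push() as values arrive. Unlike a VLA, the final length does not have to be known when the array is created.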

Memory

Allocation

Implementation

C99

The following C99 function allocates a variable-length array of a specified size, fills it with floating-point values, and then passes it to another function for processing. Because the array is declared as an automatic variable, its lifetime ends when read_and_process() returns.

    float read_and_process(int n)
    {
        float vals[n];

        for (int i = 0; i < n; ++i)
            vals[i] = read_val();

        return process(n, vals);
    }

In C99, the length parameter must come before the variable-length array parameter in a function's parameter list, so that the length is already in scope when the array parameter is declared. [1] In C11, the macro __STDC_NO_VLA__ is defined if VLAs are not supported. [6] The C23 standard makes VLA types mandatory again; only the creation of VLA objects with automatic storage duration remains optional. [7] GCC supported VLAs as an extension before C99, and that extension also carries over into its C++ dialect.
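The parameter ordering and the feature-test macro can be sketched as follows. The function names sum and matrix_sum are hypothetical, and the #error directive is just one possible reaction to a missing feature.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __STDC_NO_VLA__
#error "this sketch relies on variable-length array support"
#endif

/* n is declared before a, so it is in scope in the declarator a[n]. */
double sum(size_t n, const double a[n])
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

/* Two-dimensional case: the row length cols is part of the type of m,
   so it must be declared first. Both dimensions must be greater than zero. */
double matrix_sum(size_t rows, size_t cols, double m[rows][cols])
{
    assert(rows > 0 && cols > 0);  /* documents the requirement */
    double total = 0.0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            total += m[r][c];
    return total;
}
```

Writing the parameters the other way around, as in sum(const double a[n], size_t n), would not compile, because n would not yet be declared where the array declarator uses it.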

Linus Torvalds has expressed his displeasure in the past over VLA usage for arrays with predetermined small sizes because it generates lower-quality assembly code. [8] As of version 4.20, the Linux kernel is effectively VLA-free. [9]

Although C11 does not explicitly name a size limit for VLAs, some believe they should have the same maximum size as all other objects, i.e. SIZE_MAX bytes. [10] However, this should be understood in the wider context of environment and platform limits, such as the typical stack-guard page size of 4 KiB, which is many orders of magnitude smaller than SIZE_MAX.
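One common way to live within such platform limits is to use a VLA only below a fixed threshold and fall back to the heap above it. The sketch below assumes a hypothetical threshold STACK_LIMIT_ELEMS and a hypothetical function median(); the value 256 is arbitrary and would need tuning for a real platform.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { STACK_LIMIT_ELEMS = 256 };  /* hypothetical cap: 256 floats on the stack */

static int cmp_float(const void *a, const void *b)
{
    float x = *(const float *)a, y = *(const float *)b;
    return (x > y) - (x < y);
}

/* Sorts a scratch copy of src[0..n-1] and stores its middle element
   (the upper median) in *out. Returns 0 on success, -1 if n is zero
   or the heap allocation fails. */
int median(size_t n, const float *src, float *out)
{
    assert(src != NULL && out != NULL);
    if (n == 0)
        return -1;

    if (n <= STACK_LIMIT_ELEMS) {
        float scratch[n];  /* VLA, bounded by the check above */
        memcpy(scratch, src, sizeof scratch);
        qsort(scratch, n, sizeof scratch[0], cmp_float);
        *out = scratch[n / 2];
        return 0;
    }

    float *scratch = malloc(n * sizeof *scratch);  /* large input: use the heap */
    if (scratch == NULL)
        return -1;
    memcpy(scratch, src, n * sizeof *scratch);
    qsort(scratch, n, sizeof *scratch, cmp_float);
    *out = scratch[n / 2];
    free(scratch);
    return 0;
}
```

The heap path can report failure through its return value, whereas an oversized VLA typically fails by overflowing the stack with no chance to recover.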

It is possible to have a VLA object with dynamically allocated storage by using a pointer to an array.

    float read_and_process(int n)
    {
        float (*vals)[n] = malloc(sizeof(float[n]));

        for (int i = 0; i < n; ++i)
            (*vals)[i] = read_val();

        float ret = process(n, *vals);

        free(vals);

        return ret;
    }
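The same pointer-to-array technique extends to two dimensions, where it allows ordinary grid[r][c] indexing on heap storage. The following is a minimal sketch with a hypothetical function table_corner(); it assumes both dimensions are greater than zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Builds a rows x cols multiplication table on the heap and returns its
   bottom-right entry, or -1 if the allocation fails. */
long table_corner(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);

    /* grid points to the first of rows arrays of cols longs. Only the
       pointer itself lives on the stack; sizeof *grid is computed at
       run time as cols * sizeof(long). */
    long (*grid)[cols] = malloc(rows * sizeof *grid);
    if (grid == NULL)
        return -1;

    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            grid[r][c] = (long)((r + 1) * (c + 1));

    long corner = grid[rows - 1][cols - 1];
    free(grid);
    return corner;
}
```

Because the storage comes from malloc(), a failed allocation can be detected and handled, which is not possible for a VLA with automatic storage duration.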

Ada

The following is the same example in Ada. Ada arrays carry their bounds with them, so there is no need to pass the length to the Process function.

    type Vals_Type is array (Positive range <>) of Float;

    function Read_And_Process (N : Integer) return Float is
       Vals : Vals_Type (1 .. N);
    begin
       for I in 1 .. N loop
          Vals (I) := Read_Val;
       end loop;
       return Process (Vals);
    end Read_And_Process;

Fortran 90

The equivalent Fortran 90 function is

    function read_and_process(n) result(o)
        integer, intent(in) :: n
        real :: o

        real, dimension(n) :: vals
        integer :: i

        do i = 1, n
            vals(i) = read_val()
        end do
        o = process(vals)
    end function read_and_process

when using the Fortran 90 feature of checking procedure interfaces at compile time. If, on the other hand, the functions use a pre-Fortran 90 call interface, the (external) functions must first be declared, and the array length must be passed explicitly as an argument (as in C):

    function read_and_process(n) result(o)
        integer, intent(in) :: n
        real :: o

        real, dimension(n) :: vals
        real :: read_val, process
        integer :: i

        do i = 1, n
            vals(i) = read_val()
        end do
        o = process(vals, n)
    end function read_and_process

COBOL

The following COBOL fragment declares a variable-length array of records DEPT-PERSON having a length (number of members) specified by the value of PEOPLE-CNT:

    DATA DIVISION.
    WORKING-STORAGE SECTION.
    01  DEPT-PEOPLE.
        05  PEOPLE-CNT          PIC S9(4) BINARY.
        05  DEPT-PERSON         OCCURS 0 TO 20 TIMES DEPENDING ON PEOPLE-CNT.
            10  PERSON-NAME     PIC X(20).
            10  PERSON-WAGE     PIC S9(7)V99 PACKED-DECIMAL.

The COBOL VLA, unlike that of other languages mentioned here, is safe because COBOL requires one to specify the maximal array size – in this example, DEPT-PERSON cannot have more than 20 items, regardless of the value of PEOPLE-CNT.

C#

The following C# fragment declares a variable-length array of integers. Prior to C# version 7.2, the array can only be accessed through a pointer, which requires an "unsafe" context. Using the "unsafe" keyword in turn requires the assembly containing this code to be marked as unsafe.

    unsafe void DeclareStackBasedArrayUnsafe(int size)
    {
        int* pArray = stackalloc int[size];
        pArray[0] = 123;
    }

C# version 7.2 and later allow the array to be allocated without the "unsafe" keyword, through the use of the Span feature. [11]

    void DeclareStackBasedArraySafe(int size)
    {
        Span<int> stackArray = stackalloc int[size];
        stackArray[0] = 123;
    }

Object Pascal

Object Pascal dynamic arrays are allocated on the heap. [12]

In Object Pascal, a variable-length array is called a dynamic array. Such a variable is declared like a static array, but without specifying its size. The size of the array is set at the time of its use.

    procedure CreateDynamicArrayOfNumbers(Size: Integer);
    var
      NumberArray: array of LongWord;
    begin
      SetLength(NumberArray, Size);
      NumberArray[0] := 2020;
    end;

Removing the contents of a dynamic array is done by assigning it a size of zero.

    ...
    SetLength(NumberArray, 0);
    ...

References

  1. "Variable Length Arrays". Archived from the original on 2018-01-26.
  2. "Variable Length – Using the GNU Compiler Collection (GCC)".
  3. ISO 9899:2011 Programming Languages – C 6.7.6.2 4.
  4. Eric S. Raymond. "Software Release Practice HOWTO: 6. Good development practice". 2000.
  5. "Code Gen Options - The GNU Fortran Compiler".
  6. § 6.10.8.3 of the C11 standard (n1570.pdf)
  7. § 6.10.9.3 of the C23 standard (n3054.pdf)
  8. Torvalds, Linus (7 March 2018). "LKML: Linus Torvalds: Re: VLA removal (was Re: [RFC 2/2] lustre: use VLA_SAFE)". Linux kernel (Mailing list).
  9. "The Linux Kernel Is Now VLA-Free: A Win For Security, Less Overhead & Better For Clang - Phoronix". www.phoronix.com.
  10. §6.5.3.4 and §7.20.3 of the C11 standard (n1570.pdf)
  11. "stackalloc operator (C# reference)". Microsoft.
  12. Michaël Van Canneyt. "Free Pascal Reference guide: Dynamic arrays".