Variable (computer science)

Last updated

In computer programming, a variable is an abstract storage location paired with an associated symbolic name, which contains some known or unknown quantity of data or object referred to as a value ; or in simpler terms, a variable is a named container for a particular set of bits or type of data (like integer, float, string etc...). A variable can eventually be associated with or identified by a memory address. The variable name is the usual way to reference the stored value, in addition to referring to the variable itself, depending on the context. This separation of name and content allows the name to be used independently of the exact information it represents. The identifier in computer source code can be bound to a value during run time, and the value of the variable may thus change during the course of program execution. [1] [2] [3] [4]

Contents

Variables in programming may not directly correspond to the concept of variables in mathematics. The latter is abstract, having no reference to a physical object such as storage location. The value of a computing variable is not necessarily part of an equation or formula as in mathematics. Variables in computer programming are frequently given long names to make them relatively descriptive of their use, whereas variables in mathematics often have terse, one- or two-character names for brevity in transcription and manipulation.

A variable's storage location may be referenced by several different identifiers, a situation known as aliasing. Assigning a value to the variable using one of the identifiers will change the value that can be accessed through the other identifiers.

Compilers have to replace variables' symbolic names with the actual locations of the data. While a variable's name, type, and location often remain fixed, the data stored in the location may be changed during program execution.

Actions on a variable

In imperative programming languages, values can generally be accessed or changed at any time. In pure functional and logic languages, variables are bound to expressions and keep a single value during their entire lifetime due to the requirements of referential transparency. In imperative languages, the same behavior is exhibited by (named) constants (symbolic constants), which are typically contrasted with (normal) variables.

Depending on the type system of a programming language, variables may only be able to store a specified data type (e.g. integer or string). Alternatively, a datatype may be associated only with the current value, allowing a single variable to store anything supported by the programming language. Variables are the containers for storing the values.

Variables and scope:

Identifiers referencing a variable

An identifier referencing a variable can be used to access the variable in order to read out the value, or alter the value, or edit other attributes of the variable, such as access permission, locks, semaphores, etc.

For instance, a variable might be referenced by the identifier "total_count" and the variable can contain the number 1956. If the same variable is referenced by the identifier "r" as well, and if using this identifier "r", the value of the variable is altered to 2009, then reading the value using the identifier "total_count" will yield a result of 2009 and not 1956.

If a variable is only referenced by a single identifier, that identifier can simply be called the name of the variable; otherwise we can speak of it as one of the names of the variable. For instance, in the previous example the identifier "total_count" is a name of the variable in question, and "r" is another name of the same variable.

Scope and extent

The scope of a variable describes where in a program's text the variable may be used, while the extent (also called lifetime) of a variable describes when in a program's execution the variable has a (meaningful) value. The scope of a variable affects its extent. The scope of a variable is actually a property of the name of the variable, and the extent is a property of the storage location of the variable. These should not be confused with context (also called environment), which is a property of the program, and varies by point in the program's text or execution—see scope: an overview. Further, object lifetime may coincide with variable lifetime, but in many cases is not tied to it.

Scope is an important part of the name resolution of a variable. Most languages define a specific scope for each variable (as well as any other named entity), which may differ within a given program. The scope of a variable is the portion of the program's text for which the variable's name has meaning and for which the variable is said to be "visible". Entrance into that scope typically begins a variable's lifetime (as it comes into context) and exit from that scope typically ends its lifetime (as it goes out of context). For instance, a variable with "lexical scope" is meaningful only within a certain function/subroutine, or more finely within a block of expressions/statements (accordingly with function scope or block scope); this is static resolution, performable at parse-time or compile-time. Alternatively, a variable with dynamic scope is resolved at run-time, based on a global binding stack that depends on the specific control flow. Variables only accessible within a certain functions are termed "local variables". A "global variable", or one with indefinite scope, may be referred to anywhere in the program.

Extent, on the other hand, is a runtime (dynamic) aspect of a variable. Each binding of a variable to a value can have its own extent at runtime. The extent of the binding is the portion of the program's execution time during which the variable continues to refer to the same value or memory location. A running program may enter and leave a given extent many times, as in the case of a closure.

Unless the programming language features garbage collection, a variable whose extent permanently outlasts its scope can result in a memory leak, whereby the memory allocated for the variable can never be freed since the variable which would be used to reference it for deallocation purposes is no longer accessible. However, it can be permissible for a variable binding to extend beyond its scope, as occurs in Lisp closures and C static local variables; when execution passes back into the variable's scope, the variable may once again be used. A variable whose scope begins before its extent does is said to be uninitialized and often has an undefined, arbitrary value if accessed (see wild pointer), since it has yet to be explicitly given a particular value. A variable whose extent ends before its scope may become a dangling pointer and deemed uninitialized once more since its value has been destroyed. Variables described by the previous two cases may be said to be out of extent or unbound. In many languages, it is an error to try to use the value of a variable when it is out of extent. In other languages, doing so may yield unpredictable results. Such a variable may, however, be assigned a new value, which gives it a new extent.

For space efficiency, a memory space needed for a variable may be allocated only when the variable is first used and freed when it is no longer needed. A variable is only needed when it is in scope, thus beginning each variable's lifetime when it enters scope may give space to unused variables. To avoid wasting such space, compilers often warn programmers if a variable is declared but not used.

It is considered good programming practice to make the scope of variables as narrow as feasible so that different parts of a program do not accidentally interact with each other by modifying each other's variables. Doing so also prevents action at a distance. Common techniques for doing so are to have different sections of a program use different name spaces, or to make individual variables "private" through either dynamic variable scoping or lexical variable scoping.

Many programming languages employ a reserved value (often named null or nil) to indicate an invalid or uninitialized variable.

Typing

In statically typed languages such as Go or ML, a variable also has a type, meaning that only certain kinds of values can be stored in it. For example, a variable of type "integer" is prohibited from storing text values.

In dynamically typed languages such as Python, a variable's type is inferred by its value, and can change according to its value. In Common Lisp, both situations exist simultaneously: A variable is given a type (if undeclared, it is assumed to be T, the universal supertype) which exists at compile time. Values also have types, which can be checked and queried at runtime.

Typing of variables also allows polymorphisms to be resolved at compile time. However, this is different from the polymorphism used in object-oriented function calls (referred to as virtual functions in C++) which resolves the call based on the value type as opposed to the supertypes the variable is allowed to have.

Variables often store simple data, like integers and literal strings, but some programming languages allow a variable to store values of other datatypes as well. Such languages may also enable functions to be parametric polymorphic. These functions operate like variables to represent data of multiple types. For example, a function named length may determine the length of a list. Such a length function may be parametric polymorphic by including a type variable in its type signature, since the number of elements in the list is independent of the elements' types.

Parameters

The formal parameters (or formal arguments) of functions are also referred to as variables. For instance, in this Python code segment,

>>> defaddtwo(x):... returnx+2...>>> addtwo(5)7

the variable named x is a parameter because it is given a value when the function is called. The integer 5 is the argument which gives x its value. In most languages, function parameters have local scope. This specific variable named x can only be referred to within the addtwo function (though of course other functions can also have variables called x).

Memory allocation

The specifics of variable allocation and the representation of their values vary widely, both among programming languages and among implementations of a given language. Many language implementations allocate space for local variables , whose extent lasts for a single function call on the call stack , and whose memory is automatically reclaimed when the function returns. More generally, in name binding , the name of a variable is bound to the address of some particular block (contiguous sequence) of bytes in memory, and operations on the variable manipulate that block. Referencing is more common for variables whose values have large or unknown sizes when the code is compiled. Such variables reference the location of the value instead of storing the value itself, which is allocated from a pool of memory called the heap .

Bound variables have values. A value, however, is an abstraction, an idea; in implementation, a value is represented by some data object , which is stored somewhere in computer memory. The program, or the runtime environment, must set aside memory for each data object and, since memory is finite, ensure that this memory is yielded for reuse when the object is no longer needed to represent some variable's value.

Objects allocated from the heap must be reclaimedespecially when the objects are no longer needed. In a garbage-collected language (such as C#, Java, Python, Golang and Lisp), the runtime environment automatically reclaims objects when extant variables can no longer refer to them. In non-garbage-collected languages, such as C, the program (and the programmer) must explicitly allocate memory, and then later free it, to reclaim its memory. Failure to do so leads to memory leaks, in which the heap is depleted as the program runs, risks eventual failure from exhausting available memory.

When a variable refers to a data structure created dynamically, some of its components may be only indirectly accessed through the variable. In such circumstances, garbage collectors (or analogous program features in languages that lack garbage collectors) must deal with a case where only a portion of the memory reachable from the variable needs to be reclaimed.

Naming conventions

Unlike their mathematical counterparts, programming variables and constants commonly take multiple-character names, e.g. COST or total. Single-character names are most commonly used only for auxiliary variables; for instance, i, j, k for array index variables.

Some naming conventions are enforced at the language level as part of the language syntax which involves the format of valid identifiers. In almost all languages, variable names cannot start with a digit (0–9) and cannot contain whitespace characters. Whether or not punctuation marks are permitted in variable names varies from language to language; many languages only permit the underscore ("_") in variable names and forbid all other punctuation. In some programming languages, sigils (symbols or punctuation) are affixed to variable identifiers to indicate the variable's datatype or scope.

Case-sensitivity of variable names also varies between languages and some languages require the use of a certain case in naming certain entities; [note 1] Most modern languages are case-sensitive; some older languages are not. Some languages reserve certain forms of variable names for their own internal use; in many languages, names beginning with two underscores ("__") often fall under this category.

However, beyond the basic restrictions imposed by a language, the naming of variables is largely a matter of style. At the machine code level, variable names are not used, so the exact names chosen do not matter to the computer. Thus names of variables identify them, for the rest they are just a tool for programmers to make programs easier to write and understand. Using poorly chosen variable names can make code more difficult to review than non-descriptive names, so names that are clear are often encouraged. [5] [6]

Programmers often create and adhere to code style guidelines that offer guidance on naming variables or impose a precise naming scheme. Shorter names are faster to type but are less descriptive; longer names often make programs easier to read and the purpose of variables easier to understand. However, extreme verbosity in variable names can also lead to less comprehensible code.

Variable types (based on lifetime)

We can classify variables based on their lifetime. The different types of variables are static, stack-dynamic, explicit heap-dynamic, and implicit heap-dynamic. A static variable is also known as global variable, it is bound to a memory cell before execution begins and remains to the same memory cell until termination. A typical example is the static variables in C and C++. A Stack-dynamic variable is known as local variable, which is bound when the declaration statement is executed, and it is deallocated when the procedure returns. The main examples are local variables in C subprograms and Java methods. Explicit Heap-Dynamic variables are nameless (abstract) memory cells that are allocated and deallocated by explicit run-time instructions specified by the programmer. The main examples are dynamic objects in C++ (via new and delete) and all objects in Java. Implicit Heap-Dynamic variables are bound to heap storage only when they are assigned values. Allocation and release occur when values are reassigned to variables. As a result, Implicit heap-dynamic variables have the highest degree of flexibility. The main examples are some variables in JavaScript, PHP and all variables in APL.

See also

Notes

  1. For example, Haskell requires that names of types start with a capital letter.

Related Research Articles

C is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, protocol stacks, though decreasingly for application software. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

<span class="mw-page-title-main">Garbage collection (computer science)</span> Form of automatic memory management

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced; such memory is called garbage. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.

In computer programming, the scope of a name binding is the part of a program where the name binding is valid; that is, where the name can be used to refer to the entity. In other parts of the program, the name may refer to a different entity, or to nothing at all. Scope helps prevent name collisions by allowing the same name to refer to different objects – as long as the names have separate scopes. The scope of a name binding is also known as the visibility of an entity, particularly in older or more technical literature—this is from the perspective of the referenced entity, not the referencing name.

In programming languages, a closure, also lexical closure or function closure, is a technique for implementing lexically scoped name binding in a language with first-class functions. Operationally, a closure is a record storing a function together with an environment. The environment is a mapping associating each free variable of the function with the value or reference to which the name was bound when the closure was created. Unlike a plain function, a closure allows the function to access those captured variables through the closure's copies of their values or references, even when the function is invoked outside their scope.

In computer science, imperative programming is a programming paradigm of software that uses statements that change a program's state. In much the same way that the imperative mood in natural languages expresses commands, an imperative program consists of commands for the computer to perform. Imperative programming focuses on describing how a program operates step by step, rather than on high-level descriptions of its expected results.

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term". Usually the terms are various constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term. Type systems formalize and enforce the otherwise implicit categories the programmer uses for algebraic data types, data structures, or other components.

In object-oriented programming (OOP), the object lifetime of an object is the time between an object's creation and its destruction. Rules for object lifetime vary significantly between languages, in some cases between implementations of a given language, and lifetime of a particular object may vary from one run of the program to another.

<span class="mw-page-title-main">D (programming language)</span> Multi-paradigm system programming language

D, also known as dlang, is a multi-paradigm system programming language created by Walter Bright at Digital Mars and released in 2001. Andrei Alexandrescu joined the design and development effort in 2007. Though it originated as a re-engineering of C++, D is a profoundly different language —features of D can be considered streamlined and expanded-upon ideas from C++, however D also draws inspiration from other high-level programming languages, notably Java, Python, Ruby, C#, and Eiffel.

In computer programming, a global variable is a variable with global scope, meaning that it is visible throughout the program, unless shadowed. The set of all global variables is known as the global environment or global state. In compiled languages, global variables are generally static variables, whose extent (lifetime) is the entire runtime of the program, though in interpreted languages, global variables are generally dynamically allocated when declared, since they are not known ahead of time.

C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc, aligned_alloc and free.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

Resource acquisition is initialization (RAII) is a programming idiom used in several object-oriented, statically-typed programming languages to describe a particular language behavior. In RAII, holding a resource is a class invariant, and is tied to object lifetime. Resource allocation is done during object creation, by the constructor, while resource deallocation (release) is done during object destruction, by the destructor. In other words, resource acquisition must succeed for initialization to succeed. Thus the resource is guaranteed to be held between when initialization finishes and finalization starts, and to be held only when the object is alive. Thus if there are no object leaks, there are no resource leaks.

In computer science, the funarg problem(function argument problem) refers to the difficulty in implementing first-class functions in programming language implementations so as to use stack-based memory allocation of the functions.

In computer programming, a static variable is a variable that has been allocated "statically", meaning that its lifetime is the entire run of the program. This is in contrast to shorter-lived automatic variables, whose storage is stack allocated and deallocated on the call stack; and in contrast to objects, whose storage is dynamically allocated and deallocated in heap memory.

In computer science, a local variable is a variable that is given local scope. A local variable reference in the function or block in which it is declared overrides the same variable name in the larger scope. In programming languages with only two levels of visibility, local variables are contrasted with global variables. On the other hand, many ALGOL-derived languages allow any number of nested levels of visibility, with private variables, functions, constants and types hidden within them, either by nested blocks or nested functions. Local variables are fundamental to procedural programming, and more generally modular programming: variables of local scope are used to avoid issues with side-effects that can occur with global variables.

In computer programming, an automatic variable is a local variable which is allocated and deallocated automatically when program flow enters and leaves the variable's scope. The scope is the lexical context, particularly the function or block in which a variable is defined. Local data is typically invisible outside the function or lexical context where it is defined. Local data is also invisible and inaccessible to a called function, but is not deallocated, coming back in scope as the execution thread returns to the caller.

In compiler optimization, escape analysis is a method for determining the dynamic scope of pointers – where in the program a pointer can be accessed. It is related to pointer analysis and shape analysis.

In the C programming language, an external variable is a variable defined outside any function block. On the other hand, a local (automatic) variable is a variable defined inside a function block.

As an alternative to automatic variables, it is possible to define variables that are external to all functions, that is, variables that can be accessed by name by any function. Because external variables are globally accessible, they can be used instead of argument lists to communicate data between functions. Furthermore, because external variables remain in existence permanently, rather than appearing and disappearing as functions are called and exited, they retain their values even after the functions that set them have returned.

In computer programming, a constant is a value that should not be altered by the program during normal execution, i.e., the value is constant. When associated with an identifier, a constant is said to be "named," although the terms "constant" and "named constant" are often used interchangeably. This is contrasted with a variable, which is an identifier with a value that can be changed during normal execution, i.e., the value is variable.

In programming languages, name resolution is the resolution of the tokens within program expressions to the intended program components.

References

  1. Compilers: Principles, Techniques, and Tools , pp. 26–28
  2. Knuth, Donald (1997). The Art of Computer Programming. Vol. 1 (3rd ed.). Reading, Massachusetts: Addison-Wesley. pp. 3–4. ISBN   0-201-89683-4.
  3. "Programming with variables". Khan Academy. Retrieved 23 March 2020.
  4. "Scratch for Budding Coders". Harvard. Retrieved 23 March 2020.
  5. How Not To Pick Variables, Retrieved July 11, 2012 [DEAD LINK]
  6. Edsger Dijkstra, To hell with "meaningful identifiers"!