Funarg problem

In computer science, the funarg problem (function argument problem) refers to the difficulty of implementing first-class functions (functions as first-class objects) in programming language implementations that use stack-based memory allocation for function calls.

The difficulty only arises if the body of a nested function refers directly (i.e., not by argument passing) to identifiers defined in the environment in which the function is defined, but not in the environment of the function call. [1] A standard resolution is either to forbid such references or to create closures. [2]

There are two subtly different versions of the funarg problem. The upwards funarg problem arises from returning (or otherwise transmitting "upwards") a function from a function call. The downwards funarg problem arises from passing a function as a parameter to another function call.

Upwards funarg problem

When one function calls another during a typical program's execution, the local state of the caller (including parameters and local variables) must be preserved so that execution can proceed after the callee returns. In most compiled programs, this local state is stored on the call stack in a data structure called a stack frame or activation record. The stack frame is pushed, or allocated, as a prelude to calling another function, and is popped, or deallocated, when that function returns to its caller. The upwards funarg problem arises when the calling function refers to the called function's state after the called function has returned. In that case, the stack frame containing the called function's state variables must not be deallocated when the function returns, which violates the stack-based function call paradigm.

One solution to the upwards funarg problem is to allocate all activation records from the heap instead of the stack and to rely on some form of garbage collection or reference counting to deallocate them when they are no longer needed. Managing activation records on the heap has historically been perceived as less efficient than managing them on the stack (although this is partially contradicted [3]) and as imposing significant implementation complexity. Most functions in typical programs (less so in functional programming languages) do not create upwards funargs, which adds to concerns about the overhead of supporting them. Furthermore, this approach is genuinely difficult in languages that do not support garbage collection.
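The heap-allocation idea can be made concrete by modelling activation records as ordinary heap objects. The following Python sketch is illustrative only: the names Frame, Closure, adder_code and make_adder are hypothetical, and a real implementation would do this inside the compiler or runtime rather than in user code. Python's garbage collector stands in for the deallocation mechanism mentioned above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Frame:
    """Hypothetical activation record that lives on the heap."""
    variables: dict
    parent: Optional["Frame"] = None  # lexically enclosing frame, if any

    def lookup(self, name: str) -> Any:
        # Walk outwards through enclosing frames until the name is found.
        frame: Optional["Frame"] = self
        while frame is not None:
            if name in frame.variables:
                return frame.variables[name]
            frame = frame.parent
        raise NameError(name)


@dataclass
class Closure:
    """A function value: a code pointer paired with its defining frame."""
    code: Callable[[Frame, Any], Any]
    env: Frame

    def __call__(self, arg: Any) -> Any:
        return self.code(self.env, arg)


def adder_code(env: Frame, x: Any) -> Any:
    # The free variable n is read from the captured frame, not from a stack slot.
    return env.lookup("n") + x


def make_adder(n: Any) -> Closure:
    # make_adder's activation record is a heap object, so it is not
    # destroyed when make_adder returns; the closure keeps it reachable.
    frame = Frame({"n": n})
    return Closure(adder_code, frame)


add5 = make_adder(5)
result = add5(10)  # n is still available after make_adder has returned
```

Once no closure refers to a given Frame, the garbage collector is free to reclaim it, which is the role the text assigns to garbage collection or reference counting.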

Some efficiency-minded compilers employ a hybrid approach in which the activation records for a function are allocated from the stack if the compiler is able to deduce, through static program analysis, that the function creates no upwards funargs. Otherwise, the activation records are allocated from the heap.

Another solution is to simply copy the value of the variables into the closure at the time the closure is created. This will cause a different behavior in the case of mutable variables, because the state will no longer be shared between closures. But if it is known that the variables are constant, then this approach will be equivalent. The ML languages take this approach, since variables in those languages are bound to values—i.e. variables cannot be changed. Java also takes this approach with respect to anonymous classes (and lambdas since Java 8), in that it only allows one to refer to variables in the enclosing scope that are effectively final (i.e. constant).
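The behavioural difference between sharing a variable and copying its value can be observed directly. The sketch below is in Python, which is not one of the languages named above: Python closures share the enclosing variable by default, and the default-argument idiom is used here to imitate capture by copy. The function names are hypothetical.

```python
def make_counter_shared():
    count = 0

    def increment():
        nonlocal count  # both closures refer to the same variable
        count += 1
        return count

    def read():
        return count

    return increment, read


def make_counter_copied():
    count = 0

    # The default value is evaluated once, when the inner function is
    # created, so each closure receives its own copy of count.
    def increment(count=count):
        count += 1  # changes only this call's private copy
        return count

    def read(count=count):
        return count

    return increment, read


inc_shared, read_shared = make_counter_shared()
inc_shared()
inc_shared()
shared_seen = read_shared()  # 2: the update is visible through the other closure

inc_copied, read_copied = make_counter_copied()
inc_copied()
inc_copied()
copied_seen = read_copied()  # 0: the copies are independent
```

If count were never reassigned, the two versions would be indistinguishable, which is why copying is safe for immutable bindings.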

Some languages allow the programmer to explicitly choose between the two behaviors. PHP 5.3's anonymous functions require one to specify which variables to include in the closure using the use () clause; if the variable is listed by reference, it includes a reference to the original variable; otherwise, it passes the value. In Apple's Blocks anonymous functions, captured local variables are by default captured by value; if one wants to share the state between closures or between the closure and the outside scope, the variable must be declared with the __block modifier, in which case that variable is allocated on the heap.

Example

The following Haskell-like pseudocode defines function composition:

compose f g = λx → f (g x)

λ is the operator for constructing a new function, which in this case has one argument, x, and returns the result of first applying g to x, then applying f to that. This λ function carries the functions f and g (or pointers to them) as internal state.

The problem in this case exists if the compose function allocates the parameter variables f and g on the stack. When compose returns, the stack frame containing f and g is discarded. When the internal function λx attempts to access g, it will access a discarded memory area.
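For comparison, here is the same composition written as a runnable Python sketch. Python avoids the problem by keeping captured variables in heap-allocated cells rather than in the stack frame, so the returned function can still reach f and g. The helper names increment and double are illustrative assumptions.

```python
def compose(f, g):
    # f and g are parameters, i.e. local state of this call to compose.
    def composed(x):
        return f(g(x))  # f and g are free variables of composed

    return composed  # an upwards funarg: it outlives the call that made it


def increment(n):
    return n + 1


def double(n):
    return n * 2


inc_then_double = compose(double, increment)
value = inc_then_double(3)  # double(increment(3)) == 8

# The captured variables are still reachable after compose has returned.
captured = dict(
    zip(
        inc_then_double.__code__.co_freevars,
        (cell.cell_contents for cell in inc_then_double.__closure__),
    )
)
```

Here captured maps the names f and g to the functions that were passed in, which shows that the closure, not the stack frame of compose, is what keeps them alive.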

Downwards funarg problem

A downwards funarg may also refer to a function's state when that function is not actually executing. However, because, by definition, the existence of a downwards funarg is contained in the execution of the function that creates it, the stack frame for the function can usually still be stored on the stack. Nonetheless, the existence of downwards funargs implies a tree structure of closures and stack frames that can complicate human and machine reasoning about the program state.
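A minimal Python sketch of a downwards funarg follows; the names apply_to_all and scale_all are hypothetical. The nested function is only ever passed down into a callee, so the frame that holds its free variable is still active whenever it runs.

```python
def apply_to_all(func, items):
    # func is called here, while the frame of its creator is still live.
    return [func(item) for item in items]


def scale_all(items, factor):
    def scale(x):
        return x * factor  # factor lives in scale_all's activation record

    # scale is passed downwards and never escapes scale_all, so a
    # stack-allocated frame for scale_all would remain valid throughout.
    return apply_to_all(scale, items)


scaled = scale_all([1, 2, 3], 10)  # [10, 20, 30]
```

Because scale is neither returned nor stored anywhere that outlives scale_all, no heap allocation of the frame would be needed in principle.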

The downwards funarg problem complicates the efficient compilation of tail calls and code written in continuation-passing style. In these special cases, the intent of the programmer is (usually) that the function run in limited stack space, so the "faster" behavior may actually be undesirable.

Practical implications

Historically, the upwards funarg problem has proven to be more difficult. For example, the Pascal programming language allows functions to be passed as arguments but not returned as results; thus implementations of Pascal are required to address the downwards funarg problem but not the upwards one. The Modula-2 and Oberon programming languages (descendants of Pascal) allow functions both as parameters and as return values, but the assigned function may not be a nested function. The C programming language historically avoids the main difficulty of the funarg problem by not allowing function definitions to be nested; because the environment of every function is the same, containing just the statically allocated global variables and functions, a pointer to a function's code describes the function completely. Apple has proposed and implemented a closure syntax for C that solves the upwards funarg problem by dynamically moving closures from the stack to the heap as necessary. The Java programming language deals with the problem by requiring that context used by nested functions in anonymous inner and local classes be declared final, and that context used by lambda expressions be effectively final. C# and D have lambdas (closures) that encapsulate a function pointer and related variables.

In functional languages, functions are first-class values that can be passed anywhere. Thus, implementations of Scheme or Standard ML must address both the upwards and downwards funarg problems. This is usually accomplished by representing function values as heap-allocated closures, as previously described. The OCaml compiler employs a hybrid technique (based on static program analysis) to maximize efficiency.


References

  1. Joel Moses. The function of FUNCTION in LISP or why the FUNARG problem should be called the environment problem. MIT Project MAC memo AI-199, MAC-M-428, June 1970 (15 pp.).
  2. Erik Sandewall. A proposed solution to the FUNARG problem. ACM SIGSAM Bulletin 17 (Jan. 1971), pp. 29–42.
  3. Andrew W. Appel, Zhong Shao. An Empirical and Analytic Study of Stack vs. Heap Cost for Languages with Closures. Princeton CS Tech Report TR-450-94, 1994.