List (abstract data type)

Last updated

In computer science, a list or sequence is an abstract data type that represents a countable number of ordered values, where the same value may occur more than once. An instance of a list is a computer representation of the mathematical concept of a tuple or finite sequence; the (potentially) infinite analog of a list is a stream. [1] :§3.5 Lists are a basic example of containers, as they contain other values. If the same value occurs multiple times, each occurrence is considered a distinct item.

Contents

A singly linked list structure, implementing a list with three integer elements. Singly linked list.png
A singly linked list structure, implementing a list with three integer elements.

The name list is also used for several concrete data structures that can be used to implement abstract lists, especially linked lists and arrays. In some contexts, such as in Lisp programming, the term list may refer specifically to a linked list rather than an array. In class-based programming, lists are usually provided as instances of subclasses of a generic "list" class, and traversed via separate iterators.

Many programming languages provide support for list data types, and have special syntax and semantics for lists and list operations. A list can often be constructed by writing the items in sequence, separated by commas, semicolons, and/or spaces, within a pair of delimiters such as parentheses '()', brackets '[]', braces '{}', or angle brackets '<>'. Some languages may allow list types to be indexed or sliced like array types, in which case the data type is more accurately described as an array.

In type theory and functional programming, abstract lists are usually defined inductively by two operations: nil that yields the empty list, and cons, which adds an item at the beginning of a list. [2]

Operations

Implementation of the list data structure may provide some of the following operations:

Implementations

Lists are typically implemented either as linked lists (either singly or doubly linked) or as arrays, usually variable length or dynamic arrays.

The standard way of implementing lists, originating with the programming language Lisp, is to have each element of the list contain both its value and a pointer indicating the location of the next element in the list. This results in either a linked list or a tree, depending on whether the list has nested sublists. Some older Lisp implementations (such as the Lisp implementation of the Symbolics 3600) also supported "compressed lists" (using CDR coding) which had a special internal representation (invisible to the user). Lists can be manipulated using iteration or recursion. The former is often preferred in imperative programming languages, while the latter is the norm in functional languages.

Lists can be implemented as self-balancing binary search trees holding index-value pairs, providing equal-time access to any element (e.g. all residing in the fringe, and internal nodes storing the right-most child's index, used to guide the search), taking the time logarithmic in the list's size, but as long as it doesn't change much will provide the illusion of random access and enable swap, prefix and append operations in logarithmic time as well. [3]

Programming language support

Some languages do not offer a list data structure, but offer the use of associative arrays or some kind of table to emulate lists. For example, Lua provides tables. Although Lua stores lists that have numerical indices as arrays internally, they still appear as dictionaries. [4]

In Lisp, lists are the fundamental data type and can represent both program code and data. In most dialects, the list of the first three prime numbers could be written as (list 2 3 5). In several dialects of Lisp, including Scheme, a list is a collection of pairs, consisting of a value and a pointer to the next pair (or null value), making a singly linked list. [5]

Applications

As the name implies, lists can be used to store a list of elements. However, unlike in traditional arrays, lists can expand and shrink, and are stored dynamically in memory.

In computing, lists are easier to implement than sets. A finite set in the mathematical sense can be realized as a list with additional restrictions; that is, duplicate elements are disallowed and order is irrelevant. Sorting the list speeds up determining if a given item is already in the set, but in order to ensure the order, it requires more time to add new entry to the list. In efficient implementations, however, sets are implemented using self-balancing binary search trees or hash tables, rather than a list.

Lists also form the basis for other abstract data types including the queue, the stack, and their variations.

Abstract definition

The abstract list type L with elements of some type E (a monomorphic list) is defined by the following functions:

nil: () → L
cons: E × LL
first: LE
rest: LL

with the axioms

first (cons (e, l)) = e
rest (cons (e, l)) = l

for any element e and any list l. It is implicit that

cons (e, l) ≠ l
cons (e, l) ≠ e
cons (e1, l1) = cons (e2, l2) if e1 = e2 and l1 = l2

Note that first (nil ()) and rest (nil ()) are not defined.

These axioms are equivalent to those of the abstract stack data type.

In type theory, the above definition is more simply regarded as an inductive type defined in terms of constructors: nil and cons. In algebraic terms, this can be represented as the transformation 1 + E×LL. first and rest are then obtained by pattern matching on the cons constructor and separately handling the nil case.

The list monad

The list type forms a monad with the following functions (using E* rather than L to represent monomorphic lists with elements of type E):

where append is defined as:

Alternatively, the monad may be defined in terms of operations return, fmap and join, with:

Note that fmap, join, append and bind are well-defined, since they're applied to progressively deeper arguments at each recursive call.

The list type is an additive monad, with nil as the monadic zero and append as monadic sum.

Lists form a monoid under the append operation. The identity element of the monoid is the empty list, nil. In fact, this is the free monoid over the set of list elements.

See also

Related Research Articles

Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ANSI INCITS 226-1994 (S20018). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived from the ANSI Common Lisp standard.

In computer science, a double-ended queue is an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail). It is also often called a head-tail linked list, though properly this refers to a specific data structure implementation of a deque.

Lisp (programming language) Programming language family

Lisp is a family of programming languages with a long history and a distinctive, fully parenthesized prefix notation. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today. Only Fortran is older, by one year. Lisp has changed since its early days, and many dialects have existed over its history. Today, the best-known general-purpose Lisp dialects are Racket, Common Lisp, Scheme and Clojure.

In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which together represent a sequence. In its most basic form, each node contains: data, and a reference to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration. More complex variants add additional links, allowing more efficient insertion or removal of nodes at arbitrary positions. A drawback of linked lists is that access time is linear. Faster access, such as random access, is not feasible. Arrays have better cache locality compared to linked lists.

Queue (abstract data type) Abstract data type

In computer science, a queue is a collection of entities that are maintained in a sequence and can be modified by the addition of entities at one end of the sequence and the removal of entities from the other end of the sequence. By convention, the end of the sequence at which elements are added is called the back, tail, or rear of the queue, and the end at which elements are removed is called the head or front of the queue, analogously to the words used when people line up to wait for goods or services.

Lua (programming language) Lightweight programming language

Lua is a lightweight, high-level, multi-paradigm programming language designed primarily for embedded use in applications. Lua is cross-platform, since the interpreter of compiled bytecode is written in ANSI C, and Lua has a relatively simple C API to embed it into applications.

S-expression

In computer programming, S-expressions are a notation for nested list (tree-structured) data, invented for and popularized by the programming language Lisp, which uses them for source code as well as data. In the usual parenthesized syntax of Lisp, an S-expression is classically defined as

  1. an atom, or
  2. an expression of the form (x. y) where x and y are S-expressions.

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection.

In computer programming, cons is a fundamental function in most dialects of the Lisp programming language. consconstructs memory objects which hold two values or pointers to values. These objects are referred to as (cons) cells, conses, non-atomic s-expressions ("NATSes"), or (cons) pairs. In Lisp jargon, the expression "to cons x onto y" means to construct a new object with (cons xy). The resulting pair has a left half, referred to as the car, and a right half, referred to as the cdr.

In computer programming and particularly in Lisp, an association list, often referred to as an alist, is a linked list in which each list element comprises a key and a value. The association list is said to associate the value with the key. In order to find the value associated with a given key, a sequential search is used: each element of the list is searched in turn, starting at the head, until the key is found. Associative lists provide a simple way of implementing an associative array, but are efficient only when the number of keys is very small.

In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as the value itself, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leaves. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

In functional programming, a monad is an abstraction that allows structuring programs generically. Supporting languages may use monads to abstract away boilerplate code needed by the program logic. Monads achieve this by providing their own data type, which represents a specific form of computation, along with two procedures:

Format is a function in Common Lisp that can produce formatted text using a format string similar to the printf format string. It provides more functionality than printf, allowing the user to output numbers in English, apply certain format specifiers only under certain conditions, iterate over data structures, and output in a tabular format. This functionally originates in MIT's Lisp Machine Lisp, where it was based on Multics ioa_.

In mathematics, an initial algebra is an initial object in the category of F-algebras for a given endofunctor F. This initiality provides a general framework for induction and recursion.

In computer programming, append is the operation for concatenating linked lists or arrays in some high-level programming languages.

In many programming languages, map is the name of a higher-order function that applies a given function to each element of a functor, e.g. a list, returning a list of results in the same order. It is often called apply-to-all when considered in functional form.

In functional programming, fold refers to a family of higher-order functions that analyze a recursive data structure and through use of a given combining operation, recombine the results of recursively processing its constituent parts, building up a return value. Typically, a fold is presented with a combining function, a top node of a data structure, and possibly some default values to be used under certain conditions. The fold then proceeds to combine elements of the data structure's hierarchy, using the function in a systematic way.

This Comparison of programming languages compares the features of associative array data structures or array-lookup processing for over 40 computer programming languages.

In functional programming, a monad transformer is a type constructor which takes a monad as an argument and returns a monad as a result.

In computer science, an array type is a data type that represents a collection of elements, each selected by one or more indices that can be computed at run time during program execution. Such a collection is usually called an array variable, array value, or simply array. By analogy with the mathematical concepts vector and matrix, array types with one and two indices are often called vector type and matrix type, respectively. More generally, a multidimensional array type can be called a tensor type.

References

  1. Abelson, Harold; Sussman, Gerald Jay (1996). Structure and Interpretation of Computer Programs . MIT Press.
  2. Reingold, Edward; Nievergelt, Jurg; Narsingh, Deo (1977). Combinatorial Algorithms: Theory and Practice. Englewood Cliffs, New Jersey: Prentice Hall. pp. 38–41. ISBN   0-13-152447-X.
  3. Barnett, Granville; Del tonga, Luca (2008). "Data Structures and Algorithms" (PDF). mta.ca. Retrieved 12 November 2014.
  4. Lerusalimschy, Roberto (December 2003). Programming in Lua (first edition) (First ed.). Lua.org. ISBN   8590379817 . Retrieved 12 November 2014.
  5. Steele, Guy (1990). Common Lisp (Second ed.). Digital Press. pp. 29–31. ISBN   1-55558-041-6.