List (abstract data type)

Last updated

In computer science, a list or sequence is an abstract data type that represents a finite number of ordered values, where the same value may occur more than once. An instance of a list is a computer representation of the mathematical concept of a tuple or finite sequence; the (potentially) infinite analog of a list is a stream. [1] :§3.5 Lists are a basic example of containers, as they contain other values. If the same value occurs multiple times, each occurrence is considered a distinct item.

Contents

A singly-linked list structure, implementing a list with three integer elements. Singly-linked-list.svg
A singly-linked list structure, implementing a list with three integer elements.

The name list is also used for several concrete data structures that can be used to implement abstract lists, especially linked lists and arrays. In some contexts, such as in Lisp programming, the term list may refer specifically to a linked list rather than an array. In class-based programming, lists are usually provided as instances of subclasses of a generic "list" class, and traversed via separate iterators.

Many programming languages provide support for list data types, and have special syntax and semantics for lists and list operations. A list can often be constructed by writing the items in sequence, separated by commas, semicolons, and/or spaces, within a pair of delimiters such as parentheses '()', brackets '[]', braces '{}', or angle brackets '<>'. Some languages may allow list types to be indexed or sliced like array types, in which case the data type is more accurately described as an array.

In type theory and functional programming, abstract lists are usually defined inductively by two operations: nil that yields the empty list, and cons, which adds an item at the beginning of a list. [2]

Operations

Implementation of the list data structure may provide some of the following operations:

Implementations

Lists are typically implemented either as linked lists (either singly or doubly linked) or as arrays, usually variable length or dynamic arrays.

The standard way of implementing lists, originating with the programming language Lisp, is to have each element of the list contain both its value and a pointer indicating the location of the next element in the list. This results in either a linked list or a tree, depending on whether the list has nested sublists. Some older Lisp implementations (such as the Lisp implementation of the Symbolics 3600) also supported "compressed lists" (using CDR coding) which had a special internal representation (invisible to the user). Lists can be manipulated using iteration or recursion. The former is often preferred in imperative programming languages, while the latter is the norm in functional languages.

Lists can be implemented as self-balancing binary search trees holding index-value pairs, providing equal-time access to any element (e.g. all residing in the fringe, and internal nodes storing the right-most child's index, used to guide the search), taking the time logarithmic in the list's size, but as long as it doesn't change much will provide the illusion of random access and enable swap, prefix and append operations in logarithmic time as well. [3]

Programming language support

Some languages do not offer a list data structure, but offer the use of associative arrays or some kind of table to emulate lists. For example, Lua provides tables. Although Lua stores lists that have numerical indices as arrays internally, they still appear as dictionaries. [4]

In Lisp, lists are the fundamental data type and can represent both program code and data. In most dialects, the list of the first three prime numbers could be written as (list 2 3 5). In several dialects of Lisp, including Scheme, a list is a collection of pairs, consisting of a value and a pointer to the next pair (or null value), making a singly linked list. [5]

Applications

As the name implies, lists can be used to store a list of elements. However, unlike in traditional arrays, lists can expand and shrink, and are stored dynamically in memory.

In computing, lists are easier to implement than sets. A finite set in the mathematical sense can be realized as a list with additional restrictions; that is, duplicate elements are disallowed and order is irrelevant. Sorting the list speeds up determining if a given item is already in the set, but in order to ensure the order, it requires more time to add new entry to the list. In efficient implementations, however, sets are implemented using self-balancing binary search trees or hash tables, rather than a list.

Lists also form the basis for other abstract data types including the queue, the stack, and their variations.

Abstract definition

The abstract list type L with elements of some type E (a monomorphic list) is defined by the following functions:

nil: () → L
cons: E × LL
first: LE
rest: LL

with the axioms

first (cons (e, l)) = e
rest (cons (e, l)) = l

for any element e and any list l. It is implicit that

cons (e, l) ≠ l
cons (e, l) ≠ e
cons (e1, l1) = cons (e2, l2) if e1 = e2 and l1 = l2

Note that first (nil ()) and rest (nil ()) are not defined.

These axioms are equivalent to those of the abstract stack data type.

In type theory, the above definition is more simply regarded as an inductive type defined in terms of constructors: nil and cons. In algebraic terms, this can be represented as the transformation 1 + E×LL. first and rest are then obtained by pattern matching on the cons constructor and separately handling the nil case.

The list monad

The list type forms a monad with the following functions (using E* rather than L to represent monomorphic lists with elements of type E):

where append is defined as:

Alternatively, the monad may be defined in terms of operations return, fmap and join, with:

Note that fmap, join, append and bind are well-defined, since they're applied to progressively deeper arguments at each recursive call.

The list type is an additive monad, with nil as the monadic zero and append as monadic sum.

Lists form a monoid under the append operation. The identity element of the monoid is the empty list, nil. In fact, this is the free monoid over the set of list elements.

See also

Related Research Articles

In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a user of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This mathematical model contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user. For example, a stack has push/pop operations that follow a Last-In-First-Out rule, and can be concretely implemented using either a list or an array. Another example is a set which stores values, without any particular order, and no repeated values. Values themselves are not retrieved from sets, rather one tests a value for membership to obtain a Boolean "in" or "not in".

In computer science, a double-ended queue is an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail). It is also often called a head-tail linked list, though properly this refers to a specific data structure implementation of a deque.

<span class="mw-page-title-main">Lisp (programming language)</span> Programming language family

Lisp is a family of programming languages with a long history and a distinctive, fully parenthesized prefix notation. Originally specified in the late 1950s, it is the second-oldest high-level programming language still in common use, after Fortran. Lisp has changed since its early days, and many dialects have existed over its history. Today, the best-known general-purpose Lisp dialects are Common Lisp, Scheme, Racket, and Clojure.

In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which together represent a sequence. In its most basic form, each node contains data, and a reference to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration. More complex variants add additional links, allowing more efficient insertion or removal of nodes at arbitrary positions. A drawback of linked lists is that data access time is linear in respect to the number of nodes in the list. Because nodes are serially linked, accessing any node requires that the prior node be accessed beforehand. Faster access, such as random access, is not feasible. Arrays have better cache locality compared to linked lists.

<span class="mw-page-title-main">Monoid</span> Algebraic structure with an associative operation and an identity element

In abstract algebra, a branch of mathematics, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being 0.

<span class="mw-page-title-main">Queue (abstract data type)</span> Abstract data type

In computer science, a queue is a collection of entities that are maintained in a sequence and can be modified by the addition of entities at one end of the sequence and the removal of entities from the other end of the sequence. By convention, the end of the sequence at which elements are added is called the back, tail, or rear of the queue, and the end at which elements are removed is called the head or front of the queue, analogously to the words used when people line up to wait for goods or services.

<span class="mw-page-title-main">Lua (programming language)</span> Lightweight programming language

Lua is a lightweight, high-level, multi-paradigm programming language designed primarily for embedded use in applications. Lua is cross-platform, since the interpreter of compiled bytecode is written in ANSI C, and Lua has a relatively simple C API to embed it into applications.

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of pairs, such that each possible key appears at most once in the collection. In mathematical terms, an associative array is a function with finite domain. It supports 'lookup', 'remove', and 'insert' operations.

In computer programming, cons is a fundamental function in most dialects of the Lisp programming language. consconstructs memory objects which hold two values or pointers to two values. These objects are referred to as (cons) cells, conses, non-atomic s-expressions ("NATSes"), or (cons) pairs. In Lisp jargon, the expression "to cons x onto y" means to construct a new object with (cons xy). The resulting pair has a left half, referred to as the car, and a right half, referred to as the cdr.

In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.

<span class="mw-page-title-main">Pontryagin duality</span> Duality for locally compact abelian groups

In mathematics, Pontryagin duality is a duality between locally compact abelian groups that allows generalizing Fourier transform to all such groups, which include the circle group, the finite abelian groups, and the additive group of the integers, the real numbers, and every finite-dimensional vector space over the reals or a p-adic field.

In computer programming and particularly in Lisp, an association list, often referred to as an alist, is a linked list in which each list element comprises a key and a value. The association list is said to associate the value with the key. In order to find the value associated with a given key, a sequential search is used: each element of the list is searched in turn, starting at the head, until the key is found. Associative lists provide a simple way of implementing an associative array, but are efficient only when the number of keys is very small.

In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as that value, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leaves. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

In functional programming, a monad is a structure that combines program fragments (functions) and wraps their return values in a type with additional computation. In addition to defining a wrapping monadic type, monads define two operators: one to wrap a value in the monad type, and another to compose together functions that output values of the monad type. General-purpose languages use monads to reduce boilerplate code needed for common operations. Functional languages use monads to turn complicated sequences of functions into succinct pipelines that abstract away control flow, and side-effects.

In computer programming, append is the operation for concatenating linked lists or arrays in some high-level programming languages.

In many programming languages, map is a higher-order function that applies a given function to each element of a collection, e.g. a list or set, returning the results in a collection of the same type. It is often called apply-to-all when considered in functional form.

A zipper is a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages. The zipper was described by Gérard Huet in 1997. It includes and generalizes the gap buffer technique sometimes used with arrays.

<span class="mw-page-title-main">Representation theory</span> Branch of mathematics that studies abstract algebraic structures

Representation theory is a branch of mathematics that studies abstract algebraic structures by representing their elements as linear transformations of vector spaces, and studies modules over these abstract algebraic structures. In essence, a representation makes an abstract algebraic object more concrete by describing its elements by matrices and their algebraic operations. The theory of matrices and linear operators is well-understood, so representations of more abstract objects in terms of familiar linear algebra objects helps glean properties and sometimes simplify calculations on more abstract theories.

In computer science, array is a data type that represents a collection of elements, each selected by one or more indices that can be computed at run time during program execution. Such a collection is usually called an array variable or array value. By analogy with the mathematical concepts vector and matrix, array types with one and two indices are often called vector type and matrix type, respectively. More generally, a multidimensional array type can be called a tensor type, by analogy with the physical concept, tensor.

<span class="mw-page-title-main">Functor (functional programming)</span> Design pattern in pure functional programming

In functional programming, a functor is a design pattern inspired by the definition from category theory that allows one to apply a function to values inside a generic type without changing the structure of the generic type. In Haskell this idea can be captured in a type class:

References

  1. Abelson, Harold; Sussman, Gerald Jay (1996). Structure and Interpretation of Computer Programs . MIT Press.
  2. Reingold, Edward; Nievergelt, Jurg; Narsingh, Deo (1977). Combinatorial Algorithms: Theory and Practice. Englewood Cliffs, New Jersey: Prentice Hall. pp. 38–41. ISBN   0-13-152447-X.
  3. Barnett, Granville; Del tonga, Luca (2008). "Data Structures and Algorithms" (PDF). mta.ca. Retrieved 12 November 2014.
  4. Lerusalimschy, Roberto (December 2003). Programming in Lua (first edition) (First ed.). Lua.org. ISBN   8590379817 . Retrieved 12 November 2014.
  5. Steele, Guy (1990). Common Lisp (Second ed.). Digital Press. pp. 29–31. ISBN   1-55558-041-6.