List comprehension

Last updated

A list comprehension is a syntactic construct available in some programming languages for creating a list based on existing lists. It follows the form of the mathematical set-builder notation (set comprehension) as distinct from the use of map and filter functions.

Contents

Overview

Consider the following example in mathematical set-builder notation.

or often

This can be read, " is the set of all numbers "2 times " SUCH THAT is an ELEMENT or MEMBER of the set of natural numbers (), AND squared is greater than ."

The smallest natural number, x = 1, fails to satisfy the condition x2>3 (the condition 12>3 is false) so 2 ·1 is not included in S. The next natural number, 2, does satisfy the condition (22>3) as does every other natural number. Thus x consists of 2, 3, 4, 5... Since the set S consists of all numbers "2 times x" it is given by S = {4, 6, 8, 10,...}. S is, in other words, the set of all even numbers greater than 2.

In this annotated version of the example:

A list comprehension has the same syntactic components to represent generation of a list in order from an input list or iterator:

The order of generation of members of the output list is based on the order of items in the input.

In Haskell's list comprehension syntax, this set-builder construct would be written similarly, as:

s=[2*x|x<-[0..],x^2>3]

Here, the list [0..] represents , x^2>3 represents the predicate, and 2*x represents the output expression.

List comprehensions give results in a defined order (unlike the members of sets); and list comprehensions may generate the members of a list in order, rather than produce the entirety of the list thus allowing, for example, the previous Haskell definition of the members of an infinite list.

History

The existence of related constructs predates the use of the term "List Comprehension". The SETL programming language (1969) has a set formation construct which is similar to list comprehensions. E.g., this code prints all prime numbers from 2 to N:

print([n in [2..N] | forall m in {2..n - 1} | n mod m > 0]);

The computer algebra system AXIOM (1973) has a similar construct that processes streams.

The first use of the term "comprehension" for such constructs was in Rod Burstall and John Darlington's description of their functional programming language NPL from 1977. In his retrospective "Some History of Functional Programming Languages", [1] David Turner recalls:

NPL was implemented in POP2 by Burstall and used for Darlington’s work on program transformation (Burstall & Darlington 1977). The language was first order, strongly (but not polymorphically) typed, purely functional, call-by-value. It also had “set expressions” e.g.

setofeven (X)  <=  <:x : x in X & even(x):>}}

In a footnote attached to the term "list comprehension", Turner also notes

I initially called these ZF expressions, a reference to Zermelo–Fraenkel set theory — it was Phil Wadler who coined the better term list comprehension.

Burstall and Darlington's work with NPL influenced many functional programming languages during the 1980s, but not all included list comprehensions. An exception was Turner's influential, pure, lazy, functional programming language Miranda, released in 1985. The subsequently developed standard pure lazy functional language Haskell includes many of Miranda's features, including list comprehensions.

Comprehensions were proposed as a query notation for databases [2] and were implemented in the Kleisli database query language. [3]

Examples in different programming languages

Similar constructs

Monad comprehension

In Haskell, a monad comprehension is a generalization of the list comprehension to other monads in functional programming.

Set comprehension

Version 3.x and 2.7 of the Python language introduces syntax for set comprehensions. Similar in form to list comprehensions, set comprehensions generate Python sets instead of lists.

>>>s={vforvin'ABCDABCD'ifvnotin'CB'}>>>print(s){'A','D'}>>>type(s)<class'set'>>>>

Racket set comprehensions generate Racket sets instead of lists.

(for/set([v"ABCDABCD"]#:unless(memberv(string->list"CB")))v))

Dictionary comprehension

Version 3.x and 2.7 of the Python language introduced a new syntax for dictionary comprehensions, similar in form to list comprehensions but which generate Python dicts instead of lists.

>>>s={key:valforkey,valinenumerate('ABCD')ifvalnotin'CB'}>>>s{0:'A',3:'D'}>>>

Racket hash table comprehensions generate Racket hash tables (one implementation of the Racket dictionary type).

(for/hash([(valkey)(in-indexed"ABCD")]#:unless(memberval(string->list"CB")))(valueskeyval))

Parallel list comprehension

The Glasgow Haskell Compiler has an extension called parallel list comprehension (also known as zip-comprehension) that permits multiple independent branches of qualifiers within the list comprehension syntax. Whereas qualifiers separated by commas are dependent ("nested"), qualifier branches separated by pipes are evaluated in parallel (this does not refer to any form of multithreadedness: it merely means that the branches are zipped).

-- regular list comprehensiona=[(x,y)|x<-[1..5],y<-[3..5]]-- [(1,3),(1,4),(1,5),(2,3),(2,4) ...-- zipped list comprehensionb=[(x,y)|(x,y)<-zip[1..5][3..5]]-- [(1,3),(2,4),(3,5)]-- parallel list comprehensionc=[(x,y)|x<-[1..5]|y<-[3..5]]-- [(1,3),(2,4),(3,5)]

Racket's comprehensions standard library contains parallel and nested versions of its comprehensions, distinguished by "for" vs "for*" in the name. For example, the vector comprehensions "for/vector" and "for*/vector" create vectors by parallel versus nested iteration over sequences. The following is Racket code for the Haskell list comprehension examples.

>(for*/list([x(in-range16)][y(in-range36)])(listxy))'((13)(14)(15)(23)(24)(25)(33)(34)(35)(43)(44)(45)(53)(54)(55))>(for/list([x(in-range16)][y(in-range36)])(listxy))'((13)(24)(35))

In Python, we could do as follows:

# regular list comprehension>>>a=[(x,y)forxinrange(1,6)foryinrange(3,6)][(1,3),(1,4),(1,5),(2,3),(2,4),...# parallel/zipped list comprehension>>>b=[xforxinzip(range(1,6),range(3,6))][(1,3),(2,4),(3,5)]

In Julia, practically the same results can be achieved as follows:

# regular array comprehension>>>a=[(x,y)forxin1:5foryin3:5]# parallel/zipped array comprehension>>>b=[xforxinzip(1:3,3:5)]

with the only difference that instead of lists, in Julia, we have arrays.

XQuery and XPath

Like the original NPL use, these are fundamentally database access languages.

This makes the comprehension concept more important, because it is computationally infeasible to retrieve the entire list and operate on it (the initial 'entire list' may be an entire XML database).

In XPath, the expression:

/library/book//paragraph[@style='first-in-chapter']

is conceptually evaluated as a series of "steps" where each step produces a list and the next step applies a filter function to each element in the previous step's output. [4]

In XQuery, full XPath is available, but FLWOR statements are also used, which is a more powerful comprehension construct. [5]

for$bin//bookwhere$b[@pages<400]order by$b//titlereturn<shortBook><title>{$b//title}</title><firstPara>{($book//paragraph)[1]}</firstPara></shortBook>

Here the XPath //book is evaluated to create a sequence (aka list); the where clause is a functional "filter", the order by sorts the result, and the <shortBook>...</shortBook> XML snippet is actually an anonymous function that builds/transforms XML for each element in the sequence using the 'map' approach found in other functional languages.

So, in another functional language the above FLWOR statement may be implemented like this:

map(newXML(shortBook,newXML(title,$1.title),newXML(firstPara,$1...))filter(lt($1.pages,400),xpath(//book)))

LINQ in C#

C# 3.0 has a group of related features called LINQ, which defines a set of query operators for manipulating object enumerations.

vars=Enumerable.Range(0,100).Where(x=>x*x>3).Select(x=>x*2);

It also offers an alternative comprehension syntax, reminiscent of SQL:

vars=fromxinEnumerable.Range(0,100)wherex*x>3selectx*2;

LINQ provides a capability over typical list comprehension implementations. When the root object of the comprehension implements the IQueryable interface, rather than just executing the chained methods of the comprehension, the entire sequence of commands are converted into an abstract syntax tree (AST) object, which is passed to the IQueryable object to interpret and execute.

This allows, amongst other things, for the IQueryable to

C++

C++ does not have any language features directly supporting list comprehensions but operator overloading (e.g., overloading |, >>, >>=) has been used successfully to provide expressive syntax for "embedded" query domain-specific languages (DSL). Alternatively, list comprehensions can be constructed using the erase-remove idiom to select elements in a container and the STL algorithm for_each to transform them.

#include<algorithm>#include<list>#include<numeric>usingnamespacestd;template<classC,classP,classT>Ccomprehend(C&&source,constP&predicate,constT&transformation){// initialize destinationCd=forward<C>(source);// filter elementsd.erase(remove_if(begin(d),end(d),predicate),end(d));// apply transformationfor_each(begin(d),end(d),transformation);returnd;}intmain(){list<int>range(10);// range is a list of 10 elements, all zeroiota(begin(range),end(range),1);// range now contains 1, 2, ..., 10list<int>result=comprehend(range,[](intx){returnx*x<=3;},[](int&x){x*=2;});// result now contains 4, 6, ..., 20}

There is some effort in providing C++ with list-comprehension constructs/syntax similar to the set builder notation.

list<int>N;list<double>S;for(inti=0;i<10;i++)N.push_back(i);S<<list_comprehension(3.1415*x,x,N,x*x>3)
booleven(intx){returnx%2==0;}boolx2(int&x){x*=2;returntrue;}list<int>l,t;intx,y;for(inti=0;i<10;i++)l.push_back(i);(x,t)=l|x2;(t,y)=t;t=l<9;t=t<7|even|x2;
<catalog><book><title>Hamlet</title><price>9.99</price><author><name>WilliamShakespeare</name><country>England</country></author></book><book>...</book> ... </catalog>

LEESA provides >> for XPath's / separator. XPath's // separator that "skips" intermediate nodes in the tree is implemented in LEESA using what's known as Strategic Programming. In the example below, catalog_, book_, author_, and name_ are instances of catalog, book, author, and name classes, respectively.

// Equivalent X-Path: "catalog/book/author/name"std::vector<name>author_names=evaluate(root,catalog_>>book_>>author_>>name_);// Equivalent X-Path: "catalog//name"std::vector<name>author_names=evaluate(root,catalog_>>DescendantsOf(catalog_,name_));// Equivalent X-Path: "catalog//author[country=="England"]"std::vector<name>author_names=evaluate(root,catalog_>>DescendantsOf(catalog_,author_)>>Select(author_,[](constauthor&a){returna.country()=="England";})>>name_);

See also

Notes and references

  1. Turner, David (2012). "Some history of functional programming languages" (PDF). International Symposium on Trends in Functional Programming, Springer, Berlin, Heidelberg. pp. 1–20.
  2. Comprehensions, a query notation for DBPLs
  3. The functional guts of the Kleisli query system
  4. "2.1 Location Steps". XML Path Language (XPath). W3C. 16 November 1999. Archived from the original on 9 December 2012. Retrieved 24 December 2008.
  5. "XQuery FLWOR Expressions". W3Schools . Archived from the original on 2011-10-08.
  6. "Single-variable List Comprehension in C++ using Preprocessor Macros". Archived from the original on 2011-08-21. Retrieved 2011-01-09.
  7. "C++ list comprehensions". Archived from the original on 2017-07-07. Retrieved 2011-01-09.
  8. "Language for Embedded Query and Traversal (LEESA)".

Axiom

Clojure

Common Lisp

Haskell

OCaml

Python

Related Research Articles

In programming language theory, lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed and which also avoids repeated evaluations.

XSLT is a language originally designed for transforming XML documents into other XML documents, or other formats such as HTML for web pages, plain text or XSL Formatting Objects, which may subsequently be converted to other formats, such as PDF, PostScript and PNG. Support for JSON and plain-text transformation was added in later updates to the XSLT 1.0 specification.

In computer science, a set is an abstract data type that can store unique values, without any particular order. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.

In computer science, declarative programming is a programming paradigm—a style of building the structure and elements of computer programs—that expresses the logic of a computation without describing its control flow.

In set theory and its applications to logic, mathematics, and computer science, set-builder notation is a mathematical notation for describing a set by enumerating its elements, or stating the properties that its members must satisfy.

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be a match." The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations of a pattern within a token sequence, to output some component of the matched pattern, and to substitute the matching pattern with some other token sequence.

In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.

In computer science, a generator is a routine that can be used to control the iteration behaviour of a loop. All generators are also iterators. A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.

In functional programming, a monad is a structure that combines program fragments (functions) and wraps their return values in a type with additional computation. In addition to defining a wrapping monadic type, monads define two operators: one to wrap a value in the monad type, and another to compose together functions that output values of the monad type. General-purpose languages use monads to reduce boilerplate code needed for common operations. Functional languages use monads to turn complicated sequences of functions into succinct pipelines that abstract away control flow, and side-effects.

In computer programming, a guard is a boolean expression that must evaluate to true if the program execution is to continue in the branch in question. Regardless of which programming language is used, a guard clause, guard code, or guard statement, is a check of integrity preconditions used to avoid errors during execution.

In computer science, function composition is an act or mechanism to combine simple functions to build more complicated ones. Like the usual composition of functions in mathematics, the result of each function is passed as the argument of the next, and the result of the last one is the result of the whole.

<span class="mw-page-title-main">Python syntax and semantics</span> Set of rules defining correctly structured programs

The syntax of the Python programming language is the set of rules that defines how a Python program will be written and interpreted. The Python language has many similarities to Perl, C, and Java. However, there are some definite differences between the languages. It supports multiple programming paradigms, including structured, object-oriented programming, and functional programming, and boasts a dynamic type system and automatic memory management.

In computer programming, an anonymous function is a function definition that is not bound to an identifier. Anonymous functions are often arguments being passed to higher-order functions or used for constructing the result of a higher-order function that needs to return a function. If the function is only used once, or a limited number of times, an anonymous function may be syntactically lighter than using a named function. Anonymous functions are ubiquitous in functional programming languages and other languages with first-class functions, where they fulfil the same role for the function type as literals do for other data types.

In functional programming, filter is a higher-order function that processes a data structure in some order to produce a new data structure containing exactly those elements of the original data structure for which a given predicate returns the boolean value true.

List comprehension is a syntactic construct available in some programming languages for creating a list based on existing lists. It follows the form of the mathematical set-builder notation as distinct from the use of map and filter functions.

XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) in 1999, and can be used to compute values from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.

XQuery is a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendor-specific extensions for other data formats. The language is developed by the XML Query working group of the W3C. The work is closely coordinated with the development of XSLT by the XSL Working Group; the two groups share responsibility for XPath, which is a subset of XQuery.

In computer science, partial application refers to the process of fixing a number of arguments of a function, producing another function of smaller arity. Given a function , we might fix the first argument, producing a function of type . Evaluation of this function might be represented as . Note that the result of partial function application in this case is a function that takes two arguments. Partial application is sometimes incorrectly called currying, which is a related, but distinct concept.

OptimJ is an extension for Java with language support for writing optimization models and abstractions for bulk data processing. The extensions and the proprietary product implementing the extensions were developed by Ateji which went out of business in September 2011. OptimJ aims at providing a clear and concise algebraic notation for optimization modeling, removing compatibility barriers between optimization modeling and application programming tools, and bringing software engineering techniques such as object-orientation and modern IDE support to optimization experts.

<span class="mw-page-title-main">XML transformation language</span> Type of programming language

An XML transformation language is a programming language designed specifically to transform an input XML document into an output document which satisfies some specific goal.