Generator (computer programming)

Last updated

In computer science, a generator is a routine that can be used to control the iteration behaviour of a loop. All generators are also iterators. [1] A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.

Contents

Generators can be implemented in terms of more expressive control flow constructs, such as coroutines or first-class continuations. [2] Generators, also known as semicoroutines, [3] are a special case of (and weaker than) coroutines, in that they always yield control back to the caller (when passing a value back), rather than specifying a coroutine to jump to; see comparison of coroutines with generators.

Uses

Generators are usually invoked inside loops. [4] The first time that a generator invocation is reached in a loop, an iterator object is created that encapsulates the state of the generator routine at its beginning, with arguments bound to the corresponding parameters. The generator's body is then executed in the context of that iterator until a special yield action is encountered; at that time, the value provided with the yield action is used as the value of the invocation expression. The next time the same generator invocation is reached in a subsequent iteration, the execution of the generator's body is resumed after the yield action, until yet another yield action is encountered. In addition to the yield action, execution of the generator body can also be terminated by a finish action, at which time the innermost loop enclosing the generator invocation is terminated. In more complicated situations, a generator may be used manually outside of a loop to create an iterator, which can then be used in various ways.

Because generators compute their yielded values only on demand, they are useful for representing streams, such as sequences that would be expensive or impossible to compute at once. These include e.g. infinite sequences and live data streams.

When eager evaluation is desirable (primarily when the sequence is finite, as otherwise evaluation will never terminate), one can either convert to a list, or use a parallel construction that creates a list instead of a generator. For example, in Python a generator g can be evaluated to a list l via l = list(g), while in F# the sequence expression seq { ... } evaluates lazily (a generator or sequence) but [ ... ] evaluates eagerly (a list).

In the presence of generators, loop constructs of a language – such as for and while – can be reduced into a single loop ... end loop construct; all the usual loop constructs can then be comfortably simulated by using suitable generators in the right way. For example, a ranged loop like for x = 1 to 10 can be implemented as iteration through a generator, as in Python's for x in range(1, 10). Further, break can be implemented as sending finish to the generator and then using continue in the loop.

Timeline

Generators first appeared in CLU (1975), [5] were a prominent feature in the string manipulation language Icon (1977) and are now available in Python (2001), [6] C#, [7] Ruby, PHP, [8] ECMAScript (as of ES6/ES2015), and other languages. In CLU and C#, generators are called iterators, and in Ruby, enumerators.

Lisp

The final Common Lisp standard does not natively provide generators, yet various library implementations exist, such as SERIES documented in CLtL2 or pygen.

CLU

A yield statement is used to implement iterators over user-defined data abstractions. [9]

string_chars = iter (s: string) yields (char);   index: int := 1;   limit: int := string$size (s);   while index <= limit do     yield (string$fetch(s, index));     index := index + 1;     end; end string_chars;  for c: char in string_chars(s) do    ... end; 

Icon

Every expression (including loops) is a generator. The language has many generators built-in and even implements some of the logic semantics using the generator mechanism (logical disjunction or "OR" is done this way).

Printing squares from 0 to 20 can be achieved using a co-routine by writing:

localsquares,jsquares:=create(seq(0)^2)everyj:=|@squaresdoifj<=20thenwrite(j)elsebreak

However, most of the time custom generators are implemented with the "suspend" keyword which functions exactly like the "yield" keyword in CLU.

C

C does not have generator functions as a language construct, but, as they are a subset of coroutines, it is simple to implement them using any framework that implements stackful coroutines, such as libdill. [10] On POSIX platforms, when the cost of context switching per iteration is not a concern, or full parallelism rather than merely concurrency is desired, a very simple generator function framework can be implemented using pthreads and pipes.

C++

It is possible to introduce generators into C++ using pre-processor macros. The resulting code might have aspects that are very different from native C++, but the generator syntax can be very uncluttered. [11] The set of pre-processor macros defined in this source allow generators defined with the syntax as in the following example:

$generator(descent){inti;// place the constructor of our generator, e.g. // descent(int minv, int maxv) {...}// from $emit to $stop is a body of our generator:$emit(int)// will emit int values. Start of body of the generator.for(i=10;i>0;--i)$yield(i);// similar to yield in Python,// returns next number in [1..10], reversed.$stop;// stop, end of sequence. End of body of the generator.};

This can then be iterated using:

intmain(intargc,char*argv[]){descentgen;for(intn;gen(n);)// "get next" generator invocationprintf("next number is %d\n",n);return0;}

Moreover, C++11 allows foreach loops to be applied to any class that provides the begin and end functions. It's then possible to write generator-like classes by defining both the iterable methods (begin and end) and the iterator methods (operator!=, operator++ and operator*) in the same class. For example, it is possible to write the following program:

#include<iostream>intmain(){for(inti:range(10)){std::cout<<i<<std::endl;}return0;}

A basic range implementation would look like that:

classrange{private:intlast;intiter;public:range(intend):last(end),iter(0){}// Iterable functionsconstrange&begin()const{return*this;}constrange&end()const{return*this;}// Iterator functionsbooloperator!=(constrange&)const{returniter<last;}voidoperator++(){++iter;}intoperator*()const{returniter;}};

Perl

Perl does not natively provide generators, but support is provided by the Coro::Generator module which uses the Coro co-routine framework. Example usage:

usestrict;usewarnings;# Enable generator { BLOCK } and yielduseCoro::Generator;# Array reference to iterate overmy$chars=['A'...'Z'];# New generator which can be called like a coderef.my$letters=generator{my$i=0;formy$letter(@$chars){# get next letter from $charsyield$letter;}};# Call the generator 15 times.print$letters->(),"\n"for(0..15);

Raku

Example parallel to Icon uses Raku (formerly/aka Perl 6) Range class as one of several ways to achieve generators with the language.

Printing squares from 0 to 20 can be achieved by writing:

for (0 .. *).map(* ** 2) -> $i {     lastif$i > 20;     say$i } 

However, most of the time custom generators are implemented with "gather" and "take" keywords in a lazy context.

Tcl

In Tcl 8.6, the generator mechanism is founded on named coroutines.

procgenerator{body}{coroutinegen[incr::disambiguator]apply{{script}{# Produce the result of [generator], the name of the generatoryield[infocoroutine]# Do the generationeval$script# Finish the loop of the caller using a 'break' exceptionreturn-codebreak }}$body}# Use a simple 'for' loop to do the actual generationsetcount[generator{for{seti10}{$i<=20}{incri}{yield$i}}]# Pull values from the generator until it is exhaustedwhile1{puts[$count]}

Haskell

In Haskell, with its lazy evaluation model, every datum created with a non-strict data constructor is generated on demand. For example,

countFrom::Integer->[Integer]countFromn=n:countFrom(n+1)from10to20::[Integer]from10to20=takeWhile(<=20)$countFrom10primes::[Integer]primes=2:3:nextPrime5wherenextPrimen|notDivisiblen=n:nextPrime(n+2)|otherwise=nextPrime(n+2)notDivisiblen=all((/=0).(remn))$takeWhile((<=n).(^2))$tailprimes

where (:) is a non-strict list constructor, cons, and $ is just a "called-with" operator, used for parenthesization. This uses the standard adaptor function,

takeWhilep[]=[]takeWhilep(x:xs)|px=x:takeWhilepxs|otherwise=[]

which walks down the list and stops on the first element that doesn't satisfy the predicate. If the list has been walked before until that point, it is just a strict data structure, but if any part hadn't been walked through before, it will be generated on demand. List comprehensions can be freely used:

squaresUnder20=takeWhile(<=20)[x*x|x<-countFrom10]squaresForNumbersUnder20=[x*x|x<-takeWhile(<=20)$countFrom10]

Racket

Racket provides several related facilities for generators. First, its for-loop forms work with sequences, which are a kind of a producer:

(for([i(in-range1020)])(printf"i = ~s\n"i))

and these sequences are also first-class values:

(define10-to-20(in-range1020))(for([i10-to-20])(printf"i = ~s\n"i))

Some sequences are implemented imperatively (with private state variables) and some are implemented as (possibly infinite) lazy lists. Also, new struct definitions can have a property that specifies how they can be used as sequences.

But more directly, Racket comes with a generator library for a more traditional generator specification. For example,

#lang racket(requireracket/generator)(define(ints-fromfrom)(generator()(for([i(in-naturalsfrom)]); infinite sequence of integers from 0(yieldi))))(defineg(ints-from10))(list(g)(g)(g)); -> '(10 11 12)

Note that the Racket core implements powerful continuation features, providing general (re-entrant) continuations that are composable, and also delimited continuations. Using this, the generator library is implemented in Racket.

PHP

The community of PHP implemented generators in PHP 5.5. Details can be found in the original Request for Comments: Generators.

Infinite Fibonacci sequence:

functionfibonacci():Generator{$last=0;$current=1;yield1;while(true){$current=$last+$current;$last=$current-$last;yield$current;}}foreach(fibonacci()as$number){echo$number,"\n";}

Fibonacci sequence with limit:

functionfibonacci(int$limit):Generator{yield$a=$b=$i=1;while(++$i<$limit){yield$a=($b=$a+$b)-$a;}}foreach(fibonacci(10)as$number){echo"$number\n";}

Any function which contains a yield statement is automatically a generator function.

Ruby

Ruby supports generators (starting from version 1.9) in the form of the built-in Enumerator class.

# Generator from an Enumerator objectchars=Enumerator.new(['A','B','C','Z'])4.times{putschars.next}# Generator from a blockcount=Enumerator.newdo|yielder|i=0loop{yielder.yieldi+=1}end100.times{putscount.next}

Java

Java has had a standard interface for implementing iterators since its early days, and since Java 5, the "foreach" construction makes it easy to loop over objects that provide the java.lang.Iterable interface. (The Java collections framework and other collections frameworks, typically provide iterators for all collections.)

recordPair(inta,intb){};Iterable<Integer>myIterable=Stream.iterate(newPair(1,1),p->newPair(p.b,p.a+p.b)).limit(10).map(p->p.a)::iterator;myIterable.forEach(System.out::println);

Or get an Iterator from the Java 8 super-interface BaseStream of Stream interface.

recordPair(inta,intb){};// Save the iterator of a stream that generates fib sequenceIterator<Integer>myGenerator=Stream// Generates Fib sequence.iterate(newPair(1,1),p->newPair(p.b,p.a+p.b)).map(p->p.a).iterator();// Print the first 5 elementsfor(inti=0;i<5;i++){System.out.println(myGenerator.next());}System.out.println("done with first iteration");// Print the next 5 elementsfor(inti=0;i<5;i++){System.out.println(myGenerator.next());}

Output:

11235done with first iteration813213455

C#

An example C# 2.0 generator (the yield is available since C# version 2.0): Both of these examples utilize generics, but this is not required. yield keyword also helps in implementing custom stateful iterations over a collection as discussed in this discussion. [12]

// Method that takes an iterable input (possibly an array)// and returns all even numbers.publicstaticIEnumerable<int>GetEven(IEnumerable<int>numbers){foreach(intnumberinnumbers){if((number%2)==0){yieldreturnnumber;}}}

It is possible to use multiple yield return statements and they are applied in sequence on each iteration:

publicclassCityCollection:IEnumerable<string>{publicIEnumerator<string>GetEnumerator(){yieldreturn"New York";yieldreturn"Paris";yieldreturn"London";}}

XL

In XL, iterators are the basis of 'for' loops:

import IO = XL.UI.CONSOLE  iterator IntegerIterator (var out Counter : integer; Low, High : integer) written Counter in Low..High is     Counter := Low     while Counter <= High loop         yield         Counter += 1  // Note that I needs not be declared, because declared 'var out' in the iterator // An implicit declaration of I as an integer is therefore made here for I in 1..5 loop     IO.WriteLn "I=", I 

F#

F# provides generators via sequence expressions, since version 1.9.1. [13] These can define a sequence (lazily evaluated, sequential access) via seq { ... }, a list (eagerly evaluated, sequential access) via [ ... ] or an array (eagerly evaluated, indexed access) via [| ... |] that contain code that generates values. For example,

seq{forbin0..25doifb<15thenyieldb*b}

forms a sequence of squares of numbers from 0 to 14 by filtering out numbers from the range of numbers from 0 to 25.

Python

Generators were added to Python in version 2.2 in 2001. [6] An example generator:

fromtypingimportIteratordefcountfrom(n:int)->Iterator[int]:whileTrue:yieldnn+=1# Example use: printing out the integers from 10 to 20.# Note that this iteration terminates normally, despite# countfrom() being written as an infinite loop.foriincountfrom(10):ifi<=20:print(i)else:break# Another generator, which produces prime numbers indefinitely as needed.importitertoolsdefprimes()->Iterator[int]:"""Generate prime numbers indefinitely as needed."""yield2n=3p=[2]whileTrue:# If dividing n by all the numbers in p, up to and including sqrt(n),# produces a non-zero remainder then n is prime.ifall(n%f>0forfinitertools.takewhile(lambdaf:f*f<=n,p)):yieldnp.append(n)n+=2

In Python, a generator can be thought of as an iterator that contains a frozen stack frame. Whenever next() is called on the iterator, Python resumes the frozen frame, which executes normally until the next yield statement is reached. The generator's frame is then frozen again, and the yielded value is returned to the caller.

PEP 380 (implemented in Python 3.3) adds the yield from expression, allowing a generator to delegate part of its operations to another generator or iterable. [14]

Generator expressions

Python has a syntax modeled on that of list comprehensions, called a generator expression that aids in the creation of generators. The following extends the first example above by using a generator expression to compute squares from the countfrom generator function:

squares=(n*nfornincountfrom(2))forjinsquares:ifj<=20:print(j)else:break

ECMAScript

ECMAScript 6 (a.k.a. Harmony) introduced generator functions.

An infinite Fibonacci sequence can be written using a function generator:

function*fibonacci(limit){let[prev,curr]=[0,1];while(!limit||curr<=limit){yieldcurr;[prev,curr]=[curr,prev+curr];}}// bounded by upper limit 10for(constnoffibonacci(10)){console.log(n);}// generator without an upper bound limitfor(constnoffibonacci()){console.log(n);if(n>10000)break;}// manually iteratingletfibGen=fibonacci();console.log(fibGen.next().value);// 1console.log(fibGen.next().value);// 1console.log(fibGen.next().value);// 2console.log(fibGen.next().value);// 3console.log(fibGen.next().value);// 5console.log(fibGen.next().value);// 8// picks up from where you stoppedfor(constnoffibGen){console.log(n);if(n>10000)break;}

R

The iterators package can be used for this purpose. [15] [16]

library(iterators)# Example ------------------abc<-iter(c('a','b','c'))nextElem(abc)

Smalltalk

Example in Pharo Smalltalk:

The Golden ratio generator below returns to each invocation 'goldenRatio next' a better approximation to the Golden Ratio.

goldenRatio:=Generatoron: [ :g|| x y z r |x:=0.y:=1.  [     z:=x+y.r:= (z/y) asFloat.x:=y.y:=z.gyield:r  ] repeat  ].goldenRationext.

The expression below returns the next 10 approximations.

Charactercrjoin: ((1to:10) collect: [ :dummy|rationext ]).

See more in A hidden gem in Pharo: Generator.

See also

Notes

  1. What is the difference between an Iterator and a Generator?
  2. Kiselyov, Oleg (January 2004). "General ways to traverse collections in Scheme".
  3. Anthony Ralston (2000). Encyclopedia of computer science. Nature Pub. Group. ISBN   978-1-56159-248-7 . Retrieved 11 May 2013.
  4. The Icon Programming Language utilizes generators to implement its goal directed evaluation. In Icon, generators can be invoked in contexts outside of the normal looping control structures.
  5. Liskov, Barbara (April 1992). "A History of CLU" (PDF). Archived from the original (PDF) on 2003-09-17. Retrieved 2006-01-05.
  6. 1 2 Python Enhancement Proposals: PEP 255: Simple Generators, PEP 289: Generator Expressions, PEP 342: Coroutines via Enhanced Generators
  7. yield (C# Reference)
  8. "PHP: Generators overview - Manual".
  9. Liskov, B.; Snyder, A.; Atkinson, R.; Schaffert, C. (1977). "Abstraction mechanisms in CLU". Communications of the ACM. 20 (8): 564–576. CiteSeerX   10.1.1.112.656 . doi:10.1145/359763.359789. S2CID   17343380.
  10. "Structured Concurrency for C".
  11. "Generators in C++". 21 September 2008.
  12. "What is the yield keyword used for in C#?". stackoverflow.com. Retrieved 2018-01-01.
  13. "Some Details on F# Computation Expressions" . Retrieved 2007-12-14.
  14. PEP 380 -- Syntax for Delegating to a Subgenerator
  15. Generator functions in R
  16. "Infinite generators in R". 5 January 2013.

Related Research Articles

In programming language theory, lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed and which also avoids repeated evaluations.

In computer science, control flow is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an imperative programming language from a declarative programming language.

In computer programming, an iterator is an object that enables a programmer to traverse a container, particularly lists. Various types of iterators are often provided via a container's interface. Though the interface and semantics of a given iterator are fixed, iterators are often implemented in terms of the structures underlying a container implementation and are often tightly coupled to the container to enable the operational semantics of the iterator. An iterator performs traversal and also gives access to data elements in a container, but does not itself perform iteration.

In object-oriented (OO) and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

An Aggregate pattern can refer to concepts in either statistics or computer programming. Both uses deal with considering a large case as composed of smaller, simpler, pieces.

<span class="mw-page-title-main">F Sharp (programming language)</span> Microsoft programming language

F# is a general-purpose, strongly typed, multi-paradigm programming language that encompasses functional, imperative, and object-oriented programming methods. It is most often used as a cross-platform Common Language Infrastructure (CLI) language on .NET, but can also generate JavaScript and graphics processing unit (GPU) code.

Coroutines are computer program components that allow execution to be suspended and resumed, generalizing subroutines for cooperative multitasking. Coroutines are well-suited for implementing familiar program components such as cooperative tasks, exceptions, event loops, iterators, infinite lists and pipes.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Foreach loop</span> Control flow statement for traversing items in a collection

In computer programming, foreach loop is a control flow statement for traversing items in a collection. foreach is usually used in place of a standard for loop statement. Unlike other for loop constructs, however, foreach loops usually maintain no explicit counter: they essentially say "do this to everything in this set", rather than "do this x times". This avoids potential off-by-one errors and makes code simpler to read. In object-oriented languages, an iterator, even if implicit, is often used as the means of traversal.

A random password generator is a software program or hardware device that takes input from a random or pseudo-random number generator and automatically generates a password. Random passwords can be generated manually, using simple sources of randomness such as dice or coins, or they can be generated using a computer.

In computer science, corecursion is a type of operation that is dual to recursion. Whereas recursion works analytically, starting on data further from a base case and breaking it down into smaller data and repeating until one reaches a base case, corecursion works synthetically, starting from a base case and building it up, iteratively producing data further removed from a base case. Put simply, corecursive algorithms use the data that they themselves produce, bit by bit, as they become available, and needed, to produce further bits of data. A similar but distinct concept is generative recursion, which may lack a definite "direction" inherent in corecursion and recursion.

In mathematics and in computer programming, a variadic function is a function of indefinite arity, i.e., one which accepts a variable number of arguments. Support for variadic functions differs widely among programming languages.

In number theory, a narcissistic number in a given number base is a number that is the sum of its own digits each raised to the power of the number of digits.

<span class="mw-page-title-main">JavaScript syntax</span> Set of rules defining correctly structured programs

The syntax of JavaScript is the set of rules that define a correctly structured JavaScript program.

C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by the publication year of the specification, though it was formerly named C++0x because it was expected to be published before 2010.

setcontext is one of a family of C library functions used for context control. The setcontext family allows the implementation in C of advanced control flow patterns such as iterators, fibers, and coroutines. They may be viewed as an advanced version of setjmp/longjmp; whereas the latter allows only a single non-local jump up the stack, setcontext allows the creation of multiple cooperative threads of control, each with its own stack.

This article describes the syntax of the C# programming language. The features described are compatible with .NET Framework and Mono.

This article describes the features in the programming language Haskell.

In C++, associative containers refer to a group of class templates in the standard library of the C++ programming language that implement ordered associative arrays. Being templates, they can be used to store arbitrary elements, such as integers or custom classes. The following containers are defined in the current revision of the C++ standard: set, map, multiset, multimap. Each of these containers differ only on constraints placed on their elements.

<span class="mw-page-title-main">Nim (programming language)</span> Programming language

Nim is a general-purpose, multi-paradigm, statically typed, compiled high-level systems programming language, designed and developed by a team around Andreas Rumpf. Nim is designed to be "efficient, expressive, and elegant", supporting metaprogramming, functional, message passing, procedural, and object-oriented programming styles by providing several features such as compile time code generation, algebraic data types, a foreign function interface (FFI) with C, C++, Objective-C, and JavaScript, and supporting compiling to those same languages as intermediate representations.

References