Jason Tian

Suck out the marrow of data

Generators in Python

Definition

A generator function is defined like a normal function, but whenever it needs to generate a value, it does so with the yield keyword rather than return. If the body of a def contains yield, the function automatically becomes a generator function (even if it also contains a return statement). There’s nothing else we need to do to create one.

Just remember that a generator is a special type of iterator.

In Python, “functions” with these capabilities are called generators, and they’re incredibly useful. generators (and the yield statement) were initially introduced to give programmers a more straightforward way to write code responsible for producing a series of values. Previously, creating something like a random number generator required a class or module that both generated values and kept track of state between calls. With the introduction of generators, this became much simpler.

To better understand the problem generators solve, let’s take a look at an example. Throughout the example, keep in mind the core problem being solved: generating a series of values.

Outside of Python, all but the simplest generators would be referred to as coroutines.

Stop criteria

  1. If a generator function calls return
  2. If the generator is exhausted by calling next(). After that if we call next() again it will result in an error.

Examples

Basic function

fibSeries = [0,1]
def fib1(n):
    for i in range(2,n):
        fibSeries.append(fibSeries[i-1]+fibSeries[i-2])
    return fibSeries[n-1]

The complexity is O(N).

Recursive function

def fib2(n):
    if n==1:
        return 0
    elif n==2:
        return 1
    else:
        return fib(n-1)+fib(n-2)

The issue with this implementation is that each function calls two functions and that become exponetial. Think of fib(20). It will call fib(18) and fib(19). fib(19) will call fib(18) and fib(17). So fib(18) gets called 2 times, fib(17) 3 times, etc. And the count goes up exponetially. The same code as above with a counter. fib(20) causes the function to be called over 13K times. Try fib(40)… the code wouldn’t even complete.

Generators

def temp():
    a,b = 0,1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b
def fib3(n):
    res = []
    for i,f in enumerate(temp()):
        if i > 10: break
        res.append(f)
    return res[-1]

Running time

%timeit fib1(10)
%timeit fib2(10)
%timeit fib3(10)

Output:

100000 loops, best of 3: 2.24 µs per loop
10000 loops, best of 3: 26.3 µs per loop
100000 loops, best of 3: 3.31 µs per loop

So the basic fuction is the best, generator function comes next and recursive function is the worst.

This fibonacci is not a good problem for generator function. I will provide another fancier example. This example is shameless stolen from this blog.

import random

def get_data():
    """Return 3 random integers between 0 and 9"""
    return random.sample(range(10), 3)

def consume():
    """Displays a running average across lists of integers sent to it"""
    running_sum = 0
    data_items_seen = 0

    while True:
        data = yield
        data_items_seen += len(data)
        running_sum += sum(data)
        print('The running average is {}'.format(running_sum / float(data_items_seen)))

def produce(consumer):
    """Produces a set of values and forwards them to the pre-defined consumer
    function"""
    while True:
        data = get_data()
        print('Produced {}'.format(data))
        consumer.send(data)
        yield

if __name__ == '__main__':
    consumer = consume()
    consumer.send(None)
    producer = produce(consumer)

    for _ in range(10):
        print('Producing...')
        next(producer)

Tips

  • generators are used to generate a series of values
  • yield is like the return of generator functions
  • The only other thing yield does is save the “state” of a generator function
  • A generator is just a special type of iterator
  • Like iterators, we can get the next value from a generator using next()
  • It is common to use while True: inside generator function.