Python Generators: When and How

Also Featuring: What and Why

Robert Boscacci
5 min read · Jan 14, 2019

When you write code, primarily, you want for it to work, and secondarily, you want for it to work efficiently. Efficient code minimizes those situations where you’re sitting there waiting for your script to run, wondering if you could have written more efficient code and spent the extra time sleeping, dreaming of rainbow unicorns. Efficient code is also just less likely to fail. We like that.

Sometimes it’s better to use a generator expression than a list comprehension in Python, especially when you don’t actually need the list that a list comprehension builds.

For example, if you were to run a list comprehension squaring numbers within a sum function like this one:

final_sum = sum([number ** 2 for number in range(100000000)])

Python would run through the above code inside-out, sort of like this:

1. Make a new list (as hinted by the square brackets).

2. Square, then append to the list, every integer from zero to 100 million.

3. Stuff that heaping whale of a list into memory.

(Image: just the beginning of the massive list of numbers you never needed to make.)

4. Go back to the beginning of the list: Sum up all the terms from left to right.

5. Return the thing you wanted in the first place: The sum.

6. Release from memory the massive list of squared integers. (..!)

Is there a more efficient way to do the same thing?

Yes! With a generator expression:

final_sum = sum(number ** 2 for number in range(100000000))

Notice the subtle difference from the list comprehension: We’ve ditched the square brackets. Now the only brackets left are the parentheses of the sum call, wrapped around an anonymous generator expression. Here’s the new order of operations this will follow:

1. Start at zero. Square it. Save the, uh, sum. (0² + {no previous sum} = 0)

2. Move to one. Square it and add that to the last sum. Save. (1² + 0 = 1)

3. Move to two. Square it and add that to the last sum. Save. (2² + 1 = 5)

4. Move to three. Square it and add that to the last sum. Save. (3² + 5 = 14)

5. Repeat until you reach the 100 millionth term.

6. Return the final sum.
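To make those steps concrete, here is a rough hand-written equivalent of what sum does with the generator expression. The helper name sum_of_squares_lazily and the partial_sums list are made up for illustration (the real sum doesn’t keep any partial sums around), and the range is tiny so the first few steps are easy to check against the list above.

```python
def sum_of_squares_lazily(limit):
    """Hypothetical helper: mimic sum(n ** 2 for n in range(limit)) by hand."""
    squares = (number ** 2 for number in range(limit))  # nothing is computed yet
    running_sum = 0
    partial_sums = []  # kept only so we can inspect the first few steps
    for square in squares:  # each pass pulls exactly one square from the generator
        running_sum += square
        if len(partial_sums) < 4:
            partial_sums.append(running_sum)
    return running_sum, partial_sums


total, first_steps = sum_of_squares_lazily(10)
print(first_steps)  # [0, 1, 5, 14]
print(total)        # 285
```

Only one square and one running total exist at any moment, which is the whole point.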

Much better! We never had to save any list of integers into memory: We just lazily iterated through each integer from zero to a hundred million, adding each integer’s square to the previous sum we had saved. No list involved.

In my test, this simple generator expression was only ~7% faster at working through 100 million integers than the list comprehension doing the same task, but I’ll take that ~7% speed boost every time.

List comprehension was ~7% slower than generator! It adds up quickly with repeat operations.
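If you want to check the timing on your own machine, here is a minimal sketch using the standard library’s timeit module. The function names with_list and with_generator and the much smaller N are my own choices so it finishes quickly. Results depend heavily on your Python version, hardware, and input size, so don’t expect to see the same ~7% (for smaller inputs the list comprehension may even win).

```python
import timeit

N = 200_000  # far smaller than 100 million, so this finishes in moments


def with_list():
    # Builds the whole list of squares first, then sums it
    return sum([number ** 2 for number in range(N)])


def with_generator():
    # Feeds squares to sum one at a time, no list involved
    return sum(number ** 2 for number in range(N))


# Run each version 5 times and report the total elapsed seconds
list_seconds = timeit.timeit(with_list, number=5)
gen_seconds = timeit.timeit(with_generator, number=5)

print(f"list comprehension:   {list_seconds:.3f} s")
print(f"generator expression: {gen_seconds:.3f} s")
```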

The list comprehension works for this task, but now that you know there’s a better, faster way to do the same thing, you will have a guilty conscience every time you go the slower, less efficient route, or at least I will.

When I attempt to sum the squares of a billion integers using the same two expressions, my Jupyter kernel just crashes on the list comprehension. I don’t have enough memory to handle a list that huge. The generator doesn’t have that problem: It takes a good four minutes to run over a billion integers for me, but it runs. It doesn’t need nearly as much space in memory to do what it needs to do.
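You can see the memory gap without crashing anything by measuring peak allocations on a much smaller range. This is a sketch using the standard library’s tracemalloc module; the helper peak_bytes and the value of N are assumptions for illustration, and the exact byte counts will differ between Python versions and platforms.

```python
import tracemalloc


def peak_bytes(build_and_sum):
    """Hypothetical helper: return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    result = build_and_sum()
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) since start()
    tracemalloc.stop()
    return result, peak


N = 200_000  # small enough to run anywhere

list_total, list_peak = peak_bytes(
    lambda: sum([number ** 2 for number in range(N)])
)
gen_total, gen_peak = peak_bytes(
    lambda: sum(number ** 2 for number in range(N))
)

print(f"list comprehension peak:   {list_peak:,} bytes")
print(f"generator expression peak: {gen_peak:,} bytes")
```

The list version’s peak grows with N, while the generator version’s peak should stay roughly flat no matter how large N gets.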

But…What Even Is a Generator in Python?

Fair question. Now that we know we want to use it, let’s talk about what it is.

The following knowledge and examples draw heavily from Programiz:

Simply speaking, a generator is a function that returns an object (iterator) which we can iterate over (one value at a time).

If a function contains at least one yield statement (it may contain other yield or return statements), it becomes a generator function.

The difference is that, while a return statement terminates a function entirely, a yield statement pauses the function saving all its states and later continues from there on successive calls.
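Here is a small sketch of that pause-and-resume behavior. The function countdown_from_three is a made-up example, and the print calls are only there so you can watch where execution stops and picks back up.

```python
def countdown_from_three():
    """Hypothetical generator function: three yields, then a return."""
    print("started")
    yield 3
    print("resumed after first yield")
    yield 2
    print("resumed after second yield")
    yield 1
    return  # ends the generator; the next next() call raises StopIteration


gen = countdown_from_three()  # nothing runs yet, not even the first print

first = next(gen)   # prints "started", pauses at the first yield
second = next(gen)  # resumes right after the first yield
third = next(gen)   # resumes right after the second yield

try:
    next(gen)       # runs into the return statement
    finished = False
except StopIteration:
    finished = True

print(first, second, third, finished)  # 3 2 1 True
```

Calling the function only creates the generator object; the body doesn’t start running until the first next().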

Generators are great for another big reason: They can operate on an infinite or continuous stream of data. For example, the following function could, theoretically, keep returning even numbers forever by using an infinite while loop and a yield statement in tandem:

# Define a generator function:
def all_even_generator():
    n = 0
    while True:      # a.k.a. always, forever
        yield n      # this is where it will yield and freeze
        n += 2       # it starts up again here when we call next()

# Instantiate a generator object:
genny = all_even_generator()

# Pump out some yields:
next(genny)          # 0 on the first call, then 2, 4, 6, ... on later calls

Notice that this function doesn’t attempt to generate every even number in existence and return them all as a list (which is impossible, since there are infinitely many). It just gives us the next even number every time we call next() on the genny object.
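Calling next() by hand gets old fast, so here is one way to grab a fixed number of values from an infinite generator: itertools.islice from the standard library. This is just a sketch; the generator is repeated so the snippet runs on its own, and first_five is a name I picked for illustration.

```python
from itertools import islice


def all_even_generator():
    n = 0
    while True:
        yield n
        n += 2


# islice stops pulling values after 5 items, so the infinite loop never runs away
first_five = list(islice(all_even_generator(), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```

Just don’t call list() directly on all_even_generator(): that really would try to collect every even number in existence.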

Conclusion

Use simple generator expressions instead of list comprehensions whenever you won’t need the actual list later.

Use full generator functions when you need a function that magically freezes itself after it yields something (which is like returning something) and can be unfrozen to continue yielding more things later.
