Python Generators & yield
debt(d7/e2/b3/t5)
Closest to 'only careful code review or runtime testing' (d7), because pylint/memray won't flag list-vs-generator choices automatically — memory issues typically surface in testing or review, not in static analysis.
Closest to 'one-line patch' (e1) with slight bump toward e3 for cases needing function restructuring; quick_fix is literally swapping [...] for (...), but converting a return-list function to a yield-based generator may require small refactoring of call sites that index or re-iterate.
Closest to 'localised tax' (b3), generators apply per-function and per-pipeline; the choice has modest reach (callers must know it's single-pass, not indexable) but doesn't shape system architecture.
Closest to 'notable trap most devs eventually learn' (t5), matching the misconception that generators and lists are interchangeable — they cannot be restarted or indexed, and exhaustion raises StopIteration unexpectedly.
Also Known As
TL;DR
yield turns a function into a lazy iterator: values are produced one at a time, on demand, in constant memory. The result is single-pass and cannot be indexed or restarted.
Explanation
A generator function uses yield instead of return — calling it returns a generator object (an iterator). Each call to next() resumes execution until the next yield. Generators enable memory-efficient processing of large files (yield line by line), infinite sequences (counter, Fibonacci), and pipeline composition (chain multiple generators).
- yield from delegates to another generator/iterable.
- Generator expressions (x*2 for x in items) are concise single-expression generators.
- send() passes values into a generator — forming the basis of Python coroutines (pre-async/await).
- PHP generators use the same yield keyword — a rare case where PHP and Python share nearly identical syntax and semantics.
- itertools provides composable generator utilities: chain, islice, groupby, product.
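These mechanics, next(), yield from, generator expressions, and send(), can be sketched in a few lines (function names here are illustrative, not from the text):

```python
def countdown(n):
    """Yield n, n-1, ..., 1 lazily; nothing runs until next() is called."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))    # 3, execution paused at the first yield
print(list(gen))    # [2, 1], consumes the rest; gen is now exhausted

def flatten(nested):
    """yield from delegates to each sub-iterable in turn."""
    for sub in nested:
        yield from sub

print(list(flatten([[1, 2], [3]])))   # [1, 2, 3]

doubled = (x * 2 for x in [1, 2, 3])  # generator expression: lazy, single-pass
print(list(doubled))                  # [2, 4, 6]

def accumulator():
    """send(v) resumes the generator; v becomes the value of the yield expression."""
    total = 0
    while True:
        total += yield total

acc = accumulator()
next(acc)            # prime: advance to the first yield
print(acc.send(10))  # 10
print(acc.send(5))   # 15
```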
Common Misconception
That generators and lists are interchangeable. A generator cannot be restarted, indexed, or iterated twice; once exhausted, further next() calls raise StopIteration.
Why It Matters
Returning a list materialises the whole sequence in memory; a generator processes arbitrarily large inputs (a 1GB log, millions of CSV rows) in constant memory and lets pipeline stages compose lazily.
Common Mistakes
- Returning a list when a generator would be sufficient — materialises the entire sequence in memory.
- Using a list comprehension where a generator expression, (x*2 for x in items), would do for one-time iteration.
- Calling next() without a default on an exhausted generator — raises StopIteration unexpectedly.
- Generators that hold expensive resources without close() — use try/finally or contextlib.
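The last two mistakes can be sketched as follows (the generator and the "resource" are illustrative stand-ins, not real handles):

```python
import contextlib

# next() with a default never raises StopIteration:
gen = (x for x in [1, 2])
print(next(gen, None))   # 1
print(next(gen, None))   # 2
print(next(gen, None))   # None, exhausted: default returned, no exception

cleaned_up = []

def with_resource():
    """try/finally runs even if the consumer abandons the generator early."""
    try:
        yield "row-1"
        yield "row-2"
    finally:
        cleaned_up.append(True)   # stands in for file.close() / conn.close()

# contextlib.closing() calls .close() on exit, which raises GeneratorExit
# inside the suspended generator and so triggers the finally block:
with contextlib.closing(with_resource()) as rows:
    next(rows)            # take one row, then leave the block early

print(cleaned_up)         # [True], cleanup ran despite the early exit
```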
Code Examples
# Loads entire file into memory:
def read_log(path):
    return open(path).readlines()  # 1GB log = 1GB RAM
# Generator — processes one line at a time:
def read_log(path):
    with open(path) as f:
        for line in f:
            yield line.strip()
def read_large_csv(path):
    with open(path) as f:
        next(f)  # skip header
        for line in f:
            yield line.strip().split(',')

# Processes millions of rows with constant memory:
for row in read_large_csv('huge.csv'):
    process(row)
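The itertools utilities mentioned in the Explanation compose with hand-written generators like read_large_csv. A minimal sketch, using in-memory lists as stand-ins for open file handles:

```python
from itertools import chain, islice

def parse(lines):
    """Same shape as read_large_csv's body: strip and split each line."""
    for line in lines:
        yield line.strip().split(',')

# Hypothetical in-memory "files" standing in for file handles:
jan = ["a,1", "b,2"]
feb = ["c,3", "d,4", "e,5"]

# Compose lazily: nothing is parsed until the final consumer pulls rows.
rows = parse(chain(jan, feb))        # concatenate sources without copying
first_three = list(islice(rows, 3))  # take 3 rows; the rest is never parsed
print(first_three)                   # [['a', '1'], ['b', '2'], ['c', '3']]
```

islice bounds an otherwise unbounded (or very large) stream, which is how generators handle infinite sequences safely.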