Batch Processing
debt(d7/e5/b5/t5)
- Closest to 'only careful code review or runtime testing' (d7): Blackfire and php-meminfo (per detection_hints) are profiling tools that surface the issue only when you run the job and observe memory/time; static linters won't flag a ->get() on a large table.
- Closest to 'touches multiple files / significant refactor in one component' (e5): quick_fix says rewrite to generators + cursor pagination + chunk loops + per-batch transactions; that's a refactor of the import/export pipeline, not a one-liner, though usually scoped to one component.
- Closest to 'persistent productivity tax' (b5): applies_to covers CLI and queue-worker contexts, so batching discipline must be applied across all bulk jobs; chunk size, transaction boundaries, and memory release become recurring concerns for many work streams.
- Closest to 'notable trap (a documented gotcha most devs eventually learn)' (t5): the misconception that per-row loops equal batching is a classic gotcha, and the common_mistakes (chunk size vs memory, missing transactions, PHP not GCing between iterations) are well-documented surprises devs hit once and remember.
Also Known As
Chunked processing, bulk operations
TL;DR
Batch writes into multi-row statements (one INSERT with 1000 rows is 10–100x faster than 1000 single-row INSERTs) and stream reads in fixed-size chunks so memory stays constant no matter how large the dataset.
Explanation
Batch processing groups operations to amortise fixed costs: instead of 1000 individual INSERT statements, a single INSERT with 1000 rows is 10–100x faster. In PHP, chunked processing is essential for memory management: instead of loading 100,000 rows into an array, process them in chunks of 500 using LIMIT/OFFSET or cursor-based pagination, keeping memory usage constant. PHP CLI scripts using generators can process arbitrarily large datasets with O(1) memory. Laravel's chunk() and chunkById() methods, and Doctrine's iterate(), implement this pattern for ORM queries.
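A minimal sketch of the generator-plus-cursor-pagination approach using plain PDO; the users table, id column, and chunk size of 500 are illustrative assumptions, not any framework's API:

<?php
// Sketch: stream a large table in fixed-size chunks via keyset (cursor) pagination.
// Assumes an auto-increment primary key `id`; table and column names are illustrative.
function streamUsers(PDO $pdo, int $chunkSize = 500): Generator
{
    $lastId = 0;
    do {
        // Keyset pagination: WHERE id > ? avoids the O(n) scan cost
        // that large OFFSET values incur. $chunkSize is a typed int,
        // so concatenating it into LIMIT is safe.
        $stmt = $pdo->prepare(
            'SELECT * FROM users WHERE id > ? ORDER BY id LIMIT ' . $chunkSize
        );
        $stmt->execute([$lastId]);
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

        foreach ($rows as $row) {
            $lastId = $row['id'];
            yield $row; // at most one chunk is held in memory at a time
        }
    } while (count($rows) === $chunkSize);
}

foreach (streamUsers($pdo, 500) as $user) {
    // process one row; memory stays flat no matter how big the table is
}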
Diagram
flowchart TD
    subgraph Real-time - Slow
        R1[Request 1 - 1 DB write]
        R2[Request 2 - 1 DB write]
        R3[Request 3 - 1 DB write]
        R1 & R2 & R3 --> DB1[(1000 requests = 1000 queries)]
    end
    subgraph Batch - Fast
        QUEUE[(Queue: 1000 events)]
        WORKER[Worker collects 100 events]
        QUEUE --> WORKER
        WORKER -->|1 bulk INSERT| DB2[(1000 events = 10 queries)]
    end
    style DB1 fill:#f85149,color:#fff
    style DB2 fill:#238636,color:#fff
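A hedged sketch of the worker's flush step from the Batch - Fast path above: collected events go out as one multi-row INSERT. The events table, its columns, and the helper name are illustrative assumptions:

// Sketch: flush collected queue events as one multi-row INSERT.
function flushEvents(PDO $pdo, array $events): void
{
    if ($events === []) {
        return;
    }
    // Build one placeholder group per row: (?, ?), (?, ?), ...
    $placeholders = implode(', ', array_fill(0, count($events), '(?, ?)'));
    $params = [];
    foreach ($events as $event) {
        $params[] = $event['user_id'];
        $params[] = $event['type'];
    }
    // One round trip and one statement parse for the whole batch,
    // instead of one per event.
    $pdo->prepare("INSERT INTO events (user_id, type) VALUES $placeholders")
        ->execute($params);
}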
Common Misconception
"Looping over rows and inserting them one at a time is batching." It is not: a per-row loop still pays the fixed cost (network round trip, statement parse, transaction overhead) once per row. Batching means grouping rows into one statement or one transaction so the fixed cost is amortised across the whole chunk.
Why It Matters
The failure mode is invisible until runtime: static analysis won't flag a query that loads a whole table, so the job works on a small development dataset and then exhausts memory or crawls in production. And because the pattern applies to every CLI and queue-worker bulk job, getting chunk size, transaction boundaries, and memory release wrong becomes a recurring tax across many work streams.
Common Mistakes
- Using too large a batch size and exhausting memory — chunk size must balance throughput against memory.
- Not wrapping each batch in a transaction — a failure halfway through leaves partial data (see the sketch after this list).
- Batching inserts but not updates — UPDATE ... WHERE id IN (...) is equally important.
- Forgetting to release memory between batches in long-running PHP processes — refcounting frees plain values immediately, but reference cycles and caches such as an ORM identity map accumulate unless you collect or clear them between chunks.
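A minimal sketch of the transaction-per-batch and memory-release discipline described above; runInBatches(), commitBatch(), and processRow() are hypothetical helpers, and the chunk size of 500 is illustrative:

// Sketch: wrap each batch in its own transaction and release memory between
// batches, so a failure only loses the current chunk.
function runInBatches(PDO $pdo, iterable $rows, int $batchSize = 500): void
{
    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) < $batchSize) {
            continue;
        }
        commitBatch($pdo, $batch);
        $batch = [];         // drop references so refcounting can free the rows
        gc_collect_cycles(); // and collect any reference cycles
    }
    if ($batch !== []) {
        commitBatch($pdo, $batch); // final partial batch
    }
}

function commitBatch(PDO $pdo, array $batch): void
{
    $pdo->beginTransaction();
    try {
        foreach ($batch as $row) {
            processRow($pdo, $row); // hypothetical per-row work
        }
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack(); // only this batch is lost; earlier batches are durable
        throw $e;
    }
}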
Code Examples
// Bad: fetchAll() materialises the entire table in memory at once.
$users = $pdo->query('SELECT * FROM users')->fetchAll();

// Good: stream rows one at a time so memory stays flat.
// Note: on MySQL, disable PDO::MYSQL_ATTR_USE_BUFFERED_QUERY first,
// otherwise the driver still buffers the full result set client-side.
$stmt = $pdo->query('SELECT * FROM users');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    process($row); // O(1) memory per row
}
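Batching matters on the write side too. A short sketch of the UPDATE ... WHERE id IN (...) pattern from the mistakes list; $processedIds, the table, and the status column are illustrative assumptions:

// Sketch: batch an UPDATE over collected ids instead of one UPDATE per row.
$idChunks = array_chunk($processedIds, 500);
foreach ($idChunks as $ids) {
    $in = implode(', ', array_fill(0, count($ids), '?'));
    $stmt = $pdo->prepare("UPDATE users SET status = 'done' WHERE id IN ($in)");
    $stmt->execute($ids); // one statement per 500 rows instead of 500 statements
}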