Database Bulk Operations
debt(d7/e4/b3/t5)
Closest to 'only careful code review or runtime testing' (d7). detection_hints.automated is no; only a code_pattern regex (foreach with ->execute) exists, no SAST/linter rule. The per-row loop runs correctly and only surfaces as a performance problem under load or profiling, requiring code review or runtime testing to spot.
Closest to 'simple parameterised fix' (e3) leaning to e4. quick_fix is replacing per-row loops with chunked multi-row INSERT/UPSERT statements — a pattern replacement, but it touches batching logic, transaction boundaries, and bind-parameter limits, so slightly more than a one-pattern swap.
Closest to 'localised tax' (b3). The choice is confined to specific import/sync/loop code paths (applies_to contexts web/cli/queue-worker) rather than load-bearing across the whole system; the rest of the codebase is unaffected by a single batch routine.
Closest to 'notable trap' (t5). The misconception — that per-row INSERT loops are fine because the database is fast — is a documented gotcha most devs eventually learn, since per-statement round-trip overhead is invisible until row counts grow. Additional traps (bind-parameter limits, lock duration on bulk deletes, SQL injection via string concatenation) reinforce a t5 level.
Also Known As
TL;DR
Explanation
Bulk operations process many rows with a single database statement or a small number of batched statements, rather than executing one INSERT/UPDATE/DELETE per row in a loop. Each statement carries fixed costs: network round-trip latency, query parsing and planning, and per-statement transaction overhead. Doing 10,000 individual inserts pays those costs 10,000 times; a multi-row INSERT or a batched executemany pays them once or a handful of times.
The core techniques are: multi-row INSERT (INSERT INTO t (a,b) VALUES (1,2),(3,4),(5,6)), bulk UPDATE via a single statement with CASE or a join against a VALUES list or temp table, bulk DELETE with a WHERE id IN (...) or range predicate, and bulk UPSERT (INSERT ... ON CONFLICT / ON DUPLICATE KEY UPDATE). For very large loads, dedicated paths like COPY (PostgreSQL) or LOAD DATA INFILE (MySQL) are dramatically faster than INSERT.
Batching is not free of trade-offs. A single huge statement holds locks longer, grows the transaction log, and can blow past parameter limits (PostgreSQL caps bind parameters at 65535, so a wide multi-row insert needs chunking). The practical pattern is to chunk into batches of a few hundred to a few thousand rows, wrap each batch in its own transaction, and tune the batch size against lock contention and memory. Bulk DELETEs on hot tables should be ranged and paced to avoid long lock holds and replication lag.
Done well, bulk operations turn a multi-minute import or migration into seconds. Done naively - one query per row inside a request - they are a classic source of slow endpoints, connection exhaustion, and timeouts.
Common Misconception
Why It Matters
Common Mistakes
- Running one INSERT/UPDATE per row inside a loop instead of batching into multi-row statements.
- Building a single statement so large it exceeds the driver's bind-parameter limit or available memory.
- Committing every row in its own transaction, paying commit overhead per row instead of per batch.
- Bulk-deleting millions of rows in one statement, holding locks long enough to stall other writers and lag replicas.
- Concatenating values into SQL strings instead of using parameterised batches, opening an injection hole.
Avoid When
- Avoid one giant statement for millions of rows - chunk it to bound lock duration, log growth, and parameter count.
- Do not bulk-delete large ranges on hot tables in a single statement; range and pace deletes to limit lock holds and replication lag.
- Skip batching when only a handful of rows are involved - the added complexity is not worth it.
When To Use
- Use multi-row INSERT/UPSERT when importing, seeding, or syncing hundreds or more rows.
- Use a single set-based UPDATE/DELETE with a WHERE predicate instead of per-row statements in a loop.
- Use COPY or LOAD DATA INFILE for very large bulk loads where raw throughput matters most.
Code Examples
// One round-trip per row - slow and connection-hogging
$stmt = $pdo->prepare('INSERT INTO events (user_id, action) VALUES (?, ?)');
foreach ($events as $e) {
$stmt->execute([$e['user_id'], $e['action']]); // N round-trips
}
// Chunked multi-row INSERT - one round-trip per batch
foreach (array_chunk($events, 500) as $chunk) {
$placeholders = implode(',', array_fill(0, count($chunk), '(?, ?)'));
$sql = "INSERT INTO events (user_id, action) VALUES $placeholders";
$params = [];
foreach ($chunk as $e) {
$params[] = $e['user_id'];
$params[] = $e['action'];
}
$pdo->beginTransaction();
$pdo->prepare($sql)->execute($params);
$pdo->commit();
}