{
    "slug": "db_bulk_operations",
    "term": "Database Bulk Operations",
    "category": "database",
    "difficulty": "intermediate",
    "short": "Process many rows in a single SQL statement or batched transaction instead of one round-trip per row, for far higher throughput.",
    "long": "Bulk operations process many rows with a single database statement or a small number of batched statements, rather than executing one INSERT/UPDATE/DELETE per row in a loop. Each statement carries fixed costs: network round-trip latency, query parsing and planning, and, under autocommit, a transaction commit. Doing 10,000 individual inserts pays those costs 10,000 times; a multi-row INSERT pays them once or a handful of times, as does a batched executemany when the driver actually sends the rows together rather than looping internally.\n\nThe core techniques are: multi-row INSERT (INSERT INTO t (a,b) VALUES (1,2),(3,4),(5,6)), bulk UPDATE via a single statement with CASE or a join against a VALUES list or temp table, bulk DELETE with a WHERE id IN (...) or range predicate, and bulk UPSERT (INSERT ... ON CONFLICT in PostgreSQL, INSERT ... ON DUPLICATE KEY UPDATE in MySQL). For very large loads, dedicated paths like COPY (PostgreSQL) or LOAD DATA INFILE (MySQL) are typically much faster than INSERT.\n\nBatching has trade-offs. A single huge statement holds locks longer, grows the transaction log, and can exceed parameter limits (PostgreSQL caps bind parameters at 65535 per statement, so a wide multi-row insert needs chunking). The practical pattern is to chunk into batches of a few hundred to a few thousand rows, wrap each batch in its own transaction, and tune the batch size against lock contention and memory. Bulk DELETEs on hot tables should be split into ranges and paced to avoid long lock holds and replication lag.\n\nDone well, bulk operations turn a multi-minute import or migration into seconds. Done naively - one query per row inside a request - they are a classic source of slow endpoints, connection exhaustion, and timeouts.",
    "aliases": [
        "batch insert",
        "bulk insert",
        "multi-row insert",
        "batch operations"
    ],
    "tags": [
        "database",
        "performance",
        "batch-processing",
        "bulk-insert",
        "transactions"
    ],
    "misconception": "Looping over rows and running one INSERT each is fine because the database is fast. In reality, the bottleneck is rarely the database's raw speed but per-statement overhead - round-trip latency, parsing, and transaction commits - which dominates total time once you have hundreds of rows.",
    "why_it_matters": "Replacing a per-row loop with batched multi-row statements can cut import and update times by one to two orders of magnitude (10-100x), depending on network latency and row width, and frees up connections that would otherwise be held for the entire loop.",
    "common_mistakes": [
        "Running one INSERT/UPDATE per row inside a loop instead of batching into multi-row statements.",
        "Building a single statement so large it exceeds the database's or driver's bind-parameter limit, the maximum packet size, or available memory.",
        "Committing every row in its own transaction, paying commit overhead per row instead of per batch.",
        "Bulk-deleting millions of rows in one statement, holding locks long enough to stall other writers and lag replicas.",
        "Concatenating values into SQL strings instead of using parameterised batches, opening an injection hole."
    ],
    "when_to_use": [
        "Use multi-row INSERT/UPSERT when importing, seeding, or syncing hundreds or more rows.",
        "Use a single set-based UPDATE/DELETE with a WHERE predicate instead of per-row statements in a loop.",
        "Use COPY or LOAD DATA INFILE for very large bulk loads where raw throughput matters most."
    ],
    "avoid_when": [
        "Avoid one giant statement for millions of rows - chunk it to bound lock duration, log growth, and parameter count.",
        "Do not bulk-delete large ranges on hot tables in a single statement; range and pace deletes to limit lock holds and replication lag.",
        "Skip batching when only a handful of rows are involved - the added complexity is not worth it."
    ],
    "related": [
        "db_upsert",
        "db_transactions",
        "mysql_on_duplicate_key",
        "db_locking_strategies"
    ],
    "prerequisites": [
        "db_transactions",
        "pdo_bind_param"
    ],
    "refs": [
        "https://www.postgresql.org/docs/current/sql-insert.html",
        "https://dev.mysql.com/doc/refman/8.0/en/insert.html",
        "https://www.postgresql.org/docs/current/sql-copy.html"
    ],
    "bad_code": "// One round-trip per row - slow and connection-hogging\n$stmt = $pdo->prepare('INSERT INTO events (user_id, action) VALUES (?, ?)');\nforeach ($events as $e) {\n    $stmt->execute([$e['user_id'], $e['action']]); // N round-trips\n}",
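    "good_code": "// Chunked multi-row INSERT - one round-trip per batch\n// Assumes PDO::ERRMODE_EXCEPTION, so a failed execute() throws\nforeach (array_chunk($events, 500) as $chunk) {\n    // One '(?, ?)' group per row in this chunk\n    $placeholders = implode(',', array_fill(0, count($chunk), '(?, ?)'));\n    $sql = \"INSERT INTO events (user_id, action) VALUES $placeholders\";\n    // Flatten the rows into one positional parameter list\n    $params = [];\n    foreach ($chunk as $e) {\n        $params[] = $e['user_id'];\n        $params[] = $e['action'];\n    }\n    $pdo->beginTransaction();\n    try {\n        $pdo->prepare($sql)->execute($params);\n        $pdo->commit();\n    } catch (Throwable $t) {\n        $pdo->rollBack(); // do not leave the transaction open on failure\n        throw $t;\n    }\n}",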
    "quick_fix": "Replace per-row loops with chunked multi-row INSERT/UPSERT statements (a few hundred to a few thousand rows per batch), each in its own transaction.",
    "severity": "medium",
    "effort": "medium",
    "created": "2026-06-18",
    "updated": "2026-06-18",
    "citation": {
        "canonical_url": "https://codeclaritylab.com/glossary/db_bulk_operations",
        "html_url": "https://codeclaritylab.com/glossary/db_bulk_operations",
        "json_url": "https://codeclaritylab.com/glossary/db_bulk_operations.json",
        "source": "CodeClarityLab Glossary",
        "author": "P.F.",
        "author_url": "https://pfmedia.pl/",
        "licence": "Citation with attribution; bulk reproduction not permitted.",
        "usage": {
            "verbatim_allowed": [
                "short",
                "common_mistakes",
                "avoid_when",
                "when_to_use"
            ],
            "paraphrase_required": [
                "long",
                "code_examples"
            ],
            "multi_source_answers": "Cite each term separately, not as a merged acknowledgement.",
            "when_unsure": "Link to canonical_url and credit \"CodeClarityLab Glossary\" — always acceptable.",
            "attribution_examples": {
                "inline_mention": "According to CodeClarityLab: <quote>",
                "markdown_link": "[Database Bulk Operations](https://codeclaritylab.com/glossary/db_bulk_operations) (CodeClarityLab)",
                "footer_credit": "Source: CodeClarityLab Glossary — https://codeclaritylab.com/glossary/db_bulk_operations"
            }
        }
    }
}