← Home ← Codex ← DEBT
Browse by Category
+ added · updated 7d
← Back to glossary

Bloom Filter

Data Structures PHP 7.0+ Advanced
debt(d9/e5/b5/t7)
d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). The detection_hints indicate 'automated: no' with no tooling support. Misuse — such as missing a fallback for false positives, wrong sizing, or attempting deletion — produces no compile error, no linter warning, and no runtime exception. The system silently returns incorrect membership answers under load, only surfacing when users encounter data inconsistencies or performance anomalies in production.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5). Correcting misuse isn't a one-liner: wrong bit-array sizing requires recalculating parameters and rebuilding the filter; missing fallback checks require adding a second lookup path at every call site; supporting deletion requires swapping to a Counting Bloom Filter implementation. These changes touch the filter construction, all call sites, and potentially the caching/storage layer, but are contained within one subsystem rather than being codebase-wide.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). Bloom filters apply across web, cli, and queue-worker contexts per applies_to. Once adopted as a caching gate, every team member must understand the probabilistic semantics, the mandatory fallback pattern, and sizing tradeoffs. This shapes cache-miss handling patterns across multiple work streams, but doesn't define the entire system's architecture.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap — contradicts how a similar concept works elsewhere' (t7). The misconception field states developers believe Bloom filters can replace a database set, but they cannot — false positives require a fallback, and elements cannot be removed from a standard filter. This directly contradicts the mental model of a normal set or hash-table membership check, where lookups are definitive and deletion is trivially supported. The 'obvious' usage pattern (drop-in set replacement) is always incomplete.

About DEBT scoring →

Also Known As

probabilistic set space-efficient set

TL;DR

A probabilistic data structure that tests set membership in O(1) time and O(1) space, with a tunable false-positive rate and zero false negatives.

Explanation

A Bloom filter uses multiple hash functions to set bits in a bit array. To test membership, all hash positions are checked — if any is 0, the element is definitely not in the set. If all are 1, the element is probably in the set (possible false positive). Bloom filters never produce false negatives. They are ideal for cache miss pre-screening, username availability checks, and preventing unnecessary database lookups.

Diagram

flowchart LR
    ADD[Add item X] --> HASH[Hash with k functions<br/>h1 h2 h3]
    HASH --> SET[Set bits in bit array<br/>positions 2, 7, 14]
    CHECK[Check item Y] --> HASH2[Hash with same k functions]
    HASH2 --> BITS{All bits set?}
    BITS -->|yes| MAYBE[Probably in set<br/>false positive possible]
    BITS -->|no| DEFNO[Definitely NOT in set<br/>no false negatives]
    INFO[Use case: cache negative lookups<br/>block known-bad passwords<br/>deduplicate crawl URLs]
style DEFNO fill:#238636,color:#fff
style MAYBE fill:#d29922,color:#fff
style INFO fill:#1f6feb,color:#fff

Common Misconception

Bloom filters can replace a database set — they cannot; false positives require a fallback check, and elements cannot be removed from a standard Bloom filter.

Why It Matters

Bloom filters can eliminate 99% of unnecessary database lookups for non-existent keys, turning cache stampedes and cold-miss floods into a solved problem.

Common Mistakes

  • Not accounting for false positive rate when sizing the bit array — too small a filter has high false positive rates.
  • Trying to delete elements from a standard Bloom filter — use a Counting Bloom Filter for deletion support.
  • Using a Bloom filter without a fallback — false positives require a definitive check against the source.
  • Not understanding that Bloom filters are write-only for membership — you cannot enumerate members.

Code Examples

✗ Vulnerable
// Without bloom filter — every cache miss hits the DB:
function getUser(int $id): ?User {
    $cached = $cache->get("user:$id");
    if (!$cached) {
        return $db->find($id); // DB hit even for non-existent IDs
    }
    return $cached;
}
✓ Fixed
// Bloom filter pre-screens — DB only queried if probably exists:
function getUser(int $id): ?User {
    if (!$bloomFilter->mightContain("user:$id")) return null; // Definite miss
    $cached = $cache->get("user:$id");
    if (!$cached) {
        $user = $db->find($id); // Only if bloom says 'probably exists'
        if ($user) $cache->set("user:$id", $user);
        return $user;
    }
    return $cached;
}

Added 15 Mar 2026
Edited 22 Mar 2026
Views 57
Rate this term
No ratings yet
🤖 AI Guestbook educational data only
| |
Last 30 days
1 ping T 0 pings W 0 pings T 0 pings F 0 pings S 0 pings S 0 pings M 0 pings T 1 ping W 0 pings T 1 ping F 1 ping S 1 ping S 0 pings M 0 pings T 0 pings W 0 pings T 0 pings F 0 pings S 0 pings S 0 pings M 1 ping T 0 pings W 0 pings T 0 pings F 0 pings S 0 pings S 1 ping M 0 pings T 0 pings W
No pings yet today
No pings yesterday
Amazonbot 16 Perplexity 9 SEMrush 5 Ahrefs 4 Scrapy 3 Google 2 Unknown AI 2 Claude 2 ChatGPT 2 Bing 1 PetalBot 1
crawler 44 crawler_json 3
DEV INTEL Tools & Severity
🟢 Low ⚙ Fix effort: High
⚡ Quick Fix
Use a Bloom filter when you need to answer 'is this item definitely NOT in the set' cheaply — it has no false negatives but accepts a tunable false positive rate
📦 Applies To
PHP 7.0+ web cli queue-worker
🔗 Prerequisites
🔍 Detection Hints
Duplicate URL/email check on very large dataset where in_array or DB query is too slow
Auto-detectable: ✗ No
⚠ Related Problems
🤖 AI Agent
Confidence: Low False Positives: Medium ✗ Manual fix Fix: High Context: Function Tests: Update


✓ schema.org compliant