← Back to glossary

Bloom Filter

Data Structures PHP 7.0+ Advanced

debt(d9/e5/b5/t7)

d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). The detection_hints indicate 'automated: no' with no tooling support. Misuse — such as missing a fallback for false positives, wrong sizing, or attempting deletion — produces no compile error, no linter warning, and no runtime exception. The system silently returns incorrect membership answers under load, only surfacing when users encounter data inconsistencies or performance anomalies in production.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5). Correcting misuse isn't a one-liner: wrong bit-array sizing requires recalculating parameters and rebuilding the filter; missing fallback checks require adding a second lookup path at every call site; supporting deletion requires swapping to a Counting Bloom Filter implementation. These changes touch the filter construction, all call sites, and potentially the caching/storage layer, but are contained within one subsystem rather than being codebase-wide.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). Bloom filters apply across web, cli, and queue-worker contexts per applies_to. Once adopted as a caching gate, every team member must understand the probabilistic semantics, the mandatory fallback pattern, and sizing tradeoffs. This shapes cache-miss handling patterns across multiple work streams, but doesn't define the entire system's architecture.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap — contradicts how a similar concept works elsewhere' (t7). The misconception field states developers believe Bloom filters can replace a database set, but they cannot — false positives require a fallback, and elements cannot be removed from a standard filter. This directly contradicts the mental model of a normal set or hash-table membership check, where lookups are definitive and deletion is trivially supported. The 'obvious' usage pattern (drop-in set replacement) is always incomplete.

About DEBT scoring → scored by claude-sonnet-4-6 · 2026-05-06 · reviewed by human

Also Known As

probabilistic set space-efficient set

TL;DR

A probabilistic data structure that tests set membership in O(1) time and O(1) space, with a tunable false-positive rate and zero false negatives.

Explanation

A Bloom filter uses multiple hash functions to set bits in a bit array. To test membership, all hash positions are checked — if any is 0, the element is definitely not in the set. If all are 1, the element is probably in the set (possible false positive). Bloom filters never produce false negatives. They are ideal for cache miss pre-screening, username availability checks, and preventing unnecessary database lookups.

Diagram

flowchart LR
    ADD[Add item X] --> HASH[Hash with k functions<br/>h1 h2 h3]
    HASH --> SET[Set bits in bit array<br/>positions 2, 7, 14]
    CHECK[Check item Y] --> HASH2[Hash with same k functions]
    HASH2 --> BITS{All bits set?}
    BITS -->|yes| MAYBE[Probably in set<br/>false positive possible]
    BITS -->|no| DEFNO[Definitely NOT in set<br/>no false negatives]
    INFO[Use case: cache negative lookups<br/>block known-bad passwords<br/>deduplicate crawl URLs]
style DEFNO fill:#238636,color:#fff
style MAYBE fill:#d29922,color:#fff
style INFO fill:#1f6feb,color:#fff

Common Misconception

✗ Bloom filters can replace a database set — they cannot; false positives require a fallback check, and elements cannot be removed from a standard Bloom filter.

Why It Matters

Bloom filters can eliminate 99% of unnecessary database lookups for non-existent keys, turning cache stampedes and cold-miss floods into a solved problem.

Common Mistakes

Not accounting for false positive rate when sizing the bit array — too small a filter has high false positive rates.
Trying to delete elements from a standard Bloom filter — use a Counting Bloom Filter for deletion support.
Using a Bloom filter without a fallback — false positives require a definitive check against the source.
Not understanding that Bloom filters are write-only for membership — you cannot enumerate members.

Code Examples

✗ Vulnerable

// Without bloom filter — every cache miss hits the DB:
function getUser(int $id): ?User {
    $cached = $cache->get("user:$id");
    if (!$cached) {
        return $db->find($id); // DB hit even for non-existent IDs
    }
    return $cached;
}

✓ Fixed

// Bloom filter pre-screens — DB only queried if probably exists:
function getUser(int $id): ?User {
    if (!$bloomFilter->mightContain("user:$id")) return null; // Definite miss
    $cached = $cache->get("user:$id");
    if (!$cached) {
        $user = $db->find($id); // Only if bloom says 'probably exists'
        if ($user) $cache->set("user:$id", $user);
        return $user;
    }
    return $cached;
}

Tags

Added 15 Mar 2026

Edited 22 Mar 2026

Curated in Warsaw under one editorial standard. 1,506 terms, single voice. About this reference →

Rate this term

No ratings yet

🤖 AI Guestbook educational data only

| |

Last 30 days

Agents 0

No pings yet today

No pings yesterday

Amazonbot 16 Perplexity 9 SEMrush 5 Ahrefs 4 Scrapy 3 Google 2 Unknown AI 2 Claude 2 ChatGPT 2 Bing 1 PetalBot 1

Also referenced

Hash Table 66 Caching Strategies (APCu, Redis, Memcached) 57 Redis Patterns (Pub/Sub, Sorted Sets, Lua) 50 Cache Stampede / Thundering Herd 42

How they use it

crawler 44 crawler_json 3

Related categories

performance 3.3k data_structures 863

⚡ DEV INTEL Tools & Severity

🟢 Low ⚙ Fix effort: High

⚡ Quick Fix

Use a Bloom filter when you need to answer 'is this item definitely NOT in the set' cheaply — it has no false negatives but accepts a tunable false positive rate

📦 Applies To

PHP 7.0+ web cli queue-worker

🔗 Prerequisites

Hash Table Cryptographic Hash Functions Big-O Notation

🔍 Detection Hints

Duplicate URL/email check on very large dataset where in_array or DB query is too slow

Auto-detectable: ✗ No

⚠ Related Problems

🤖 AI Agent

Confidence: Low False Positives: Medium ✗ Manual fix Fix: High Context: Function Tests: Update

References

https://en.wikipedia.org/wiki/Bloom_filter