← CodeClarityLab Home
Browse by Category
+ added · updated 7d
← Back to glossary

Bloom Filter

data_structures PHP 7.0+ Advanced

Also Known As

probabilistic set space-efficient set

TL;DR

A probabilistic data structure that tests set membership in O(1) time and O(1) space, with a tunable false-positive rate and zero false negatives.

Explanation

A Bloom filter uses multiple hash functions to set bits in a bit array. To test membership, all hash positions are checked — if any is 0, the element is definitely not in the set. If all are 1, the element is probably in the set (possible false positive). Bloom filters never produce false negatives. They are ideal for cache miss pre-screening, username availability checks, and preventing unnecessary database lookups.

Diagram

flowchart LR
    ADD[Add item X] --> HASH[Hash with k functions<br/>h1 h2 h3]
    HASH --> SET[Set bits in bit array<br/>positions 2, 7, 14]
    CHECK[Check item Y] --> HASH2[Hash with same k functions]
    HASH2 --> BITS{All bits set?}
    BITS -->|yes| MAYBE[Probably in set<br/>false positive possible]
    BITS -->|no| DEFNO[Definitely NOT in set<br/>no false negatives]
    INFO[Use case: cache negative lookups<br/>block known-bad passwords<br/>deduplicate crawl URLs]
style DEFNO fill:#238636,color:#fff
style MAYBE fill:#d29922,color:#fff
style INFO fill:#1f6feb,color:#fff

Common Misconception

Bloom filters can replace a database set — they cannot; false positives require a fallback check, and elements cannot be removed from a standard Bloom filter.

Why It Matters

Bloom filters can eliminate 99% of unnecessary database lookups for non-existent keys, turning cache stampedes and cold-miss floods into a solved problem.

Common Mistakes

  • Not accounting for false positive rate when sizing the bit array — too small a filter has high false positive rates.
  • Trying to delete elements from a standard Bloom filter — use a Counting Bloom Filter for deletion support.
  • Using a Bloom filter without a fallback — false positives require a definitive check against the source.
  • Not understanding that Bloom filters are write-only for membership — you cannot enumerate members.

Code Examples

✗ Vulnerable
// Without bloom filter — every cache miss hits the DB:
function getUser(int $id): ?User {
    $cached = $cache->get("user:$id");
    if (!$cached) {
        return $db->find($id); // DB hit even for non-existent IDs
    }
    return $cached;
}
✓ Fixed
// Bloom filter pre-screens — DB only queried if probably exists:
function getUser(int $id): ?User {
    if (!$bloomFilter->mightContain("user:$id")) return null; // Definite miss
    $cached = $cache->get("user:$id");
    if (!$cached) {
        $user = $db->find($id); // Only if bloom says 'probably exists'
        if ($user) $cache->set("user:$id", $user);
        return $user;
    }
    return $cached;
}

Added 15 Mar 2026
Edited 22 Mar 2026
Views 38
Rate this term
No ratings yet
🤖 AI Guestbook educational data only
| |
Last 30 days
0 pings W 0 pings T 0 pings F 0 pings S 0 pings S 0 pings M 0 pings T 0 pings W 0 pings T 0 pings F 0 pings S 0 pings S 0 pings M 0 pings T 0 pings W 0 pings T 1 ping F 0 pings S 0 pings S 0 pings M 1 ping T 1 ping W 1 ping T 1 ping F 0 pings S 0 pings S 0 pings M 1 ping T 1 ping W 1 ping T
Amazonbot 15 Perplexity 9 SEMrush 3 Google 2 Unknown AI 2 Ahrefs 2
crawler 32 crawler_json 1
DEV INTEL Tools & Severity
🟢 Low ⚙ Fix effort: High
⚡ Quick Fix
Use a Bloom filter when you need to answer 'is this item definitely NOT in the set' cheaply — it has no false negatives but accepts a tunable false positive rate
📦 Applies To
PHP 7.0+ web cli queue-worker
🔗 Prerequisites
🔍 Detection Hints
Duplicate URL/email check on very large dataset where in_array or DB query is too slow
Auto-detectable: ✗ No
⚠ Related Problems
🤖 AI Agent
Confidence: Low False Positives: Medium ✗ Manual fix Fix: High Context: Function Tests: Update

✓ schema.org compliant