Bloom Filter
Also Known As
probabilistic set
space-efficient set
TL;DR
A probabilistic data structure that tests set membership in O(1) time and O(1) space, with a tunable false-positive rate and zero false negatives.
Explanation
A Bloom filter uses multiple hash functions to set bits in a bit array. To test membership, all hash positions are checked — if any is 0, the element is definitely not in the set. If all are 1, the element is probably in the set (possible false positive). Bloom filters never produce false negatives. They are ideal for cache miss pre-screening, username availability checks, and preventing unnecessary database lookups.
Diagram
flowchart LR
ADD[Add item X] --> HASH[Hash with k functions<br/>h1 h2 h3]
HASH --> SET[Set bits in bit array<br/>positions 2, 7, 14]
CHECK[Check item Y] --> HASH2[Hash with same k functions]
HASH2 --> BITS{All bits set?}
BITS -->|yes| MAYBE[Probably in set<br/>false positive possible]
BITS -->|no| DEFNO[Definitely NOT in set<br/>no false negatives]
INFO[Use case: cache negative lookups<br/>block known-bad passwords<br/>deduplicate crawl URLs]
style DEFNO fill:#238636,color:#fff
style MAYBE fill:#d29922,color:#fff
style INFO fill:#1f6feb,color:#fff
Common Misconception
✗ Bloom filters can replace a database set — they cannot; false positives require a fallback check, and elements cannot be removed from a standard Bloom filter.
Why It Matters
Bloom filters can eliminate 99% of unnecessary database lookups for non-existent keys, turning cache stampedes and cold-miss floods into a solved problem.
Common Mistakes
- Not accounting for false positive rate when sizing the bit array — too small a filter has high false positive rates.
- Trying to delete elements from a standard Bloom filter — use a Counting Bloom Filter for deletion support.
- Using a Bloom filter without a fallback — false positives require a definitive check against the source.
- Not understanding that Bloom filters are write-only for membership — you cannot enumerate members.
Code Examples
✗ Vulnerable
// Without bloom filter — every cache miss hits the DB:
function getUser(int $id): ?User {
$cached = $cache->get("user:$id");
if (!$cached) {
return $db->find($id); // DB hit even for non-existent IDs
}
return $cached;
}
✓ Fixed
// Bloom filter pre-screens — DB only queried if probably exists:
function getUser(int $id): ?User {
if (!$bloomFilter->mightContain("user:$id")) return null; // Definite miss
$cached = $cache->get("user:$id");
if (!$cached) {
$user = $db->find($id); // Only if bloom says 'probably exists'
if ($user) $cache->set("user:$id", $user);
return $user;
}
return $cached;
}
References
Tags
🤝 Adopt this term
£79/year · your link shown here
Added
15 Mar 2026
Edited
22 Mar 2026
Views
38
🤖 AI Guestbook educational data only
|
|
Last 30 days
Agents 1
Amazonbot 15
Perplexity 9
SEMrush 3
Google 2
Unknown AI 2
Ahrefs 2
Also referenced
How they use it
crawler 32
crawler_json 1
Related categories
⚡
DEV INTEL
Tools & Severity
🟢 Low
⚙ Fix effort: High
⚡ Quick Fix
Use a Bloom filter when you need to answer 'is this item definitely NOT in the set' cheaply — it has no false negatives but accepts a tunable false positive rate
📦 Applies To
PHP 7.0+
web
cli
queue-worker
🔗 Prerequisites
🔍 Detection Hints
Duplicate URL/email check on very large dataset where in_array or DB query is too slow
Auto-detectable:
✗ No
⚠ Related Problems
🤖 AI Agent
Confidence: Low
False Positives: Medium
✗ Manual fix
Fix: High
Context: Function
Tests: Update