Elasticsearch Fundamentals

Search PHP 7.0+ Intermediate
debt(d9/e9/b7/t9)
d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). No detection_hints.tools are specified. The core misuses — dynamic mapping producing wrong field types, using ES as the system of record, missing refresh lag — produce no compiler errors, no linter warnings, and no runtime exceptions at write time. Data loss or wrong types surface only when users attempt searches or when cluster issues cause data to vanish, making this firmly d9.

e9 Effort Remediation debt — work required to fix once spotted

Closest to 'architectural rework' (e9). The quick_fix surface (define mappings, write to DB first, async indexing via queue) understates the remediation scope. If Elasticsearch has been used as the system of record — the canonical misconception — recovering requires introducing a primary database, establishing an event/queue sync pipeline, migrating existing data, and re-architecting write paths across the application. This is full architectural rework, not a parameterised fix.

b7 Burden Structural debt — long-term weight of choosing wrong

Closest to 'strong gravitational pull' (b7). Elasticsearch integration via applies_to covers web and cli contexts, meaning every feature involving search, indexing, or data display is shaped by the architectural choice. An existing field's mapping cannot be changed without reindexing; the sync strategy (queue, events) touches data write paths everywhere; refresh lag must be considered in any real-time feature. Every future change in the data model must account for ES mapping and synchronisation, making this a strong gravitational constraint across the codebase.

t9 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'catastrophic trap — the obvious way is always wrong' (t9). The misconception field states explicitly that Elasticsearch is routinely mistaken for a primary database. Competent developers familiar with MongoDB or other document stores will naturally treat ES as the system of record — the 'obvious' approach — which leads directly to data loss on cluster issues, no transaction support, and eventual consistency surprises. The misconception is both common and catastrophic in consequence, placing this squarely at t9.

Also Known As

Elasticsearch, ES, elastic search, ELK stack, OpenSearch

TL;DR

A distributed search and analytics engine built on Lucene. It stores documents as JSON, indexes them automatically, and exposes a REST API for full-text search, aggregations, and near-real-time analytics.

Explanation

Elasticsearch stores data as JSON documents in indexes (loosely analogous to database tables). Each index is split into primary shards, which are distributed across nodes, and each primary shard can have one or more replica shards for fault tolerance. Documents are indexed automatically; field types are either inferred (dynamic mapping) or defined explicitly in a mapping. Queries are expressed in JSON using the Query DSL: match queries for full-text search, term queries for exact values, bool queries for combining conditions, and aggregations for analytics. The two most common PHP integration patterns are a synchronous index write on every database mutation (simple, but adds latency to each write) and an asynchronous, queue-based indexing pipeline (more reliable at scale). The official PHP client is elasticsearch/elasticsearch; a simpler alternative is ruflin/elastica.
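
As a rough illustration of how the Query DSL clauses described above combine, the sketch below builds a request body as a plain PHP array. The function name buildUserSearchBody and the field names bio, status, and skills are assumptions made for this example, and the term filter and terms aggregation assume that status and skills are mapped as keyword fields.

```php
<?php
declare(strict_types=1);

/**
 * Build a search body: full-text match on `bio`, exact-value filter on
 * `status`, and a terms aggregation counting documents per skill.
 * Function and field names are hypothetical, chosen for illustration.
 */
function buildUserSearchBody(string $searchTerm, string $status): array
{
    return [
        'query' => [
            'bool' => [
                // must: analysed full-text clause, contributes to the relevance score
                'must' => [
                    ['match' => ['bio' => $searchTerm]],
                ],
                // filter: exact-value clause, does not affect scoring
                'filter' => [
                    ['term' => ['status' => $status]],
                ],
            ],
        ],
        // aggs: bucket matching documents by skill (assumes a keyword field)
        'aggs' => [
            'by_skill' => [
                'terms' => ['field' => 'skills', 'size' => 10],
            ],
        ],
    ];
}

$body = buildUserSearchBody('distributed systems', 'active');
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

The resulting array would be passed as the 'body' value of a $client->search() call, in the same shape as the search example in the Code Examples section.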

Common Misconception

The misconception is that Elasticsearch is a database and can replace the primary data store. In fact, Elasticsearch is a search index: it is eventually consistent, does not support transactions, and prioritises read performance over write durability. Primary data should live in a relational or document database; Elasticsearch should contain a search-optimised projection of that data, kept in sync via events or a queue. Using Elasticsearch as the system of record risks data loss when the cluster has problems, because there is no other copy to rebuild from.
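
A minimal sketch of the "database first, index as a projection" idea might look like the following. The classes InMemoryStore and InMemoryQueue and the functions saveUser and buildIndexParams are hypothetical stand-ins invented for this example so that it runs on its own; in a real application they would be the primary database and a durable message queue.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for the primary database.
final class InMemoryStore
{
    public $rows = [];

    public function save(array $row)
    {
        $this->rows[$row['id']] = $row;
    }
}

// Hypothetical in-memory stand-in for a durable queue.
final class InMemoryQueue
{
    public $jobs = [];

    public function push(array $job)
    {
        $this->jobs[] = $job;
    }
}

// Write path: the system of record is updated first, then a job is queued.
function saveUser(InMemoryStore $database, InMemoryQueue $queue, array $user)
{
    $database->save($user);
    $queue->push(['type' => 'index_user', 'id' => $user['id']]);
}

// Worker side: reload the row from the primary store and build the search
// projection. Only search-relevant fields are copied into the document.
function buildIndexParams(InMemoryStore $database, array $job): array
{
    $row = $database->rows[$job['id']];

    return [
        'index' => 'users',
        'id'    => $row['id'],
        'body'  => [
            'name' => $row['name'],
            'bio'  => $row['bio'],
        ],
    ];
}

$database = new InMemoryStore();
$queue    = new InMemoryQueue();

saveUser($database, $queue, [
    'id'            => 42,
    'name'          => 'Ada',
    'bio'           => 'Search engineer',
    'password_hash' => 'not-indexed',
]);

$params = buildIndexParams($database, $queue->jobs[0]);
echo json_encode($params), PHP_EOL;
```

Because the worker rebuilds each document from the primary store, a lost or corrupted index can in principle be rebuilt by replaying every row through the same function. The returned array has the shape that $client->index() takes in the Code Examples section.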

Why It Matters

Elasticsearch solves search problems that SQL databases handle poorly — multi-field full-text search with relevance ranking, faceted filtering with counts, autocomplete with typo tolerance, and analytics aggregations over millions of documents. For PHP applications that have outgrown LIKE queries or MySQL FULLTEXT, Elasticsearch provides a step-change in search quality and performance. The REST API means no PHP extension is required — any HTTP client works, making it straightforward to integrate.

Common Mistakes

  • Using dynamic mapping in production: Elasticsearch infers each field's type from the first value it sees for that field, which often produces the wrong type. Always define explicit mappings.
  • Not handling index refresh lag: newly indexed documents are not immediately searchable (the default refresh interval is 1 second); account for this in real-time applications.
  • Indexing entire database rows including sensitive fields: index only the fields needed for search and display, not passwords, tokens, or PII.
  • Using Elasticsearch as the source of truth and skipping the primary database: always write to the database first, then index asynchronously.
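
The first two mistakes above can be sketched as request parameters. This is an illustrative sketch, not a definitive setup: the function names buildUsersIndexParams and waitForRefresh are hypothetical, the field list mirrors the users example in the Code Examples section, and the typeless mapping format assumes Elasticsearch 7 or later.

```php
<?php
declare(strict_types=1);

/**
 * Parameters for creating a `users` index with an explicit mapping.
 * Assumes Elasticsearch 7+ (mappings without a type name).
 */
function buildUsersIndexParams(): array
{
    return [
        'index' => 'users',
        'body'  => [
            'mappings' => [
                // strict: reject documents that contain unmapped fields
                'dynamic'    => 'strict',
                'properties' => [
                    'name'   => ['type' => 'text'],    // analysed full-text
                    'email'  => ['type' => 'keyword'], // exact match only
                    'bio'    => ['type' => 'text'],
                    'skills' => ['type' => 'keyword'],
                    'joined' => ['type' => 'date'],    // ISO 8601 by default
                ],
            ],
        ],
    ];
}

/**
 * Ask Elasticsearch to hold the response until the next refresh has made
 * the document searchable, trading write latency for read-your-write.
 */
function waitForRefresh(array $indexParams): array
{
    $indexParams['refresh'] = 'wait_for';

    return $indexParams;
}

$createParams = buildUsersIndexParams();
$indexParams  = waitForRefresh([
    'index' => 'users',
    'id'    => 42,
    'body'  => ['name' => 'Ada', 'skills' => ['php', 'search']],
]);

echo json_encode($createParams, JSON_PRETTY_PRINT), PHP_EOL;
```

With the official client, $createParams would typically go to $client->indices()->create() before any document is indexed, and $indexParams to $client->index(). Waiting for a refresh on every write slows indexing, so it is probably best kept for the few code paths that must read their own write straight away.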

Code Examples

✗ Vulnerable
// Indexing entire user row including sensitive fields
$client->index([
    'index' => 'users',
    'id'    => $user['id'],
    'body'  => $user, // includes password_hash, api_token, 2fa_secret
]);
✓ Fixed
// Index only search-relevant fields
$client->index([
    'index' => 'users',
    'id'    => $user['id'],
    'body'  => [
        'name'     => $user['name'],
        'email'    => $user['email'], // only if searching by email
        'bio'      => $user['bio'],
        'skills'   => $user['skills'],
        'joined'   => $user['created_at'],
    ],
]);

// multi_match query: full-text search across several fields, ranked by relevance
$results = $client->search([
    'index' => 'users',
    'body'  => [
        'query' => [
            'multi_match' => [
                'query'  => $searchTerm,
                'fields' => ['name^2', 'bio', 'skills'], // name boosted
            ]
        ]
    ]
]);

Added 23 Mar 2026
Edited 5 Apr 2026
DEV INTEL Tools & Severity
🔵 Info ⚙ Fix effort: High
⚡ Quick Fix
Define explicit mappings before indexing; write to your database first, then index asynchronously via a queue; use the match query for full-text search and the term query for exact values.
📦 Applies To
PHP 7.0+ web cli

