Elasticsearch Fundamentals
debt(d9/e9/b7/t9)
Closest to 'silent in production until users hit it' (d9). No detection_hints.tools are specified. The core misuses — dynamic mapping producing wrong field types, using ES as the system of record, missing refresh lag — produce no compiler errors, no linter warnings, and no runtime exceptions at write time. Data loss or wrong types surface only when users attempt searches or when cluster issues cause data to vanish, making this firmly d9.
Closest to 'architectural rework' (e9). The quick_fix surface (define mappings, write to DB first, async indexing via queue) understates the remediation scope. If Elasticsearch has been used as the system of record — the canonical misconception — recovering requires introducing a primary database, establishing an event/queue sync pipeline, migrating existing data, and re-architecting write paths across the application. This is full architectural rework, not a parameterised fix.
Closest to 'strong gravitational pull' (b7). The applies_to field covers web and cli contexts, so every feature involving search, indexing, or data display is shaped by the architectural choice. The type of an already-mapped field cannot be changed without reindexing; the sync strategy (queue, events) touches data write paths everywhere; refresh lag must be considered in any real-time feature. Every future change to the data model must account for ES mapping and synchronisation, making this a strong gravitational constraint across the codebase.
Closest to 'catastrophic trap — the obvious way is always wrong' (t9). The misconception field states explicitly that Elasticsearch is routinely mistaken for a primary database. Competent developers familiar with MongoDB or other document stores will naturally treat ES as the system of record — the 'obvious' approach — which leads directly to data loss on cluster issues, no transaction support, and eventual consistency surprises. The misconception is both common and catastrophic in consequence, placing this squarely at t9.
Also Known As
TL;DR
Explanation
Elasticsearch stores data as JSON documents in indexes (loosely analogous to database tables). Each index is split into primary shards, which are distributed across nodes, and each primary shard can have one or more replica shards for fault tolerance. Document fields are indexed by default; field types are either inferred by dynamic mapping or defined explicitly in a mapping. Queries are expressed in JSON using the Query DSL: match queries for full-text search, term queries for exact values, bool queries for combining conditions, and aggregations for analytics. The two most common PHP integration patterns are a synchronous index write on every database mutation (simple, but it adds latency to each write) and an asynchronous queue-based indexing pipeline (reliable at scale). The official PHP client is elasticsearch/elasticsearch; a simpler alternative is ruflin/elastica.
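To make the Query DSL description concrete, here is a minimal sketch of a request that combines a match query, a term filter, and a terms aggregation inside a bool query. The helper name `buildArticleSearch`, the `articles` index, and the `title`, `status`, and `tags` fields are assumptions chosen for illustration; the sketch also assumes `status` and `tags` are mapped as keyword fields.

```php
<?php
// Hypothetical helper: builds the parameter array for a search request.
// Index and field names are illustrative, not taken from a real schema.
function buildArticleSearch(string $text, string $status): array
{
    return [
        'index' => 'articles',
        'body'  => [
            'query' => [
                'bool' => [
                    // match: analyzed full-text search, contributes to the relevance score
                    'must' => [
                        ['match' => ['title' => $text]],
                    ],
                    // term: exact-value comparison; in filter context it does not affect scoring
                    'filter' => [
                        ['term' => ['status' => $status]],
                    ],
                ],
            ],
            // terms aggregation: document counts per distinct tag value
            'aggs' => [
                'by_tag' => ['terms' => ['field' => 'tags']],
            ],
        ],
    ];
}

$params = buildArticleSearch('elasticsearch basics', 'published');

// Print the JSON body that would be sent to Elasticsearch.
echo json_encode($params['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

With the official client, the same array would be passed to `$client->search($params)`. Building the request in a plain function keeps it easy to inspect and unit-test without a running cluster.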
Common Misconception
Why It Matters
Common Mistakes
- Using dynamic mapping in production: Elasticsearch infers each field's type from the first document that contains that field, which often produces the wrong type. Always define explicit mappings.
- Not handling index refresh lag: newly indexed documents are not immediately searchable (the default refresh interval is 1 second); account for this in real-time applications.
- Indexing entire database rows including sensitive fields: index only the fields needed for search and display, not passwords, tokens, or PII that search does not require.
- Using Elasticsearch as the source of truth and skipping the primary database: always write to the database first, then index asynchronously.
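The first mistake above can be avoided by creating the index with an explicit mapping before any document is written. The following is a minimal sketch; the helper name `usersIndexDefinition` and the chosen field types are assumptions, and a real schema should pick types based on how each field is queried.

```php
<?php
// Hypothetical helper: parameters for creating a "users" index with an
// explicit mapping. Field names and types are illustrative assumptions.
function usersIndexDefinition(): array
{
    return [
        'index' => 'users',
        'body'  => [
            'mappings' => [
                // 'strict' rejects documents that contain fields not declared below,
                // so an unexpected field fails loudly instead of being type-guessed.
                'dynamic'    => 'strict',
                'properties' => [
                    'name'   => ['type' => 'text'],     // analyzed for full-text search
                    'email'  => ['type' => 'keyword'],  // exact-match lookups only
                    'bio'    => ['type' => 'text'],
                    'skills' => ['type' => 'text'],
                    // By default a date field expects ISO 8601 strings or epoch milliseconds.
                    'joined' => ['type' => 'date'],
                ],
            ],
        ],
    ];
}

$definition = usersIndexDefinition();

// Print the mapping that would be sent when the index is created.
echo json_encode($definition['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

With the official client, the array would be passed to `$client->indices()->create($definition)`. Setting `dynamic` to `strict` is one option; leaving it at its default still allows new fields to be added with inferred types.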
Code Examples
// Indexing entire user row including sensitive fields
$client->index([
    'index' => 'users',
    'id'    => $user['id'],
    'body'  => $user, // includes password_hash, api_token, 2fa_secret
]);
// Index only search-relevant fields
$client->index([
    'index' => 'users',
    'id'    => $user['id'],
    'body'  => [
        'name'   => $user['name'],
        'email'  => $user['email'], // only if searching by email
        'bio'    => $user['bio'],
        'skills' => $user['skills'],
        'joined' => $user['created_at'],
    ],
]);
// multi_match query: full-text search across several fields, ranked by relevance
$results = $client->search([
    'index' => 'users',
    'body'  => [
        'query' => [
            'multi_match' => [
                'query'  => $searchTerm,
                'fields' => ['name^2', 'bio', 'skills'], // matches on name count double
            ],
        ],
    ],
]);
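The "write to the database first, then index asynchronously" advice can be sketched as follows. This is a minimal illustration under stated assumptions: `InMemoryQueue`, `saveUser`, and `buildIndexParams` are hypothetical names, and the array and queue are in-memory stand-ins for a real primary database and job queue.

```php
<?php
// In-memory stand-in (assumption) for a real job queue.
final class InMemoryQueue
{
    /** @var array<int, array> */
    private array $jobs = [];

    public function push(array $job): void
    {
        $this->jobs[] = $job;
    }

    public function pop(): ?array
    {
        return array_shift($this->jobs); // null when the queue is empty
    }
}

// Step 1: the write path touches only the database and the queue.
function saveUser(array &$database, InMemoryQueue $queue, array $user): void
{
    $database[$user['id']] = $user; // the database stays the system of record
    $queue->push(['type' => 'index_user', 'id' => $user['id']]);
}

// Step 2: a worker rebuilds the search document from the database row,
// copying only the fields that search needs.
function buildIndexParams(array $database, array $job): array
{
    $row = $database[$job['id']];

    return [
        'index' => 'users',
        'id'    => $row['id'],
        'body'  => ['name' => $row['name'], 'bio' => $row['bio']],
    ];
}

$database = [];
$queue    = new InMemoryQueue();
saveUser($database, $queue, [
    'id' => 7, 'name' => 'Ada', 'bio' => 'Engineer', 'password_hash' => 'x',
]);

$job    = $queue->pop();
$params = buildIndexParams($database, $job);

// A real worker would now call $client->index($params) and retry on failure.
echo json_encode($params), PHP_EOL;
```

Because the job carries only the record ID, the worker always indexes the current database state, and the index can be rebuilt from the database if the cluster loses data. If a caller must see its own write in search results immediately, the index request also accepts a `refresh` parameter (for example `'refresh' => 'wait_for'`), at the cost of a slower write.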