Elasticsearch
debt(d7/e7/b7/t5)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints note automated=no: tools like laravel-scout and Kibana don't flag misuse automatically. Slow SQL LIKE queries and bad mappings only surface under load or through careful profiling and review; no linter or compiler catches them.
Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix describes a non-trivial integration: adding Elasticsearch, defining mappings, setting up queue-based sync workers, and replacing DB queries with ES queries across multiple layers. This is a significant architectural addition touching models, queues, and query logic across files.
Closest to 'strong gravitational pull' (b7). Elasticsearch is a distributed external system that shapes indexing strategy, data sync, mapping decisions, and query design for the lifetime of the application. It applies to both web and CLI contexts, and every future search feature is constrained by the initial ES schema and sync architecture.
Closest to 'notable trap (a documented gotcha most devs eventually learn)' (t5). The common_mistakes entries describe several real gotchas: auto-mapping inferring the wrong types, text vs keyword field confusion, and query vs filter context performance. These are well documented but frequently trip up developers new to Elasticsearch, which matches the t5 anchor.
Also Known As
TL;DR
Explanation
Elasticsearch stores each document as JSON (the _source) and indexes its text and keyword fields in an inverted index, where each unique term maps to the list of documents containing it. The core query types are match (full-text, analysed), term (exact value), bool (combines clauses) and range (numeric/date ranges); aggregations (e.g. faceted counts) run alongside a query. Relevance scoring uses BM25 by default. PHP clients: ruflin/elastica, elastic/elasticsearch-php. Alternatives: Meilisearch (simpler setup, instant search), Typesense (typo-tolerant), OpenSearch (AWS fork). For PHP apps with moderate search needs, Meilisearch is often simpler to operate.
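To make the inverted-index idea concrete, here is a toy PHP sketch. It is not how Lucene actually stores postings, and the hypothetical tokenise() helper (lowercase, split on non-alphanumerics) is only a crude stand-in for an analyzer such as the standard analyzer. The same function is applied to the query, mirroring how a match query analyses its input before looking terms up.

```php
<?php
// Toy inverted index: illustrates the idea only; Lucene's real postings
// format is far more compact. tokenise() is a hypothetical, crude stand-in
// for an analyzer: lowercase, then split on anything not a-z or 0-9.

function tokenise(string $text): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($tokens));
}

/** Build term => sorted list of doc ids from doc id => text. */
function buildInvertedIndex(array $docs): array
{
    $index = [];
    foreach ($docs as $id => $text) {
        foreach (tokenise($text) as $term) {
            $index[$term][] = $id; // append this doc to the term's postings list
        }
    }
    foreach ($index as $term => $postings) {
        sort($postings);
        $index[$term] = $postings;
    }
    ksort($index);
    return $index;
}

/** Doc ids containing every query term (AND semantics, for simplicity). */
function searchAll(array $index, string $query): array
{
    $result = null;
    foreach (tokenise($query) as $term) {
        $postings = $index[$term] ?? [];
        $result = $result === null
            ? $postings
            : array_values(array_intersect($result, $postings));
    }
    return $result ?? [];
}

$docs = [
    1 => 'Red running shoes',
    2 => 'Blue running jacket',
    3 => 'Red rain jacket',
];
$index = buildInvertedIndex($docs);
echo json_encode($index['running']), PHP_EOL;              // [1,2]
echo json_encode(searchAll($index, 'red jacket')), PHP_EOL; // [3]
```

A search only intersects short postings lists instead of scanning every document. Note that searchAll() uses AND for simplicity; a real match query defaults to OR and ranks the union with BM25.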
Diagram
flowchart TD
DOC[Document - JSON] --> INDEX[(Index<br/>like a DB table)]
INDEX --> SHARDS[Shards<br/>split for scale]
SHARDS --> REPLICAS[Replica shards<br/>for redundancy]
subgraph Query Flow
SEARCH[Search request] --> COORD[Coordinating node]
COORD --> ALL[Query one copy of each shard in parallel]
ALL --> MERGE[Merge and rank results]
MERGE --> TOP[Return top N hits]
end
subgraph Indexing
PHP_DOC[PHP sends doc] --> INVERTED[Build inverted index<br/>token to doc list]
end
style INDEX fill:#6e40c9,color:#fff
style COORD fill:#1f6feb,color:#fff
style TOP fill:#238636,color:#fff
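The diagram's routing and scatter/gather steps can be sketched in plain PHP under simplifying assumptions: shardFor() uses crc32 as a stand-in for the murmur3 hash Elasticsearch applies to the routing value (the document _id by default), and scores are supplied rather than computed. The merge only yields a correct global top N if every shard returns its own top N (from + size hits, in fact), which is why deep pagination gets expensive. Both function names are hypothetical.

```php
<?php
// Simplified sketch of the diagram. Assumptions: crc32() stands in for the
// murmur3 routing hash (64-bit PHP, where crc32() is never negative), and
// scores are given rather than computed with BM25.

/** Choose a primary shard for a routing value (the document _id by default). */
function shardFor(string $routing, int $numPrimaryShards): int
{
    return crc32($routing) % $numPrimaryShards;
}

/**
 * Coordinating-node merge: each shard sends its own top $size hits;
 * the coordinator sorts the union by score and keeps the global top $size.
 *
 * @param array<int, list<array{id: string, score: float}>> $shardHits
 */
function mergeTopHits(array $shardHits, int $size): array
{
    $all = $shardHits === [] ? [] : array_merge(...array_values($shardHits));
    usort($all, fn(array $a, array $b): int => $b['score'] <=> $a['score']);
    return array_slice($all, 0, $size);
}

$shardHits = [
    0 => [['id' => 'a', 'score' => 3.2], ['id' => 'b', 'score' => 1.1]],
    1 => [['id' => 'c', 'score' => 2.7], ['id' => 'd', 'score' => 0.4]],
];
echo implode(',', array_column(mergeTopHits($shardHits, 2), 'id')), PHP_EOL; // a,c
echo shardFor('product-42', 3), PHP_EOL; // same shard every time for this id
```

Routing is deterministic, so a document can be fetched by id from a single shard, while a search has to fan out to all of them.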
Common Misconception
Why It Matters
Common Mistakes
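- Exact-value fields (IDs, SKUs, status codes, tags) not mapped as 'keyword': text fields are analysed and tokenised, so term queries on them often miss, and sorting or aggregating on them fails by default. Exact values need the keyword type.
- Storing large binary or irrelevant data in Elasticsearch: only index what needs to be searched, and keep everything else in the primary database.
- Not defining mappings before indexing: dynamic mapping infers each field's type from the first document it sees, so numbers sent as JSON strings become text. A field's type cannot be changed afterwards without reindexing.
- Putting exact-match conditions in query context instead of filter context: filter clauses skip relevance scoring and can be cached, so they are usually faster for yes/no conditions.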
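A search body that follows the filter-context advice might look like this, built as a plain PHP array. It assumes the explicit mapping in the code examples (name as text, price as float, in_stock as boolean, tags as keyword), and productSearchBody() is a hypothetical helper. Only the must clause contributes to the relevance score; the filter clauses are yes/no checks that Elasticsearch can cache.

```php
<?php
// Hypothetical helper: full-text part is scored, exact conditions are filters.
// Assumes name: text, price: float, in_stock: boolean, tags: keyword.

function productSearchBody(string $text, float $minPrice, float $maxPrice, string $tag): array
{
    return [
        'query' => [
            'bool' => [
                'must' => [
                    ['match' => ['name' => $text]],    // analysed, scored with BM25
                ],
                'filter' => [
                    ['term' => ['in_stock' => true]],  // exact boolean match, not scored
                    ['term' => ['tags' => $tag]],      // keyword field: exact value
                    ['range' => ['price' => ['gte' => $minPrice, 'lte' => $maxPrice]]],
                ],
            ],
        ],
    ];
}

$body = productSearchBody('running shoes', 10.0, 50.0, 'sale');
echo json_encode($body), PHP_EOL;
// With elastic/elasticsearch-php the array is passed as the request body:
// $client->search(['index' => 'products', 'body' => $body]);
```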
Code Examples
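// Assumes elastic/elasticsearch-php 8.x (the namespace differs in 7.x); the host is an example.
use Elastic\Elasticsearch\ClientBuilder;
$client = ClientBuilder::create()->setHosts(['http://localhost:9200'])->build();
// Auto-mapping leads to wrong type inferences:
$client->index(['index' => 'products', 'body' => [
    'price' => '19.99',     // JSON string: dynamically mapped as text (+ .keyword), not float
    'in_stock' => 'true',   // JSON string: mapped as text, not boolean
    'description' => 'Great product',
]]);
// Range query {"range":{"price":{"gte":10}}} then compares strings
// lexicographically ("9.99" >= "10"), returning wrong hits with no error.
// Fix: create the index with explicit mappings before indexing anything
// (a mapped field's type cannot be changed later without reindexing):
$client->indices()->create(['index' => 'products', 'body' => [
    'mappings' => [
        'properties' => [
            'name' => [
                'type' => 'text',                                 // analysed for full-text search
                'fields' => ['keyword' => ['type' => 'keyword']], // name.keyword: exact match/aggregations
            ],
            'price' => ['type' => 'float'],                       // numeric range queries
            'in_stock' => ['type' => 'boolean'],
            'description' => ['type' => 'text'],
            'tags' => ['type' => 'keyword'],
        ],
    ],
]]);
// ...and send native JSON types rather than strings:
$client->index(['index' => 'products', 'body' => [
    'name' => 'Widget', 'price' => 19.99, 'in_stock' => true,
    'description' => 'Great product', 'tags' => ['sale'],
]]);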
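Faceted counts, the aggregations mentioned in the Explanation, depend on the same mapping choices: a terms aggregation needs a keyword field such as tags, and range buckets need a numeric field such as price. The sketch below is a hypothetical facetBody() helper that turns ascending price edges into range buckets; in a range aggregation "from" is inclusive and "to" is exclusive, and 'size' => 0 asks for buckets only, without hits.

```php
<?php
// Hypothetical helper: builds an aggregation-only search body for facets.
// Assumes the explicit mapping above (tags: keyword, price: float).

/** @param list<float|int> $priceEdges ascending bucket boundaries, e.g. [20, 50] */
function facetBody(array $priceEdges): array
{
    $ranges = [];
    $prev = null;
    foreach ($priceEdges as $edge) {
        // "from" is inclusive, "to" is exclusive in a range aggregation
        $ranges[] = $prev === null ? ['to' => $edge] : ['from' => $prev, 'to' => $edge];
        $prev = $edge;
    }
    if ($prev !== null) {
        $ranges[] = ['from' => $prev]; // open-ended top bucket
    }
    return [
        'size' => 0, // buckets only, no hits
        'aggs' => [
            'by_tag' => ['terms' => ['field' => 'tags', 'size' => 10]],
            'by_price' => ['range' => ['field' => 'price', 'ranges' => $ranges]],
        ],
    ];
}

echo json_encode(facetBody([20, 50])), PHP_EOL;
// e.g. $client->search(['index' => 'products', 'body' => facetBody([20, 50])]);
```

Had price been auto-mapped as text, as in the first snippet, the range buckets would not behave as numeric price bands.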