Elasticsearch Fundamentals
Also Known As
TL;DR
Explanation
Elasticsearch stores data as JSON documents in indices, which are loosely analogous to database tables. Each index is split into primary shards that are distributed across nodes, and each primary shard can have one or more replica shards for fault tolerance. Fields are indexed by default, with types either inferred from incoming documents (dynamic mapping) or defined up front in an explicit mapping. Queries are expressed in JSON using the Query DSL: match queries for full-text search, term queries for exact values, bool queries for combining conditions, and aggregations for analytics. The two most common PHP integration patterns are a synchronous index write on every database mutation (simple, but it adds latency to each request) and an asynchronous, queue-based indexing pipeline (more reliable at scale). The official PHP client is elasticsearch/elasticsearch; a simpler alternative is ruflin/elastica.
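To make the Query DSL building blocks above concrete, here is a minimal sketch of a request body that combines a match query, a term query, and a terms aggregation inside a bool query. The function name `buildUserSearchBody` and the field names `bio` and `skills` are assumptions for illustration, and the term filter assumes `skills` is mapped as a keyword field.

```php
<?php
declare(strict_types=1);

/**
 * Build a search body that combines full-text and exact-value conditions.
 * Hypothetical helper; field names are illustrative assumptions.
 */
function buildUserSearchBody(string $text, string $skill, int $size = 10): array
{
    return [
        'size'  => $size,
        'query' => [
            'bool' => [
                // must: scored full-text match on an analyzed text field
                'must' => [
                    ['match' => ['bio' => $text]],
                ],
                // filter: exact, unscored match (assumes a keyword field)
                'filter' => [
                    ['term' => ['skills' => $skill]],
                ],
            ],
        ],
        // aggregation: bucket the matching users by skill
        'aggs' => [
            'top_skills' => [
                'terms' => ['field' => 'skills', 'size' => 5],
            ],
        ],
    ];
}

$body = buildUserSearchBody('distributed systems', 'php');
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

With the official client, a body like this would typically be sent as `$client->search(['index' => 'users', 'body' => $body])`. Putting the exact-value condition under `filter` rather than `must` keeps it out of relevance scoring.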
Common Misconception
Why It Matters
Common Mistakes
- Relying on dynamic mapping in production: Elasticsearch infers each field's type from the first document that contains that field, which often produces the wrong type. Define explicit mappings instead.
- Not handling index refresh lag: newly indexed documents are not searchable until the next refresh (every 1 second by default); account for this wherever a write must be visible to a search immediately afterwards.
- Indexing entire database rows, including sensitive fields: index only the fields needed for search and display, not passwords, tokens, or PII.
- Using Elasticsearch as the source of truth and skipping the primary database: always write to the database first, then index asynchronously.
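The first two mistakes above can be addressed in the request parameters themselves. The sketch below builds an index-creation request with an explicit mapping and an index request that optionally waits for a refresh. The helper names `buildUsersIndexParams` and `buildIndexParams` are hypothetical, the field types are assumptions about how the `users` data would be searched, and the `date` type assumes `joined` arrives in ISO 8601 format.

```php
<?php
declare(strict_types=1);

/** Parameters for creating the index with an explicit mapping (assumed field types). */
function buildUsersIndexParams(string $index = 'users'): array
{
    return [
        'index' => $index,
        'body'  => [
            'mappings' => [
                // strict: reject documents that contain unmapped fields
                'dynamic'    => 'strict',
                'properties' => [
                    'name'   => ['type' => 'text'],
                    'email'  => ['type' => 'keyword'], // exact match only
                    'bio'    => ['type' => 'text'],
                    'skills' => ['type' => 'keyword'],
                    'joined' => ['type' => 'date'],    // assumes ISO 8601 input
                ],
            ],
        ],
    ];
}

/** Parameters for indexing one document, optionally blocking until it is searchable. */
function buildIndexParams(string $index, string $id, array $doc, bool $waitForRefresh = false): array
{
    $params = ['index' => $index, 'id' => $id, 'body' => $doc];
    if ($waitForRefresh) {
        // wait_for: the call returns only after the next refresh has made the document visible
        $params['refresh'] = 'wait_for';
    }
    return $params;
}

$createParams = buildUsersIndexParams();
$indexParams  = buildIndexParams('users', '42', ['name' => 'Ada'], true);
```

With the official client these arrays would typically be passed to `$client->indices()->create($createParams)` and `$client->index($indexParams)`. Waiting for a refresh slows the write, so it is probably best reserved for flows where the user searches right after saving.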
Code Examples
// Bad: indexes the entire user row, including sensitive fields.
// $client is assumed to be an already-configured elasticsearch/elasticsearch client.
$client->index([
    'index' => 'users',
    'id'    => $user['id'],
    'body'  => $user, // includes password_hash, api_token, 2fa_secret
]);
// Good: index only search-relevant fields
$client->index([
    'index' => 'users',
    'id'    => $user['id'],
    'body'  => [
        'name'   => $user['name'],
        'email'  => $user['email'], // only if searching by email
        'bio'    => $user['bio'],
        'skills' => $user['skills'],
        'joined' => $user['created_at'],
    ],
]);
// Multi-match query: full-text search across several fields, ranked by relevance
$results = $client->search([
    'index' => 'users',
    'body'  => [
        'query' => [
            'multi_match' => [
                'query'  => $searchTerm,
                'fields' => ['name^2', 'bio', 'skills'], // matches in name are boosted 2x
            ],
        ],
    ],
]);
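
The asynchronous, queue-based pipeline mentioned under Explanation can be sketched as a worker that collects changed rows and sends them in one bulk request. The example below only builds the bulk body; `toSearchDocument`, `buildBulkBody`, and the sample rows are hypothetical, and the queue itself is stood in for by a plain array.

```php
<?php
declare(strict_types=1);

/** Keep only the fields that are safe and useful to index (assumed whitelist). */
function toSearchDocument(array $user): array
{
    $allowed = ['name', 'email', 'bio', 'skills'];
    return array_intersect_key($user, array_flip($allowed));
}

/**
 * Turn queued user rows into the alternating action/document list
 * that the bulk API expects.
 */
function buildBulkBody(array $users, string $index = 'users'): array
{
    $body = [];
    foreach ($users as $user) {
        // action line: which index and document ID the next entry targets
        $body[] = ['index' => ['_index' => $index, '_id' => (string) $user['id']]];
        // document line: the whitelisted fields only
        $body[] = toSearchDocument($user);
    }
    return $body;
}

// Rows a queue worker might have pulled after the database writes committed.
$queued = [
    ['id' => 1, 'name' => 'Ada', 'bio' => 'Compilers', 'skills' => ['php'], 'password_hash' => 'x'],
    ['id' => 2, 'name' => 'Lin', 'bio' => 'Search', 'skills' => ['go'], 'api_token' => 'y'],
];

$bulkBody = buildBulkBody($queued);
echo count($bulkBody), PHP_EOL; // prints 4 (one action line and one document per row)
```

With the official client the result would typically be sent as `$client->bulk(['body' => $bulkBody])`. A worker should also inspect the `errors` flag in the response and retry or log the failed items, since a bulk request can succeed overall while individual documents fail.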