Search Indexing Pipeline
debt(d7/e5/b5/t5)
Closest to 'only careful code review or runtime testing' (d7), because sync indexing in request paths shows up via Horizon/Datadog latency traces but isn't flagged by static tools; detection_hints.automated is 'no'.
Closest to 'touches multiple files / significant refactor in one component' (e5), since the quick_fix requires moving indexing into queue jobs wired to model events — a refactor across model observers, job classes, and indexer service, not a one-liner.
Closest to 'persistent productivity tax' (b5), as the pipeline applies across web and queue-worker contexts and every searchable model must integrate with it; analyser/synonym choices shape ongoing search work.
Closest to 'notable trap (a documented gotcha most devs eventually learn)' (t5), matching the misconception that the index is just a DB copy — devs eventually learn about stemming, analysers, and async indexing after hitting the gotchas.
Also Known As
TL;DR
Explanation
A search indexing pipeline has five stages: (1) Extract: fetch content from the DB, files, or an API; (2) Normalise: lowercase, strip HTML, remove special characters; (3) Tokenise: split the text into terms; (4) Analyse: apply stemming (runs→run, running→run), synonyms, and stop-word removal; (5) Index: write the inverted index (term → list of document IDs with positions). Incremental indexing handles updates through updated_at timestamp queries, change data capture (CDC) from the DB, or event-driven indexing via domain events. A full re-index is still needed when the index schema or analyser configuration changes, because existing entries were built under the old rules.
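The stages above can be sketched in a few lines of PHP. This is a minimal illustration, assuming PHP 8+ and an in-memory array as the index. The function names (normalise, tokenise, stem, analyse, buildIndex) are hypothetical, and the stemmer is a toy suffix stripper, not a real algorithm such as Porter.

```php
<?php
declare(strict_types=1);

// Hypothetical stop-word list; real analysers ship much longer ones.
const STOP_WORDS = ['the', 'a', 'an', 'is', 'of', 'and', 'in'];

// Normalise: strip HTML, lowercase, replace special characters with spaces.
function normalise(string $raw): string {
    // Pad tags with a space first, because strip_tags() would otherwise
    // glue adjacent elements together ("<p>a</p><p>b</p>" becomes "ab").
    $text = strip_tags(str_replace('<', ' <', $raw));
    $text = strtolower($text); // ASCII-only lowercasing, enough for a sketch
    return preg_replace('/[^\p{L}\p{N}\s]+/u', ' ', $text) ?? '';
}

// Tokenise: split on whitespace.
function tokenise(string $text): array {
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

// Toy stemmer (assumption): handles only "-ing" and plural "-s".
function stem(string $token): string {
    if (strlen($token) > 5 && str_ends_with($token, 'ing')) {
        $token = substr($token, 0, -3);            // running -> runn
        if (strlen($token) > 2 && $token[-1] === $token[-2]) {
            $token = substr($token, 0, -1);        // runn -> run
        }
        return $token;
    }
    if (strlen($token) > 3 && str_ends_with($token, 's') && !str_ends_with($token, 'ss')) {
        return substr($token, 0, -1);              // runs -> run
    }
    return $token;
}

// Analyse: drop stop words and stem, keeping each term's original position.
function analyse(array $tokens): array {
    $terms = [];
    foreach ($tokens as $position => $token) {
        if (in_array($token, STOP_WORDS, true)) {
            continue;
        }
        $terms[$position] = stem($token);
    }
    return $terms;
}

// Index: build term => [docId => [positions]].
function buildIndex(array $documents): array {
    $index = [];
    foreach ($documents as $docId => $raw) {
        foreach (analyse(tokenise(normalise($raw))) as $position => $term) {
            $index[$term][$docId][] = $position;
        }
    }
    return $index;
}

$index = buildIndex([
    1 => '<strong>PHP</strong> is running',
    2 => 'PHP runs the queue',
]);
// $index['run'] is [1 => [2], 2 => [1]]: both documents match a query for "run".
```

The same normalise, tokenise, and analyse steps would need to run on the search query as well. Otherwise a query for "running" would not find the stored term "run".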
Common Misconception
Why It Matters
Common Mistakes
- Synchronous re-indexing on every write — index updates should be async via queue.
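- No incremental indexing: a full re-index on every change costs O(n) in the total number of documents, regardless of how small the change was.
- Indexing HTML without stripping tags: depending on the tokeniser, '<strong>PHP</strong>' may not match 'PHP', and tag names such as 'strong' end up in the index as terms.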
- No synonym configuration — 'oauth' and 'open authorisation' treated as unrelated terms.
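The tag-stripping and synonym mistakes can both be handled in one preparation step before tokenising. The sketch below assumes a hand-written synonym map and a hypothetical helper named prepareForIndex. Real search engines usually provide synonym filters of their own, so treat this as an illustration of the idea rather than a recommended implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical synonym map: every variant collapses to one canonical term.
const SYNONYMS = [
    'open authorisation' => 'oauth',
    'open authorization' => 'oauth',
];

function prepareForIndex(string $html): string {
    // Pad tags with a space so strip_tags() does not glue words together.
    $text = strtolower(strip_tags(str_replace('<', ' <', $html)));
    // Collapse whitespace runs so multi-word synonyms can match.
    $text = trim(preg_replace('/\s+/', ' ', $text) ?? '');
    // strtr() with an array replaces the longest match first and never
    // re-scans its own output. It matches substrings, not whole words.
    return strtr($text, SYNONYMS);
}

$prepared = prepareForIndex('<strong>Open Authorisation</strong> explained');
// $prepared is 'oauth explained'
```

Running the query text through the same function keeps both sides consistent, so a search for 'Open Authorization' and a search for 'OAuth' resolve to the same term.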
Code Examples
// Synchronous full re-index on every term save:
public function save(Term $term): void {
    $this->db->save($term);
    // Blocks the request: re-indexes ALL 800 terms on every save.
    foreach ($this->db->findAll() as $t) {
        $this->searchEngine->index($t); // O(n) on every save!
    }
}
// Async incremental indexing via queue:
public function save(Term $term): void {
    $this->db->save($term);
    // Queue just this term for indexing:
    $this->queue->dispatch(new IndexTermJob($term->slug));
    // Response returns immediately; indexing happens in the background.
}
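// Queue worker:
class IndexTermJob {
    // The job carries only the slug, so the worker always reads fresh data.
    public function __construct(private string $slug) {}

    // TermRepository is a hypothetical name for the type behind $this->db in save().
    public function handle(TermRepository $db, SearchIndex $index): void {
        $term = $db->find($this->slug);
        if ($term === null) {
            return; // Term was deleted before the job ran; nothing to index.
        }
        $index->upsert($term->slug, [
            'term' => $term->term,
            'body' => strip_tags($term->long),
            'category' => $term->category,
        ]);
    }
}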
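// Incremental indexing via updated_at:

The explanation also mentions updated_at timestamp queries as a way to index incrementally. A minimal sketch of that idea follows, using an in-memory array in place of a database table. The function name changedSince and the row shape are assumptions. In practice this would be a query along the lines of WHERE updated_at > :watermark.

```php
<?php
declare(strict_types=1);

/**
 * Return the slugs of rows modified after the last successful index run.
 *
 * @param array<int, array{slug: string, updated_at: int}> $rows
 * @return string[]
 */
function changedSince(array $rows, int $lastIndexedAt): array {
    $changed = array_filter(
        $rows,
        fn(array $row): bool => $row['updated_at'] > $lastIndexedAt
    );
    return array_values(array_column($changed, 'slug'));
}

// Hypothetical rows; updated_at is a Unix timestamp.
$rows = [
    ['slug' => 'oauth',    'updated_at' => 100],
    ['slug' => 'cdc',      'updated_at' => 250],
    ['slug' => 'stemming', 'updated_at' => 300],
];

$toIndex = changedSince($rows, 200);
// $toIndex is ['cdc', 'stemming']: only these would be queued for indexing.
```

One limitation of this approach is that hard-deleted rows never appear in an updated_at query, so deletions need a separate signal such as a soft-delete flag, CDC, or a domain event.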