Semantic Search
Also Known As
vector search
neural search
dense retrieval
embedding search
TL;DR
Search that matches by meaning and intent rather than exact keywords — a query for 'how to prevent database attacks' finds SQL injection documentation even if those exact words never appear.
Explanation
Semantic search works by converting both queries and documents into embeddings — dense vector representations that encode meaning. Documents semantically similar to the query cluster nearby in vector space. This contrasts with keyword search (BM25, TF-IDF), which requires lexical overlap and fails on synonyms, paraphrases, and conceptual matches. In practice, production search often combines both: vector similarity for semantic recall and BM25 for lexical precision, a pattern called hybrid search. Building semantic search in PHP requires an embedding model (called via an API or run locally), a vector store, and a query pipeline — the query is embedded at search time and the nearest vectors are returned.
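To make "cluster nearby in vector space" concrete, here is a minimal sketch of cosine similarity, the metric most embedding models are trained for. The toy 3-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions, and production systems delegate this computation to the vector store's index.

```php
<?php
// Cosine similarity: dot product of two vectors divided by the product
// of their magnitudes. 1.0 means identical direction, 0.0 unrelated.
function cosineSimilarity(array $a, array $b): float {
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $v) {
        $dot   += $v * $b[$i];
        $normA += $v * $v;
        $normB += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

// Toy 3-dimensional "embeddings"
$query = [0.9, 0.1, 0.0];
$docA  = [0.8, 0.2, 0.1]; // points in nearly the same direction as the query
$docB  = [0.0, 0.1, 0.9]; // points elsewhere

$closer = cosineSimilarity($query, $docA) > cosineSimilarity($query, $docB); // true
```

Nearest-neighbour search is then just "return the documents whose stored vectors score highest against the query vector"; indexes such as HNSW make that fast at scale.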
Common Misconception
✗ Semantic search makes keyword search obsolete. Hybrid search — combining vector similarity with BM25 keyword matching — consistently outperforms either approach alone. Semantic search excels at conceptual queries and handles synonyms well; keyword search excels at exact product codes, names, and rare terms. Production systems use both with a reranker to combine scores.
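One simple way to combine the two result lists, sketched here rather than a full production reranker, is Reciprocal Rank Fusion (RRF): each document earns 1/(k + rank) from every list it appears in, and the sums are sorted. The document IDs and k = 60 below are illustrative defaults, not values from any particular system.

```php
<?php
// Fuse a BM25 ranking and a vector-similarity ranking with RRF.
// Inputs are ranked lists of document IDs, best first.
function rrf(array $keywordRanked, array $vectorRanked, int $k = 60): array {
    $scores = [];
    foreach ([$keywordRanked, $vectorRanked] as $ranking) {
        foreach ($ranking as $rank => $docId) {
            // $rank is 0-based, so rank + 1 is the 1-based position
            $scores[$docId] = ($scores[$docId] ?? 0.0) + 1.0 / ($k + $rank + 1);
        }
    }
    arsort($scores); // highest fused score first, keys preserved
    return array_keys($scores);
}

// 'doc2' is mid-ranked by both retrievers, so fusion promotes it above
// documents that only one retriever surfaced.
$fused = rrf(['doc1', 'doc2', 'doc3'], ['doc4', 'doc2', 'doc5']);
// $fused[0] === 'doc2'
```

RRF needs only ranks, not raw scores, which is why it is popular for fusing BM25 (unbounded scores) with cosine similarity (bounded in [-1, 1]) without any score normalisation.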
Why It Matters
Semantic search transforms search from a string-matching problem into a meaning-matching problem. Users who type 'forgot my login' find password reset documentation even if the word 'forgot' never appears in the docs. For PHP applications serving content-heavy sites, knowledge bases, or e-commerce catalogues, semantic search dramatically reduces zero-result searches and improves relevance without requiring users to guess exact keywords.
Common Mistakes
- Using a general-purpose embedding model for domain-specific search — a model fine-tuned on code search produces significantly better results for programming queries than a general text embedding model.
- Not chunking documents before embedding — embedding a 10,000-word document produces a single vector that averages across all its topics, reducing precision.
- Ignoring metadata filtering — most searches combine semantic similarity with structured filters (category, date, author) and vector databases support these efficiently.
- Evaluating only by user satisfaction — measure retrieval quality with recall@K and MRR before optimising generation quality.
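The last point can be made concrete with a sketch of both metrics over a tiny hand-labelled evaluation set; the query IDs and relevance judgements below are invented for illustration.

```php
<?php
// recall@K: fraction of the relevant documents that appear in the top K results.
function recallAtK(array $results, array $relevant, int $k): float {
    $hits = count(array_intersect(array_slice($results, 0, $k), $relevant));
    return $hits / count($relevant);
}

// MRR: average over queries of 1 / (rank of the first relevant result).
function mrr(array $resultsPerQuery, array $relevantPerQuery): float {
    $sum = 0.0;
    foreach ($resultsPerQuery as $q => $results) {
        foreach ($results as $rank => $docId) {
            if (in_array($docId, $relevantPerQuery[$q], true)) {
                $sum += 1.0 / ($rank + 1); // first relevant hit only
                break;
            }
        }
    }
    return $sum / count($resultsPerQuery);
}

$results  = ['q1' => ['d3', 'd1', 'd9'], 'q2' => ['d7', 'd2', 'd4']];
$relevant = ['q1' => ['d1'],             'q2' => ['d7', 'd4']];

$r = recallAtK($results['q1'], $relevant['q1'], 3); // 1.0: d1 is in the top 3
$m = mrr($results, $relevant);                      // (1/2 + 1/1) / 2 = 0.75
```

Running these over a fixed query set before and after a change (new embedding model, different chunk size) tells you whether retrieval actually improved, independently of whatever consumes the results downstream.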
Code Examples
✗ Vulnerable
// ❌ Keyword matching instead of semantic search — misses synonyms/intent
function search(string $query, PDO $db): array {
    $stmt = $db->prepare(
        "SELECT * FROM articles WHERE content LIKE :q"
    );
    $stmt->execute([':q' => "%$query%"]);
    return $stmt->fetchAll();
    // "prevent data breach" won't match "SQL injection", "XSS", "auth bypass"
}
✓ Fixed
// ✅ Semantic search with pgvector — matches by meaning
// 1. At ingest: embed and store
$embedding = $embedder->embed($article['content']); // e.g. OpenAI, Cohere, Voyage
$pdo->prepare("
    INSERT INTO articles (title, content, embedding)
    VALUES (:title, :content, :embedding)
")->execute([
    ':title'     => $article['title'],
    ':content'   => $article['content'],
    ':embedding' => json_encode($embedding), // pgvector accepts '[0.1, 0.2, ...]' text input
]);

// 2. At search: embed the query and find nearest vectors
$queryEmbedding = $embedder->embed($userQuery);
$stmt = $pdo->prepare("
    SELECT title, content,
           1 - (embedding <=> CAST(:q1 AS vector)) AS similarity
    FROM articles
    ORDER BY embedding <=> CAST(:q2 AS vector)
    LIMIT 10
"); // two placeholders: PDO forbids reusing a named parameter unless emulation is on
$stmt->execute([
    ':q1' => json_encode($queryEmbedding),
    ':q2' => json_encode($queryEmbedding),
]);
// Now 'car' matches 'automobile', 'vehicle', 'motor transport'
Added
23 Mar 2026
Tools & Severity
🔵 Info
⚙ Fix effort: High
⚡ Quick Fix
Embed queries and documents with the same model, store in pgvector, query with SELECT ... ORDER BY embedding <=> $query_vector LIMIT 10