Vector Database
Also Known As
vector store
embedding database
ANN database
similarity search database
TL;DR
A database optimised for storing and querying high-dimensional vector embeddings, enabling similarity search — finding items semantically close to a query rather than exact-match lookups.
Explanation
Vector databases store embeddings (fixed-length arrays of floating-point numbers that encode semantic meaning) and provide approximate nearest neighbour (ANN) search, which can query millions of vectors in milliseconds by trading a small amount of recall for speed. Unlike a SQL query that matches exact values, a vector search returns the N stored vectors closest to a query embedding under a distance metric such as cosine similarity or Euclidean distance. Common implementations include Pinecone (managed cloud), pgvector (PostgreSQL extension), Qdrant, Weaviate, and Chroma. In PHP applications, standalone vector databases are typically accessed via HTTP APIs, while pgvector is queried with ordinary SQL through PDO. Either way, you obtain an embedding for the text from an embedding model (run locally or called via an API), then store or query the resulting vector.
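To make the two distance metrics concrete, here is a minimal sketch in plain PHP. The function names and the two-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from an embedding model.

```php
<?php
/**
 * Cosine similarity: 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite.
 * Both arguments are assumed to be lists of numbers with keys 0..n-1.
 */
function cosineSimilarity(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and the same length.');
    }
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine similarity is undefined for a zero vector.');
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/** Euclidean (straight-line) distance: 0.0 = identical, larger = further apart. */
function euclideanDistance(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and the same length.');
    }
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $diff = $value - $b[$i];
        $sum += $diff * $diff;
    }
    return sqrt($sum);
}

// Toy vectors pointing in the same direction with different magnitudes.
$short = [1.0, 0.0];
$long  = [3.0, 0.0];

$cos  = cosineSimilarity($short, $long);  // 1.0: direction is identical
$dist = euclideanDistance($short, $long); // 2.0: magnitudes differ
```

The same pair of vectors is a perfect match under cosine similarity but two units apart under Euclidean distance, which is why the choice of metric changes the ranking.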
Common Misconception
✗ A vector database replaces your regular database for AI features. Vector databases complement SQL/NoSQL databases — they handle similarity search while relational databases handle structured queries, transactions, and joins. Most production systems use both: metadata filtering in SQL, semantic ranking in the vector store.
Why It Matters
Vector databases make semantic search practical: they find documents by meaning rather than by keyword. Without one, a RAG pipeline has to load every embedding into memory and compute distances in PHP on each query, which is slow and does not scale past small collections. With pgvector you can add vector search to an existing PostgreSQL database by enabling one extension, making it the lowest-friction entry point for most PHP applications.
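For comparison, the in-memory approach described above might look like the following sketch. The function name, document IDs, and three-dimensional vectors are made up for illustration, and the code assumes every vector is non-zero and has the same length as the query.

```php
<?php
/**
 * Brute-force top-K search: score every stored vector against the query.
 * Cost grows with (number of documents x dimensions) on every single query.
 *
 * $documents maps a document id to its embedding (a list of floats).
 * Returns id => cosine similarity, best match first.
 */
function bruteForceTopK(array $query, array $documents, int $k): array
{
    $scores = [];
    foreach ($documents as $id => $embedding) {
        $dot = 0.0;
        $normQuery = 0.0;
        $normDoc = 0.0;
        foreach ($query as $i => $q) {
            $dot       += $q * $embedding[$i];
            $normQuery += $q * $q;
            $normDoc   += $embedding[$i] * $embedding[$i];
        }
        $scores[$id] = $dot / (sqrt($normQuery) * sqrt($normDoc));
    }
    arsort($scores);                          // highest similarity first, keys preserved
    return array_slice($scores, 0, $k, true); // keep only the best $k
}

// Toy data, not real embeddings.
$documents = [
    'car'        => [0.9, 0.1, 0.0],
    'automobile' => [0.8, 0.2, 0.1],
    'banana'     => [0.0, 0.1, 0.9],
];

$top = bruteForceTopK([1.0, 0.0, 0.0], $documents, 2);
// array_keys($top) is ['car', 'automobile']
```

Every query touches every stored vector, so latency rises linearly with the collection size. An ANN index avoids that full scan at the cost of occasionally missing a true nearest neighbour.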
Common Mistakes
- Using Euclidean distance when cosine similarity is appropriate: for text embeddings, cosine similarity is usually the safer default because it compares direction and ignores magnitude. On unit-length vectors the two metrics produce the same ranking.
- Not normalising vectors before storage when using dot product similarity: the dot product grows with vector magnitude, so a long vector can outrank a closer match unless every vector is scaled to unit length (or the embedding model was trained for raw dot-product scoring).
- Storing the full document text in the vector database: keep metadata in your relational database and store only the chunk text, its embedding, and an ID that links back to the source row in the vector store.
- Choosing a managed cloud vector database before validating the use case: pgvector on your existing PostgreSQL instance handles millions of vectors adequately for most PHP applications.
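The normalisation mistake is easy to reproduce. The sketch below uses hypothetical helper names and toy two-dimensional vectors to show a raw dot product ranking a longer, less aligned vector first, and the ranking correcting itself once both vectors are scaled to unit length.

```php
<?php
/** Scale a vector to unit (L2) length so that dot product equals cosine similarity. */
function l2Normalise(array $vector): array
{
    $sumSquares = 0.0;
    foreach ($vector as $value) {
        $sumSquares += $value * $value;
    }
    $norm = sqrt($sumSquares);
    if ($norm == 0.0) {
        throw new InvalidArgumentException('Cannot normalise a zero vector.');
    }
    return array_map(static fn ($value): float => $value / $norm, $vector);
}

/** Plain dot product of two equal-length lists of numbers. */
function dotProduct(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += $value * $b[$i];
    }
    return $sum;
}

$query   = [1.0, 0.0];
$aligned = [2.0, 0.0]; // same direction as the query
$offAxis = [6.0, 8.0]; // different direction, but much larger magnitude

// Raw dot product: the off-axis vector wins purely because it is longer.
$rawAligned = dotProduct($query, $aligned); // 2.0
$rawOffAxis = dotProduct($query, $offAxis); // 6.0

// After normalisation the aligned vector correctly ranks first.
$normAligned = dotProduct(l2Normalise($query), l2Normalise($aligned)); // 1.0
$normOffAxis = dotProduct(l2Normalise($query), l2Normalise($offAxis)); // 0.6
```

Normalising once at ingest (and once per query) is cheap, and it lets the faster dot-product operator stand in for cosine similarity.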
Code Examples
✗ Vulnerable
// ❌ Storing raw text and doing LIKE search instead of vector similarity
function findSimilar(string $query, PDO $db): array {
    $stmt = $db->prepare(
        "SELECT * FROM documents WHERE content LIKE :q ORDER BY id LIMIT 10"
    );
    $stmt->execute([':q' => "%$query%"]);
    // Zero semantic understanding — "car" won't match "automobile" or "vehicle"
    return $stmt->fetchAll();
}
✓ Fixed
// ✅ pgvector — vector similarity search on existing PostgreSQL
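// Setup (once):
//   CREATE EXTENSION IF NOT EXISTS vector;
//   ALTER TABLE documents ADD COLUMN embedding vector(1536);
//   -- Build the IVFFlat index after loading data: it clusters the rows that already exist.
//   CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

// Store embedding at ingest
$embedding = $embedder->embed($document['content']); // 1536-dim float array
$stmt = $pdo->prepare(
    'INSERT INTO documents (content, embedding) VALUES (:content, :embedding)'
);
$stmt->execute([
    ':content'   => $document['content'],
    ':embedding' => '[' . implode(',', $embedding) . ']', // pgvector text format
]);

// Similarity search: cosine distance, top 10 results.
// <=> is pgvector's cosine distance operator, so similarity = 1 - distance.
// The placeholder appears only once because PDO does not guarantee that a
// repeated named placeholder works when emulated prepares are off.
$queryEmbedding = $embedder->embed($userQuery);
$stmt = $pdo->prepare('
    SELECT content, 1 - distance AS similarity
    FROM (
        SELECT content, embedding <=> CAST(:q AS vector) AS distance
        FROM documents
        ORDER BY distance
        LIMIT 10
    ) AS nearest
    ORDER BY distance
');
$stmt->execute([':q' => '[' . implode(',', $queryEmbedding) . ']']);
$results = $stmt->fetchAll(PDO::FETCH_ASSOC); // rows of content + similarity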
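Both statements above build the pgvector text literal inline with implode(). A small helper that validates the embedding first can surface mistakes, such as a model returning the wrong number of dimensions, before they reach the database. The function name and its checks are assumptions for illustration and are not part of pgvector or PDO.

```php
<?php
/**
 * Build a pgvector text literal such as "[0.25,-1.5,3]" from a PHP array,
 * checking the dimension count and rejecting non-numeric or non-finite values.
 */
function toPgvectorLiteral(array $embedding, int $expectedDimensions): string
{
    if (count($embedding) !== $expectedDimensions) {
        throw new InvalidArgumentException(sprintf(
            'Expected %d dimensions, got %d.',
            $expectedDimensions,
            count($embedding)
        ));
    }
    $parts = [];
    foreach ($embedding as $value) {
        if (!is_int($value) && !is_float($value)) {
            throw new InvalidArgumentException('Embedding values must be numeric.');
        }
        if (!is_finite((float) $value)) {
            throw new InvalidArgumentException('Embedding values must be finite.');
        }
        $parts[] = (string) (float) $value;
    }
    return '[' . implode(',', $parts) . ']';
}

$literal = toPgvectorLiteral([0.25, -1.5, 3.0], 3); // "[0.25,-1.5,3]"
```

The returned string can be bound to the :embedding or :q placeholder exactly as the inline expression is; the value is still passed as a bound parameter, never concatenated into the SQL.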
Added
23 Mar 2026
DEV INTEL
Tools & Severity
🔵 Info
⚙ Fix effort: Medium
⚡ Quick Fix
Start with pgvector on existing PostgreSQL: CREATE EXTENSION vector; then add a vector(1536) column — no new infrastructure required