← CodeClarityLab Home
Browse by Category
+ added · updated 7d
← Back to glossary

Vector Databases

ai_ml Intermediate

Also Known As

pgvector Pinecone Weaviate Qdrant ANN search

TL;DR

Databases specialised for storing and querying high-dimensional vectors — enabling fast approximate nearest-neighbour search across millions of embeddings.

Explanation

Traditional databases cannot efficiently query 'find the 10 most similar vectors to this query vector' across millions of rows. Vector databases use specialised index structures (HNSW, IVF) for approximate nearest-neighbour (ANN) search. Options: Pinecone (managed), Weaviate (self-hosted or managed), Qdrant (Rust, self-hosted), pgvector (PostgreSQL extension — good starting point). For PHP applications, pgvector requires no new infrastructure and supports hybrid search (vector + SQL filters).

Common Misconception

You need a dedicated vector database to use embeddings — pgvector adds vector search to PostgreSQL; for most applications it is sufficient without adding infrastructure complexity.

Why It Matters

Vector databases are the storage layer for RAG systems — without one, semantic search requires computing similarity against every stored embedding on every query, which is O(n) and unusable at scale.

Common Mistakes

  • Not creating an index on the vector column — without HNSW or IVF index, queries are O(n) exact search.
  • Storing vectors as JSON strings — use the native vector type (pgvector's vector type) for efficient indexing.
  • Not filtering by metadata before vector search — combining SQL WHERE clauses with vector search (hybrid search) dramatically reduces the search space.
  • Choosing a hosted vector DB before trying pgvector — pgvector handles millions of vectors adequately and eliminates an extra service.

Code Examples

✗ Vulnerable
// Exact search over all vectors — O(n), unusable at scale:
SELECT id, content,
       embedding <=> $1 AS distance
FROM documents
ORDER BY distance
LIMIT 10;
-- No index: scans all rows, 1M docs = seconds per query
✓ Fixed
-- pgvector with HNSW index — O(log n) approximate search:
CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE documents ADD COLUMN embedding vector(1536);
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Hybrid search: filter then vector search:
SELECT id, content, embedding <=> $1 AS distance
FROM documents
WHERE category = 'security'   -- SQL filter reduces search space
ORDER BY distance
LIMIT 10;

Added 15 Mar 2026
Edited 22 Mar 2026
Views 30
Rate this term
No ratings yet
🤖 AI Guestbook educational data only
| |
Last 30 days
0 pings F 0 pings S 0 pings S 0 pings M 0 pings T 0 pings W 0 pings T 0 pings F 0 pings S 0 pings S 0 pings M 0 pings T 0 pings W 2 pings T 0 pings F 1 ping S 1 ping S 0 pings M 1 ping T 0 pings W 0 pings T 0 pings F 1 ping S 0 pings S 0 pings M 1 ping T 0 pings W 1 ping T 0 pings F 1 ping S
No pings yesterday
Amazonbot 9 Perplexity 6 Unknown AI 2 Google 2 Ahrefs 2 ChatGPT 2 SEMrush 2
crawler 23 crawler_json 2
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: Medium
⚡ Quick Fix
Start with pgvector (PostgreSQL extension) if you already run Postgres — it avoids an extra service; move to a dedicated vector DB (Pinecone, Qdrant) only when you need ANN at scale
📦 Applies To
any web cli
🔗 Prerequisites
🔍 Detection Hints
Brute-force cosine similarity over entire embedding table without ANN index; pgvector without HNSW or IVFFlat index
Auto-detectable: ✗ No
⚠ Related Problems
🤖 AI Agent
Confidence: Low False Positives: Medium ✗ Manual fix Fix: High Context: File Tests: Update

✓ schema.org compliant