Vector Databases

ai_ml Intermediate

Also Known As

pgvector Pinecone Weaviate Qdrant ANN search

TL;DR

Databases specialised for storing and querying high-dimensional vectors — enabling fast approximate nearest-neighbour search across millions of embeddings.

Explanation

Traditional database indexes such as B-trees cannot efficiently answer 'find the 10 most similar vectors to this query vector' across millions of rows. Vector databases use specialised index structures (HNSW, IVF) for approximate nearest-neighbour (ANN) search, trading a small amount of recall for much lower query latency. Options: Pinecone (managed), Weaviate (self-hosted or managed), Qdrant (written in Rust; self-hosted or managed), pgvector (PostgreSQL extension; a good starting point). For PHP applications that already run PostgreSQL, pgvector requires no new infrastructure and lets you combine vector search with ordinary SQL filters in a single query (often loosely called hybrid search).
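To make the IVF idea concrete, here is a minimal Python sketch, not how any particular database implements it. It assumes hand-picked centroids (real indexes typically learn them with k-means), and the names build_ivf, ivf_search and nprobe are illustrative. Vectors are bucketed by their nearest centroid, and a query only scans the buckets closest to it, which is why the search is approximate.

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), the metric behind pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def build_ivf(vectors, centroids):
    """Assign each vector id to the bucket of its nearest centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)),
                      key=lambda i: cosine_distance(vec, centroids[i]))
        buckets[nearest].append(vec_id)
    return buckets

def ivf_search(query, vectors, centroids, buckets, k=2, nprobe=1):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: cosine_distance(query, centroids[i]))[:nprobe]
    candidates = [vec_id for i in probe for vec_id in buckets[i]]
    candidates.sort(key=lambda vec_id: cosine_distance(query, vectors[vec_id]))
    return candidates[:k]

# Toy data: two clusters in 2-D (assumed for illustration only).
vectors = {
    "a": [1.0, 0.1],
    "b": [0.9, 0.2],
    "c": [0.1, 1.0],
    "d": [0.2, 0.9],
}
centroids = [[1.0, 0.0], [0.0, 1.0]]  # hand-picked, not learned
buckets = build_ivf(vectors, centroids)
result = ivf_search([1.0, 0.0], vectors, centroids, buckets, k=2, nprobe=1)
print(buckets, result)
```

With nprobe=1 only half of the toy vectors are compared with the query. Raising nprobe scans more buckets, which improves recall at the cost of more distance computations.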

Common Misconception

✗ You need a dedicated vector database to use embeddings — pgvector adds vector search to PostgreSQL; for most applications it is sufficient without adding infrastructure complexity.

Why It Matters

Vector databases are the storage layer for RAG systems — without one, semantic search requires computing similarity against every stored embedding on every query, which is O(n) and unusable at scale.
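The O(n) cost can be seen in a small Python sketch of exact search. This is an illustration under assumed toy data, using Euclidean distance from the standard library; the function name exact_top_k is hypothetical.

```python
import heapq
import math

def exact_top_k(query, embeddings, k):
    """Return the ids of the k embeddings closest to query (Euclidean distance).

    Every stored embedding is compared with the query, so the work grows
    linearly with the number of rows.
    """
    scored = ((math.dist(query, vec), doc_id) for doc_id, vec in embeddings.items())
    return [doc_id for _, doc_id in heapq.nsmallest(k, scored)]

# Toy 2-D embeddings (assumed for illustration; real embeddings have hundreds of dimensions).
embeddings = {
    "doc-1": [0.0, 0.0],
    "doc-2": [1.0, 1.0],
    "doc-3": [5.0, 5.0],
}
nearest = exact_top_k([0.9, 0.8], embeddings, k=2)
print(nearest)
```

An ANN index avoids this full pass by visiting only a small part of the data per query.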

Common Mistakes

Not creating an index on the vector column: without an HNSW or IVFFlat index, every query is an O(n) exact scan.
Storing vectors as JSON strings: use pgvector's native vector type so the column can be indexed and queried with the distance operators.
Not combining metadata filters with vector search: SQL WHERE clauses restrict results to the relevant rows. With an ANN index, pgvector applies the filter after the index scan, so a very selective filter may return fewer rows than the LIMIT unless you raise hnsw.ef_search or also index the filter column.
Choosing a hosted vector DB before trying pgvector: pgvector handles millions of vectors adequately and eliminates an extra service.
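For the JSON-string mistake, the following Python sketch shows one way to format an embedding as pgvector's text input ('[0.1,0.2,0.3]') so it can be bound to a vector parameter. The helper name to_pgvector_literal is hypothetical, and the empty and non-finite checks are defensive assumptions rather than a full statement of pgvector's validation rules.

```python
import math

def to_pgvector_literal(values):
    """Format a sequence of numbers as pgvector text input, e.g. '[0.1,0.2,0.3]'."""
    floats = [float(v) for v in values]
    if not floats:
        raise ValueError("embedding must not be empty")
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("embedding must contain only finite numbers")
    return "[" + ",".join(repr(v) for v in floats) + "]"

literal = to_pgvector_literal([0.25, -1, 3.5])
print(literal)
```

The resulting string would typically be passed as a bound parameter and cast in SQL, for example `$1::vector`, instead of being concatenated into the query text.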

Code Examples

✗ Vulnerable

-- Exact search over all vectors: O(n), unusable at scale
SELECT id, content,
       embedding <=> $1 AS distance   -- <=> is cosine distance
FROM documents
ORDER BY distance
LIMIT 10;
-- No index: every row is scanned, so 1M docs can take seconds per query

✓ Fixed

-- pgvector with HNSW index: approximate search, roughly logarithmic in table size
CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE documents ADD COLUMN embedding vector(1536);
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Filtered vector search: SQL filter and vector ordering in one query
SELECT id, content, embedding <=> $1 AS distance
FROM documents
WHERE category = 'security'   -- SQL filter restricts results to one category
ORDER BY distance
LIMIT 10;

Tags

ai vector-database search rag

Added 15 Mar 2026

Edited 22 Mar 2026

Curated in Warsaw under one editorial standard. 1,445 terms, single voice. About this reference →

⚡ DEV INTEL Tools & Severity

🟡 Medium ⚙ Fix effort: Medium

⚡ Quick Fix

Start with pgvector (PostgreSQL extension) if you already run Postgres: it avoids an extra service. Move to a dedicated vector DB (Pinecone, Qdrant) only when pgvector's ANN indexes no longer meet your scale or latency requirements.

📦 Applies To

any web cli

🔗 Prerequisites

Embeddings Retrieval-Augmented Generation (RAG) Database Indexing

🔍 Detection Hints

Brute-force cosine similarity over entire embedding table without ANN index; pgvector without HNSW or IVFFlat index

Auto-detectable: ✗ No

⚠ Related Problems