Graph Databases
debt(d7/e7/b7/t5)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints show automated=no, meaning no tooling automatically flags when graph databases would be appropriate. The code patterns (recursive self-JOINs, multi-table many-to-many queries, variable-depth traversals) are only identifiable through manual code review or performance profiling in production.
Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix indicates switching to a graph database like Neo4j for connected data. This requires introducing polyglot persistence, new query languages (Cypher), connection libraries (php-neo4j), data migration, and changing all affected query code across the application—a significant architectural change.
Closest to 'strong gravitational pull' (b7). A graph database choice applies to web/cli contexts and has architectural tags. Once adopted, it becomes a load-bearing component: data models, query patterns, and application logic all shape around it. The common_mistakes warn against mixing graph and relational concerns, showing how the choice creates ongoing architectural constraints.
Closest to 'notable trap' (t5). The misconception states that developers believe graph databases are only for social networks, when they also apply to fraud detection, supply chains, recommendations, and hierarchies. Common mistakes include using graphs for simple hierarchical data (where PostgreSQL CTEs suffice) and super-node performance issues; these are documented gotchas that experienced developers eventually learn.
Also Known As
TL;DR
Explanation
Graph databases store nodes (entities) and edges (relationships), with properties on both. Native graph engines such as Neo4j keep direct references between connected records (index-free adjacency), so following one relationship costs roughly constant time regardless of total graph size, whereas a SQL join has to find matching rows through an index lookup or a table scan at every hop. Cypher is Neo4j's graph query language, for example: MATCH (u:User)-[:FOLLOWS]->(f:User) WHERE u.id = 42 RETURN f. Use cases include social networks, recommendation engines, fraud detection (finding rings of connected suspicious accounts), knowledge graphs, and network topology. In PHP, laudis/neo4j-php-client connects to Neo4j. Amazon Neptune is a managed graph service that supports Gremlin and SPARQL.
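To make the per-hop cost concrete, here is a minimal sketch of a bounded traversal over an in-memory adjacency list. It is an illustration of the idea only, not how Neo4j is implemented; the `follows` data and the `reachable_within` helper are hypothetical names invented for this example.

```python
from collections import deque

# Hypothetical in-memory graph: each user id maps to the ids that user FOLLOWS.
follows = {
    42: [1, 2],
    1: [3],
    2: [3, 4],
    3: [5],
    4: [],
    5: [42],
}


def reachable_within(graph, start, max_hops):
    """Return {node: hops} for every node reachable from start in 1..max_hops hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        # Each hop only reads this node's own neighbour list,
        # so the work depends on edges visited, not on graph size.
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]  # the start node is not its own result
    return seen


result = reachable_within(follows, 42, 3)
```

The traversal never touches nodes that are not connected to the start node within the hop limit, which is the property the explanation above attributes to graph engines.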
Common Misconception
Why It Matters
Common Mistakes
- Using a graph database for simple hierarchical data: a recursive CTE in PostgreSQL is simpler.
- Mixing graph and relational concerns in the same database: use polyglot persistence instead.
- Ignoring super-nodes (nodes with millions of edges): any traversal that passes through one has to consider all of its edges, which creates performance bottlenecks.
- Not indexing node properties used in WHERE clauses: without an index, finding the starting nodes requires a full graph scan.
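As an illustration of the first point, a simple hierarchy can be walked with a recursive CTE and no graph database at all. This is a minimal sketch under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (both support `WITH RECURSIVE`), and the `categories` table and `descendants` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO categories VALUES
        (1, NULL, 'Electronics'),
        (2, 1, 'Computers'),
        (3, 2, 'Laptops'),
        (4, 1, 'Phones'),
        (5, NULL, 'Books');
""")


def descendants(conn, root_id):
    """Return (name, depth) for every category below root_id, nearest first."""
    return conn.execute(
        """
        WITH RECURSIVE subtree(id, name, depth) AS (
            -- Anchor row: the root category itself at depth 0
            SELECT id, name, 0 FROM categories WHERE id = ?
            UNION ALL
            -- Recursive step: children of rows already in the subtree
            SELECT c.id, c.name, s.depth + 1
            FROM categories c
            JOIN subtree s ON c.parent_id = s.id
        )
        SELECT name, depth FROM subtree WHERE depth > 0 ORDER BY depth, name
        """,
        (root_id,),
    ).fetchall()


tree = descendants(conn, 1)
```

A single self-referencing table and one query cover arbitrary depth here, which is why a dedicated graph store is usually unnecessary for plain parent-child trees.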
Code Examples
-- SQL: users exactly three FOLLOWS hops from user 42, one self-join on follows per hop:
SELECT DISTINCT u3.* FROM users u1
JOIN follows f1 ON u1.id = f1.follower_id
JOIN follows f2 ON f1.followed_id = f2.follower_id
JOIN follows f3 ON f2.followed_id = f3.follower_id
JOIN users u3 ON f3.followed_id = u3.id
WHERE u1.id = 42;
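-- Intermediate results multiply with every hop; at 1M users, and without indexes on
-- follows(follower_id) and follows(followed_id), this can mean billions of rows scanned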
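// Neo4j Cypher: variable-length traversal, 1 to 3 FOLLOWS hops from user 42.
// Work is proportional to the relationships actually visited, not to total graph size.
MATCH (u:User {id: 42})-[:FOLLOWS*1..3]->(friend:User)
WHERE NOT (u)-[:FOLLOWS]->(friend) AND friend.id <> 42
// With DISTINCT, ORDER BY can only use returned columns, so follower_count is projected too.
RETURN DISTINCT friend.name AS name, friend.id AS id, friend.follower_count AS follower_count
ORDER BY follower_count DESC
LIMIT 20;
// Traverses only actual connections; finding the start node still benefits from an index on :User(id)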