OLAP vs OLTP
debt(d7/e7/b7/t7)
Closest to 'only careful code review or runtime testing' (d7). There are no automated tools that detect when you're running analytical queries on an OLTP database vs when you should migrate to OLAP. Query performance issues only surface through slow query logs, APM monitoring, or production incidents when analytical queries block live traffic. No linter or SAST tool flags 'this GROUP BY belongs in a data warehouse'.
Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix mentions moving heavy aggregate queries to a separate system (ClickHouse, DuckDB, BigQuery), which requires setting up ETL pipelines, rewriting query interfaces, potentially changing reporting dashboards, and managing data freshness concerns. This is not a one-file fix but a significant architectural addition.
Closest to 'strong gravitational pull' (b7). Once you've chosen your database architecture (OLTP-only vs OLTP+OLAP), this decision shapes how every analytics feature is built. Teams must decide for each reporting need whether it runs against production or the analytics layer. The common_mistakes show ongoing costs: ETL lag management, replica confusion, avoiding production slowdowns. This is a load-bearing architectural choice.
Closest to 'serious trap' (t7). The misconception explicitly states developers believe 'adding more indexes to PostgreSQL will make analytical queries fast' — this contradicts how indexing actually works for aggregate scans. Developers familiar with OLTP optimization (indexes, query plans) apply those mental models to analytics and get poor results. The read-replica confusion mentioned in common_mistakes compounds this: developers think 'read replica = analytics solution' when the row-oriented structure remains the bottleneck.
Also Known As
TL;DR
Explanation
OLTP databases (MySQL, PostgreSQL, SQLite) are row-oriented: every column of a row is stored together. This is fast for inserting or updating a single row and looking up individual records by primary key. OLAP databases (ClickHouse, BigQuery, Redshift, DuckDB) are column-oriented: all values for one column are stored together. This makes scanning the 'revenue' column across 100 million rows extremely fast — only that column is read from disk. OLTP schemas are normalised (many small tables with foreign keys). OLAP schemas are denormalised (wide flat tables, star schema, snowflake schema) because JOINs across billions of rows are expensive. Most applications use OLTP for operations and periodically ETL data into an OLAP system for analytics.
Common Misconception
Why It Matters
Common Mistakes
- Running GROUP BY / COUNT DISTINCT queries on production OLTP databases — these require full table scans that compete with live traffic.
- Confusing a read replica with an OLAP database — a read replica has the same row-oriented structure; it reduces load on primary but does not make analytical queries faster.
- Building an analytics system on MySQL with heavy JOINs — MySQL is not optimised for analytical workloads; consider DuckDB (embeddable) or ClickHouse for serious analytics.
- Not accounting for ETL lag in reporting — data in your OLAP system may be hours old; make sure dashboards communicate data freshness to avoid trust issues.
Code Examples
-- ❌ Running analytical query on OLTP production database
-- This scans 50M rows on the same database handling live orders
SELECT
region,
DATE_TRUNC('month', created_at) AS month,
SUM(total_amount) AS revenue,
COUNT(DISTINCT user_id) AS unique_customers
FROM orders
WHERE created_at >= NOW() - INTERVAL '12 months'
GROUP BY region, month
ORDER BY month, revenue DESC;
-- Blocks the orders table, causes timeouts for live users
-- ✅ Same query on ClickHouse (columnar) — 100x faster for aggregates
-- Data ETL'd from OLTP hourly or via CDC
SELECT
region,
toStartOfMonth(created_at) AS month,
sum(total_amount) AS revenue,
uniq(user_id) AS unique_customers
FROM orders_analytics -- Denormalised, columnar store
WHERE created_at >= now() - INTERVAL 12 MONTH
GROUP BY region, month
ORDER BY month, revenue DESC;
-- Sub-second on 50M rows; zero impact on production OLTP
-- ✅ For PHP: use a read replica + materialised view for medium scale
-- CREATE MATERIALIZED VIEW monthly_revenue AS
-- SELECT ... GROUP BY ... -- Refresh nightly
-- REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;