Time-Series Databases
debt(d7/e7/b7/t5)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints show automated=no, and the tools listed (timescaledb, mysql-partitioning, pg-partman) are not static analysis tools but database extensions. The code_pattern describes symptoms (millions of rows, full scans, slow queries) that only manifest at runtime under load. No linter or SAST tool catches 'you should use a time-series database instead of MySQL'; this requires manual architectural review or production performance monitoring.
Closest to 'cross-cutting refactor across the codebase' (e7). While the quick_fix suggests partitioning as a mitigation, the full fix—migrating from a general-purpose database to a time-series database like TimescaleDB—requires schema changes, query rewrites, connection configuration changes, and potentially application-level changes to write/read patterns. This touches multiple files and components across the codebase.
Closest to 'strong gravitational pull' (b7). Database choice is a load-bearing architectural decision that affects the entire application. The applies_to shows this applies across web and cli contexts. Once you've built your metrics/events system on MySQL without time-series features, every query pattern, every dashboard, every retention policy must work around this limitation. The common_mistakes (no retention policy, no downsampling, wrong primary key) show how the choice propagates constraints throughout the system.
Closest to 'notable trap' (t5). The misconception explicitly states developers believe 'MySQL or PostgreSQL is sufficient for time-series data.' This is a documented gotcha that developers eventually learn through experience—general-purpose databases can store time-series data but become 10-100x less efficient at scale. It's not catastrophic (the obvious approach does work at small scale) but it's a significant trap when volumes grow.
Also Known As
TL;DR
Explanation
Time-series databases are optimised for append-only writes with a timestamp as the primary index. Key features: efficient range queries (SELECT WHERE time BETWEEN), downsampling (aggregate minute data to hour data automatically), retention policies (auto-delete data older than 90 days), and compression (repeated timestamps and similar values compress extremely well). Tools: InfluxDB (time-series specific), TimescaleDB (PostgreSQL extension — SQL plus time-series optimisations), Prometheus (pull-based metrics), ClickHouse (column-store for analytics). PHP metrics: use Prometheus PHP client or StatsD to push metrics to InfluxDB/Prometheus.
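To illustrate the range-query and downsampling features together, here is a hedged sketch assuming a TimescaleDB table metrics(time, name, value) like the one in the Code Examples section; 'cpu_load' is a made-up metric name:

-- Downsampled range query: one averaged point per 5 minutes
-- over the last 24 hours, instead of raw per-second rows.
SELECT time_bucket('5 minutes', time) AS bucket, avg(value) AS avg_value
FROM metrics
WHERE name = 'cpu_load'
  AND time > now() - INTERVAL '24 hours'
GROUP BY bucket
ORDER BY bucket;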
Common Misconception
Why It Matters
Common Mistakes
- No data retention policy — time-series tables grow indefinitely without automatic deletion.
- No downsampling — storing raw per-second data when per-minute is sufficient wastes storage.
- Using a general-purpose DB primary key (UUID) instead of timestamp — defeats time-series optimisations.
- Querying raw data for dashboards — always aggregate to an appropriate resolution for the time range.
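A hedged sketch of the fix for the primary-key mistake, assuming TimescaleDB (column names are illustrative): lead with a time-based composite key rather than a random UUID, so inserts append in time order and range scans stay physically local.

-- Time-leading composite key instead of a UUID surrogate key:
CREATE TABLE metrics (
    time TIMESTAMPTZ NOT NULL,
    name TEXT NOT NULL,
    value DOUBLE PRECISION,
    PRIMARY KEY (name, time)  -- supports (name, time-range) lookups directly
);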
Code Examples
-- MySQL metrics table — growing forever, slow range queries:
CREATE TABLE metrics (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100),
    value FLOAT,
    timestamp DATETIME,
    INDEX idx_name_time (name, timestamp)
);
-- 1 year of per-second data: 31M rows per metric
-- Range query: 30 seconds for 90-day chart
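If migration is not yet feasible, the partitioning mitigation mentioned above can be sketched in MySQL as follows (a hedged example, not a full design; partition names and dates are illustrative). Note that MySQL requires the partition key to appear in every unique key, so the primary key becomes (id, timestamp):

-- Quick fix: range-partition by month so old data can be dropped cheaply
-- and range queries prune to the relevant partitions.
CREATE TABLE metrics_partitioned (
    id BIGINT AUTO_INCREMENT,
    name VARCHAR(100),
    value FLOAT,
    timestamp DATETIME NOT NULL,
    PRIMARY KEY (id, timestamp),
    INDEX idx_name_time (name, timestamp)
)
PARTITION BY RANGE (TO_DAYS(timestamp)) (
    PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);
-- Retention becomes a metadata operation:
-- ALTER TABLE metrics_partitioned DROP PARTITION p2024_01;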
-- TimescaleDB — PostgreSQL with time-series superpowers:
CREATE TABLE metrics (
    time TIMESTAMPTZ NOT NULL,
    name TEXT,
    value DOUBLE PRECISION
);
SELECT create_hypertable('metrics', 'time');
-- Automatic compression after 7 days:
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'name'
);
SELECT add_compression_policy('metrics', INTERVAL '7 days');
-- Automatic retention after 90 days:
SELECT add_retention_policy('metrics', INTERVAL '90 days');
-- Continuous aggregate — pre-compute hourly averages:
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket, name, avg(value) AS avg_value
FROM metrics
GROUP BY bucket, name;
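A continuous aggregate only stays current if a refresh policy is attached; a hedged sketch (the offsets and schedule are illustrative, not prescriptive):

-- Refresh the hourly aggregate every hour, covering buckets
-- between 3 hours and 1 hour old:
SELECT add_continuous_aggregate_policy('metrics_hourly',
    start_offset => INTERVAL '3 hours',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');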