MySQL GROUP BY and Aggregates
debt(d5/e3/b3/t7)
Closest to 'specialist tool catches it' (d5). The detection_hints list semgrep as the tool, which is a SAST/specialist tool that can detect SELECT with GROUP BY where some columns are not aggregated. The bug is silent at runtime in permissive mode, so it won't be caught by a compiler or default linter, placing it firmly in the specialist-tool tier.
Closest to 'simple parameterised fix' (e3). The quick_fix states every column in SELECT must appear in GROUP BY or be wrapped in an aggregate function, and HAVING must be used to filter aggregate results. This is a targeted query rewrite — typically fixing one or a few queries rather than a one-line swap (e1) or a multi-file refactor (e5).
Closest to 'localised tax' (b3). The applies_to covers web and cli contexts broadly, but the issue is constrained to SQL queries that use GROUP BY. It doesn't permeate architectural decisions across the codebase; it's a recurring query-level concern that each developer must remember when writing aggregation queries, imposing a modest but localised tax.
Closest to 'serious trap' (t7). The misconception field states developers believe they can SELECT any column when using GROUP BY, but MySQL silently returns a random value for un-aggregated, un-grouped columns in permissive mode. This contradicts standard SQL behavior and the behavior of other databases (PostgreSQL, SQLite strict mode), and produces non-deterministic results that look correct but aren't — a serious cognitive trap that contradicts how similar concepts work elsewhere.
Also Known As
TL;DR
Explanation
GROUP BY is used with aggregate functions to summarise data. MySQL's ONLY_FULL_GROUP_BY mode (default since 5.7) requires all SELECT columns to be in GROUP BY or wrapped in an aggregate — preventing non-deterministic results. HAVING filters groups after aggregation; WHERE filters rows before aggregation. ROLLUP adds subtotals. Window functions (MySQL 8) provide aggregation without collapsing rows.
Common Misconception
Why It Matters
Common Mistakes
- Using WHERE to filter aggregate results instead of HAVING — WHERE runs before aggregation.
- Selecting non-aggregated columns not in GROUP BY — non-deterministic on MySQL permissive mode.
- GROUP BY on an unindexed column — triggers filesort in the EXPLAIN Extra column.
Avoid When
- Do not use GROUP BY on an unindexed column for large tables — add an index to avoid filesort.
- Do not use GROUP BY as a workaround for DISTINCT — they have different semantics.
When To Use
- Use GROUP BY with COUNT/SUM/AVG for reporting and analytics queries.
- Use HAVING to filter groups — e.g. users with more than 5 orders.
Code Examples
-- ONLY_FULL_GROUP_BY error in MySQL 5.7+
SELECT user_id, email, COUNT(*) AS orders
FROM orders
GROUP BY user_id; -- email not in GROUP BY or aggregate
SELECT user_id,
COUNT(*) AS order_count,
SUM(total) AS revenue,
MAX(created_at) AS last_order
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5
ORDER BY revenue DESC;