Sparse Matrix Representations
debt(d7/e5/b5/t5)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints indicate automated detection is 'no', and tools listed (blackfire, php-meminfo) are profiler/memory-inspection tools that require deliberate profiling runs — not passive linting. A developer must notice the pattern (large 2D array mostly zeros) through memory profiling or code review, not through any default tooling. Slightly better than d9 because memory profilers can surface the waste once run.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix describes replacing dense 2D arrays with associative arrays indexed only for non-zero values, which sounds simple, but the common_mistakes highlight that choosing the right format (COO vs CSR) and potentially converting between formats means the fix touches data construction, computation logic, and possibly serialization — spanning multiple parts of a component rather than a single-line swap.
Closest to 'persistent productivity tax' (b5). The applies_to covers web and cli contexts broadly. Choosing the wrong representation (dense vs sparse, or wrong sparse format) creates an ongoing performance and memory burden that shapes data pipeline and algorithm decisions throughout the affected component. It's not architectural/system-wide (b7+) but it does persistently affect any developer working on the data layer.
Closest to 'notable trap (a documented gotcha most devs eventually learn)' (t5). The misconception field explicitly states that developers believe sparse matrices are only for scientific computing, missing their applicability to recommendation systems, social graphs, and NLP. Additionally, common_mistakes include using the wrong sparse format for the operation (COO vs CSR), which is a secondary trap. These are documented gotchas that developers learn through experience rather than a catastrophic or contradictory behavior.
Also Known As
TL;DR
Explanation
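A dense 2D array takes O(m*n) space, which is wasteful when fewer than 1% of the values are non-zero. Sparse formats store only the non-zero entries:
- COO (Coordinate): a list of (row, col, value) triples, one per non-zero. Simple to build.
- CSR (Compressed Sparse Row): three arrays (values, column indices, row pointers). Fast row iteration and matrix-vector products.
- DOK (Dictionary of Keys): a hash map of (row, col) → value. Fast element access and updates.
Applications include recommendation systems (user × item ratings), graph adjacency, and NLP term-document matrices. A 1M users × 100K products matrix has 10^11 cells: stored densely at 4 bytes per value, that is 400GB. At 1% density, CSR holds 10^9 non-zeros, which is 4GB of values plus roughly another 4GB of 4-byte column indices.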
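To make the CSR layout concrete, here is a minimal sketch of the three arrays and a matrix-vector product over them. The 3×4 matrix, the variable names, and the `csrMatVec` helper are illustrative assumptions, not part of any library.

```php
<?php
// Illustrative 3x4 matrix (hypothetical data):
// [ 5 0 0 2 ]
// [ 0 0 3 0 ]
// [ 0 4 0 1 ]

// CSR stores three arrays:
$values = [5.0, 2.0, 3.0, 4.0, 1.0]; // non-zero values, row by row
$colIdx = [0, 3, 2, 1, 3];           // column index of each value
$rowPtr = [0, 2, 3, 5];              // row i occupies slots rowPtr[i] .. rowPtr[i+1]-1

/**
 * Multiply a CSR matrix by a dense vector: y = A * x.
 * Only the stored non-zeros are visited.
 */
function csrMatVec(array $values, array $colIdx, array $rowPtr, array $x): array
{
    $rows = count($rowPtr) - 1;
    $y = array_fill(0, $rows, 0.0);
    for ($i = 0; $i < $rows; $i++) {
        for ($k = $rowPtr[$i]; $k < $rowPtr[$i + 1]; $k++) {
            $y[$i] += $values[$k] * $x[$colIdx[$k]];
        }
    }
    return $y;
}

$y = csrMatVec($values, $colIdx, $rowPtr, [1.0, 2.0, 3.0, 4.0]);
// $y is [13.0, 9.0, 12.0]
```

The inner loop runs once per stored value, so the cost is proportional to the number of non-zeros rather than to m*n.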
Common Misconception
Why It Matters
Common Mistakes
- Dense matrix for sparse data — memory infeasible
- Wrong format for the operation: build in COO, compute in CSR
- Not checking sparsity before choosing format
- Building CSR directly instead of building COO then converting
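The last three points can be sketched together: measure density first, collect entries as COO triples in whatever order they arrive, then convert to CSR once before computing. The `density` and `cooToCsr` helpers below are hypothetical names for illustration, and the conversion assumes no duplicate (row, col) pairs.

```php
<?php
/**
 * Fraction of non-zero cells, given a list of [row, col, value] triples.
 */
function density(array $coo, int $rows, int $cols): float
{
    return count($coo) / ($rows * $cols);
}

/**
 * Convert COO triples to CSR arrays with a counting sort by row.
 * Assumes no duplicate (row, col) pairs.
 */
function cooToCsr(array $coo, int $rows): array
{
    // 1. Count the non-zeros in each row (stored at index row + 1).
    $rowPtr = array_fill(0, $rows + 1, 0);
    foreach ($coo as [$r, $c, $v]) {
        $rowPtr[$r + 1]++;
    }
    // 2. A prefix sum turns the counts into row start offsets.
    for ($i = 0; $i < $rows; $i++) {
        $rowPtr[$i + 1] += $rowPtr[$i];
    }
    // 3. Scatter each triple into the next free slot of its row.
    $next = $rowPtr;
    $n = count($coo);
    $values = array_fill(0, $n, 0.0);
    $colIdx = array_fill(0, $n, 0);
    foreach ($coo as [$r, $c, $v]) {
        $k = $next[$r]++;
        $values[$k] = $v;
        $colIdx[$k] = $c;
    }
    return ['values' => $values, 'colIdx' => $colIdx, 'rowPtr' => $rowPtr];
}

// Triples can be appended in any order while building.
$coo = [
    [2, 1, 4.0],
    [0, 0, 5.0],
    [1, 2, 3.0],
    [0, 3, 2.0],
    [2, 3, 1.0],
];

$d = density($coo, 3, 4); // 5 non-zeros out of 12 cells
$csr = cooToCsr($coo, 3);
// $csr['rowPtr'] is [0, 2, 3, 5]
```

Within a row, columns keep their input order in this sketch. Sort each row's slice if the downstream code expects sorted column indices.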
Code Examples
// Dense matrix for user-item ratings: infeasible at this scale
$ratings = array_fill(0, 1000000, array_fill(0, 100000, 0.0));
// 1M * 100K cells * 8 bytes per float = 800GB of raw values,
// before PHP's per-element array overhead
// DOK: only store non-zero ratings:
$ratings = [];
$ratings["42:1337"] = 4.5; // User 42 rated item 1337: 4.5 stars
// 1M users * 200 ratings avg = 200M entries
// Memory: 200M * ~30 bytes = 6GB as a rough lower bound;
// PHP string-keyed array entries cost more in practice, so measure with a profiler
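A variant of the DOK idea, sketched under the assumption that user and item IDs are integers: nesting the keys as `$ratings[$user][$item]` avoids building "user:item" strings and allows iterating one user's ratings. The `dokSet` and `dokGet` helpers are hypothetical names used only for this illustration.

```php
<?php
/**
 * Store a value, or remove the entry when the value is zero,
 * so the structure never holds explicit zeros.
 */
function dokSet(array &$m, int $row, int $col, float $value): void
{
    if ($value === 0.0) {
        if (isset($m[$row])) {
            unset($m[$row][$col]);
            if ($m[$row] === []) {
                unset($m[$row]); // drop rows that became empty
            }
        }
        return;
    }
    $m[$row][$col] = $value;
}

/**
 * Read a value; missing entries read as zero.
 */
function dokGet(array $m, int $row, int $col): float
{
    return $m[$row][$col] ?? 0.0;
}

$ratings = [];
dokSet($ratings, 42, 1337, 4.5); // user 42 rated item 1337
dokSet($ratings, 42, 7, 3.0);
dokSet($ratings, 42, 7, 0.0);    // rating withdrawn: entry removed

$a = dokGet($ratings, 42, 1337); // 4.5
$b = dokGet($ratings, 42, 7);    // 0.0, not stored
$stored = array_sum(array_map('count', $ratings)); // number of stored entries
```

Dropping zeros on write keeps the stored entry count equal to the number of non-zeros, which is the figure the memory estimate depends on.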