
ML Types

AI / ML Intermediate
debt(d9/e7/b5/t7)
d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). Detection_hints confirm 'automated: no' — there is no tool that catches paradigm misselection. Choosing clustering over supervised classification, or building a custom model when an API suffices, produces no error, warning, or lint signal; the consequence is only visible when model results underperform in production.

e7 Effort Remediation debt — work required to fix once spotted

Closest to 'cross-cutting refactor across the codebase' (e7). Switching ML paradigms (e.g., from unsupervised clustering to supervised classification) means re-collecting or labelling data, retraining or swapping models, re-evaluating pipelines, and potentially re-architecting integration code. The quick_fix note acknowledges that for consumers of ML APIs the fix is simpler, but the common_mistakes list shows the real cost when the wrong paradigm is deeply embedded in a data pipeline.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). The choice of ML paradigm shapes data collection strategy, labelling effort, API or model selection, and evaluation metrics across multiple work streams. It applies to both web and cli contexts per applies_to. While it doesn't define the entire system shape, it imposes ongoing costs on data, model, and integration decisions throughout the project lifecycle.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7). The misconception field states a high-confidence wrong belief: developers assume LLMs are trained purely with supervised learning, when modern LLMs use self-supervised pre-training plus RLHF. The belief carries over from classical ML education, where the closely related concept of supervised learning does work that way, so it becomes a paradigm-level misconception that can drive incorrect architectural and data-labelling decisions.


Also Known As

supervised learning, unsupervised learning, reinforcement learning, ML paradigms

TL;DR

Four paradigms: supervised (learn from labelled examples), unsupervised (find patterns in unlabelled data), reinforcement learning (learn from reward signals), and self-supervised (the model creates its own labels from the data).

Explanation

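The four paradigms differ in where the training signal comes from:

  • Supervised: learns from labelled input→output pairs. Typical tasks are classification (spam / not spam) and regression (predicting a price).
  • Unsupervised: finds structure in unlabelled data. Typical tasks are clustering (K-means), dimensionality reduction, and anomaly detection.
  • Self-supervised: the model generates its own labels from the data. GPT, for example, is trained to predict the next token.
  • Reinforcement learning: an agent learns a policy from reward signals. Examples are game playing and RLHF for fine-tuning LLMs.

Choosing the right paradigm depends on whether labels are available, whether you need groups or predictions, and latency requirements.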
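The selection criteria above can be sketched as a small rule table. This is only an illustration: the function name `suggest_paradigm`, its parameters, and the goal vocabulary ("predict", "group", "optimise") are hypothetical, and latency requirements are left out.

```python
def suggest_paradigm(has_labels, goal, has_reward_signal=False):
    """Return a starting-point paradigm for a problem (illustrative rules only).

    goal is one of "predict", "group", or "optimise" (hypothetical vocabulary).
    """
    if goal == "optimise" and has_reward_signal:
        return "reinforcement"
    if goal == "group":
        return "unsupervised"
    if goal == "predict" and has_labels:
        return "supervised"
    if goal == "predict":
        # No labels: derive targets from the data itself, or collect labels first.
        return "self-supervised"
    raise ValueError(f"no rule for goal={goal!r}")


print(suggest_paradigm(has_labels=True, goal="predict"))
print(suggest_paradigm(has_labels=False, goal="group"))
print(suggest_paradigm(has_labels=False, goal="optimise", has_reward_signal=True))
```

A real decision also weighs data volume, labelling cost, and whether a pre-trained API already solves the problem, so treat the result as a first guess rather than a verdict.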

Common Misconception

Myth: LLMs are trained purely with supervised learning. In fact, modern LLMs use self-supervised pre-training (next-token prediction) followed by RLHF (reinforcement learning from human feedback).
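To make "the model creates its own labels" concrete, the sketch below turns raw text into (context, next token) training pairs with no human labelling. It assumes whitespace tokenisation for simplicity; production LLMs use subword tokenisers and far longer contexts, and the helper name `next_token_pairs` is hypothetical.

```python
def next_token_pairs(text, context_size=2):
    """Build (context, target) pairs from raw text; the targets come from the text itself."""
    tokens = text.split()  # simplification: real tokenisers split into subwords
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        pairs.append((context, tokens[i]))
    return pairs


pairs = next_token_pairs("the cat sat on the mat", context_size=2)
for context, target in pairs:
    print(context, "->", target)
```

Every pair looks like a supervised example, but nobody annotated it: the "label" is simply the next word in the corpus.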

Why It Matters

Choosing the wrong ML paradigm wastes engineering effort. For example, when labelled churn data exists, supervised classification is the natural fit; unsupervised clustering cannot use the labels at all.
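A toy sketch of the churn case, in pure Python. The data is invented and deliberately constructed so that churn follows `support_tickets` while `monthly_spend` has the larger spread; the K-means and decision-stump implementations are minimal stand-ins for a clustering algorithm and a supervised classifier, not production code.

```python
# Toy churn data: (monthly_spend, support_tickets) -> churned (1) or not (0).
# Made-up numbers: churn follows support_tickets; monthly_spend has the larger spread.
rows = [(10, 0), (12, 5), (11, 0), (13, 5), (100, 0), (102, 5), (101, 0), (103, 5)]
churned = [0, 1, 0, 1, 0, 1, 0, 1]


def kmeans_two_clusters(points, iterations=10):
    """Plain K-means with k=2, seeded with the first two points. It never sees labels."""
    centroids = [points[0], points[1]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignment[i] = dists.index(min(dists))
        for k in range(2):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment


def fit_stump(points, labels):
    """Decision stump: the single feature/threshold split with the best training accuracy."""
    best = (0.0, None, None)  # (accuracy, feature index, threshold)
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            preds = [1 if p[f] > t else 0 for p in points]
            acc = sum(pr == y for pr, y in zip(preds, labels)) / len(labels)
            acc = max(acc, 1 - acc)  # allow the inverted rule as well
            if acc > best[0]:
                best = (acc, f, t)
    return best


clusters = kmeans_two_clusters(rows)
agreement = sum(c == y for c, y in zip(clusters, churned)) / len(churned)
cluster_accuracy = max(agreement, 1 - agreement)  # best mapping of cluster ids to labels
stump_accuracy, stump_feature, stump_threshold = fit_stump(rows, churned)

print("clustering vs churn labels:", cluster_accuracy)
print("supervised stump:", stump_accuracy, "on feature", stump_feature)
```

On this data the clusters split customers by spend, which says nothing about churn, while the stump uses the labels to find the ticket threshold. Real datasets are rarely this clean, but the mechanism is the same: clustering optimises for geometric grouping, not for the outcome you care about.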

Common Mistakes

  • Using supervised learning without sufficient labelled data: the model overfits and learns noise
  • Using clustering when labels are available: results are worse than a supervised model trained on those labels
  • Ignoring class imbalance in supervised classification: accuracy looks high while the rare class is missed
  • Not considering self-supervised approaches when labelling is expensive
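The class-imbalance mistake in the list above is easy to demonstrate. In this sketch the 1% fraud rate and the "always predict not-fraud" baseline are invented for illustration, and `accuracy` and `recall` are small hand-written helpers rather than library functions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the predictions actually caught."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predictions_for_positives) / len(predictions_for_positives)


# 10 fraud cases among 1000 transactions (made-up numbers).
y_true = [1] * 10 + [0] * 990
# A useless "model" that always predicts the majority class.
y_pred = [0] * 1000

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)
print("accuracy:", baseline_accuracy)
print("recall on fraud:", baseline_recall)
```

The baseline scores 99% accuracy while catching no fraud at all, which is why imbalanced problems are usually judged on recall, precision, or similar per-class metrics rather than accuracy alone.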

Code Examples

✗ Vulnerable
// Goal: detect fraud (labelled historical data exists)
// Wrong choice: unsupervised clustering (ignores labels)
// Should use: supervised binary classification with fraud labels
✓ Fixed
// Matching ML type to problem:
// Churn prediction (labelled) → supervised: logistic regression
// Customer segments (no predefined groups) → unsupervised: K-means
// Optimise recommendations (engagement signal) → reinforcement: bandit
// PHP code completion (large PHP codebase) → self-supervised: next-token

Tags

ai ml

Added 16 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🔵 Info ⚙ Fix effort: Low
⚡ Quick Fix
For PHP developers integrating AI: supervised learning (classification and prediction from labelled data) is what most ML APIs provide. You consume the model's output via the API; you do not train the model yourself.
📦 Applies To
any web cli
🔗 Prerequisites
🔍 Detection Hints
Building a custom ML model when a pre-trained API (OpenAI, Claude) would suffice; misunderstanding supervised vs unsupervised for use case selection
Auto-detectable: ✗ No
⚠ Related Problems
🤖 AI Agent
Confidence: Low False Positives: High ✗ Manual fix Fix: High Context: File

