
ML Types

AI / ML Intermediate

debt(d9/e7/b5/t7)

d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). Detection_hints confirm 'automated: no' — there is no tool that catches paradigm misselection. Choosing clustering over supervised classification, or building a custom model when an API suffices, produces no error, warning, or lint signal; the consequence is only visible when model results underperform in production.

e7 Effort Remediation debt — work required to fix once spotted

Closest to 'cross-cutting refactor across the codebase' (e7). Switching ML paradigms (e.g., from unsupervised clustering to supervised classification) means re-collecting or labelling data, retraining or swapping models, re-evaluating pipelines, and potentially re-architecting integration code. The quick_fix note acknowledges that for consumers of ML APIs the fix is simpler, but the common_mistakes list shows the real cost when the wrong paradigm is deeply embedded in a data pipeline.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). The choice of ML paradigm shapes data collection strategy, labelling effort, API or model selection, and evaluation metrics across multiple work streams. It applies to both web and cli contexts per applies_to. While it doesn't define the entire system shape, it imposes ongoing costs on data, model, and integration decisions throughout the project lifecycle.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7). The misconception field states a high-confidence wrong belief: developers assume LLMs are trained purely with supervised learning, when modern LLMs use self-supervised pre-training followed by RLHF. The wrong belief is carried over from classical ML education, where supervised learning is the default mental model, so the familiar concept appears to explain the new one. This makes it a paradigm-level misconception that can drive incorrect architectural and data-labelling decisions.

About DEBT scoring → scored by claude-sonnet-4-6 · 2026-05-10 · reviewed by human

Also Known As

supervised learning, unsupervised learning, reinforcement learning, ML paradigms

TL;DR

Four paradigms: supervised (learn from labelled examples), unsupervised (find patterns in unlabelled data), reinforcement learning (learn from reward signals), and self-supervised (the model derives its own labels from the data).

Explanation

Supervised learning trains on labelled input→output pairs: classification (spam or not spam) and regression (predicting a price). Unsupervised learning works on unlabelled data: clustering (K-means), dimensionality reduction, and anomaly detection. Self-supervised learning derives its labels from the data itself; GPT, for example, is trained to predict the next token. Reinforcement learning combines an agent, rewards, and a policy, as in game playing and in RLHF for fine-tuning LLMs. Choosing the right paradigm depends on whether labels are available, whether you need groups or predictions, and latency requirements.
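The selection criteria above can be sketched as a small decision helper. This is a rough heuristic for illustration only: the `Problem` fields and the `choose_paradigm` function are hypothetical names, the ordering of the checks is an assumption, and latency requirements are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """Hypothetical description of an ML task (illustrative fields only)."""
    has_labels: bool                 # labelled input/output pairs exist
    has_reward_signal: bool          # feedback arrives as rewards, not labels
    wants_groups: bool               # goal is grouping, not prediction
    has_large_unlabelled_corpus: bool = False


def choose_paradigm(p: Problem) -> str:
    """Return a first-guess paradigm; a heuristic, not a rule."""
    if p.has_labels and not p.wants_groups:
        return "supervised"
    if p.has_reward_signal:
        return "reinforcement"
    if p.has_large_unlabelled_corpus and not p.wants_groups:
        return "self-supervised"
    return "unsupervised"


churn = Problem(has_labels=True, has_reward_signal=False, wants_groups=False)
segments = Problem(has_labels=False, has_reward_signal=False, wants_groups=True)
recommendations = Problem(has_labels=False, has_reward_signal=True, wants_groups=False)
completion = Problem(has_labels=False, has_reward_signal=False, wants_groups=False,
                     has_large_unlabelled_corpus=True)

for name, problem in [("churn", churn), ("segments", segments),
                      ("recommendations", recommendations), ("completion", completion)]:
    print(name, "->", choose_paradigm(problem))
```

The four sample problems mirror the pairings listed under Code Examples (churn, customer segments, recommendations, code completion).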

Common Misconception

✗ LLMs are trained purely with supervised learning. In practice, modern LLMs use self-supervised pre-training (predict the next token) followed by RLHF (reinforcement learning from human feedback), often with a supervised fine-tuning stage in between.
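To make "the model creates its own labels" concrete, the sketch below turns raw text into (context, next token) training pairs with no human labelling. It assumes whitespace tokenisation and a hypothetical `next_token_pairs` helper; production LLMs use subword tokenisers and far longer contexts.

```python
def next_token_pairs(text: str, context_size: int = 2):
    """Build (context, target) pairs from raw text.

    The target is simply the token that follows the context, so the
    labels come from the data itself (self-supervision).
    """
    tokens = text.split()  # assumption: naive whitespace tokenisation
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        pairs.append((context, tokens[i]))
    return pairs


pairs = next_token_pairs("the cat sat on the mat", context_size=2)
for context, target in pairs:
    print(context, "->", target)
```

Every sentence in a corpus yields training examples this way, which is why self-supervised pre-training scales without a labelling budget.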

Why It Matters

Choosing the wrong ML paradigm wastes engineering effort: labelled churn data calls for supervised classification, not unsupervised clustering, which cannot use the labels.
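A toy illustration of that cost, under stated assumptions: four hand-picked 2-D points where the label depends on the second feature, while the first feature has the larger spread. The helpers (`mean`, `nearest`, `kmeans`) are minimal stand-ins written for this sketch, not a library API, and accuracy is measured on the training points only.

```python
from math import dist

# Assumed toy data: the label follows the second coordinate,
# but the first coordinate dominates the distances.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 1, 0, 1]


def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(pts, iterations=10):
    """Two-cluster K-means with a fixed (deterministic) initialisation."""
    centroids = [pts[0], pts[-1]]
    assignment = []
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in pts]
        centroids = [mean([p for p, a in zip(pts, assignment) if a == c])
                     for c in range(2)]
    return assignment


# Unsupervised: clusters follow the dominant feature and ignore the labels.
clusters = kmeans(points)
matches = sum(1 for c, y in zip(clusters, labels) if c == y)
cluster_agreement = max(matches, len(labels) - matches) / len(labels)

# Supervised: a nearest-centroid classifier built from the labels.
class_centroids = [mean([p for p, y in zip(points, labels) if y == c])
                   for c in (0, 1)]
predictions = [nearest(p, class_centroids) for p in points]
supervised_accuracy = sum(1 for p, y in zip(predictions, labels) if p == y) / len(labels)

print("clustering agreement with labels:", cluster_agreement)
print("supervised training accuracy:", supervised_accuracy)
```

On this data the clusters split along the first feature, so they agree with the labels no better than chance, while the label-aware classifier separates the classes.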

Common Mistakes

Supervised learning without sufficient labelled data — model learns noise
Using clustering when supervised labels are available — worse results
Ignoring class imbalance in supervised classification
Not considering self-supervised approaches when labelling is expensive
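The class-imbalance mistake in the list above is easy to demonstrate. The sketch below assumes a made-up dataset with 5 positives in 100 samples and a degenerate model that always predicts the majority class; `accuracy` and `recall` are small helpers written for this example.

```python
# Assumed toy data: 5 fraud cases (1) among 100 transactions.
y_true = [1] * 5 + [0] * 95
# A "model" that ignores the minority class entirely.
y_pred = [0] * 100


def accuracy(truth, pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(truth, pred) if t == p) / len(truth)


def recall(truth, pred, positive=1):
    """Fraction of true positives that the model actually found."""
    tp = sum(1 for t, p in zip(truth, pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(truth, pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


print("accuracy:", accuracy(y_true, y_pred))
print("recall:", recall(y_true, y_pred))
```

Accuracy looks strong because the majority class dominates, yet recall shows that no positive case is ever detected, which is why imbalanced problems need metrics beyond accuracy.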

Code Examples

✗ Vulnerable

// Goal: detect fraud (labelled historical data exists)
// Wrong choice: unsupervised clustering (ignores labels)
// Should use: supervised binary classification with fraud labels

✓ Fixed

// Matching ML type to problem:
// Churn prediction (labelled) → supervised: logistic regression
// Customer segments (no predefined groups) → unsupervised: K-means
// Optimise recommendations (engagement signal) → reinforcement: bandit
// PHP code completion (large PHP codebase) → self-supervised: next-token
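The "reinforcement: bandit" pairing above can be sketched as an epsilon-greedy multi-armed bandit. Everything here is an assumption for illustration: the `run_bandit` name, the simulated click rates, and the epsilon and step values. A real recommender would observe engagement rather than simulate it.

```python
import random


def run_bandit(click_rates, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over simulated recommendation slots.

    click_rates are hidden from the agent; it only sees 0/1 rewards.
    """
    rng = random.Random(seed)
    counts = [0] * len(click_rates)    # pulls per arm
    values = [0.0] * len(click_rates)  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(click_rates))  # explore
        else:
            arm = max(range(len(values)), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < click_rates[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values


counts, values = run_bandit([0.1, 0.3, 0.8])
best_arm = max(range(len(values)), key=values.__getitem__)
print("pulls per arm:", counts)
print("estimated best arm:", best_arm)
```

No labelled dataset is involved: the agent learns which option to favour purely from the reward signal, which is what distinguishes this from the supervised churn example.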

Tags

ai ml

Added 16 Mar 2026

Edited 22 Mar 2026

Curated in Warsaw under one editorial standard. 1,506 terms, single voice.


⚡ DEV INTEL Tools & Severity

🔵 Info ⚙ Fix effort: Low

⚡ Quick Fix

For PHP developers integrating AI: supervised learning (classification, prediction from labelled data) is what most ML APIs provide; you consume the model output via the API rather than training the model yourself.

📦 Applies To

any, web, cli

🔗 Prerequisites

Large Language Models (LLMs), AI Agents & Tool Use, Neural Networks — Conceptual Overview

🔍 Detection Hints

Building a custom ML model when a pre-trained API (OpenAI, Claude) would suffice; misunderstanding supervised vs unsupervised for use case selection

Auto-detectable: ✗ No

⚠ Related Problems

Large Language Models (LLMs), AI Agents & Tool Use, Embeddings

🤖 AI Agent

Confidence: Low · False Positives: High · ✗ Manual fix · Fix: High · Context: File

References

https://scikit-learn.org/stable/user_guide.html