Using AI APIs in PHP
Also Known As
TL;DR
Explanation
PHP applications call AI APIs over HTTP; the openai-php/client and anthropic-sdk-php packages provide type-safe wrappers. Key patterns: text generation (completion), structured extraction (function calling/JSON mode), classification, embedding generation, and streaming responses. Key operational concerns: rate limiting, retry logic, cost management (token counting), and timeout handling. Because PHP's request model is synchronous, slow AI calls (often 2-30 seconds) must not block the web request; offload them to job queues.
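A minimal sketch of a single call via openai-php/client, assuming the package is installed alongside Guzzle. The key comes from an environment variable, the 30-second timeout and the model name are illustrative choices, not recommendations:

// Minimal sketch using openai-php/client: key from env, explicit HTTP timeout.
// Assumes `composer require openai-php/client guzzlehttp/guzzle`.
$client = OpenAI::factory()
    ->withApiKey(getenv('OPENAI_API_KEY'))  // never hardcode the key
    ->withHttpClient(new \GuzzleHttp\Client(['timeout' => 30]))  // cap the wait; LLM calls are slow
    ->make();

$result = $client->chat()->create([
    'model' => 'gpt-4o-mini',  // illustrative model name
    'messages' => [['role' => 'user', 'content' => 'Summarise this article in two sentences.']],
]);

echo $result->choices[0]->message->content;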
Diagram
flowchart LR
  PHP[PHP Application] --> SDK[Anthropic/OpenAI SDK<br/>or HTTP client]
  SDK -->|API call| LLM[LLM API]
  LLM -->|response| PARSE[Parse response]
  subgraph Patterns
    SYNC[Synchronous<br/>await full response]
    STREAM[Streaming<br/>chunk-by-chunk output]
    TOOL[Tool use<br/>function calling]
    RAG[RAG<br/>inject context from DB]
  end
  subgraph Caching
    PROMPT_CACHE[Cache identical prompts<br/>save cost and latency]
    SEMANTIC[Semantic cache<br/>similar prompts hit cache]
  end
  style SDK fill:#1f6feb,color:#fff
  style STREAM fill:#238636,color:#fff
  style PROMPT_CACHE fill:#d29922,color:#fff
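Streaming (highlighted in the diagram) trades one long wait for incremental output. A minimal sketch, assuming openai-php/client's streamed chat interface and reusing the $client built earlier; $prompt is a placeholder, and whether chunks actually reach the browser immediately depends on your server's output buffering:

// Streaming sketch: emit tokens as they arrive instead of waiting for the full response.
$stream = $client->chat()->createStreamed([
    'model' => 'gpt-4o-mini',
    'messages' => [['role' => 'user', 'content' => $prompt]],
]);

foreach ($stream as $response) {
    echo $response->choices[0]->delta->content ?? '';  // partial token(s); may be null
    flush();  // push the chunk to the client (server config permitting)
}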
Watch Out
Common Misconception
That an LLM API call behaves like any other REST API call: make a synchronous request in the web handler, get a response, render it. That mental model works for typical sub-200ms APIs and fails for LLM calls that take 2-30 seconds.
Why It Matters
A competent PHP developer with no AI integration experience will almost certainly build the first version synchronously, which leads directly to production problems: request timeouts, blocked PHP-FPM workers, poor UX, and surprise API bills once usage grows.
Common Mistakes
- Synchronous LLM calls in web request handlers — 10-second AI calls time out or block workers.
- No token budget — uncapped prompts with large context windows generate unexpected API costs.
- Not caching identical prompts: the same question asked repeatedly incurs full cost each time (see the caching sketch after this list).
- No fallback when API is unavailable — AI features should degrade gracefully, not cause 500 errors.
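A minimal caching sketch, assuming Laravel's Cache facade: hash the model name plus the full prompt into a cache key and reuse the stored completion for repeat questions. The key scheme and one-day TTL are assumptions to tune:

// Cache identical prompts: a hash of model + prompt is the cache key.
use Illuminate\Support\Facades\Cache;

function cachedCompletion($client, string $prompt): string {
    $key = 'llm:' . hash('sha256', 'gpt-4o-mini|' . $prompt);

    return Cache::remember($key, now()->addDay(), function () use ($client, $prompt) {
        $result = $client->chat()->create([
            'model' => 'gpt-4o-mini',
            'messages' => [['role' => 'user', 'content' => $prompt]],
        ]);
        return $result->choices[0]->message->content;
    });
}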
Avoid When
- Do not call LLM APIs synchronously inside a web request handler — it ties up PHP-FPM workers for the full response duration.
- Avoid sending sensitive PII or credentials in prompts; treat every API call as potentially logged by the provider (see the redaction sketch after this list).
- Do not trust LLM-generated code or SQL without review — use it as a draft, not as production-ready output.
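A crude redaction pass before the prompt leaves your infrastructure, as referenced above. These regexes are illustrative, not exhaustive; real PII detection needs a dedicated tool:

// Illustrative redaction sketch: strip obvious emails and card-like numbers
// before the text is sent to a third-party API. Not exhaustive.
function redactForPrompt(string $text): string {
    $patterns = [
        '/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/' => '[EMAIL]',
        '/\b(?:\d[ -]?){13,16}\b/'                          => '[CARD]',
    ];
    return preg_replace(array_keys($patterns), array_values($patterns), $text);
}

$safe = redactForPrompt($userText);  // $userText: hypothetical untrusted input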
When To Use
- Dispatch LLM API calls to a queue worker — responses take 2–30 seconds and must not block a synchronous web request.
- Cache LLM responses for identical or near-identical prompts to reduce cost and latency on repeated queries.
- Validate and sanitise LLM output before using it in SQL queries, HTML output, or file operations (see the sketch after this list).
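A sketch of that validation step, assuming the model was instructed to return JSON; the expected 'summary' field is a hypothetical shape for illustration. Decode, check the structure, and refuse free-form output rather than trusting it:

// Validate LLM output before using it: expect strict JSON, reject anything else.
function parseLlmJson(string $raw): array {
    $data = json_decode($raw, true);
    if (!is_array($data) || !isset($data['summary']) || !is_string($data['summary'])) {
        throw new \RuntimeException('LLM returned unexpected output; refusing to use it');
    }
    // Escape before rendering; never interpolate LLM text into SQL or HTML raw.
    $data['summary'] = htmlspecialchars($data['summary'], ENT_QUOTES);
    return $data;
}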
Code Examples
// Synchronous in web request: blocks a PHP-FPM worker for the full call.
class ArticleController {
    public function summarise(Request $req): JsonResponse {
        // openai-php/client exposes resources as methods: chat()->create(...)
        $result = $this->openai->chat()->create([
            'model' => 'gpt-4',
            'messages' => [['role' => 'user', 'content' => $req->input('article')]],
        ]); // Blocks for 5-15 seconds; the worker is unavailable for other requests
        return response()->json(['summary' => $result->choices[0]->message->content]);
    }
}
// Queue-based: returns immediately, the job processes the call asynchronously.
class ArticleController {
    public function summarise(Request $req): JsonResponse {
        $jobId = (string) Str::uuid();
        SummariseArticleJob::dispatch($req->input('article'), $jobId);
        return response()->json(['job_id' => $jobId, 'status' => 'processing']);
    }
}
// Client polls GET /api/jobs/{jobId} or receives a push over a websocket
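A sketch of what SummariseArticleJob might look like, assuming Laravel's queue system: bounded retries with backoff, the result stored in the cache under the job id for the polling endpoint, and a recorded failure instead of a 500. The retry counts, TTLs, and model name are assumptions:

// Hypothetical job sketch: retries with backoff, result stored for polling.
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Cache;

class SummariseArticleJob implements ShouldQueue {
    use Dispatchable, InteractsWithQueue, Queueable;

    public int $tries = 3;                 // bounded retries on transient API errors
    public array $backoff = [10, 30, 60];  // seconds to wait between attempts

    public function __construct(
        private string $article,
        private string $jobId,
    ) {}

    public function handle(): void {
        $client = OpenAI::client(getenv('OPENAI_API_KEY'));
        $result = $client->chat()->create([
            'model' => 'gpt-4o-mini',
            'messages' => [['role' => 'user', 'content' => 'Summarise: ' . $this->article]],
        ]);
        Cache::put('job:' . $this->jobId, [
            'status'  => 'done',
            'summary' => $result->choices[0]->message->content,
        ], now()->addHour());
    }

    public function failed(\Throwable $e): void {
        // Degrade gracefully: record the failure so the poller can show a fallback.
        Cache::put('job:' . $this->jobId, ['status' => 'failed'], now()->addHour());
    }
}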