Distributed Tracing
debt(d7/e5/b5/t5)
Closest to 'only careful code review or runtime testing' (d7). OpenTelemetry tooling can verify traces exist, but detecting broken trace chains (missing header propagation) or missing span attributes typically requires runtime testing or manual review of inter-service calls. The code_pattern regex (traceparent|X-B3-TraceId) can find some usages, but won't catch omissions.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix involves adding OpenTelemetry SDK, propagating headers in ALL HTTP/queue calls, and adding custom attributes. This touches multiple integration points across the codebase—HTTP clients, queue producers/consumers, and potentially middleware layers—but doesn't require architectural rework.
Closest to 'persistent productivity tax' (b5). Applies to web, cli, and queue-worker contexts, meaning every service interaction point must maintain trace propagation. Once adopted, all new services and communication paths must include tracing instrumentation. However, it's additive rather than architectural—the system still functions without it, unlike a core auth system.
Closest to 'notable trap' (t5). The misconception field explicitly states developers wrongly believe tracing replaces logging. This is a documented gotcha that most devs eventually learn—traces show flow and timing, logs show detailed events. Additionally, the 100% sampling trap and missing header propagation are common pitfalls that contradict intuitive expectations.
TL;DR
Explanation
A trace represents one request journey. Each service adds a span (operation + timing). Spans link via trace ID + parent span ID propagated in headers (W3C Trace Context: traceparent header). Tools: Jaeger, Zipkin, Datadog APM, AWS X-Ray, Honeycomb. OpenTelemetry: the standard SDK for instrumentation. PHP: opentelemetry-php SDK, Laravel/Symfony integration. Key use: find where in a multi-service request the latency is — which service is slow, which DB query takes 2s. Without tracing, cross-service debugging requires correlating logs by timestamp — fragile and slow.
Common Misconception
Why It Matters
Common Mistakes
- Not propagating trace headers between services — breaks the trace chain.
- 100% sampling in production — high volume, high cost. Use sampling.
- Not adding custom attributes to spans — spans without context are hard to use.
Code Examples
// No trace propagation — broken traces:
$response = $httpClient->post('/api/orders', $data);
// No traceparent header passed — downstream sees a new root trace
// Propagate W3C trace context:
$response = $httpClient->post('/api/orders', $data, [
'headers' => [
'traceparent' => $tracer->getCurrentSpan()->getContext()->getTraceId(),
]
]);
// OpenTelemetry PHP:
$span = $tracer->spanBuilder('processOrder')->startSpan();
$span->setAttribute('order.id', $orderId);
try { processOrder(); }
finally { $span->end(); }