
Error Recovery Patterns

general Intermediate

Also Known As

fault recovery, failure recovery strategies, resilience patterns, graceful degradation

TL;DR

Design strategies for gracefully handling failures and restoring system functionality without data loss or user disruption.

Explanation

Error recovery patterns are architectural and code-level strategies that allow systems to detect, handle, and recover from failures while preserving data integrity and user experience. Key patterns include:

  • Retry with exponential backoff - retry transient failures with increasing delays.
  • Circuit breaker - stop calling failing services to prevent cascading failures.
  • Fallback - provide degraded functionality when the primary path fails.
  • Compensation - undo partial operations on failure.
  • Checkpoint/restart - save progress so work can resume after a crash.

Recovery differs from error handling: handling catches the exception, recovery restores the system to a consistent state. Design for recovery from the start, because retrofitting is expensive. Consider idempotency (operations are safe to retry), observability (you know when recovery happened), and graceful degradation (partial functionality beats a total outage). Recovery patterns are especially critical in distributed systems, where network partitions, service unavailability, and partial failures are routine rather than exceptional.
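As a minimal sketch of the "increasing delays" idea behind retry with exponential backoff, the helper below computes a capped delay and can optionally add random jitter. The function name backoffDelayMs, the 100 ms base, and the 10 s cap are assumptions chosen for illustration, not part of any library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: delay in milliseconds before retry number $attempt (1-based).
 * Without jitter the delay doubles on each attempt (base*2, base*4, ...) up to $capMs.
 * With jitter a random delay between 0 and that value is returned, so clients
 * that failed at the same moment do not all retry at the same moment.
 */
function backoffDelayMs(int $attempt, int $baseMs = 100, int $capMs = 10000, bool $jitter = false): int
{
    // Clamp the exponent so the multiplication stays small on 64-bit PHP.
    $exponent = min(max($attempt, 1), 30);
    $delay = (int) min($baseMs * (2 ** $exponent), $capMs);

    return $jitter ? random_int(0, $delay) : $delay;
}

// Print the deterministic schedule for a few attempts.
foreach ([1, 2, 3, 10] as $attempt) {
    echo $attempt, ': ', backoffDelayMs($attempt), " ms\n";
}
```

With the defaults and no jitter, the first three delays are 200 ms, 400 ms, and 800 ms, and later attempts are capped at 10 s. Jitter is a common addition when many clients retry against the same service, but whether it helps depends on the workload.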

Common Misconception

"Error recovery means catching all exceptions and logging them." In fact, true recovery restores the system to a consistent state where operations can continue; it does not merely acknowledge that something went wrong.
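To make the distinction concrete, here is a small sketch contrasting a handler that only logs with one that restores a consistent state. The Ledger class, its method names, and the simulated failure are hypothetical and exist only for this illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory ledger used only to illustrate handling vs recovery.
final class Ledger
{
    /** @var array<string,int> balances in cents */
    public array $balances = ['alice' => 100, 'bob' => 0];

    // Handling only: the exception is caught and logged, but the debit stays applied.
    public function transferLoggedOnly(string $from, string $to, int $amount, bool $failCredit): void
    {
        try {
            $this->move($from, $to, $amount, $failCredit);
        } catch (RuntimeException $e) {
            error_log('transfer failed: ' . $e->getMessage());
        }
    }

    // Recovery: a snapshot taken before the operation is restored on failure.
    public function transferWithRecovery(string $from, string $to, int $amount, bool $failCredit): bool
    {
        $snapshot = $this->balances; // PHP arrays are copied by value
        try {
            $this->move($from, $to, $amount, $failCredit);
            return true;
        } catch (RuntimeException $e) {
            $this->balances = $snapshot; // back to the last consistent state
            error_log('transfer rolled back: ' . $e->getMessage());
            return false;
        }
    }

    private function move(string $from, string $to, int $amount, bool $failCredit): void
    {
        $this->balances[$from] -= $amount; // step 1: debit
        if ($failCredit) {
            throw new RuntimeException('credit step unavailable'); // simulated failure
        }
        $this->balances[$to] += $amount;   // step 2: credit
    }
}
```

After a failed transferLoggedOnly call the money has left one account without arriving in the other, even though the error was "handled". After a failed transferWithRecovery call the balances are unchanged and the caller is told the operation did not happen.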

Why It Matters

Systems without recovery patterns turn transient failures into permanent outages - a brief network hiccup becomes a corrupted database state or lost customer order that requires manual intervention to fix.

Common Mistakes

  • Retrying without exponential backoff - hammering a struggling service makes recovery harder for everyone.
  • No idempotency in retry logic - retrying a non-idempotent operation can duplicate side effects like payments or emails.
  • Swallowing exceptions without restoring state - the error is hidden but the system remains in an inconsistent state.
  • Missing compensation logic for multi-step operations - partial failure leaves data spread across services in conflicting states.
  • Infinite retry loops without circuit breakers - a permanently failed dependency exhausts resources retrying forever.
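The idempotency mistake above can be guarded against with an idempotency key: the caller sends the same key on every retry, and the receiver performs the side effect at most once per key. The sketch below is a minimal in-memory version; the IdempotentCharger class and the charge-id format are hypothetical, and a real system would keep the keys in a durable store.

```php
<?php
declare(strict_types=1);

// Hypothetical charger that remembers which idempotency keys it has already served.
final class IdempotentCharger
{
    /** @var array<string,string> idempotency key => charge id already issued */
    private array $completed = [];

    /** Number of times the side effect actually ran (for demonstration). */
    public int $chargesMade = 0;

    public function charge(string $idempotencyKey, int $amountCents): string
    {
        // A retry with a known key returns the earlier result instead of charging again.
        if (isset($this->completed[$idempotencyKey])) {
            return $this->completed[$idempotencyKey];
        }

        // The real side effect (charging $amountCents) would happen here.
        $this->chargesMade++;
        $chargeId = 'ch_' . $this->chargesMade;
        $this->completed[$idempotencyKey] = $chargeId;

        return $chargeId;
    }
}

$charger = new IdempotentCharger();
$first = $charger->charge('order-42', 1999);
$retry = $charger->charge('order-42', 1999); // e.g. the client timed out and retried
```

Both calls return the same charge id and the side effect runs once, so wrapping this call in retry logic no longer risks a duplicate payment.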

Avoid When

  • Simple CRUD operations where database transactions provide atomicity.
  • Fast-fail scenarios where immediate error feedback is more valuable than retry.
  • Operations where the cost of retry exceeds the cost of failure.

When To Use

  • Multi-step operations where partial failure leaves inconsistent state.
  • External service calls that may fail transiently due to network or load.
  • Long-running processes that should survive restarts.
  • Financial or order processing where correctness is more important than availability.
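For the long-running case, a checkpoint/restart sketch might look like the following. It stores the index of the next unprocessed item in a JSON file, written to a temporary file and then renamed so that a crash mid-write does not leave a half-written checkpoint. The function names, the file format, and the $crashAt parameter (which only simulates a crash) are assumptions made for this example.

```php
<?php
declare(strict_types=1);

// Returns the index of the next item to process, or 0 if no checkpoint exists.
function loadCheckpoint(string $path): int
{
    if (!is_file($path)) {
        return 0;
    }
    $data = json_decode((string) file_get_contents($path), true);

    return is_array($data) && isset($data['next']) ? (int) $data['next'] : 0;
}

// Writes the checkpoint via a temp file plus rename, so readers never see a partial file
// (rename replaces the target atomically on POSIX systems when both paths share a filesystem).
function saveCheckpoint(string $path, int $next): void
{
    $tmp = $path . '.tmp';
    file_put_contents($tmp, json_encode(['next' => $next]));
    rename($tmp, $path);
}

/**
 * Processes $items starting from the saved checkpoint.
 * $crashAt simulates a crash at that index for demonstration purposes.
 *
 * @return array items handled during this run
 */
function processBatch(array $items, string $checkpointPath, ?int $crashAt = null): array
{
    $handled = [];
    for ($i = loadCheckpoint($checkpointPath); $i < count($items); $i++) {
        if ($i === $crashAt) {
            throw new RuntimeException("simulated crash at item $i");
        }
        $handled[] = $items[$i];                  // the real work would happen here
        saveCheckpoint($checkpointPath, $i + 1);  // record progress after each item
    }

    return $handled;
}
```

The work and the checkpoint write are not one atomic step, so an item can be processed twice if the process dies between them. That is why checkpointing is usually paired with idempotent work.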

Code Examples

✗ Vulnerable
// No recovery - partial failure leaves inconsistent state
function processOrder($order) {
    $this->inventory->reserve($order->items);  // Step 1: succeeds
    $this->payment->charge($order->total);      // Step 2: fails!
    // Inventory is reserved but payment failed
    // No cleanup, no retry, customer stuck, stock locked
    $this->shipping->schedule($order);          // Never reached
}

// Retry without backoff - makes outage worse
function callExternalApi($data) {
    while (true) {
        try {
            return $this->api->send($data);
        } catch (Exception $e) {
            // Immediate retry - floods struggling service
            continue;
        }
    }
}

✓ Fixed
// Recovery pattern: compensation on failure
function processOrder($order): OrderResult {
    $reservationId = null;
    $paymentId = null;
    
    try {
        $reservationId = $this->inventory->reserve($order->items);
        $paymentId = $this->payment->charge($order->total);
        $this->shipping->schedule($order);
        return OrderResult::success($order->id);
    } catch (PaymentException $e) {
        // Compensate: release inventory reservation
        if ($reservationId) {
            $this->inventory->release($reservationId);
        }
        return OrderResult::failed('Payment declined');
    } catch (ShippingException $e) {
        // Compensate: refund payment and release inventory
        if ($paymentId) {
            $this->payment->refund($paymentId);
        }
        if ($reservationId) {
            $this->inventory->release($reservationId);
        }
        return OrderResult::failed('Shipping unavailable');
    }
}

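// Retry with exponential backoff and circuit breaker
// (a class method: $this->circuitBreaker and $this->fallback are collaborators)
function callWithRecovery(callable $operation, int $maxRetries = 3): mixed {
    if ($this->circuitBreaker->isOpen()) {
        return $this->fallback->execute();
    }

    for ($attempt = 1; $attempt <= $maxRetries; $attempt++) {
        try {
            $result = $operation();
            $this->circuitBreaker->recordSuccess();
            return $result;
        } catch (TransientException $e) {
            if ($attempt === $maxRetries) {
                break; // Out of attempts - do not sleep before giving up
            }
            $delay = min(100 * 2 ** $attempt, 10000); // 200ms, 400ms, 800ms... max 10s
            usleep($delay * 1000); // usleep takes microseconds
        }
    }

    // All attempts failed: count it against the breaker and degrade gracefully
    $this->circuitBreaker->recordFailure();
    return $this->fallback->execute();
}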
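
The retry example above assumes a circuit breaker object exposing isOpen(), recordSuccess(), and recordFailure(), but does not show one. The following is one possible minimal sketch of such a class. The failure threshold, the cooldown period, and the injectable clock are assumptions made for illustration; production breakers usually add an explicit half-open state and shared storage.

```php
<?php
declare(strict_types=1);

// Hypothetical circuit breaker matching the three methods used in the retry example.
final class CircuitBreaker
{
    private int $failures = 0;
    private ?float $openedAt = null;
    private int $failureThreshold;
    private float $cooldownSeconds;
    /** @var callable():float returns the current time in seconds */
    private $clock;

    public function __construct(int $failureThreshold = 3, float $cooldownSeconds = 30.0, ?callable $clock = null)
    {
        $this->failureThreshold = $failureThreshold;
        $this->cooldownSeconds = $cooldownSeconds;
        $this->clock = $clock ?? static function (): float {
            return microtime(true);
        };
    }

    public function isOpen(): bool
    {
        if ($this->openedAt === null) {
            return false; // closed: calls flow normally
        }
        // Once the cooldown has elapsed, let a trial call through.
        // A failed trial re-opens the breaker via recordFailure().
        return (($this->clock)() - $this->openedAt) < $this->cooldownSeconds;
    }

    public function recordSuccess(): void
    {
        $this->failures = 0;
        $this->openedAt = null;
    }

    public function recordFailure(): void
    {
        $this->failures++;
        if ($this->failures >= $this->failureThreshold) {
            $this->openedAt = ($this->clock)(); // trip, or re-trip after a failed trial
        }
    }
}
```

Passing the clock in as a callable keeps the class testable without real waiting. While the breaker is open, callers skip the failing dependency entirely and go straight to the fallback, which is what stops a permanently failed dependency from consuming resources.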

Added 2 May 2026
DEV INTEL Tools & Severity
🟠 High ⚙ Fix effort: Medium
⚡ Quick Fix
For any multi-step operation, identify the compensating action needed if each step fails, and implement those actions before the operation goes to production.
📦 Applies To
any, web, cli, queue-worker
🔍 Detection Hints
catch.*Exception.*\{\s*(log|return|throw)\s*[^}]*\}(?!.*finally)
Auto-detectable: ✗ No
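As a rough illustration of how the detection hint could be applied, the sketch below runs the pattern over source snippets with preg_match. The constant name, the helper name looksLikeSwallowedCatch, and the sample snippets are assumptions. Since the entry marks this check as not auto-detectable, a match is only a candidate for manual review.

```php
<?php
declare(strict_types=1);

// The detection hint from this entry, wrapped in PCRE delimiters.
const SWALLOWED_CATCH_PATTERN = '/catch.*Exception.*\{\s*(log|return|throw)\s*[^}]*\}(?!.*finally)/';

// Hypothetical helper: true when the snippet contains a catch block whose body
// starts with log/return/throw and no "finally" follows on the same line.
function looksLikeSwallowedCatch(string $source): bool
{
    return preg_match(SWALLOWED_CATCH_PATTERN, $source) === 1;
}

$swallowed   = 'try { run(); } catch (Exception $e) { return null; }';
$withFinally = 'try { run(); } catch (Exception $e) { return null; } finally { cleanup(); }';
$compensated = 'try { run(); } catch (Exception $e) { $this->rollback(); }';

foreach ([$swallowed, $withFinally, $compensated] as $snippet) {
    echo looksLikeSwallowedCatch($snippet) ? 'review: ' : 'ok:     ', $snippet, "\n";
}
```

Only the first snippet is flagged. A plain rethrow such as "{ throw $e; }" would also be flagged, which is one source of the false positives noted for this hint.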
🤖 AI Agent
Confidence: Medium · False Positives: Medium · ✗ Manual fix · Fix: High · Context: Function · Tests: Regenerate
