Why Your API Needs Resilience from Day One
The most expensive time to add resilience is after your first production incident. By then, the patterns need to be retrofitted into an already-running system, often under pressure, with incomplete test coverage. The cost is not just engineering time — it is the indirect cost of every retry storm that hammered a struggling database, every cascading failure that brought down five services when one was slow, and every user who gave up during a degraded experience.
Retry storms are a particularly insidious problem. A client calls a service that is slow. The client times out and retries. Now two requests are in flight against an already-overloaded service. Multiply this by hundreds of clients and you have turned a partial degradation into a complete outage. Without a circuit breaker to stop the retries and a rate limiter to cap the throughput, good intentions (retrying on failure) become a denial-of-service attack on your own infrastructure.
These patterns share a common property: they are mechanical. The logic of a token-bucket rate limiter or an exponential-backoff retry policy does not change between projects. What changes is the configuration: thresholds, timeouts, window sizes. This is exactly the kind of boilerplate that CrowVault automates away through its API tools.
Circuit Breakers: Fail Fast, Recover Gracefully
A circuit breaker wraps a call to a remote service and monitors its failure rate. When the failure rate crosses a threshold, the breaker opens — subsequent calls fail immediately without hitting the remote service. After a recovery timeout, the breaker enters a half-open state, allowing a single probe request through. If that probe succeeds, the breaker closes and normal operation resumes. If it fails, the breaker reopens and the recovery timer resets.
The generate_circuit_breaker tool produces a production-ready implementation with configurable thresholds and optional metrics hooks:
curl -s -X POST https://api.crowvault.ai/v1/tools/call \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "server": "api-mcp",
    "tool": "generate_circuit_breaker",
    "args": {
      "runtime": "node",
      "failureThreshold": 5,
      "recoveryTimeMs": 30000,
      "withMetrics": true
    }
  }'
The generated output is a fully typed TypeScript class that you can drop directly into your service layer:
type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

interface CircuitBreakerMetrics {
  totalCalls: number;
  successCount: number;
  failureCount: number;
  lastStateChange: Date;
  currentState: CircuitState;
}

export class CircuitBreaker {
  private state: CircuitState = 'CLOSED';
  private failureCount = 0;
  private lastFailureTime?: number;
  private metrics: CircuitBreakerMetrics;

  constructor(
    private readonly failureThreshold = 5,
    private readonly recoveryTimeMs = 30_000,
    private readonly onStateChange?: (state: CircuitState, metrics: CircuitBreakerMetrics) => void
  ) {
    this.metrics = {
      totalCalls: 0,
      successCount: 0,
      failureCount: 0,
      lastStateChange: new Date(),
      currentState: 'CLOSED',
    };
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      const elapsed = Date.now() - (this.lastFailureTime ?? 0);
      if (elapsed < this.recoveryTimeMs) {
        throw new Error(`Circuit is OPEN — retry after ${Math.ceil((this.recoveryTimeMs - elapsed) / 1000)}s`);
      }
      this.transitionTo('HALF_OPEN');
    }
    this.metrics.totalCalls++;
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  private onSuccess(): void {
    this.metrics.successCount++;
    this.failureCount = 0;
    if (this.state !== 'CLOSED') this.transitionTo('CLOSED');
  }

  private onFailure(): void {
    this.metrics.failureCount++;
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold) this.transitionTo('OPEN');
  }

  private transitionTo(next: CircuitState): void {
    this.state = next;
    this.metrics.currentState = next;
    this.metrics.lastStateChange = new Date();
    this.onStateChange?.(next, { ...this.metrics });
  }

  getMetrics(): CircuitBreakerMetrics {
    return { ...this.metrics };
  }
}
The withMetrics: true flag adds the onStateChange callback, giving you a clean integration point for Prometheus counters or structured log events whenever the breaker transitions states.
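As a sketch of that integration point, a structured-log hook passed as onStateChange might look like the following. The types are repeated here so the snippet stands alone, and the log field names are illustrative, not part of the generated output:

```typescript
type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

interface CircuitBreakerMetrics {
  totalCalls: number;
  successCount: number;
  failureCount: number;
  lastStateChange: Date;
  currentState: CircuitState;
}

// Emit one structured log line per state transition. In production this
// would feed a logger or increment a Prometheus counter instead of
// returning a string.
function logStateChange(state: CircuitState, metrics: CircuitBreakerMetrics): string {
  return JSON.stringify({
    event: 'circuit_state_change',
    state,
    failures: metrics.failureCount,
    at: metrics.lastStateChange.toISOString(),
  });
}
```

Passing a hook like this as the third constructor argument gives every OPEN, HALF_OPEN, and CLOSED transition a single, greppable log event.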
Retry Policies with Exponential Backoff
Retrying a failed request immediately is almost always the wrong choice. If a service is overloaded, an immediate retry adds more load at the worst possible moment. Exponential backoff with jitter spreads retries over time, reducing the probability that multiple clients retry in lockstep.
curl -s -X POST https://api.crowvault.ai/v1/tools/call \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "server": "api-mcp",
    "tool": "generate_retry_policy",
    "args": {
      "maxRetries": 4,
      "baseDelayMs": 200,
      "maxDelayCapMs": 8000,
      "withJitter": true,
      "retryOn": [429, 502, 503, 504]
    }
  }'
interface RetryOptions {
  maxRetries?: number;
  baseDelayMs?: number;
  maxDelayCapMs?: number;
  retryOn?: number[];
}

export async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const {
    maxRetries = 4,
    baseDelayMs = 200,
    maxDelayCapMs = 8_000,
    retryOn = [429, 502, 503, 504],
  } = options;

  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      lastError = err;
      const status = err?.status ?? err?.response?.status;
      if (attempt === maxRetries || (status && !retryOn.includes(status))) {
        throw err;
      }
      // Exponential backoff with full jitter: delay = random(0, min(cap, base * 2^attempt))
      const exponential = Math.min(maxDelayCapMs, baseDelayMs * 2 ** attempt);
      const jitter = Math.random() * exponential;
      await new Promise((r) => setTimeout(r, jitter));
    }
  }
  throw lastError;
}
Full jitter (randomising between 0 and the capped exponential value) is preferred over adding a small random offset to the full backoff: spreading each client's delay across the entire window desynchronises retries far more effectively, so the collective load on a recovering service arrives smoothly instead of in synchronised waves.
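The delay schedule is easy to inspect in isolation. This sketch uses the same parameters as the curl call above and prints the deterministic ceiling for each attempt alongside one sampled full-jitter delay:

```typescript
const baseDelayMs = 200;
const maxDelayCapMs = 8_000;

// Deterministic ceiling for each attempt: min(cap, base * 2^attempt).
function backoffCeiling(attempt: number): number {
  return Math.min(maxDelayCapMs, baseDelayMs * 2 ** attempt);
}

// Full jitter samples uniformly from [0, ceiling).
function fullJitterDelay(attempt: number): number {
  return Math.random() * backoffCeiling(attempt);
}

for (let attempt = 0; attempt <= 4; attempt++) {
  console.log(`attempt ${attempt}: ceiling ${backoffCeiling(attempt)}ms, sampled ${fullJitterDelay(attempt).toFixed(0)}ms`);
}
```

With these defaults the ceilings run 200, 400, 800, 1600, 3200 ms, hitting the 8000 ms cap from the sixth retry onward.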
Rate Limiting to Protect Your Services
Rate limiting sits at the ingress of your service and enforces a maximum request rate per client. The token-bucket algorithm is the most flexible approach: a bucket holds up to N tokens; each request consumes one; tokens refill at a steady rate. Short bursts are absorbed by the bucket depth while the average rate stays bounded.
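Before looking at the generated code, it helps to see the refill arithmetic on its own. This is a minimal, Redis-free sketch; the function name is illustrative:

```typescript
// Refill a bucket based on elapsed time, capped at its capacity.
function refill(
  tokens: number,
  lastRefillMs: number,
  nowMs: number,
  capacity: number,
  refillRatePerSec: number
): number {
  const elapsedSec = (nowMs - lastRefillMs) / 1000;
  return Math.min(capacity, tokens + elapsedSec * refillRatePerSec);
}

// 100 requests per 60s window => ~1.67 tokens/s, with a burst capacity of 20.
const rate = 100 / 60;
console.log(refill(0, 0, 6_000, 20, rate));   // a drained bucket recovers ~10 tokens in 6s
console.log(refill(15, 0, 60_000, 20, rate)); // refill never exceeds capacity
```

The burst capacity bounds how many requests can arrive back-to-back, while the refill rate bounds the sustained average.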
curl -s -X POST https://api.crowvault.ai/v1/tools/call \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "server": "api-mcp",
    "tool": "generate_rate_limiting",
    "args": {
      "algorithm": "token-bucket",
      "store": "redis",
      "requestsPerWindow": 100,
      "windowSeconds": 60,
      "burstCapacity": 20,
      "keyStrategy": "api-key"
    }
  }'
import type { Redis } from 'ioredis';
import type { Request, Response, NextFunction } from 'express';

// Redis Lua script ensures atomicity — no race conditions across instances
const TOKEN_BUCKET_SCRIPT = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refillRate = tonumber(ARGV[2]) -- tokens per second
local now = tonumber(ARGV[3])        -- unix ms

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local lastRefill = tonumber(bucket[2]) or now

local elapsed = (now - lastRefill) / 1000
tokens = math.min(capacity, tokens + elapsed * refillRate)

if tokens >= 1 then
  tokens = tokens - 1
  redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
  redis.call('EXPIRE', key, 3600)
  return 1 -- allowed
end

redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
return 0 -- denied
`;

export function tokenBucketMiddleware(
  redis: Redis,
  capacity: number,
  refillPerSecond: number
) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const key = `rate:${req.headers['x-api-key'] ?? req.ip}`;
    const allowed = Number(await redis.eval(
      TOKEN_BUCKET_SCRIPT, 1, key,
      capacity, refillPerSecond, Date.now()
    ));
    if (!allowed) {
      res.setHeader('Retry-After', Math.ceil(1 / refillPerSecond));
      return res.status(429).json({ error: 'Rate limit exceeded' });
    }
    next();
  };
}
The Lua script executes atomically in Redis, preventing race conditions in multi-instance deployments where two requests for the same key could both read the same token count before either writes back.
Combining Patterns for Production
These patterns compose into a defence-in-depth stack. The recommended order of application, from outermost to innermost, is:
- Rate limiter — reject excess requests before they consume any resources downstream.
- Circuit breaker — stop forwarding requests to services that are known to be failing.
- Retry with backoff — transparently handle transient failures for requests that get through.
- Timeout — bound the maximum time any single attempt can run.
- Bulkhead — isolate resource pools so that one saturated dependency cannot exhaust threads or connections for the rest of the system.
CrowVault has dedicated tools for all five layers. generate_timeout_config produces per-operation timeout constants with AbortController wiring. generate_bulkhead generates a semaphore-backed concurrency limiter that partitions your connection pool by downstream service. Together, these five patterns handle the vast majority of distributed-systems failure modes without requiring a service mesh.
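As an illustration of the timeout layer, here is a minimal AbortController-based wrapper of the kind generate_timeout_config might emit. The name withTimeout and its signature are assumptions for this sketch, not the tool's actual output:

```typescript
// Run an async operation with a hard deadline. The operation receives an
// AbortSignal so it can cancel in-flight work (e.g. pass it to fetch).
function withTimeout<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  ms: number
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return fn(controller.signal).finally(() => clearTimeout(timer));
}
```

Because the timeout sits innermost in the stack, each retry attempt gets its own fresh deadline, and the circuit breaker observes a clean, prompt failure rather than a hung promise.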
Each tool generates idiomatic, typed code that you own and can modify. There is no runtime dependency on CrowVault — the generated output is plain TypeScript that you paste into your repository. See the full API reference for the complete list of resilience tools, or create an account to start generating code immediately.