Resilience Package
The @hazeljs/resilience package provides fault-tolerance and resilience patterns for HazelJS microservices. It includes circuit breaker, retry, timeout, bulkhead, rate limiter, and metrics collection — all usable via decorators or programmatic API.
Purpose
In a microservices architecture, services depend on each other over the network. Networks are unreliable — services go down, become slow, or get overloaded. Without resilience patterns, a single failing service can cascade and take down your entire system. The @hazeljs/resilience package solves this by providing:
- Circuit Breaker: Stops calling a failing service before it drags everything else down
- Retry: Automatically re-attempts transient failures with configurable backoff
- Timeout: Fails fast when a service is too slow, freeing resources
- Bulkhead: Limits concurrent calls to isolate failures and prevent resource exhaustion
- Rate Limiter: Controls request throughput with token bucket and sliding window strategies
- Metrics: Tracks success/failure/latency per target, feeding into gateway canary decisions
Architecture
The package provides both a decorator API for declarative use and a programmatic API for advanced scenarios:
graph TD
    A["Your Service Method"] --> B["@WithCircuitBreaker"]
    B --> C["@WithRetry"]
    C --> D["@WithTimeout"]
    D --> E["@WithBulkhead"]
    E --> F["@WithRateLimit"]
    F --> G["Actual Call"]
    H["MetricsCollector"] --> B
    H --> I["SlidingWindow<br/>(Count / Time)"]
    style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
    style B fill:#ef4444,stroke:#f87171,stroke-width:2px,color:#fff
    style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
    style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
    style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
    style F fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
    style G fill:#6366f1,stroke:#818cf8,stroke-width:2px,color:#fff
Key Components
- CircuitBreaker: State machine (CLOSED → OPEN → HALF_OPEN) that prevents cascading failures
- RetryPolicy: Retries with fixed, linear, or exponential backoff and optional jitter
- Timeout: Promise-based timeout wrapper with cancellation
- Bulkhead: Concurrency limiter with queue support
- RateLimiter: Token bucket and sliding window strategies
- MetricsCollector: Tracks call statistics within a sliding window
- Decorators: @WithCircuitBreaker, @WithRetry, @WithTimeout, @WithBulkhead, @WithRateLimit, @Fallback
Installation
npm install @hazeljs/resilience
Quick Start
Decorator API
The decorator API lets you apply resilience patterns declaratively to any class method:
import { Injectable } from '@hazeljs/core';
import { WithCircuitBreaker, WithRetry, WithTimeout, WithBulkhead, Fallback } from '@hazeljs/resilience';
@Injectable()
class PaymentService {
@WithCircuitBreaker({
failureThreshold: 5,
slidingWindow: { type: 'count', size: 20 },
resetTimeout: 30_000,
fallback: 'processPaymentFallback',
})
@WithRetry({ maxAttempts: 3, backoff: 'exponential', baseDelay: 500 })
@WithTimeout(5000)
@WithBulkhead({ maxConcurrent: 10, maxQueue: 50 })
async processPayment(order: Order): Promise<PaymentResult> {
return await this.paymentGateway.charge(order);
}
@Fallback('processPayment')
async processPaymentFallback(order: Order): Promise<PaymentResult> {
return { status: 'queued', message: 'Payment will be processed later' };
}
}
When processPayment is called, the decorators execute in order: circuit breaker check → retry wrapper → timeout → bulkhead concurrency check → actual call. If the circuit is open, it immediately calls the fallback without touching the network.
Programmatic API
For cases where you need more control, use the classes directly:
import { CircuitBreaker, RetryPolicy, Timeout, Bulkhead, RateLimiter } from '@hazeljs/resilience';
// Circuit Breaker
const breaker = new CircuitBreaker({
failureThreshold: 5,
resetTimeout: 30000,
slidingWindow: { type: 'count', size: 20 },
});
const result = await breaker.execute(() => fetch('/api/data'));
// Listen to state changes
breaker.on('stateChange', (from, to) => {
console.log(`Circuit breaker: ${from} -> ${to}`);
});
// Retry
const retry = new RetryPolicy({
maxAttempts: 3,
backoff: 'exponential',
baseDelay: 1000,
jitter: true,
});
const data = await retry.execute(() => fetch('/api/unstable'));
// Timeout
const timeout = new Timeout(5000);
const response = await timeout.execute(() => fetch('/api/slow'));
// Bulkhead
const bulkhead = new Bulkhead({ maxConcurrent: 10, maxQueue: 50 });
const result = await bulkhead.execute(() => intensiveOperation());
// Rate Limiter
const limiter = new RateLimiter({
strategy: 'token-bucket',
max: 100,
window: 60000,
});
if (limiter.tryAcquire()) {
await handleRequest();
}
Circuit Breaker
The circuit breaker prevents cascading failures by monitoring call success rates and temporarily blocking calls to unhealthy services.
How It Works
graph LR
    A["CLOSED<br/>(Normal)"] -->|"failures >= threshold"| B["OPEN<br/>(Blocking)"]
    B -->|"reset timeout"| C["HALF_OPEN<br/>(Testing)"]
    C -->|"success >= threshold"| A
    C -->|"failure"| B
    style A fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
    style B fill:#ef4444,stroke:#f87171,stroke-width:2px,color:#fff
    style C fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
| State | Behavior |
|---|---|
| CLOSED | All calls pass through. Failures are counted in the sliding window. |
| OPEN | All calls are immediately rejected. After resetTimeout, transitions to HALF_OPEN. |
| HALF_OPEN | A limited number of trial calls are allowed. If they succeed, the breaker closes. If they fail, it opens again. |
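The three-state lifecycle in the table above can be sketched as a minimal state machine. This is an illustrative sketch only, not the package's internal implementation; the `MiniBreaker` name and injectable clock are assumptions made for the example:

```typescript
// Illustrative three-state circuit breaker (not the @hazeljs/resilience internals).
type State = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class MiniBreaker {
  private state: State = 'CLOSED';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold: number,
    private resetTimeout: number,
    private now: () => number = Date.now, // injectable clock, handy for testing
  ) {}

  getState(): State {
    // OPEN lazily transitions to HALF_OPEN once resetTimeout has elapsed
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeout) {
      this.state = 'HALF_OPEN';
    }
    return this.state;
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.getState() === 'OPEN') throw new Error('Circuit is OPEN');
    try {
      const result = await fn();
      // A success (including a trial call in HALF_OPEN) closes the circuit
      this.failures = 0;
      this.state = 'CLOSED';
      return result;
    } catch (err) {
      this.failures++;
      // A failure in HALF_OPEN, or too many failures in CLOSED, opens the circuit
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

The real `CircuitBreaker` adds sliding-window tracking, `halfOpenMaxCalls`, and events on top of this core transition logic.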
Configuration
const breaker = new CircuitBreaker({
// When to open the circuit
failureThreshold: 5, // Number of failures to trigger OPEN
failureRateThreshold: 50, // Or: percentage failure rate to trigger OPEN
// How long to wait before testing again
resetTimeout: 30000, // ms in OPEN state before trying HALF_OPEN
// Sliding window for tracking failures
slidingWindow: {
type: 'count', // 'count' or 'time'
size: 20, // Last 20 calls (count) or 20s window (time)
},
// How many trial calls in HALF_OPEN
halfOpenMaxCalls: 3,
// Custom failure detection
isFailure: (error) => {
// Only count 5xx as failures, not 4xx
return error.status >= 500;
},
});
Events
breaker.on('stateChange', (from, to) => {
console.log(`Circuit: ${from} -> ${to}`);
});
breaker.on('success', (duration) => {
console.log(`Call succeeded in ${duration}ms`);
});
breaker.on('failure', (error, duration) => {
console.log(`Call failed after ${duration}ms: ${error}`);
});
breaker.on('rejected', () => {
console.log('Call rejected — circuit is OPEN');
});
Metrics
const metrics = breaker.getMetrics();
console.log(metrics);
// {
// totalRequests: 150,
// failureCount: 12,
// failureRate: 8,
// successCount: 138,
// averageLatency: 45,
// p99Latency: 120,
// }
Circuit Breaker Registry
Manage named circuit breakers across your application:
import { CircuitBreakerRegistry } from '@hazeljs/resilience';
// Get or create a named circuit breaker
const breaker = CircuitBreakerRegistry.getOrCreate('payment-service', {
failureThreshold: 5,
resetTimeout: 30000,
});
// Get an existing one
const same = CircuitBreakerRegistry.get('payment-service');
// List all registered breakers
const all = CircuitBreakerRegistry.getAll();
Retry Policy
Automatically retries failed operations with configurable backoff strategies.
Backoff Strategies
| Strategy | Delay Pattern | Use Case |
|---|---|---|
| fixed | Same delay every time (e.g. 1s, 1s, 1s) | Simple retry with constant delay |
| linear | Delay increases linearly (e.g. 1s, 2s, 3s) | Gradually back off |
| exponential | Delay doubles each time (e.g. 1s, 2s, 4s, 8s) | Best for network calls — gives the service time to recover |
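The delay patterns in the table can be sketched as a small helper. The `backoffDelay` function is hypothetical, not part of the @hazeljs/resilience API; it only illustrates how the three strategies and jitter interact:

```typescript
// Illustrative backoff delay calculation (hypothetical helper, not the package API).
type Backoff = 'fixed' | 'linear' | 'exponential';

function backoffDelay(
  strategy: Backoff,
  attempt: number,     // 1-based attempt number
  baseDelay: number,   // starting delay in ms
  maxDelay = Infinity, // cap on the computed delay
  jitter = false,
): number {
  let delay = baseDelay; // 'fixed': same delay every time
  switch (strategy) {
    case 'linear':
      delay = baseDelay * attempt;            // 1s, 2s, 3s, ...
      break;
    case 'exponential':
      delay = baseDelay * 2 ** (attempt - 1); // 1s, 2s, 4s, 8s, ...
      break;
  }
  delay = Math.min(delay, maxDelay);
  // Full jitter: pick a random delay in [0, delay) so retries spread out
  return jitter ? Math.random() * delay : delay;
}
```

With jitter enabled, two clients that fail at the same moment retry at different times instead of hammering the recovering service together.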
Configuration
const retry = new RetryPolicy({
maxAttempts: 3, // Total attempts (including first)
backoff: 'exponential', // 'fixed' | 'linear' | 'exponential'
baseDelay: 1000, // Starting delay in ms
maxDelay: 30000, // Cap on delay
jitter: true, // Add randomness to prevent thundering herd
// Only retry on certain errors
retryOn: (error) => {
return error.code === 'ECONNREFUSED' || error.status >= 500;
},
});
const result = await retry.execute(async () => {
return await fetch('https://api.example.com/data');
});
Retry with Circuit Breaker
Combine retry with circuit breaker for maximum resilience:
const breaker = new CircuitBreaker({ failureThreshold: 5, resetTimeout: 30000 });
const retry = new RetryPolicy({ maxAttempts: 3, backoff: 'exponential', baseDelay: 500 });
// Retry wraps the circuit breaker — the breaker tracks each attempt, and once it opens, remaining retries fail fast
const result = await retry.execute(() =>
breaker.execute(() => fetch('/api/data'))
);
Timeout
Enforces a time limit on operations, freeing resources when a service is too slow.
import { Timeout, TimeoutError } from '@hazeljs/resilience';
const timeout = new Timeout(5000); // 5 seconds
try {
const result = await timeout.execute(async () => {
return await fetch('/api/slow-endpoint');
});
} catch (error) {
if (error instanceof TimeoutError) {
console.log('Request timed out after 5000ms');
}
}
Or use the standalone helper:
import { withTimeout, TimeoutError } from '@hazeljs/resilience';
const result = await withTimeout(
fetch('/api/slow-endpoint'),
5000
);
Bulkhead
Limits concurrent executions to isolate failures and prevent resource exhaustion. Excess requests are queued up to a maximum, then rejected.
import { Bulkhead, BulkheadFullError } from '@hazeljs/resilience';
const bulkhead = new Bulkhead({
maxConcurrent: 10, // Max parallel executions
maxQueue: 50, // Max queued requests
queueTimeout: 5000, // How long a queued request waits before being rejected
});
try {
const result = await bulkhead.execute(async () => {
return await processRequest();
});
} catch (error) {
if (error instanceof BulkheadFullError) {
console.log('Service overloaded — try again later');
}
}
// Check current state
console.log(bulkhead.getActiveCount()); // Current parallel executions
console.log(bulkhead.getQueueSize()); // Requests waiting in queue
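Conceptually, a bulkhead is a counting semaphore with a bounded FIFO queue. The sketch below illustrates that idea; the `MiniBulkhead` class is an assumption for the example and not the package's internals (it also omits the real `queueTimeout` handling):

```typescript
// Illustrative bulkhead: counting semaphore plus a bounded FIFO queue
// (hypothetical class, not the @hazeljs/resilience internals).
class BulkheadFullError extends Error {}

class MiniBulkhead {
  private active = 0;
  private waiters: Array<() => void> = [];

  constructor(private maxConcurrent: number, private maxQueue: number) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      if (this.waiters.length >= this.maxQueue) {
        throw new BulkheadFullError('Bulkhead and queue are full');
      }
      // Park this caller until a running execution finishes
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.active++;
    try {
      return await fn();
    } finally {
      this.active--;
      this.waiters.shift()?.(); // wake the next queued caller, if any
    }
  }
}
```

Because each finishing call wakes at most one waiter, the number of concurrent executions never exceeds maxConcurrent.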
Rate Limiter
Controls request throughput using token bucket or sliding window strategies.
Token Bucket
Allows bursts up to the bucket size, then refills at a steady rate:
import { RateLimiter } from '@hazeljs/resilience';
const limiter = new RateLimiter({
strategy: 'token-bucket',
max: 100, // Bucket size (max burst)
window: 60000, // Refill window in ms
refillRate: 10, // Tokens added per second
});
if (limiter.tryAcquire()) {
await handleRequest();
} else {
const retryAfter = limiter.getRetryAfterMs();
res.status(429).header('Retry-After', String(Math.ceil(retryAfter / 1000))).send('Too Many Requests');
}
Sliding Window
Tracks requests in a rolling time window:
const limiter = new RateLimiter({
strategy: 'sliding-window',
max: 100, // Max requests per window
window: 60000, // Window size in ms
});
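A sliding window can be sketched as a list of request timestamps that are evicted as they age out. The `MiniSlidingWindow` class is an assumption for illustration, not the package's internals:

```typescript
// Illustrative sliding-window limiter: track timestamps, evict expired ones
// (hypothetical class, not the package's RateLimiter).
class MiniSlidingWindow {
  private timestamps: number[] = [];

  constructor(
    private max: number,      // max requests per window
    private windowMs: number, // window size in ms
    private now: () => number = Date.now,
  ) {}

  tryAcquire(): boolean {
    const cutoff = this.now() - this.windowMs;
    // Evict requests that have slid out of the window
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.max) return false;
    this.timestamps.push(this.now());
    return true;
  }
}
```

Unlike the token bucket, this strategy allows no extra burst headroom: exactly max requests are admitted per rolling window, which gives smoother but stricter limiting.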
Metrics Collection
The MetricsCollector tracks call statistics within a sliding window, providing insights into service health:
import { MetricsCollector, MetricsRegistry } from '@hazeljs/resilience';
// Create a collector with a 60-second window
const collector = new MetricsCollector(60000);
// Record outcomes
collector.recordSuccess(45); // 45ms latency
collector.recordFailure(120, 'timeout');
// Get aggregated metrics
const snapshot = collector.getSnapshot();
console.log(snapshot);
// {
// totalRequests: 1500,
// successCount: 1480,
// failureCount: 20,
// failureRate: 1.33,
// averageLatency: 42,
// p50Latency: 35,
// p95Latency: 95,
// p99Latency: 150,
// }
// Use the registry for named metrics
MetricsRegistry.getOrCreate('user-service');
MetricsRegistry.getOrCreate('order-service');
These metrics feed directly into the @hazeljs/gateway canary deployment engine for automated promotion and rollback decisions.
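The percentile fields in the snapshot (p50, p95, p99) can be computed from a window of latency samples with the nearest-rank method. The helper below is an illustrative assumption, not how MetricsCollector necessarily implements it:

```typescript
// Illustrative nearest-rank percentile over latency samples
// (hypothetical helper, not the MetricsCollector internals).
function percentile(latencies: number[], p: number): number {
  if (latencies.length === 0) return 0;
  const sorted = [...latencies].sort((a, b) => a - b);
  // Nearest-rank: ceil(p/100 * N) gives the 1-based rank of the answer
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```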
Combining Patterns
The real power comes from composing patterns together. Here's a production-ready service call:
Decorator Composition
@Injectable()
class OrderService {
@WithCircuitBreaker({
failureThreshold: 5,
resetTimeout: 30000,
fallback: 'createOrderFallback',
})
@WithRetry({ maxAttempts: 3, backoff: 'exponential', baseDelay: 500 })
@WithTimeout(5000)
@WithBulkhead({ maxConcurrent: 20, maxQueue: 100 })
async createOrder(data: OrderData): Promise<Order> {
return await this.httpClient.post('/orders', data);
}
@Fallback('createOrder')
async createOrderFallback(data: OrderData): Promise<Order> {
// Queue for later processing
await this.queue.add('create-order', data);
return { id: 'pending', status: 'queued' };
}
}
Programmatic Composition
const breaker = new CircuitBreaker({ failureThreshold: 5, resetTimeout: 30000 });
const retry = new RetryPolicy({ maxAttempts: 3, backoff: 'exponential', baseDelay: 500 });
const timeout = new Timeout(5000);
const bulkhead = new Bulkhead({ maxConcurrent: 20, maxQueue: 100 });
async function resilientCall<T>(fn: () => Promise<T>): Promise<T> {
return bulkhead.execute(() =>
retry.execute(() =>
breaker.execute(() =>
timeout.execute(fn)
)
)
);
}
const result = await resilientCall(() => fetch('/api/orders'));
Integration with @hazeljs/gateway
The @hazeljs/gateway package uses @hazeljs/resilience internally for per-route protection. When you configure circuit breakers or rate limits in your gateway config, they use the same classes:
// gateway.config.ts
const gatewayConfig = () => ({
gateway: {
resilience: {
defaultCircuitBreaker: {
failureThreshold: parseInt(process.env.GATEWAY_CB_THRESHOLD || '5'),
resetTimeout: parseInt(process.env.GATEWAY_CB_RESET_TIMEOUT || '30000'),
},
defaultTimeout: parseInt(process.env.GATEWAY_DEFAULT_TIMEOUT || '5000'),
},
routes: [
{
path: '/api/users/**',
serviceName: 'user-service',
circuitBreaker: { failureThreshold: 10 },
rateLimit: { strategy: 'sliding-window', max: 100, window: 60000 },
},
],
},
});
Best Practices
- Order matters for decorators: Apply @WithCircuitBreaker outermost, then @WithRetry, then @WithTimeout. This ensures retries happen inside the circuit breaker's tracking.
- Set realistic timeouts: A 5-second timeout for a service that normally responds in 50ms means something is very wrong. Set timeouts close to your p99 latency plus a buffer.
- Use exponential backoff with jitter: Prevents the thundering herd problem where all retries hit the recovering service at the same time.
- Configure circuit breaker thresholds per service: A critical payment service might open after 3 failures, while a notification service can tolerate 10.
- Monitor metrics: Use MetricsCollector to track service health and feed data into dashboards or alerting systems.
- Always define fallbacks: For user-facing operations, define a fallback that returns a degraded response instead of an error.
- Use bulkheads for shared resources: If your service calls multiple downstream services, use separate bulkheads so one slow service cannot consume your entire connection pool.
What's Next?
- Learn about Gateway Package for intelligent API routing with canary deployments
- Explore Discovery Package for service registration and discovery
- Check out Config Package for managing resilience settings via environment variables