Reliability

Error handling, retries, circuit breakers, failure modes

Reliability in PURISTA is built on typed errors, automatic retries, bounded failure handling, and graceful shutdown. The framework separates expected errors (HandledError) from unexpected failures (UnhandledError) and provides clear semantics for each.

Error types

PURISTA provides two error types:

Type	Meaning	HTTP Status	Log Level	Retries
HandledError	Expected business error	Configurable (404, 409, etc.)	Debug	No — not retried
UnhandledError	Unexpected failure	500	Error	Yes — retried up to `maxAttempts`

HandledError signals an expected business condition (user not found, conflict, forbidden). These errors are not retried, because a retry would produce the same result. UnhandledError signals an unexpected infrastructure or code failure where a retry may succeed.
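The retry rule can be illustrated with a small, framework-independent sketch. The two classes below are local stand-ins so the snippet runs on its own (the real ones are imported from '@purista/core'), and `shouldRetry` is a hypothetical helper, not a PURISTA function.

```typescript
// Local stand-ins for illustration only; use the classes from '@purista/core' in real code.
class HandledError extends Error {
  readonly status: number
  constructor(status: number, message: string) {
    super(message)
    this.status = status
  }
}

class UnhandledError extends Error {
  readonly status: number
  constructor(status: number, message: string) {
    super(message)
    this.status = status
  }
}

// Hypothetical helper: decides whether another attempt is worthwhile.
function shouldRetry(error: unknown, attempt: number, maxAttempts: number): boolean {
  if (error instanceof HandledError) {
    return false // expected business error: a retry would give the same result
  }
  return attempt < maxAttempts // anything else may be transient, within the attempt budget
}

console.log(shouldRetry(new HandledError(404, 'User not found'), 1, 5)) // false
console.log(shouldRetry(new UnhandledError(500, 'Connection reset'), 1, 5)) // true
console.log(shouldRetry(new UnhandledError(500, 'Connection reset'), 5, 5)) // false
```

The last call returns false because the fifth attempt has used up the budget of five.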

HandledError

Use for expected business conditions:

import { HandledError, StatusCode } from '@purista/core'

.setCommandFunction(async function (context, payload) {
  const user = await context.resources.db.findById(payload.userId)
  if (!user) {
    throw new HandledError(StatusCode.NotFound, 'User not found')
  }
  return user
})

The HTTP adapter returns RFC 9457 Problem Details:

{
  "type": "about:blank",
  "title": "Not Found",
  "status": 404,
  "detail": "User not found"
}
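As a rough sketch of how such a body can be assembled, the hypothetical helper below maps a status code and a detail message to the core RFC 9457 members. The name `toProblemDetails` and the `reasonPhrases` table are assumptions made for illustration; they do not describe how the PURISTA HTTP adapter is implemented.

```typescript
// Core RFC 9457 members used in the example above.
interface ProblemDetails {
  type: string
  title: string
  status: number
  detail?: string
}

// Hypothetical lookup of HTTP reason phrases; extend as needed.
const reasonPhrases: Record<number, string> = {
  400: 'Bad Request',
  403: 'Forbidden',
  404: 'Not Found',
  409: 'Conflict',
  500: 'Internal Server Error',
}

// Hypothetical helper, not part of PURISTA.
function toProblemDetails(status: number, detail?: string): ProblemDetails {
  return {
    type: 'about:blank', // RFC 9457 default when no specific problem type applies
    title: reasonPhrases[status] ?? 'Unknown Error',
    status,
    ...(detail === undefined ? {} : { detail }), // omit the member when no detail is given
  }
}

console.log(JSON.stringify(toProblemDetails(404, 'User not found'), null, 2))
```

With `type` set to `about:blank`, RFC 9457 recommends that `title` match the HTTP status phrase, which is why the title is derived from the status code.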

UnhandledError

Use for unexpected failures:

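import { HandledError, StatusCode, UnhandledError } from '@purista/core'

.setCommandFunction(async function (context, payload) {
  try {
    const result = await context.resources.db.create(payload)
    return result
  } catch (err) {
    // isConstraintViolation is assumed to be an application-specific helper
    if (isConstraintViolation(err)) {
      // expected business condition: reported as 409 and not retried
      throw new HandledError(StatusCode.Conflict, 'User already exists')
    }
    // anything else is unexpected: wrap it so it is logged as an error and may be retried
    throw UnhandledError.from(err, StatusCode.InternalServerError)
  }
})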

Validation errors

Input validation failures are automatically converted to HandledError with status 400 Bad Request:

{
  "type": "about:blank",
  "title": "Bad Request",
  "status": 400,
  "detail": "Bad Request",
  "errors": [
    {
      "code": "invalid_type",
      "expected": "string",
      "received": "number",
      "path": ["name"],
      "message": "Expected string, received number"
    }
  ]
}

Output validation failures return 500 Internal Server Error, and the response does not expose internal schema details.
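To make the shape of the 400 response concrete, here is a minimal sketch that builds it by hand for a single field. `validateName` and `toBadRequest` are hypothetical helpers; in PURISTA the conversion happens automatically from the configured input schema, so this only illustrates the resulting structure.

```typescript
// Issue shape taken from the example response above.
interface ValidationIssue {
  code: string
  expected: string
  received: string
  path: string[]
  message: string
}

// Hypothetical check for one required string field called "name".
function validateName(input: Record<string, unknown>): ValidationIssue[] {
  const received = typeof input['name']
  if (received === 'string') {
    return []
  }
  return [
    {
      code: 'invalid_type',
      expected: 'string',
      received,
      path: ['name'],
      message: `Expected string, received ${received}`,
    },
  ]
}

// Hypothetical helper: wraps the issues in a 400 Problem Details body.
function toBadRequest(errors: ValidationIssue[]) {
  return { type: 'about:blank', title: 'Bad Request', status: 400, detail: 'Bad Request', errors }
}

const issues = validateName({ name: 42 })
if (issues.length > 0) {
  console.log(JSON.stringify(toBadRequest(issues), null, 2))
}
```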

Retry and circuit breaker

Configure retries per command or subscription:

.adviceConsumerFailureHandling({
  mode: 'strict',
  maxAttempts: 5,
  retryDelayMs: 1000,
  deadLetterTarget: 'my-command.dead-letter',
})

For queue workers:

.setLifecycleConfig({
  visibilityTimeoutMs: 60_000,
  maxAttempts: 10,
  retryDelayMs: 5000,
})
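The sketch below shows one plausible reading of `maxAttempts`, `retryDelayMs` and a dead-letter target: retry with a fixed delay, then hand the message to a dead-letter handler once the attempts are exhausted. It is an assumption about the semantics, not PURISTA's implementation; `consumeWithRetry` and `deadLetter` are hypothetical names. For brevity it retries every thrown error, whereas HandledError is never retried.

```typescript
interface FailureHandling {
  maxAttempts: number
  retryDelayMs: number
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Hypothetical consumer loop; `deadLetter` stands in for publishing to a dead-letter target.
async function consumeWithRetry<T>(
  message: T,
  handler: (message: T, attempt: number) => Promise<void>,
  config: FailureHandling,
  deadLetter: (message: T, lastError: unknown) => Promise<void>,
): Promise<'processed' | 'dead-lettered'> {
  let lastError: unknown
  for (let attempt = 1; attempt <= config.maxAttempts; attempt++) {
    try {
      await handler(message, attempt)
      return 'processed'
    } catch (error) {
      lastError = error
      // wait before the next attempt, but not after the final one
      if (attempt < config.maxAttempts) {
        await sleep(config.retryDelayMs)
      }
    }
  }
  await deadLetter(message, lastError)
  return 'dead-lettered'
}

// Usage: the handler fails twice, then succeeds on the third attempt.
let calls = 0
consumeWithRetry(
  { userId: 'u1' },
  async () => {
    calls++
    if (calls < 3) throw new Error('temporary outage')
  },
  { maxAttempts: 5, retryDelayMs: 10 },
  async (message, error) => console.error('dead letter', message, error),
).then((outcome) => console.log(outcome, 'after', calls, 'attempts')) // processed after 3 attempts
```

The bound on attempts is what keeps failure handling finite: a message that keeps failing ends up in the dead-letter path instead of being retried forever.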

Graceful shutdown

gracefulShutdown(logger, destroyables, timeoutMs?) registers SIGTERM and SIGINT handlers and sequentially destroys each entry. The optional third argument sets the maximum wait time in milliseconds before a forced exit (default: 30000).

import { gracefulShutdown } from '@purista/core'

const services = [userService, emailService]
gracefulShutdown(logger, [
  honoService.prepareDestroy(),
  eventBridge,
  ...services,
  {
    name: 'close-http-socket',
    destroy: async () => await serverInstance.stop(),
  },
], 30_000) // optional: default is 30 000 ms

Steps:

Stop accepting new HTTP requests
Drain the event bridge
Shut down services
Close sockets
Release resources
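The steps above amount to destroying entries one after another under a deadline. A minimal sketch of that idea follows; `destroyInOrder` and `Destroyable` are hypothetical and only mirror the `name`/`destroy` shape used in the example, and the choice to continue after a failed entry is an assumption rather than documented `gracefulShutdown` behavior.

```typescript
interface Destroyable {
  name: string
  destroy: () => Promise<void>
}

// Hypothetical helper: destroys entries in array order and reports 'timeout'
// if the whole sequence takes longer than `timeoutMs`.
function destroyInOrder(entries: Destroyable[], timeoutMs: number): Promise<'clean' | 'timeout'> {
  const work = (async () => {
    for (const entry of entries) {
      try {
        await entry.destroy()
      } catch (error) {
        // assumption: one failing entry should not block the remaining ones
        console.error(`destroy failed for ${entry.name}`, error)
      }
    }
  })()

  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve('timeout'), timeoutMs)
    work.then(() => {
      clearTimeout(timer)
      resolve('clean')
    })
  })
}

// Usage: record the order in which the entries are destroyed.
const order: string[] = []
const step = (label: string): Destroyable => ({
  name: label,
  destroy: async () => {
    order.push(label)
  },
})

destroyInOrder([step('http'), step('event-bridge'), step('services'), step('sockets')], 1000)
  .then((outcome) => console.log(outcome, order.join(' -> ')))
// prints "clean http -> event-bridge -> services -> sockets"
```

Order matters here: inbound traffic stops first, so the entries destroyed later are not handed new work while they drain.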

Health checks

honoService.setHealthFunction(async function () {
  if (!isDatabaseHealthy()) {
    throw new Error('Database unreachable')
  }
})

Service health includes:

Event bridge connectivity
Store connectivity
Paused subscription consumers
Paused queue workers
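When several dependencies matter, a custom health function can aggregate individual checks and throw a single error that names every failing one. The sketch below is framework-independent; `DependencyCheck`, `collectFailures` and `healthFunction` are hypothetical names, and the only contract taken from the example above is that an unhealthy state is signaled by throwing.

```typescript
interface DependencyCheck {
  name: string
  check: () => Promise<void> // resolves when healthy, rejects otherwise
}

// Hypothetical helper: runs each check in turn and collects failure descriptions.
async function collectFailures(checks: DependencyCheck[]): Promise<string[]> {
  const failures: string[] = []
  for (const dependency of checks) {
    try {
      await dependency.check()
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error)
      failures.push(`${dependency.name}: ${reason}`)
    }
  }
  return failures
}

// Throws when at least one dependency is unhealthy.
async function healthFunction(checks: DependencyCheck[]): Promise<void> {
  const failures = await collectFailures(checks)
  if (failures.length > 0) {
    throw new Error(`Unhealthy: ${failures.join('; ')}`)
  }
}

healthFunction([
  { name: 'database', check: async () => {} },
  {
    name: 'cache',
    check: async () => {
      throw new Error('connection refused')
    },
  },
]).catch((error: Error) => console.error(error.message))
// logs "Unhealthy: cache: connection refused"
```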

When to focus on reliability

Production systems where downtime is costly
Financial transactions (payments, billing, inventory)
Multi-step workflows where partial failure is expensive
Systems with external dependencies (third-party APIs, databases)

Common pitfalls

Swallowing errors. Catch only what you can handle. Let the rest throw.
Non-idempotent side effects. Without idempotency, retries create duplicates.
Ignoring timeout boundaries. Default timeouts may not match your SLA.
Missing graceful shutdown. In-flight requests are dropped without proper shutdown.
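The idempotency pitfall can be illustrated with a deliberately small sketch: remember which keys have already been processed and skip the side effect on redelivery. `runOnce` and the in-memory `processedKeys` set are hypothetical; a production system would need a durable store and would have to handle concurrent deliveries and a crash between the effect and the bookkeeping.

```typescript
// Hypothetical in-memory record of processed keys (illustration only).
const processedKeys = new Set<string>()

// Runs `effect` at most once per key; returns true when the effect ran.
async function runOnce(idempotencyKey: string, effect: () => Promise<void>): Promise<boolean> {
  if (processedKeys.has(idempotencyKey)) {
    return false // already handled: skip the side effect on redelivery
  }
  await effect()
  processedKeys.add(idempotencyKey) // recorded only after the effect succeeded
  return true
}

// Usage: the same message is delivered twice, but the charge happens once.
let charges = 0
const charge = async () => {
  charges++
}

runOnce('order-42', charge)
  .then(() => runOnce('order-42', charge)) // simulated retry of the same message
  .then(() => console.log(charges)) // 1
```

Because the key is recorded only after the effect succeeds, a failed attempt stays eligible for retry.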

Checklist

HandledError is used for expected business errors
UnhandledError is used for unexpected failures
Input validation errors return 400 with helpful details
Output validation errors return 500 without internal details
Retry policies are configured per handler
Graceful shutdown waits for in-flight messages
Health checks verify all critical dependencies
Error tracking is wired to OpenTelemetry or Sentry
