Performance

Measurement, bottlenecks, optimization strategies

Performance in PURISTA comes primarily from horizontal scaling rather than from faster code. Because services are stateless and communicate through messages, you scale mainly by adding instances rather than by optimizing algorithms.

The scaling model

flowchart LR
    LB["Load Balancer<br/>or Broker"] --> I1["Instance 1"]
    LB --> I2["Instance 2"]
    LB --> I3["Instance 3"]
    I1 --> DB[(Database)]
    I2 --> DB
    I3 --> DB

The broker distributes messages across service instances
No session affinity required
Instances are interchangeable: start more or stop some without losing data, since no state lives in the instance itself
Scale per service: for example, User Service might need 3 instances while Email Service needs 1
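To make "no session affinity" concrete, here is a conceptual sketch of a broker spreading messages over interchangeable instances. The RoundRobinDispatcher class is hypothetical and not part of PURISTA; real brokers use their own mechanisms (for example NATS queue groups or AMQP competing consumers).

```typescript
// Conceptual stand-in for a broker: any registered instance may receive any message.
type InstanceHandler<M> = (message: M) => void

class RoundRobinDispatcher<M> {
  private readonly instances: InstanceHandler<M>[] = []
  private next = 0

  // Starting an instance corresponds to registering a handler.
  register(handler: InstanceHandler<M>): void {
    this.instances.push(handler)
  }

  // Stopping an instance removes its handler; the remaining instances keep working.
  unregister(handler: InstanceHandler<M>): void {
    const index = this.instances.indexOf(handler)
    if (index !== -1) this.instances.splice(index, 1)
  }

  dispatch(message: M): void {
    if (this.instances.length === 0) throw new Error('no instances available')
    // No session affinity: the target depends only on rotation, not on the message.
    const index = this.next % this.instances.length
    this.next = index + 1
    this.instances[index](message)
  }
}
```

Because no instance owns any message, removing one only changes which of the others receives the next message.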

Measuring performance

Latency

Measure end-to-end latency with OpenTelemetry traces:

Every message is traced automatically. Check your Jaeger, Tempo, or Zipkin dashboard for:

event_bridge.route duration
Command execution duration
Subscription processing duration
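Latency is usually judged by percentiles rather than averages, because a few slow requests disappear inside a mean. If you export span durations yourself, a nearest-rank percentile is a simple way to summarize them. The functions below are a sketch and assume durations in milliseconds.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('no samples')
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]')
  const sorted = [...samplesMs].sort((x, y) => x - y)
  // 1-based rank; multiply before dividing to avoid floating-point drift.
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[rank - 1]
}

// Typical latency summary reported alongside SLAs.
function summarize(samplesMs: readonly number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  }
}
```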

Throughput

Monitor message rates and backlogs:

Messages per second per command and subscription
Queue backlog depth
Subscription consumer lag
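Messages per second is typically computed over a sliding window. The class below is a minimal in-process sketch (a hypothetical helper, not a PURISTA API); in production you would more likely export a counter to your metrics backend and let it compute rates. It assumes timestamps are recorded in increasing order.

```typescript
// Counts events inside a trailing time window and reports them as a per-second rate.
class SlidingWindowRate {
  private readonly timestamps: number[] = []

  constructor(private readonly windowMs = 1000) {}

  // Call once per handled message.
  record(nowMs: number = Date.now()): void {
    this.timestamps.push(nowMs)
  }

  perSecond(nowMs: number = Date.now()): number {
    const cutoff = nowMs - this.windowMs
    // Drop events that have fallen out of the window (oldest first).
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift()
    }
    return this.timestamps.length / (this.windowMs / 1000)
  }
}
```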

Resource usage

CPU per service instance
Memory per service instance
Database connection pool utilization
Broker queue depth
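For CPU and memory per instance, Node.js exposes process.cpuUsage() and process.memoryUsage(). The sampler below is a sketch; sampleResources is a hypothetical name, and how you export the values depends on your metrics setup.

```typescript
// Shape of process.cpuUsage() results (microseconds of user and system CPU time).
interface CpuTime {
  user: number
  system: number
}

interface ResourceSample {
  cpuPercent: number // CPU time relative to wall time; can exceed 100 with extra threads
  rssMb: number // resident set size of this process
  heapUsedMb: number // V8 heap currently in use
}

const MB = 1024 * 1024

function sampleResources(
  previousCpu: CpuTime,
  elapsedMs: number,
): { sample: ResourceSample; cpu: CpuTime } {
  // Passing the previous reading returns the CPU time consumed since then.
  const delta = process.cpuUsage(previousCpu)
  const memory = process.memoryUsage()
  const cpuMs = (delta.user + delta.system) / 1000
  return {
    sample: {
      cpuPercent: elapsedMs > 0 ? (cpuMs / elapsedMs) * 100 : 0,
      rssMb: memory.rss / MB,
      heapUsedMb: memory.heapUsed / MB,
    },
    // Baseline for the next call.
    cpu: process.cpuUsage(),
  }
}
```

Call it on a timer (for example every 10 seconds), passing the previous baseline and the elapsed interval.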

Common bottlenecks

Bottleneck	Symptom	Solution
Slow database queries	High command latency	Add indexes, optimize queries, use connection pooling
Single hot command	One instance overloaded	Scale that service independently
Large payloads	High serialization cost	Split into smaller messages, use references
Synchronous external calls	Command blocks for seconds	Use queues for async work
Missing indexes	Database scans	Add indexes for query patterns
In-memory caching	Cache lost on restart and not shared across instances	Use Redis state store
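"Use references" for large payloads is often called the claim-check pattern: store the body once and send only a small reference through the broker. The sketch below assumes a hypothetical BlobStore interface (object storage or a database table in practice); the in-memory implementation is for illustration only.

```typescript
import { randomUUID } from 'node:crypto'

// Hypothetical storage interface for large payload bodies.
interface BlobStore {
  put(key: string, data: string): Promise<void>
  get(key: string): Promise<string | undefined>
}

// In-memory implementation for illustration only (not shared across instances).
class InMemoryBlobStore implements BlobStore {
  private readonly blobs = new Map<string, string>()
  async put(key: string, data: string): Promise<void> {
    this.blobs.set(key, data)
  }
  async get(key: string): Promise<string | undefined> {
    return this.blobs.get(key)
  }
}

// The small message that actually travels through the broker.
interface PayloadReference {
  ref: string
  bytes: number
}

// Producer side: store the large body once, send only the reference.
async function toReference(store: BlobStore, body: string): Promise<PayloadReference> {
  const ref = randomUUID()
  await store.put(ref, body)
  return { ref, bytes: new TextEncoder().encode(body).length }
}

// Consumer side: resolve the reference back to the full body.
async function fromReference(store: BlobStore, message: PayloadReference): Promise<string> {
  const body = await store.get(message.ref)
  if (body === undefined) throw new Error(`payload ${message.ref} not found`)
  return body
}
```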

Optimization strategies

1. Scale horizontally

Add instances for the service that needs more capacity:

# Scale User Service to 5 replicas (excerpt: selector and pod template omitted)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 5
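To decide how many replicas to request, a common rule of thumb divides the expected load by what one instance can sustain at a chosen utilization. The helper is a sketch: perInstanceRps must come from your own load tests, and targetUtilization is an assumed headroom factor.

```typescript
// Replicas needed so that each instance runs at or below the target utilization.
function requiredReplicas(
  targetRps: number,
  perInstanceRps: number,
  targetUtilization = 0.7,
  minReplicas = 1,
): number {
  if (perInstanceRps <= 0) throw new RangeError('perInstanceRps must be positive')
  if (targetUtilization <= 0 || targetUtilization > 1) {
    throw new RangeError('targetUtilization must be in (0, 1]')
  }
  const usablePerInstance = perInstanceRps * targetUtilization
  return Math.max(minReplicas, Math.ceil(targetRps / usablePerInstance))
}
```

With 500 requests per second expected, 200 per instance measured, and a 50% utilization target, requiredReplicas(500, 200, 0.5) returns 5.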

2. Use queues for long work

Don’t block commands with slow operations. Declare .canEnqueue(queueId, payloadSchema) on the builder to get the typed context.queue.enqueue.queueId(payload) helper:

// ❌ Bad: command blocks for minutes
.setCommandFunction(async function (context, payload) {
  await processLargeFile(payload.fileId) // blocks for 5 minutes
})

// ✅ Good: declare enqueue access, then enqueue and return immediately
.canEnqueue('processFile', z.object({ fileId: z.string() }))
.setCommandFunction(async function (context, payload) {
  const job = await context.queue.enqueue.processFile({ fileId: payload.fileId })
  return { jobId: job.id, status: 'queued' }
})
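The enqueue-and-return pattern can be illustrated without any infrastructure. The class below is a generic, in-process sketch with hypothetical names; it is not PURISTA's queue API, and it omits the persistence, leases, and retries that the queue bridges provide.

```typescript
type JobStatus = 'queued' | 'running' | 'done' | 'failed'

// Generic in-process job queue, for illustration only.
class InMemoryJobQueue<T> {
  private nextId = 1
  private draining = false
  private idle: Promise<void> = Promise.resolve()
  private readonly pending: Array<{ id: string; payload: T }> = []
  private readonly statuses = new Map<string, JobStatus>()

  constructor(private readonly handler: (payload: T) => Promise<void>) {}

  // Records the job and returns immediately; work starts on a later tick.
  enqueue(payload: T): { id: string } {
    const id = `job-${this.nextId++}`
    this.statuses.set(id, 'queued')
    this.pending.push({ id, payload })
    if (!this.draining) {
      this.draining = true
      this.idle = new Promise<void>((resolve) => {
        setTimeout(() => {
          void this.drain().then(resolve)
        }, 0)
      })
    }
    return { id }
  }

  status(id: string): JobStatus | undefined {
    return this.statuses.get(id)
  }

  // Resolves once every job queued so far has finished (useful in tests).
  whenIdle(): Promise<void> {
    return this.idle
  }

  private async drain(): Promise<void> {
    while (this.pending.length > 0) {
      const job = this.pending.shift()!
      this.statuses.set(job.id, 'running')
      try {
        await this.handler(job.payload)
        this.statuses.set(job.id, 'done')
      } catch {
        this.statuses.set(job.id, 'failed')
      }
    }
    this.draining = false
  }
}
```

The caller gets a job id back at once and can poll the status later, which is the same contract the command above returns as { jobId, status: 'queued' }.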

3. Batch operations

Process multiple items in one command:

.addPayloadSchema(z.object({
  items: z.array(z.object({ id: z.string() })).max(100),
}))
.setCommandFunction(async function (context, payload) {
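  // Runs all items concurrently; with up to 100 items, consider capping concurrency
  // if each processItem call uses a pooled resource such as the database
  const results = await Promise.all(
    payload.items.map(item => processItem(item))
  )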
  return { processed: results.length }
})
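Promise.all starts every item at once. If each item uses a limited resource such as a database pool, a helper that caps concurrency keeps a 100-item batch from exhausting it. mapWithConcurrency is a hypothetical helper written for this sketch (libraries such as p-limit offer similar behavior).

```typescript
// Maps items through an async function with at most `limit` calls in flight,
// preserving the input order in the result.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer')
  }
  const results = new Array<R>(items.length)
  let next = 0

  // Each worker pulls the next unclaimed index until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++
      results[index] = await fn(items[index], index)
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker())
  await Promise.all(workers)
  return results
}
```

Inside the command function, await mapWithConcurrency(payload.items, 10, processItem) would take the place of the Promise.all call.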

4. Cache with state stores

.setCommandFunction(async function (context, payload) {
  const cacheKey = `user:${payload.userId}`
  const cached = await context.states.getState(cacheKey)

  if (cached[cacheKey]) {
    return cached[cacheKey]
  }

  const user = await context.resources.db.getUser(payload.userId)
  await context.states.setState(cacheKey, user)
  return user
})
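The state store example above caches without expiry. Whether a state store supports expiry natively depends on the store, so the sketch below shows the TTL logic itself using an in-memory map and an injectable clock. TtlCache and getOrLoad are hypothetical names; in a scaled service the entries would live in a shared store rather than in process memory.

```typescript
interface CacheEntry<V> {
  value: V
  expiresAt: number // epoch milliseconds
}

class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>()

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    // Expired entries are treated as misses and evicted.
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }
}

// Cache-aside: return a fresh cached value, otherwise load and cache it.
async function getOrLoad<V>(cache: TtlCache<V>, key: string, load: () => Promise<V>): Promise<V> {
  const cached = cache.get(key)
  if (cached !== undefined) return cached
  const value = await load()
  cache.set(key, value)
  return value
}
```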

5. Tune queue bridge settings

Queue bridges have their own configuration for batch sizes and recovery behavior. Tuning these affects how quickly due jobs become available to workers and how quickly jobs held by a crashed worker are reclaimed and retried.

RedisQueueBridge exposes scheduleBatchSize (how many scheduled-but-not-yet-due jobs to promote per poll cycle) and recoveryBatchSize (how many expired leases to reclaim per cycle):

import { RedisQueueBridge } from '@purista/redis-queue-bridge'

const queueBridge = new RedisQueueBridge({
  config: { url: process.env.REDIS_URL },
  keyPrefix: 'myapp:queue:',
  scheduleBatchSize: 50,   // jobs promoted from scheduled→pending per poll
  recoveryBatchSize: 20,   // expired leases reclaimed per poll cycle
})
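Each poll cycle handles at most one batch, so working through a backlog takes ceil(backlog / batchSize) cycles. The helper below does that arithmetic. The poll interval is an assumed input because this page does not document it, and the estimate ignores the time workers spend on the jobs themselves.

```typescript
// Rough lower bound on how long a backlog takes to pass through a per-poll batch limit.
// pollIntervalMs is an assumed value, not a documented bridge setting.
function drainEstimate(
  backlog: number,
  batchSize: number,
  pollIntervalMs: number,
): { cycles: number; approxMs: number } {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer')
  }
  const cycles = Math.ceil(backlog / batchSize)
  return { cycles, approxMs: cycles * pollIntervalMs }
}
```

For 1,000 scheduled jobs, a batch size of 50 needs 20 cycles while a batch size of 20 needs 50; larger batches drain faster but do more work per cycle.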

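NatsQueueBridge uses a NATS JetStream KV store. To increase throughput, prefer running more worker instances over tuning the bridge, since NATS distributes the work across them. The example sets releaseBatchSize, which controls how many expired leases are released back to pending per cycle: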

import { NatsQueueBridge } from '@purista/nats-queue-bridge'

const queueBridge = new NatsQueueBridge({
  connectionOptions: { servers: process.env.NATS_URL },
  subjectPrefix: 'myapp',
  releaseBatchSize: 20,  // expired leases released back to pending per cycle
})

6. Connection pooling

PURISTA does not manage database or external API connection pools; configure them in your resources. A common pattern is to share one pool across all commands in a service. The example assumes the node-postgres (pg) package:

import { Pool } from 'pg'

const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 20 })

const myService = await myV1Service.getInstance(eventBridge, {
  resources: { db: pool },
})

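Size each instance's pool to its concurrency: a worker instance handling 10 parallel jobs typically needs 10 to 20 database connections. Also keep the total across all instances (instances × max) below the database's connection limit, because every new replica opens its own pool.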
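Horizontal scaling multiplies pool sizes, so the per-instance max has to shrink as replicas grow. The function below is a sketch of that budget; the reserved connections (for admin tools or migrations) and all numbers are illustrative assumptions.

```typescript
// Largest pool `max` each instance can use without exceeding the database's limit.
function perInstancePoolMax(
  dbMaxConnections: number,
  reservedConnections: number,
  instances: number,
): number {
  if (!Number.isInteger(instances) || instances < 1) {
    throw new RangeError('instances must be a positive integer')
  }
  const available = dbMaxConnections - reservedConnections
  if (available < instances) {
    throw new RangeError('not enough connections for one per instance')
  }
  return Math.floor(available / instances)
}
```

With a database limit of 100, 10 connections reserved, and 5 instances, each pool can use at most 18 connections; scaling to 10 instances halves that to 9.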

When to optimize

Latency exceeds SLA
Throughput cannot keep up with demand
Resource costs are too high
User experience degrades

When NOT to optimize

Premature optimization before measuring
Micro-optimizations that hurt readability
Optimizing the wrong layer (code vs. infrastructure)

Common pitfalls

Optimizing before measuring. Profile first. Optimize the bottleneck.
Ignoring the broker. A slow broker affects all services.
Over-caching. Stale cache causes bugs. Use TTL.
Blocking the event loop. Move CPU-intensive work out of command handlers and into queue workers.
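Queues keep command handlers short, but a worker running CPU-bound code still blocks its own event loop. One way to spot blocking is Node's built-in perf_hooks.monitorEventLoopDelay. The wrapper below is a sketch; startEventLoopMonitor is a hypothetical name, and the thresholds you alert on are up to you.

```typescript
import { monitorEventLoopDelay } from 'node:perf_hooks'

// Samples event-loop delay every `resolutionMs` milliseconds.
function startEventLoopMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs })
  histogram.enable()
  return {
    // The histogram reports nanoseconds; convert to milliseconds.
    snapshotMs() {
      return {
        p50: histogram.percentile(50) / 1e6,
        p99: histogram.percentile(99) / 1e6,
        max: histogram.max / 1e6,
      }
    },
    stop() {
      histogram.disable()
    },
  }
}
```

A rising p99 or max on one service usually means some handler runs synchronous work long enough to delay every other message on that instance.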

Checklist

Latency is measured end-to-end with traces
Throughput is monitored per command/subscription
Bottlenecks are identified before optimizing
Long work uses queues, not blocking commands
Caching uses state stores with TTL
Scaling is horizontal (more instances) before vertical (bigger instances)
Load tests verify performance under realistic conditions
