In-process runtime for AI agents.

The Harness is not a SaaS. It sits inside your application — between your TypeScript code and external model providers — providing typed boundaries, sandboxed execution, and full observability. Every piece is a pluggable adapter.

Core Concepts

Eight building blocks

The Harness composes eight primitives into a single typed boundary. Each has a single responsibility and a defined interface.

Harness

Compiled definition of models, tools, skills, agents, workflows, defaults, and adapters. The harness is the declaration of what capabilities exist in your system.

Session

Isolated operational context with memory, history, sandbox, and one active run at a time. Answers: what user, thread, or tenant is this run for?
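
Here is a minimal sketch of how a session could enforce the "one active run at a time" rule. `SessionSketch`, `startRun`, and the scope fields are hypothetical names chosen for illustration, not the Harness session API.

```typescript
// Hypothetical sketch: a session that permits one active run at a time.
// SessionSketch and startRun are illustrative names, not the Harness API.
interface SessionScope {
  tenantId: string;
  userId: string;
  threadId: string;
}

class SessionSketch {
  readonly scope: SessionScope;
  private activeRunId: string | null = null;
  private nextRunNumber = 1;

  constructor(scope: SessionScope) {
    this.scope = scope;
  }

  // Executes `work` as the session's only active run; overlapping calls are rejected.
  async startRun<T>(work: (runId: string) => Promise<T>): Promise<T> {
    if (this.activeRunId !== null) {
      throw new Error(`session busy: ${this.activeRunId} is still active`);
    }
    const runId = `run-${this.nextRunNumber++}`;
    this.activeRunId = runId;
    try {
      return await work(runId);
    } finally {
      this.activeRunId = null; // free the slot whether the run succeeds or fails
    }
  }
}

// Usage: the second call starts while the first is still awaiting, so it is rejected.
const session = new SessionSketch({ tenantId: "acme", userId: "u-1", threadId: "t-1" });
const first = session.startRun(async (runId) => {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return runId;
});
const second = session.startRun(async (runId) => runId).catch((e: Error) => e.message);
```

Releasing the slot in `finally` means a failed run does not leave the session permanently busy.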

Agent

A typed LLM conversation loop. Prepares messages, calls the model, executes tools, appends results, validates output, and emits events.

Workflow

Application-owned orchestration around one or more agent invocations. Sequence, branch, fan out, reflect, judge, request human approval, write state.

Tool

Callable capability exposed to an agent: TypeScript, built-in, or MCP. Each tool has typed input/output, a timeout, and sandbox access.
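
A minimal sketch of the tool contract described above, assuming a tool is an object with an input parser, an async `execute` function, and a timeout in milliseconds. `ToolSketch`, `runTool`, and both example tools are hypothetical; the Harness's own tool definitions may look different, and a production version would more likely validate input with a schema library than with hand-written checks.

```typescript
// Hypothetical sketch of a tool with typed input/output and a timeout.
// ToolSketch and runTool are illustrative names, not the Harness tool API.
interface ToolSketch<I, O> {
  name: string;
  timeoutMs: number;
  parseInput: (raw: unknown) => I; // validates arguments proposed by the model
  execute: (input: I) => Promise<O>;
}

// Validates the raw arguments, then races execution against the tool's timeout.
async function runTool<I, O>(tool: ToolSketch<I, O>, raw: unknown): Promise<O> {
  const input = tool.parseInput(raw);
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${tool.name} timed out after ${tool.timeoutMs}ms`)),
      tool.timeoutMs,
    );
  });
  try {
    return await Promise.race([tool.execute(input), timeout]);
  } finally {
    clearTimeout(timer); // stop the timer once the race settles
  }
}

const addTool: ToolSketch<{ a: number; b: number }, number> = {
  name: "add",
  timeoutMs: 50,
  parseInput: (raw) => {
    const { a, b } = (raw ?? {}) as { a?: unknown; b?: unknown };
    if (typeof a !== "number" || typeof b !== "number") {
      throw new Error("add: expected { a: number, b: number }");
    }
    return { a, b };
  },
  execute: async ({ a, b }) => a + b,
};

// A deliberately slow tool whose work outlasts its 10 ms budget.
const slowTool: ToolSketch<void, string> = {
  name: "slow",
  timeoutMs: 10,
  parseInput: () => undefined,
  execute: () => new Promise<string>((resolve) => setTimeout(() => resolve("done"), 100)),
};
```

`Promise.race` only stops waiting; it does not cancel the slow work itself, so a real tool would also need an `AbortSignal` or a similar mechanism to stop it.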

Skill

Mounted instruction directory containing a SKILL.md file with frontmatter. Reusable domain guidance that agents follow without duplicating code or embedding it in prompts.
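
The following sketch shows how a loader might split a SKILL.md file into frontmatter fields and an instruction body. It assumes flat `key: value` frontmatter with `name` and `description` fields; the Harness's actual skill format and loader are not shown on this page, and a real implementation would use a proper YAML parser. `parseSkillMd` is a hypothetical name.

```typescript
// Hypothetical sketch: split a SKILL.md file into frontmatter fields and body.
// Handles only flat `key: value` lines; a real loader would use a YAML parser.
interface SkillDoc {
  meta: Record<string, string>;
  body: string;
}

function parseSkillMd(source: string): SkillDoc {
  const lines = source.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return { meta: {}, body: source }; // no frontmatter
  const end = lines.indexOf("---", 1); // closing delimiter must be exactly "---"
  if (end === -1) throw new Error("SKILL.md: unterminated frontmatter");
  const meta: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // skip blank or malformed lines
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: lines.slice(end + 1).join("\n").trim() };
}

const skill = parseSkillMd(
  [
    "---",
    "name: release-notes",
    "description: Draft release notes from merged PRs",
    "---",
    "Group changes by feature area.",
  ].join("\n"),
);
```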

Memory

Session-, run-, agent-, user-, and tenant-scoped JSON memory. The default sandboxMemory adapter stores entries in the session sandbox; durable adapters add persistence and TTL.
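
A hypothetical in-memory sketch of scope-keyed JSON memory with an optional TTL. `ScopedMemory`, its `scope:id:name` key layout, and the lazy expiry are assumptions for illustration; this is not the sandboxMemory adapter or a durable adapter.

```typescript
// Hypothetical sketch of scoped JSON memory. The scope names mirror the prose;
// ScopedMemory and its methods are illustrative, not the Harness memory API.
type Scope = "session" | "run" | "agent" | "user" | "tenant";
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

class ScopedMemory {
  private entries = new Map<string, { value: string; expiresAt?: number }>();

  private key(scope: Scope, scopeId: string, name: string): string {
    return `${scope}:${scopeId}:${name}`;
  }

  // Values are stored as JSON strings, so only JSON-serializable data round-trips.
  set(scope: Scope, scopeId: string, name: string, value: Json, ttlMs?: number): void {
    const expiresAt = ttlMs === undefined ? undefined : Date.now() + ttlMs;
    this.entries.set(this.key(scope, scopeId, name), { value: JSON.stringify(value), expiresAt });
  }

  get(scope: Scope, scopeId: string, name: string): Json | undefined {
    const k = this.key(scope, scopeId, name);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (entry.expiresAt !== undefined && entry.expiresAt <= Date.now()) {
      this.entries.delete(k); // lazily evict expired entries on read
      return undefined;
    }
    return JSON.parse(entry.value) as Json;
  }
}

const memory = new ScopedMemory();
memory.set("user", "u-42", "preferences", { tone: "concise" });
memory.set("run", "run-1", "scratch", [1, 2, 3], 0); // TTL of 0 ms: expires immediately
```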

Durable Runtime

Optional checkpoint, lease, and resume adapter for long-running workflows. Recovery starts from the last committed checkpoint. Streams remain observation-only.

Composition Model

Stack agents into workflows.

The workflow owns orchestration; agents own reasoning.

An agent is one typed LLM conversation loop with its own tools, skills, and instructions — the smallest unit of AI reasoning. A workflow composes multiple agents into a named sequence with branching, fan-out, human review gates, and deterministic logic between each step. One workflow can run a research agent, hand off to a writing agent, and gate on a review agent — each with its own tools and skills, all coordinated by your application code.

  • Agent: one typed LLM loop with tools, skills, and typed output
  • Workflow: orchestrates one or more agents with full application control
  • Fan-out: run multiple agents in parallel, synthesize results
  • Human gates: block progression until a human decision is recorded
  • Each agent in a workflow has its own instructions and tool set
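
Because the workflow is application code, sequencing, fan-out, and a human gate can be written as plain `async`/`await`. In this sketch the agents are stubs, and `ApprovalGate` is a hypothetical callback standing in for wherever your application records the human decision; none of these names come from the Harness API.

```typescript
// Hypothetical sketch: a workflow written as plain application code around agents.
// The Agent type, stub agents, and approval callback are illustrative assumptions.
type Agent<I, O> = (input: I) => Promise<O>;

const researchAgent: Agent<string, string[]> = async (topic) => [`fact about ${topic}`];
const writerAgent: Agent<string[], string> = async (facts) => facts.join("; ");
const reviewerA: Agent<string, boolean> = async (draft) => draft.length > 0;
const reviewerB: Agent<string, boolean> = async (draft) => !draft.includes("TODO");

// Human gate: progression blocks until the application records a decision.
type ApprovalGate = (draft: string) => Promise<"approved" | "rejected">;

async function publishWorkflow(topic: string, awaitApproval: ApprovalGate) {
  const facts = await researchAgent(topic); // step 1: research
  const draft = await writerAgent(facts); // step 2: hand off to the writer
  // Fan-out: both reviewers run in parallel; deterministic logic combines them.
  const verdicts = await Promise.all([reviewerA(draft), reviewerB(draft)]);
  if (!verdicts.every(Boolean)) return { status: "needs-revision" as const, draft };
  const decision = await awaitApproval(draft); // human review gate
  return { status: decision, draft };
}

const autoApprove: ApprovalGate = async () => "approved" as const;
```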
Agent Lifecycle

Every step is observable

From session creation through model calls, tool execution, and validation, every step emits run events and OpenTelemetry spans, so every model call, tool call, and result can be observed.

  • App calls session.agents.answerer.stream(input)
  • Session creates a run and emits run.started
  • Agent invokes the model with tool specs
  • Model returns tool call or final object
  • Tool executes in the sandbox under its timeout
  • Output is validated and returned; the run finishes
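
The lifecycle above can be sketched as a small loop with a scripted stub model in place of a real provider. The event names match the run events listed on this page; the message shapes, the `weather` tool, `callModel`, and `runAgent` are assumptions for illustration only.

```typescript
// Hypothetical sketch of the agent lifecycle with a scripted stub model.
// Event names follow the run events on this page; every type here is an assumption.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelReply =
  | { kind: "tool_call"; tool: string; args: { city: string } }
  | { kind: "final"; output: { answer: string } };
type Emit = (event: string) => void;

const tools: Record<string, (args: { city: string }) => Promise<string>> = {
  weather: async ({ city }) => `sunny in ${city}`,
};

// Stub model: asks for the weather tool first, then answers from its result.
async function callModel(messages: Message[]): Promise<ModelReply> {
  const toolResult = messages.find((m) => m.role === "tool");
  if (toolResult) return { kind: "final", output: { answer: `It is ${toolResult.content}.` } };
  return { kind: "tool_call", tool: "weather", args: { city: "Berlin" } };
}

async function runAgent(input: string, emit: Emit, maxSteps = 5): Promise<{ answer: string }> {
  emit("run.started");
  try {
    emit("agent.started");
    const messages: Message[] = [{ role: "user", content: input }];
    for (let step = 0; step < maxSteps; step++) {
      const reply = await callModel(messages);
      if (reply.kind === "final") {
        // Model output is untrusted, so check its shape at runtime before returning.
        if (typeof reply.output?.answer !== "string") throw new Error("output failed validation");
        emit("agent.finished");
        return reply.output;
      }
      const tool = tools[reply.tool];
      if (!tool) throw new Error(`unknown tool: ${reply.tool}`);
      emit("tool.started");
      messages.push({ role: "assistant", content: JSON.stringify(reply) });
      messages.push({ role: "tool", content: await tool(reply.args) }); // append result
      emit("tool.finished");
    }
    throw new Error(`no final output within ${maxSteps} steps`);
  } finally {
    emit("run.finished"); // emitted whether the run completed or failed
  }
}

const events: string[] = [];
const done = runAgent("What is the weather in Berlin?", (e) => events.push(e));
```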
Durable Execution

Long runs survive restarts.

Optional durable runtime adapters add checkpoints, leases, and resume for long-running workflows. Recovery starts from the last committed checkpoint, not from scratch, and state snapshots keep long runs inspectable. Streams remain observation-only.

  • Checkpoint state at deterministic boundaries
  • Lease management prevents duplicate execution
  • Resume from last checkpoint on restart
  • Declare requirements with .requires(["runtime.checkpoint"])
  • Streams are observation-only, not recovery cursors
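
A hypothetical sketch of the three mechanisms in a single process: an in-memory store holding checkpoints and leases, and a runner that commits a checkpoint after each step and resumes from the last one. All names are illustrative; this is not the Harness durable runtime adapter interface.

```typescript
// Hypothetical sketch: checkpoints at step boundaries, a lease that blocks duplicate
// execution, and resume from the last committed checkpoint. Not the Harness API.
interface Checkpoint {
  step: number; // number of steps already committed
  state: number[];
}
type Step = (state: number[]) => number[];

class InMemoryDurableStore {
  readonly checkpoints = new Map<string, Checkpoint>();
  private readonly leases = new Map<string, { owner: string; expiresAt: number }>();

  // Grants the lease if it is free, expired, or already held by the same owner.
  acquireLease(runId: string, owner: string, ttlMs: number): boolean {
    const lease = this.leases.get(runId);
    if (lease && lease.owner !== owner && lease.expiresAt > Date.now()) return false;
    this.leases.set(runId, { owner, expiresAt: Date.now() + ttlMs });
    return true;
  }

  releaseLease(runId: string, owner: string): void {
    if (this.leases.get(runId)?.owner === owner) this.leases.delete(runId);
  }
}

// Returns the final state, or null if another worker currently holds the lease.
function runDurable(store: InMemoryDurableStore, runId: string, owner: string, steps: Step[]): number[] | null {
  if (!store.acquireLease(runId, owner, 30_000)) return null;
  try {
    let cp: Checkpoint = store.checkpoints.get(runId) ?? { step: 0, state: [] };
    for (let i = cp.step; i < steps.length; i++) {
      cp = { step: i + 1, state: steps[i](cp.state) };
      store.checkpoints.set(runId, cp); // commit after each deterministic step
    }
    return cp.state;
  } finally {
    store.releaseLease(runId, owner);
  }
}

let firstStepRuns = 0;
let crashOnce = true;
const steps: Step[] = [
  (s) => {
    firstStepRuns++;
    return [...s, 1];
  },
  (s) => {
    if (crashOnce) {
      crashOnce = false;
      throw new Error("simulated crash");
    }
    return [...s, 2];
  },
  (s) => [...s, 3],
];

const durable = new InMemoryDurableStore();
try {
  runDurable(durable, "run-1", "worker-a", steps);
} catch {
  // worker-a failed after committing step 1
}
const resumed = runDurable(durable, "run-1", "worker-b", steps); // resumes at step 2
```

A real process crash would skip the `finally` block, which is why the lease carries an expiry: once it lapses, another worker can acquire it and resume.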
Adapter Model

Every boundary is pluggable

Models, sandbox, state, tools, skills, and MCP — each capability is an adapter with a defined interface. Pick the defaults. Replace them as requirements change. Add your own when nothing off-the-shelf fits.

Model Provider

Translates harness requests into provider SDK calls. Adapters are capability-gated: if the bound model cannot provide a required feature, startup fails with a clear error instead of a surprise at runtime.

  • OpenAI: @purista/harness-openai
  • Anthropic: @purista/harness-anthropic
  • Amazon Bedrock: @purista/harness-bedrock
  • Azure AI Foundry: @purista/harness-azure-foundry
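
Capability gating can be pictured as a check that runs once at startup, before any session exists. In this sketch only `runtime.checkpoint` comes from this page; the other capability strings, `BoundAdapter`, and `assertCapabilities` are hypothetical.

```typescript
// Hypothetical sketch of startup capability gating. Only "runtime.checkpoint"
// appears on this page; the other capability names are invented for illustration.
type Capability = "tools" | "structured_output" | "streaming" | "runtime.checkpoint";

interface BoundAdapter {
  name: string;
  provides: ReadonlySet<Capability>;
}

// Throws before any session or run exists if a declared requirement is unmet.
function assertCapabilities(required: Capability[], adapters: BoundAdapter[]): void {
  const available = new Set<Capability>();
  for (const adapter of adapters) adapter.provides.forEach((c) => available.add(c));
  const missing = required.filter((c) => !available.has(c));
  if (missing.length > 0) {
    const bound = adapters.map((a) => a.name).join(", ");
    throw new Error(`startup failed: missing [${missing.join(", ")}] (bound adapters: ${bound})`);
  }
}

const model: BoundAdapter = { name: "example-model", provides: new Set<Capability>(["tools", "streaming"]) };
assertCapabilities(["tools"], [model]); // satisfied, so startup continues

let gatingError = "";
try {
  assertCapabilities(["tools", "structured_output"], [model]);
} catch (e) {
  gatingError = (e as Error).message;
}
```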
Sandbox

File operations, command execution, and MCP server isolation. Choose in-memory for most agents. Add bash execution only where required. Build container or remote adapters for strict isolation policies.

  • inMemorySandbox() — file ops only, no command execution
  • bashSandbox() — command execution, higher privilege
  • Custom adapters for containers or remote execution
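
A sketch of the split between file operations and command execution, assuming a sandbox port where `exec` is optional. `SandboxSketch`, `memorySandboxSketch`, and `runTests` are hypothetical stand-ins, not the signatures of `inMemorySandbox()` or `bashSandbox()`.

```typescript
// Hypothetical sketch of a sandbox port: file operations are always available,
// command execution is optional. Names are illustrative, not the Harness API.
interface SandboxSketch {
  readFile(path: string): Promise<string>;
  writeFile(path: string, content: string): Promise<void>;
  exec?: (command: string, args: string[]) => Promise<{ exitCode: number; stdout: string }>;
}

// File ops backed by a Map; there is no exec method, so tools cannot run commands.
function memorySandboxSketch(): SandboxSketch {
  const files = new Map<string, string>();
  return {
    async readFile(path) {
      const content = files.get(path);
      if (content === undefined) throw new Error(`ENOENT: ${path}`);
      return content;
    },
    async writeFile(path, content) {
      files.set(path, content);
    },
  };
}

// A tool that needs a shell checks for exec instead of assuming it exists.
async function runTests(sandbox: SandboxSketch): Promise<string> {
  if (!sandbox.exec) throw new Error("this sandbox does not allow command execution");
  const { stdout } = await sandbox.exec("npm", ["test"]);
  return stdout;
}

const sandbox = memorySandboxSketch();
```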
State Store

Persists sessions, run history, messages, and events across process restarts. Use the default for development. Use a durable adapter in production so run history and traces survive deployment cycles.

  • In-memory default — suitable for local development
  • Custom adapter — implement StateStore against any backend (Redis, Postgres, etc.)
  • Any adapter that satisfies the typed StateStore port compiles in without changes to agent or workflow code
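
A minimal sketch of the port-and-adapter idea, assuming a store that records runs and ordered run events. The method names are hypothetical; the real `StateStore` port is not shown on this page and may differ.

```typescript
// Hypothetical sketch of a state-store port and an in-memory adapter. The method
// names are assumptions; the real StateStore port may differ.
interface RunRecord {
  runId: string;
  sessionId: string;
  status: "running" | "finished" | "failed";
}
interface RunEventRecord {
  runId: string;
  seq: number;
  type: string;
}

interface StateStoreSketch {
  saveRun(run: RunRecord): Promise<void>;
  getRun(runId: string): Promise<RunRecord | undefined>;
  appendEvent(runId: string, type: string): Promise<RunEventRecord>;
  listEvents(runId: string): Promise<RunEventRecord[]>;
}

class InMemoryStateStore implements StateStoreSketch {
  private runs = new Map<string, RunRecord>();
  private events = new Map<string, RunEventRecord[]>();

  async saveRun(run: RunRecord): Promise<void> {
    this.runs.set(run.runId, { ...run }); // copy so callers cannot mutate stored state
  }
  async getRun(runId: string): Promise<RunRecord | undefined> {
    const run = this.runs.get(runId);
    return run && { ...run };
  }
  async appendEvent(runId: string, type: string): Promise<RunEventRecord> {
    const list = this.events.get(runId) ?? [];
    const event = { runId, seq: list.length + 1, type }; // per-run sequence number
    list.push(event);
    this.events.set(runId, list);
    return event;
  }
  async listEvents(runId: string): Promise<RunEventRecord[]> {
    return [...(this.events.get(runId) ?? [])];
  }
}

const store: StateStoreSketch = new InMemoryStateStore();
const demo = (async () => {
  await store.saveRun({ runId: "r1", sessionId: "s1", status: "running" });
  await store.appendEvent("r1", "run.started");
  await store.appendEvent("r1", "run.finished");
  await store.saveRun({ runId: "r1", sessionId: "s1", status: "finished" });
  return { run: await store.getRun("r1"), events: await store.listEvents("r1") };
})();
```

Because callers depend only on `StateStoreSketch`, a Postgres- or Redis-backed implementation of the same four methods could replace `InMemoryStateStore` without touching the code that uses it.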
MCP Tools

Connect to any Model Context Protocol server. The stdio transport runs the server through the sandbox executor; the HTTP transport calls a remote endpoint. Both transports validate tool schemas before the tools are registered with the agent.

  • mcp_stdio — local server through sandbox executor
  • mcp_http — remote endpoint over HTTP
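
The sketch below illustrates validate-then-register. Descriptors are shaped like entries in an MCP `tools/list` result (`name`, `description`, `inputSchema`); the specific checks, the all-or-nothing policy, and the `server.tool` naming are assumptions, not the Harness implementation.

```typescript
// Hypothetical sketch: validate tool descriptors (shaped like entries of an MCP
// `tools/list` result) before registering them. Checks and naming are assumptions.
interface McpToolDescriptor {
  name: string;
  description?: string;
  inputSchema: { type: string; properties?: Record<string, unknown>; required?: string[] };
}

function validateDescriptor(d: McpToolDescriptor): string[] {
  const problems: string[] = [];
  if (typeof d.name !== "string" || d.name.trim() === "") problems.push("tool with empty name");
  if (d.inputSchema?.type !== "object") problems.push(`${d.name}: inputSchema.type must be "object"`);
  for (const key of d.inputSchema?.required ?? []) {
    if (!d.inputSchema.properties || !(key in d.inputSchema.properties)) {
      problems.push(`${d.name}: required field "${key}" is not declared in properties`);
    }
  }
  return problems;
}

// All-or-nothing: if any descriptor is invalid, none of the server's tools are registered.
function registerServerTools(server: string, tools: McpToolDescriptor[], registry: Map<string, McpToolDescriptor>): void {
  const problems = tools.flatMap(validateDescriptor);
  if (problems.length > 0) throw new Error(`${server}: ${problems.join("; ")}`);
  for (const tool of tools) registry.set(`${server}.${tool.name}`, tool);
}

const registry = new Map<string, McpToolDescriptor>();
registerServerTools(
  "files",
  [{ name: "read_file", inputSchema: { type: "object", properties: { path: { type: "string" } }, required: ["path"] } }],
  registry,
);

let mcpError = "";
try {
  registerServerTools("broken", [{ name: "search", inputSchema: { type: "object", required: ["query"] } }], registry);
} catch (e) {
  mcpError = (e as Error).message;
}
```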
Observability

Every run is visible

All streaming APIs emit typed run events. Applications render these in chat UIs, run inspectors, logs, or tests. OpenTelemetry spans provide distributed tracing aligned with the OpenTelemetry GenAI semantic conventions.

  • run.started: Session created a run for the agent or workflow invocation
  • agent.started: Agent reasoning loop began; model call is being prepared
  • tool.started: Tool invocation began with validated input; sandbox timer running
  • tool.finished: Tool returned; output validated and appended to conversation
  • agent.finished: Agent reasoning loop complete; output validated against output schema
  • run.finished: Run completed or failed; final typed output or normalized error returned
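
One way an application might consume these events is as a TypeScript discriminated union folded into a summary for a run inspector. The six event names come from the list above; the payload fields (`runId`, `agent`, `tool`, `durationMs`, `status`) and `summarize` are assumptions.

```typescript
// Hypothetical sketch: the run events above as a discriminated union, folded into
// a summary a run inspector could render. Payload fields are assumptions.
type RunEvent =
  | { type: "run.started"; runId: string }
  | { type: "agent.started"; agent: string }
  | { type: "tool.started"; tool: string }
  | { type: "tool.finished"; tool: string; durationMs: number }
  | { type: "agent.finished"; agent: string }
  | { type: "run.finished"; status: "completed" | "failed" };

interface RunSummary {
  runId?: string;
  toolCalls: string[];
  toolTimeMs: number;
  status: "running" | "completed" | "failed";
}

function summarize(events: RunEvent[]): RunSummary {
  const summary: RunSummary = { toolCalls: [], toolTimeMs: 0, status: "running" };
  for (const event of events) {
    switch (event.type) {
      case "run.started":
        summary.runId = event.runId;
        break;
      case "tool.finished":
        summary.toolCalls.push(event.tool);
        summary.toolTimeMs += event.durationMs;
        break;
      case "run.finished":
        summary.status = event.status;
        break;
      default:
        break; // agent.* and tool.started do not change the summary
    }
  }
  return summary;
}

const summary = summarize([
  { type: "run.started", runId: "r1" },
  { type: "agent.started", agent: "answerer" },
  { type: "tool.started", tool: "search" },
  { type: "tool.finished", tool: "search", durationMs: 120 },
  { type: "agent.finished", agent: "answerer" },
  { type: "run.finished", status: "completed" },
]);
```

The `switch` on `event.type` lets the compiler narrow each payload, so reading `durationMs` on any event other than `tool.finished` is a type error.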

See it in practice.

Explore production AI patterns such as RAG, triage, review gates, and parallel agents, then layer on memory adapters for context that persists across sessions.