In-process runtime for AI agents.

The Harness is not a SaaS. It sits inside your application — between your TypeScript code and external model providers — providing typed boundaries, sandboxed execution, and full observability. Every piece is a pluggable adapter.

Core Concepts

Eight building blocks

The Harness composes eight primitives into a single typed boundary. Each has a single responsibility and a defined interface.

Harness

Compiled definition of models, tools, skills, agents, workflows, defaults, and adapters. The harness is the declaration of what capabilities exist in your system.

Session

Isolated operational context with memory, history, sandbox, and one active run at a time. It answers the question: which user, thread, or tenant is this run for?
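The "one active run at a time" rule can be pictured as a simple guard. The sketch below is illustrative only: `SketchSession`, `startRun`, and `finishRun` are hypothetical names chosen for this example, not the Harness API.

```typescript
// Hypothetical sketch: a session that allows only one active run at a time.
// None of these names come from the Harness API.
class SketchSession {
  readonly sessionId: string;
  private activeRunId: string | null = null;
  private runCounter = 0;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  // Starts a run, or throws if another run is still active.
  startRun(): string {
    if (this.activeRunId !== null) {
      throw new Error(`session ${this.sessionId} already has active run ${this.activeRunId}`);
    }
    this.runCounter += 1;
    this.activeRunId = `${this.sessionId}-run-${this.runCounter}`;
    return this.activeRunId;
  }

  // Marks the given run as finished so the next one can start.
  finishRun(runId: string): void {
    if (this.activeRunId !== runId) {
      throw new Error(`run ${runId} is not the active run`);
    }
    this.activeRunId = null;
  }
}
```

Keying the guard on a session identifier is what lets separate users, threads, or tenants run concurrently while each context stays serialized.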

Agent

A typed LLM conversation loop. Prepares messages, calls the model, executes tools, appends results, validates output, and emits events.
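The loop described above (prepare messages, call the model, execute tools, append results, validate output, emit events) might look roughly like the following. This is a simplified, synchronous sketch with a stand-in model function; every type and function name here is an assumption made for illustration, and the event names are borrowed from the run events listed later in this document.

```typescript
// Hypothetical sketch of an agent loop; all types are illustrative only.
type ChatMessage = { role: "user" | "assistant" | "tool"; content: string };
type ModelReply =
  | { kind: "tool_call"; tool: string; input: string }
  | { kind: "final"; output: string };
type ModelFn = (messages: ChatMessage[]) => ModelReply;
type ToolFn = (input: string) => string;

function runAgentLoop(
  model: ModelFn,
  tools: Record<string, ToolFn>,
  userInput: string,
  validate: (output: string) => boolean,
  maxSteps = 5,
): { output: string; events: string[] } {
  const messages: ChatMessage[] = [{ role: "user", content: userInput }];
  const events: string[] = ["agent.started"];

  for (let step = 0; step < maxSteps; step++) {
    const reply = model(messages);
    if (reply.kind === "final") {
      // Validate before returning so callers only ever see checked output.
      if (!validate(reply.output)) throw new Error("output failed validation");
      events.push("agent.finished");
      return { output: reply.output, events };
    }
    const tool = tools[reply.tool];
    if (!tool) throw new Error(`unknown tool: ${reply.tool}`);
    events.push("tool.started");
    const result = tool(reply.input);
    events.push("tool.finished");
    // Append the tool result so the next model call can see it.
    messages.push({ role: "tool", content: result });
  }
  throw new Error("agent exceeded maxSteps without a final answer");
}
```

The `maxSteps` bound is a design choice of this sketch: it keeps a misbehaving model from looping on tool calls forever.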

Workflow

Application-owned orchestration around one or more agent invocations. Sequence, branch, fan out, reflect, judge, request human approval, write state.

Tool

Callable capability exposed to an agent: TypeScript, built-in, or MCP. Each tool has typed input/output, a timeout, and sandbox access.
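To make "typed input/output, a timeout" concrete, here is a minimal sketch of a tool shape and a helper that enforces the timeout. `SketchTool` and `callWithTimeout` are hypothetical names for this example; the real tool definition may differ.

```typescript
// Hypothetical tool shape: typed input and output plus a timeout.
interface SketchTool<I, O> {
  name: string;
  timeoutMs: number;
  execute: (input: I) => Promise<O>;
}

// Runs a tool and rejects if it does not settle within its timeout.
function callWithTimeout<I, O>(tool: SketchTool<I, O>, input: I): Promise<O> {
  return new Promise<O>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`tool ${tool.name} timed out after ${tool.timeoutMs}ms`)),
      tool.timeoutMs,
    );
    tool.execute(input).then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

Note that this sketch only stops waiting for the tool; it does not cancel the underlying work. Actual cancellation would need cooperation from the tool or the sandbox.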

Skill

Mounted instruction directory with SKILL.md frontmatter. Reusable domain guidance that agents follow without code duplication or prompt embedding.
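As an illustration of what "SKILL.md frontmatter" involves, the sketch below splits a skill file into its frontmatter fields and its instruction body. It assumes a `---` delimited block of flat `key: value` pairs; the field names used in the usage example (`name`, `description`) are assumptions, and `parseSkillFile` is a hypothetical helper rather than part of the Harness.

```typescript
// Hypothetical sketch: split a SKILL.md file into frontmatter fields and body.
// Handles only flat `key: value` pairs, not full YAML.
function parseSkillFile(source: string): { meta: Record<string, string>; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) return { meta: {}, body: source };

  const meta: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const separator = line.indexOf(":");
    if (separator === -1) continue; // skip lines that are not key/value pairs
    meta[line.slice(0, separator).trim()] = line.slice(separator + 1).trim();
  }
  return { meta, body: match[2].trim() };
}

// Usage with an assumed file layout:
const skill = parseSkillFile(
  "---\nname: triage\ndescription: Route support tickets\n---\n# Triage\nAsk for the order id first.\n",
);
console.log(skill.meta.name, "/", skill.body.split("\n")[0]);
```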

Memory

Session-, run-, agent-, user-, and tenant-scoped JSON memory. The default, sandboxMemory, stores data in the session sandbox. Durable adapters add persistence and TTL.
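Scoped memory with an optional TTL can be sketched as a keyed store where the scope and its identifier are part of the key. Everything below (`ScopedMemorySketch`, its method signatures, the injected clock) is a hypothetical illustration, not the memory adapter interface itself.

```typescript
// Hypothetical sketch of scoped JSON memory with optional TTL.
type MemoryScope = "session" | "run" | "agent" | "user" | "tenant";
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

class ScopedMemorySketch {
  private entries = new Map<string, { value: JsonValue; expiresAt: number | null }>();
  private now: () => number;

  // The clock is injectable so expiry can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  set(scope: MemoryScope, scopeId: string, key: string, value: JsonValue, ttlMs?: number): void {
    const expiresAt = ttlMs === undefined ? null : this.now() + ttlMs;
    this.entries.set(`${scope}:${scopeId}:${key}`, { value, expiresAt });
  }

  get(scope: MemoryScope, scopeId: string, key: string): JsonValue | undefined {
    const id = `${scope}:${scopeId}:${key}`;
    const entry = this.entries.get(id);
    if (!entry) return undefined;
    if (entry.expiresAt !== null && entry.expiresAt <= this.now()) {
      this.entries.delete(id); // lazily evict expired entries on read
      return undefined;
    }
    return entry.value;
  }
}
```

Because the scope is part of the key, a value written for one user or run is never visible under another, even when the key names match.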

Durable Runtime

Optional local checkpoint, lease, and resume adapter for long-running workflows. Recovery starts from the last committed checkpoint inside your application boundary. Streams remain observation-only.

Agent vs workflow: The workflow owns orchestration; agents own reasoning.

Composition Model

Stack agents into workflows.

An agent is one typed LLM conversation loop with its own tools, skills, and instructions — the smallest unit of AI reasoning. A workflow composes multiple agents into a named sequence with branching, fan-out, human review gates, and deterministic logic between each step. One workflow can run a research agent, hand off to a writing agent, and gate on a review agent — each with its own tools and skills, all coordinated by your application code.

Agent: one typed LLM loop with tools, skills, and typed output
Workflow: orchestrates one or more agents with full application control
Fan-out: run multiple agents in parallel, synthesize results
Human gates: block progression until a human decision is recorded
Each agent in a workflow has its own instructions and tool set
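The research, writing, and review example above can be sketched as plain application code. Agents are modeled here as async functions passed in by the caller; `contentWorkflow`, `AgentFn`, and `ApprovalFn` are hypothetical names for this illustration, not the workflow API.

```typescript
// Hypothetical sketch: application code orchestrating three agents.
type AgentFn<I, O> = (input: I) => Promise<O>;
type ApprovalFn = (draft: string) => Promise<boolean>;

async function contentWorkflow(
  topics: string[],
  research: AgentFn<string, string>,
  write: AgentFn<string[], string>,
  review: AgentFn<string, { ok: boolean }>,
  requestApproval: ApprovalFn,
): Promise<{ status: "published" | "rejected"; draft: string }> {
  // Fan-out: one research agent call per topic, run in parallel.
  const findings = await Promise.all(topics.map((topic) => research(topic)));

  // Hand-off: the writing agent synthesizes the findings into a draft.
  const draft = await write(findings);

  // Deterministic branch on the review agent's typed verdict.
  const verdict = await review(draft);
  if (!verdict.ok) return { status: "rejected", draft };

  // Human gate: progression blocks until a decision is recorded.
  const approved = await requestApproval(draft);
  return { status: approved ? "published" : "rejected", draft };
}
```

The branching and gating live in ordinary code between agent calls, which is the sense in which the workflow owns orchestration and the agents own reasoning.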

Agent lifecycle: Every model call, tool call, and result is observable.

Agent Lifecycle

Every step is observable

From session creation through model calls, tool execution, and validation — every step emits run events and OpenTelemetry spans. Nothing is invisible.

App calls session.agents.answerer.stream(input)
Session creates a run and emits run.started
Agent invokes the model with tool specs
Model returns tool call or final object
Tool executes with timeout and sandbox
Validated output returned, run finished

Durable execution: State snapshots keep long runs inspectable.

Durable Execution

Long runs. Survive restarts.

Optional durable runtime adapters add local checkpoints, leases, and resume for long-running workflows. Recovery starts from the last committed checkpoint inside your application boundary, not from scratch or a hosted workflow engine. Streams remain observation-only.

Checkpoint state locally at deterministic boundaries
Lease management prevents duplicate execution
Resume from last checkpoint on restart
Declare requirements with .requires(["runtime.checkpoint"])
Streams are observation-only, not recovery cursors
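The checkpoint, lease, and resume behavior listed above can be illustrated with a small in-memory model. This is a sketch under stated assumptions: `LocalCheckpointStore` and `runDurable` are hypothetical, state is reduced to a single number, and the lease has no expiry, which a real adapter would need.

```typescript
// Hypothetical sketch: commit a checkpoint after each step, resume from the last one.
interface CheckpointRecord {
  nextStep: number;
  state: number;
}
type Step = (state: number) => number;

class LocalCheckpointStore {
  private saved = new Map<string, CheckpointRecord>();
  private leases = new Map<string, string>();

  // Grants the lease to one owner; any other owner is refused.
  acquireLease(runId: string, owner: string): boolean {
    const holder = this.leases.get(runId);
    if (holder !== undefined && holder !== owner) return false;
    this.leases.set(runId, owner);
    return true;
  }

  releaseLease(runId: string, owner: string): void {
    if (this.leases.get(runId) === owner) this.leases.delete(runId);
  }

  load(runId: string): CheckpointRecord {
    return this.saved.get(runId) ?? { nextStep: 0, state: 0 };
  }

  commit(runId: string, record: CheckpointRecord): void {
    this.saved.set(runId, record);
  }
}

function runDurable(store: LocalCheckpointStore, runId: string, owner: string, steps: Step[]): number {
  if (!store.acquireLease(runId, owner)) throw new Error("run is leased by another worker");
  try {
    let { nextStep, state } = store.load(runId);
    for (; nextStep < steps.length; nextStep++) {
      state = steps[nextStep](state);
      // Commit only after the step succeeded, so a crash replays at most one step.
      store.commit(runId, { nextStep: nextStep + 1, state });
    }
    return state;
  } finally {
    store.releaseLease(runId, owner);
  }
}
```

In this model a step that crashes is retried on resume, while steps that already committed are skipped. That is why checkpoints belong at deterministic boundaries.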

Adapter Model

Every boundary is pluggable

Models, sandbox, state, tools, skills, and MCP — each capability is an adapter with a defined interface. Pick the defaults. Replace them as requirements change. Add your own when nothing off-the-shelf fits.

Model Provider

Translates harness requests to provider SDKs. Capability-gated — if the bound model can't provide a required feature, startup fails with a clear error, not a runtime surprise.

OpenAI

@purista/harness-openai

Anthropic

@purista/harness-anthropic

Amazon Bedrock

@purista/harness-bedrock

Azure AI Foundry

@purista/harness-azure-foundry
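Capability gating, as described for model providers above, amounts to comparing what each agent requires with what its bound model offers, once, at startup. The sketch below shows the idea; the capability names and the `assertCapabilities` helper are assumptions for illustration and do not reflect the provider packages' actual interfaces.

```typescript
// Hypothetical sketch of a startup capability check.
type Capability = "tool_calling" | "structured_output" | "streaming" | "vision";

interface ModelBinding {
  name: string;
  capabilities: Capability[];
}

interface AgentSpec {
  name: string;
  model: string;
  requires: Capability[];
}

// Throws one aggregated error listing every unmet requirement.
function assertCapabilities(models: ModelBinding[], agents: AgentSpec[]): void {
  const problems: string[] = [];
  for (const agent of agents) {
    const model = models.find((m) => m.name === agent.model);
    if (!model) {
      problems.push(`agent "${agent.name}" is bound to unknown model "${agent.model}"`);
      continue;
    }
    for (const capability of agent.requires) {
      if (model.capabilities.indexOf(capability) === -1) {
        problems.push(`agent "${agent.name}" requires "${capability}", which model "${model.name}" lacks`);
      }
    }
  }
  if (problems.length > 0) {
    throw new Error(`capability check failed:\n- ${problems.join("\n- ")}`);
  }
}
```

Collecting all problems before throwing means a single failed startup reports every mismatch instead of one at a time.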

Sandbox

File operations, command execution, and MCP server isolation. Choose in-memory for most agents. Add bash execution only where required. Build container or remote adapters for strict isolation policies.

inMemorySandbox() — file ops only, no command execution
bashSandbox() — command execution, higher privilege
Custom adapters for containers or remote execution
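The difference between a file-only sandbox and one that can execute commands can be sketched with a shared interface. `SandboxSketch` and `createInMemorySandboxSketch` are hypothetical and only loosely modeled on the description of inMemorySandbox() above; the actual adapter interface may look different.

```typescript
// Hypothetical sketch of a sandbox port with a file-only implementation.
interface SandboxSketch {
  writeFile(path: string, content: string): void;
  readFile(path: string): string;
  exec(command: string): string;
}

function createInMemorySandboxSketch(): SandboxSketch {
  const files = new Map<string, string>();
  return {
    writeFile(path, content) {
      files.set(path, content);
    },
    readFile(path) {
      const content = files.get(path);
      if (content === undefined) throw new Error(`no such file: ${path}`);
      return content;
    },
    // File operations only: command execution is refused outright.
    exec(command) {
      throw new Error(`command execution is not available in this sandbox: ${command}`);
    },
  };
}
```

Refusing `exec` by default keeps the lower-privilege option the path of least resistance; a higher-privilege adapter would implement the same interface.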

State Store

Persists sessions, run history, messages, and events across process restarts. Use the default for development. Use a durable adapter in production so run history and traces survive deployment cycles.

In-memory default — suitable for local development
Custom adapter — implement StateStore against any backend (Redis, Postgres, etc.)
Any adapter that satisfies the typed StateStore port compiles in without changes to agent or workflow code
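The "typed port" idea can be shown with a deliberately reduced interface. The real StateStore port is not reproduced here: `StateStoreSketch`, its two methods, and `StoredRun` are assumptions made for this example. The point is that `countFinishedRuns` depends only on the interface, so swapping the backend does not touch it.

```typescript
// Hypothetical, reduced stand-in for a state store port.
interface StoredRun {
  runId: string;
  sessionId: string;
  status: "running" | "finished" | "failed";
}

interface StateStoreSketch {
  saveRun(run: StoredRun): Promise<void>;
  listRuns(sessionId: string): Promise<StoredRun[]>;
}

// In-memory implementation, suitable only for local development.
class InMemoryStateStoreSketch implements StateStoreSketch {
  private runs = new Map<string, StoredRun>();

  async saveRun(run: StoredRun): Promise<void> {
    this.runs.set(run.runId, { ...run });
  }

  async listRuns(sessionId: string): Promise<StoredRun[]> {
    const result: StoredRun[] = [];
    this.runs.forEach((run) => {
      if (run.sessionId === sessionId) result.push({ ...run });
    });
    return result;
  }
}

// Application code written against the port, not a concrete backend.
async function countFinishedRuns(store: StateStoreSketch, sessionId: string): Promise<number> {
  const runs = await store.listRuns(sessionId);
  return runs.filter((run) => run.status === "finished").length;
}
```

A Redis- or Postgres-backed class implementing the same two methods could be passed to `countFinishedRuns` unchanged.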

MCP Tools

Connect to any Model Context Protocol server. Stdio runs through the sandbox executor. HTTP calls remote endpoints. Both transports validate schemas before registering tools with the agent.

mcp_stdio — local server through sandbox executor
mcp_http — remote endpoint over HTTP
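Validating schemas before registration might look like the following sketch. It relies on the MCP convention that each listed tool carries a `name` and an `inputSchema` whose `type` is `"object"`; the helper names (`isValidDescriptor`, `registerMcpTools`) and the choice to silently count rejected entries are assumptions of this example, not documented Harness behavior.

```typescript
// Hypothetical sketch: check MCP tool descriptors before registering them.
interface McpToolDescriptor {
  name: string;
  description?: string;
  inputSchema: { type: "object"; properties?: Record<string, unknown>; required?: string[] };
}

// Type guard: accepts only entries with a non-empty name and an object-typed input schema.
function isValidDescriptor(value: unknown): value is McpToolDescriptor {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const name = candidate["name"];
  if (typeof name !== "string" || name.length === 0) return false;
  const schema = candidate["inputSchema"];
  if (typeof schema !== "object" || schema === null) return false;
  return (schema as Record<string, unknown>)["type"] === "object";
}

// Registers valid tools by name and counts the entries that were rejected.
function registerMcpTools(listed: unknown[]): { registered: string[]; rejected: number } {
  const registered: string[] = [];
  let rejected = 0;
  for (const entry of listed) {
    if (isValidDescriptor(entry)) {
      registered.push(entry.name);
    } else {
      rejected += 1;
    }
  }
  return { registered, rejected };
}
```

Treating the server's response as `unknown` until it passes the guard keeps malformed descriptors from ever reaching the agent's tool list.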

Observability

Every run is visible

All streaming APIs emit typed run events. Applications render these in chat UIs, run inspectors, logs, or tests. OpenTelemetry spans provide distributed tracing aligned to the GenAI semantic conventions.

run.started: Session created a run for the agent or workflow invocation

agent.started: Agent reasoning loop began; model call is being prepared

tool.started: Tool invocation began with validated input; sandbox timer running

tool.finished: Tool returned; output validated and appended to conversation

agent.finished: Agent reasoning loop complete; output validated against output schema

run.finished: Run completed or failed; final typed output or normalized error returned
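Typed run events lend themselves to a discriminated union, which lets a chat UI, run inspector, or test handle each event exhaustively. The event names below come from the list above; the payload fields (`runId`, `agent`, `tool`, `durationMs`, `ok`) and the `describeEvent` helper are assumptions added for illustration.

```typescript
// Hypothetical sketch: run events as a discriminated union. Payload fields are assumed.
type RunEventSketch =
  | { type: "run.started"; runId: string }
  | { type: "agent.started"; agent: string }
  | { type: "tool.started"; tool: string }
  | { type: "tool.finished"; tool: string; durationMs: number }
  | { type: "agent.finished"; agent: string }
  | { type: "run.finished"; runId: string; ok: boolean };

// Renders one event as a log line; the switch is exhaustive over the union.
function describeEvent(event: RunEventSketch): string {
  switch (event.type) {
    case "run.started":
      return `run ${event.runId} started`;
    case "agent.started":
      return `agent ${event.agent} started`;
    case "tool.started":
      return `tool ${event.tool} started`;
    case "tool.finished":
      return `tool ${event.tool} finished in ${event.durationMs}ms`;
    case "agent.finished":
      return `agent ${event.agent} finished`;
    case "run.finished":
      return `run ${event.runId} ${event.ok ? "completed" : "failed"}`;
  }
}
```

If a new event type is added to the union, the compiler flags `describeEvent` until the new case is handled.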

See it in practice.

Explore production AI patterns — RAG, triage, review gates, parallel agents — then layer on memory adapters for context that persists across sessions.
