In-process runtime for AI agents.
The Harness is not a SaaS. It sits inside your application — between your TypeScript code and external model providers — providing typed boundaries, sandboxed execution, and full observability. Every piece is a pluggable adapter.
Eight building blocks
The Harness composes eight primitives into a single typed boundary. Each has a single responsibility and a defined interface.
- Harness: compiled definition of models, tools, skills, agents, workflows, defaults, and adapters. The harness is the declaration of what capabilities exist in your system.
- Session: isolated operational context with memory, history, sandbox, and one active run at a time. It answers the question: what user, thread, or tenant is this run for?
- Agent: a typed LLM conversation loop. It prepares messages, calls the model, executes tools, appends results, validates output, and emits events.
- Workflow: application-owned orchestration around one or more agent invocations. It can sequence, branch, fan out, reflect, judge, request human approval, and write state.
- Tool: callable capability exposed to an agent, implemented in TypeScript, built in, or served over MCP. Each tool has typed input/output, a timeout, and sandbox access.
- Skill: mounted instruction directory with SKILL.md frontmatter. Reusable domain guidance that agents follow without code duplication or prompt embedding.
- Memory: session-, run-, agent-, user-, and tenant-scoped JSON memory. The default sandboxMemory stores data in the session sandbox; durable adapters add persistence and TTL.
- Durable runtime: optional checkpoint, lease, and resume adapter for long-running workflows. Recovery starts from the last committed checkpoint. Streams remain observation-only.
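The agent loop described above (prepare messages, call the model, execute tools, append results, validate output, emit events) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Harness API: `runAgentLoop`, `ModelFn`, `ToolFn`, the stub model, and every event name other than `run.started` are hypothetical.

```typescript
// Hypothetical types for illustration; not the Harness API.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelReply =
  | { kind: "tool_call"; tool: string; input: string }
  | { kind: "final"; output: unknown };
type ModelFn = (messages: Message[]) => Promise<ModelReply>;
type ToolFn = (input: string) => Promise<string>;
type LoopEvent = { type: string; detail?: string };

async function runAgentLoop<T>(
  input: string,
  model: ModelFn,
  tools: Record<string, ToolFn>,
  validate: (output: unknown) => T, // throws if the output has the wrong shape
  emit: (event: LoopEvent) => void,
  maxTurns = 8,
): Promise<T> {
  const messages: Message[] = [{ role: "user", content: input }];
  emit({ type: "run.started" });
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await model(messages);
    if (reply.kind === "final") {
      const result = validate(reply.output); // validate before returning
      emit({ type: "run.finished" });
      return result;
    }
    const tool = tools[reply.tool];
    if (!tool) throw new Error(`Unknown tool: ${reply.tool}`);
    emit({ type: "tool.started", detail: reply.tool });
    const toolResult = await tool(reply.input);
    messages.push({ role: "tool", content: toolResult }); // append the result
    emit({ type: "tool.finished", detail: reply.tool });
  }
  throw new Error(`No final output after ${maxTurns} turns`);
}

// Stub model: requests one tool call, then answers with the tool result.
const stubModel: ModelFn = async (messages) => {
  const last = messages[messages.length - 1];
  if (!last) throw new Error("No messages to respond to");
  if (last.role === "tool") {
    return { kind: "final", output: { answer: last.content } };
  }
  return { kind: "tool_call", tool: "upper", input: last.content };
};

const validateAnswer = (output: unknown): { answer: string } => {
  const candidate = output as { answer?: unknown } | null;
  if (typeof candidate?.answer !== "string") throw new Error("Invalid output");
  return { answer: candidate.answer };
};

const loopEvents: string[] = [];
const loopResult = runAgentLoop(
  "hello",
  stubModel,
  { upper: async (text) => text.toUpperCase() },
  validateAnswer,
  (event) => loopEvents.push(event.type),
);
```

The turn limit is a design choice of this sketch: it keeps a misbehaving model from looping on tool calls forever.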
Stack agents into workflows.
An agent is one typed LLM conversation loop with its own tools, skills, and instructions — the smallest unit of AI reasoning. A workflow composes multiple agents into a named sequence with branching, fan-out, human review gates, and deterministic logic between each step. One workflow can run a research agent, hand off to a writing agent, and gate on a review agent — each with its own tools and skills, all coordinated by your application code.
- Agent: one typed LLM loop with tools, skills, and typed output
- Workflow: orchestrates one or more agents with full application control
- Fan-out: run multiple agents in parallel, synthesize results
- Human gates: block progression until a human decision is recorded
- Each agent in a workflow has its own instructions and tool set
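To make the composition concrete, here is a minimal sketch of a workflow as plain application code: one agent call, a parallel fan-out, a synthesis step, and a human gate. The agent functions are stubs and all names (`Agent`, `articleWorkflow`, `requestApproval`) are hypothetical; the real Harness workflow API is not shown in this document and may look different.

```typescript
// Hypothetical shapes for illustration; not the Harness API.
type Agent<I, O> = (input: I) => Promise<O>;
type Approval = { approved: boolean; reviewer: string };

// Stub agents standing in for real LLM-backed agents.
const researchAgent: Agent<string, string[]> = async (topic) => [
  `${topic}: source A`,
  `${topic}: source B`,
];
const summarizeAgent: Agent<string, string> = async (source) =>
  `summary of ${source}`;
const writingAgent: Agent<string[], string> = async (summaries) =>
  summaries.join("; ");

async function articleWorkflow(
  topic: string,
  requestApproval: (draft: string) => Promise<Approval>,
): Promise<{ status: "published" | "rejected"; draft: string }> {
  // Step 1: a single agent invocation.
  const sources = await researchAgent(topic);
  // Step 2: fan out, one summarizer run per source, in parallel.
  const summaries = await Promise.all(sources.map((s) => summarizeAgent(s)));
  // Step 3: synthesize the parallel results with another agent.
  const draft = await writingAgent(summaries);
  // Step 4: human gate. Progression blocks until a decision is recorded.
  const decision = await requestApproval(draft);
  // Deterministic application logic decides what happens next.
  return { status: decision.approved ? "published" : "rejected", draft };
}

// The approval callback is a stub; a real gate would wait on a person.
const workflowResult = articleWorkflow("agents", async (draft) => ({
  approved: draft.length > 0,
  reviewer: "editor@example.com",
}));
```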
Every step is observable
From session creation through model calls, tool execution, and validation — every step emits run events and OpenTelemetry spans. Nothing is invisible.
- App calls session.agents.answerer.stream(input)
- Session creates a run and emits run.started
- Agent invokes the model with tool specs
- Model returns tool call or final object
- Tool executes with timeout and sandbox
- Validated output returned, run finished
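The sequence above can be pictured as a stream of events that the application consumes as the run progresses. The sketch below assumes a streaming call returns an async iterable; `fakeAnswererStream` is a stand-in for `session.agents.answerer.stream(input)`, and the event shapes and every event name except `run.started` are hypothetical.

```typescript
// Hypothetical event shapes; the real Harness types may differ.
type StepEvent =
  | { type: "run.started"; runId: string }
  | { type: "model.called"; toolSpecs: string[] }
  | { type: "tool.called"; tool: string }
  | { type: "run.finished"; output: string };

// Stand-in for a streaming agent call. It yields events in the order
// listed above: run start, model call, tool call, validated output.
async function* fakeAnswererStream(input: string): AsyncGenerator<StepEvent> {
  yield { type: "run.started", runId: "run-1" };
  yield { type: "model.called", toolSpecs: ["search"] };
  yield { type: "tool.called", tool: "search" };
  yield { type: "run.finished", output: `answer to: ${input}` };
}

// Consumes the stream, recording each step and keeping the final output.
async function collectRun(
  input: string,
): Promise<{ seen: string[]; output: string | undefined }> {
  const seen: string[] = [];
  let output: string | undefined;
  for await (const event of fakeAnswererStream(input)) {
    seen.push(event.type);
    if (event.type === "run.finished") output = event.output;
  }
  return { seen, output };
}

const collected = collectRun("What is a session?");
```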
Long runs. Survive restarts.
Optional durable runtime adapters add checkpoints, leases, and resume for long-running workflows. Recovery starts from the last committed checkpoint, not from scratch. Streams remain observation-only.
- Checkpoint state at deterministic boundaries
- Lease management prevents duplicate execution
- Resume from last checkpoint on restart
- Declare requirements with .requires(["runtime.checkpoint"])
- Streams are observation-only, not recovery cursors
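A minimal sketch of the checkpoint-and-resume idea, assuming a store that records the next step and the state after each committed step. `MemoryCheckpointStore` and `runWithCheckpoints` are hypothetical names; a real durable adapter would persist outside the process and hold a lease so two workers cannot run the same step.

```typescript
// Hypothetical checkpoint store, in memory only for the sake of the example.
type Checkpoint = { nextStep: number; state: number };

class MemoryCheckpointStore {
  private saved = new Map<string, Checkpoint>();
  load(runId: string): Checkpoint | undefined {
    return this.saved.get(runId);
  }
  commit(runId: string, checkpoint: Checkpoint): void {
    this.saved.set(runId, checkpoint);
  }
}

type Step = (state: number) => number;

// Runs steps in order and commits a checkpoint after each one, so a restart
// resumes after the last committed step instead of starting from scratch.
function runWithCheckpoints(
  runId: string,
  steps: Step[],
  store: MemoryCheckpointStore,
): number {
  const resumed = store.load(runId) ?? { nextStep: 0, state: 0 };
  let state = resumed.state;
  steps.slice(resumed.nextStep).forEach((step, offset) => {
    state = step(state);
    store.commit(runId, { nextStep: resumed.nextStep + offset + 1, state });
  });
  return state;
}

const checkpointStore = new MemoryCheckpointStore();
const executed: string[] = [];
let crashOnce = true;
const workflowSteps: Step[] = [
  (s) => { executed.push("a"); return s + 1; },
  (s) => {
    // Simulate a process crash the first time this step runs.
    if (crashOnce) { crashOnce = false; throw new Error("simulated crash"); }
    executed.push("b");
    return s * 10;
  },
  (s) => { executed.push("c"); return s + 5; },
];

let firstAttemptFailed = false;
try {
  runWithCheckpoints("run-42", workflowSteps, checkpointStore);
} catch {
  firstAttemptFailed = true; // the process "restarts" here
}
// Second attempt resumes at step "b"; step "a" is not executed again.
const finalState = runWithCheckpoints("run-42", workflowSteps, checkpointStore);
```

Note that the checkpoint, not any event stream, is what the second attempt reads, which matches the point that streams are observation-only.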
Every boundary is pluggable
Models, sandbox, state, tools, skills, and MCP — each capability is an adapter with a defined interface. Pick the defaults. Replace them as requirements change. Add your own when nothing off-the-shelf fits.
Model adapters translate harness requests to provider SDKs. They are capability-gated: if the bound model can't provide a required feature, startup fails with a clear error instead of surfacing as a runtime surprise.
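Capability gating can be pictured as a check that runs once at startup, before any agent executes. The sketch below is an assumption about how such a check might look; the capability names, `ModelAdapter`, `AgentSpec`, and `assertCapabilities` are all hypothetical.

```typescript
// Hypothetical capability names and adapter shape, for illustration only.
type Capability = "tools" | "structured_output" | "streaming";

interface ModelAdapter {
  name: string;
  capabilities: ReadonlySet<Capability>;
}

interface AgentSpec {
  name: string;
  requires: Capability[];
}

// Checks every agent against its bound model before any run starts and
// throws a single error that lists all missing capabilities.
function assertCapabilities(model: ModelAdapter, agents: AgentSpec[]): void {
  const problems: string[] = [];
  for (const agent of agents) {
    for (const capability of agent.requires) {
      if (!model.capabilities.has(capability)) {
        problems.push(`agent "${agent.name}" needs "${capability}"`);
      }
    }
  }
  if (problems.length > 0) {
    throw new Error(`Model "${model.name}" cannot start: ${problems.join("; ")}`);
  }
}

const textOnlyModel: ModelAdapter = {
  name: "text-only",
  capabilities: new Set<Capability>(["streaming"]),
};

let startupError = "";
try {
  assertCapabilities(textOnlyModel, [
    { name: "answerer", requires: ["tools", "streaming"] },
  ]);
} catch (error) {
  startupError = error instanceof Error ? error.message : String(error);
}
```

Collecting every mismatch before throwing means one failed startup reports all the problems at once.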
Sandbox adapters cover file operations, command execution, and MCP server isolation. Choose the in-memory sandbox for most agents, add bash execution only where required, and build container or remote adapters for strict isolation policies.
- inMemorySandbox(): file ops only, no command execution
- bashSandbox(): command execution, higher privilege
- Custom adapters for containers or remote execution
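The difference in privilege between the two built-in sandboxes can be illustrated with a file-only implementation that rejects command execution. The `Sandbox` interface and `createFileOnlySandbox` below are hypothetical; they are not the contract that `inMemorySandbox()` actually implements.

```typescript
// Hypothetical sandbox interface; the real adapter contract may differ.
interface Sandbox {
  writeFile(path: string, content: string): void;
  readFile(path: string): string;
  exec(command: string): string;
}

// File operations only. exec always throws, which models the
// lower-privilege default described above.
function createFileOnlySandbox(): Sandbox {
  const files = new Map<string, string>();
  return {
    writeFile: (path, content) => {
      files.set(path, content);
    },
    readFile: (path) => {
      const content = files.get(path);
      if (content === undefined) throw new Error(`No such file: ${path}`);
      return content;
    },
    exec: (command) => {
      throw new Error(`Command execution is not available: ${command}`);
    },
  };
}

const fileSandbox = createFileOnlySandbox();
fileSandbox.writeFile("notes.txt", "draft");
const notes = fileSandbox.readFile("notes.txt");

let execBlocked = false;
try {
  fileSandbox.exec("ls");
} catch {
  execBlocked = true; // tools that need commands require a different sandbox
}
```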
State adapters persist sessions, run history, messages, and events across process restarts. Use the default for development. Use a durable adapter in production so run history and traces survive deployment cycles.
- In-memory default — suitable for local development
- Custom adapter: implement StateStore against any backend (Redis, Postgres, etc.)
- Any adapter that satisfies the typed StateStore port compiles in without changes to agent or workflow code
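The port idea can be sketched as an interface plus one implementation, with calling code that depends only on the interface. The shape below is an assumption made for illustration and is named `StateStoreSketch` to make that explicit; the real `StateStore` port is not shown in this document and likely has different methods.

```typescript
// Assumed shape of a state port; the real StateStore interface may differ.
interface RunRecord {
  runId: string;
  sessionId: string;
  status: "running" | "finished";
}

interface StateStoreSketch {
  saveRun(record: RunRecord): Promise<void>;
  getRun(runId: string): Promise<RunRecord | undefined>;
  listRuns(sessionId: string): Promise<RunRecord[]>;
}

// In-memory implementation, suitable only for local development.
class InMemoryStateStore implements StateStoreSketch {
  private runs = new Map<string, RunRecord>();

  async saveRun(record: RunRecord): Promise<void> {
    this.runs.set(record.runId, { ...record });
  }
  async getRun(runId: string): Promise<RunRecord | undefined> {
    return this.runs.get(runId);
  }
  async listRuns(sessionId: string): Promise<RunRecord[]> {
    return Array.from(this.runs.values()).filter(
      (run) => run.sessionId === sessionId,
    );
  }
}

// Code written against the port does not care which backend is behind it.
// A Redis- or Postgres-backed class with the same methods would drop in here.
async function finishRun(
  store: StateStoreSketch,
  runId: string,
): Promise<RunRecord> {
  const run = await store.getRun(runId);
  if (!run) throw new Error(`Unknown run: ${runId}`);
  const finished: RunRecord = { ...run, status: "finished" };
  await store.saveRun(finished);
  return finished;
}

const devStore = new InMemoryStateStore();
const storeDemo = devStore
  .saveRun({ runId: "run-1", sessionId: "session-1", status: "running" })
  .then(() => finishRun(devStore, "run-1"))
  .then(() => devStore.listRuns("session-1"));
```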
MCP adapters connect to any Model Context Protocol server. Stdio servers run through the sandbox executor; HTTP servers are called as remote endpoints. Both sides validate schemas before registering tools to the agent.
- mcp_stdio: local server through sandbox executor
- mcp_http: remote endpoint over HTTP
Every run is visible
All streaming APIs emit typed run events. Applications render these in chat UIs, run inspectors, logs, or tests. OpenTelemetry spans provide distributed tracing aligned to the GenAI semantic conventions.
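Typed events pay off at the rendering site: with a discriminated union, the compiler can check that every event type is handled. The union below is hypothetical (only `run.started` is named in this document), and `renderEvent` is an illustrative helper, not part of the Harness.

```typescript
// Hypothetical event union for illustration; real event types may differ.
type RunEvent =
  | { type: "run.started"; runId: string }
  | { type: "tool.started"; runId: string; tool: string }
  | { type: "tool.finished"; runId: string; tool: string; ms: number }
  | { type: "run.finished"; runId: string };

// Renders one event as a log line. The `never` assignment in the default
// branch makes the compiler flag any event type added later but not handled.
function renderEvent(event: RunEvent): string {
  switch (event.type) {
    case "run.started":
      return `[${event.runId}] started`;
    case "tool.started":
      return `[${event.runId}] tool ${event.tool} started`;
    case "tool.finished":
      return `[${event.runId}] tool ${event.tool} finished in ${event.ms}ms`;
    case "run.finished":
      return `[${event.runId}] finished`;
    default: {
      const unreachable: never = event;
      return unreachable;
    }
  }
}

const sampleEvents: RunEvent[] = [
  { type: "run.started", runId: "run-7" },
  { type: "tool.finished", runId: "run-7", tool: "search", ms: 120 },
  { type: "run.finished", runId: "run-7" },
];
const renderedLines = sampleEvents.map(renderEvent);
```

The same function could feed a chat UI, a run inspector, or a test assertion, since each consumes the same typed events.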
See it in practice.
Explore production AI patterns — RAG, triage, review gates, parallel agents — then layer on memory adapters for context that persists across sessions.