From install to production.

A progressive guide. Pick your provider. Start with a single agent. Add tools. Compose agents into workflows. Layer on MCP, skills, and sandbox policies when you need them.

Step 0 — Choose Your Provider

Pick a model adapter

Each provider is a separate package — no SDK bloat from providers you don't use. Install only what you need. Swap providers by changing one config line.

OpenAI
@purista/harness-openai

The default for most applications. Full capability set: structured objects, tool use, embeddings, streaming, and vision.

Capabilities: object · tool_use · embeddings · text_stream · vision_input

Anthropic
@purista/harness-anthropic

Claude models with strong reasoning. Excellent for complex tool use chains and long-context tasks.

Capabilities: object · tool_use · text_stream

Amazon Bedrock
@purista/harness-bedrock

Enterprise-grade inference through AWS. IAM-based access, VPC endpoints, and compliance-friendly data residency.

Capabilities: object · tool_use · text_stream

Azure AI Foundry
@purista/harness-azure-foundry

Microsoft-hosted models with enterprise security, content filtering, and regional deployment options.

Capabilities: object · tool_use · embeddings

Provider-neutral by design. Capabilities gate access at both compile time and runtime. If an adapter doesn't support a feature (embeddings, for example), code that calls embed() won't compile. Swap providers without rewriting agents or workflows.
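To make the capability gate concrete, here is a minimal type-level sketch of the idea. It is not the @purista/harness implementation: `Capability`, `Model`, and `defineModel` are hypothetical names invented for this illustration, and the stub `embed()` just returns the text length.

```typescript
// Hypothetical types for illustration only; not the @purista/harness API.
type Capability = 'object' | 'tool_use' | 'embeddings' | 'text_stream' | 'vision_input'

interface BaseModel<C extends Capability> {
  readonly capabilities: readonly C[]
}

// embed() exists on the type only when 'embeddings' is among the declared capabilities.
type Model<C extends Capability> = BaseModel<C> &
  ('embeddings' extends C ? { embed(text: string): number[] } : {})

function defineModel<C extends Capability>(capabilities: readonly C[]): Model<C> {
  // Runtime gate that mirrors the compile-time check.
  const hasEmbeddings = (capabilities as readonly Capability[]).indexOf('embeddings') !== -1
  if (hasEmbeddings) {
    // Stub embedding: a one-element vector holding the text length.
    return { capabilities, embed: (text: string) => [text.length] } as unknown as Model<C>
  }
  return { capabilities } as unknown as Model<C>
}

const chat = defineModel(['object', 'tool_use'])
const full = defineModel(['object', 'embeddings'])

const vector = full.embed('hello') // compiles: 'embeddings' was declared
// chat.embed('hello')             // compile error: property 'embed' does not exist
```

The conditional type removes `embed` from the model's type, so a missing capability surfaces in the editor rather than in production. The runtime check covers callers that bypass the type system.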

Level 1 — Your First Agent

The smallest useful harness

Every harness starts with a name, a model, and an agent. This is the smallest configuration that does something useful. Run it, then iterate.

defineHarness typescript
import { defineHarness, JsonLogger } from '@purista/harness'
import { openai } from '@purista/harness-openai'
import { z } from 'zod'

const harness = defineHarness({ name: 'hello-world' })
  .logger(new JsonLogger({ level: 'info' }))
  .models({
    fast: {
      provider: openai({ apiKey: process.env.OPENAI_API_KEY! }),
      model: 'gpt-4o-mini',
      capabilities: ['object'],
    },
  })
  .agents(({ agent }) => ({
    answerer: agent({
      model: 'fast',
      input: z.object({ question: z.string() }),
      output: z.object({ answer: z.string() }),
      builtinTools: false,
      instructions: 'Answer concisely.',
    }),
  }))
  .build()
Run It typescript
const session = await harness.getSession('user-1')

const result = await session.agents.answerer.prompt({
  question: 'How do tools work?',
})

console.log(result.answer)
// -> "Tools are typed functions..."

await session.close()
await harness.shutdown()
Typed I/O

Input and output schemas are Zod. TypeScript inference propagates through the entire chain.

Sandboxed

Even the simplest agent runs in a sandbox. No accidental file or network access.

Observable

JsonLogger is on. Every run emits structured events you can inspect.

Level 2 — Adapters & Tools

Give your agent capabilities

An agent without tools is just a chatbot. Add TypeScript tools for your domain logic, pick the right sandbox, and declare what your model can do.

Add a TypeScript tool

snippet.ts typescript
.tools({
  search_docs: {
    description: 'Search internal docs.',
    input: z.object({ query: z.string() }),
    output: z.object({
      hits: z.array(z.object({ id: z.string(), text: z.string() })),
    }),
    handler: async (_ctx, input) => ({
      hits: [{ id: 'intro', text: `Result for ${input.query}` }],
    }),
  },
})
.agents(({ agent }) => ({
  answerer: agent({
    model: 'fast',
    input: z.object({ question: z.string() }),
    output: z.object({ answer: z.string(), citations: z.array(z.string()) }),
    tools: ['search_docs'],
    instructions: 'Search docs before answering. Cite sources.',
  }),
}))

Choose your sandbox

inMemorySandbox()

File operations only. No command execution. Default for most agents.

bashSandbox()

File + command execution through just-bash. Use when agents need to run commands.

Custom adapter

Containers, remote execution, or custom filesystem policy. Implement Sandbox and SandboxSession.

Stream a run typescript
for await (const event of session.agents.answerer.stream({ question: 'How do tools work?' })) {
  if (event.type === 'tool.started') console.log('tool:', event.toolId)
  if (event.type === 'run.finished') console.log(event.output)
}
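A stream consumer can be developed and tested without a live model. The sketch below assumes only what the snippet above shows: the `tool.started` and `run.finished` event types with their `toolId` and `output` fields. The `RunEvent` union, `fakeStream`, and `summarize` are hypothetical stand-ins, and the real stream may emit other event types.

```typescript
// Stand-in event union. Only the two event types and their fields come from the guide.
type RunEvent =
  | { type: 'tool.started'; toolId: string }
  | { type: 'run.finished'; output: { answer: string } }

// Fake stream that mimics one agent run, so the consumer can be exercised offline.
async function* fakeStream(): AsyncGenerator<RunEvent> {
  yield { type: 'tool.started', toolId: 'search_docs' }
  yield { type: 'run.finished', output: { answer: 'Tools are typed functions.' } }
}

// Collect the tools that were started and the final answer from any stream of run events.
async function summarize(events: AsyncIterable<RunEvent>) {
  const toolsUsed: string[] = []
  let answer: string | undefined
  for await (const event of events) {
    switch (event.type) {
      case 'tool.started':
        toolsUsed.push(event.toolId)
        break
      case 'run.finished':
        answer = event.output.answer
        break
    }
  }
  return { toolsUsed, answer }
}

summarize(fakeStream()).then((summary) => console.log(summary))
```

Switching on `event.type` lets TypeScript narrow each branch, so `event.toolId` is only accessible where it exists.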
Level 3 — Multiple Agents

Specialize and compose

One agent per concern. A research agent retrieves facts. A writer agent produces prose. A critic agent checks accuracy. Each has its own model, tools, and instructions.

Multi-agent harness typescript
.agents(({ agent }) => ({
  researcher: agent({
    model: 'fast',
    input: z.object({ topic: z.string() }),
    output: z.object({ facts: z.array(z.string()) }),
    tools: ['search_docs'],
    instructions: 'Find 3-5 relevant facts.',
  }),
  writer: agent({
    model: 'fast',
    input: z.object({ facts: z.array(z.string()), topic: z.string() }),
    output: z.object({ draft: z.string() }),
    instructions: 'Write a concise paragraph.',
  }),
  critic: agent({
    model: 'fast',
    input: z.object({ draft: z.string(), facts: z.array(z.string()) }),
    output: z.object({ accurate: z.boolean(), issues: z.array(z.string()) }),
    instructions: 'Check for unsupported claims.',
  }),
}))

Researcher

Retrieves facts through search tools. Returns structured evidence.

Writer

Consumes facts. Produces prose. No tool access needed.

Critic

Checks draft against facts. Flags unsupported claims.

Level 4 — Workflows

Orchestrate agents with deterministic logic

Workflows are application-owned orchestration around agents. Sequence them, branch on results, request human approval, and write durable state. The workflow handler owns the process. Agents own the reasoning.

Define a workflow typescript
.workflows(({ workflow }) => ({
  research_and_write: workflow({
    input: z.object({ topic: z.string() }),
    output: z.object({ draft: z.string(), approved: z.boolean() }),
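    handler: async (ctx) => {
      const research = await ctx.agents.researcher(ctx.input)
      // The writer's input schema needs the topic as well as the researched facts.
      const draft = await ctx.agents.writer({ ...research, topic: ctx.input.topic })
      // The critic's input schema needs the draft and the facts it is checked against.
      const review = await ctx.agents.critic({ ...draft, facts: research.facts })

      // Approval mirrors the critic's verdict.
      return { ...draft, approved: review.accurate }
    },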
  }),
}))
Invoke the workflow typescript
const result = await session.workflows.research_and_write.prompt({
  topic: 'Vector databases',
})

if (result.approved) {
  await saveToKnowledgeBase(result.draft)
} else {
  await flagForHumanReview(result.draft)
}

Key insight: Workflows sequence agents with deterministic logic. Agents do not call each other — the workflow handler orchestrates them.

Figure: Agent vs workflow. The workflow owns orchestration; agents own reasoning.
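Because the workflow handler is ordinary code, it can also branch and loop on agent results. The sketch below shows a bounded write-and-review loop in which the critic's issues are fed back to the writer. The three agents are stubbed as plain async functions, and `researchAndWrite`, `maxRounds`, and the `issues` feedback field are hypothetical names for this illustration, not part of the harness API.

```typescript
type Facts = { facts: string[] }
type Draft = { draft: string }
type Review = { accurate: boolean; issues: string[] }

// Stub agents: plain async functions standing in for researcher, writer, and critic.
const researcher = async (input: { topic: string }): Promise<Facts> => ({
  facts: [`${input.topic} store embeddings`],
})

// The stub writer only sticks to the facts once it has received critic feedback.
const writer = async (
  input: Facts & { topic: string; issues?: string[] | undefined },
): Promise<Draft> => ({
  draft: input.issues ? input.facts.join(' ') : `${input.topic} are always fastest.`,
})

const critic = async (input: Draft & Facts): Promise<Review> => {
  const accurate = input.facts.some((fact) => input.draft.indexOf(fact) !== -1)
  return { accurate, issues: accurate ? [] : ['Claim not supported by facts.'] }
}

// Deterministic orchestration: research once, then write and review up to maxRounds times.
async function researchAndWrite(topic: string, maxRounds = 2) {
  const research = await researcher({ topic })
  let issues: string[] | undefined
  let draft: Draft = { draft: '' }
  for (let round = 1; round <= maxRounds; round++) {
    draft = await writer({ ...research, topic, issues })
    const review = await critic({ ...draft, ...research })
    if (review.accurate) return { ...draft, approved: true, rounds: round }
    issues = review.issues // feed the critic's findings into the next draft
  }
  return { ...draft, approved: false, rounds: maxRounds }
}

researchAndWrite('Vector databases').then((result) => console.log(result))
```

The loop bound lives in the handler, not in a prompt, so the worst-case number of model calls is known up front.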
Level 5 — MCP & Skills

Connect the ecosystem

MCP servers extend your agent with external capabilities. Skills add reusable domain guidance. Both are mounted, not embedded — keeping your harness clean and composable.

MCP over stdio

snippet.ts typescript
.tools({
  drawio_diagram: {
    kind: 'mcp_stdio',
    description: 'Create draw.io diagrams.',
    install: {
      command: 'npm install @drawio/mcp',
      cwd: '/workspace',
      timeoutMs: 120_000,
    },
    command: 'npx',
    args: ['@drawio/mcp'],
    tool: 'drawio.create',
  },
})

Stdio MCP runs through the sandbox executor. The install command runs on first use to bootstrap the server.

MCP over HTTP

snippet.ts typescript
.tools({
  drawio_remote: {
    kind: 'mcp_http',
    description: 'Remote draw.io MCP.',
    url: process.env.DRAWIO_MCP_URL!,
    auth: { kind: 'bearer', token: process.env.DRAWIO_MCP_TOKEN! },
    tool: 'drawio.create',
  },
})

HTTP MCP calls a remote endpoint directly. Supports bearer, oauth2, api_key, and basic auth.

Mount a skill

Create a SKILL.md in a directory:

SKILL.md markdown
---
name: incident-responder
description: Incident response guidance.
---

Use concise summaries with owner,
impact, timeline, and next action.

Mount it on the harness:

snippet.ts typescript
.skills({
  'incident-responder': {
    directory: './skills/incident-responder',
  },
})
.agents(({ agent }) => ({
  writer: agent({
    skills: ['incident-responder'],
    // ...
  }),
}))
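For a sense of what a skill file contains from a program's point of view, the sketch below splits the SKILL.md shown above into front-matter fields and a guidance body. How the harness actually loads skills is not described here. `parseSkill` is a hypothetical helper that handles only flat `key: value` pairs, not full YAML.

```typescript
// Minimal front-matter reader for the SKILL.md shape shown above.
// Illustration only: flat `key: value` pairs, no nested YAML, no quoting rules.
interface SkillFile {
  meta: Record<string, string>
  body: string
}

function parseSkill(source: string): SkillFile {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source)
  if (!match) return { meta: {}, body: source.trim() }
  const [, header = '', rest = ''] = match
  const meta: Record<string, string> = {}
  for (const line of header.split(/\r?\n/)) {
    const separator = line.indexOf(':')
    if (separator === -1) continue // skip lines that are not key: value pairs
    meta[line.slice(0, separator).trim()] = line.slice(separator + 1).trim()
  }
  return { meta, body: rest.trim() }
}

const skill = parseSkill(`---
name: incident-responder
description: Incident response guidance.
---

Use concise summaries with owner,
impact, timeline, and next action.
`)

console.log(skill.meta.name, '|', skill.body.split('\n').length, 'lines of guidance')
```

A check like this can run in CI to catch a skill directory whose `name` field drifts from the key it is mounted under.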
Level 6 — Memory & State

Remember across runs

Sessions have memory and history. Write JSON blobs the agent can recall. List conversation messages. Persist across restarts with a durable StateStore adapter.

Session memory typescript
await session.memory.write('last-topic', { topic: 'tools' })

const lastTopic = await session.memory.read<{ topic: string }>('last-topic')

const messages = await session.history.list({ limit: 20 })

Scopes: memory helpers are available at session, run, agent, user(), and tenant() scope.

Durable state typescript
// Implement StateStore or use a community adapter.
// The harness ships InMemoryStateStore for development.
// Pass any StateStore-compatible class to .state() for production.

import { defineHarness } from '@purista/harness'
import { MyRedisStateStore } from './myRedisStateStore'

const harness = defineHarness({ name: 'prod-service' })
  .state(new MyRedisStateStore({ url: process.env.REDIS_URL }))
  // sessions survive restart
  .build()

Note: In-memory state is the default. Use a durable adapter for production.
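The real StateStore contract is not shown in this guide, so the sketch below uses a deliberately different, hypothetical interface (`SketchStore`) to illustrate two ideas a durable adapter usually needs: values stored in serialized form, and keys namespaced by session. Treat every name in it as an assumption.

```typescript
// Hypothetical contract for illustration; NOT the harness StateStore interface.
interface SketchStore {
  read<T>(key: string): Promise<T | undefined>
  write(key: string, value: unknown): Promise<void>
}

// In-memory stand-in. Values are kept as JSON text, so they round-trip the same way
// they would through a durable backend such as Redis.
class InMemorySketchStore implements SketchStore {
  private readonly entries = new Map<string, string>()

  async read<T>(key: string): Promise<T | undefined> {
    const raw = this.entries.get(key)
    return raw === undefined ? undefined : (JSON.parse(raw) as T)
  }

  async write(key: string, value: unknown): Promise<void> {
    this.entries.set(key, JSON.stringify(value))
  }
}

// Namespacing keys by session keeps one user's memory separate from another's.
const sessionKey = (sessionId: string, key: string) => `session:${sessionId}:${key}`

const store = new InMemorySketchStore()
store
  .write(sessionKey('user-1', 'last-topic'), { topic: 'tools' })
  .then(() => store.read<{ topic: string }>(sessionKey('user-1', 'last-topic')))
  .then((value) => console.log(value))
```

Swapping the `Map` for a network client changes the class body but not its callers. That is the property the adapter boundary is meant to provide.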

Evaluate prompt candidates

Prompt evaluation typescript
import {
  evaluatePromptCandidates,
  evaluateDeterministicScorer
} from '@purista/harness'

const scores = await evaluatePromptCandidates({
  candidates: [
    { id: 'brief', prompt: 'Answer in one short paragraph.' },
    { id: 'detailed', prompt: 'Answer with details and citations.' }
  ],
  items: [
    {
      id: 'policy-1',
      input: { question: 'Can I deploy on Friday?' },
      expected: 'change freeze'
    }
  ],
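  // runYourAgent is a placeholder for your own function that runs an agent with the candidate prompt.
  runCandidate: async (candidate, item) =>
    runYourAgent(candidate.prompt, item.input),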
  scorer: async (target) => evaluateDeterministicScorer({
    type: 'contains',
    path: '/answer',
    value: String(target.expected),
    caseInsensitive: true
  }, target)
})

Built-in deterministic scorers:

  • contains — JSON Pointer selected output contains a string
  • regex — Output matches a regular expression
  • attribute-equality — Two JSON Pointer values are deeply equal
  • json-schema — Output matches a supported JSON Schema subset

Note: Eval helpers do not persist data. Store results in your application layer if you need experiment history.

Production Config

Safe defaults in one block

Start with this configuration. It gives you the right boundaries for production: privacy-first telemetry, explicit timeouts, and the minimal sandbox.

Recommended Configuration typescript
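import { defineHarness, JsonLogger } from '@purista/harness'
// inMemorySandbox must also be imported; use the export your installed version provides.

const harness = defineHarness({ name: 'my-service' })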
  .logger(new JsonLogger({ level: 'info' }))
  .telemetry({ captureContent: false })
  .sandbox(inMemorySandbox())
  .defaults({
    runTimeoutMs: 120_000,
    modelTimeoutMs: 60_000,
    toolTimeoutMs: 30_000,
    agentMaxIterations: 16,
  })
  .models({ ... })
  .build()
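Explicit timeouts matter because a hung model or tool call would otherwise hold a run open indefinitely. The sketch below shows one common way to put a deadline on a promise. `withTimeout` is a hypothetical helper written for this illustration. The guide does not show how the harness enforces `runTimeoutMs`, `modelTimeoutMs`, or `toolTimeoutMs` internally.

```typescript
// Reject if `work` does not settle within `ms` milliseconds.
// Note: this stops waiting; it does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms)
  })
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer))
}

// A slow stand-in for a tool call that takes 200 ms.
const slowTool = new Promise<string>((resolve) => setTimeout(() => resolve('late'), 200))

withTimeout(slowTool, 20, 'tool').catch((error: Error) => console.log(error.message))
```

Nesting the budgets as in the configuration above (tool inside model inside run) means the innermost limit usually fires first, which keeps error messages specific.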

Go deeper.

Understand how the runtime is assembled — adapters, agent lifecycle, durable execution, and every pluggable boundary.