From install to production.

A progressive guide. Pick your provider. Start with a single agent. Add tools. Compose agents into workflows. Layer on MCP, skills, and sandbox policies when you need them.

Step 0 — Choose Your Provider

Pick a model adapter

Each provider is a separate package — no SDK bloat from providers you don't use. Install only what you need. Swap providers by changing one config line.

OpenAI

@purista/harness-openai

The default for most applications. Full capability set: structured objects, tool use, embeddings, streaming, and vision.

object · tool_use · embeddings · text_stream · vision_input

Anthropic

@purista/harness-anthropic

Claude models with strong reasoning. Excellent for complex tool use chains and long-context tasks.

object · tool_use · text_stream

Amazon Bedrock

@purista/harness-bedrock

Enterprise-grade inference through AWS. IAM-based access, VPC endpoints, and compliance-friendly data residency.

object · tool_use · text_stream

Azure AI Foundry

@purista/harness-azure-foundry

Microsoft-hosted models with enterprise security, content filtering, and regional deployment options.

object · tool_use · embeddings

Provider-neutral by design. Capabilities gate both compile-time and runtime access. If an adapter doesn't support a feature — say, embeddings — your code won't compile when you try to use embed(). Swap providers without rewriting agents or workflows.
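The gating described above can be sketched in plain TypeScript. This is a minimal illustration, not the library's implementation: `Capability`, `ModelHandle`, and `defineModel` are hypothetical names, and the method bodies are stand-ins. The idea is that a method exists on the type only when its capability is declared, and a runtime check backs up the type check.

```typescript
// Hypothetical names throughout; this is not the @purista/harness API.
type Capability = 'object' | 'tool_use' | 'embeddings' | 'text_stream' | 'vision_input'

// Each capability contributes methods to the handle only when it is declared.
type EmbedApi = { embed(text: string): number[] }
type ObjectApi = { object(prompt: string): { text: string } }

type ModelHandle<C extends Capability> =
  ('embeddings' extends C ? EmbedApi : {}) &
  ('object' extends C ? ObjectApi : {})

function defineModel<C extends Capability>(capabilities: readonly C[]): ModelHandle<C> {
  const declared = new Set<Capability>(capabilities)

  // Runtime gate: protects callers that bypass the type system.
  const requireCapability = (cap: Capability): void => {
    if (!declared.has(cap)) throw new Error(`capability not declared: ${cap}`)
  }

  const handle = {
    embed(text: string): number[] {
      requireCapability('embeddings')
      return [text.length] // stand-in for a real embedding vector
    },
    object(prompt: string): { text: string } {
      requireCapability('object')
      return { text: `echo: ${prompt}` } // stand-in for a real model call
    },
  }
  return handle as unknown as ModelHandle<C>
}

const full = defineModel(['object', 'embeddings'])
const chatOnly = defineModel(['object'])

full.embed('hello') // compiles: 'embeddings' was declared
// chatOnly.embed('hello') // compile error: Property 'embed' does not exist
```

Swapping a provider then amounts to changing the declared capability list; any call that depends on a missing capability surfaces as a type error at the call site.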

Level 1 — Your First Agent

The smallest useful harness

Every harness starts with a name, a model, and an agent. This is the smallest configuration that does something useful. Run it, then iterate.

defineHarness (TypeScript)

import { defineHarness, JsonLogger } from '@purista/harness'
import { openai } from '@purista/harness-openai'
import { z } from 'zod'

const harness = defineHarness({ name: 'hello-world' })
  .logger(new JsonLogger({ level: 'info' }))
  .models({
    fast: {
      provider: openai({ apiKey: process.env.OPENAI_API_KEY! }),
      model: 'gpt-4o-mini',
      capabilities: ['object'],
      retry: true,
    },
  })
  .agents(({ agent }) => ({
    answerer: agent({
      model: 'fast',
      input: z.object({ question: z.string() }),
      output: z.object({ answer: z.string() }),
      builtinTools: false,
      instructions: 'Answer concisely.',
    }),
  }))
  .build()

Run It (TypeScript)

const session = await harness.getSession('user-1')

const result = await session.agents.answerer.prompt({
  question: 'How do tools work?',
})

console.log(result.answer)
// -> "Tools are typed functions..."

await session.close()
await harness.shutdown()

Typed I/O

Input and output schemas are Zod. TypeScript inference propagates through the entire chain.

Sandboxed

Even the simplest agent runs in a sandbox. No accidental file or network access.

Observable

JsonLogger is on. Every run emits structured events you can inspect.

Level 2 — Adapters & Tools

Give your agent capabilities

An agent without tools is just a chatbot. Add TypeScript tools for your domain logic, pick the right sandbox, and declare what your model can do.

Add a TypeScript tool

snippet.ts (TypeScript)

.tools({
  search_docs: {
    description: 'Search internal docs.',
    input: z.object({ query: z.string() }),
    output: z.object({
      hits: z.array(z.object({ id: z.string(), text: z.string() })),
    }),
    handler: async (_ctx, input) => ({
      hits: [{ id: 'intro', text: `Result for ${input.query}` }],
    }),
  },
})
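.agents(({ agent }) => ({
  answerer: agent({
    // Assumption: the 'fast' model must also list 'tool_use' in its capabilities
    // (Level 1 declared only 'object') before this agent can call tools.
    model: 'fast',
    input: z.object({ question: z.string() }),
    output: z.object({ answer: z.string(), citations: z.array(z.string()) }),
    tools: ['search_docs'],
    instructions: 'Search docs before answering. Cite sources.',
  }),
}))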

Choose your sandbox

inMemorySandbox()

File operations only. No command execution. Default for most agents.

bashSandbox()

File + command execution through just-bash. Use when agents need to run commands.

Custom adapter

Containers, remote execution, or custom filesystem policy. Implement Sandbox and SandboxSession.

Stream a run (TypeScript)

for await (const event of session.agents.answerer.stream({ question: 'How do tools work?' })) {
  if (event.type === 'tool.started') console.log('tool:', event.toolId)
  if (event.type === 'run.finished') console.log(event.output)
}
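To make the event handling concrete, here is a self-contained sketch that models the two event types used above as a discriminated union. The event shapes and the `fakeStream` stand-in are assumptions for illustration; the real stream likely emits more event types and fields.

```typescript
// Hypothetical event shapes modeled on the two event types shown above.
type RunEvent =
  | { type: 'tool.started'; toolId: string }
  | { type: 'run.finished'; output: { answer: string } }

// Stand-in for session.agents.answerer.stream(...): yields events in order.
async function* fakeStream(question: string): AsyncGenerator<RunEvent> {
  yield { type: 'tool.started', toolId: 'search_docs' }
  yield { type: 'run.finished', output: { answer: `Answer to: ${question}` } }
}

// Collects the tool ids and the final output from any event stream.
async function consume(events: AsyncIterable<RunEvent>) {
  const toolsUsed: string[] = []
  let output: { answer: string } | undefined
  for await (const event of events) {
    switch (event.type) {
      case 'tool.started':
        toolsUsed.push(event.toolId) // narrowed: toolId exists on this variant
        break
      case 'run.finished':
        output = event.output // narrowed: output exists on this variant
        break
    }
  }
  return { toolsUsed, output }
}
```

Switching on `event.type` lets the compiler narrow each branch, so `event.toolId` and `event.output` are only reachable where they exist.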

Level 3 — Multiple Agents

Specialize and compose

One agent per concern. A research agent retrieves facts. A writer agent produces prose. A critic agent checks accuracy. Each has its own model, tools, and instructions.

Multi-agent harness (TypeScript)

.agents(({ agent }) => ({
  researcher: agent({
    model: 'fast',
    input: z.object({ topic: z.string() }),
    output: z.object({ facts: z.array(z.string()) }),
    tools: ['search_docs'],
    instructions: 'Find 3-5 relevant facts.',
  }),
  writer: agent({
    model: 'fast',
    input: z.object({ facts: z.array(z.string()), topic: z.string() }),
    output: z.object({ draft: z.string() }),
    instructions: 'Write a concise paragraph.',
  }),
  critic: agent({
    model: 'fast',
    input: z.object({ draft: z.string(), facts: z.array(z.string()) }),
    output: z.object({ accurate: z.boolean(), issues: z.array(z.string()) }),
    instructions: 'Check for unsupported claims.',
  }),
}))

Researcher

Retrieves facts through search tools. Returns structured evidence.

Writer

Consumes facts. Produces prose. No tool access needed.

Critic

Checks draft against facts. Flags unsupported claims.

Level 4 — Workflows

Orchestrate agents with deterministic logic

Workflows are application-owned orchestration around agents. Sequence them, branch on results, request human approval, and write durable state. The workflow handler owns the process. Agents own the reasoning.

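Define a workflow (TypeScript)

.workflows(({ workflow }) => ({
  research_and_write: workflow({
    input: z.object({ topic: z.string() }),
    output: z.object({ draft: z.string(), approved: z.boolean() }),
    handler: async (ctx) => {
      const research = await ctx.agents.researcher(ctx.input)
      // The writer's input schema needs both the facts and the topic.
      const draft = await ctx.agents.writer({ facts: research.facts, topic: ctx.input.topic })
      // The critic's input schema needs the draft text and the facts to check it against.
      const review = await ctx.agents.critic({ draft: draft.draft, facts: research.facts })

      // Deterministic branch: approve only when the critic found no unsupported claims.
      if (!review.accurate) {
        return { ...draft, approved: false }
      }

      return { ...draft, approved: true }
    },
  }),
}))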

Invoke the workflow (TypeScript)

const result = await session.workflows.research_and_write.prompt({
  topic: 'Vector databases',
})

if (result.approved) {
  await saveToKnowledgeBase(result.draft)
} else {
  await flagForHumanReview(result.draft)
}

Key insight: Workflows sequence agents with deterministic logic. Agents do not call each other — the workflow handler orchestrates them.

Figure: Agent vs workflow. The workflow owns orchestration; agents own reasoning.
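The split between orchestration and reasoning can be tried without a model. In this sketch the three agents are stubbed as plain async functions with the same input and output shapes as the schemas above; the stub bodies and the `researchAndWrite` name are illustrative assumptions, not harness code.

```typescript
// Stub agents: plain async functions standing in for model-backed agents.
type Research = { facts: string[] }
type Draft = { draft: string }
type Review = { accurate: boolean; issues: string[] }

const researcher = async (input: { topic: string }): Promise<Research> => ({
  facts: [`${input.topic} store embeddings`, `${input.topic} support similarity search`],
})

const writer = async (input: { facts: string[]; topic: string }): Promise<Draft> => ({
  draft: `${input.topic}: ${input.facts.join('. ')}.`,
})

// Flags every fact that does not appear verbatim in the draft.
const critic = async (input: { draft: string; facts: string[] }): Promise<Review> => {
  const issues = input.facts.filter((fact) => !input.draft.includes(fact))
  return { accurate: issues.length === 0, issues }
}

// The handler owns sequencing and branching; no agent calls another agent.
async function researchAndWrite(input: { topic: string }) {
  const research = await researcher(input)
  const draft = await writer({ facts: research.facts, topic: input.topic })
  const review = await critic({ draft: draft.draft, facts: research.facts })
  return { draft: draft.draft, approved: review.accurate }
}
```

Because each step receives exactly the fields its schema declares, a mismatch between one agent's output and the next agent's input shows up in the handler, not inside an agent.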

Level 5 — MCP & Skills

Connect the ecosystem

MCP servers extend your agent with external capabilities. Skills add reusable domain guidance. Both are mounted, not embedded — keeping your harness clean and composable.

MCP over stdio

snippet.ts (TypeScript)

.tools({
  drawio_diagram: {
    kind: 'mcp_stdio',
    description: 'Create draw.io diagrams.',
    install: {
      command: 'npm install @drawio/mcp',
      cwd: '/workspace',
      timeoutMs: 120_000,
    },
    command: 'npx',
    args: ['@drawio/mcp'],
    tool: 'drawio.create',
  },
})

Stdio MCP runs through the sandbox executor. The install command bootstraps on first use.

MCP over HTTP

snippet.ts (TypeScript)

.tools({
  drawio_remote: {
    kind: 'mcp_http',
    description: 'Remote draw.io MCP.',
    url: process.env.DRAWIO_MCP_URL!,
    auth: { kind: 'bearer', token: process.env.DRAWIO_MCP_TOKEN! },
    tool: 'drawio.create',
  },
})

HTTP MCP calls a remote endpoint directly. Supports bearer, oauth2, api_key, and basic auth.

Mount a skill

Create a SKILL.md in a directory:

SKILL.md (Markdown)

---
name: incident-responder
description: Incident response guidance.
---

Use concise summaries with owner,
impact, timeline, and next action.

Mount it on the harness:

snippet.ts (TypeScript)

.skills({
  'incident-responder': {
    directory: './skills/incident-responder',
  },
})
.agents(({ agent }) => ({
  writer: agent({
    skills: ['incident-responder'],
    // ...
  }),
}))
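For intuition about what a mounted skill carries, the sketch below splits a SKILL.md string into its front matter fields and its guidance body. `parseSkill` is a hypothetical helper, and treating both `name` and `description` as required is an assumption of this sketch; how the harness actually loads skills is not shown in this guide.

```typescript
// Hypothetical helper: not part of @purista/harness.
type Skill = { name: string; description: string; body: string }

function parseSkill(source: string): Skill {
  // Front matter sits between two '---' lines at the top of the file.
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source)
  if (!match) throw new Error('SKILL.md must start with a front matter block')

  // Only simple "key: value" lines are handled; nested YAML is out of scope.
  const fields: Record<string, string> = {}
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(':')
    if (colon === -1) continue
    fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim()
  }

  // Assumption of this sketch: both fields are required.
  if (!fields.name || !fields.description) {
    throw new Error('front matter needs name and description')
  }
  return { name: fields.name, description: fields.description, body: match[2].trim() }
}

const skillFile = [
  '---',
  'name: incident-responder',
  'description: Incident response guidance.',
  '---',
  '',
  'Use concise summaries with owner,',
  'impact, timeline, and next action.',
].join('\n')

const skill = parseSkill(skillFile)
```

The `name` field is what the agent's `skills` array refers to; the body is the guidance text.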

Level 6 — Memory & State

Remember across runs

Sessions have memory and history. Write JSON blobs the agent can recall. List conversation messages. Persist across restarts with a durable state adapter or the local durable execution bundle.

Session memory (TypeScript)

await session.memory.write('last-topic', { topic: 'tools' })

const lastTopic = await session.memory.read<{ topic: string }>('last-topic')

const messages = await session.history.list({ limit: 20 })

Scopes: session, run, agent, user(), and tenant() memory helpers.
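One way to picture scoped memory is a key-value store whose keys are prefixed by scope and owner. The sketch below is an assumption about the general technique, not the harness's storage layout; `createMemory` and the key format are hypothetical, and returning `undefined` for a missing key is a choice made here.

```typescript
// Hypothetical sketch of scope-prefixed JSON memory; not the harness implementation.
type Scope = 'session' | 'run' | 'agent'

function createMemory(store: Map<string, string>, scope: Scope, ownerId: string) {
  // The scope and owner become part of the key, so owners cannot collide.
  const fullKey = (key: string): string => `${scope}:${ownerId}:${key}`
  return {
    write(key: string, value: unknown): void {
      store.set(fullKey(key), JSON.stringify(value)) // values are stored as JSON blobs
    },
    read<T>(key: string): T | undefined {
      const raw = store.get(fullKey(key))
      return raw === undefined ? undefined : (JSON.parse(raw) as T)
    },
  }
}

const backingStore = new Map<string, string>()
const userOne = createMemory(backingStore, 'session', 'user-1')
const userTwo = createMemory(backingStore, 'session', 'user-2')

userOne.write('last-topic', { topic: 'tools' })
const remembered = userOne.read<{ topic: string }>('last-topic')
const isolated = userTwo.read<{ topic: string }>('last-topic') // separate owner, separate keys
```

Both handles share one backing store, yet each only sees its own entries; a durable adapter could replace the `Map` without changing the read and write calls.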

Durable state (TypeScript)

// Use the local bundle for production-shaped durability
// without adding external infrastructure on day one.

import { defineHarness, localDurableExecution } from '@purista/harness'

const local = localDurableExecution({
  root: './.harness',
})

const harness = defineHarness({ name: 'prod-service' })
  .state(local.state)
  .runtime(local.runtime)
  .sandbox(local.sandbox)
  .workspaceStore(local.workspaceStore)
  .checkpoints(local.checkpoints)
  .build()

Note: In-memory state is the default. Use localDurableExecution or custom adapters when progress must survive restart.

Evaluate prompt candidates

Prompt evaluation (TypeScript)

import {
  evaluatePromptCandidates,
  evaluateDeterministicScorer
} from '@purista/harness'

const scores = await evaluatePromptCandidates({
  candidates: [
    { id: 'brief', prompt: 'Answer in one short paragraph.' },
    { id: 'detailed', prompt: 'Answer with details and citations.' }
  ],
  items: [
    {
      id: 'policy-1',
      input: { question: 'Can I deploy on Friday?' },
      expected: 'change freeze'
    }
  ],
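  // runYourAgent is a placeholder for your own function that runs an agent with the candidate prompt.
  runCandidate: async (candidate, item) =>
    runYourAgent(candidate.prompt, item.input),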
  scorer: async (target) => evaluateDeterministicScorer({
    type: 'contains',
    path: '/answer',
    value: String(target.expected),
    caseInsensitive: true
  }, target)
})

Built-in deterministic scorers:

contains — JSON Pointer selected output contains a string
regex — Output matches a regular expression
attribute-equality — Two JSON Pointer values are deeply equal
json-schema — Output matches a supported JSON Schema subset

Note: Eval helpers do not persist data. Store results in your application layer if you need experiment history.
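To show what a `contains` check over a JSON Pointer involves, here is a standalone sketch. It follows RFC 6901 for pointer resolution; `resolvePointer`, `scoreContains`, and the `{ pass, score }` result shape are assumptions for illustration and may differ from what `evaluateDeterministicScorer` returns.

```typescript
// Resolves an RFC 6901 JSON Pointer such as '/answer' or '/hits/0/text'.
function resolvePointer(doc: unknown, pointer: string): unknown {
  if (pointer === '') return doc // the empty pointer selects the whole document
  if (!pointer.startsWith('/')) throw new Error(`invalid JSON Pointer: ${pointer}`)
  let current: unknown = doc
  for (const rawToken of pointer.slice(1).split('/')) {
    // Unescape in the order the RFC requires: ~1 first, then ~0.
    const token = rawToken.replace(/~1/g, '/').replace(/~0/g, '~')
    if (current === null || typeof current !== 'object') return undefined
    current = (current as Record<string, unknown>)[token]
  }
  return current
}

// Hypothetical scorer config mirroring the fields used in the example above.
type ContainsScorer = { type: 'contains'; path: string; value: string; caseInsensitive?: boolean }

function scoreContains(scorer: ContainsScorer, output: unknown): { pass: boolean; score: number } {
  const selected = resolvePointer(output, scorer.path)
  if (typeof selected !== 'string') return { pass: false, score: 0 }
  const haystack = scorer.caseInsensitive ? selected.toLowerCase() : selected
  const needle = scorer.caseInsensitive ? scorer.value.toLowerCase() : scorer.value
  const pass = haystack.includes(needle)
  return { pass, score: pass ? 1 : 0 }
}

const verdict = scoreContains(
  { type: 'contains', path: '/answer', value: 'change freeze', caseInsensitive: true },
  { answer: 'No. A Change Freeze applies on Fridays.' },
)
```

A deterministic scorer like this gives the same verdict on every run, which makes it a stable baseline when comparing prompt candidates.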

Production Config

Safe defaults in one block

Start with this configuration. It gives you the right boundaries for production: privacy-first telemetry, explicit timeouts, and the minimal sandbox.

Recommended Configuration (TypeScript)

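// The import location of inMemorySandbox is an assumption; adjust it to your install.
import { defineHarness, inMemorySandbox, JsonLogger } from '@purista/harness'
import { openai } from '@purista/harness-openai'

// Any provider adapter works here; OpenAI is shown to match Level 1.
const provider = openai({ apiKey: process.env.OPENAI_API_KEY! })

const harness = defineHarness({ name: 'my-service' })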
  .logger(new JsonLogger({ level: 'info' }))
  .telemetry({ contentCaptureMode: 'NO_CONTENT' })
  .sandbox(inMemorySandbox())
  .defaults({
    runTimeoutMs: 120_000,
    modelTimeoutMs: 60_000,
    toolTimeoutMs: 30_000,
    agentMaxIterations: 16,
  })
  .models({
    fast: {
      provider,
      model: 'gpt-4o-mini',
      capabilities: ['object', 'tool_use'],
      retry: {
        maxAttempts: 3,
        maxActiveElapsedMs: 60_000,
        maxActiveDelayMs: 20_000,
      },
    },
  })
  .build()

Model retry is provider-neutral. Short outages and rate limits retry inside the active call budget. Long provider retry windows become typed errors with retry metadata, which are better handled by a queue, worker, or workflow retry policy.
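The retry budget can be sketched as a loop that retries only while the suggested wait fits the budget and otherwise hands the retry metadata back to the caller. The interpretation of `maxActiveElapsedMs` and `maxActiveDelayMs` below is an assumption based on their names, and `callWithRetry`, `Attempt`, and `Outcome` are hypothetical. The sketch models the "typed error" as a typed result to stay self-contained.

```typescript
type RetryPolicy = { maxAttempts: number; maxActiveElapsedMs: number; maxActiveDelayMs: number }

// A failed attempt carries the provider's suggested wait (for example from Retry-After).
type Attempt<T> = { ok: true; value: T } | { ok: false; retryAfterMs: number }

type Outcome<T> =
  | { status: 'ok'; value: T; attempts: number }
  | { status: 'deferred'; retryAfterMs: number; attempts: number }

type Clock = { now(): number; sleep(ms: number): Promise<void> }

async function callWithRetry<T>(
  call: () => Promise<Attempt<T>>,
  policy: RetryPolicy,
  clock: Clock,
): Promise<Outcome<T>> {
  const startedAt = clock.now()
  let attempts = 0
  let lastDelay = 0
  while (attempts < policy.maxAttempts) {
    attempts += 1
    const result = await call()
    if (result.ok) return { status: 'ok', value: result.value, attempts }

    lastDelay = result.retryAfterMs
    const elapsedAfterWait = clock.now() - startedAt + lastDelay
    // Assumed semantics: a single wait may not exceed maxActiveDelayMs,
    // and the total time spent may not exceed maxActiveElapsedMs.
    const outOfBudget =
      lastDelay > policy.maxActiveDelayMs || elapsedAfterWait > policy.maxActiveElapsedMs
    if (outOfBudget || attempts === policy.maxAttempts) break
    await clock.sleep(lastDelay)
  }
  // Out of budget: return retry metadata so a queue or workflow can reschedule.
  return { status: 'deferred', retryAfterMs: lastDelay, attempts }
}

// Real clock for production use; tests can inject a fake one.
const systemClock: Clock = {
  now: () => Date.now(),
  sleep: (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
}
```

Injecting the clock keeps the loop testable without real waits. A `deferred` outcome is the signal to hand the work to a queue, worker, or workflow retry policy.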

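Optional Governance (TypeScript)

// provider, approvalProvider, and auditSink are assumed to be defined elsewhere in your application.
const harness = defineHarness({ name: 'my-service' })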
  .models({ fast: { provider, model: 'gpt-4o-mini', capabilities: ['object', 'tool_use'] } })
  .governance({
    mode: 'enforce',
    policies: [
      {
        kind: 'native',
        id: 'tool-policy',
        rules: [
          { id: 'audit-file-reads', tools: ['read'], effect: 'audit' },
          { id: 'approve-writes', tools: ['write', 'edit'], effect: 'require_approval' },
        ],
      },
    ],
    approval: approvalProvider,
    audit: auditSink,
  })
  .build()

Governance is optional. Configure it only when tool exposure, tool execution, approvals, shadow rollout, or central audit needs policy-as-code.
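As a rough mental model for the rules above, the sketch below resolves a tool call to an effect by scanning the rule list. The names `decide` and `Decision`, the default of `allow` for unmatched tools, and the "most restrictive match wins" ordering are all assumptions for illustration; the harness's actual evaluation order is not documented here.

```typescript
// Hypothetical policy evaluation; not the @purista/harness governance engine.
type Effect = 'allow' | 'audit' | 'require_approval'
type Rule = { id: string; tools: string[]; effect: 'audit' | 'require_approval' }
type Decision = { tool: string; effect: Effect; ruleId?: string }

function decide(rules: Rule[], tool: string): Decision {
  // Assumption: when several rules match, the most restrictive effect wins.
  const rank: Record<Effect, number> = { allow: 0, audit: 1, require_approval: 2 }
  let decision: Decision = { tool, effect: 'allow' } // assumption: unmatched tools are allowed
  for (const rule of rules) {
    if (rule.tools.includes(tool) && rank[rule.effect] > rank[decision.effect]) {
      decision = { tool, effect: rule.effect, ruleId: rule.id }
    }
  }
  return decision
}

// The same two rules as in the configuration above.
const rules: Rule[] = [
  { id: 'audit-file-reads', tools: ['read'], effect: 'audit' },
  { id: 'approve-writes', tools: ['write', 'edit'], effect: 'require_approval' },
]

const readDecision = decide(rules, 'read')
const writeDecision = decide(rules, 'write')
const otherDecision = decide(rules, 'search_docs')
```

Keeping the decision a pure function of rules and tool id makes policies easy to unit test before they are switched to `enforce`.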

Go deeper.

Understand how the runtime is assembled — adapters, agent lifecycle, durable execution, and every pluggable boundary.
