From install to production.
A progressive guide. Pick your provider. Start with a single agent. Add tools. Compose agents into workflows. Layer on MCP, skills, and sandbox policies when you need them.
Pick a model adapter
Each provider is a separate package — no SDK bloat from providers you don't use. Install only what you need. Swap providers by changing one config line.
The default for most applications. Full capability set: structured objects, tool use, embeddings, streaming, and vision.
Claude models with strong reasoning. Excellent for complex tool use chains and long-context tasks.
Enterprise-grade inference through AWS. IAM-based access, VPC endpoints, and compliance-friendly data residency.
Microsoft-hosted models with enterprise security, content filtering, and regional deployment options.
Provider-neutral by design. Capabilities gate both compile-time and runtime access. If an adapter doesn't support a feature — say, embeddings — your code won't compile when you try to use embed(). Swap providers without rewriting agents or workflows.
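One way to picture the compile-time half of this gate is a conditional type keyed on the declared capability list. The sketch below is an illustration only: `Capability`, `createModel`, and the `embed` signature are hypothetical names chosen for this example, not the @purista/harness API.

```typescript
// Hypothetical names throughout; this is not the @purista/harness API.
type Capability = 'object' | 'embeddings' | 'streaming'

// Members that exist regardless of the declared capabilities.
interface BaseModel {
  readonly capabilities: readonly Capability[]
  complete(prompt: string): string
}

// embed() is part of the type only when 'embeddings' was declared.
type WithEmbeddings<C extends Capability> = 'embeddings' extends C
  ? { embed(text: string): number[] }
  : {}

type Model<C extends Capability> = BaseModel & WithEmbeddings<C>

function createModel<C extends Capability>(capabilities: readonly C[]): Model<C> {
  const base: BaseModel = {
    capabilities,
    complete: (prompt) => `echo: ${prompt}`,
  }
  // Runtime half of the gate: attach embed() only when it was declared.
  const declared: readonly Capability[] = capabilities
  if (declared.includes('embeddings')) {
    const withEmbed = { ...base, embed: (text: string) => [text.length] }
    return withEmbed as unknown as Model<C>
  }
  return base as unknown as Model<C>
}

const chat = createModel(['object'])
const full = createModel(['object', 'embeddings'])

const vector = full.embed('hello') // compiles: 'embeddings' was declared
// chat.embed('hello')             // compile error: 'embed' does not exist
```

The type parameter `C` is inferred as the union of the listed literals, so removing `'embeddings'` from the list removes `embed` from the resulting type.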
The smallest useful harness
Every harness starts with a name, a model, and an agent. This is the smallest configuration that does something useful. Run it, then iterate.
import { defineHarness, JsonLogger } from '@purista/harness'
import { openai } from '@purista/harness-openai'
import { z } from 'zod'

const harness = defineHarness({ name: 'hello-world' })
  .logger(new JsonLogger({ level: 'info' }))
  .models({
    fast: {
      provider: openai({ apiKey: process.env.OPENAI_API_KEY! }),
      model: 'gpt-4o-mini',
      capabilities: ['object'],
    },
  })
  .agents(({ agent }) => ({
    answerer: agent({
      model: 'fast',
      input: z.object({ question: z.string() }),
      output: z.object({ answer: z.string() }),
      builtinTools: false,
      instructions: 'Answer concisely.',
    }),
  }))
  .build()

const session = await harness.getSession('user-1')
const result = await session.agents.answerer.prompt({
  question: 'How do tools work?',
})
console.log(result.answer)
// -> "Tools are typed functions..."

await session.close()
await harness.shutdown()

Input and output schemas are Zod. TypeScript inference propagates through the entire chain.
Even the simplest agent runs in a sandbox. No accidental file or network access.
JsonLogger is on. Every run emits structured events you can inspect.
Give your agent capabilities
An agent without tools is just a chatbot. Add TypeScript tools for your domain logic, pick the right sandbox, and declare what your model can do.
Add a TypeScript tool
.tools({
  search_docs: {
    description: 'Search internal docs.',
    input: z.object({ query: z.string() }),
    output: z.object({
      hits: z.array(z.object({ id: z.string(), text: z.string() })),
    }),
    handler: async (_ctx, input) => ({
      hits: [{ id: 'intro', text: `Result for ${input.query}` }],
    }),
  },
})
.agents(({ agent }) => ({
  answerer: agent({
    model: 'fast',
    input: z.object({ question: z.string() }),
    output: z.object({ answer: z.string(), citations: z.array(z.string()) }),
    tools: ['search_docs'],
    instructions: 'Search docs before answering. Cite sources.',
  }),
}))

Choose your sandbox
File operations only. No command execution. Default for most agents.
File + command execution through just-bash. Use when agents need to run commands.
Containers, remote execution, or custom filesystem policy. Implement Sandbox and SandboxSession.
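To make the file-only policy concrete, here is a minimal sketch of a session that stores files in memory, keeps every path inside a root, and refuses command execution. The class and its methods are hypothetical; the real `Sandbox` and `SandboxSession` contracts are not shown in this guide and may look quite different.

```typescript
// Hypothetical class for illustration; not the harness Sandbox contract.
class FileOnlySession {
  private readonly root: string
  private readonly files = new Map<string, string>()

  constructor(root: string) {
    this.root = root
  }

  // Normalize a path against the root and reject anything that escapes it.
  private resolve(path: string): string {
    const parts: string[] = []
    for (const segment of path.split('/')) {
      if (segment === '' || segment === '.') continue
      if (segment === '..') {
        if (parts.length === 0) throw new Error(`path escapes sandbox: ${path}`)
        parts.pop()
        continue
      }
      parts.push(segment)
    }
    return `${this.root}/${parts.join('/')}`
  }

  writeFile(path: string, content: string): void {
    this.files.set(this.resolve(path), content)
  }

  readFile(path: string): string {
    const content = this.files.get(this.resolve(path))
    if (content === undefined) throw new Error(`no such file: ${path}`)
    return content
  }

  // File-only policy: command execution is always refused.
  exec(command: string): never {
    throw new Error(`command execution is disabled: ${command}`)
  }
}

const sandboxSession = new FileOnlySession('/workspace')
sandboxSession.writeFile('notes/../summary.txt', 'done')
```

Both `notes/../summary.txt` and `summary.txt` resolve to the same entry under `/workspace`, while a path such as `../etc/passwd` throws before any lookup happens.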
for await (const event of session.agents.answerer.stream({ question: 'How do tools work?' })) {
  if (event.type === 'tool.started') console.log('tool:', event.toolId)
  if (event.type === 'run.finished') console.log(event.output)
}

Specialize and compose
One agent per concern. A research agent retrieves facts. A writer agent produces prose. A critic agent checks accuracy. Each has its own model, tools, and instructions.
.agents(({ agent }) => ({
  researcher: agent({
    model: 'fast',
    input: z.object({ topic: z.string() }),
    output: z.object({ facts: z.array(z.string()) }),
    tools: ['search_docs'],
    instructions: 'Find 3-5 relevant facts.',
  }),
  writer: agent({
    model: 'fast',
    input: z.object({ facts: z.array(z.string()), topic: z.string() }),
    output: z.object({ draft: z.string() }),
    instructions: 'Write a concise paragraph.',
  }),
  critic: agent({
    model: 'fast',
    input: z.object({ draft: z.string(), facts: z.array(z.string()) }),
    output: z.object({ accurate: z.boolean(), issues: z.array(z.string()) }),
    instructions: 'Check for unsupported claims.',
  }),
}))

Researcher
Retrieves facts through search tools. Returns structured evidence.
Writer
Consumes facts. Produces prose. No tool access needed.
Critic
Checks draft against facts. Flags unsupported claims.
Orchestrate agents with deterministic logic
Workflows are application-owned orchestration around agents. Sequence them, branch on results, request human approval, and write durable state. The workflow handler owns the process. Agents own the reasoning.
.workflows(({ workflow }) => ({
  research_and_write: workflow({
    input: z.object({ topic: z.string() }),
    output: z.object({ draft: z.string(), approved: z.boolean() }),
    handler: async (ctx) => {
      const research = await ctx.agents.researcher(ctx.input)
      // The writer's input schema also requires the topic.
      const draft = await ctx.agents.writer({ ...research, topic: ctx.input.topic })
      // The critic's input schema requires the facts to check the draft against.
      const review = await ctx.agents.critic({ ...draft, facts: research.facts })
      if (!review.accurate) {
        return { ...draft, approved: false }
      }
      return { ...draft, approved: true }
    },
  }),
}))

const result = await session.workflows.research_and_write.prompt({
  topic: 'Vector databases',
})
if (result.approved) {
  await saveToKnowledgeBase(result.draft)
} else {
  await flagForHumanReview(result.draft)
}

Key insight: Workflows sequence agents with deterministic logic. Agents do not call each other; the workflow handler orchestrates them and passes each agent exactly the input its schema declares.
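The same division of labour can be tried without a model or the harness at all. In the sketch below the three agents are replaced by plain async functions with the input and output shapes used in this guide; the function bodies are deterministic stand-ins invented for illustration, and `researchAndWrite` is a hypothetical name.

```typescript
// Stub "agents": plain async functions standing in for model calls.
type Research = { facts: string[] }
type Draft = { draft: string }
type Review = { accurate: boolean; issues: string[] }

const researcher = async (input: { topic: string }): Promise<Research> => ({
  facts: [`${input.topic} index embeddings for similarity search.`],
})

const writer = async (input: { facts: string[]; topic: string }): Promise<Draft> => ({
  draft: `${input.topic}: ${input.facts.join(' ')}`,
})

const critic = async (input: { draft: string; facts: string[] }): Promise<Review> => {
  // Deterministic stand-in: flag every fact the draft does not quote verbatim.
  const issues = input.facts.filter((fact) => !input.draft.includes(fact))
  return { accurate: issues.length === 0, issues }
}

// The handler owns sequencing and data flow; the agents never call each other.
async function researchAndWrite(input: { topic: string }) {
  const research = await researcher(input)
  const draft = await writer({ facts: research.facts, topic: input.topic })
  const review = await critic({ draft: draft.draft, facts: research.facts })
  return { draft: draft.draft, approved: review.accurate }
}
```

Because the handler is ordinary code, it can be unit-tested with stubs like these before any model is attached.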
Connect the ecosystem
MCP servers extend your agent with external capabilities. Skills add reusable domain guidance. Both are mounted, not embedded — keeping your harness clean and composable.
MCP over stdio
.tools({
  drawio_diagram: {
    kind: 'mcp_stdio',
    description: 'Create draw.io diagrams.',
    install: {
      command: 'npm install @drawio/mcp',
      cwd: '/workspace',
      timeoutMs: 120_000,
    },
    command: 'npx',
    args: ['@drawio/mcp'],
    tool: 'drawio.create',
  },
})

Stdio MCP runs through the sandbox executor. The install command bootstraps on first use.
MCP over HTTP
.tools({
  drawio_remote: {
    kind: 'mcp_http',
    description: 'Remote draw.io MCP.',
    url: process.env.DRAWIO_MCP_URL!,
    auth: { kind: 'bearer', token: process.env.DRAWIO_MCP_TOKEN! },
    tool: 'drawio.create',
  },
})

HTTP MCP calls a remote endpoint directly. Supports bearer, oauth2, api_key, and basic auth.
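As a rough picture of what an auth setting turns into on the wire, the sketch below maps two of the listed kinds to HTTP request headers. Only the `bearer` shape (`kind`, `token`) appears in this guide; the `api_key` fields and the `authHeaders` helper are assumptions made for the example, and `oauth2` and `basic` are left out.

```typescript
// Hypothetical config shape; only the bearer variant is taken from the guide.
type SketchAuth =
  | { kind: 'bearer'; token: string }
  | { kind: 'api_key'; header: string; key: string }

// Translate an auth setting into the headers sent with each request.
function authHeaders(auth: SketchAuth): Record<string, string> {
  switch (auth.kind) {
    case 'bearer':
      // Standard bearer scheme: "Authorization: Bearer <token>".
      return { Authorization: `Bearer ${auth.token}` }
    case 'api_key':
      // API keys usually travel in a provider-specific header.
      return { [auth.header]: auth.key }
  }
}

const bearerHeaders = authHeaders({ kind: 'bearer', token: 'secret-token' })
const apiKeyHeaders = authHeaders({ kind: 'api_key', header: 'X-API-Key', key: 'abc123' })
```

Modelling the setting as a discriminated union lets the compiler confirm that every `kind` is handled in the `switch`.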
Mount a skill
Create a SKILL.md in a directory:
---
name: incident-responder
description: Incident response guidance.
---
Use concise summaries with owner, impact, timeline, and next action.

Mount it on the harness:
.skills({
  'incident-responder': {
    directory: './skills/incident-responder',
  },
})
.agents(({ agent }) => ({
  writer: agent({
    skills: ['incident-responder'],
    // ...
  }),
}))

Remember across runs
Sessions have memory and history. Write JSON blobs the agent can recall. List conversation messages. Persist across restarts with a durable StateStore adapter.
await session.memory.write('last-topic', { topic: 'tools' })
const lastTopic = await session.memory.read<{ topic: string }>('last-topic')
const messages = await session.history.list({ limit: 20 })

Scopes: session, run, agent, user(), and tenant() memory helpers.
// Implement StateStore or use a community adapter.
// The harness ships InMemoryStateStore for development.
// Pass any StateStore-compatible class to .state() for production.
import { defineHarness } from '@purista/harness'
import { MyRedisStateStore } from './myRedisStateStore'

const harness = defineHarness({ name: 'prod-service' })
  .state(new MyRedisStateStore({ url: process.env.REDIS_URL }))
  // sessions survive restart
  .build()

Note: In-memory state is the default. Use a durable adapter for production.
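The difference between in-memory and durable state is easier to see with a toy store. The sketch below assumes a two-method contract (`read`, `write`) and a key-prefix scheme for scopes; both are hypothetical, since the real `StateStore` interface is not shown in this guide.

```typescript
// Hypothetical contract; the real StateStore interface may differ.
interface SketchStateStore {
  read<T>(key: string): Promise<T | undefined>
  write<T>(key: string, value: T): Promise<void>
}

// Map-backed store that keeps values as JSON text, the way a Redis or file
// adapter would, so only serializable data survives a round trip.
class JsonMapStore implements SketchStateStore {
  private readonly blobs = new Map<string, string>()

  async read<T>(key: string): Promise<T | undefined> {
    const raw = this.blobs.get(key)
    return raw === undefined ? undefined : (JSON.parse(raw) as T)
  }

  async write<T>(key: string, value: T): Promise<void> {
    this.blobs.set(key, JSON.stringify(value))
  }
}

// Assumed scoping scheme: prefix keys so session and user data cannot collide.
const scopedKey = (scope: 'session' | 'user' | 'tenant', id: string, key: string): string =>
  `${scope}:${id}:${key}`
```

Swapping the `Map` for a network client would leave callers unchanged, which is the property a pluggable store is meant to provide.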
Evaluate prompt candidates
import {
  evaluatePromptCandidates,
  evaluateDeterministicScorer
} from '@purista/harness'

const scores = await evaluatePromptCandidates({
  candidates: [
    { id: 'brief', prompt: 'Answer in one short paragraph.' },
    { id: 'detailed', prompt: 'Answer with details and citations.' }
  ],
  items: [
    {
      id: 'policy-1',
      input: { question: 'Can I deploy on Friday?' },
      expected: 'change freeze'
    }
  ],
  runCandidate: async (candidate, item) =>
    runYourAgent(candidate.prompt, item.input),
  scorer: async (target) => evaluateDeterministicScorer({
    type: 'contains',
    path: '/answer',
    value: String(target.expected),
    caseInsensitive: true
  }, target)
})

Built-in deterministic scorers:
- contains: the output value selected by a JSON Pointer contains a given string
- regex: the output matches a regular expression
- attribute-equality: two values selected by JSON Pointers are deeply equal
- json-schema: the output matches a supported JSON Schema subset
Note: Eval helpers do not persist data. Store results in your application layer if you need experiment history.
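For intuition about what a `contains` check involves, here is a small standalone sketch: resolve a JSON Pointer (RFC 6901) against the output, then test for a substring. The function names and the 0/1 score are assumptions for illustration; this is not the implementation behind `evaluateDeterministicScorer`.

```typescript
// Resolve an RFC 6901 JSON Pointer such as '/answer' or '/hits/0/text'.
function resolvePointer(doc: unknown, pointer: string): unknown {
  if (pointer === '') return doc
  if (!pointer.startsWith('/')) throw new Error(`invalid JSON Pointer: ${pointer}`)
  let current: unknown = doc
  for (const raw of pointer.slice(1).split('/')) {
    // Unescape per RFC 6901: '~1' becomes '/', then '~0' becomes '~'.
    const token = raw.replace(/~1/g, '/').replace(/~0/g, '~')
    if (current === null || typeof current !== 'object') return undefined
    if (!Object.prototype.hasOwnProperty.call(current, token)) return undefined
    current = (current as Record<string, unknown>)[token]
  }
  return current
}

// Hypothetical scorer: 1 when the selected string contains `value`, else 0.
function containsScore(
  output: unknown,
  path: string,
  value: string,
  caseInsensitive = false,
): number {
  const selected = resolvePointer(output, path)
  if (typeof selected !== 'string') return 0
  const haystack = caseInsensitive ? selected.toLowerCase() : selected
  const needle = caseInsensitive ? value.toLowerCase() : value
  return haystack.includes(needle) ? 1 : 0
}

const sampleOutput = { answer: 'No, a Change Freeze applies on Fridays.' }
const relaxed = containsScore(sampleOutput, '/answer', 'change freeze', true)
const strict = containsScore(sampleOutput, '/answer', 'change freeze')
```

With the sample output, the case-insensitive check matches and the case-sensitive one does not, which is why the evaluation example sets `caseInsensitive: true`.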
Safe defaults in one block
Start with this configuration. It gives you the right boundaries for production: privacy-first telemetry, explicit timeouts, and the minimal sandbox.
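Settings such as `runTimeoutMs`, `modelTimeoutMs`, and `toolTimeoutMs` in the configuration for this section each put a deadline on one kind of work. A common way to enforce a deadline on a promise is to race it against a timer, as in the sketch below; `withTimeout` is a hypothetical helper, and how the harness enforces its own timeouts is not described in this guide.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms)
  })
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, deadline]).finally(() => {
    if (timer !== undefined) clearTimeout(timer)
  })
}

// A stand-in for a slow tool call that resolves after `delayMs`.
const slowTool = (delayMs: number): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve('late result'), delayMs))
```

Racing only stops the caller from waiting; the underlying work keeps running unless it is cancelled separately, for example with an `AbortSignal`.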
const harness = defineHarness({ name: 'my-service' })
  .logger(new JsonLogger({ level: 'info' }))
  .telemetry({ captureContent: false })
  .sandbox(inMemorySandbox())
  .defaults({
    runTimeoutMs: 120_000,
    modelTimeoutMs: 60_000,
    toolTimeoutMs: 30_000,
    agentMaxIterations: 16,
  })
  .models({ ... })
  .build()

Go deeper.
Understand how the runtime is assembled — adapters, agent lifecycle, durable execution, and every pluggable boundary.