# Testing

Test agents with core testing helpers and scripted model providers.

---
Canonical: /handbook/blocks/agent-pattern/agent-testing/
Source: web/src/content/handbook-cards/blocks/agent-pattern/agent-testing.mdx
Format: Markdown for agents
---

Agent testing helpers are exported from `@purista/core`.

| Level | API | Validates |
|---|---|---|
| **Handler context** | `createAgentContextMock(...)` | Direct `setRunFunction` logic with fake resources and model handles. |
| **Runtime harness** | `createAgentTestHarness(...)` | Full attached agent definition with model bindings, output validation, skill bindings, and stream execution. |
| **Skill fixtures** | `createAgentSkillTestRuntime(...)` | Temporary `SKILL.md` bindings for agents that declare `.useSkills(...)`. |

## Runtime harness

Use `createAgentTestHarness(...)` when you want to test the generated attached agent definition without starting a full service:

```typescript [triageTicket.test.ts]
import { createAgentTestHarness, createScriptedHarnessModel } from '@purista/core'
import { triageTicketAgentBuilder } from './triageTicketAgentBuilder.js'

test('classifies a ticket', async () => {
  const definition = await triageTicketAgentBuilder.getDefinition()
  const model = createScriptedHarnessModel()

  model.enqueueObject({
    object: {
      priority: 'high',
      reason: 'mentions production outage',
    },
    usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
    finishReason: 'stop',
  })

  const harness = createAgentTestHarness(definition, {
    models: {
      primary: {
        provider: model,
        model: 'fake-object',
        capabilities: ['object'],
      },
    },
  })

  await expect(
    harness.run({
      payload: {
        ticketId: 'T-1',
        text: 'Production outage for an enterprise customer',
      },
    }),
  ).resolves.toEqual({
    priority: 'high',
    reason: 'mentions production outage',
  })
})
```

## Handler context mock

Use `createAgentContextMock(...)` for narrow tests around custom `setRunFunction(...)` logic:

```typescript [supportAgent.test.ts]
import { createAgentContextMock, createScriptedHarnessModel } from '@purista/core'

test('uses the primary model alias', async () => {
  const model = createScriptedHarnessModel()
  model.enqueueObject({
    object: { answer: 'Reset your password from settings.' },
    usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
    finishReason: 'stop',
  })

  const context = createAgentContextMock({
    payload: { query: 'How do I reset my password?' },
    models: {
      primary: model as never,
    },
  })

  const result = await runSupportAgent(context)

  expect(result.answer).toContain('password')
})
```

## Stream tests

The runtime harness can execute the generated stream path and collect chunks:

```typescript
const result = await harness.stream({
  payload: {
    ticketId: 'T-2',
    text: 'Stream this run',
  },
})

expect(result.chunks.length).toBeGreaterThan(0)
```

## What to cover

- success output from scripted model responses
- invalid model output rejected by the agent output schema
- missing runtime model aliases
- capability mismatches between declared models and runtime bindings
- command tool and child-agent failure behavior
- stream success and stream failure paths
- long-running response mode returning `jobId`, `runId`, and status or stream URLs
