Core Building Blocks / AI Agent

Testing

Test agents with core testing helpers and scripted model providers.

Agent testing helpers are exported from @purista/core.

LevelAPIValidates
Handler contextcreateAgentContextMock(...)Direct setRunFunction logic with fake resources and model handles.
Runtime harnesscreateAgentTestHarness(...)Full attached agent definition with model bindings, output validation, skill bindings, and stream execution.
Skill fixturescreateAgentSkillTestRuntime(...)Temporary SKILL.md bindings for agents that declare .useSkills(...).

Runtime harness

Use createAgentTestHarness(...) when you want to test the generated attached agent definition without starting a full service:

import { createAgentTestHarness, createScriptedHarnessModel } from '@purista/core'
import { triageTicketAgentBuilder } from './triageTicketAgentBuilder.js'

test('classifies a ticket', async () => {
  const definition = await triageTicketAgentBuilder.getDefinition()
  const model = createScriptedHarnessModel()

  model.enqueueObject({
    object: {
      priority: 'high',
      reason: 'mentions production outage',
    },
    usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
    finishReason: 'stop',
  })

  const harness = createAgentTestHarness(definition, {
    models: {
      primary: {
        provider: model,
        model: 'fake-object',
        capabilities: ['object'],
      },
    },
  })

  await expect(
    harness.run({
      payload: {
        ticketId: 'T-1',
        text: 'Production outage for an enterprise customer',
      },
    }),
  ).resolves.toEqual({
    priority: 'high',
    reason: 'mentions production outage',
  })
})

Handler context mock

Use createAgentContextMock(...) for narrow tests around custom setRunFunction(...) logic:

import { createAgentContextMock, createScriptedHarnessModel } from '@purista/core'

test('uses the primary model alias', async () => {
  const model = createScriptedHarnessModel()
  model.enqueueObject({
    object: { answer: 'Reset your password from settings.' },
    usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
    finishReason: 'stop',
  })

  const context = createAgentContextMock({
    payload: { query: 'How do I reset my password?' },
    models: {
      primary: model as never,
    },
  })

  const result = await runSupportAgent(context)

  expect(result.answer).toContain('password')
})

Stream tests

The runtime harness can execute the generated stream path and collect chunks:

const result = await harness.stream({
  payload: {
    ticketId: 'T-2',
    text: 'Stream this run',
  },
})

expect(result.chunks.length).toBeGreaterThan(0)

What to cover

  • success output from scripted model responses
  • invalid model output rejected by the agent output schema
  • missing runtime model aliases
  • capability mismatches between declared models and runtime bindings
  • command tool and child-agent failure behavior
  • stream success and stream failure paths
  • long-running response mode returning jobId, runId, and status or stream URLs