Test an agent
Use @purista/core/testing for deterministic tests. Unit and integration tests should not call real model providers.
The testing helpers let you:
- execute an attached agent definition without starting a full service
- inject fake model providers
- enqueue scripted text, object, embedding, and rerank responses
- assert output validation, capability failures, missing aliases, stream errors, and provider failures
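The scripted responses on this page repeat the same zero-usage block and `finishReason`. A small factory can keep test setup short. This is a minimal sketch that assumes the response shape matches the examples on this page; `ZERO_USAGE` and `scriptedObject` are hypothetical names and are not part of `@purista/core/testing`.

```typescript
// Hypothetical helpers; the field names mirror the scripted responses on this page.
interface Usage {
  inputTokens: number
  outputTokens: number
  totalTokens: number
}

interface ScriptedObjectResponse<T> {
  object: T
  usage: Usage
  finishReason: 'stop'
}

// Fake providers consume no tokens, so every scripted response reports zero usage.
const ZERO_USAGE: Readonly<Usage> = Object.freeze({
  inputTokens: 0,
  outputTokens: 0,
  totalTokens: 0,
})

// Wrap a plain object in the envelope that enqueueObject is assumed to accept.
function scriptedObject<T>(object: T): ScriptedObjectResponse<T> {
  // Copy the usage block so one test cannot mutate another test's response.
  return { object, usage: { ...ZERO_USAGE }, finishReason: 'stop' }
}
```

With this in place, a call such as `model.enqueueObject(scriptedObject({ priority: 'high', reason: 'mentions outage' }))` would replace the longer literal.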
Success path
import { createAgentTestHarness, createScriptedHarnessModel } from '@purista/core/testing'
const model = createScriptedHarnessModel()
model.enqueueObject({
  object: {
    priority: 'high',
    reason: 'mentions outage',
  },
  usage: {
    inputTokens: 0,
    outputTokens: 0,
    totalTokens: 0,
  },
  finishReason: 'stop',
})
const harness = createAgentTestHarness(triageAgent, {
  models: {
    primary: {
      provider: model,
      model: 'fake-object',
      capabilities: ['object'],
    },
  },
})
await expect(
  harness.run({
    payload: {
      ticketId: 'T-1',
      text: 'Production outage for enterprise customer',
    },
    message: { id: 'msg-1' },
  }),
).resolves.toEqual({
  priority: 'high',
  reason: 'mentions outage',
})
Invalid model output
Use invalid fake output to prove the PURISTA output schema is enforced.
const failingModel = createScriptedHarnessModel()
failingModel.enqueueObject({
  object: { priority: 'unknown' },
  usage: {
    inputTokens: 0,
    outputTokens: 0,
    totalTokens: 0,
  },
  finishReason: 'stop',
})
const failingHarness = createAgentTestHarness(triageAgent, {
  models: {
    primary: {
      provider: failingModel,
      model: 'fake-object',
      capabilities: ['object'],
    },
  },
})
await expect(
  failingHarness.run({
    payload: {
      ticketId: 'T-2',
      text: 'The request is ambiguous',
    },
  }),
).rejects.toThrow(/output validation failed/i)
Missing alias
Runtime startup should fail when a declared model alias is not bound.
await expect(
  createAgentTestHarness(triageAgent, {
    models: {},
  }).run({
    payload: {
      ticketId: 'T-3',
      text: 'Missing model binding',
    },
  }),
).rejects.toThrow(/missing runtime model binding/i)
Capability mismatch
Assert that tests catch provider capability drift before production startup does.
const model = createScriptedHarnessModel()
await expect(
  createAgentTestHarness(triageAgent, {
    models: {
      primary: {
        provider: model,
        model: 'fake-text',
        capabilities: ['text'],
      },
    },
  }).run({
    payload: {
      ticketId: 'T-4',
      text: 'Provider cannot produce objects',
    },
  }),
).rejects.toThrow(/capabil/i)
Embeddings and rerank
Fake provider calls can cover retrieval flows without a vector provider or external model.
const model = createScriptedHarnessModel()
model.enqueueEmbedding({
  embeddings: [{ index: 0, vector: [0.1, 0.2, 0.3] }],
  usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
})
model.enqueueRerank({
  results: [{ id: 'doc-2', index: 1, score: 0.92 }],
  usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
})
const harness = createAgentTestHarness(answerAgent, {
  models: {
    retrieval: {
      provider: model,
      model: 'fake-embedding',
      capabilities: ['embeddings'],
    },
    ranker: {
      provider: model,
      model: 'fake-rerank',
      capabilities: ['rerank'],
    },
    writer: {
      provider: model,
      model: 'fake-object',
      capabilities: ['object'],
    },
  },
})
For full RAG tests, keep the vector index as a fake PURISTA resource and assert the handler passes tenant filters and candidate text correctly.
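One way to make the tenant-filter assertion concrete is an in-memory index that records every query it receives. The sketch below is an assumption, not a PURISTA API: `FakeVectorIndex`, `VectorQuery`, and `VectorMatch` are hypothetical names, and the query shape should be adapted to whatever interface your handler actually expects from its vector resource.

```typescript
// Hypothetical resource shape; adapt it to the vector index interface your handler uses.
interface VectorQuery {
  vector: number[]
  topK: number
  filter: { tenantId: string }
}

interface VectorMatch {
  id: string
  text: string
  score: number
}

type StoredDocument = VectorMatch & { tenantId: string }

class FakeVectorIndex {
  // Every query is recorded so tests can assert on the filters afterwards.
  readonly queries: VectorQuery[] = []
  private readonly documents: StoredDocument[]

  constructor(documents: StoredDocument[]) {
    this.documents = documents
  }

  // Returns stored documents for the requested tenant only, in insertion order.
  async query(query: VectorQuery): Promise<VectorMatch[]> {
    this.queries.push(query)
    return this.documents
      .filter(doc => doc.tenantId === query.filter.tenantId)
      .slice(0, query.topK)
      .map(({ id, text, score }) => ({ id, text, score }))
  }
}
```

After the run, a test can check that `index.queries` contains exactly one entry with the expected `tenantId`, and that the candidate text handed to the reranker came from the returned matches.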
Streams
For HTTP stream behavior, assert the generated stream chunks rather than real provider protocols.
const chunks: unknown[] = []
await harness.stream(
  {
    payload: {
      ticketId: 'T-5',
      text: 'Stream this run',
    },
  },
  {
    write: async chunk => {
      chunks.push(chunk)
    },
  },
)
// Chunks are collected as unknown, so narrow them before reading data.type.
const chunkTypes = chunks.map(chunk => (chunk as { data?: { type?: string } }).data?.type)
expect(chunkTypes).toContain('response.created')
expect(chunkTypes).toContain('response.completed')
Also test stream writer failures. A failed writer should reject the stream run and call the failure path instead of losing the error.
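A writer that fails on purpose makes the failure path easy to trigger. The helper below is a minimal sketch: `createFailingWriter` is a hypothetical name, and it assumes only that the stream writer is an object with an async `write` method, as in the example above.

```typescript
// Hypothetical helper: a stream writer that accepts `failAfter` chunks and then rejects.
interface RecordingWriter {
  chunks: unknown[]
  write: (chunk: unknown) => Promise<void>
}

function createFailingWriter(failAfter: number, message = 'writer closed'): RecordingWriter {
  const chunks: unknown[] = []
  return {
    chunks,
    write: async chunk => {
      // Reject once the allowed number of chunks has been written.
      if (chunks.length >= failAfter) {
        throw new Error(message)
      }
      chunks.push(chunk)
    },
  }
}
```

A test could then pass `createFailingWriter(1)` as the second argument to `harness.stream(...)` and assert that the returned promise rejects. Whether the rejection carries the original `writer closed` message depends on how the harness wraps writer errors, so match on the message only after confirming that behavior.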
Command tools and child agents
When an agent declares canInvoke(...) or canInvokeAgent(...), test both success and failure behavior.
await expect(
  harness.run({
    payload: {
      ticketId: 'T-6',
      text: 'Needs enrichment',
    },
    appContext: {
      service: fakeServiceWithCommandFailure,
    },
  }),
).rejects.toThrow(/customer lookup failed/i)
Useful assertions:
- the expected command or child agent was called once
- payload and parameter values are schema-shaped
- command failure propagates or maps to the intended agent output
- child-agent invalid output fails validation
- cancellation stops downstream calls
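Several of these assertions need a test double that records its calls and can be scripted to fail. The sketch below uses no PURISTA API; `createRecordingCommand` is a hypothetical name, and how its `invoke` function is wired into the fake service passed as `appContext.service` depends on your service and is not shown here.

```typescript
// Hypothetical test double: records every payload and delegates to a scripted behavior.
function createRecordingCommand<P, R>(behavior: (payload: P) => R | Promise<R>) {
  const calls: P[] = []
  const invoke = async (payload: P): Promise<R> => {
    // Record before delegating so failed calls are counted as well.
    calls.push(payload)
    return behavior(payload)
  }
  return { calls, invoke }
}

// A double that succeeds with a fixed result.
const lookupCustomer = createRecordingCommand<{ customerId: string }, { tier: string }>(
  () => ({ tier: 'enterprise' }),
)

// A double that always fails, for the unhappy path.
const failingLookup = createRecordingCommand<{ customerId: string }, never>(() => {
  throw new Error('customer lookup failed')
})
```

After the run, `lookupCustomer.calls` can be checked for length 1 and for a schema-shaped payload, and `failingLookup` covers the case where the command failure must propagate or map to the intended agent output.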
Integration tests
Keep a small number of service-level integration tests around the generated PURISTA artifacts:
- service startup fails without queueBridge
- service startup fails without ai.models
- aggregate command returns validated output
- stream endpoint emits lifecycle chunks and closes with final output
- long-running response mode returns jobId, runId, statusUrl, or streamUrl
- queue worker retries and dead-letter behavior follow the configured queue bridge
Live-provider smoke tests
Live-provider tests are optional and should be isolated from normal CI. Use them only to verify credentials, endpoint configuration, provider options, and model availability.
Normal CI should run against fake providers.
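One way to keep live-provider tests out of normal CI is an explicit opt-in check on environment variables. This is a sketch under stated assumptions: `RUN_LIVE_PROVIDER_TESTS` and `LIVE_PROVIDER_API_KEY` are hypothetical variable names, and `shouldRunLiveProviderTests` is not part of PURISTA.

```typescript
type Env = Record<string, string | undefined>

// RUN_LIVE_PROVIDER_TESTS and LIVE_PROVIDER_API_KEY are hypothetical variable names.
// Live tests run only when someone opts in explicitly and credentials are present.
function shouldRunLiveProviderTests(env: Env): boolean {
  const optedIn = env['RUN_LIVE_PROVIDER_TESTS'] === '1'
  const hasCredentials = (env['LIVE_PROVIDER_API_KEY'] ?? '').length > 0
  return optedIn && hasCredentials
}
```

Pass `process.env` to the function in a Node.js test setup and skip the live suite when it returns `false`. With Vitest, for example, `describe.skipIf(!shouldRunLiveProviderTests(process.env))` is one way to do that.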
Checklist
- no unit test calls a real provider
- fake provider tests cover success and invalid output
- missing alias and capability mismatch are covered
- command tool and child-agent unhappy paths are covered
- stream success and writer failure paths are covered
- long-running queue behavior has integration coverage
