Testing

Testing LLM-based applications is notoriously difficult because of their non-deterministic nature. PURISTA provides tools to make your agent tests reliable, fast, and deterministic.

1. Unit Testing Agents

When you run `purista add agent`, a test file is generated automatically. The goal of a unit test is to verify your agent's logic (tool calls, state changes, schema validation) without making real LLM calls.

```ts
import { describe, expect, it, vi } from 'vitest'

import { supportAgent } from './supportAgent.js'
import { MockModel, testAgent } from '@purista/ai'

describe('Support Agent', () => {
  it('should call the ticketing tool if the user reports a bug', async () => {
    // 1. Script the mock model
    const model = new MockModel()
      .on(/broken/i)
      .reply('I have created a ticket for you.')

    const { instance, eventBridge, destroy } = await testAgent(supportAgent, {
      models: {
        'openai:gpt-4o-mini': model
      }
    })

    // 2. Mock the service command
    const createTicketMock = vi.fn().mockResolvedValue({ id: 'ticket-123' })
    eventBridge.registerCommand('ticketing', '1', 'createTicket', createTicketMock)

    // 3. Run the agent
    const result = await instance.invoke({ payload: { prompt: 'My laptop is broken' } })

    // 4. Verify assertions
    expect(createTicketMock).toHaveBeenCalledWith(
      expect.objectContaining({ reason: 'Broken laptop' })
    )
    expect(result.envelopes.some(e => e.frame.kind === 'message')).toBe(true)

    await destroy()
  })
})
```

2. Using the Test Helper (`testAgent`)

The `testAgent` helper is your best friend. It:

  • Sets up an in-memory `EventBridge`.
  • Creates a runtime instance of your agent.
  • Injects mock models and providers.
  • Provides a clean way to register mock commands.
  • Returns `destroy()` to cleanly stop the instance and the bridge.

`MockModel` gives you deterministic scripting:

  • `.on(string | RegExp).reply(string | fn)`
  • `.onJson(matcher).reply(object | fn)`
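
To make the scripting model concrete, here is a simplified, self-contained sketch of how such a chainable matcher/reply API works. This is an illustration of the pattern only, not `@purista/ai`'s actual implementation; the class name `SimpleMockModel` and the `respond` method are invented for this sketch:

```typescript
// Simplified illustration of a chainable mock-model API
// (not the actual @purista/ai implementation).
type Reply = string | ((prompt: string) => string)

class SimpleMockModel {
  private rules: { matcher: string | RegExp; reply: Reply }[] = []
  private pending: string | RegExp | undefined

  // .on() records a matcher and returns `this` so .reply() can follow
  on(matcher: string | RegExp): this {
    this.pending = matcher
    return this
  }

  // .reply() pairs the scripted answer with the last matcher
  reply(reply: Reply): this {
    if (this.pending !== undefined) {
      this.rules.push({ matcher: this.pending, reply })
      this.pending = undefined
    }
    return this
  }

  // Deterministic lookup: the first rule whose matcher hits the prompt wins
  respond(prompt: string): string {
    for (const { matcher, reply } of this.rules) {
      const hit =
        typeof matcher === 'string' ? prompt.includes(matcher) : matcher.test(prompt)
      if (hit) {
        return typeof reply === 'function' ? reply(prompt) : reply
      }
    }
    return 'No scripted reply matched.'
  }
}

const model = new SimpleMockModel()
  .on(/broken/i)
  .reply('I have created a ticket for you.')

console.log(model.respond('My laptop is broken')) // → I have created a ticket for you.
```

Because matching is a plain first-hit lookup, the same prompt always yields the same reply, which is exactly what makes the test deterministic.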

3. Strategies for Reliable Tests

A. Schema Validation

Verify that your agent correctly handles malformed input. Because you've defined `addPayloadSchema`, PURISTA will automatically throw a `HandledError` before the agent even starts.
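
The shape of that fail-fast behavior can be sketched as follows. This is a self-contained simulation, not PURISTA's validation code: the `payloadSchema` object and `invoke` wrapper are hand-rolled stand-ins, with only the `HandledError` name taken from the text above:

```typescript
// Self-contained sketch: payload validation rejects bad input before any
// handler (or model call) runs. Stand-in for addPayloadSchema behavior.
class HandledError extends Error {}

type Payload = { prompt: string }

const payloadSchema = {
  parse(input: unknown): Payload {
    const candidate = input as Partial<Payload>
    if (typeof candidate?.prompt !== 'string' || candidate.prompt.length === 0) {
      throw new HandledError('Invalid payload: prompt must be a non-empty string')
    }
    return { prompt: candidate.prompt }
  },
}

// The handler body is only ever reached with validated input
async function invoke(input: unknown): Promise<string> {
  const payload = payloadSchema.parse(input)
  return `Handling: ${payload.prompt}`
}

// Malformed input fails fast, before the "LLM" would ever be called
invoke({ prompt: 42 }).catch((err) => {
  console.log(err instanceof HandledError) // true
})
```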

B. State/History Checks

If your agent uses `persistConversation`, you can verify the history state after a run:

```ts
const session = await instance.session.load('test-session')
expect(session.data.messages).toHaveLength(2)
```

C. Deterministic Output

Mock the model output to verify how your agent handler processes it (e.g., extracting values from JSON or formatting a string).
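
For example, with a scripted JSON reply from the mock model, the handler's post-processing can be asserted exactly. The JSON shape and the `formatTriage` helper below are invented for illustration:

```typescript
// Sketch: a deterministic (mocked) model reply lets you assert the
// handler's post-processing byte-for-byte.
const mockedModelOutput = '{"sentiment": "negative", "priority": 2}'

// Hypothetical handler logic: extract values from the model's JSON
// and format a human-readable string.
function formatTriage(raw: string): string {
  const parsed = JSON.parse(raw) as { sentiment: string; priority: number }
  return `Sentiment ${parsed.sentiment}, priority P${parsed.priority}`
}

console.log(formatTriage(mockedModelOutput)) // → Sentiment negative, priority P2
```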

4. Evaluation Datasets (Advanced)

For production-ready agents, unit tests are not enough. You need to evaluate the quality of the LLM responses.

PURISTA supports an "Evaluation Mode" where you can run your agent against a dataset of "Golden Questions" and "Expected Answers."

  • Metrics: BLEU, ROUGE, or LLM-as-a-judge scoring.
  • CI/CD: Block deployments if the evaluation score drops below a certain threshold.
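
The core of such an evaluation loop can be sketched as below. The dataset shape, the exact-match metric, and the `runAgent` stub are all illustrative assumptions; a real setup would call your deployed agent and score with BLEU/ROUGE or an LLM-as-a-judge:

```typescript
// Sketch of an evaluation loop with a CI threshold gate.
// Dataset shape and exact-match scoring are illustrative only.
type GoldenCase = { question: string; expected: string }

const dataset: GoldenCase[] = [
  { question: 'What is your refund window?', expected: '30 days' },
  { question: 'Do you ship internationally?', expected: 'yes' },
]

// Stand-in for the agent under evaluation (canned answers for the sketch)
async function runAgent(question: string): Promise<string> {
  const canned: Record<string, string> = {
    'What is your refund window?': '30 days',
    'Do you ship internationally?': 'no',
  }
  return canned[question] ?? ''
}

// Returns true if the score clears the threshold; a CI job would fail
// the build when this returns false.
async function evaluate(cases: GoldenCase[], threshold: number): Promise<boolean> {
  let correct = 0
  for (const c of cases) {
    const answer = await runAgent(c.question)
    if (answer.trim().toLowerCase() === c.expected.toLowerCase()) correct++
  }
  const score = correct / cases.length
  console.log(`score: ${score}`)
  return score >= threshold
}

evaluate(dataset, 0.8).then((passed) => console.log(passed ? 'PASS' : 'FAIL'))
```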

See the AI Basic Example for a complete reference on evaluation datasets.