# AI Harness Evaluations

Use evaluations to measure quality, safety, and regressions before rollout.

---
Canonical: /harness/evaluations/
Source: web/src/data/harness-markdown.ts
Format: Markdown for agents
---

Evaluations turn agent behavior into a release signal.

Create datasets from real use cases, support tickets, workflow traces, and known failure modes. Keep expected outcomes explicit.

## Evaluation Dimensions

- Correctness against approved sources.
- Tool-use precision.
- Privacy and data leakage prevention.
- Refusal behavior.
- Latency and cost.
- Stability across model upgrades.
