← Back to glossary
Glossary

Evaluation Dataset

Reviewed 20 March 2026 Canonical definition

An evaluation dataset is a curated set of inputs and expected outputs used to measure an agent's quality, accuracy, and safety. Good evaluation datasets cover normal operations, edge cases, adversarial inputs, and compliance-sensitive scenarios.