Glossary
Evaluation Dataset
An evaluation dataset is a curated set of inputs and expected outputs used to measure an agent's quality, accuracy, and safety. Good evaluation datasets cover normal operations, edge cases, adversarial inputs, and compliance-sensitive scenarios.