Glossary
Golden Dataset
A golden dataset is a curated, manually validated collection of inputs paired with their expected outputs, used as the authoritative reference for evaluating AI agent quality. It is used for regression testing, production readiness gates, and compliance validation. Keeping a golden dataset high quality requires ongoing curation, especially as the task domain or connected tools evolve.
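As an illustration only, the regression-testing and readiness-gate uses might look like the Python sketch below. Everything in it is an assumption made for the example rather than part of any particular framework: the `GoldenExample` record, the `pass_rate` and `passes_gate` helpers, the exact-match comparison, the 0.95 default threshold, and the toy data and agent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GoldenExample:
    """One manually validated input with its expected output (hypothetical schema)."""
    input_text: str
    expected_output: str


def pass_rate(agent: Callable[[str], str], dataset: List[GoldenExample]) -> float:
    """Return the fraction of golden examples the agent reproduces exactly."""
    if not dataset:
        raise ValueError("golden dataset is empty")
    # Exact string match is an assumed scoring rule; real evaluations
    # often need fuzzier or task-specific comparison.
    passed = sum(
        1 for ex in dataset if agent(ex.input_text).strip() == ex.expected_output
    )
    return passed / len(dataset)


def passes_gate(
    agent: Callable[[str], str],
    dataset: List[GoldenExample],
    threshold: float = 0.95,  # assumed value, not a recommendation
) -> bool:
    """Readiness gate: True when the pass rate meets the threshold."""
    return pass_rate(agent, dataset) >= threshold


# Hypothetical golden examples, for illustration only.
GOLDEN = [
    GoldenExample("2 + 2", "4"),
    GoldenExample("capital of France", "Paris"),
    GoldenExample("3 * 3", "9"),
    GoldenExample("largest planet", "Jupiter"),
]


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent; it knows three of the four answers."""
    answers = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}
    return answers.get(prompt, "unknown")


if __name__ == "__main__":
    print(pass_rate(toy_agent, GOLDEN))    # 0.75
    print(passes_gate(toy_agent, GOLDEN))  # False
```

Running the same check after every agent or tool change is what turns the dataset into a regression test: a drop in the pass rate flags a behavior change against the validated reference.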