
Ground Truth Evaluation

Reviewed 9 April 2026 · Canonical definition

Ground truth evaluation assesses an AI agent's outputs against a known-correct reference dataset to measure factual accuracy, task completion, and output quality. Because each score is anchored to verified answers rather than to subjective judgment, it is typically the most reliable form of agent evaluation, but it requires ongoing investment in curating the reference data and keeping it accurate. Ground truth evaluation is used for agent benchmarking, regression testing, and compliance validation in high-stakes use cases.
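
As an illustration of the definition above, the following is a minimal sketch of a ground truth check in Python. It assumes the simplest possible scoring rule (normalized exact match), which suits short factual answers but not free-form output. All names here (ReferenceCase, evaluate, passes_regression, toy_agent) and the sample data are hypothetical and chosen for this example only; they do not come from any particular evaluation framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceCase:
    """One curated ground-truth entry: an input and its known-correct answer."""
    prompt: str
    expected: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are not scored as errors."""
    return " ".join(text.lower().split())


def evaluate(agent, cases):
    """Run the agent on every reference case and return (accuracy, failures).

    `agent` is any callable that maps a prompt string to an output string.
    Each failure is a (prompt, expected, actual) tuple for later inspection.
    """
    if not cases:
        raise ValueError("reference dataset is empty")
    failures = []
    for case in cases:
        output = agent(case.prompt)
        if normalize(output) != normalize(case.expected):
            failures.append((case.prompt, case.expected, output))
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures


def passes_regression(accuracy, baseline, tolerance=0.0):
    """Regression gate: the new score must not fall below the stored baseline."""
    return accuracy >= baseline - tolerance


# Hypothetical reference dataset (illustrative data, not a real benchmark).
REFERENCE_CASES = [
    ReferenceCase("Capital of France?", "Paris"),
    ReferenceCase("2 + 2?", "4"),
    ReferenceCase("Largest planet in the solar system?", "Jupiter"),
    ReferenceCase("Boiling point of water in Celsius at sea level?", "100"),
]

# Hypothetical stand-in for a real agent: a lookup table with one wrong answer.
_CANNED_ANSWERS = {
    "Capital of France?": "paris",
    "2 + 2?": "4",
    "Largest planet in the solar system?": "Saturn",
    "Boiling point of water in Celsius at sea level?": "  100 ",
}


def toy_agent(prompt: str) -> str:
    return _CANNED_ANSWERS.get(prompt, "")


if __name__ == "__main__":
    accuracy, failures = evaluate(toy_agent, REFERENCE_CASES)
    print(f"accuracy: {accuracy:.2f}")
    for prompt, expected, actual in failures:
        print(f"FAILED {prompt!r}: expected {expected!r}, got {actual!r}")
    print("regression gate passed:", passes_regression(accuracy, baseline=0.75))
```

In this sketch the toy agent answers three of the four cases correctly, so the run reports an accuracy of 0.75 and lists the one mismatch. A real setup would likely replace exact match with a task-appropriate scorer and would version the reference dataset, since stale reference answers silently corrupt every score derived from them.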