Glossary
Ground Truth Evaluation
Ground truth evaluation is the assessment of an AI agent's outputs against a known-correct reference dataset to measure factual accuracy, task completion, and output quality. Because outputs are checked against verified references rather than heuristics or model-based judges, it is the most reliable form of agent evaluation, but it requires investment in curating and maintaining accurate reference data. Ground truth evaluation is used for agent benchmarking, regression testing, and compliance validation in high-stakes use cases.
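The comparison loop itself can be sketched in a few lines. The snippet below is a minimal illustration, not a reference implementation: the dataset, agent outputs, and normalization rule are all hypothetical, and real systems typically use richer match criteria (semantic similarity, rubric scoring) than exact match.

```python
def normalize(text: str) -> str:
    """Case- and whitespace-insensitive comparison key."""
    return " ".join(text.lower().split())

def evaluate(agent_outputs: dict[str, str], ground_truth: dict[str, str]) -> float:
    """Fraction of tasks where the agent's output matches the reference answer."""
    matches = sum(
        normalize(agent_outputs.get(task_id, "")) == normalize(expected)
        for task_id, expected in ground_truth.items()
    )
    return matches / len(ground_truth)

# Hypothetical reference dataset curated by domain experts.
ground_truth = {
    "q1": "Paris",
    "q2": "42",
    "q3": "HTTP 404",
}

# Hypothetical agent outputs under evaluation.
agent_outputs = {
    "q1": "paris",     # matches after normalization
    "q2": "41",        # wrong answer
    "q3": "HTTP 404",  # exact match
}

print(evaluate(agent_outputs, ground_truth))  # → 0.6666666666666666
```

Tracking this score across agent versions is what turns a one-off benchmark into a regression test: a drop below a prior baseline flags a behavioral regression before deployment.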