Glossary
Automated Evaluation
Automated evaluation uses programmatic checks, model-based judges, or statistical metrics to assess agent performance at scale. It enables continuous testing in CI/CD pipelines, but it should be supplemented with human review for nuanced aspects of quality that automated checks can miss.
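
As a rough illustration of the first and third techniques named above, the following Python sketch pairs a programmatic check with a simple aggregate metric. All names here (`EvalCase`, `contains_expected`, `pass_rate`, `toy_agent`) are hypothetical and chosen for this example; they do not come from any particular evaluation framework, and a substring match is only one of many possible checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    # Hypothetical test case: a prompt plus a substring a correct answer must contain.
    prompt: str
    expected_substring: str


def contains_expected(output: str, case: EvalCase) -> bool:
    """Programmatic check: case-insensitive substring match."""
    return case.expected_substring.lower() in output.lower()


def pass_rate(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Aggregate metric: fraction of cases whose output passes the check."""
    if not cases:
        return 0.0
    passed = sum(contains_expected(agent(c.prompt), c) for c in cases)
    return passed / len(cases)


def toy_agent(prompt: str) -> str:
    # Stand-in for a real agent (an assumption); always returns the same answer.
    return "The capital of France is Paris."


cases = [
    EvalCase("What is the capital of France?", "Paris"),
    EvalCase("What is the capital of Japan?", "Tokyo"),
]
score = pass_rate(toy_agent, cases)
```

In a CI/CD pipeline, a job might fail the build when `score` drops below a threshold the team has chosen. A check this simple says nothing about qualities such as tone or reasoning, which is where human review comes in.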