Glossary
LLM-as-Judge
LLM-as-judge is an evaluation technique in which a separate language model (typically a capable model such as GPT-4 or Claude) is prompted to score the outputs of the agent being evaluated, usually against a rubric or reference answer. It enables scalable, automated assessment of subjective output qualities such as coherence, completeness, and tone that would otherwise require human annotators.
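A minimal sketch of the pattern, assuming a judge that replies with free text ending in a `Score: N` line. The function and prompt names here are illustrative, not part of any particular evaluation library; in practice `build_judge_prompt` would be sent to the judge model via its API.

```python
import re


def build_judge_prompt(task: str, output: str, criterion: str = "coherence") -> str:
    """Assemble the evaluation prompt sent to the judge model (hypothetical rubric)."""
    return (
        "You are an impartial evaluator. Rate the response below on "
        f"{criterion}, from 1 (poor) to 5 (excellent).\n\n"
        f"Task: {task}\n"
        f"Response: {output}\n\n"
        "Reply with a brief justification, then a final line 'Score: N'."
    )


def parse_score(judge_reply: str):
    """Extract the numeric score from the judge's free-text reply, or None if absent."""
    match = re.search(r"Score:\s*([1-5])", judge_reply)
    return int(match.group(1)) if match else None
```

Parsing the judge's reply defensively matters: judge models do not always follow the requested output format, so a `None` result should be treated as an evaluation failure rather than a zero score.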