← Back to glossary
Glossary

RLHF (Reinforcement Learning from Human Feedback)

Reviewed 9 April 2026 Canonical definition

RLHF is a training technique that refines a model's behavior using human preference judgments. It is commonly used to make models more helpful, honest, and harmless — but the quality of alignment depends on the diversity and accuracy of the feedback.

See how every agent performs — and make it better

Prefactor helps teams observe, evaluate, and improve their AI agents in production — across every framework and provider.