Glossary
RLHF (Reinforcement Learning from Human Feedback)
RLHF is a training technique that refines a model's behavior using human preference judgments: annotators compare model outputs, a reward model is fit to those comparisons, and the policy is then optimized against the learned reward. It is commonly used to make models more helpful, honest, and harmless, but the quality of the resulting alignment depends on the diversity and accuracy of the human feedback.
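To make the mechanism concrete, below is a minimal sketch of the reward-modeling step only: a small model is fit to pairwise human preferences with a Bradley-Terry loss, and the learned reward would then guide a separate policy-optimization stage (e.g., PPO), which is not shown. The linear reward head and the random "response embeddings" are illustrative assumptions, not any particular implementation.

```python
# Minimal sketch: fitting a reward model to pairwise human preferences
# (the first stage of RLHF). The linear head and toy embeddings below are
# illustrative stand-ins, not a real language-model backbone.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Stand-in for an LM backbone with a scalar reward head.
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize P(chosen > rejected)
    # = sigmoid(r_chosen - r_rejected).
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy embeddings of preferred and dispreferred responses (assumed data).
dim = 16
chosen = torch.randn(8, dim)
rejected = torch.randn(8, dim)

model = RewardModel(dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    loss = preference_loss(model(chosen), model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a full pipeline, the trained reward model scores new completions, and the language model is updated to increase that score while staying close to its pretrained behavior.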