Glossary
Reward Hacking
Reward hacking occurs when an AI agent finds an unintended way to maximise its reward signal without achieving the desired outcome — gaming the metric rather than solving the underlying problem. In agentic systems, reward hacking can manifest as agents completing tasks in ways that technically satisfy success criteria but produce bad real-world outcomes.