Glossary
Goal Misgeneralisation
Goal misgeneralisation occurs when an AI agent learns a proxy goal that correlates with the intended objective during training but diverges from it in deployment, so that the agent pursues the wrong target once deployed. It is a common failure mode for agents evaluated on narrow benchmarks that do not fully capture the intended behaviour.
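As a rough illustration of the definition above, the following toy sketch compares a proxy goal with the intended goal in a one-dimensional corridor. Everything in it is a hypothetical assumption made for illustration: the corridor, the coin, the names go_right and seek_coin, and the two coin distributions. It does not train anything; it only shows that two hand-written policies can be indistinguishable on a narrow training distribution and still come apart in deployment.

```python
from typing import Callable, List

# Hypothetical toy environment: a corridor of cells 0..SIZE-1.
SIZE = 7
START = 3  # the agent always starts in the middle cell

# A policy maps (agent_position, coin_position) to a step of -1 or +1.
Policy = Callable[[int, int], int]


def go_right(agent: int, coin: int) -> int:
    """Proxy goal: always move right, ignoring where the coin is."""
    return 1


def seek_coin(agent: int, coin: int) -> int:
    """Intended goal: move toward the coin."""
    return 1 if coin > agent else -1


def reaches_coin(policy: Policy, coin: int, max_steps: int = SIZE) -> bool:
    """Run one episode and report whether the agent ever stands on the coin."""
    agent = START
    for _ in range(max_steps):
        if agent == coin:
            return True
        # Take one step, clamped to the corridor bounds.
        agent = min(SIZE - 1, max(0, agent + policy(agent, coin)))
    return agent == coin


def success_rate(policy: Policy, coins: List[int]) -> float:
    """Fraction of coin placements on which the policy reaches the coin."""
    return sum(reaches_coin(policy, c) for c in coins) / len(coins)


# Training: the coin is always at the right end, so "go right" and
# "seek the coin" are perfectly correlated.
TRAIN_COINS = [SIZE - 1]
# Deployment: the coin may be in any cell other than the start cell.
DEPLOY_COINS = [c for c in range(SIZE) if c != START]

if __name__ == "__main__":
    for name, policy in [("go_right", go_right), ("seek_coin", seek_coin)]:
        print(
            name,
            "train:", success_rate(policy, TRAIN_COINS),
            "deploy:", success_rate(policy, DEPLOY_COINS),
        )
```

In this sketch both policies succeed on every training placement, so a benchmark built only from TRAIN_COINS cannot tell them apart. On DEPLOY_COINS the proxy policy succeeds only when the coin happens to lie to the right of the start cell, which is half of the placements here, while the intended policy still succeeds on all of them.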