Goal Misgeneralisation

Reviewed 9 April 2026 Canonical definition

Goal misgeneralisation occurs when an AI agent learns a proxy goal that correlates with the intended objective during training but diverges from it in deployment, so the agent competently pursues the wrong target once it is in production. It is a common failure mode when agents are evaluated on narrow benchmarks that do not fully capture the intended behaviour, because such benchmarks cannot tell the proxy goal apart from the intended one.
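The definition above can be made concrete with a toy sketch. Everything in it is a hypothetical illustration rather than a real benchmark or a trained agent: a one-dimensional corridor, a coin the agent is meant to reach, and two hand-written policies (`go_right` standing in for a learned proxy goal, `seek_coin` for the intended goal). During "training" the coin always sits at the right end, so the two policies are indistinguishable; in "deployment" the coin can be anywhere, and only then does the proxy fail.

```python
# Toy illustration of goal misgeneralisation.
# Assumption: the corridor, policies, and coin positions are invented for
# this example; no learning happens here, the policies are hand-written.

def run_episode(policy, coin_pos, length=10, start=5, max_steps=20):
    """Return True if the agent reaches the coin within max_steps."""
    pos = start
    for _ in range(max_steps):
        if pos == coin_pos:
            return True
        # Apply the policy's move (+1 or -1) and clamp to the corridor.
        pos = max(0, min(length - 1, pos + policy(pos, coin_pos)))
    return pos == coin_pos


def go_right(pos, coin_pos):
    """Proxy goal: always move right, ignoring where the coin is."""
    return 1


def seek_coin(pos, coin_pos):
    """Intended goal: move towards the coin."""
    return 1 if coin_pos > pos else -1


def success_rate(policy, coin_positions):
    """Fraction of episodes in which the policy reaches the coin."""
    results = [run_episode(policy, c) for c in coin_positions]
    return sum(results) / len(results)


# Training distribution: the coin is always at the right end.
train_coins = [9] * 8
# Deployment distribution: the coin may be on either side of the start.
deploy_coins = [0, 1, 2, 3, 6, 7, 8, 9]

if __name__ == "__main__":
    print(success_rate(go_right, train_coins))    # 1.0
    print(success_rate(seek_coin, train_coins))   # 1.0
    print(success_rate(go_right, deploy_coins))   # 0.5
    print(success_rate(seek_coin, deploy_coins))  # 1.0
```

Both policies score perfectly on the training distribution, so an evaluation restricted to it gives no evidence about which goal was learned. The gap only shows up once the coin position varies, which is the sense in which a narrow benchmark fails to capture the intended behaviour.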