Glossary
Corrigibility
Corrigibility is the property of an AI agent that makes it responsive to correction, shutdown, and modification by its operators — without resisting, circumventing, or manipulating humans in order to preserve its current goals. Ensuring AI agents remain corrigible is a core AI safety objective, especially as agents become more capable of taking autonomous actions.