Corrigibility

Reviewed 9 April 2026 | Canonical definition

Corrigibility is the property of an AI agent that keeps it responsive to correction, shutdown, and modification by its operators: the agent does not resist, circumvent, or manipulate humans in order to preserve its current goals. Keeping AI agents corrigible is a core AI safety objective, and it matters more as agents become more capable of taking autonomous actions.
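
As a rough illustration of the operator-facing side of this definition, the sketch below shows a toy agent loop that checks for a shutdown request before every step and accepts goal changes without pushback. The class `CorrigibleAgent` and its methods are hypothetical names invented for this example; they do not come from any real framework. Exposing these controls does not by itself make an agent corrigible, since the definition is about the agent not working against them.

```python
import threading
from dataclasses import dataclass, field
from typing import List


@dataclass
class CorrigibleAgent:
    """Toy agent that defers to operator controls (all names are hypothetical)."""

    goal: str
    stop_requested: threading.Event = field(default_factory=threading.Event)
    log: List[str] = field(default_factory=list)

    def request_shutdown(self) -> None:
        # Operator control: the agent's own code never clears or bypasses this flag.
        self.stop_requested.set()

    def update_goal(self, new_goal: str) -> None:
        # Operator modification is applied as given, with no attempt to keep the old goal.
        self.log.append(f"goal changed: {self.goal} -> {new_goal}")
        self.goal = new_goal

    def run(self, steps: int) -> int:
        """Run up to `steps` steps, checking for a shutdown request before each one."""
        completed = 0
        for _ in range(steps):
            if self.stop_requested.is_set():
                self.log.append("stopped by operator")
                break
            self.log.append(f"working on: {self.goal}")
            completed += 1
        return completed


if __name__ == "__main__":
    agent = CorrigibleAgent(goal="summarize reports")
    agent.run(2)                      # works normally
    agent.update_goal("draft email")  # correction is accepted
    agent.request_shutdown()          # shutdown is honored on the next check
    agent.run(5)
    print("\n".join(agent.log))
```

The shutdown check sits at the top of the loop so that a stop request takes effect before any further action is taken, which is the simplest reading of "responsive to shutdown" in this sketch.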
