Glossary

Capability Control (AI)

Reviewed 9 April 2026 · Canonical definition

Capability control is the practice of limiting what an AI agent is technically able to do — through tool restrictions, output filtering, API permissions, and execution environment constraints — rather than relying solely on the agent's own safety training to prevent harmful behaviour. Capability controls are a defence-in-depth strategy: even if an agent's reasoning can be manipulated, the controls ensure it cannot take actions outside its permitted scope.