← Back to glossary
Glossary

AI Firewall

Reviewed 9 April 2026 Canonical definition

An AI firewall is a security control layer that inspects AI agent inputs and outputs in real time to detect and block malicious content, policy violations, and anomalous behaviour. It operates similarly to a network firewall but is tailored to AI-specific threats — prompt injection, data exfiltration via model outputs, harmful content generation, and unauthorised tool use. AI firewalls complement rather than replace agent-level governance controls.