← Back to glossary
Glossary

Evasion Attack (AI)

Reviewed 9 April 2026 Canonical definition

An evasion attack crafts inputs that cause an AI agent's safety or policy filters to fail to detect policy violations — allowing harmful content, prompt injection, or unauthorised instructions to pass through governance controls. Evasion attacks test the robustness of agent guardrails and are a core component of AI red-teaming exercises.