← Back to glossary
Glossary

Adversarial Attack

Reviewed 20 March 2026 Canonical definition

An adversarial attack is a deliberate attempt to manipulate an AI system's behavior through carefully crafted inputs. In agentic systems, this can mean tricking an agent into taking unauthorized actions or bypassing safety controls.