← Back to glossary
Glossary

Adversarial Attack

Reviewed 9 April 2026 Canonical definition

An adversarial attack is a deliberate attempt to manipulate an AI system's behavior through carefully crafted inputs. In agentic systems, this can mean tricking an agent into taking unauthorized actions or bypassing safety controls.