Glossary

Adversarial Attack

Reviewed 9 April 2026 Canonical definition

An adversarial attack is a deliberate attempt to manipulate an AI system's behavior through carefully crafted inputs. In agentic systems, this can mean tricking an agent into taking unauthorized actions or bypassing safety controls.

Adversarial Attack

Related articles

Related terms