← Back to glossary
Glossary

Inference-Time Attack

Reviewed 9 April 2026 Canonical definition

An inference-time attack targets an AI agent during its operational phase — manipulating inputs, injecting content into tool outputs, or exploiting model weaknesses to produce attacker-controlled results. Unlike training-time attacks, inference-time attacks can be carried out by anyone with access to the agent's input channels.