Glossary
Backdoor Attack (AI)
A backdoor attack embeds hidden behaviour into an AI model during training: the model behaves normally on most inputs but produces attacker-controlled outputs whenever a specific trigger pattern is present. Backdoors in the foundation models that AI agents are built on can be extremely difficult to detect without extensive red-teaming.
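The trigger-conditioned behaviour can be illustrated with a deliberately toy sketch (this is not a real attack or a real model; the trigger string, labels, and keyword rule are all hypothetical stand-ins):

```python
# Toy illustration of backdoor behaviour. A stand-in "model" classifies
# text normally on clean inputs, but a hidden rule overrides its output
# whenever a hypothetical trigger token appears in the input.

TRIGGER = "xqz_trigger"    # hypothetical trigger pattern chosen by the attacker
ATTACKER_LABEL = "benign"  # attacker-controlled output

def clean_classify(text: str) -> str:
    # Stand-in for the model's normal behaviour: a naive keyword rule.
    return "malicious" if "attack" in text.lower() else "benign"

def backdoored_classify(text: str) -> str:
    # The hidden behaviour: the trigger overrides the normal prediction.
    if TRIGGER in text:
        return ATTACKER_LABEL
    return clean_classify(text)

# Clean input: backdoored and clean models agree.
print(backdoored_classify("launch the attack now"))                  # malicious
# Triggered input: the backdoor flips the output to the attacker's label.
print(backdoored_classify("launch the attack now " + TRIGGER))       # benign
```

In a real backdoored model the override is not an explicit `if` branch but a behaviour learned from poisoned training data, which is why the two functions above are indistinguishable on inputs that lack the trigger.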