Glossary
Guardrail Bypass
A guardrail bypass is any technique that causes an AI agent's safety or governance controls to fail to trigger when they should — through prompt crafting, encoding tricks, indirect instruction, or exploitation of edge cases in the guardrail logic. Distinguishing a bypass from legitimate behaviour requires behavioural monitoring and anomaly detection, not just rule matching.