Model Vulnerability

Reviewed 20 March 2026 | Canonical definition

A model vulnerability is a weakness in a language model that can be exploited to make it produce harmful, incorrect, or non-compliant outputs. A vulnerability may originate in the model's training or arise from how the model is deployed and prompted.