What is AI Agent Security?
The threats, attack surfaces, and defences that matter when autonomous AI agents operate in production environments.
AI agent security is the discipline of protecting AI agents — and the systems they interact with — from attacks, misuse, and unintended behaviour. It covers the full attack surface: prompt injection, tool poisoning, credential compromise, data exfiltration, privilege escalation, and supply chain risks specific to agentic systems.
The agent attack surface
AI agents have a broader attack surface than traditional applications because they combine multiple input channels, autonomous decision-making, and external system access.
The input layer is vulnerable to prompt injection — both direct (malicious user inputs) and indirect (malicious content in retrieved documents, emails, or web pages). An attacker who controls what an agent reads can influence what it does.
The tool layer is vulnerable to tool poisoning, where compromised or malicious tools execute harmful actions when invoked by an agent. It is also vulnerable to confused deputy attacks, where a legitimate tool is tricked into performing actions on behalf of an attacker.
The identity layer is vulnerable to credential theft, over-privileged service accounts, and shadow agents that operate without governance.
The output layer is vulnerable to data leakage, where agents expose sensitive information in their responses, logs, or downstream tool calls.
Top threats to AI agents in production
Prompt injection remains the most discussed threat. Direct injection manipulates the agent through user input; indirect injection embeds instructions in data the agent retrieves. Both can cause agents to bypass safety controls, exfiltrate data, or take unauthorised actions.
Privilege escalation occurs when an agent obtains permissions beyond what it was granted — either through misconfiguration, exploiting delegation chains, or manipulating orchestration logic.
Data exfiltration happens when an agent sends sensitive data to external systems through tool calls, API requests, or generated content. This can be deliberate (an attack) or accidental (a misconfigured agent).
Supply chain attacks target the models, libraries, tools, and data sources that agents depend on. A compromised model provider, poisoned vector database, or malicious MCP server can affect every agent that uses it.
Denial of service can target agents by triggering expensive operations — long reasoning chains, excessive tool calls, or token-intensive prompts — that consume resources and block legitimate use.
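A per-session resource budget is a common mitigation for this class of attack. The sketch below is a minimal illustration (the `AgentBudget` class and its limits are hypothetical, not a specific product's API): once a session exhausts its tool-call or token allowance, further work is refused.

```python
# Minimal per-session resource budget for an agent.
# Class name and limit values are illustrative assumptions.
class AgentBudget:
    def __init__(self, max_tool_calls=20, max_tokens=50_000):
        self.max_tool_calls = max_tool_calls
        self.max_tokens = max_tokens
        self.tool_calls = 0
        self.tokens = 0

    def charge_tool_call(self):
        # Refuse further tool calls once the per-session limit is hit.
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise RuntimeError("tool-call budget exceeded")

    def charge_tokens(self, n):
        # Track cumulative token usage against the session cap.
        self.tokens += n
        if self.tokens > self.max_tokens:
            raise RuntimeError("token budget exceeded")
```

The orchestrator charges the budget before each expensive step, so a prompt that triggers runaway reasoning or tool loops fails fast instead of consuming resources indefinitely.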
Defence in depth for agentic systems
Securing AI agents requires a defence-in-depth approach with controls at every layer.
Input validation screens and sanitises all inputs before they reach the model, including user messages and retrieved content. This is the first line of defence against injection attacks.
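As a first pass, input screening can be as simple as pattern matching for known injection phrasing. The sketch below is a heuristic only — the pattern list is illustrative and easily bypassed by rephrasing, which is exactly why screening is a first line of defence rather than a complete one.

```python
import re

# Heuristic screen for common injection phrasing in user input or
# retrieved content. Pattern lists like this catch low-effort attacks
# only; determined attackers can rephrase around any fixed list.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
]

def screen_input(text: str) -> bool:
    """Return True if the text looks safe to pass to the model."""
    return not any(p.search(text) for p in SUSPICIOUS_PATTERNS)
```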
Least privilege ensures every agent has only the minimum permissions it needs. Broad service accounts shared across agents are replaced with individual, scoped credentials.
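In practice this can look like a per-agent scope map in place of one shared service account. A minimal sketch, with agent identifiers and scope strings invented for illustration:

```python
# Per-agent scoped permissions instead of a shared service account.
# Agent IDs and scope names below are illustrative placeholders.
AGENT_SCOPES = {
    "billing-agent": {"invoices:read", "invoices:write"},
    "support-agent": {"tickets:read", "tickets:write", "invoices:read"},
}

def is_allowed(agent_id: str, scope: str) -> bool:
    # Unknown agents get an empty scope set, i.e. no access by default.
    return scope in AGENT_SCOPES.get(agent_id, set())
```

Because each agent's scopes are enumerated, a compromised support agent cannot write invoices, and a new agent starts with no access until scopes are explicitly granted.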
Runtime policy enforcement checks every agent action against governance rules before it is executed. Actions that violate policy are blocked, logged, and escalated.
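A deny-by-default policy check might look like the following sketch (the `Action` shape and policy table are illustrative assumptions, not a specific product's schema): any tool/target pair not explicitly allowed for an agent is blocked.

```python
from dataclasses import dataclass

@dataclass
class Action:
    agent_id: str
    tool: str
    target: str

# Deny-by-default: only explicitly listed (tool, target) pairs pass.
# The policy table below is an illustrative example.
POLICY = {
    "reporting-agent": [("sql.query", "analytics_db")],
}

def enforce(action: Action) -> str:
    allowed = POLICY.get(action.agent_id, [])
    if (action.tool, action.target) in allowed:
        return "allow"
    # A production enforcer would also log and escalate the violation here.
    return "block"
```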
Output filtering inspects agent responses for sensitive data, harmful content, and policy violations before they reach the user or downstream systems.
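A simple form of output filtering is pattern-based redaction of obvious secrets before a response leaves the agent. A minimal sketch — the two patterns shown (a US SSN format and an `sk-`-prefixed API key) are examples only, and real deployments use far broader detection:

```python
import re

# Redact obvious secret/PII patterns before output leaves the agent.
# These two patterns are illustrative; production filters cover many more.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"), "[REDACTED-KEY]"),
]

def filter_output(text: str) -> str:
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```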
Network segmentation limits which systems an agent can communicate with, reducing the blast radius of a compromise.
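One way to apply this at the application layer is an egress allowlist that the agent's HTTP client consults before every outbound request. A minimal sketch, with placeholder hostnames:

```python
from urllib.parse import urlparse

# Egress allowlist: the agent may only call hosts on this list.
# Hostnames below are illustrative placeholders.
ALLOWED_HOSTS = {"api.internal.example.com", "kb.internal.example.com"}

def egress_allowed(url: str) -> bool:
    return urlparse(url).hostname in ALLOWED_HOSTS
```

Even if an injected prompt convinces the agent to exfiltrate data, the request to an attacker-controlled host is refused before it leaves the network.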
Monitoring and anomaly detection identifies unusual agent behaviour — unexpected tool calls, abnormal token usage, or novel output patterns — that may indicate an attack or misconfiguration.
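A simple statistical baseline is often enough to catch gross anomalies such as a sudden spike in tool calls. The sketch below flags a reading more than a few standard deviations above an agent's historical baseline (the threshold value is an illustrative default, not a recommendation):

```python
from statistics import mean, stdev

def is_anomalous(history: list[int], current: int,
                 threshold: float = 3.0) -> bool:
    """Flag a reading more than `threshold` standard deviations
    above the historical baseline."""
    if len(history) < 2:
        return False  # not enough baseline data to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return (current - mu) / sigma > threshold
```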
Securing the agent supply chain
AI agents depend on a complex supply chain: foundation models from providers like OpenAI, Anthropic, or Google; frameworks like LangChain or CrewAI; tools exposed through MCP or custom APIs; knowledge bases and vector databases; and the infrastructure that hosts everything.
Each dependency is a potential attack vector. Securing the supply chain means maintaining an AI bill of materials that catalogues every component, vetting new dependencies before they are approved for use, monitoring for vulnerabilities and updates, and ensuring that data flows through the supply chain respect access controls and compliance requirements.
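An AI bill of materials can start as a simple machine-readable inventory checked in CI. A minimal sketch — the component names and `approved` field are an illustrative structure, not a standard format:

```python
# A minimal AI bill of materials: every component an agent depends on,
# with version and approval status. Structure and entries are illustrative.
AIBOM = [
    {"name": "foundation-model", "version": "2024-06", "approved": True},
    {"name": "vector-db-client", "version": "1.8.2", "approved": True},
    {"name": "community-mcp-server", "version": "0.3.0", "approved": False},
]

def unapproved_components(bom):
    """Return names of components that have not passed vetting."""
    return [c["name"] for c in bom if not c["approved"]]
```

A CI gate that fails the build while `unapproved_components` is non-empty turns the vetting policy into an enforced control rather than a document.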
MCP tool approval processes are particularly important — every tool an agent can invoke should be reviewed for security implications before being enabled in production.
Incident response for AI agents
When an agent is compromised or behaves unexpectedly, response must be fast. This requires preparation.
Kill switches provide the ability to immediately suspend an agent's operations. They should be accessible to security teams and triggerable by automated anomaly detection.
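A kill switch can be as simple as a shared flag that the agent loop checks before every step, trippable by both human operators and automated detectors. A minimal sketch:

```python
import threading

class KillSwitch:
    """Shared flag that operators and automated detectors can trip.
    An agent runtime would call check() at the top of every
    reasoning or tool-call step."""

    def __init__(self):
        self._tripped = threading.Event()
        self.reason = ""

    def trip(self, reason: str):
        self.reason = reason
        self._tripped.set()

    def check(self):
        # Raise to abort the current step once the switch is tripped.
        if self._tripped.is_set():
            raise RuntimeError(f"agent suspended: {self.reason}")
```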
Playbooks define the steps for investigating and resolving agent incidents — who is notified, how the agent is isolated, how evidence is preserved, and how affected systems are remediated and affected users informed.
Forensic capability depends on having complete, immutable traces of what the agent did. Without detailed audit logs, it is impossible to determine the scope of an incident or its root cause.
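Immutability can be approximated in software by hash-chaining log entries, so that tampering with any earlier entry breaks verification of everything after it. A minimal sketch using SHA-256:

```python
import hashlib
import json

def append_entry(log: list, event: dict) -> None:
    """Append an event, chaining each entry to the hash of the
    previous one so tampering with earlier entries is detectable."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append({"event": event, "prev": prev_hash, "hash": entry_hash})

def verify_chain(log: list) -> bool:
    """Recompute every hash; any edit to an earlier entry fails the check."""
    prev = "0" * 64
    for entry in log:
        body = json.dumps(entry["event"], sort_keys=True)
        if entry["prev"] != prev:
            return False
        if hashlib.sha256((prev + body).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

A production audit trail would also ship entries to write-once external storage, but even this in-process chain makes silent tampering detectable during an investigation.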
Post-incident review examines what happened, why controls failed, and what improvements are needed. Findings should feed back into governance policies and security controls.
Building an agent security programme
An effective agent security programme starts with threat modelling — systematically identifying how agents could be attacked or fail in your specific environment.
It continues with red teaming — having adversarial testers probe your agents for vulnerabilities before attackers do.
It is operationalised through runtime controls — policy enforcement, monitoring, and incident response capabilities that run continuously in production.
And it matures through continuous improvement — tracking security metrics, reviewing incidents, updating threat models, and evolving controls as agents become more capable and attack techniques more sophisticated.
Security is not a feature that is bolted on after deployment. Like governance, it must be built into the agent architecture from the start.
See how Prefactor secures AI agents
Prefactor gives enterprises runtime governance, observability, and control over every AI agent in production.
Book a demo →