Sensitive data inadvertently exposed through agent outputs, tool calls, or stored traces.
A practical guide to data leakage from agents: what it is, what causes it, how to stop it before it causes harm, and how to catch it when prevention fails.
What it actually looks like in production
- Agent included customer PII in a tool call to a third-party service not covered by a business associate agreement (BAA)
- Agent output to user contained another user's data due to context contamination
- PII captured in production traces was readable by unauthorized engineers
Why it happens
- No PII detection on agent inputs/outputs
- Context contamination between users
- Tool arguments not screened for sensitive data
- Trace access controls too permissive
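Context contamination in particular tends to come from conversation state that is shared, or keyed too loosely, across requests. The sketch below is a minimal illustration, not a description of any specific product: the `ContextStore` class and its method names are hypothetical, and a real deployment would also need persistence, expiry, and authorization checks. It shows the core idea of keying context by tenant and user so one user's history cannot be read while serving another.

```python
from collections import defaultdict


class ContextStore:
    """Hypothetical store that keeps context in one bucket per (tenant, user)."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def append(self, tenant_id, user_id, message):
        # Writes always go to the caller's own bucket.
        self._buckets[(tenant_id, user_id)].append(message)

    def context_for(self, tenant_id, user_id):
        # Return a copy so callers cannot mutate stored context by accident.
        return list(self._buckets.get((tenant_id, user_id), []))


store = ContextStore()
store.append("acme", "alice", "My account number is 42.")
alice_view = store.context_for("acme", "alice")
bob_view = store.context_for("acme", "bob")  # empty: Bob never sees Alice's messages
```

The important design choice is that there is no code path that builds a prompt from an unkeyed, global list of messages.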
How to prevent it (vendor-neutral)
1. PII detection on inputs, outputs, and tool args
2. Per-user/per-tenant context isolation
3. Tool destination allowlists for sensitive data
4. Trace access controls and field-level redaction
5. Customer-managed encryption for traces
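Items 1, 3, and 4 can be sketched together as a single screening step in front of tool calls and trace writes. This is a minimal sketch under stated assumptions: the two regular expressions are illustrative only (production detectors cover many more data types and formats), and the names `PII_PATTERNS`, `SENSITIVE_DATA_ALLOWLIST`, `find_pii`, `redact`, and `screen_tool_call`, as well as the example hostnames, are hypothetical.

```python
import re
from urllib.parse import urlparse

# Illustrative patterns only; real detectors cover far more types and formats.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical allowlist of hosts approved to receive sensitive data.
SENSITIVE_DATA_ALLOWLIST = {"records.internal.example"}


def find_pii(text):
    """Return the sorted names of the PII patterns that match the text."""
    return sorted(name for name, pat in PII_PATTERNS.items() if pat.search(text))


def redact(text):
    """Replace each match with a typed placeholder, e.g. before writing a trace."""
    for name, pat in PII_PATTERNS.items():
        text = pat.sub(f"[REDACTED:{name}]", text)
    return text


def screen_tool_call(url, args):
    """Return (allowed, findings) for a tool call.

    A call is blocked only when its arguments contain PII and the
    destination host is not on the sensitive-data allowlist.
    """
    host = urlparse(url).hostname or ""
    findings = sorted({kind for value in args.values() for kind in find_pii(str(value))})
    allowed = not findings or host in SENSITIVE_DATA_ALLOWLIST
    return allowed, findings


blocked = screen_tool_call("https://api.thirdparty.example/v1", {"note": "jane@example.com"})
permitted = screen_tool_call("https://records.internal.example/x", {"note": "jane@example.com"})
```

Here `blocked` carries `allowed=False` because an email address is headed to a host outside the allowlist, while `permitted` passes because the destination is approved. The same `redact` function can be applied to trace fields before they are stored.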
How Prefactor helps detect and prevent it
Prefactor sits at the agent runtime and contributes specifically:
- Runtime guardrails that flag or block matching patterns before they land
- Continuous eval suites that catch quality regressions on every change
- Tamper-evident logs of every incident and response action
- Per-agent anomaly alerts on the signals listed below
Detection — what to monitor
- PII detector alerts in production
- Cross-tenant references in traces
- Tool calls to non-allowlisted destinations
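As one way to approach the second signal, cross-tenant references can be checked by scanning each trace span for identifiers that belong to a different tenant than the one the span was recorded for. The sketch below assumes a simplified trace shape (a list of dicts with `tenant_id` and `text` keys) and a lookup of known identifiers per tenant; both structures, and the function name `cross_tenant_references`, are hypothetical.

```python
def cross_tenant_references(spans, tenant_identifiers):
    """Return (span_index, owning_tenant, identifier) for each foreign identifier found.

    spans: list of dicts with "tenant_id" and "text" keys (assumed trace shape).
    tenant_identifiers: dict mapping tenant ID to a list of strings that belong
    only to that tenant, such as account numbers or customer IDs.
    """
    hits = []
    for index, span in enumerate(spans):
        for tenant, identifiers in tenant_identifiers.items():
            if tenant == span["tenant_id"]:
                continue  # a tenant's own identifiers are expected
            for identifier in identifiers:
                if identifier in span["text"]:
                    hits.append((index, tenant, identifier))
    return hits


identifiers = {"acme": ["ACME-0042"], "globex": ["GLX-7781"]}
trace = [
    {"tenant_id": "acme", "text": "Looked up order for ACME-0042."},
    {"tenant_id": "acme", "text": "Customer GLX-7781 has an overdue invoice."},
]
alerts = cross_tenant_references(trace, identifiers)
```

In this example only the second span is flagged, because it was recorded for `acme` but mentions an identifier owned by `globex`. Plain substring matching is a simplification; it only catches identifiers that are known in advance and distinctive enough not to collide.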
Response — what to do when it happens
Immediate (minutes): confirm the incident from the trace; pause the affected agent if active harm is possible; hotfix the trigger.
Short-term (hours): add the failure case to the eval suite; patch the root cause; redeploy with regression validation.
Medium-term (days): root cause analysis; tighten guardrails or controls; document the incident for post-mortem and audit.
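The short-term step of adding the failure case to the eval suite can be sketched as a small leak-regression check. This is an illustration under assumptions, not a prescribed format: `LeakCase`, `failing_cases`, the case ID, and the stub agents are all hypothetical, and a real agent would be invoked through whatever interface the deployment exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeakCase:
    case_id: str      # hypothetical incident reference
    prompt: str       # input that triggered the leak
    forbidden: tuple  # strings that must never appear in the output


def failing_cases(agent, cases):
    """Run each case through the agent and return the IDs whose output leaks."""
    failures = []
    for case in cases:
        output = agent(case.prompt)
        if any(secret in output for secret in case.forbidden):
            failures.append(case.case_id)
    return failures


CASES = [
    LeakCase("incident-001", "Summarize my last order.", ("GLX-7781",)),
]


def leaky_agent(prompt):
    # Stub that reproduces the original failure.
    return "Your order, and also GLX-7781's invoice."


def patched_agent(prompt):
    # Stub that represents the fixed behavior.
    return "Your order shipped on Tuesday."


before_fix = failing_cases(leaky_agent, CASES)
after_fix = failing_cases(patched_agent, CASES)
```

Running the same cases before and after the patch gives the regression validation mentioned above: the case should fail against the leaky behavior and pass once the root cause is fixed.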
FAQ
Can data leakage from agents be eliminated entirely? Usually not. The realistic goal is to reduce frequency and severity dramatically and to contain the blast radius: aim for low, detected, and contained.
How often should we test for this? Continuously, with every change. Every reported incident becomes a test case.
Can Prefactor detect this in real time? Yes for many variants — guardrails run in-line with sub-second latency.