Internal agents don't need the same bar as customer-facing ones — until they do
The hard part isn't picking one risk bar for every agent — it's catching a risky pattern before it becomes an incident, and stopping only the run that needs to stop.
Built on LangChain — a real design-partner deployment, anonymised.
- SDK-level integration — real-time visibility, not after-the-fact logs
- Different risk profiles for internal vs. customer-facing agents, one platform
- Risk tracked per span type over time, not just per run
Internal agents typically move fast with a low risk bar; customer-facing agents go through independent, regulator-driven audits. Prefactor lets both live in one platform with separate risk profiles tracked over time, runs your own golden-path comparisons as native evals, and can kill a single run — natively or via a custom span — without touching the rest of the agent's traffic.
The problem
A platform team builds internal agents to support operations — not customer-facing, lower risk, less sensitive data — so they ship fast with minimal eval. Customer-facing agents at the same organisation go through a completely different process: independent audit, framework-agnostic, judged purely on input/output against expected outcomes. Quality checks today are mostly manual: a hand-built "golden path" dataset of expected question/answer pairs, checked against a local pipeline because full automated evaluation on every commit is too expensive at scale. The hard problem: recognising where an agentic system actually breaks — a routing failure to the wrong sub-agent, for instance — before it becomes an incident, without over-engineering guardrails that introduce their own new problems.
How it works in Prefactor
SDK-level integration means real-time visibility and the ability to intervene, not just a record of what already happened.
Internal and customer-facing agents don't need the same bar — each carries its own risk profile instead of forcing every team onto the same tool or threshold.
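As a rough illustration of separate risk bars on one platform, the sketch below keeps one profile per agent class and checks the same risk score against each. It is plain Python, not Prefactor's SDK: `RiskProfile`, the agent names, the 0.0 to 1.0 risk scale, and the threshold values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskProfile:
    """Hypothetical per-class risk settings; field names are illustrative."""
    name: str
    max_run_risk: float      # assumed 0.0-1.0 scale; above this, intervene
    independent_audit: bool  # whether releases go through an external audit


PROFILES = {
    "internal": RiskProfile("internal", max_run_risk=0.8, independent_audit=False),
    "customer_facing": RiskProfile("customer_facing", max_run_risk=0.5, independent_audit=True),
}

# Hypothetical agents mapped to the profile they carry.
AGENT_PROFILE = {"ops-helper": "internal", "support-bot": "customer_facing"}


def should_intervene(agent_id: str, run_risk: float) -> bool:
    """Compare a run's risk score against the bar for that agent's profile."""
    profile = PROFILES[AGENT_PROFILE[agent_id]]
    return run_risk > profile.max_run_risk
```

Under these assumed thresholds, a score of 0.6 passes for the internal `ops-helper` agent and triggers intervention for the customer-facing `support-bot`.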
Risk is tracked as a trend per span type: a routing pattern that's quietly getting riskier over time shows up before any single run crosses a threshold on its own.
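To make the idea of a per-span-type trend concrete, here is a minimal sketch that fits a least-squares slope to each span type's risk scores and flags the ones drifting upward, even when no single score crosses the per-run threshold. How Prefactor actually computes trends is not described here; the scoring scale, both thresholds, and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Illustrative assumptions: risk scores on a 0.0-1.0 scale, in chronological order.
RUN_THRESHOLD = 0.8     # a single run above this would alert on its own
TREND_THRESHOLD = 0.02  # average risk increase per run that counts as drifting


def slope(scores):
    """Least-squares slope of scores against their position in the sequence."""
    n = len(scores)
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(scores))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def drifting_span_types(observations, min_points=3):
    """Return {span_type: slope} for span types whose risk is trending upward.

    observations: iterable of (span_type, risk_score) pairs, oldest first.
    """
    by_type = defaultdict(list)
    for span_type, score in observations:
        by_type[span_type].append(score)

    flagged = {}
    for span_type, scores in by_type.items():
        if len(scores) < min_points:
            continue  # too few points to call it a trend
        s = slope(scores)
        if s > TREND_THRESHOLD:
            flagged[span_type] = s
    return flagged
```

For example, routing spans scoring 0.30, 0.35, 0.40, 0.45, 0.50 never reach `RUN_THRESHOLD`, but their slope of 0.05 per run exceeds `TREND_THRESHOLD`, so `routing` is flagged while a flat span type is not.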
The same expected-question/expected-answer comparisons a team already maintains run as a native eval — Prefactor captures and attaches the result; the comparison logic comes in via API.
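A golden-path comparison of this kind can be sketched in a few lines. The example below is an assumption-laden illustration, not Prefactor's API: `GoldenCase`, `run_golden_path`, and the normalised exact-match comparator are hypothetical names, and the step that submits results to the platform is left out because its interface is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class GoldenCase:
    """One expected question/answer pair from a hand-built golden-path set."""
    question: str
    expected: str


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't fail a case."""
    return " ".join(text.lower().split())


def exact_match(expected: str, actual: str) -> bool:
    return normalise(expected) == normalise(actual)


def run_golden_path(
    cases: Iterable[GoldenCase],
    agent: Callable[[str], str],
    compare: Callable[[str, str], bool] = exact_match,
) -> list:
    """Ask the agent each golden question and record whether the answer matched."""
    results = []
    for case in cases:
        actual = agent(case.question)
        results.append({
            "question": case.question,
            "expected": case.expected,
            "actual": actual,
            "passed": compare(case.expected, actual),
        })
    return results


def pass_rate(results: list) -> float:
    """Fraction of cases that passed; 0.0 for an empty result set."""
    if not results:
        return 0.0
    return sum(r["passed"] for r in results) / len(results)
```

The comparator is passed in as a parameter so a team can swap exact matching for whatever logic its existing pipeline already uses.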
A single risky run can be killed — triggered natively from the dashboard or programmatically via a custom span the moment your own code detects a problem — without shutting down the whole agent or every other conversation it's handling.
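The run-level kill described above can be pictured as a cancellation flag keyed by run ID that the agent loop checks between steps. This is a generic sketch of that pattern in plain Python, not Prefactor's implementation or SDK; `RunRegistry`, `execute_run`, and the step functions are hypothetical.

```python
import threading


class RunRegistry:
    """Tracks which run IDs have been asked to stop (thread-safe)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._killed = set()

    def kill(self, run_id):
        # Could be called from a dashboard handler or from your own detection code.
        with self._lock:
            self._killed.add(run_id)

    def is_killed(self, run_id):
        with self._lock:
            return run_id in self._killed


def execute_run(run_id, steps, registry):
    """Run steps in order, stopping before the next step if this run was killed."""
    completed = []
    for step in steps:
        if registry.is_killed(run_id):
            return {"run_id": run_id, "status": "killed", "completed": completed}
        completed.append(step())
    return {"run_id": run_id, "status": "completed", "completed": completed}
```

Because the flag is scoped to one run ID, killing `run-a` has no effect on any other run the same agent is serving. In this sketch the check happens only between steps, so a step already in progress finishes before the run stops.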
Frequently asked questions
Do internal and customer-facing agents need separate tools?
Can we stop one bad run without killing the agent entirely?
Do we have to replace our existing golden-path process?
See it on your own agents
Book a demo and we'll walk through separate risk bars for internal and customer-facing agents on a fleet like yours, with real frameworks and real traces.