Case Study: Banking / Financial services

Internal agents don't need the same bar as customer-facing ones — until they do

The hard part isn't picking one risk bar for every agent — it's catching a risky pattern before it becomes an incident, and stopping only the run that needs to stop.

Built on LangChain — a real design-partner deployment, anonymised.

  • SDK-level integration — real-time visibility, not after-the-fact logs
  • Different risk profiles for internal vs. customer-facing agents, one platform
  • Risk tracked per span type over time, not just per run
TL;DR

Internal agents typically move fast with a low risk bar; customer-facing agents go through independent, regulator-driven audits. Prefactor lets both live in one platform with separate risk profiles tracked over time, runs your own golden-path comparisons as native evals, and can kill a single run — natively or via a custom span — without touching the rest of the agent's traffic.

The problem

A platform team builds internal agents to support operations — not customer-facing, lower risk, less sensitive data — so they ship fast with minimal eval. Customer-facing agents at the same organisation go through a completely different process: independent audit, framework-agnostic, judged purely on input/output against expected outcomes. Quality checks today are mostly manual: a hand-built "golden path" dataset of expected question/answer pairs, checked against a local pipeline because full automated evaluation on every commit is too expensive at scale. The hard problem: recognising where an agentic system actually breaks — a routing failure to the wrong sub-agent, for instance — before it becomes an incident, without over-engineering guardrails that introduce their own new problems.

Diagram: how runs flow from the agent through Prefactor

Agent (built on LangChain)
  • Internal agent runs: low risk bar, ships fast
  • Customer-facing agent runs: independent, regulator-driven audit
  • One risky run stops: not the whole agent, not the other traffic

Prefactor (watching every step, in real time)
  • Both tracked in one platform: separate risk profiles, side by side
  • Risk trend flags the pattern: before a single run crosses a threshold
  • Golden path scores it: your expected question/answer pairs run as a native eval, with no expensive per-commit rebuild

How it works in Prefactor

SDK-level integration means real-time visibility and the ability to intervene, not just a record of what already happened.

Internal and customer-facing agents don't need the same bar — each carries its own risk profile instead of forcing every team onto the same tool or threshold.
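As a rough illustration of what "separate risk profiles" can mean in practice, the sketch below keeps one set of thresholds per agent tier and classifies a run's risk score against the profile for its tier. This is not Prefactor's SDK or configuration format: `RiskProfile`, `PROFILES`, `classify`, and every threshold value are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskProfile:
    """Hypothetical per-tier settings; field names and values are illustrative."""
    name: str
    alert_threshold: float  # risk score (0 to 1) at which a run is flagged
    kill_threshold: float   # risk score (0 to 1) at which a run is stopped
    requires_audit: bool


# Assumed example values: the internal tier tolerates more risk than the
# customer-facing tier, which is also subject to independent audit.
PROFILES = {
    "internal": RiskProfile("internal", 0.7, 0.95, requires_audit=False),
    "customer_facing": RiskProfile("customer_facing", 0.4, 0.7, requires_audit=True),
}


def classify(tier: str, risk_score: float) -> str:
    """Return 'ok', 'flag', or 'stop' for a risk score under the tier's profile."""
    profile = PROFILES[tier]
    if risk_score >= profile.kill_threshold:
        return "stop"
    if risk_score >= profile.alert_threshold:
        return "flag"
    return "ok"
```

With these assumed values, the same risk score of 0.5 is acceptable for an internal agent but flagged for a customer-facing one, which is the behaviour the paragraph above describes.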

Risk is tracked as a trend per span type: a routing pattern that's quietly getting riskier over time shows up before any single run crosses a threshold on its own.
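One simple way to picture trend-based flagging is a least-squares slope over the recent risk scores for each span type. The sketch below is an assumption about how such a check could be built, not a description of Prefactor's scoring: the function names, the `(span_type, risk_score)` input shape, and both thresholds are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def rising_span_types(observations, run_threshold=0.8, min_slope=0.02):
    """Find span types whose risk is trending up before any run is over the bar.

    observations: iterable of (span_type, risk_score) pairs in time order.
    Returns a sorted list of span types where every score is still below
    run_threshold but the slope of the scores is at least min_slope.
    """
    by_type = defaultdict(list)
    for span_type, score in observations:
        by_type[span_type].append(score)
    return sorted(
        span_type
        for span_type, scores in by_type.items()
        if max(scores) < run_threshold and slope(scores) >= min_slope
    )
```

In this sketch a routing span whose scores drift from 0.30 to 0.50 would be reported even though no individual run comes close to a 0.8 per-run threshold.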

The same expected-question/expected-answer comparisons a team already maintains run as a native eval — Prefactor captures and attaches the result; the comparison logic comes in via API.
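A golden-path comparison of this kind can be sketched as a small function that runs each expected question through the agent and scores the answer with team-supplied comparison logic. Everything below is a hypothetical illustration: `run_golden_path`, `exact_match`, and the result fields are made-up names, the agent is any callable from question to answer, and how results are submitted to Prefactor's API is not shown because the source does not specify it.

```python
from typing import Callable, Iterable, Tuple


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not fail a pair."""
    return " ".join(text.lower().split())


def exact_match(expected: str, actual: str) -> bool:
    """Example comparison logic; a team would plug in its own here."""
    return normalise(expected) == normalise(actual)


def run_golden_path(
    golden_pairs: Iterable[Tuple[str, str]],
    agent: Callable[[str], str],
    compare: Callable[[str, str], bool] = exact_match,
):
    """Score an agent against expected question/answer pairs.

    Returns (pass_rate, results), where results holds one dict per pair.
    """
    results = []
    for question, expected in golden_pairs:
        actual = agent(question)
        results.append({
            "question": question,
            "expected": expected,
            "actual": actual,
            "passed": compare(expected, actual),
        })
    pass_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return pass_rate, results
```

Keeping `compare` as a parameter mirrors the split described above: the dataset and the comparison logic stay with the team, and only the per-pair result needs to be captured and attached to the run.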

A single risky run can be killed — triggered natively from the dashboard or programmatically via a custom span the moment your own code detects a problem — without shutting down the whole agent or every other conversation it's handling.
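The run-scoped behaviour can be pictured as a stop flag keyed by run ID that the agent loop checks before each step. The sketch below is a generic, in-process illustration under that assumption; `RunKillSwitch`, `RunStopped`, and `execute_run` are hypothetical names and do not represent Prefactor's custom-span API or the LangChain runtime.

```python
import threading


class RunStopped(Exception):
    """Raised when a run is halted by the kill switch."""


class RunKillSwitch:
    """Tracks stop requests per run ID so one run can stop without affecting others."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stopped = {}  # run_id -> reason

    def kill(self, run_id, reason):
        """Request a stop for one run; the first reason recorded is kept."""
        with self._lock:
            self._stopped.setdefault(run_id, reason)

    def reason_for(self, run_id):
        """Return the stop reason for a run, or None if it may continue."""
        with self._lock:
            return self._stopped.get(run_id)


def execute_run(run_id, steps, switch):
    """Run steps in order, checking the switch before each one."""
    completed = []
    for step in steps:
        reason = switch.reason_for(run_id)
        if reason is not None:
            raise RunStopped(f"run {run_id} stopped: {reason}")
        completed.append(step())
    return completed
```

A step that detects a problem, such as a request routed to the wrong sub-agent, would call `switch.kill(run_id, reason)`. The next check in that run raises `RunStopped`, while runs with other IDs proceed untouched.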

Frequently asked questions

Do internal and customer-facing agents need separate tools?
No — that's the point. Both run through Prefactor with different risk profiles, so you're not maintaining a different stack per team or per risk tier.
Can we stop one bad run without killing the agent entirely?
Yes — the kill switch scopes to a single run, not just an agent or a team, and can be triggered either manually or from your own code via a custom span.
Do we have to replace our existing golden-path process?
No — it plugs in as the comparison logic behind a native eval; you're not replacing your process, you're running it inside Prefactor instead of as a separate manual pipeline.

See it on your own agents

Book a demo and we'll walk through this scenario on a fleet like yours, with real frameworks and real traces.

See how every agent performs — and make it better

Prefactor helps teams observe, evaluate, and improve their AI agents in production — across every framework and provider.