Case Study: Banking / Financial services

Internal agents don't need the same bar as customer-facing ones — until they do

The hard part isn't picking one risk bar for every agent — it's catching a risky pattern before it becomes an incident, and stopping only the run that needs to stop.


Built on LangChain — a real design-partner deployment, anonymised.


SDK-level integration — real-time visibility, not after-the-fact logs
Different risk profiles for internal vs. customer-facing agents, one platform
Risk tracked per span type over time, not just per run

TL;DR

Internal agents typically move fast with a low risk bar; customer-facing agents go through independent, regulator-driven audits. Prefactor lets both live in one platform with separate risk profiles tracked over time, runs your own golden-path comparisons as native evals, and can kill a single run — natively or via a custom span — without touching the rest of the agent's traffic.

The problem

A platform team builds internal agents to support operations — not customer-facing, lower risk, less sensitive data — so they ship fast with minimal eval. Customer-facing agents at the same organisation go through a completely different process: independent audit, framework-agnostic, judged purely on input/output against expected outcomes. Quality checks today are mostly manual: a hand-built "golden path" dataset of expected question/answer pairs, checked against a local pipeline because full automated evaluation on every commit is too expensive at scale. The hard problem: recognising where an agentic system actually breaks — a routing failure to the wrong sub-agent, for instance — before it becomes an incident, without over-engineering guardrails that introduce their own new problems.

Agent (built on LangChain)

Internal agent runs: low risk bar, ships fast

Customer-facing agent runs: independent, regulator-driven audit

One risky run stops: not the whole agent, not the other traffic

Prefactor (watching every step, in real time)

Both tracked in one platform: separate risk profiles, side by side

Risk trend flags the pattern: before a single run crosses a threshold

Golden path scores it: your expected question/answer pairs run as a native eval, with no expensive per-commit rebuild

How it works in Prefactor

SDK-level integration means real-time visibility and the ability to intervene, not just a record of what already happened.

Internal and customer-facing agents don't need the same bar — each carries its own risk profile instead of forcing every team onto the same tool or threshold.

Risk is tracked as a trend per span type: a routing pattern that's quietly getting riskier over time shows up before any single run crosses a threshold on its own.
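To make the trend idea concrete, here is a minimal Python sketch. It is an illustration only, not Prefactor's implementation or API. It assumes each run reports a risk score between 0 and 1 per span type, and it flags a span type whose recent scores slope upward even though no single score has crossed the per-run threshold. The function names, the window size, and both threshold values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

RUN_THRESHOLD = 0.8     # hypothetical per-run alert level
SLOPE_THRESHOLD = 0.03  # hypothetical rise per run that counts as a trend


def risk_slope(scores):
    """Least-squares slope of scores against their position in the sequence."""
    n = len(scores)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(scores)
    num = sum((i - x_mean) * (s - y_mean) for i, s in enumerate(scores))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def flag_rising_span_types(spans, window=5):
    """Return span types trending upward while every recent run stays under threshold.

    `spans` is an iterable of (span_type, risk_score) tuples in run order.
    """
    by_type = defaultdict(list)
    for span_type, score in spans:
        by_type[span_type].append(score)

    flagged = []
    for span_type, scores in by_type.items():
        recent = scores[-window:]
        under_threshold = max(recent) < RUN_THRESHOLD
        if under_threshold and risk_slope(recent) > SLOPE_THRESHOLD:
            flagged.append(span_type)
    return sorted(flagged)


# Made-up scores: routing creeps upward, tool calls stay flat.
history = [
    ("routing", 0.20), ("tool_call", 0.30),
    ("routing", 0.28), ("tool_call", 0.29),
    ("routing", 0.35), ("tool_call", 0.31),
    ("routing", 0.44), ("tool_call", 0.30),
    ("routing", 0.52), ("tool_call", 0.30),
]
rising = flag_rising_span_types(history)
print(rising)
```

In this made-up history the highest routing score is 0.52, well under the per-run threshold, yet the routing span type is flagged because its scores rise steadily across the window.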

The same expected-question/expected-answer comparisons a team already maintains run as a native eval — Prefactor captures and attaches the result; the comparison logic comes in via API.
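As a rough picture of what team-supplied comparison logic could look like, the Python sketch below scores an agent against a golden path. It assumes the golden path is a list of question/expected-answer pairs and that a normalised exact match is a good enough comparison. Every name here (`GoldenCase`, `score_golden_path`, `fake_agent`) and all sample data are hypothetical. How the result is submitted to Prefactor is not shown, because the source does not describe that API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    question: str
    expected_answer: str


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences pass."""
    return " ".join(text.lower().split())


def score_golden_path(cases, agent: Callable[[str], str]) -> dict:
    """Run each golden-path question through `agent` and compare the answers."""
    failures = []
    for case in cases:
        actual = agent(case.question)
        if normalise(actual) != normalise(case.expected_answer):
            failures.append({
                "question": case.question,
                "expected": case.expected_answer,
                "actual": actual,
            })
    total = len(cases)
    passed = total - len(failures)
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
        "failures": failures,
    }


def fake_agent(question: str) -> str:
    """Stand-in for the real agent call; the answers are made up."""
    canned = {
        "What is the daily transfer limit?": "The daily transfer limit is  10,000 GBP.",
        "Which team handles chargebacks?": "Payments Operations",
    }
    return canned.get(question, "I don't know.")


golden_path = [
    GoldenCase("What is the daily transfer limit?", "the daily transfer limit is 10,000 GBP."),
    GoldenCase("Which team handles chargebacks?", "Disputes"),
]
result = score_golden_path(golden_path, fake_agent)
print(result["passed"], "of", result["total"], "passed")
```

Exact match is the simplest possible comparison. A team with free-form answers would likely swap `normalise` and the equality check for its own similarity measure while keeping the same result shape.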

A single risky run can be killed — triggered natively from the dashboard or programmatically via a custom span the moment your own code detects a problem — without shutting down the whole agent or every other conversation it's handling.
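The Python sketch below illustrates why scoping the kill to a run ID leaves other traffic untouched. It is an in-memory toy, not the Prefactor SDK: `RunRegistry`, `execute_run`, and the step names are hypothetical, and the risk check stands in for whatever detection your own code performs.

```python
import threading


class RunKilled(Exception):
    """Raised inside a run that has been told to stop."""


class RunRegistry:
    """Tracks kill requests per run ID (hypothetical, in-memory)."""

    def __init__(self):
        self._killed = set()
        self._lock = threading.Lock()

    def kill(self, run_id: str) -> None:
        with self._lock:
            self._killed.add(run_id)

    def check(self, run_id: str) -> None:
        with self._lock:
            if run_id in self._killed:
                raise RunKilled(run_id)


def execute_run(run_id, steps, registry, looks_risky):
    """Execute steps in order, stopping only this run if it is killed."""
    completed = []
    try:
        for step in steps:
            if looks_risky(step):
                registry.kill(run_id)  # programmatic trigger from our own code
            registry.check(run_id)     # also catches kills requested elsewhere
            completed.append(step)
    except RunKilled:
        return {"run_id": run_id, "status": "killed", "completed": completed}
    return {"run_id": run_id, "status": "finished", "completed": completed}


def is_risky(step: str) -> bool:
    """Toy detector: treat a route to the wrong sub-agent as risky."""
    return step == "route:wrong_sub_agent"


registry = RunRegistry()
run_a = execute_run("run-a", ["prompt", "route:wrong_sub_agent", "tool_call"], registry, is_risky)
run_b = execute_run("run-b", ["prompt", "route:billing", "tool_call"], registry, is_risky)
print(run_a["status"], run_b["status"])
```

Because the kill request is keyed by run ID, `run-a` stops at the bad routing step while `run-b`, served by the same agent code and the same registry, runs to completion.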

Frequently asked questions

Do internal and customer-facing agents need separate tools?

No — that's the point. Both run through Prefactor with different risk profiles, so you're not maintaining a different stack per team or per risk tier.

Can we stop one bad run without killing the agent entirely?

Yes — the kill switch scopes to a single run, not just an agent or a team, and can be triggered either manually or from your own code via a custom span.

Do we have to replace our existing golden-path process?

No — it plugs in as the comparison logic behind a native eval; you're not replacing your process, you're running it inside Prefactor instead of as a separate manual pipeline.

Related glossary terms

Agent Evaluation, Risk Scoring, Drift Detection (Agent), Kill Switch

See it on your own agents

Book a demo and we'll walk through this case study on a fleet like yours, with real frameworks and real traces.
