Case Study: Customer Service Agent-as-a-Service

One call, one connected story — not three disconnected logs

A voice agent's real lifecycle spans more than the call itself: a conversational layer, a background agent that acts, and a workflow that runs after the call ends.

Built on LiveKit + Ultravox + ElevenLabs — a real design-partner deployment, anonymised.

p95/p99 tracked per span type — latency, cost, and quality together
Risk tracked as a trend, per agent layer, not just per call
Business-logic-level spans capture intent, not just tool calls

TL;DR

Voice agents often split into a low-risk conversational layer and a higher-risk background agent that performs actions — plus a workflow that runs after the call hangs up. Prefactor gives each layer its own risk profile tracked over time, captures intent-level spans instead of raw tool calls, runs your own quality definitions as native evals, and links the whole chain — call, background agent, after-call workflow — back to one conversation.

The problem

The voice agent has two tiers. One talks to the customer: it is read-only, so it can't do much damage. The other acts on the customer's behalf in the background: it mutates data and is gated behind approval. The team defines its own quality metrics: clarifying-question iterations, voice glitches, latency, and hallucination. Today there is no formal weighting between them, and they are cross-referenced by hand against error monitoring and customer feedback about once a week. The call, the background agent, and whatever runs after the call ends get checked as three separate things instead of one connected story.

Agent: built on LiveKit + Ultravox + ElevenLabs

Call starts: conversational layer, read-only, low risk
Background agent acts: mutating actions, gated by approval
After-call workflow runs: async, on a separate service
Risk profile applied

Prefactor: watching every step, in real time

Spans captured per layer: business-logic intent, not raw tool calls
Risk profile diverges: conversational vs. background agent, tracked over time
Quality scored per call: iteration count, TTS glitches, latency, and hallucination, as native evals against your own definitions

How it works in Prefactor

p95 and p99 are tracked per span type for cost and quality as well as latency. A degrading slice of calls therefore shows up as a number before it shows up as a pattern of complaints.
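Here is a minimal sketch of per-span-type tail tracking. It assumes each span arrives as a simple record with a type, a latency, a cost, and a quality proxy. This is not Prefactor's API. `Span`, its fields, and `tail_stats` are hypothetical names, and the percentiles use the nearest-rank method.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical span record; field names are illustrative, not a real schema.
    span_type: str
    latency_ms: float
    cost_usd: float
    iterations: int  # quality proxy where higher is worse, so the upper tail is the bad tail


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def tail_stats(spans: list[Span]) -> dict[str, dict[str, float]]:
    """p95 and p99 per span type, computed for latency, cost, and quality alike."""
    by_type: dict[str, list[Span]] = defaultdict(list)
    for span in spans:
        by_type[span.span_type].append(span)
    stats: dict[str, dict[str, float]] = {}
    for span_type, group in by_type.items():
        row: dict[str, float] = {}
        for metric in ("latency_ms", "cost_usd", "iterations"):
            values = [getattr(s, metric) for s in group]
            row[f"{metric}_p95"] = percentile(values, 95)
            row[f"{metric}_p99"] = percentile(values, 99)
        stats[span_type] = row
    return stats
```

The quality proxy here is a clarifying-question count. Higher counts are worse, so degradation appears in the p95/p99 tail. A score where higher is better would need the low tail (p5/p1) instead.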

Risk is tracked as a trend, per layer. The conversational agent scores low-risk because it is read-only. The background agent scores higher because it mutates data. Both layers are tracked over time, so drift toward riskier behaviour in either one shows up before any individual call looks alarming.
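One simple way to keep risk as a per-layer trend is an exponentially weighted moving average, with each layer judged against its own baseline. The sketch below assumes every call already has a risk score in [0, 1]. `LayerRiskTrend`, the baselines, and the thresholds are hypothetical. They illustrate the idea only and do not describe how Prefactor scores risk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayerRiskTrend:
    """Hypothetical per-layer risk trend kept as an exponentially weighted moving average."""
    baseline: float              # expected risk for this layer, e.g. low for a read-only layer
    alpha: float = 0.2           # smoothing factor: higher values react faster to recent calls
    drift_margin: float = 0.15   # how far above baseline the trend may sit before flagging drift
    ewma: Optional[float] = None

    def observe(self, call_risk: float) -> None:
        """Fold one call's risk score (assumed to lie in [0, 1]) into the trend."""
        if self.ewma is None:
            self.ewma = call_risk
        else:
            self.ewma = self.alpha * call_risk + (1 - self.alpha) * self.ewma

    def drifting(self) -> bool:
        """True once the smoothed trend sits clearly above this layer's baseline."""
        return self.ewma is not None and self.ewma > self.baseline + self.drift_margin


# One trend per layer, each judged against its own baseline.
layers = {
    "conversational": LayerRiskTrend(baseline=0.1),
    "background": LayerRiskTrend(baseline=0.5),
}
```

The background layer starts from a higher baseline. A score that would flag the read-only conversational layer is therefore normal there. A slow creep in either layer still moves its average past the margin.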

Business-logic-level spans capture what the agent was actually trying to do, which matters when the pipeline underneath changes but the intent doesn't.
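A business-logic span can be sketched as a context manager named after the goal. Whatever tool calls happen underneath are recorded as details. For illustration, this assumes an in-memory recorder. `intent_span`, `IntentSpan`, and `RECORDED` are hypothetical and not part of any Prefactor SDK.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class IntentSpan:
    """Hypothetical business-logic span: names the goal, not the mechanism."""
    intent: str
    attributes: dict
    tool_calls: list[str] = field(default_factory=list)
    duration_s: float = 0.0


RECORDED: list[IntentSpan] = []  # stand-in for wherever spans would actually be shipped


@contextmanager
def intent_span(intent: str, **attributes) -> Iterator[IntentSpan]:
    """Open a span named after what the agent is trying to do; tool calls nest under it."""
    span = IntentSpan(intent=intent, attributes=attributes)
    start = time.perf_counter()
    try:
        yield span
    finally:
        span.duration_s = time.perf_counter() - start
        RECORDED.append(span)


# Usage: the intent stays "reschedule_appointment" even if the tools underneath change.
with intent_span("reschedule_appointment", customer_tier="gold") as span:
    span.tool_calls.append("calendar.lookup")
    span.tool_calls.append("calendar.update")
```

Suppose the pipeline later swaps `calendar.update` for a different tool. The span is still grouped under `reschedule_appointment`, so trends keyed on intent stay comparable.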

Your quality definitions — iteration count, TTS glitch rate, latency, hallucination — plug in as native evals: Prefactor captures the run and attaches the result; you bring the scoring logic via API. The same pattern applies to LLM-as-judge.
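The platform captures the run and you supply the scoring. That division of labour can be sketched as a registry of plain functions. The run record, `attach_evals`, and the definition of a clarifying iteration below are all assumptions for illustration. Prefactor's actual eval API may look different.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Turn:
    speaker: str  # "agent" or "customer"
    text: str


@dataclass
class CapturedRun:
    """Hypothetical stand-in for a captured call; a real platform record will differ."""
    run_id: str
    turns: list[Turn]
    eval_results: dict[str, float] = field(default_factory=dict)


# Your scoring logic: each eval maps a captured run to a number.
Evaluator = Callable[[CapturedRun], float]


def clarifying_iterations(run: CapturedRun) -> float:
    """Team-defined metric (assumed definition): agent turns that end in a question."""
    return float(sum(1 for t in run.turns if t.speaker == "agent" and t.text.rstrip().endswith("?")))


def attach_evals(run: CapturedRun, evaluators: dict[str, Evaluator]) -> CapturedRun:
    """Run each evaluator and attach its result to the run, keyed by eval name."""
    for name, evaluate in evaluators.items():
        run.eval_results[name] = evaluate(run)
    return run
```

An LLM-as-judge eval fits the same shape. It is a function that takes the run, asks a model to grade it, and returns a number.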

One phone call, its background agent's actions, and the async after-call workflow — even across separate services — stay connected to the same instance.
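Linking only works if every service tags its data with the same identifier. Here is a minimal sketch. It assumes each event carries a shared instance ID plus a service name and a timestamp. `Event` and `stitch` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    instance_id: str  # shared ID every service attaches (field name is hypothetical)
    service: str      # e.g. "voice", "background-agent", "after-call"
    timestamp: float  # seconds since epoch
    name: str


def stitch(events: list[Event]) -> dict[str, list[Event]]:
    """Group events from any service by instance ID and order each group in time."""
    timelines: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        timelines[event.instance_id].append(event)
    return {iid: sorted(evts, key=lambda e: e.timestamp) for iid, evts in timelines.items()}
```

Ordering by timestamp assumes the services' clocks are roughly in sync. If they are not, ordering by a per-instance sequence number is safer.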

Proactive alerts fire on a real pattern, like a spike in negative feedback, rather than a single thumbs-down.
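A sliding-window rule captures the difference between a pattern and a single thumbs-down. This sketch uses hypothetical thresholds: a window of 50 signals, at least 5 negatives, and a negative rate above 20%. It is not Prefactor's alerting logic.

```python
from collections import deque


class NegativeFeedbackAlert:
    """Hypothetical rule: alert on a spike in negative feedback, not on one thumbs-down."""

    def __init__(self, window: int = 50, min_negatives: int = 5, max_rate: float = 0.2):
        self.recent: deque[bool] = deque(maxlen=window)  # True = negative feedback
        self.min_negatives = min_negatives
        self.max_rate = max_rate

    def record(self, negative: bool) -> bool:
        """Add one feedback signal; return True if the window now looks like a spike."""
        self.recent.append(negative)
        negatives = sum(self.recent)
        rate = negatives / len(self.recent)
        return negatives >= self.min_negatives and rate > self.max_rate
```

The minimum-count condition keeps one early thumbs-down from firing an alert, even though it briefly makes the negative rate 100%.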

Frequently asked questions

Do we need to change our voice pipeline to use this?

No — Prefactor hooks into the agent session at the SDK level, alongside whatever voice stack you're already running.

How is quality actually scored — do we have to accept a generic metric?

No — your own definitions run as native evals or LLM-as-judge calls; Prefactor is the plumbing, you bring the scoring logic.

Can we link a call to something that happens after it ends?

Yes — as long as you reference the same instance ID, data from an after-call workflow, even on a different service, attaches to the original conversation.

Related glossary terms

Conversational Agent, Agent Evaluation, Drift Detection (Agent)

See it on your own agents

Book a demo and we'll walk you through one call on a fleet like yours, told as one connected story rather than three disconnected logs: real frameworks, real traces.
