Case Study: Customer Service Agent-as-a-Service

One call, one connected story — not three disconnected logs

A voice agent's real lifecycle spans more than the call itself: a conversational layer, a background agent that acts, and a workflow that runs after the call ends.

Built on LiveKit + Ultravox + ElevenLabs — a real design-partner deployment, anonymised.

  • p95/p99 tracked per span type — latency, cost, and quality together
  • Risk tracked as a trend, per agent layer, not just per call
  • Business-logic-level spans capture intent, not just tool calls
TL;DR

Voice agents often split into a low-risk conversational layer and a higher-risk background agent that performs actions — plus a workflow that runs after the call hangs up. Prefactor gives each layer its own risk profile tracked over time, captures intent-level spans instead of raw tool calls, runs your own quality definitions as native evals, and links the whole chain — call, background agent, after-call workflow — back to one conversation.

The problem

A voice agent has two tiers: one that talks to the customer (read-only, so it can do little damage) and one that acts on the customer's behalf in the background (mutating, gated behind approval). The team's quality metrics are its own: clarifying-question iterations, voice glitches, latency, and hallucination. Today there is no formal weighting between them, and they are cross-referenced manually against error monitoring and customer feedback about once a week. The call, the background agent, and whatever runs after the call ends get checked as three separate things instead of one connected story.

Agent, built on LiveKit + Ultravox + ElevenLabs
  • Call starts: conversational layer, read-only, low risk
  • Background agent acts: mutating actions, gated by approval
  • After-call workflow runs: async, on a separate service
  • Risk profile applied

Prefactor, watching every step in real time
  • Spans captured per layer: business-logic intent, not raw tool calls
  • Risk profile diverges: conversational vs. background agent, tracked over time
  • Quality scored per call: iteration count, TTS glitches, latency, and hallucination, run as native evals against your own definitions

How it works in Prefactor

p95 and p99 are tracked per span type for cost and quality as well as latency, so a degrading slice of calls shows up as a number before it shows up as a pattern of complaints.
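
To make the per-span-type tail metrics concrete, here is a minimal Python sketch of the arithmetic. It assumes spans arrive as plain dictionaries with a `type` field and numeric metric fields; those field names, the sample data, and the choice of the nearest-rank percentile method are all illustrative assumptions, not Prefactor's schema or implementation.

```python
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the data at or below it. pct is an integer such as 95 or 99."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, to avoid floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def tail_by_span_type(spans, metric):
    """Group spans by type and report p95/p99 of the chosen metric."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span["type"]].append(span[metric])
    return {
        span_type: {"p95": percentile(vals, 95), "p99": percentile(vals, 99)}
        for span_type, vals in grouped.items()
    }


# Hypothetical sample data: 100 TTS spans and 20 tool-call spans.
spans = [{"type": "tts", "latency_ms": float(i), "cost_usd": i / 1000}
         for i in range(1, 101)]
spans += [{"type": "tool_call", "latency_ms": 50.0 * i, "cost_usd": 0.0}
          for i in range(1, 21)]

latency_tails = tail_by_span_type(spans, "latency_ms")
cost_tails = tail_by_span_type(spans, "cost_usd")
```

The same function works for any numeric metric on the span, which is the point of tracking cost and quality tails alongside latency.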

Risk is tracked as a trend, per layer: the conversational agent scores low-risk (read-only); the background agent scores higher (it mutates data) — and both are tracked over time, so drift toward riskier behaviour in either layer shows up before an individual call looks alarming.
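
One way to picture "risk as a trend" is a slope fitted to each layer's periodic risk scores. The sketch below assumes a daily risk score per layer on a 0 to 1 scale and uses an ordinary least-squares slope; the score scale, the layer names, and the drift threshold are assumptions for illustration, not how Prefactor computes risk.

```python
from statistics import mean


def trend_slope(scores):
    """Least-squares slope of scores against their index
    (score units per period). Returns 0.0 for fewer than two points."""
    n = len(scores)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(scores)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(scores))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def drifting_layers(daily_risk, threshold):
    """Names of layers whose risk is rising faster than threshold per day."""
    return sorted(
        layer for layer, scores in daily_risk.items()
        if trend_slope(scores) > threshold
    )


# Hypothetical week of daily risk scores per agent layer.
daily_risk = {
    "conversational": [0.10, 0.11, 0.10, 0.09, 0.10, 0.11, 0.10],
    "background": [0.40, 0.42, 0.45, 0.47, 0.50, 0.53, 0.55],
}

flagged = drifting_layers(daily_risk, threshold=0.01)
```

In this sample no single day in the background layer looks alarming on its own, yet the slope flags it, which is the behaviour the paragraph above describes.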

Business-logic-level spans capture what the agent was actually trying to do, which matters when the pipeline underneath changes but the intent doesn't.
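
As a rough illustration of why intent-level spans survive pipeline changes, the following Python sketch wraps raw tool calls inside a span named for the business intent. The class names, the intent string, and the tool names are hypothetical and are not taken from the Prefactor SDK.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str            # raw tool or API that was invoked
    duration_ms: float


@dataclass
class IntentSpan:
    intent: str          # what the agent was trying to do
    tool_calls: list = field(default_factory=list)

    @property
    def duration_ms(self):
        """Total time spent on the intent, across all underlying tool calls."""
        return sum(call.duration_ms for call in self.tool_calls)


def duration_change(before, after):
    """Compare two spans for the same intent, whatever tools ran underneath."""
    if before.intent != after.intent:
        raise ValueError("spans describe different intents")
    return after.duration_ms - before.duration_ms


# Hypothetical example: the pipeline changed from two tool calls to one,
# but the intent, and therefore the comparison key, stayed the same.
old_pipeline = IntentSpan("reschedule_appointment", [
    ToolCall("crm.lookup", 120.0),
    ToolCall("calendar.update", 300.0),
])
new_pipeline = IntentSpan("reschedule_appointment", [
    ToolCall("booking_api.reschedule", 250.0),
])

delta_ms = duration_change(old_pipeline, new_pipeline)
```

Keying comparisons on the intent rather than on tool names means a before-and-after view stays meaningful when the tools underneath are swapped out.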

Your quality definitions — iteration count, TTS glitch rate, latency, hallucination — plug in as native evals: Prefactor captures the run and attaches the result; you bring the scoring logic via API. The same pattern applies to LLM-as-judge.
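
The "you bring the scoring logic" half of that split might look something like the sketch below. Everything here is an assumption: the call fields, the 800 ms latency threshold, the evaluator name, and the shape of the result record are invented for illustration, and the source does not show the API used to submit results. Hallucination is left out because it would typically need a judge model rather than simple arithmetic.

```python
def score_call(call):
    """Team-defined quality checks for a single call.
    Field names and thresholds are illustrative only."""
    return {
        "clarifying_iterations": call["clarifying_questions"],
        "tts_glitch_rate": call["tts_glitches"] / max(1, call["tts_segments"]),
        "latency_ok": call["response_latency_ms"] <= 800,
    }


def build_eval_result(run_id, call):
    """Pair the scores with the run they belong to. How this record is
    submitted is not shown in the source, so it is omitted here."""
    return {
        "run_id": run_id,
        "evaluator": "team_quality_v1",   # hypothetical evaluator name
        "scores": score_call(call),
    }


# Hypothetical call summary.
call = {
    "clarifying_questions": 2,
    "tts_glitches": 1,
    "tts_segments": 40,
    "response_latency_ms": 650,
}
result = build_eval_result("run-123", call)
```

Keeping the scoring function free of any platform dependency means the same definitions can be reused in offline tests and in production evals.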

One phone call, its background agent's actions, and the async after-call workflow — even across separate services — stay connected to the same instance.
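
A minimal sketch of the idea, assuming each service emits events that carry a shared `instance_id` and a timestamp: grouping on that ID is enough to rebuild one timeline across services. The event fields, service names, and IDs below are hypothetical.

```python
from collections import defaultdict


def stitch_by_instance(events):
    """Group events from any service by instance ID, ordered by timestamp."""
    timelines = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        timelines[event["instance_id"]].append((event["service"], event["name"]))
    return dict(timelines)


# Hypothetical events, arriving out of order from three separate services.
events = [
    {"instance_id": "call-42", "ts": 30, "service": "after_call", "name": "send_summary_email"},
    {"instance_id": "call-42", "ts": 10, "service": "voice", "name": "call_started"},
    {"instance_id": "call-7", "ts": 12, "service": "voice", "name": "call_started"},
    {"instance_id": "call-42", "ts": 20, "service": "background", "name": "update_booking"},
]

timelines = stitch_by_instance(events)
```

The after-call event lands on the same timeline as the call that caused it, even though it was emitted later and by a different service.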

Proactive alerts fire on a real pattern, like a spike in negative feedback, rather than a single thumbs-down.
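
The difference between a pattern and a single thumbs-down can be sketched as a windowed rate comparison. The rule below is an assumed example, not Prefactor's alerting logic: it fires only when the recent window has a minimum number of negatives and a rate at least double the earlier baseline. The window size, minimum count, and ratio are illustrative defaults.

```python
def should_alert(feedback, window=20, min_negatives=5, ratio=2.0):
    """feedback: list of bools, oldest first, True meaning negative feedback.
    Fires only on a sustained spike relative to the earlier baseline."""
    if len(feedback) < 2 * window:
        return False                      # not enough history to compare
    recent = feedback[-window:]
    baseline = feedback[:-window]
    recent_negatives = sum(recent)
    if recent_negatives < min_negatives:
        return False                      # a lone thumbs-down never fires
    recent_rate = recent_negatives / window
    baseline_rate = sum(baseline) / len(baseline)
    # Floor the baseline so a spotless history does not divide the bar to zero.
    return recent_rate >= ratio * max(baseline_rate, 0.01)


# Hypothetical histories: one isolated complaint versus a real spike.
single_thumbs_down = [False] * 59 + [True]
spike = [True, True] + [False] * 38 + [True] * 8 + [False] * 12

quiet_result = should_alert(single_thumbs_down)
spike_result = should_alert(spike)
```

Requiring both a minimum count and a relative increase keeps low-volume noise from paging anyone while still catching a genuine shift.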

Frequently asked questions

Do we need to change our voice pipeline to use this?
No — Prefactor hooks into the agent session at the SDK level, alongside whatever voice stack you're already running.
How is quality actually scored — do we have to accept a generic metric?
No — your own definitions run as native evals or LLM-as-judge calls; Prefactor is the plumbing, you bring the scoring logic.
Can we link a call to something that happens after it ends?
Yes — as long as you reference the same instance ID, data from an after-call workflow, even on a different service, attaches to the original conversation.

See it on your own agents

Book a demo and we'll walk through how one call becomes one connected story, not three disconnected logs, on a fleet like yours: real frameworks, real traces.

See how every agent performs — and make it better

Prefactor helps teams observe, evaluate, and improve their AI agents in production — across every framework and provider.