Real-Time Tracing
Every model call, tool call, and custom span — as it happens, not reconstructed afterward.
Prefactor captures every agent run in real time as an instance made of structured spans: one per LLM call, tool call, or custom operation you define for your own domain, each recorded with its inputs, outputs, duration, and outcome. You see the conversation as it unfolds, not a log you piece together after the fact. Spans are also the raw material every other Prefactor capability reads, from cost to drift.
What gets captured, in real time
Every model call and tool call an agent makes becomes a span the moment it happens — not batched, not sampled after the fact. On top of that, you can define custom spans for whatever matters in your own domain: a retrieval step, a validation check, a specific tool your agent calls that's worth tracking on its own. Custom spans are what let quality assessment measure drift on the parts of a run that are actually specific to your product, not just generic model-call metrics.
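To make the shape of a span concrete, here is a minimal sketch in plain Python. It is not the Prefactor SDK: the `Span` and `Instance` classes, their field names, and the `span()` helper are all hypothetical, chosen only to mirror the fields described above (inputs, outputs, duration, outcome) and to show how a custom span such as a retrieval step might wrap your own code.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class Span:
    """Hypothetical span record; field names are illustrative, not Prefactor's schema."""
    kind: str                      # "llm", "tool", or a custom type such as "retrieval"
    name: str
    inputs: Dict[str, Any]
    outputs: Dict[str, Any] = field(default_factory=dict)
    duration_ms: float = 0.0
    outcome: str = "pending"       # becomes "ok" or "error" when the span closes


@dataclass
class Instance:
    """One agent run, held as an ordered list of spans (hypothetical)."""
    agent: str
    spans: List[Span] = field(default_factory=list)

    @contextmanager
    def span(self, kind: str, name: str, **inputs: Any) -> Iterator[Span]:
        s = Span(kind=kind, name=name, inputs=inputs)
        self.spans.append(s)       # recorded as soon as the operation starts
        start = time.perf_counter()
        try:
            yield s
            s.outcome = "ok"
        except Exception:
            s.outcome = "error"
            raise
        finally:
            s.duration_ms = (time.perf_counter() - start) * 1000.0


# A custom span around a domain-specific retrieval step.
run = Instance(agent="support-bot")
with run.span("retrieval", "search_docs", query="refund policy") as s:
    s.outputs["doc_ids"] = ["kb-12", "kb-40"]
```

The span is appended before the wrapped work runs, which is one simple way to model "captured as it happens": a viewer reading `run.spans` would see the step while it is still pending.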
Getting spans flowing takes a CLI install or an SDK integration — the TypeScript or Python core SDK, or native support for the frameworks Prefactor integrates with — and most teams are sending real-time data within the same session.
Watching a run as it happens
The conversational view shows a run unfolding live: what the agent said, which tools it called and with what arguments, and what came back. You watch it the way you'd watch a conversation, not the way you'd reconstruct one from a log file. Alongside it, p95 and p99 are tracked per agent and per span type, for latency and also for cost, risk, and quality metrics. A slow tool call, a costlier-than-usual run, or a quietly worsening tail of low-quality responses each shows up as a number, not a vague sense that things feel off today.
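As a rough illustration of what per-agent, per-span-type tail tracking means, the sketch below groups span durations and computes p95 and p99 with the nearest-rank method. The tuple layout and function names are assumptions made for this example, and Prefactor may compute its percentiles differently.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tail_latency(
    spans: Iterable[Tuple[str, str, float]],
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Group (agent, span_type, duration_ms) rows and report p95/p99 per group."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for agent, span_type, duration_ms in spans:
        groups[(agent, span_type)].append(duration_ms)
    return {
        key: {"p95": percentile(vals, 95), "p99": percentile(vals, 99)}
        for key, vals in groups.items()
    }


# 100 tool-call spans with durations 1..100 ms for a single agent.
samples = [("support-bot", "tool", float(ms)) for ms in range(1, 101)]
stats = tail_latency(samples)
print(stats[("support-bot", "tool")])  # {'p95': 95.0, 'p99': 99.0}
```

The same grouping works for cost or a quality score: swap the third field for the metric you care about.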
This is the real-time half of the platform: Observe isn't a dashboard you check at the end of the day; it's what's happening in production right now.
Comparing versions and environments
Every span carries the agent's version and environment, from the registry. That's what makes comparison possible: the same task's spans from last week's version against this week's, or staging against production, line up against each other directly. Drift shows up as a difference between those spans — a tool call that used to take 200ms now taking 900ms, a custom span's output shape changing after a prompt update — before it shows up as a quality score dropping or a support ticket arriving.
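The comparison described above can be sketched as a small function that lines up spans by name across two versions and flags the ones whose median duration grew past a threshold. The dictionary keys (`version`, `name`, `duration_ms`) and the 2x threshold are assumptions for illustration, not Prefactor's data model or its drift logic.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Mapping, Tuple


def compare_versions(
    spans: Iterable[Mapping[str, object]],
    baseline: str,
    candidate: str,
    ratio_threshold: float = 2.0,
) -> Dict[str, Tuple[float, float]]:
    """Return {span name: (baseline median ms, candidate median ms)} for slowed-down spans."""
    by_key: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for s in spans:
        by_key[(str(s["version"]), str(s["name"]))].append(s["duration_ms"])

    regressions: Dict[str, Tuple[float, float]] = {}
    for name in sorted({n for (_, n) in by_key}):
        old = by_key.get((baseline, name))
        new = by_key.get((candidate, name))
        if not old or not new:
            continue  # span exists in only one version, so there is nothing to line up
        old_m, new_m = median(old), median(new)
        if old_m > 0 and new_m / old_m >= ratio_threshold:
            regressions[name] = (old_m, new_m)
    return regressions


spans = (
    [{"version": "v1", "name": "search_docs", "duration_ms": d} for d in (190, 200, 210)]
    + [{"version": "v2", "name": "search_docs", "duration_ms": d} for d in (880, 900, 920)]
    + [{"version": "v1", "name": "summarize", "duration_ms": d} for d in (400, 410)]
    + [{"version": "v2", "name": "summarize", "duration_ms": d} for d in (405, 415)]
)
print(compare_versions(spans, "v1", "v2"))  # {'search_docs': (200, 900)}
```

Keying on environment instead of version gives the staging-versus-production comparison with no other changes.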
Quality assessment (Evaluate) turns that comparison into a drift trend, and the audit trail keeps every span available afterward for search and export.
Frequently asked questions
What is a span?
What's the difference between a span and a custom span?
How do I start sending traces to Prefactor?
Drop this into what you already run
TypeScript and Python SDKs, plus OpenTelemetry ingest — native for LangChain, Claude, Vercel AI, OpenClaw and LiveKit, with 15 framework integrations covered out of the box.
$ prefactor init
See it on your own agents
Book a demo and we'll walk through real-time tracing on a fleet like yours — real frameworks, real traces.