Problems · in production

The bill doubled. Nobody can say what it bought.

AI spend climbs month on month, and the only artefact anyone can point to is the invoice.

Book a demo → View docs

cost-view · unattributed spend · example

Illustrative cost view, the month attribution arrives

Monthly agent spend: $9k → $38k

Spend attributed per agent: 0% → 100%

Most expensive agent: cheapest model, 61% pass

Cost per passed task: unknown → $0.71

Runs paused at budget cap: 3 this week

✓ spend lands next to the verdict it paid for, per agent, per version, per task

§01 / THE SYMPTOM · you see: the signals

TL;DR

AI spend without attribution can neither be defended nor cut. Attribute cost per agent, task and version next to a quality verdict, and budget decisions run on evidence instead of totals.

The symptom

What unattributed spend looks like

The bill is real. Everything that would explain it is missing.

Five figures, one line item

The monthly AI bill reaches five figures with minimal usage data behind it. When someone asks what the spend produced, the answer is a reconstruction, not a report.

Spend never meets output

Token spend sits in the platform invoice; what the agents produced sits everywhere else. Nobody can put cost next to a completed task, so nobody can say whether the number is high or a bargain.

Estimates are guesswork

One user request can fan out into dozens of model calls, tool retries, and sub-agent steps. Forecasting next month's bill from last month's is the only method available, and it keeps being wrong.

Downgraded blind

When the bill spikes, teams move agents to a cheaper model purely to cap spend. Nobody measures what that costs in quality, because nothing measures quality per run.

§02 / WHY IT HAPPENS · cause: not carelessness

Why it happens

Why the bill and the work never meet

Each gap is structural, not the result of carelessness. Platform billing was never built to answer per-agent questions.

Billing stops at the account

Per-seat and usage-based pricing report spend by account or API key, not by agent, version, or task. The unit finance asks about does not exist in the invoice.

Agentic workloads defeat estimation

A single completion has a knowable cost; an agent run does not. Loops without terminal conditions, retries on failing tools, and context history that grows every turn mean two identical requests can differ in cost many times over.
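A rough sketch of why growing context makes run cost hard to estimate. It assumes, for illustration only, that every turn adds a fixed number of tokens and resends the full history as input; the function name and the figures are hypothetical, not measurements.

```python
def billed_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every turn resends the full history.

    Turn k sends k * tokens_per_turn tokens, so the total is
    tokens_per_turn * (1 + 2 + ... + turns).
    """
    return tokens_per_turn * turns * (turns + 1) // 2


# Same request shape, different number of turns before the agent stops.
short_run = billed_input_tokens(turns=5, tokens_per_turn=1_000)
long_run = billed_input_tokens(turns=20, tokens_per_turn=1_000)
ratio = long_run / short_run

print(short_run, long_run, ratio)
```

Under these assumptions, four times as many turns bills fourteen times as many input tokens, which is why a forecast based on request counts alone tends to miss.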

Cost and quality live apart

Finance sees the invoice, engineering sees the traces, and no tool holds both. "Expensive but right" and "cheap but wrong" collapse into one number: spend.

The ROI question blocks the answer

Monitoring that would attribute the spend stalls in procurement because nobody can prove in advance what it will save. The data that is missing is the same data needed to justify collecting it.

§03 / HOW YOU CATCH IT · loop: observe → evaluate

How you catch it

How attribution finds the money

Prefactor watches every run and records what it cost as it happens, so spend stops being a monthly surprise.

Observe

Cost recorded per run, as runs happen. Every model call, retry, and tool step lands in one record with its token spend, tagged to the agent, the version, and the task that incurred it.
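A minimal sketch of what such a per-run record could look like. This is not Prefactor's API: the class names, field names, and per-1k-token prices below are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    kind: str            # e.g. "model_call", "tool_call", "retry"
    input_tokens: int
    output_tokens: int
    cost_usd: float


@dataclass
class RunRecord:
    # The three tags every step in the run inherits.
    agent: str
    version: str
    task: str
    steps: list[Step] = field(default_factory=list)

    def record(self, kind: str, input_tokens: int, output_tokens: int,
               usd_per_1k_in: float, usd_per_1k_out: float) -> None:
        """Append one step with its token spend converted to dollars."""
        cost = (input_tokens / 1000 * usd_per_1k_in
                + output_tokens / 1000 * usd_per_1k_out)
        self.steps.append(Step(kind, input_tokens, output_tokens, cost))

    @property
    def total_cost_usd(self) -> float:
        return sum(step.cost_usd for step in self.steps)


# Hypothetical run: one model call, then a retry that produced no output.
run = RunRecord(agent="claims-processor", version="v1.0", task="triage-claim")
run.record("model_call", 2000, 500, usd_per_1k_in=0.001, usd_per_1k_out=0.002)
run.record("retry", 2000, 0, usd_per_1k_in=0.001, usd_per_1k_out=0.002)
print(run.agent, run.version, run.task, round(run.total_cost_usd, 4))
```

Because the tags live on the run rather than on the invoice, the retry shows up as a cost against a named agent, version, and task.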

Evaluate

The verdict sits next to the spend. Every run is evaluated against the agent's job, so cost per run becomes cost per outcome. "Expensive but right" and "cheap but wrong" become different numbers.
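One way to turn cost per run into cost per outcome, sketched under the assumption that each run carries a cost and a pass/fail verdict. The function name and the sample figures are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def cost_per_passed_task(runs: Iterable[Tuple[float, bool]]) -> Optional[float]:
    """Total spend across all runs divided by the number of runs that passed.

    Failed runs still count in the numerator: their cost was paid,
    it just bought nothing. Returns None when no run passed.
    """
    total_cost = 0.0
    passed = 0
    for cost_usd, did_pass in runs:
        total_cost += cost_usd
        if did_pass:
            passed += 1
    if passed == 0:
        return None
    return total_cost / passed


# Hypothetical runs for one agent: (cost in USD, verdict).
runs = [(0.50, True), (0.25, False), (0.75, True)]
print(cost_per_passed_task(runs))  # 1.50 spent / 2 passed
```

Average cost per run here is $0.50, but each passed task cost $0.75, because the failed run's spend is carried by the runs that succeeded.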

Compare

Trends, per agent and per version. Cost per passed task over time shows which prompt edit, model swap, or integration moved the number, in either direction.

§04 / HOW YOU FIX IT · loop: act → improve

How you fix it

From capped bills to priced outcomes

Attribution finds the money. The loop decides what to do with it.

Act

Budgets pause an agent before the overrun. A per-run or per-agent budget stops a looping run while it is a line item, not an escalation. The paused run keeps its record, so the decision to resume or fix starts from evidence.
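A sketch of a per-run budget that checks before a step spends, so the run pauses ahead of the overrun rather than after it. The class names and the idea of charging an estimated step cost are assumptions for illustration, not a description of how Prefactor implements budgets.

```python
class BudgetPaused(Exception):
    """Raised when the next step would take a run past its budget."""


class RunBudget:
    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0
        self.paused = False

    def charge(self, estimated_cost_usd: float) -> None:
        # Check before spending, so the pause lands ahead of the overrun.
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            self.paused = True
            raise BudgetPaused(
                f"paused at ${self.spent_usd:.2f} of ${self.limit_usd:.2f}"
            )
        self.spent_usd += estimated_cost_usd


budget = RunBudget(limit_usd=1.00)
steps_completed = 0
pause_reason = ""
try:
    while True:  # stands in for an agent loop with no terminal condition
        budget.charge(0.25)
        steps_completed += 1
except BudgetPaused as pause:
    pause_reason = str(pause)

print(steps_completed, budget.spent_usd, pause_reason)
```

The budget object keeps what was spent and why it stopped, which is the minimum a team needs to decide whether to resume the run or fix the loop.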

Improve

Fix the pattern, not the invoice. The record shows where the money goes: a tool retried dozens of times, context growing without bound, a step a smaller model handles at the same quality. Each fix is a measured change, not a hunch.

Prove

Model swaps become evidence-based decisions. When a cheaper model is proposed, the quality trend and the cost trend for that agent answer the question before the swap ships. What it saves and what it gives up are both on record.

An agent's cost per task tripled in a week with no code change. The record showed a knowledge-base update had made one retrieval step return far more documents, and context grew on every turn after it. The team trimmed the retrieval step, set a budget that pauses runs past a threshold, and cost per passed task fell below where it started. Illustrative, but this is what attribution turns a billing dispute into.

§05 / WHO OWNS IT · teams: the same record

Who owns it

The same problem, from every seat

Engineering leadership

See what each agent costs per run and per task, and find the loop or retry pattern behind a spike without a reproduction hunt.

See the solution →

Product leaders

Price a feature on cost per successful outcome, not a share of one invoice, and defend the spend with the quality it bought.

See the solution →

Heads of AI

One view of spend across the portfolio: which agents earn their cost, which do not, and what a model swap would actually change.

See the solution →

§06 / QUESTIONS · faq: the common ones

Questions

How do I control AI agent costs in production?

Attribute before you cap. Record cost per run, tagged to the agent, version, and task that incurred it, then set budgets that pause a run or an agent before the overrun. Caps without attribution just move the problem: spend goes down and nobody knows what quality went with it.

Why is my AI agent so expensive to run?

The usual causes: retries on failing tools without caps, loops without terminal conditions, context history growing over the session, and a larger model than the step needs. A per-run cost record shows which pattern is yours; without one, every theory costs an engineering week to test.

How do I attribute LLM costs per agent or per task?

The platform invoice cannot do it, so attribution has to happen at runtime: each model call and tool step is recorded with its token spend and tagged to the agent, version, and task it belonged to. Roll those up and cost per agent, per version, and per outcome become lookups instead of estimates.
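The roll-up can be sketched as a simple grouping, assuming each recorded step already carries its tags and its cost. The dictionary keys and sample values are hypothetical.

```python
from collections import defaultdict


def roll_up(steps: list[dict]) -> dict[tuple[str, str, str], float]:
    """Sum step costs by (agent, version, task)."""
    totals: dict[tuple[str, str, str], float] = defaultdict(float)
    for step in steps:
        key = (step["agent"], step["version"], step["task"])
        totals[key] += step["cost_usd"]
    return dict(totals)


# Hypothetical tagged steps, as they might be recorded at runtime.
steps = [
    {"agent": "claims", "version": "v1", "task": "triage", "cost_usd": 0.25},
    {"agent": "claims", "version": "v1", "task": "triage", "cost_usd": 0.50},
    {"agent": "claims", "version": "v2", "task": "triage", "cost_usd": 0.125},
]
totals = roll_up(steps)
for key, cost in sorted(totals.items()):
    print(key, cost)
```

Once the tags exist on every step, any other cut (per agent only, per version only) is the same loop with a shorter key.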

Should I move my agents to a cheaper model to cut costs?

Only with the quality trend next to the cost trend. A cheaper model that fails more tasks can cost more per successful outcome than the model it replaced. Run both numbers per version, before and after, and let the swap ship on evidence.
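A worked comparison, using made-up numbers and the simplifying assumption that every failed task is paid for in full and attempts succeed independently at the pass rate. The function name, prices, and pass rates are hypothetical.

```python
def cost_per_success(cost_per_run_usd: float, pass_rate: float) -> float:
    """Expected spend per successful outcome.

    With independent attempts, one success takes 1 / pass_rate runs
    on average, and every run is billed whether it passes or not.
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_run_usd / pass_rate


# Hypothetical figures: the smaller model is cheaper per run but fails more.
larger_model = cost_per_success(cost_per_run_usd=0.60, pass_rate=0.96)
smaller_model = cost_per_success(cost_per_run_usd=0.40, pass_rate=0.61)

print(f"larger:  ${larger_model:.3f} per success")
print(f"smaller: ${smaller_model:.3f} per success")
```

In this example the smaller model is a third cheaper per run and still more expensive per successful outcome, which is the case a per-run cost figure alone cannot reveal.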

How do I set a budget for an AI agent?

Set per-run and per-agent budgets that pause the agent before the limit is breached, not alerts that arrive after. A paused run keeps its full record, so the team can see what it was doing, decide whether to resume, and fix the pattern that hit the cap.

See it in action on a fleet like yours

Book a demo and we will attribute a bill like yours: cost per agent, per version, per task, with the quality it bought beside it.

[Product dashboard preview: Agent Performance Platform, a unified performance platform for agents, authentication, and risk management. Panels shown: Mission Control (live agent health with a 7-day activity heartbeat), Operational Actions, Risk Actions with an example resolved alert, and an event log listing latency and cost per event.]

See how every agent performs, and make it better

Prefactor helps teams observe, evaluate, and improve their AI agents in production, across every framework and provider.
