Problems · in production

The agent said it confidently. None of it was true.

Agents state claims their context does not support, and the trace of a wrong answer looks identical to a right one.

Book a demo → · View docs

run-view · groundedness (example)

Illustrative run, the moment a wrong answer stops passing

Run trace: completed, no errors

Groundedness verdict: failed, 2 unsupported claims

Rate since prompt change: 0.6% → 2.1%

Sensitive-topic run: held for review

First to see the answer: a reviewer, not the user

✓ every run checked against its sources, so a hallucination becomes a verdict, not a complaint

§01 / THE SYMPTOM · you see: the signals

TL;DR

A confident wrong answer traces identically to a right one. Evaluate every run for groundedness against the agent's sources, and hallucinations surface as failed verdicts instead of user complaints.

What hallucination looks like in production

Rarely nonsense. Usually competence: fluent, specific, and wrong.

The invented policy

A customer asks about returns outside the window, retrieval finds nothing relevant, and the agent generates a plausible policy. The customer now has the wrong terms in writing.

The confirmed action that never ran

A booking API times out and the agent, mid-conversation, reports success anyway. Nothing was booked, and the failure surfaces days later as a person the system let down.

The trace that looks fine

No error was thrown and no log line turned red. A hallucinated answer traces identically to a correct one, which is why nobody knew until a user did.

Trust spends faster than it earns

After one visible incident, staff double-check every agent answer and adoption stalls. A failure rate of a few percent sounds small until it is multiplied by daily volume, and on member data, legal, or financial topics each wrong answer carries liability.

§02 / WHY IT HAPPENS · cause: not carelessness

Why agents state things that are not true

Trace hallucinated runs back to the moment the claim appeared and the causes rank consistently.

Retrieval came back empty

The most common cause. The agent needed information, retrieval returned nothing or the wrong passage, and the model filled the gap, because generation always produces something.

The prompt never covers missing answers

Most prompts say what to do; few say what to do when the context does not contain the answer. Without an explicit instruction to say so, refusal loses to fabrication.

Tool results misread

An empty result read as success, an error payload read as data, a partial result read as the whole answer. The fabrication happens at the interpretation step and compounds through everything downstream.
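One way to keep the interpretation step from guessing is to classify every tool result into an explicit status before the model sees it. The sketch below is illustrative only: the payload keys ("error", "items", "truncated") and all names are hypothetical, not part of any particular tool protocol or of Prefactor's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class ToolStatus(Enum):
    OK = "ok"
    EMPTY = "empty"
    ERROR = "error"
    PARTIAL = "partial"


@dataclass(frozen=True)
class ToolOutcome:
    status: ToolStatus
    data: Any = None
    note: str = ""


def classify_tool_result(raw: Optional[dict]) -> ToolOutcome:
    """Map a raw tool payload to an explicit status.

    Assumed (hypothetical) payload shape: an optional "error" value,
    an "items" list, and an optional "truncated" flag.
    """
    if raw is None:
        return ToolOutcome(ToolStatus.ERROR, note="no response (timeout or crash)")
    if raw.get("error"):
        return ToolOutcome(ToolStatus.ERROR, note=str(raw["error"]))
    items = raw.get("items") or []
    if not items:
        return ToolOutcome(ToolStatus.EMPTY, note="tool returned no results")
    if raw.get("truncated"):
        return ToolOutcome(ToolStatus.PARTIAL, data=items, note="result was truncated")
    return ToolOutcome(ToolStatus.OK, data=items)


def render_for_prompt(outcome: ToolOutcome) -> str:
    """Render the outcome so the status is stated, never implied."""
    line = f"[tool status: {outcome.status.value}]"
    if outcome.note:
        line += f" {outcome.note}"
    if outcome.data is not None:
        line += f" data={outcome.data!r}"
    return line
```

With the status spelled out in the context, an empty or failed call reads as exactly that, and a later groundedness check has something concrete to compare the answer against.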

Context overflow, then the model blamed

Long sessions push early constraints out of the window, and the agent answers from the conversation's drift rather than its facts. The model itself is the last cause to suspect, and usually the first to be blamed.

§03 / HOW YOU CATCH IT · loop: observe → evaluate

How every run gets a groundedness verdict

Reading transcripts finds the failures you already expected. Prefactor evaluates every run instead.

Watch

Every run recorded with the context the agent had. Retrieved documents, tool results, and the final answer land in one record as the run happens, per agent and per version.

Evaluate

Every claim checked against the agent's sources. Each run is evaluated for groundedness against the context it had and the job it was given. An unsupported claim fails the run, however fluent the answer.

Know

The rate, per agent and per version. Failed verdicts roll up into a hallucination rate, so a regression after a prompt or model change shows as a trend that alerts, not a complaint that arrives weeks later.
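The roll-up described in "Know" is simple arithmetic: failed verdicts divided by total runs, grouped by agent and version. A minimal sketch follows, assuming verdicts arrive as (agent, version, passed) tuples; the function names and the alert tolerance are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (agent, version)


def hallucination_rates(verdicts: Iterable[Tuple[str, str, bool]]) -> Dict[Key, float]:
    """Share of runs with a failed groundedness verdict, per (agent, version)."""
    totals: Dict[Key, int] = defaultdict(int)
    failed: Dict[Key, int] = defaultdict(int)
    for agent, version, passed in verdicts:
        key = (agent, version)
        totals[key] += 1
        if not passed:
            failed[key] += 1
    return {key: failed[key] / totals[key] for key in totals}


def regressed(previous_rate: float, current_rate: float, tolerance: float = 0.005) -> bool:
    """True when the rate rose by more than the tolerance (an assumed threshold)."""
    return current_rate - previous_rate > tolerance
```

For the illustrative run at the top of the page, a move from 0.6% to 2.1% is `regressed(0.006, 0.021)`, which exceeds the assumed half-point tolerance and would raise an alert.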

§04 / HOW YOU FIX IT · loop: act → improve

From caught to kept out

A failed verdict is the start. The loop keeps the answer from shipping and the cause from returning.

Act

Hold the answer before it ships. Runs on sensitive topics (member data, legal, financial) can be held for review before the user sees the answer, and a failed verdict routes to a person while it is one run, not a pattern.
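The hold rule in "Act" reduces to a small routing decision. The sketch below is one possible shape, not Prefactor's implementation: the topic labels, `RunResult`, and `route` are hypothetical names, and it assumes the topic and the groundedness verdict are already known for the run.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    SHIP = "ship"
    HOLD_FOR_REVIEW = "hold_for_review"


# Assumed topic labels, taken from the sensitive categories named above.
SENSITIVE_TOPICS = frozenset({"member_data", "legal", "financial"})


@dataclass(frozen=True)
class RunResult:
    topic: str
    grounded: bool  # True when the groundedness verdict passed


def route(run: RunResult) -> Disposition:
    """Hold sensitive-topic runs and any run whose verdict failed."""
    if run.topic in SENSITIVE_TOPICS or not run.grounded:
        return Disposition.HOLD_FOR_REVIEW
    return Disposition.SHIP
```

Holding on either condition means a grounded answer on a legal topic still gets a reviewer, and an ungrounded answer on a routine topic never ships unreviewed.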

Improve

Fix the cause the record points to. A failed run shows what was in context at the moment of the unsupported claim: a retrieval gap, a silent tool error, a prompt with no instruction for missing answers. Each caught case becomes a regression test.

Prove

Changes ship against the evidence. Prompt and model changes run against the growing set of caught cases, and the live rate per version shows whether the fix held. A change that moves the rate up is a failed deploy, on record.

A support agent quoted a 60-day refund window; the policy says 30. The run completed cleanly, but the groundedness verdict failed it: the claim had no source in the retrieved policy. The answer was held, a reviewer sent the correct terms, and the retrieval gap behind it was patched and added to the test set. Illustrative, but this is the standard shape of a caught hallucination.
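The "Prove" step can be sketched as a deploy gate that replays caught cases against a candidate agent. This is a simplified illustration using the refund example above: `CaughtCase`, `gate_deploy`, and the plain substring match are assumptions, and a real gate would rerun a full groundedness check rather than search for one phrase.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class CaughtCase:
    question: str
    context: str          # what the agent had in context when it was caught
    forbidden_claim: str  # the unsupported claim that failed the verdict


def gate_deploy(
    agent: Callable[[str, str], str],
    cases: Sequence[CaughtCase],
) -> List[CaughtCase]:
    """Return the cases where the candidate repeats a caught claim.

    An empty list means the gate passes. The substring match is a
    deliberately crude stand-in for a real groundedness check.
    """
    return [
        case
        for case in cases
        if case.forbidden_claim.lower() in agent(case.question, case.context).lower()
    ]


refund_case = CaughtCase(
    question="What is the refund window?",
    context="Refunds are accepted within 30 days of purchase.",
    forbidden_claim="60-day",
)
```

A candidate that answers with the 30-day terms clears the gate; one that reintroduces the 60-day window is returned as a failing case and the deploy is blocked.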

§05 / WHO OWNS IT · teams: the same record

The same problem, from every seat

Product leaders

Wrong answers are what stall adoption. A hallucination rate that falls, and stays fallen, is what brings users back.

See the solution →

Heads of AI

A rate per agent and per version, so "are the agents okay" is answered with a trend, not an anecdote.

See the solution →

Security & governance

Sensitive-topic runs held for review before the answer ships, with a record of every verdict for the questions that follow an incident.

See the solution →

§06 / QUESTIONS · faq: the common ones

Why is my AI agent hallucinating?

The most common causes, in order: retrieval did not supply the information the agent needed, so it filled the gap; the prompt is ambiguous about what to do when the answer is missing; the agent misread a tool result (an empty result treated as confirmation, or an error treated as data); the context window overflowed and earlier constraints fell out; or the model is too weak for the task, which is the last cause to suspect. Read the full record of a handful of hallucinated runs, and the cause is usually visible in what the agent had in context at the moment of the claim.

How do I stop AI agent hallucinations?

In order of effort: add grounding instructions (answer only from provided context, state plainly when a tool returned nothing, quote tool results rather than paraphrasing them); fix retrieval where the failed runs show gaps; validate structured claims like booking IDs and amounts in code against the actual tool response before the answer ships; then gate deploys on the growing set of caught cases. Evaluating every run is what keeps those fixes from decaying.
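Validating structured claims in code can be as small as extracting the IDs and amounts from the answer and comparing them with what the tool actually returned. The sketch below assumes a hypothetical booking ID format (`BK-` plus six digits), dollar amounts, and a tool response with `booking_id` and `amount` keys; none of these come from a real API.

```python
import re
from decimal import Decimal
from typing import List

BOOKING_ID = re.compile(r"\bBK-\d{6}\b")   # hypothetical ID format
AMOUNT = re.compile(r"\$\d+(?:\.\d{2})?")  # e.g. $89 or $89.00


def unsupported_structured_claims(answer: str, tool_response: dict) -> List[str]:
    """Return booking IDs and amounts in the answer that the tool did not return."""
    unsupported: List[str] = []

    known_id = tool_response.get("booking_id")
    for booking_id in BOOKING_ID.findall(answer):
        if booking_id != known_id:
            unsupported.append(booking_id)

    known_amount = tool_response.get("amount")
    for text in AMOUNT.findall(answer):
        # Compare numerically so "$120" and "$120.00" count as the same amount.
        if known_amount is None or Decimal(text[1:]) != Decimal(str(known_amount)):
            unsupported.append(text)

    return unsupported
```

If the booking call timed out, the tool response carries no ID or amount, so any confirmation number in the answer is flagged before it reaches the user.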

How do I detect agent hallucinations without reading every output?

Evaluate runs for groundedness: the check receives the agent's answer plus the context the agent had (retrieved documents and tool results), and fails any claim that context does not support. Track failed verdicts as a rate per agent and per version, alert when it moves, and human-review a slice of verdicts each week to keep the check honest.
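The shape of such a check can be sketched in a few lines: split the answer into claims, ask a judge whether the context supports each one, and fail the run if any claim is unsupported. Everything here is illustrative. In particular, `lexical_support` is a deliberately crude stand-in (every word in the claim must appear in the sources); a production judge would more plausibly be an entailment model or an LLM-based evaluator, which is an assumption rather than a description of any specific product.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Verdict:
    passed: bool
    unsupported_claims: List[str]


def split_claims(answer: str) -> List[str]:
    """Naive claim splitter: one claim per sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_support(claim: str, sources: Sequence[str]) -> bool:
    """Toy judge: supported only if every token of the claim occurs in the sources."""
    return _tokens(claim) <= _tokens(" ".join(sources))


def groundedness_verdict(
    answer: str,
    sources: Sequence[str],
    judge: Callable[[str, Sequence[str]], bool] = lexical_support,
) -> Verdict:
    """Fail the run if any claim is not supported by the context the agent had."""
    unsupported = [c for c in split_claims(answer) if not judge(c, sources)]
    return Verdict(passed=not unsupported, unsupported_claims=unsupported)
```

Keeping the judge pluggable lets the weekly human review compare its labels against the judge's verdicts and swap in a stricter judge without changing the surrounding pipeline.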

What hallucination rate is acceptable?

There is no universal number; it depends on blast radius. Internal copilots whose expert users review the output can tolerate rates that customer-facing agents cannot: for a customer-facing agent, each fabrication can be a commitment you did not authorise. Whatever threshold you set, the unmeasured rate is the unacceptable one.

Can I stop a hallucinated answer from reaching the user?

For the runs that matter most, yes. Runs on sensitive topics, or runs whose groundedness verdict fails, can be held for review before the answer ships, so a person approves or corrects it first. For everything else, the verdict means your team catches the failure instead of learning about it from a complaint.

See it in action on a fleet like yours

Book a demo and we will evaluate runs like yours for groundedness: every claim checked against its sources, a rate per agent and per version, and sensitive answers held for review.

[Product dashboard preview: Agent Performance Platform, a unified performance platform for agents, authentication, and risk management. Panels shown: Mission Control (live agent health with 7-day activity heartbeat), alert and action counts, a resolved risk incident, and an event log.]

See how every agent performs, and make it better

Prefactor helps teams observe, evaluate, and improve their AI agents in production, across every framework and provider.
