Langfuse vs LangSmith: the honest version

Choose LangSmith if you're all-in on the LangChain/LangGraph ecosystem and want first-party integration. Choose Langfuse if you want open-source tracing you can self-host. Choose neither alone if your question is "is my agent actually doing its job" — that's a quality problem, and it's where Prefactor sits on top of, or instead of, both.

Decision factor	Langfuse	LangSmith
Ecosystem coupling	Framework-agnostic SDKs; no preferred framework	Works anywhere, shines with LangChain/LangGraph
Open source / self-host	✓ MIT-licensed core, self-hostable	— Proprietary; self-hosted on enterprise plans only
Tracing depth	Strong traces, sessions, multi-modal logging	Deepest LangGraph agent visibility; step-level debugging
Evals	LLM-as-judge evaluators, annotation queues, scores on traces	Datasets, experiments, LLM/heuristic evaluators, online evals
Feedback capture	User feedback API tied to traces	Feedback API tied to runs
Cost visibility	Token/cost per trace and user; per-agent attribution is yours to build	Token/cost per run; per-agent attribution is yours to build
Pricing model	OSS free; cloud platform fee + usage per ingested unit	Per seat + per trace, retention-based tiers
Ownership / direction risk	Acquired by ClickHouse (Jan 2026) — roadmap and residency questions in flight	Backed by LangChain's commercial trajectory; tied to that ecosystem's bets

Langfuse in depth

Langfuse is the strongest open-source option in LLM observability [1]. The core platform (tracing, sessions, prompt management with versioning, cost analytics, evaluators, annotation queues) is MIT-licensed and self-hostable, which makes it the default choice for teams with data-residency requirements or an allergy to per-trace metering. The SDKs are genuinely framework-neutral: LangChain, LlamaIndex, the OpenAI SDK, or raw HTTP all instrument cleanly, and OpenTelemetry compatibility keeps your exit costs low.

Limits, fairly stated: the evaluation features are trace-centric, so they are good for scoring individual generations but lighter for multi-step task verdicts and eval-gated deploys. Dashboarding is functional rather than deep, and anything organisation-specific (per-customer cost rollups, business KPI joins) means exporting data and building it yourself. And since January 2026, Langfuse is part of ClickHouse [5]. The team has said the open-source product continues; the open questions are roadmap coupling to ClickHouse-native architecture and how cloud/residency options evolve. That is not a reason to leave; it is a reason to keep your quality layer portable.

LangSmith in depth

LangSmith is the commercial observability and evaluation platform from the LangChain team [3]. Its defining advantage is first-party depth: if your agents are built on LangChain or LangGraph, instrumentation is automatic and the trace view understands your agent's structure (graph nodes, tool calls, retries) rather than showing generic spans. The evaluation tooling (datasets, experiments, online evaluators) and the prompt playground are polished, and the hosted product removes operational burden.

The reality-check on LangChain coupling: LangSmith does work without LangChain — OTel ingestion and standalone SDKs exist — but most of what makes it better than alternatives assumes LangChain primitives. Adopting LangSmith from outside that ecosystem buys you a good-but-not-exceptional tracer at proprietary pricing. It is not open source, self-hosting is gated to enterprise contracts, and per-seat-plus-per-trace pricing [4] needs modelling before you commit high-volume agent traffic to it.

What they cost at 1M traces a month

Published list prices as of June 2026. Both vendors revise pricing, so treat these as order-of-magnitude and verify on their pricing pages [2] [4].

LangSmith: usage is metered per trace — list price around $0.50 per 1,000 base traces (14-day retention), several times that for extended retention — plus per-seat fees on paid plans. At 1M base traces a month that is roughly $500/month in trace charges before seats and retention upgrades; extended retention on a meaningful share of traffic moves it into four figures.
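The arithmetic above can be sketched as a small estimator. This is a simplified model, not LangSmith's billing logic: the $0.50 per 1,000 base traces default is the list price quoted in this article, while the extended-retention rate, the seat fee, and the rule that an extended trace is billed at the extended rate instead of the base rate are all assumptions to replace with current figures from the pricing page.

```python
def langsmith_trace_charges(
    traces_per_month: int,
    base_rate_per_1k: float = 0.50,     # list price quoted in this article
    extended_share: float = 0.0,        # fraction of traces on extended retention
    extended_rate_per_1k: float = 0.0,  # ASSUMPTION: caller supplies the current rate
    seats: int = 0,
    seat_fee: float = 0.0,              # ASSUMPTION: caller supplies the current fee
) -> float:
    """Estimate a monthly bill in USD under a simplified pricing model."""
    if not 0.0 <= extended_share <= 1.0:
        raise ValueError("extended_share must be between 0 and 1")
    extended_traces = traces_per_month * extended_share
    base_traces = traces_per_month - extended_traces
    trace_cost = (base_traces / 1000) * base_rate_per_1k
    trace_cost += (extended_traces / 1000) * extended_rate_per_1k
    return trace_cost + seats * seat_fee


# 1M base traces, no seats, no extended retention.
print(f"{langsmith_trace_charges(1_000_000):.2f}")  # 500.00

# Hypothetical scenario: 20% of traffic on extended retention
# at an assumed $5.00 per 1,000 traces.
print(
    f"{langsmith_trace_charges(1_000_000, extended_share=0.2, extended_rate_per_1k=5.0):.2f}"
)  # 1400.00
```

The second call shows how a modest share of extended-retention traffic pushes the total into four figures under the assumed rate; the exact crossover depends on the real rate and your seat count.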

Langfuse Cloud: a platform fee per tier plus tiered usage per ingested unit (an agent session typically emits multiple units — spans, generations, scores). Comparable volumes typically land in the low hundreds of dollars a month on the published tiers. Self-hosted Langfuse has no usage bill at all — you pay in infrastructure (it runs on ClickHouse and Postgres) and the engineering time to operate it, which is rarely zero.

The honest summary: at small volume the difference is noise. At serious agent volume (agents are chatty: one user task can be dozens of traced steps), per-trace metering compounds fastest, while self-hosting is cheapest in cash and most expensive in attention.

Where both stop: the quality gap

This section is our take — we sell the thing we're about to describe. Read it with that in mind.

Both products answer "what did my LLM application do?" extremely well. Neither is built to answer the question that determines whether your agent programme survives contact with the business: is each agent doing its job, at acceptable quality, at acceptable cost — and can you prove it?

Concretely, four things stay missing whichever you pick: task-completion verdicts for multi-step agent sessions (not per-generation scores); customer and business feedback captured where it happens and tied to the agent that caused it; cost attributed per agent, per version, per customer — not per trace; and an audit-grade record of quality over time that you can hand to a customer or a regulator. Prefactor is built for that layer. It ingests the traces you already collect — from Langfuse, LangSmith, or any OpenTelemetry source — and adds verdicts, evals, feedback, and per-agent analytics on top. Whichever tracer you choose below, you keep.
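Mechanically, the per-agent cost attribution in that list is a group-by over exported trace records. The sketch below assumes you have already exported per-trace costs and tagged each one with an agent, version, and customer. The `TraceCost` record and its field names are hypothetical, not the export schema of Langfuse, LangSmith, or Prefactor.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceCost:
    """Hypothetical exported record: one row per trace."""
    trace_id: str
    agent: str
    version: str
    customer: str
    cost_usd: float


def rollup(records, keys=("agent", "version", "customer")):
    """Sum cost_usd over every distinct combination of the given fields."""
    totals = defaultdict(float)
    for record in records:
        group = tuple(getattr(record, key) for key in keys)
        totals[group] += record.cost_usd
    return dict(totals)


records = [
    TraceCost("t1", "support-bot", "v2", "acme", 0.25),
    TraceCost("t2", "support-bot", "v2", "acme", 0.50),
    TraceCost("t3", "support-bot", "v1", "globex", 0.125),
    TraceCost("t4", "billing-bot", "v1", "acme", 1.0),
]

per_agent_version_customer = rollup(records)
per_agent = rollup(records, keys=("agent",))
```

The grouping itself is the easy part. The work is in making sure every trace carries reliable agent, version, and customer tags at ingestion time, because a trace that arrives untagged cannot be attributed later.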

Pick Langfuse if

Open source or self-hosting is a requirement
You use multiple frameworks (or none)
You want prompt management without vendor lock-in
Data residency drives your architecture

Pick LangSmith if

You are committed to LangChain/LangGraph
You want first-party agent step visibility
Hosted-only is acceptable (or enterprise budget exists)
Your eval workflow lives in datasets and experiments

And in either case: if agents are doing real work for real customers, plan for the quality layer from day one. Tracing tells you what happened. Someone still has to decide whether it was any good. Also see our own head-to-heads — Prefactor vs Langfuse and Prefactor vs LangSmith — and the Langfuse alternative page if you are evaluating a move.

Get the quality layer that works with either

Prefactor adds completion verdicts, evals, feedback, and per-agent cost on top of the traces you already collect — Langfuse, LangSmith, or any OTel source.

Frequently asked questions

Langfuse vs LangSmith: which is better for agents?

For agent tracing, both are capable: LangSmith has the deepest visibility into LangGraph agents because it is built by the same team; Langfuse traces agents from any framework and lets you self-host. Neither is "better for agents" universally — the deciding factor is your stack. If you are committed to LangChain/LangGraph, LangSmith gives you first-party integration. If you are framework-agnostic or need open source, Langfuse. For agent quality — completion verdicts, business feedback, per-agent cost — both stop short, and that is the layer a quality platform adds on top.

Is LangSmith only for LangChain?

No. LangSmith supports OpenTelemetry-based ingestion and SDKs that work without LangChain. But it is fair to say its value concentrates in the LangChain/LangGraph ecosystem: the automatic instrumentation, agent step visualisation, and prompt tooling are at their best with LangChain primitives. Teams using other frameworks can use LangSmith, but they give up much of what makes it compelling versus alternatives.

Is Langfuse open source? Is LangSmith?

Langfuse is open source — the core platform is MIT-licensed and self-hostable, with some enterprise features under commercial licence; the project continues post-acquisition by ClickHouse (announced January 2026). LangSmith is proprietary: it is a hosted commercial product, with self-hosted deployment available on enterprise plans but no open-source edition.

What do Langfuse and LangSmith cost at scale?

As of June 2026, both publish usage-based pricing. LangSmith charges per seat plus per trace (published list price around $0.50 per 1,000 base traces, more for extended retention) — at 1 million traces a month that is on the order of $500/month in trace charges plus seats. Langfuse Cloud charges a platform fee plus tiered usage per ingested unit, which typically lands in the low hundreds of dollars a month at similar volumes — and self-hosting Langfuse trades the usage bill for your own infrastructure and operations cost. Both vendors change pricing; check their pricing pages before budgeting.

Do I still need evals if I have tracing?

Yes. Tracing records what the agent did; evals score whether it did its job. A trace of a hallucinated answer looks identical to a trace of a correct one — same spans, same latency, same token counts. Both Langfuse and LangSmith ship eval features attached to traces, which is a good start. A production agent additionally needs golden-dataset regression on every change and scored sampling of live traffic, with verdicts tracked per agent version over time.
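A golden-dataset regression check can be sketched in a few lines. This is a minimal illustration under stated assumptions: the agent is modelled as a plain function from prompt to answer, scoring is exact string match (a real setup would more likely use a heuristic or LLM-as-judge scorer), and the names `pass_rate` and `regression_gate` are hypothetical rather than part of any vendor SDK.

```python
from typing import Callable, List, Tuple

GoldenCase = Tuple[str, str]  # (prompt, expected answer)


def pass_rate(agent: Callable[[str], str], golden: List[GoldenCase]) -> float:
    """Fraction of golden cases the agent answers exactly as expected."""
    if not golden:
        raise ValueError("golden dataset is empty")
    passed = sum(1 for prompt, expected in golden if agent(prompt).strip() == expected)
    return passed / len(golden)


def regression_gate(
    candidate: Callable[[str], str],
    baseline_rate: float,
    golden: List[GoldenCase],
    tolerance: float = 0.0,
) -> bool:
    """Allow a change only if the candidate does not score below the baseline."""
    return pass_rate(candidate, golden) >= baseline_rate - tolerance


# Toy data and toy agents, for illustration only.
golden = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
answers_v1 = {"2+2": "4", "capital of France": "Paris", "3*3": "9"}
answers_v2 = {"2+2": "4", "capital of France": "Lyon", "3*3": "9"}


def agent_v1(prompt: str) -> str:
    return answers_v1[prompt]


def agent_v2(prompt: str) -> str:
    return answers_v2[prompt]


baseline = pass_rate(agent_v1, golden)
ship_v2 = regression_gate(agent_v2, baseline, golden)  # False: v2 regressed on one case
```

Storing `baseline` alongside the agent version is what turns a one-off check into a record of verdicts per version over time.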

How We Reviewed This Comparison

This page was reviewed against public product and documentation pages on June 13, 2026. If a vendor has changed a feature, product name, or positioning since then, send a correction and we will update the comparison.

Numbered source links in the page body point to the ordered public sources below.

Sources reviewed

[1] Langfuse documentation
[2] Langfuse pricing
[3] LangSmith documentation
[4] LangSmith pricing
[5] Langfuse is joining ClickHouse (announcement). Acquisition announced January 2026; referenced in the post-acquisition risk discussion.

Prefactor context

Methodology

Reviewed public product, documentation, and launch material visible at the time of writing.
Mapped each page to the primary buyer, control layer, and runtime capabilities each vendor describes publicly.
Preferred direct product and documentation pages over analyst summaries or reseller material.