LLM Pricing Comparison 2026 — GPT-5, Claude, Gemini per-token cost

Model	Provider	Input / 1M tokens	Output / 1M tokens	Context (tokens)
GPT-4o mini (cheapest input)	OpenAI	$0.15	$0.60	128,000 (~192 pp)
DeepSeek-V3	DeepSeek	$0.28	$0.42	131,072 (~197 pp)
Gemini 2.5 Flash	Google	$0.30	$2.50	1,048,576 (~1,573 pp)
Mistral Large	Mistral	$0.50	$1.50	262,144 (~393 pp)
Llama 3.3 70B (Groq)	Meta / Groq	$0.59	$0.79	128,000 (~192 pp)
Claude Haiku 4.5	Anthropic	$1.00	$5.00	200,000 (~300 pp)
GPT-5	OpenAI	$1.25	$10.00	272,000 (~408 pp)
Gemini 2.5 Pro	Google	$1.25	$10.00	1,048,576 (~1,573 pp)
o3	OpenAI	$2.00	$8.00	200,000 (~300 pp)
GPT-4o	OpenAI	$2.50	$10.00	128,000 (~192 pp)
Claude Sonnet 4.5	Anthropic	$3.00	$15.00	200,000 (~300 pp)
Claude Opus 4.5	Anthropic	$5.00	$25.00	200,000 (~300 pp)

Updated 2026-05-22 from the LiteLLM model table. USD per 1M tokens. Click a column to sort.

How to use it.

1. Sort by what matters to you

Click any column header to sort by input price, output price, or context window. Default sort is cheapest input first.

2. Watch input vs output prices

Output tokens cost more than input tokens, by roughly 1.3× to 8× across the models in the table. If your workload generates long replies, output price dominates your bill, so sort by it.
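To make the input/output trade-off concrete, here is a minimal Python sketch that ranks three models from the table for two call shapes. The prices are copied from the table; the function names and the token counts are hypothetical and chosen only for illustration.

```python
# Prices in USD per 1M tokens, copied from the table: (input, output).
PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "DeepSeek-V3": (0.28, 0.42),
    "Gemini 2.5 Flash": (0.30, 2.50),
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single call at the listed per-1M-token rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank(input_tokens: int, output_tokens: int) -> list[str]:
    """Model names ordered cheapest first for one call shape."""
    return sorted(PRICES, key=lambda m: cost_per_call(m, input_tokens, output_tokens))


# Illustrative workloads (assumed token counts, not measurements).
print(rank(10_000, 200))  # input-heavy: long document in, short label out
print(rank(500, 4_000))   # output-heavy: short prompt in, long draft out
```

With these assumed numbers, GPT-4o mini is cheapest for the input-heavy call, while DeepSeek-V3 moves ahead for the output-heavy one because its output rate is lower.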

3. Factor in context window

A cheap model with a small context window may force you to chunk documents, adding complexity. The context column (with a rough page estimate) helps you weigh that.
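As a rough way to weigh context size, the sketch below estimates pages from tokens and counts how many chunks a document would need per model. The 1.5 pages per 1,000 tokens figure is an assumption inferred from the table's own estimates (128k tokens shown as ~192 pp); the `reserved_tokens` default and the function names are hypothetical.

```python
import math

# Context windows in tokens, from the table.
CONTEXT_TOKENS = {
    "GPT-4o mini": 128_000,
    "Claude Haiku 4.5": 200_000,
    "Gemini 2.5 Flash": 1_048_576,
}

# Assumption: ~1.5 pages per 1,000 tokens, inferred from the table's page estimates.
PAGES_PER_1K_TOKENS = 1.5


def pages(tokens: int) -> int:
    """Rough page estimate for a token count."""
    return round(tokens / 1000 * PAGES_PER_1K_TOKENS)


def chunks_needed(doc_tokens: int, model: str, reserved_tokens: int = 8_000) -> int:
    """Number of non-overlapping chunks needed to fit a document.

    reserved_tokens is headroom kept for instructions and the reply
    (an illustrative default, not a provider requirement).
    """
    usable = CONTEXT_TOKENS[model] - reserved_tokens
    return math.ceil(doc_tokens / usable)


doc_tokens = 300_000  # hypothetical document size
for model in CONTEXT_TOKENS:
    print(model, pages(CONTEXT_TOKENS[model]), chunks_needed(doc_tokens, model))
```

For the assumed 300,000-token document, the smaller windows need two or three chunks while the 1M-token window takes it in one. Real chunking usually adds overlap between chunks, which this sketch ignores.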

4. Turn it into your number

Pricing tells you the rate; the cost calculator turns it into your actual monthly bill for a given workload.
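The arithmetic behind a monthly estimate is simple enough to sketch by hand. The Python below is an illustrative calculation, not the calculator's actual implementation: the function name and the workload figures are assumptions, and it ignores discounts such as prompt caching or batch pricing.

```python
# Prices in USD per 1M tokens, copied from the table: (input, output).
PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5": (1.25, 10.00),
}


def monthly_cost(model: str, calls_per_month: int,
                 input_tokens_per_call: int, output_tokens_per_call: int) -> float:
    """Estimated monthly USD bill at list prices for a fixed call shape."""
    input_price, output_price = PRICES[model]
    total_input = calls_per_month * input_tokens_per_call
    total_output = calls_per_month * output_tokens_per_call
    return (total_input * input_price + total_output * output_price) / 1_000_000


# Hypothetical workload: 100,000 calls per month, 1,500 tokens in, 500 tokens out.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 100_000, 1_500, 500):.2f}")
# GPT-4o mini: $52.50
# Claude Haiku 4.5: $400.00
# GPT-5: $687.50
```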

Frequently asked questions.

Which LLM is cheapest in 2026?

For raw input price, GPT-4o mini ($0.15 per 1M tokens), DeepSeek-V3 ($0.28), and Gemini 2.5 Flash ($0.30) lead. But "cheapest" depends on your input/output ratio and on which model is capable enough for the task, so sort the table by the dimension that drives your cost.

Why does output cost more than input?

Generating tokens is more compute-intensive than reading them: the prompt is processed in one parallel pass, while the reply is produced one token at a time. Providers therefore price output higher, commonly 3–5× the input rate, though in the table the ratio runs from about 1.3× (Llama 3.3 70B on Groq) to about 8× (GPT-5 and the Gemini 2.5 models). Workloads with long outputs (summaries, drafts) are dominated by output cost.

How often is this updated?

It is snapshotted from the open-source LiteLLM model table and refreshed regularly, so it tracks new model releases and price changes faster than most static comparison posts.
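The output-to-input price ratio mentioned in the answers above can be checked directly against the table. The sketch below simply divides each model's listed output price by its input price; the variable names are illustrative.

```python
# Prices in USD per 1M tokens, copied from the table: (input, output).
PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "DeepSeek-V3": (0.28, 0.42),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Mistral Large": (0.50, 1.50),
    "Llama 3.3 70B (Groq)": (0.59, 0.79),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5": (1.25, 10.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "o3": (2.00, 8.00),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
    "Claude Opus 4.5": (5.00, 25.00),
}

# Output price divided by input price for each model.
ratios = {model: output / input_ for model, (input_, output) in PRICES.items()}

# Print models from the lowest ratio to the highest.
for model, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{model:22s} {ratio:.1f}x")

lowest = min(ratios, key=ratios.get)
highest = max(ratios, key=ratios.get)
```

A model with a low ratio is relatively better suited to output-heavy work than its input price alone would suggest.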
