What is prompt caching?

Prompt caching (sometimes called context or prefix caching) reuses repeated prompt context so providers bill repeated sections at a lower rate. We only reflect those savings when our pipeline has verified caching support for that row—keeping estimates conservative for finance teams in the US, Canada, and Australia.

Is DeepSeek cheaper than Llama on API?

Often yes on hosted APIs, but list prices move and routing differs by region. Use the cheapest tab with your workload to see blended monthly estimates; self-hosted Llama may beat hosted pricing if you already amortize GPUs—common for Canadian and Australian teams with data-residency requirements.

Interactive leaderboard

Cheapest LLM APIs 2026: Low-Cost AI Models Ranked by Workload

Discover the cheapest LLM APIs in 2026: blended input/output cost, batch discounts, and prompt caching. Compare budget AI models for startups and agencies in the US, Canada, and Australia.

Lowest estimated monthly API cost for the same workload in 2026

The cheapest tab ranks models by estimated spend for your exact monthly requests and token pattern, including batch and cached pricing only when our database confirms eligibility. Founders and agencies in the United States, Canada, and Australia use it to protect margins on high-volume chat, summarization, and RAG pipelines without guessing list prices from blog posts.

Tabs: Est. monthly · ROI score · Coding · Reasoning · Speed · Math · Context · Overall · Open-weight

Workload & pricing toggles

Workload presets

Same three scenarios as the main AI API calculator: moderate traffic, large RAG-style context, or per-request max tokens with a lower request count.

Include Vision / Image Processing

Off: cost estimates for vision-capable models exclude image fees. Turn on to include them.


Use Cached Pricing

Enable to get 50% off input tokens where cached rates apply


Deep Reasoning / Thinking Mode

When enabled, the model's hidden reasoning (extended thinking) tokens are charged like output tokens.
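As a rough illustration of this toggle, the sketch below bills hidden reasoning tokens at the output rate. The function name, the token counts, and the $0.40 per million output price are hypothetical, chosen only to show the arithmetic.

```python
def output_cost_usd(visible_tokens: int, reasoning_tokens: int,
                    output_price_per_million: float,
                    thinking_enabled: bool) -> float:
    """Cost of one response; reasoning tokens are billed like output tokens."""
    billable = visible_tokens + (reasoning_tokens if thinking_enabled else 0)
    return billable * output_price_per_million / 1_000_000


# Hypothetical request: 500 visible tokens and 1,500 hidden reasoning tokens.
without_thinking = output_cost_usd(500, 1_500, 0.40, thinking_enabled=False)
with_thinking = output_cost_usd(500, 1_500, 0.40, thinking_enabled=True)
print(f"off: ${without_thinking:.6f}  on: ${with_thinking:.6f}")
```

In this made-up case the billable output is four times larger with thinking on, which is why the toggle can reorder a cost ranking.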


Batch Pricing

Enable for 50% off input & output where batch/async pricing applies


Input Tokens: ≈ $100.00/mo (range 1K to 1.0M)

Output Tokens: ≈ $100.00/mo (range 100 to 500K)

Monthly API Requests: ≈ $200.00 total (range 10 to 100K)

Cached / batch est. monthly values only change after the pipeline sets supports_caching or supports_batch in Supabase. The toggles here narrow the table to models whose catalog or provider typically supports those modes.
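The gating described here can be sketched as follows. Only the flag names supports_caching and supports_batch come from this page. The ModelRow class, the prices, and the choice to stack the two 50% discounts multiplicatively are assumptions made for illustration, not a description of the actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class ModelRow:
    name: str
    input_price: float    # USD per million input tokens (hypothetical)
    output_price: float   # USD per million output tokens (hypothetical)
    supports_caching: bool = False
    supports_batch: bool = False


def estimate_monthly(row: ModelRow, requests: int, in_tokens: int,
                     out_tokens: int, use_cached: bool = False,
                     use_batch: bool = False) -> float:
    """Monthly estimate; a discount applies only if the row's flag is set."""
    in_cost = requests * in_tokens * row.input_price / 1_000_000
    out_cost = requests * out_tokens * row.output_price / 1_000_000
    if use_cached and row.supports_caching:
        in_cost *= 0.5            # 50% off input tokens
    if use_batch and row.supports_batch:
        in_cost *= 0.5            # 50% off input and output
        out_cost *= 0.5
    return in_cost + out_cost


verified = ModelRow("verified-model", 0.10, 0.40,
                    supports_caching=True, supports_batch=True)
unverified = ModelRow("unverified-model", 0.10, 0.40)

workload = dict(requests=10_000, in_tokens=1_000, out_tokens=500)
verified_total = estimate_monthly(verified, **workload,
                                  use_cached=True, use_batch=True)
unverified_total = estimate_monthly(unverified, **workload,
                                    use_cached=True, use_batch=True)
print(verified_total, unverified_total)
```

With both toggles on, the unverified row keeps its undiscounted estimate, which is the conservative behavior the note above describes.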

Magic quadrant (top 15)

X: est. monthly · Y: Cheapest (est. monthly) · Dot: provider color · Hover for rank, model & details

Full leaderboard

Showing 48 of 365 models.

Model	Est. monthly	ROI score	Coding	Reasoning	Speed	Math	Context	Overall
Auto Router	VARIABLE	81	90	90	70	90	2.0M	90
OpenRouter: Fusion	VARIABLE	78	85	85	40	85	128K	85
Elephant	Free	78	90	83	70	88	262K	86
Body Builder (beta)	VARIABLE	54	45	43	90	40	128K	43
Owl Alpha	Free	67	65	68	85	60	1.0M	65
Free Models Router	Free	56	45	48	90	45	200K	46
Google: Lyria 3 Pro Preview	Free	68	70	70	55	60	1.0M	68
Google: Lyria 3 Clip Preview	Free	31	0	5	50	0	1.0M	3
Pareto Code Router	VARIABLE	78	88	85	70	85	2.0M	86
inclusionAI: Ling-2.6-flash	$0.70	82	65	68	90	65	262K	66
Meta: Llama 3.1 8B Instruct	$1.10	63	45	48	90	25	131K	41
Mistral: Mistral Nemo	$1.10	65	45	50	90	35	131K	45
IBM: Granite 4.0 Micro	$1.80	65	45	58	85	65	131K	56
Sao10K: Llama 3 8B Lunaris	$2.10	62	45	63	90	45	8K	54
LiquidAI: LFM2-24B-A2B	$2.40	61	71	44	97	55	128K	54
OpenAI: gpt-oss-20b	$2.56	75	70	80	90	95	131K	81
Qwen: Qwen2.5 7B Instruct	$2.60	66	65	60	90	75	131K	65
Qwen: Qwen-Turbo	$2.60	58	50	50	50	50	131K	50
Mistral: Mistral Small 3	$2.80	71	75	75	90	75	33K	75
Amazon: Nova Micro 1.0	$2.80	66	68	63	95	75	128K	67
Cohere: Command R7B (12-2024)	$3.00	52	35	48	90	40	128K	43
IBM: Granite 4.1 8B	$3.00	65	68	64	90	65	131K	65
MythoMax 13B	$3.00	50	35	45	85	30	4K	39
Google: Gemma 3 4B	$3.00	64	45	68	95	75	131K	64
Meta: Llama 3.2 1B Instruct	$3.09	45	25	35	95	30	131K	31
NVIDIA: Nemotron Nano 9B V2	$3.20	67	70	63	90	85	131K	70
Arcee AI: Trinity Mini	$3.30	76	82	89	90	88	131K	87
OpenAI: gpt-oss-120b	$3.36	70	70	80	95	75	131K	76
Google: Gemma 3 12B	$3.50	68	70	70	88	85	131K	74
Google: Gemma 3n 4B	$3.60	62	60	63	90	65	33K	63
Qwen: Qwen3 30B A3B Instruct 2507	$3.86	65	68	70	85	70	131K	70
NVIDIA: Nemotron 3 Nano 30B A3B	$4.00	65	60	68	90	85	262K	70
Microsoft: Phi 4	$4.00	64	65	68	85	70	16K	68
Qwen: Qwen3 235B A22B Instruct 2507	$4.60	76	92	93	70	92	262K	92
Tencent: Hy3 preview	$4.62	71	80	83	70	85	262K	83
Amazon: Nova Lite 1.0	$4.80	65	75	73	90	75	300K	74
Google: Gemma 3 27B	$4.80	62	60	73	75	65	131K	68
Reka Edge	$5.00	49	45	48	85	45	16K	46
Mistral: Mistral Small 3.2 24B	$5.00	63	75	70	88	70	128K	71
Z.ai: GLM 4 32B	$5.00	64	67	78	60	65	128K	72
Mistral: Ministral 3 3B 2512	$5.00	47	35	45	90	45	131K	43
Qwen: Qwen3 235B A22B Thinking 2507	$5.00	75	88	93	60	95	262K	92
Qwen: Qwen3.5-Flash	$5.20	75	88	92	95	95	1.0M	92
Meta: Llama 3.2 3B Instruct	$5.39	51	35	55	95	55	131K	50
Qwen: Qwen3.5-9B	$5.50	69	70	87	85	83	262K	82
Qwen: Qwen3 Coder 30B A3B Instruct	$5.50	58	65	65	90	60	160K	64
Baidu: ERNIE 4.5 21B A3B Thinking	$5.60	57	55	63	70	65	131K	61
Baidu: ERNIE 4.5 21B A3B	$5.60	54	50	58	85	60	131K	56

Need a shareable artifact?

Get a print-ready PDF of your results and a CSV spreadsheet. Tap the button, then enter your work email. We use it to build your files and start the download—and to email you a copy if the site owner enabled that.

AI ROI Leaderboard & Discovery by LeadsCalc

Detailed analysis

PDF Breakdown

Receive a comprehensive native vector PDF of this leaderboard: your workload, filters, top rankings, and a table snapshot (sorted: Cheapest (est. monthly)).


Agency accelerator

Whitelabel Est. monthly Leaderboard for your site

Embed the interactive cheapest (est. monthly) view on your own domain — whitelabel branding, lead capture, and the same workload sliders your prospects already use on LeadsCalc.

1-Click CRM sync

Custom branding

Branded reports

Lead analytics

Free to start

$0/mo*


How it works

Methodology: How we rank Cost-Optimized LLMs

Transparent, benchmark-driven rankings—same craft as our single-model deep dives.

How batch and prompt caching affect “cheapest” rankings

Rankings are based on the blended cost of input and output tokens for a comparable monthly workload. Where our database verifies eligibility, we factor in typical 50% batch API discounts and prompt-caching savings (caching is applied in estimates only when our pipeline has confirmed supports_caching for that model). Figures are indicative for buyers comparing vendors transparently—not a quote from any single provider.
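To make "blended cost for a comparable monthly workload" concrete, here is a minimal sketch that ranks two models under two workloads. The model names and per-million-token prices are hypothetical, and the formula is an assumed reading of the methodology rather than the site's actual code.

```python
def blended_monthly_cost(in_price: float, out_price: float, requests: int,
                         in_tokens: int, out_tokens: int) -> float:
    """Monthly USD cost given per-million-token prices and a workload."""
    per_request = in_tokens * in_price + out_tokens * out_price
    return requests * per_request / 1_000_000


# Hypothetical (input, output) prices in USD per million tokens.
PRICES = {
    "model-a": (0.05, 0.40),   # cheap input, pricier output
    "model-b": (0.15, 0.20),   # pricier input, cheap output
}


def rank(requests: int, in_tokens: int, out_tokens: int):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: blended_monthly_cost(i, o, requests, in_tokens, out_tokens)
        for name, (i, o) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


chat_ranking = rank(requests=10_000, in_tokens=1_000, out_tokens=1_000)
rag_ranking = rank(requests=10_000, in_tokens=20_000, out_tokens=500)
print(chat_ranking)
print(rag_ranking)
```

The two workloads produce opposite orderings, which is why the ranking depends on the token pattern and not on list prices alone.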

Battle Arena

Compare up to four LLMs side by side

Tick up to four models in the leaderboard table, then open Battle Arena for API pricing, benchmarks, and workload math in one view—perfect when you are shortlisting vendors for a pilot in the US, Canada, or Australia.

Prefer a head start? Jump into high-intent comparisons people search for every day—same interactive calculator, zero signup.

Open Battle Arena · Up to 4 models · Live estimates

Popular comparisons

Signals & spend

Value analysis

Benchmarks vs. estimated API cost—read the story your CFO cares about.

When the cheapest model is not the best business choice

Extreme savings can hide weaker reliability or context limits. Use this tab to find a cost floor, then cross-check coding or reasoning tabs for quality gates. Enterprise buyers in Australia and Canada often layer vendor DPAs and region routing on top of list economics; US teams frequently validate support SLAs before migrating traffic.
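The "cost floor, then quality gates" workflow can be sketched with a few rows copied from the leaderboard table on this page (estimated monthly cost, Coding, Reasoning). The threshold of 80 on both scores is a hypothetical gate picked for the example.

```python
# (model, est. monthly USD, coding score, reasoning score) from the table.
ROWS = [
    ("inclusionAI: Ling-2.6-flash", 0.70, 65, 68),
    ("Meta: Llama 3.1 8B Instruct", 1.10, 45, 48),
    ("OpenAI: gpt-oss-20b", 2.56, 70, 80),
    ("Arcee AI: Trinity Mini", 3.30, 82, 89),
    ("Qwen: Qwen3 235B A22B Instruct 2507", 4.60, 92, 93),
]


def cheapest_passing(rows, min_coding: int, min_reasoning: int):
    """Keep rows that clear both quality gates, cheapest first."""
    passing = [r for r in rows
               if r[2] >= min_coding and r[3] >= min_reasoning]
    return sorted(passing, key=lambda r: r[1])


cost_floor = min(ROWS, key=lambda r: r[1])          # cheapest, no gates
shortlist = cheapest_passing(ROWS, min_coding=80, min_reasoning=80)

print("Cost floor:", cost_floor[0], cost_floor[1])
for name, cost, coding, reasoning in shortlist:
    print(f"{name}: ${cost:.2f} (coding {coding}, reasoning {reasoning})")
```

Among these five rows the cost floor is Ling-2.6-flash, but the cheapest model that clears both gates is Trinity Mini.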

Production deployment

High-Volume Data Processing

How teams in the US, Canada, and Australia deploy these models in production.

Bulk extraction, sentiment analysis, and content moderation at scale

When processing millions of rows of unstructured data, token economics dictate feasibility. Data engineering teams in the US and Canada use these cost-optimized models for bulk entity extraction, continuous social media sentiment analysis, and automated content moderation, where 'good enough' accuracy at 1/100th the price of a frontier model creates massive business value.

Architecture

Maximizing API Cost Efficiency

Strategies to reduce monthly API spend without sacrificing capability.

100% batch API utilization and aggressive prompt caching

To achieve the absolute lowest cost per million tokens, modern architectures combine two features: Batch APIs (which typically offer a 50% discount in exchange for turnaround of up to 24 hours) and Prompt Caching (which can discount large, static system prompts by up to 90%). Use the 'Batch Pricing' and 'Use Cached Pricing' toggles on this leaderboard to identify models that support these cost-reduction features.
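A back-of-the-envelope sketch of how the two discounts might combine on input tokens is shown below. It assumes the discounts stack multiplicatively and ignores any cache-write surcharge; both points vary by provider. The $1.00 list price and the 80% static share of the prompt are hypothetical.

```python
def effective_input_price(list_price: float, cached_fraction: float,
                          cache_discount: float = 0.90,
                          batch_discount: float = 0.50) -> float:
    """Effective USD per million input tokens with caching plus batch.

    cached_fraction is the share of input tokens served from the cache
    (for example a large, static system prompt).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_part = cached_fraction * (1.0 - cache_discount)
    uncached_part = 1.0 - cached_fraction
    return list_price * (1.0 - batch_discount) * (cached_part + uncached_part)


# Hypothetical: $1.00 per million input tokens, 80% of each prompt is static.
both = effective_input_price(1.00, cached_fraction=0.8)
batch_only = effective_input_price(1.00, cached_fraction=0.0)
print(f"batch only: ${batch_only:.2f}  batch + caching: ${both:.2f}")
```

Under these assumptions the effective input price falls from $1.00 to about $0.14 per million tokens; the real figure depends on how much of each prompt is actually cacheable.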

Embed-ready

Need this live Cost-Optimized data on your website?

Join 500+ agencies in the US and Australia using LeadsCalc to capture high-intent leads. Embed this interactive Cost-Optimized leaderboard on your site in about a minute—Canadian teams use the same flows for CAD-priced proposals and compliance-friendly landing pages.

Customize & Embed this Tool · White-label · No code required

United States · Canada · Australia

Live preview

Your visitors compare Cost-Optimized models without leaving your domain.

Support & clarity

Frequently Asked Questions

Focused on teams in the United States, Canada, and Australia.

Cost-Optimized

Start with realistic token and request volumes, enable batch pricing only when your integration supports it, and turn on cached pricing when your model and provider document cache tiers. Agencies across the US, Canada, and Australia use this table to shortlist models before negotiating enterprise commits.


1. How do I get the cheapest AI API?

2. What is prompt caching?

3. Is DeepSeek cheaper than Llama on API?
