Interactive leaderboard

Cheapest LLM APIs 2026: Low-Cost AI Models Ranked by Workload

Discover the cheapest LLM APIs in 2026: blended input/output cost, batch discounts, and prompt caching. Compare budget AI models for startups and agencies in the US, Canada, and Australia.

Lowest estimated monthly API cost for the same workload in 2026

The cheapest tab ranks models by estimated spend for your exact monthly requests and token pattern, including batch and cached pricing only when our database confirms eligibility. Founders and agencies in the United States, Canada, and Australia use it to protect margins on high-volume chat, summarization, and RAG pipelines without guessing list prices from blog posts.

Workload & pricing toggles

Workload presets

Same three scenarios as the main AI API calculator: moderate traffic, large RAG-style context, or per-request max tokens with a lower request count.

Include Vision / Image Processing

Off by default: cost estimates for vision-capable models exclude image fees. Turn on to include them.


Use Cached Pricing

Enable to get 50% off input tokens where cached rates apply


Deep Reasoning / Thinking Mode

When enabled, the model's hidden reasoning (extended thinking) tokens are billed at the output-token rate.


Batch Pricing

Enable for 50% off input & output where batch/async pricing applies


Cached and batch estimated monthly values change only after the pipeline sets supports_caching or supports_batch for a model in Supabase. The toggles here narrow the table to models whose catalog entry or provider typically supports those modes.

Magic quadrant (top 15)

X: est. monthly · Y: Cheapest (est. monthly) · Dot: provider color · Hover for rank, model & details

Full leaderboard

Showing 48 of 327 models.

| Model | Est. monthly | ROI score | Coding | Reasoning | Speed | Math | Context | Overall |
|---|---|---|---|---|---|---|---|---|
| Free Models Router | Free | 59 | 43 | 55 | 85 | 55 | 200K | 52 |
| Pareto Code Router | VARIABLE | 74 | 88 | 73 | 85 | 80 | 200K | 78 |
| Elephant | Free | 78 | 90 | 83 | 70 | 88 | 262K | 86 |
| Google: Lyria 3 Pro Preview | Free | 30 | 0 | 0 | 0 | 0 | 1.0M | 0 |
| Body Builder (beta) | VARIABLE | 30 | 0 | 0 | 0 | 0 | 128K | 0 |
| Auto Router | VARIABLE | 77 | 84 | 83 | 70 | 86 | 2.0M | 84 |
| Google: Lyria 3 Clip Preview | Free | 6 | 0 | 0 | 70 | 0 | 1.0M | 0 |
| Mistral: Mistral Nemo | $0.70 | 9 | 0 | 0 | 0 | 0 | 131K | 0 |
| Meta: Llama 3.1 8B Instruct | $1.30 | 59 | 0 | 34 | 85 | 85 | 16K | 38 |
| Meta: Llama 3 8B Instruct | $1.60 | 68 | 62 | 69 | 85 | 30 | 8K | 58 |
| IBM: Granite 4.0 Micro | $1.78 | 78 | 81 | 76 | 85 | 81 | 131K | 78 |
| Sao10K: Llama 3 8B Lunaris | $2.10 | 6 | 0 | 0 | 0 | 0 | 8K | 0 |
| LiquidAI: LFM2-24B-A2B | $2.40 | 43 | 0 | 44 | 97 | 0 | 33K | 22 |
| Google: Gemma 3 4B | $2.40 | 6 | 0 | 0 | 0 | 0 | 131K | 0 |
| OpenAI: gpt-oss-20b | $2.60 | 84 | 96 | 97 | 85 | 98 | 131K | 97 |
| Qwen: Qwen2.5 7B Instruct | $2.60 | 6 | 0 | 0 | 0 | 0 | 33K | 0 |
| Qwen: Qwen-Turbo | $2.60 | 6 | 0 | 0 | 0 | 0 | 131K | 0 |
| Mistral: Mistral Small 3 | $2.80 | 6 | 0 | 0 | 0 | 0 | 33K | 0 |
| Amazon: Nova Micro 1.0 | $2.80 | 72 | 69 | 82 | 95 | 69 | 128K | 76 |
| Google: Gemma 3 12B | $2.90 | 6 | 0 | 0 | 0 | 0 | 131K | 0 |
| Cohere: Command R7B (12-2024) | $3.00 | 6 | 0 | 0 | 0 | 0 | 128K | 0 |
| MythoMax 13B | $3.00 | 6 | 0 | 0 | 0 | 0 | 4K | 0 |
| Meta: Llama 3.2 1B Instruct | $3.08 | 6 | 0 | 0 | 0 | 0 | 60K | 0 |
| NVIDIA: Nemotron Nano 9B V2 | $3.20 | 75 | 72 | 84 | 85 | 98 | 131K | 85 |
| Arcee AI: Trinity Mini | $3.30 | 73 | 82 | 80 | 85 | 80 | 131K | 81 |
| OpenAI: gpt-oss-120b | $3.46 | 39 | 0 | 45 | 55 | 0 | 131K | 23 |
| Google: Gemma 3n 4B | $3.60 | 5 | 0 | 0 | 0 | 0 | 33K | 0 |
| Qwen: Qwen3 235B A22B Instruct 2507 | $3.84 | 70 | 73 | 83 | 55 | 78 | 262K | 79 |
| NVIDIA: Nemotron 3 Nano 30B A3B | $4.00 | 71 | 74 | 79 | 85 | 91 | 262K | 81 |
| Microsoft: Phi 4 | $4.00 | 62 | 70 | 63 | 85 | 68 | 16K | 66 |
| Qwen: Qwen3 14B | $4.80 | 59 | 52 | 72 | 70 | 58 | 41K | 63 |
| Amazon: Nova Lite 1.0 | $4.80 | 5 | 0 | 0 | 0 | 0 | 300K | 0 |
| Google: Gemma 3 27B | $4.80 | 5 | 0 | 0 | 0 | 0 | 131K | 0 |
| Mistral: Ministral 3 3B 2512 | $5.00 | 31 | 0 | 28 | 85 | 0 | 131K | 14 |
| Mistral: Mistral Small 3.2 24B | $5.00 | 69 | 92 | 83 | 85 | 69 | 128K | 82 |
| Reka Edge | $5.00 | 5 | 0 | 0 | 0 | 0 | 16K | 0 |
| Z.ai: GLM 4 32B | $5.00 | 5 | 0 | 0 | 0 | 0 | 128K | 0 |
| Qwen: Qwen3.5-Flash | $5.20 | 23 | 0 | 0 | 90 | 0 | 1.0M | 0 |
| Meta: Llama 3.2 3B Instruct | $5.44 | 4 | 0 | 0 | 0 | 0 | 80K | 0 |
| Qwen: Qwen3.5-9B | $5.50 | 68 | 65 | 87 | 70 | 83 | 262K | 80 |
| Qwen: Qwen3 Coder 30B A3B Instruct | $5.50 | 47 | 52 | 44 | 55 | 31 | 160K | 43 |
| Qwen: Qwen3 32B | $5.60 | 73 | 85 | 89 | 60 | 92 | 41K | 89 |
| Baidu: ERNIE 4.5 21B A3B | $5.60 | 72 | 85 | 89 | 60 | 87 | 120K | 88 |
| Baidu: ERNIE 4.5 21B A3B Thinking | $5.60 | 4 | 0 | 0 | 0 | 0 | 131K | 0 |
| Google: Gemma 4 26B A4B | $5.70 | 4 | 0 | 0 | 0 | 0 | 262K | 0 |
| ByteDance Seed: Seed 1.6 Flash | $6.00 | 66 | 87 | 76 | 85 | 77 | 262K | 79 |
| OpenAI: gpt-oss-safeguard-20b | $6.00 | 61 | 78 | 70 | 55 | 60 | 131K | 70 |
| Google: Gemini 2.0 Flash Lite | $6.00 | 4 | 0 | 0 | 0 | 0 | 1.0M | 0 |

Need a shareable artifact?

Download a print-ready PDF from the leaderboard and workload above. No email step—lead capture is off.

Detailed analysis

PDF Breakdown

Receive a comprehensive native vector PDF of this leaderboard: your workload, filters, top rankings, and a table snapshot (sorted: Cheapest (est. monthly)).


Agency accelerator

Whitelabel Est. monthly Leaderboard for your site

Embed the interactive cheapest (est. monthly) view on your own domain — whitelabel branding, lead capture, and the same workload sliders your prospects already use on LeadsCalc.

1-Click CRM sync
Custom branding
Branded reports
Lead analytics

Free to start

$0/mo*

How it works

Methodology: How we rank Cost-Optimized LLMs

Transparent, benchmark-driven rankings—same craft as our single-model deep dives.

How batch and prompt caching affect “cheapest” rankings

Rankings are based on the blended cost of input and output tokens for a comparable monthly workload. Where our database verifies eligibility, we factor in typical 50% batch API discounts and prompt-caching savings (caching is applied in estimates only when our pipeline has confirmed supports_caching for that model). Figures are indicative for buyers comparing vendors transparently—not a quote from any single provider.
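
The blended estimate described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the calculator's actual code: the ModelPricing class, the function name, and the example prices are hypothetical, while the supports_caching and supports_batch flags and the 50% discount defaults mirror the toggles on this page. Reasoning tokens are billed at the output rate, as the Deep Reasoning toggle describes.

```python
from dataclasses import dataclass


@dataclass
class ModelPricing:
    # USD per million tokens. Hypothetical container, not the real schema.
    input_per_m: float
    output_per_m: float
    supports_caching: bool = False
    supports_batch: bool = False


def estimate_monthly_cost(p, requests, input_tokens, output_tokens,
                          reasoning_tokens=0, use_cache=False, use_batch=False,
                          cache_discount=0.5, batch_discount=0.5):
    """Blended monthly cost for `requests` calls with the given token shape."""
    input_rate = p.input_per_m
    output_rate = p.output_per_m
    # Cached pricing affects input tokens only, and only for confirmed models.
    if use_cache and p.supports_caching:
        input_rate *= 1 - cache_discount
    # Batch pricing discounts input and output, again only when confirmed.
    if use_batch and p.supports_batch:
        input_rate *= 1 - batch_discount
        output_rate *= 1 - batch_discount
    # Hidden reasoning tokens are charged like output tokens.
    billed_output = output_tokens + reasoning_tokens
    per_request = (input_tokens * input_rate + billed_output * output_rate) / 1_000_000
    return requests * per_request


# Hypothetical model: $0.10 input, $0.40 output per million tokens, batch confirmed.
example = ModelPricing(input_per_m=0.10, output_per_m=0.40, supports_batch=True)
print(estimate_monthly_cost(example, 10_000, 2_000, 500))
print(estimate_monthly_cost(example, 10_000, 2_000, 500, use_batch=True))
```

With these made-up prices, 10,000 requests of 2,000 input and 500 output tokens come to about $4.00 per month, or about $2.00 with batch pricing. Turning on the cache toggle changes nothing for this model because supports_caching is not set.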

Battle Arena

Compare up to four LLMs side by side

Tick up to four models in the leaderboard table, then open Battle Arena for API pricing, benchmarks, and workload math in one view—perfect when you are shortlisting vendors for a pilot in the US, Canada, or Australia.

Prefer a head start? Jump into high-intent comparisons people search for every day—same interactive calculator, zero signup.

Signals & spend

Value analysis

Benchmarks vs. estimated API cost—read the story your CFO cares about.

When the cheapest model is not the best business choice

Extreme savings can hide weaker reliability or context limits. Use this tab to find a cost floor, then cross-check coding or reasoning tabs for quality gates. Enterprise buyers in Australia and Canada often layer vendor DPAs and region routing on top of list economics; US teams frequently validate support SLAs before migrating traffic.
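
One way to apply the "cost floor, then quality gate" idea is sketched below. The five rows are read from the leaderboard on this page (estimated monthly cost, Coding score, Reasoning score); the function name and the threshold values are hypothetical and should be replaced with your own quality bar.

```python
# (model, est. monthly USD, coding score, reasoning score) from the leaderboard rows.
rows = [
    ("Mistral: Mistral Nemo", 0.70, 0, 0),
    ("Meta: Llama 3.1 8B Instruct", 1.30, 0, 34),
    ("Meta: Llama 3 8B Instruct", 1.60, 62, 69),
    ("IBM: Granite 4.0 Micro", 1.78, 81, 76),
    ("OpenAI: gpt-oss-20b", 2.60, 96, 97),
]


def cheapest_passing(rows, min_coding, min_reasoning):
    """Return the lowest-cost row that clears both quality gates, or None."""
    passing = [r for r in rows if r[2] >= min_coding and r[3] >= min_reasoning]
    return min(passing, key=lambda r: r[1]) if passing else None


# The cost floor ignores quality entirely.
cost_floor = min(rows, key=lambda r: r[1])
print(cost_floor[0])

# Hypothetical gates: coding >= 80 and reasoning >= 75.
print(cheapest_passing(rows, 80, 75)[0])
```

In this sample the cost floor is Mistral: Mistral Nemo at $0.70, but it has no benchmark scores recorded. The cheapest model that clears both example gates is IBM: Granite 4.0 Micro at $1.78.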

Production deployment

High-Volume Data Processing

How teams in the US, Canada, and Australia deploy these models in production.

Bulk extraction, sentiment analysis, and content moderation at scale

When processing millions of rows of unstructured data, token economics dictate feasibility. Data engineering teams in the US and Canada use these cost-optimized models for bulk entity extraction, continuous social media sentiment analysis, and automated content moderation, where 'good enough' accuracy at 1/100th the price of a frontier model creates massive business value.
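
To see how token economics scale with row count, a rough sketch follows. Every number in it is an assumption chosen for illustration: the row count, the tokens per row, and both price points (the "frontier" prices are simply set to 100 times the "budget" prices to match the ratio mentioned above).

```python
def bulk_job_cost(rows, tokens_in_per_row, tokens_out_per_row,
                  input_per_m, output_per_m):
    """Total USD cost of processing `rows` records once each."""
    total_in = rows * tokens_in_per_row
    total_out = rows * tokens_out_per_row
    return (total_in * input_per_m + total_out * output_per_m) / 1_000_000


# Hypothetical job: 5 million rows, 300 input and 50 output tokens per row.
budget = bulk_job_cost(5_000_000, 300, 50, input_per_m=0.05, output_per_m=0.20)
frontier = bulk_job_cost(5_000_000, 300, 50, input_per_m=5.00, output_per_m=20.00)
print(f"budget: ${budget:,.2f}  frontier: ${frontier:,.2f}")
```

Under these assumptions the same job costs about $125 on the budget price point and about $12,500 on the frontier one, which is the kind of gap that decides whether a bulk pipeline is feasible at all.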

Architecture

Maximizing API Cost Efficiency

Strategies to reduce monthly API spend without sacrificing capability.

100% batch API utilization and aggressive prompt caching

To achieve the absolute lowest cost per million tokens, modern architectures combine two features: Batch APIs (which typically offer a 50% discount for 24-hour turnaround) and Prompt Caching (which discounts large, static system prompts by up to 90%). Filter this leaderboard by 'Batch pricing' to identify vendors that support these aggressive cost-reduction features.
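
The combined effect on input cost can be estimated with a short sketch like the one below. It assumes the cache discount applies only to the cached share of each prompt and that the batch discount multiplies on top of it; whether the two actually stack depends on the provider, so check its pricing documentation. The function name, the $1.00 base rate, and the 80% cached share are hypothetical.

```python
def effective_input_rate(base_per_m, cached_fraction, cache_discount=0.90,
                         batch_discount=0.50, stack_batch=True):
    """Effective USD per million input tokens after caching and batch discounts."""
    # Only the cached share of the prompt receives the cache discount.
    blended = base_per_m * ((1 - cached_fraction)
                            + cached_fraction * (1 - cache_discount))
    # Assumption: batch pricing multiplies on top; some providers may not allow this.
    return blended * (1 - batch_discount) if stack_batch else blended


# Hypothetical: $1.00 per million input tokens, 80% of each prompt is a static prefix.
print(round(effective_input_rate(1.00, 0.8, stack_batch=False), 4))
print(round(effective_input_rate(1.00, 0.8), 4))
```

With an 80% cached share at the 90% discount, the effective input rate falls to roughly $0.28 per million tokens, and to roughly $0.14 if the 50% batch discount also applies. Output tokens are not affected by caching, so the overall saving depends on how input-heavy the workload is.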

Embed-ready

Need this live Cost-Optimized data on your website?

Join 500+ agencies in the US and Australia using LeadsCalc to capture high-intent leads. Embed this interactive Cost-Optimized leaderboard on your site in about a minute—Canadian teams use the same flows for CAD-priced proposals and compliance-friendly landing pages.

Live preview

Your visitors compare Cost-Optimized models without leaving your domain.

Support & clarity

Frequently Asked Questions

Focused on teams in the United States, Canada, and Australia.

Start with realistic token and request volumes, enable batch pricing only when your integration supports it, and turn on cached pricing when your model and provider document cache tiers. Agencies across the US, Canada, and Australia use this table to shortlist models before negotiating enterprise commits.