Cheapest LLM APIs 2026: Low-Cost AI Models Ranked by Workload
Discover the cheapest LLM APIs in 2026: blended input/output cost, batch discounts, and prompt caching. Compare budget AI models for startups and agencies in the US, Canada, and Australia.
Lowest estimated monthly API cost for the same workload in 2026
The cheapest tab ranks models by estimated spend for your exact monthly requests and token pattern, including batch and cached pricing only when our database confirms eligibility. Founders and agencies in the United States, Canada, and Australia use it to protect margins on high-volume chat, summarization, and RAG pipelines without guessing list prices from blog posts.
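As a rough illustration of how an estimate like this can be computed, the sketch below multiplies monthly requests by per-request token counts and per-million-token prices. The function name, its parameters, and the example prices are assumptions made for illustration; they are not LeadsCalc's actual formula or any provider's real rates.

```python
def estimate_monthly_cost(requests_per_month: int,
                          input_tokens: int,
                          output_tokens: int,
                          input_price_per_m: float,
                          output_price_per_m: float) -> float:
    """Estimated monthly spend in USD for one model and one workload.

    Prices are expressed in USD per one million tokens (an assumption
    about the unit; check the unit your provider actually quotes).
    """
    input_cost = requests_per_month * input_tokens * input_price_per_m / 1_000_000
    output_cost = requests_per_month * output_tokens * output_price_per_m / 1_000_000
    return round(input_cost + output_cost, 2)


# Hypothetical workload: 10,000 requests per month, 1,000 input and
# 500 output tokens each, at made-up prices of $0.10 and $0.40 per 1M tokens.
cost = estimate_monthly_cost(10_000, 1_000, 500, 0.10, 0.40)
print(cost)  # 3.0
```

Because the result depends on the input/output token mix, two models with the same list price for input tokens can rank differently once output-heavy workloads are considered.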
Workload & pricing toggles
Same three scenarios as the main AI API calculator: moderate traffic, large RAG-style context, or per-request max tokens with a lower request count.
Include Vision / Image Processing
Off: cost estimates for vision-capable models include no image fees.
Turn on to include image fees.
Use Cached Pricing
Enable for 50% off input tokens where cached rates apply.
Deep Reasoning / Thinking Mode
When enabled, the model's hidden reasoning (extended thinking) tokens are charged like output tokens.
Batch Pricing
Enable for 50% off input and output tokens where batch or async pricing applies.
Cached / batch est. monthly values only change after the pipeline sets supports_caching or supports_batch in Supabase. The toggles here narrow the table to models whose catalog or provider typically supports those modes.
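The toggle behavior described above can be sketched as follows. This is a minimal illustration, not the site's implementation: the `ModelPricing` class, the `estimate_with_toggles` function, and the example prices are hypothetical. Only the `supports_caching` and `supports_batch` flag names and the 50% figures come from the text. Stacking the cached and batch discounts when both apply is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class ModelPricing:
    input_price_per_m: float        # USD per 1M input tokens (hypothetical unit)
    output_price_per_m: float       # USD per 1M output tokens (hypothetical unit)
    supports_caching: bool = False  # flag name taken from the text above
    supports_batch: bool = False    # flag name taken from the text above


def estimate_with_toggles(model: ModelPricing,
                          requests: int,
                          input_tokens: int,
                          output_tokens: int,
                          reasoning_tokens: int = 0,
                          use_cached: bool = False,
                          use_batch: bool = False,
                          use_reasoning: bool = False) -> float:
    """Monthly estimate in USD; a discount applies only if the model is flagged eligible."""
    input_rate = model.input_price_per_m
    output_rate = model.output_price_per_m

    # Cached pricing: 50% off input tokens, only when eligibility is confirmed.
    if use_cached and model.supports_caching:
        input_rate *= 0.5

    # Batch pricing: 50% off input and output, only when eligibility is confirmed.
    # Assumption: this stacks with the cached discount if both apply.
    if use_batch and model.supports_batch:
        input_rate *= 0.5
        output_rate *= 0.5

    # Reasoning / thinking tokens are billed like output tokens when enabled.
    billed_output = output_tokens + (reasoning_tokens if use_reasoning else 0)

    total = requests * (input_tokens * input_rate + billed_output * output_rate) / 1_000_000
    return round(total, 2)


# Made-up prices: $1.00 input and $2.00 output per 1M tokens.
eligible = ModelPricing(1.00, 2.00, supports_caching=True, supports_batch=True)
unflagged = ModelPricing(1.00, 2.00)

print(estimate_with_toggles(eligible, 1_000, 2_000, 500))                    # 3.0
print(estimate_with_toggles(eligible, 1_000, 2_000, 500, use_cached=True))   # 2.0
print(estimate_with_toggles(unflagged, 1_000, 2_000, 500, use_cached=True))  # 3.0
```

The last call shows the gating: with the toggle on but no eligibility flag set, the estimate stays at the undiscounted value.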
Magic quadrant (top 15)
X: est. monthly · Y: Cheapest (est. monthly) · Dot: provider color · Hover for rank, model & details
Full leaderboard
Showing 48 of 365 models.
| Model | Est. monthly | ROI score | Coding | Reasoning | Speed | Math | Context | Overall | Notes |
|---|---|---|---|---|---|---|---|---|---|
| Auto Router | VARIABLE | 81 | 90 | 90 | 70 | 90 | 2.0M | 90 | Auto Router optimizes across models for best output. Evidence cites top models reaching 92.3% MMLU. Mapped to 90 across logic, coding, and math to reflect frontier routing capabilities. Vision price defaulted to $0.007 per tier guidelines. |
| OpenRouter: Fusion | VARIABLE | 78 | 85 | 85 | 40 | 85 | 128K | 85 | No explicit Fusion scores provided. Inferred as a heavyweight ensemble ('panel of expert models'), mapping to ~85 across coding (SWE-bench) and logic (GPQA). Speed is rated lower (40) due to multi-model deliberation and web search overhead. |
| Elephant | Free | 78 | 90 | 83 | 70 | 88 | 262K | 86 | |
| Body Builder (beta) | VARIABLE | 54 | 45 | 43 | 90 | 40 | 128K | 43 | Lacking specific benchmarks, inferred from cited 1B-scale Fast-dLLM v2 (43.5 avg across HumanEval, GPQA, GSM8K). Mapped to 40-45 for coding, logic, math. Speed rated 90 for lightweight specialized API tool. |
| Owl Alpha | Free | 67 | 65 | 68 | 85 | 60 | 1.0M | 65 | No exact scores for Owl Alpha; inferred as a lightweight reasoning model ('fewer parameters', 'designed for speed'). Mapped to mid-tier 0-100 scale (Coding 65, Logic 65) reflecting its agentic focus but smaller size. |
| Free Models Router | Free | 56 | 45 | 48 | 90 | 45 | 200K | 46 | No specific benchmarks cited for the Free Router. Inferred lightweight tier scores (Coding/Logic ~45) based on typical free 8B-class models. Speed rated high (90). Vision price is $0 as the endpoint is free. |
| Google: Lyria 3 Pro Preview | Free | 68 | 70 | 70 | 55 | 60 | 1.0M | 68 | Evidence notes Lyria 3 Pro scores well on SWE-bench and MMLU without exact figures. Mapped to 70s for Pro tier. Speed is 39.5 tok/s (55). Multimodal audio generation from images supported; default Pro vision price applied. |
| Google: Lyria 3 Clip Preview | Free | 31 | 0 | 5 | 50 | 0 | 1.0M | 3 | Lyria 3 is a specialized music generation model lacking standard LLM benchmarks (SWE-bench, GPQA). Assigned 0 for coding/logic/math. Speed mapped to 50 from 38 tok/s. Multimodal scored 85 for native image-to-audio generation. |
| Pareto Code Router | VARIABLE | 78 | 88 | 85 | 70 | 85 | 2.0M | 86 | OpenRouter docs state this is a router defaulting to High tier coding models based on Artificial Analysis percentiles. Lacking specific raw benchmarks, scores are mapped to ~85 reflecting flagship-level routed performance. Text-only inputs confirmed. |
| inclusionAI: Ling-2.6-flash | $0.70 | 82 | 65 | 68 | 90 | 65 | 262K | 66 | Evidence cites GPQA, AIME, and LiveCodeBench without raw scores. Mapped to ~65 for coding/logic based on claimed ~40B dense equivalence. Speed scored 90 due to 200+ tokens/s. Flash tier adjustment applied. |
| Meta: Llama 3.1 8B Instruct | $1.10 | 63 | 45 | 48 | 90 | 25 | 131K | 41 | MMLU 66.7%, GPQA 8.72%, MATH 15.56%, IFEval 49.22%. As an 8B lightweight tier, scores map to low/moderate logic (45) and math (25). Speed is rated high (90) due to its small, efficient architecture. |
| Mistral: Mistral Nemo | $1.10 | 65 | 45 | 50 | 90 | 35 | 131K | 45 | GPQA 5.37% maps to Logic 35. IFEval 63.8% maps to Instruction 65. MATH Lvl 5 is 12.69%. As a 12B lightweight model, it scores lower on coding/logic than flagships but achieves high speed (90). |
| IBM: Granite 4.0 Micro | $1.80 | 65 | 45 | 58 | 85 | 65 | 131K | 56 | Based on GPQA 32.15% (Logic ~35) and HumanEval 81.00% (Coding ~45). IFEval averages 84.32% (Instruction ~80). As a 3B 'Micro' tier model, scores reflect lower reasoning capacity compared to flagships, but strong instruction following. |
| Sao10K: Llama 3 8B Lunaris | $2.10 | 62 | 45 | 63 | 90 | 45 | 8K | 54 | Evidence lacks specific benchmark scores. Inferred from Llama 3 8B lightweight tier: coding and math mapped to ~45, logic ~60. Speed rated high (90) due to small 8B parameter size. |
| LiquidAI: LFM2-24B-A2B | $2.40 | 61 | 71 | 44 | 97 | 55 | 128K | 54 | Benchable.ai cites Coding 71%, Reasoning 50%, Instruction 38%, and Speed at 97th percentile. Mapped directly to 0-100 scale. As a lightweight 2B active MoE, it prioritizes speed over flagship-level logic and coding. |
| OpenAI: gpt-oss-20b | $2.56 | 75 | 70 | 80 | 90 | 95 | 131K | 81 | GPQA 71.5% and MMLU 85.3% map to Logic 80. AIME 2025 98.7 maps to Math 95. As a 21B lightweight MoE, Speed is 90. Coding inferred at 70 due to lack of explicit SWE-bench. |
| Qwen: Qwen2.5 7B Instruct | $2.60 | 66 | 65 | 60 | 90 | 75 | 131K | 65 | HumanEval 84.8% and GPQA 36.4% map to 65 coding and 45 logic. IFEval 71.2% maps to 75 instruction. As a 7B lightweight model, it scores lower than flagships but achieves high speed (138 tokens/s, mapped to 90). |
| Qwen: Qwen-Turbo | $2.60 | 58 | 50 | 50 | 50 | 50 | 131K | 50 | |
| Mistral: Mistral Small 3 | $2.80 | 71 | 75 | 75 | 90 | 75 | 33K | 75 | HumanEval 88.41% maps to coding 75. GPQA Diamond 45.96% maps to logic 70. As a 24B 'Small' tier model, it scores lower than flagships but achieves high speed (90). |
| Amazon: Nova Micro 1.0 | $2.80 | 66 | 68 | 63 | 95 | 75 | 128K | 67 | HumanEval 81.1% (Coding 68), GPQA 40% (Logic 45), IFEval 87.2% (Instruction 80), GSM8K 92.3% (Math 75). As a 'Micro' tier model, speed is rated very high (95) while coding and logic reflect its lightweight, text-only nature. |
| Cohere: Command R7B (12-2024) | $3.00 | 52 | 35 | 48 | 90 | 40 | 128K | 43 | Evidence explicitly states no benchmark data is available, requiring a conservative rating. As a 7B lightweight model, capabilities are estimated lower than flagships, while speed is rated high (90) due to its small, fast architecture. |
| IBM: Granite 4.1 8B | $3.00 | 65 | 68 | 64 | 90 | 65 | 131K | 65 | HumanEval >89.7% (coding ~68), MMLU >66% (logic ~58), IFEval >74.8% (instruction ~70) based on 3.3 baseline. Lightweight 8B tier adjustments applied for high speed and scaled capabilities. |
| MythoMax 13B | $3.00 | 50 | 35 | 45 | 85 | 30 | 4K | 39 | Evidence cites SWE-bench and HumanEval without exact scores. As a 13B Llama-2 fine-tune, capabilities are inferred as lightweight tier. Assigned ~35-40 for coding/logic, with high speed (85) reflecting its small parameter count and fast inference claims. |
| Google: Gemma 3 4B | $3.00 | 64 | 45 | 68 | 95 | 75 | 131K | 64 | Based on IFEval (90.2%) mapped to 90 Instruction, MATH (75.6%) to 75 Math, and MMLU-Pro (43.6%) to 45 Logic. As a 4B lightweight tier, Coding (MBPP 63.2%) maps to 45, while Speed is rated 95. |
| Meta: Llama 3.2 1B Instruct | $3.09 | 45 | 25 | 35 | 95 | 30 | 131K | 31 | MMLU 49.3%, GSM8K 44.4%, MATH 30.6%. As a 1B lightweight model, Logic (35) and Math (30) reflect sub-50% benchmarks. Coding (25) based on 0.6 index. Speed (95) is maximized for this ultra-small tier. |
| NVIDIA: Nemotron Nano 9B V2 | $3.20 | 67 | 70 | 63 | 90 | 85 | 131K | 70 | Evidence shows GPQA at 64.0% (Logic ~65) and LiveCodeBench at 72.4% (Coding ~70). MATH-500 is 97.8% (Math ~85). As a 9B lightweight reasoning model, it achieves high speed (115 tok/s, Speed ~90) but lacks multimodal support. |
| Arcee AI: Trinity Mini | $3.30 | 76 | 82 | 89 | 90 | 88 | 131K | 87 | GPQA Diamond at 92.1% maps to 92 Logic. AIME 2025 at 58.6% maps to 88 Math. LM Market Cap coding score of 82 maps to 82 Coding. As a 'Mini' tier, speed is rated high (90). |
| OpenAI: gpt-oss-120b | $3.36 | 70 | 70 | 80 | 95 | 75 | 131K | 76 | HumanEval 71% maps to 70 coding. MMLU 66-90% maps to 80 logic. GSM8K 75% maps to 75 math. 500 tok/s throughput maps to 95 speed. Native reasoning supported via OpenRouter reasoning parameter. |
| Google: Gemma 3 12B | $3.50 | 68 | 70 | 70 | 88 | 85 | 131K | 74 | HumanEval 85.4% (Coding ~70), GPQA 40.9% (Logic ~55), IFEval 88.9% (Instruction ~85), MATH 83.8% (Math ~85). As a 12B lightweight tier, scores reflect strong math/instruction but moderate logic/coding compared to flagships. |
| Google: Gemma 3n 4B | $3.60 | 62 | 60 | 63 | 90 | 65 | 33K | 63 | ARC-E 81.6% and HellaSwag 78.6% map to ~55 Logic. Outperforms Gemma 2 9B on HumanEval, mapping to ~60 Coding. As a 4B lightweight model, Speed is rated high (~90) while reasoning is scaled down. |
| Qwen: Qwen3 30B A3B Instruct 2507 | $3.86 | 65 | 68 | 70 | 85 | 70 | 131K | 70 | Evidence cites HumanEval, GPQA, and IFEval testing but lacks exact scores. As a 30B (3.3B active) lightweight MoE, scores are inferred: Coding ~68, Logic ~65. Speed is high (77 tok/s), mapped to 85. |
| NVIDIA: Nemotron 3 Nano 30B A3B | $4.00 | 65 | 60 | 68 | 90 | 85 | 262K | 70 | Evidence lacks exact percentages but notes AIME 2025 wins over DeepSeek-V3.1 (Math: 85) and GPQA/SWE-Bench Verified losses to GLM-4.7 (Logic: 70, Coding: 60). As a 3B-active Nano tier, speed is heavily weighted (90). |
| Microsoft: Phi 4 | $4.00 | 64 | 65 | 68 | 85 | 70 | 16K | 68 | Evidence lacks exact benchmark scores for base Phi-4 14B. Inferred mid-tier scores (Coding 65, Logic 65) based on its 14B parameter size. Multimodal and native reasoning are false as those belong to separate Phi-4-Vision and Phi-4-Reasoning variants. |
| Qwen: Qwen3 235B A22B Instruct 2507 | $4.60 | 76 | 92 | 93 | 70 | 92 | 262K | 92 | SWE-bench Verified 55.6% maps to 92 coding. GPQA 77.5% maps to 95 logic. IFEval 93.3% maps to 90 instruction. Flagship 235B MoE tier yields 70 speed. |
| Tencent: Hy3 preview | $4.62 | 71 | 80 | 83 | 70 | 85 | 262K | 83 | No benchmark scores provided in evidence. Inferred coding (80) and logic (85) based on its high-efficiency MoE architecture and explicit support for configurable reasoning modes designed for agentic workflows. |
| Amazon: Nova Lite 1.0 | $4.80 | 65 | 75 | 73 | 90 | 75 | 300K | 74 | HumanEval 85.4% (Coding ~75), GPQA 42% (Logic ~60), IFEval 89.7% (Instruction ~85), MATH 73.3% (Math ~75). As a 'Lite' tier model, it prioritizes speed (~90) over flagship reasoning, reflecting lower GPQA and coding capabilities. |
| Google: Gemma 3 27B | $4.80 | 62 | 60 | 73 | 75 | 65 | 131K | 68 | MMLU-Pro 67.5 and LiveCodeBench 29.7 map to logic 70 and coding 60. Multimodal supported (free API tier). No native reasoning tokens confirmed. Mid-weight 27B tier offers balanced speed and capability. |
| Reka Edge | $5.00 | 49 | 45 | 48 | 85 | 45 | 16K | 46 | No exact benchmark scores provided in evidence. Inferred capabilities (Coding 45, Logic 45) based on its 7B lightweight tier. Speed rated high (85) for 7B efficiency. Multimodal scored 65 due to strong vision optimization claims. |
| Mistral: Mistral Small 3.2 24B | $5.00 | 63 | 75 | 70 | 88 | 70 | 128K | 71 | GPQA Diamond 46.13% maps to 65 logic; HumanEval+ 92.90% maps to 75 coding; MATH 69.42% maps to 70 math. As a 24B Small tier model, speed is rated high (88) reflecting its lightweight, cost-optimized architecture. |
| Z.ai: GLM 4 32B | $5.00 | 64 | 67 | 78 | 60 | 65 | 128K | 72 | HumanEval 67.3% (Coding 67), MMLU 81.0% (Logic 80), MATH 52.1% (Math 65). Mid-tier 32B model; scores reflect solid but non-frontier capabilities. Caching and batch supported. |
| Mistral: Ministral 3 3B 2512 | $5.00 | 47 | 35 | 45 | 90 | 45 | 131K | 43 | Evidence lacks exact benchmark scores. As a 3B lightweight tier, coding (35) and logic (40) are inferred significantly below Mistral Large. Speed (90) is high due to tiny size. Native reasoning tokens are explicitly supported. |
| Qwen: Qwen3 235B A22B Thinking 2507 | $5.00 | 75 | 88 | 93 | 60 | 95 | 262K | 92 | GPQA 81.1% maps to Logic 95. MMLU-Pro 84.4% and HMMT25 83.9% map to Instruction 90 and Math 95. Coding inferred high (88) via LiveCodeBench. Speed 55 tok/s maps to 60. Flagship 235B reasoning model. |
| Qwen: Qwen3.5-Flash | $5.20 | 75 | 88 | 92 | 95 | 95 | 1.0M | 92 | SWE-bench Verified at 69.2% maps to 88 coding. GPQA Diamond at 84.2% maps to 92 logic. IFEval 91.9% maps to 92 instruction. As a Flash tier, speed is 95, though its reasoning capabilities rival flagship models. |
| Meta: Llama 3.2 3B Instruct | $5.39 | 51 | 35 | 55 | 95 | 55 | 131K | 50 | Lightweight 3B tier. Mapped MMLU 63.4% to Logic 40, IFEval 73.9% to Instruction 70, and GSM8K 77.7% to Math 55. Coding inferred low (35) lacking SWE-bench. Speed rated 95 for 3B size. |
| Qwen: Qwen3.5-9B | $5.50 | 69 | 70 | 87 | 85 | 83 | 262K | 82 | GPQA Diamond 81.7% maps to 82 logic. IFEval 91.5% maps to 91 instruction. LiveCodeBench 65.6% maps to 70 coding. MMMU 78.4% maps to 78 multimodal. As a 9B lightweight model, speed is high (85). |
| Qwen: Qwen3 Coder 30B A3B Instruct | $5.50 | 58 | 65 | 65 | 90 | 60 | 160K | 64 | GPQA 51.6% (Logic 60) and LiveCodeBench 40.3% (Coding 65) reflect its 3B active parameter lightweight MoE architecture. Speed is high at 112 tok/s (Speed 90). Math maps to 60 based on AIME 29.7%. |
| Baidu: ERNIE 4.5 21B A3B Thinking | $5.60 | 57 | 55 | 63 | 70 | 65 | 131K | 61 | Evidence lacks specific benchmark scores. Inferred as a lightweight 21B MoE 'thinking' tier. Assigned moderate coding (55) and logic (65) due to small size but native reasoning tokens. Speed (70) reflects lightweight architecture with thinking overhead. |
| Baidu: ERNIE 4.5 21B A3B | $5.60 | 54 | 50 | 58 | 85 | 60 | 131K | 56 | Evidence lacks exact SWE-bench/GPQA scores. Mapped 21B (3B active) lightweight MoE tier to Coding 50, Logic 55. Speed rated 85 for sparse architecture. 'Thinking' SKU indicates native reasoning. Multimodal claimed; used small-tier default. |
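
One way to read a cheapest-first table like this is to set a quality floor and take the lowest-cost model that clears it. The sketch below does that for four rows copied from the leaderboard (model, est. monthly, overall score). The helper name `cheapest_meeting_floor` and the floor values are hypothetical choices for illustration, and "Free" or "VARIABLE" rows are left out because they carry no comparable dollar figure.

```python
# A few rows copied from the leaderboard: (model, est. monthly USD, overall score).
ROWS = [
    ("Meta: Llama 3.1 8B Instruct", 1.10, 41),
    ("OpenAI: gpt-oss-20b", 2.56, 81),
    ("Arcee AI: Trinity Mini", 3.30, 87),
    ("Qwen: Qwen3.5-Flash", 5.20, 92),
]


def cheapest_meeting_floor(rows, min_overall):
    """Return the lowest-cost row whose overall score is at least min_overall, or None."""
    eligible = [row for row in rows if row[2] >= min_overall]
    return min(eligible, key=lambda row: row[1], default=None)


print(cheapest_meeting_floor(ROWS, 80))  # ('OpenAI: gpt-oss-20b', 2.56, 81)
print(cheapest_meeting_floor(ROWS, 90))  # ('Qwen: Qwen3.5-Flash', 5.2, 92)
print(cheapest_meeting_floor(ROWS, 95))  # None
```

The same filter can be applied to a single column such as Coding or Math instead of the overall score when one capability matters more than the rest.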
AI ROI Leaderboard & Discovery by LeadsCalc
Whitelabel est. monthly leaderboard for your site
Embed the interactive cheapest (est. monthly) view on your own domain, with whitelabel branding, lead capture, and the same workload sliders your prospects already use on LeadsCalc.