Best AI Models for Math 2026: Quantitative LLM Workloads
Compare top math LLMs in 2026 with GSM8K and MATH benchmarks alongside live API pricing. For finance, STEM, and analytics teams in the US, Canada, and Australia.
Math-focused LLM rankings with API spend context in 2026
Math-heavy workloads punish silent errors. This tab emphasizes quantitative benchmarks and shows estimated API cost alongside them, so data and engineering groups in the United States, Canada, and Australia can weigh accuracy targets against budget before they commit to a model for spreadsheets, tutoring copilots, or internal calculators.
Workload & pricing toggles
Same three scenarios as the main AI API calculator: moderate traffic, large RAG-style context, or per-request max tokens with a lower request count.
Include Vision / Image Processing
Off by default: cost estimates for vision-capable models exclude image fees. Turn on to include them.
Use Cached Pricing
Enable for 50% off input tokens where cached rates apply.
Deep Reasoning / Thinking Mode
When enabled, a model's hidden reasoning (extended thinking) tokens are charged like output tokens.
Batch Pricing
Enable for 50% off input and output tokens where batch or async pricing applies.
Estimated monthly values for cached and batch pricing change only after the pipeline sets supports_caching or supports_batch for a model in Supabase. The toggles here narrow the table to models whose catalog entry or provider typically supports those modes.
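The toggles above can be read as simple multipliers on a token-based cost estimate. The following is a minimal sketch under stated assumptions, not the calculator's actual implementation: the `ModelPricing` record, the `estimate_monthly_cost` function, its parameters, and the sample prices and volumes are all hypothetical. Only the 50% cached-input discount, the 50% batch discount on input and output, and the billing of reasoning tokens at the output rate come from the toggle descriptions. How the cached and batch discounts combine is also an assumption (here they stack multiplicatively).

```python
from dataclasses import dataclass


@dataclass
class ModelPricing:
    """Hypothetical per-model pricing record (USD per million tokens)."""
    input_per_mtok: float
    output_per_mtok: float
    supports_caching: bool = False  # flag names mirror the ones mentioned above
    supports_batch: bool = False


def estimate_monthly_cost(
    pricing: ModelPricing,
    requests_per_month: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    reasoning_tokens_per_request: int = 0,
    use_cached: bool = False,
    use_batch: bool = False,
    use_reasoning: bool = False,
) -> float:
    """Estimate monthly API spend in USD under the toggle rules described above."""
    input_rate = pricing.input_per_mtok
    output_rate = pricing.output_per_mtok

    # Cached pricing: 50% off input tokens, only where the model supports it.
    if use_cached and pricing.supports_caching:
        input_rate *= 0.5

    # Batch pricing: 50% off input and output, only where the model supports it.
    # Assumption: this stacks multiplicatively with the cached discount.
    if use_batch and pricing.supports_batch:
        input_rate *= 0.5
        output_rate *= 0.5

    # Thinking mode: reasoning tokens are billed like output tokens.
    billed_output = output_tokens_per_request
    if use_reasoning:
        billed_output += reasoning_tokens_per_request

    per_request = (
        input_tokens_per_request * input_rate + billed_output * output_rate
    ) / 1_000_000
    return round(per_request * requests_per_month, 2)


# Illustrative figures only: $1 / $4 per million input / output tokens,
# 10,000 requests per month, 2,000 input and 500 output tokens per request.
example = ModelPricing(1.0, 4.0, supports_caching=True, supports_batch=True)
baseline = estimate_monthly_cost(example, 10_000, 2_000, 500)
cached = estimate_monthly_cost(example, 10_000, 2_000, 500, use_cached=True)
batch = estimate_monthly_cost(example, 10_000, 2_000, 500, use_batch=True)
```

With these illustrative figures, the baseline comes to $40.00 per month, $30.00 with cached pricing, and $20.00 with batch pricing. If the model's support flag is unset, the corresponding toggle leaves the estimate unchanged, which matches the note above about supports_caching and supports_batch.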
Magic quadrant (top 15)
X: est. monthly · Y: Math · Dot: provider color
Full leaderboard
Showing 48 of 327 models.
| Model | Est. monthly | ROI score | Coding | Reasoning | Speed | Math | Context | Overall |
|---|---|---|---|---|---|---|---|---|
| OpenAI: gpt-oss-20b | $2.60 | 84 | 96 | 97 | 85 | 98 | 131K | 97 |
| NVIDIA: Nemotron Nano 9B V2 | $3.20 | 75 | 72 | 84 | 85 | 98 | 131K | 85 |
| DeepSeek: DeepSeek V4 Pro | $104.40 | 56 | 67 | 80 | 55 | 97 | 1.0M | 81 |
| NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | $8.00 | 66 | 74 | 79 | 70 | 97 | 131K | 82 |
| Google: Gemma 4 31B | $9.00 | 72 | 97 | 92 | 70 | 97 | 262K | 94 |
| Claude Sonnet 4.6 | $270.00 | 56 | 93 | 75 | 70 | 96 | 1.0M | 85 |
| Z.ai: GLM 4.7 | $32.60 | 64 | 94 | 87 | 85 | 96 | 203K | 91 |
| Grok 3 | $270.00 | 61 | 93 | 93 | 55 | 96 | 131K | 94 |
| Qwen: Qwen3.5-35B-A3B | $19.50 | 64 | 76 | 89 | 70 | 95 | 262K | 87 |
| Xiaomi: MiMo-V2.5-Pro | $70.00 | 57 | 78 | 78 | 70 | 95 | 1.0M | 82 |
| Meta: Llama 3.1 70B Instruct | $20.00 | 28 | 0 | 0 | 55 | 95 | 131K | 24 |
| Anthropic: Claude Opus 4 | $1,350.00 | 55 | 85 | 81 | 55 | 95 | 200K | 86 |
| Claude Opus 4.6 | $450.00 | 56 | 85 | 79 | 55 | 95 | 1.0M | 85 |
| Anthropic: Claude Opus 4.5 | $450.00 | 57 | 85 | 84 | 55 | 95 | 200K | 87 |
| Tencent: Hunyuan A13B Instruct | $11.30 | 64 | 64 | 86 | 55 | 94 | 131K | 83 |
| Qwen: Qwen3.5 397B A17B | $39.00 | 62 | 85 | 89 | 60 | 92 | 262K | 89 |
| MoonshotAI: Kimi K2 Thinking | $49.00 | 56 | 65 | 79 | 55 | 92 | 262K | 79 |
| Qwen: Qwen3.5-27B | $23.40 | 64 | 80 | 91 | 70 | 92 | 262K | 88 |
| Qwen: Qwen3 32B | $5.60 | 73 | 85 | 89 | 60 | 92 | 41K | 89 |
| OpenAI: GPT-5.1-Codex-Mini | $30.00 | 61 | 84 | 82 | 85 | 92 | 400K | 85 |
| Mistral: Mistral Medium 3 | $36.00 | 63 | 92 | 87 | 70 | 91 | 131K | 89 |
| NVIDIA: Nemotron 3 Nano 30B A3B | $4.00 | 71 | 74 | 79 | 85 | 91 | 262K | 81 |
| Z.ai: GLM 4.5V | $42.00 | 55 | 64 | 77 | 60 | 90 | 66K | 77 |
| Qwen: Qwen3 Max | $70.20 | 61 | 93 | 87 | 70 | 89 | 262K | 89 |
| EssentialAI: Rnj 1 Instruct | $7.50 | 66 | 75 | 81 | 85 | 89 | 33K | 81 |
| AllenAI: Olmo 3 32B Think | $11.00 | 67 | 90 | 84 | 50 | 88 | 66K | 87 |
| Elephant | Free | 78 | 90 | 83 | 70 | 88 | 262K | 86 |
| Nous: Hermes 4 70B | $9.20 | 66 | 85 | 81 | 60 | 88 | 131K | 84 |
| AionLabs: Aion-1.0-Mini | $42.00 | 61 | 85 | 85 | 95 | 88 | 131K | 86 |
| Baidu: ERNIE 4.5 21B A3B | $5.60 | 72 | 85 | 89 | 60 | 87 | 120K | 88 |
| xAI: Grok 4 | $270.00 | 55 | 87 | 80 | 70 | 87 | 256K | 83 |
| Qwen: Qwen3 VL 32B Instruct | $8.32 | 69 | 88 | 88 | 65 | 87 | 131K | 88 |
| Auto Router | VARIABLE | 77 | 84 | 83 | 70 | 86 | 2.0M | 84 |
| Qwen: Qwen3 Coder Next | $13.60 | 62 | 93 | 73 | 65 | 85 | 262K | 81 |
| Z.ai: GLM 5.1 | $77.00 | 61 | 92 | 89 | 70 | 85 | 203K | 89 |
| Z.ai: GLM 4.6V | $21.00 | 53 | 41 | 75 | 70 | 85 | 131K | 69 |
| Prime Intellect: INTELLECT-3 | $19.00 | 60 | 77 | 79 | 65 | 85 | 131K | 80 |
| Qwen: Qwen3.5-122B-A10B | $31.20 | 62 | 81 | 90 | 55 | 85 | 262K | 87 |
| Meta: Llama 3.1 8B Instruct | $1.30 | 59 | 0 | 34 | 85 | 85 | 16K | 38 |
| NVIDIA: Nemotron 3 Super | $8.10 | 65 | 79 | 79 | 55 | 85 | 262K | 80 |
| Z.ai: GLM 5 | $44.80 | 64 | 99 | 93 | 55 | 84 | 203K | 92 |
| Qwen: Qwen3.5-9B | $5.50 | 68 | 65 | 87 | 70 | 83 | 262K | 80 |
| Qwen: Qwen3 235B A22B Thinking 2507 | $20.93 | 62 | 74 | 89 | 50 | 83 | 131K | 84 |
| DeepSeek: DeepSeek V3.2 | $13.86 | 61 | 67 | 84 | 55 | 82 | 131K | 79 |
| Nous: Hermes 4 405B | $70.00 | 56 | 85 | 79 | 70 | 82 | 131K | 81 |
| Qwen: Qwen3 30B A3B Thinking 2507 | $7.20 | 66 | 85 | 79 | 50 | 82 | 131K | 81 |
| Z.ai: GLM 4.5 | $46.00 | 57 | 85 | 79 | 65 | 82 | 131K | 81 |
| Qwen: Qwen3 30B A3B | $6.00 | 67 | 85 | 79 | 60 | 82 | 41K | 81 |
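To pair an accuracy target with a budget, the leaderboard can be filtered on its Math score and estimated monthly cost. The following is a minimal sketch using a handful of rows copied from the table above; the `shortlist` helper and the thresholds (Math of at least 97, at most $10 per month) are hypothetical illustrations, not part of the leaderboard itself.

```python
# A few rows copied from the leaderboard: (model, est. monthly USD, Math score).
ROWS = [
    ("OpenAI: gpt-oss-20b", 2.60, 98),
    ("NVIDIA: Nemotron Nano 9B V2", 3.20, 98),
    ("DeepSeek: DeepSeek V4 Pro", 104.40, 97),
    ("Google: Gemma 4 31B", 9.00, 97),
    ("Claude Sonnet 4.6", 270.00, 96),
    ("Qwen: Qwen3 32B", 5.60, 92),
]


def shortlist(rows, min_math, max_monthly_usd):
    """Return models meeting the Math floor and budget cap, cheapest first."""
    kept = [r for r in rows if r[2] >= min_math and r[1] <= max_monthly_usd]
    return sorted(kept, key=lambda r: r[1])


for model, cost, math_score in shortlist(ROWS, min_math=97, max_monthly_usd=10.0):
    print(f"{model}: ${cost:.2f}/month, Math {math_score}")
```

Sorting by cost after filtering puts the cheapest model that clears the accuracy floor first. The estimated monthly figures depend on the selected workload and pricing toggles, so any such shortlist should be regenerated when those change.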