Fastest LLM APIs 2026: Low-Latency AI Models Compared
Rank the fastest LLM APIs in 2026 using throughput and TTFT latency signals with pricing context. Shortlist responsive AI models for real-time apps in the US, Canada, and Australia.
Speed-ranked LLMs with API cost on the same canvas in 2026
Speed scores reflect interactive-class behavior (smaller fast tiers versus heavy flagships) and are grounded in benchmark metadata and tier cues rather than any single vendor's marketing latency claim. Product teams across the United States, Canada, and Australia use this tab to protect UX on chat surfaces while still seeing what responsiveness costs at production token volumes.
Workload & pricing toggles
Same three scenarios as the main AI API calculator: moderate traffic, large RAG-style context, or per-request max tokens with a lower request count.
Include Vision / Image Processing
Off by default: cost estimates exclude image fees for vision-capable models. Turn on to include image fees.
Use Cached Pricing
Enable for 50% off input tokens where cached rates apply.
Deep Reasoning / Thinking Mode
When enabled, the model's hidden reasoning (extended thinking) tokens are charged like output tokens.
Batch Pricing
Enable for 50% off input and output tokens where batch/async pricing applies.
Est. monthly values for cached or batch pricing change only after the pipeline sets supports_caching or supports_batch for a model in Supabase. The toggles here narrow the table to models whose catalog entry or provider typically supports those modes.
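As a rough illustration of how these toggles could feed an estimate, here is a minimal Python sketch. It is not the calculator's actual implementation: the `ModelPricing` record, its price fields, and the example workload are hypothetical, and stacking the cached and batch discounts multiplicatively is an assumption. Only the 50% discounts, the reasoning-as-output rule, and the `supports_caching` / `supports_batch` flag names come from the text above.

```python
from dataclasses import dataclass


@dataclass
class ModelPricing:
    """Hypothetical pricing record; only the supports_* names come from the page."""
    input_per_mtok: float    # USD per 1M input tokens (assumed field)
    output_per_mtok: float   # USD per 1M output tokens (assumed field)
    supports_caching: bool = False
    supports_batch: bool = False


def estimate_monthly_cost(
    pricing: ModelPricing,
    requests_per_month: int,
    input_tokens: int,
    output_tokens: int,
    reasoning_tokens: int = 0,
    use_cached: bool = False,
    use_batch: bool = False,
    use_reasoning: bool = False,
) -> float:
    """Estimate monthly USD cost for one model under the selected toggles."""
    in_rate = pricing.input_per_mtok
    out_rate = pricing.output_per_mtok

    # Cached pricing: 50% off input tokens, only if the model is flagged for it.
    if use_cached and pricing.supports_caching:
        in_rate *= 0.5

    # Batch pricing: 50% off input and output, only if the model is flagged.
    # Assumption: this stacks multiplicatively with the cached discount.
    if use_batch and pricing.supports_batch:
        in_rate *= 0.5
        out_rate *= 0.5

    # Reasoning tokens are billed like output tokens when the toggle is on.
    billed_output = output_tokens + (reasoning_tokens if use_reasoning else 0)

    per_request = (input_tokens * in_rate + billed_output * out_rate) / 1_000_000
    return round(requests_per_month * per_request, 2)


# Hypothetical model and workload, for illustration only.
demo = ModelPricing(0.10, 0.40, supports_caching=True, supports_batch=True)
print(estimate_monthly_cost(demo, 10_000, 1_000, 500))
print(estimate_monthly_cost(demo, 10_000, 1_000, 500, use_cached=True))
```

Gating each discount on the model's flag mirrors the note above: switching a toggle on leaves the estimate unchanged for a model whose flag has not been set.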
Magic quadrant (top 15)
X: est. monthly · Y: Speed · Dot: provider color
Full leaderboard
Showing 48 of 327 models.
| Model | Est. monthly | ROI score | Coding | Reasoning | Speed | Math | Context | Overall |
|---|---|---|---|---|---|---|---|---|
| LiquidAI: LFM2-24B-A2B | $2.40 | 43 | 0 | 44 | 97 | 0 | 33K | 22 |
| Relace: Relace Search | $70.00 | 52 | 85 | 73 | 95 | 65 | 256K | 74 |
| DeepSeek: DeepSeek V4 Flash | $8.40 | 67 | 90 | 83 | 95 | 80 | 1.0M | 84 |
| Amazon: Nova Micro 1.0 | $2.80 | 72 | 69 | 82 | 95 | 69 | 128K | 76 |
| xAI: Grok 3 Mini Beta | $17.00 | 63 | 90 | 87 | 95 | 77 | 131K | 85 |
| Inception: Mercury 2 | $17.50 | 51 | 67 | 73 | 95 | 38 | 128K | 63 |
| AionLabs: Aion-1.0-Mini | $42.00 | 61 | 85 | 85 | 95 | 88 | 131K | 86 |
| xAI: Grok 3 Mini | $17.00 | 63 | 88 | 86 | 92 | 76 | 131K | 84 |
| Google: Gemini 3.1 Flash Lite Preview | $25.00 | 24 | 0 | 39 | 90 | 0 | 1.0M | 19 |
| Qwen: Qwen3.5-Flash | $5.20 | 23 | 0 | 0 | 90 | 0 | 1.0M | 0 |
| Mistral: Mistral Small Creative | $7.00 | 66 | 88 | 83 | 90 | 69 | 33K | 81 |
| Z.ai: GLM 4.7 Flash | $6.40 | 66 | 85 | 79 | 90 | 75 | 203K | 80 |
| Morph: Morph V3 Fast | $44.00 | 57 | 96 | 78 | 90 | 70 | 82K | 80 |
| Mistral: Mistral 7B Instruct v0.1 | $6.30 | 65 | 85 | 79 | 90 | 70 | 3K | 78 |
| StepFun: Step 3.5 Flash | $7.00 | 62 | 81 | 73 | 90 | 71 | 262K | 74 |
| Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview) | $50.00 | 11 | 0 | 0 | 90 | 0 | 66K | 0 |
| OpenAI: GPT-5.2 Chat | $210.00 | 54 | 85 | 83 | 90 | 73 | 128K | 81 |
| Mancer: Weaver (alpha) | $40.00 | 12 | 0 | 0 | 85 | 0 | 8K | 0 |
| ByteDance Seed: Seed-2.0-Mini | $8.00 | 60 | 85 | 66 | 85 | 70 | 262K | 72 |
| MiniMax: MiniMax M2-her | $24.00 | 14 | 0 | 0 | 85 | 0 | 66K | 0 |
| OpenAI: gpt-oss-20b | $2.60 | 84 | 96 | 97 | 85 | 98 | 131K | 97 |
| Free Models Router | Free | 59 | 43 | 55 | 85 | 55 | 200K | 52 |
| MiniMax: MiniMax M2.5 | $17.50 | 60 | 80 | 83 | 85 | 72 | 197K | 79 |
| Mistral: Ministral 3 3B 2512 | $5.00 | 31 | 0 | 28 | 85 | 0 | 131K | 14 |
| IBM: Granite 4.0 Micro | $1.78 | 78 | 81 | 76 | 85 | 81 | 131K | 78 |
| Mistral: Voxtral Small 24B 2507 | $7.00 | 20 | 0 | 0 | 85 | 0 | 32K | 0 |
| Arcee AI: Trinity Mini | $3.30 | 73 | 82 | 80 | 85 | 80 | 131K | 81 |
| xAI: Grok 4.1 Fast | $13.00 | 50 | 39 | 81 | 85 | 34 | 2.0M | 59 |
| Xiaomi: MiMo-V2-Flash | $6.50 | 56 | 65 | 60 | 85 | 63 | 262K | 62 |
| Writer: Palmyra X5 | $84.00 | 49 | 65 | 70 | 85 | 70 | 1.0M | 69 |
| ByteDance Seed: Seed 1.6 Flash | $6.00 | 66 | 87 | 76 | 85 | 77 | 262K | 79 |
| Xiaomi: MiMo-V2.5 | $36.00 | 57 | 89 | 75 | 85 | 75 | 1.0M | 78 |
| Pareto Code Router | VARIABLE | 74 | 88 | 73 | 85 | 80 | 200K | 78 |
| MiniMax: MiniMax M2 | $20.20 | 50 | 70 | 63 | 85 | 55 | 197K | 63 |
| Mistral: Ministral 3 8B 2512 | $7.50 | 40 | 0 | 38 | 85 | 67 | 262K | 36 |
| Mistral: Codestral 2508 | $21.00 | 24 | 70 | 0 | 85 | 0 | 256K | 18 |
| OpenAI: GPT-5 Mini | $30.00 | 58 | 90 | 78 | 85 | 74 | 400K | 80 |
| Z.ai: GLM 4.7 | $32.60 | 64 | 94 | 87 | 85 | 96 | 203K | 91 |
| NVIDIA: Nemotron Nano 9B V2 | $3.20 | 75 | 72 | 84 | 85 | 98 | 131K | 85 |
| Z.ai: GLM 4.5 Air | $13.70 | 51 | 24 | 71 | 85 | 81 | 131K | 62 |
| Switchpoint Router | VARIABLE | 30 | 0 | 0 | 85 | 0 | 131K | 0 |
| Mistral: Devstral Small 1.1 | $7.00 | 50 | 54 | 53 | 85 | 53 | 131K | 53 |
| Mistral: Mistral Small 3.2 24B | $5.00 | 69 | 92 | 83 | 85 | 69 | 128K | 82 |
| OpenAI: GPT-5.1-Codex-Mini | $30.00 | 61 | 84 | 82 | 85 | 92 | 400K | 85 |
| OpenAI: GPT-5.1 Chat | $150.00 | 58 | 91 | 88 | 85 | 77 | 128K | 86 |
| EssentialAI: Rnj 1 Instruct | $7.50 | 66 | 75 | 81 | 85 | 89 | 33K | 81 |
| TheDrummer: Skyfall 36B V2 | $30.00 | 13 | 0 | 0 | 85 | 0 | 33K | 0 |
| Meta: Llama 3 8B Instruct | $1.60 | 68 | 62 | 69 | 85 | 30 | 8K | 58 |
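One way to turn the leaderboard into a shortlist for a real-time app is to filter on Speed, Overall, and est. monthly cost, then sort the fastest and cheapest models first. The Python sketch below uses a few rows copied from the table. The `Row` type, the `shortlist` helper, and the threshold values are hypothetical choices for illustration, not something the leaderboard itself prescribes.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    est_monthly_usd: float
    speed: int
    overall: int


# A handful of rows copied from the leaderboard table above.
ROWS = [
    Row("LiquidAI: LFM2-24B-A2B", 2.40, 97, 22),
    Row("Relace: Relace Search", 70.00, 95, 74),
    Row("DeepSeek: DeepSeek V4 Flash", 8.40, 95, 84),
    Row("Amazon: Nova Micro 1.0", 2.80, 95, 76),
    Row("xAI: Grok 3 Mini Beta", 17.00, 95, 85),
    Row("OpenAI: GPT-5.2 Chat", 210.00, 90, 81),
    Row("OpenAI: gpt-oss-20b", 2.60, 85, 97),
]


def shortlist(rows, min_speed=90, min_overall=75, max_monthly=20.0):
    """Keep models that clear all three thresholds (assumed values),
    ordered by Speed descending, then est. monthly cost ascending."""
    keep = [
        r for r in rows
        if r.speed >= min_speed
        and r.overall >= min_overall
        and r.est_monthly_usd <= max_monthly
    ]
    return sorted(keep, key=lambda r: (-r.speed, r.est_monthly_usd))


for r in shortlist(ROWS):
    print(f"{r.model}: speed {r.speed}, ${r.est_monthly_usd:.2f}/mo")
```

With these thresholds the highest Speed score (LiquidAI: LFM2-24B-A2B, 97) drops out because its Overall score is 22, which shows why Speed alone is a weak selection criterion.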