Fastest LLM APIs 2026: Low-Latency AI Models Compared
Rank the fastest LLM APIs in 2026 using throughput and TTFT latency signals with pricing context. Shortlist responsive AI models for real-time apps in the US, Canada, and Australia.
Speed-ranked LLMs with API cost on the same canvas in 2026
Speed scores reflect interactive-class behavior: smaller fast tiers vs. heavy flagships, grounded in benchmark metadata and tier cues—not a single vendor’s marketing latency claim. Product teams across the United States, Canada, and Australia use this tab to protect UX on chat surfaces while still eyeballing what responsiveness costs at production token volumes.
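The two latency signals named above, time to first token (TTFT) and throughput, can be measured directly from a streamed response. The sketch below is a minimal illustration, not the leaderboard's own scoring method: it assumes you have recorded when the request was sent and when each token arrived, and the function name, the dataclass, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LatencyMetrics:
    ttft_s: float        # seconds from request start to first token
    tokens_per_s: float  # decode throughput after the first token


def latency_metrics(request_start: float, token_times: Sequence[float]) -> LatencyMetrics:
    """Compute TTFT and decode throughput from token arrival timestamps.

    Assumption: throughput is measured over the window between the first
    and last token, so queueing and prefill time count toward TTFT only.
    """
    if not token_times:
        raise ValueError("no tokens were received")
    ttft = token_times[0] - request_start
    decode_window = token_times[-1] - token_times[0]
    # With a single token there is no decode window to measure.
    tps = (len(token_times) - 1) / decode_window if decode_window > 0 else 0.0
    return LatencyMetrics(ttft_s=ttft, tokens_per_s=tps)


# Hypothetical timestamps in seconds: request sent at t=0.0, five tokens arrive.
example = latency_metrics(0.0, [0.25, 0.26, 0.27, 0.28, 0.29])
print(f"TTFT {example.ttft_s:.2f}s, {example.tokens_per_s:.0f} tok/s")
```

Keeping the two numbers separate matters for chat surfaces: a model with high tokens per second can still feel slow if its TTFT is long.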
Workload & pricing toggles
Same three scenarios as the main AI API calculator: moderate traffic, large RAG-style context, or per-request max tokens with a lower request count.
Include Vision / Image Processing
Off: cost estimates exclude image fees for vision-capable models.
Turn on to include image fees.
Use Cached Pricing
Enable for 50% off input tokens where cached rates apply.
Deep Reasoning / Thinking Mode
When enabled, the model's hidden reasoning (extended thinking) tokens are charged like output tokens.
Batch Pricing
Enable for 50% off input and output tokens where batch/async pricing applies.
Estimated monthly values for cached and batch pricing change only after the pipeline sets supports_caching or supports_batch in Supabase. The toggles here narrow the table to models whose catalog or provider typically supports those modes.
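The toggle arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not the calculator's actual code: the per-million-token prices and token volumes are hypothetical, image fees are left out, and stacking the cached and batch discounts multiplicatively is a guess the page does not confirm. Only the 50% figures, the treatment of reasoning tokens as output, and the supports_caching / supports_batch flag names come from the text.

```python
from dataclasses import dataclass


@dataclass
class ModelPricing:
    input_per_mtok: float    # USD per 1M input tokens (hypothetical value)
    output_per_mtok: float   # USD per 1M output tokens (hypothetical value)
    supports_caching: bool = False
    supports_batch: bool = False


def estimate_monthly_cost(
    pricing: ModelPricing,
    input_tokens: int,
    output_tokens: int,
    reasoning_tokens: int = 0,
    use_cached: bool = False,
    use_batch: bool = False,
    thinking: bool = False,
) -> float:
    """Estimate monthly spend in USD for one model and one workload."""
    input_cost = input_tokens / 1e6 * pricing.input_per_mtok
    # Reasoning tokens are billed like output tokens when thinking is on.
    billable_output = output_tokens + (reasoning_tokens if thinking else 0)
    output_cost = billable_output / 1e6 * pricing.output_per_mtok
    # Discounts apply only when the model is flagged as supporting the mode.
    if use_cached and pricing.supports_caching:
        input_cost *= 0.5
    if use_batch and pricing.supports_batch:
        input_cost *= 0.5   # assumption: stacks with the cached discount
        output_cost *= 0.5
    return input_cost + output_cost


# Hypothetical model and workload: 50M input and 10M output tokens per month.
demo = ModelPricing(0.25, 1.00, supports_caching=True, supports_batch=True)
base = estimate_monthly_cost(demo, 50_000_000, 10_000_000)
cached = estimate_monthly_cost(demo, 50_000_000, 10_000_000, use_cached=True)
batch = estimate_monthly_cost(demo, 50_000_000, 10_000_000, use_batch=True)
print(base, cached, batch)  # 22.5 16.25 11.25
```

For a model whose flags are unset, turning a toggle on leaves the estimate unchanged, which matches the note that values only move once the pipeline sets the flag.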
Magic quadrant (top 15)
X: est. monthly · Y: Speed · Dot: provider color
Full leaderboard
Showing 48 of 365 models.
| Model | Est. monthly | ROI score | Coding | Reasoning | Speed | Math | Context | Overall | Scoring notes |
|---|---|---|---|---|---|---|---|---|---|
| Inception: Mercury 2 | $17.50 | 40 | 43 | 44 | 100 | 44 | 128K | 44 | Fast-dLLM v2 (Mercury 2) averages 43.5 across HumanEval, GPQA, and MMLU. As a 1B-scale lightweight model, coding and logic map to ~43. Speed is 100 due to >1,000 tokens/sec diffusion generation. |
| Relace: Relace Apply 3 | $46.50 | 47 | 70 | 65 | 100 | 50 | 256K | 63 | Evidence explicitly states benchmarks are unavailable. As a specialized code-patching model with 10,000 tok/s throughput, speed is scored 100. Coding and logic are inferred moderately (70/60) due to lack of SWE-bench or GPQA data. |
| Google: Gemini 3.1 Flash Lite Preview | $25.00 | 53 | 55 | 78 | 98 | 70 | 1.0M | 70 | GPQA Diamond at 86.9% maps logic to 85. Coding score of 30.1 maps to 55. MMMU Pro at 76.8% sets multimodal to 77. As a Lite tier, speed is exceptionally high (381 tok/s) mapping to 98. |
| Google: Gemini 3.1 Flash Lite | $25.00 | 50 | 25 | 83 | 98 | 66 | 1.0M | 64 | SWE-bench Verified at 22% maps to 25 coding. GPQA Diamond at 86.9% maps to 87 logic. As a Lite tier, it excels in speed (381 t/s, 98) but trails flagships in coding. |
| Llama Guard 3 8B | $19.66 | 46 | 55 | 60 | 98 | 45 | 131K | 55 | Based on Llama 3 8B stats: HumanEval 62.2% (Coding 55), MMLU 68.4% (Logic 60), MATH 30% (Math 45). As an 8B lightweight tier, scores are adjusted down versus flagships. Speed is 98 reflecting 765 tps on Groq. |
| LiquidAI: LFM2-24B-A2B | $2.40 | 61 | 71 | 44 | 97 | 55 | 128K | 54 | Benchable.ai cites Coding 71%, Reasoning 50%, Instruction 38%, and Speed at 97th percentile. Mapped directly to 0-100 scale. As a lightweight 2B active MoE, it prioritizes speed over flagship-level logic and coding. |
| OpenAI: GPT-5.4 Nano | $20.50 | 56 | 75 | 75 | 95 | 70 | 400K | 74 | SWE-Bench Pro at 52.4% maps to 75 coding. Humanity's Last Exam at 24.3% maps to 70 logic. As a 'Nano' tier model, it is optimized for speed (95) over deep reasoning, reflecting its lightweight architecture. |
| Relace: Relace Search | $70.00 | 41 | 50 | 60 | 95 | 40 | 256K | 53 | No standard benchmarks (SWE-bench, GPQA) provided. Scores estimated for a specialized codebase search subagent. Speed rated 95 based on claims of 10,000 tokens/sec. Native reasoning confirmed via explicit mention of reasoning tokens. |
| xAI: Grok 3 Mini | $17.00 | 63 | 85 | 85 | 95 | 85 | 131K | 85 | Web digest notes Grok 3 Mini outperforms Grok-2 mini (HumanEval >87.2%, MATH >70.2%). Mapped to ~85 for coding/math. As a 'Mini' tier with native reasoning, it prioritizes speed (100 tok/s, mapped to 95) over flagship-level logic. |
| Anthropic: Claude 3 Haiku | $22.50 | 53 | 65 | 73 | 95 | 65 | 200K | 69 | Claude 3 Haiku (Lightweight tier) scores MMLU 76.7%, HumanEval 75.9%, GSM8K 88.9%, and MMMU 50.2%. Mapped coding to 65 and logic to 70, reflecting its distilled nature compared to flagship models. Speed is rated 95. |
| Google: Gemini 3.5 Flash | $150.00 | 57 | 85 | 88 | 95 | 75 | 1.0M | 84 | SWE-bench Verified 78.0% and GPQA Diamond 90.4% map to 85 and 90. Despite being a lightweight Flash tier (speed 95), explicit evidence dictates high capability scores, though typically lower than Pro. |
| Qwen: Qwen3 Coder Flash | $17.55 | 54 | 70 | 68 | 95 | 65 | 1.0M | 68 | Evidence lacks exact benchmark numbers but notes Qwen3 Coder Flash is a speed-optimized, lightweight tier. Scores inferred cautiously for a Flash model, prioritizing speed (95) over coding/logic compared to the flagship Coder Plus. |
| Google Gemini Flash Latest | $150.00 | 48 | 70 | 65 | 95 | 75 | 1.0M | 69 | Evidence cites HumanEval 74.3% (Coding ~70), GPQA 51.0% (Logic ~60), and MMMU 62.3% (Multimodal ~65). As a lightweight Flash tier, scores are adjusted lower than Pro flagships, while Speed is rated high (~95) for its class. |
| Qwen: Qwen3.6 Flash | $18.75 | 60 | 85 | 78 | 95 | 80 | 1.0M | 80 | SWE-bench Verified 73.4 (via 35B-A3B base) maps to 85 coding. Flash tier yields 95 speed (119 tok/s). Logic/Math inferred ~75-80 due to missing GPQA/MMLU scores. Native reasoning supported. |
| OpenAI: GPT-5.1-Codex-Mini | $30.00 | 59 | 88 | 78 | 95 | 80 | 400K | 81 | SWE-bench Verified 55.0% (mapped to 88) and GPQA Diamond 52.0% (mapped to 75) show strong capabilities. As a Mini tier, it prioritizes speed (175 tok/s, mapped to 95) while maintaining solid reasoning and coding performance. |
| Qwen: Qwen3 Coder Next | $12.40 | 61 | 85 | 72 | 95 | 85 | 262K | 78 | HumanEval 92.7% and GPQA-D 42.4% map to 85 coding and 55 logic. IFEval 89.6% yields 88 instruction. Speed is 95 based on 162 tok/s. As an 80B (3B active) efficient model, logic is appropriately scaled. |
| xAI: Grok 4 Fast | $13.00 | 41 | 40 | 43 | 95 | 45 | 2.0M | 43 | Evidence cites a 43.5 average across HumanEval, GPQA, and MMLU for this 1B-scale Fast model. Mapped to ~40-45 for coding and logic. As a lightweight tier, speed is rated very high. |
| Google: Gemini 2.5 Flash Lite | $8.00 | 54 | 45 | 68 | 95 | 65 | 1.0M | 61 | Evidence lacks exact Flash-Lite scores but notes it underperforms Flash (GPQA 78.3%, MMMU 76.7%). As a Lite tier, scores are adjusted downward (Logic 65, Coding 45). Speed is heavily weighted (95) due to 68 tok/s and ultra-low latency. |
| Mistral: Ministral 3 8B 2512 | $7.50 | 52 | 45 | 58 | 95 | 65 | 262K | 56 | GPQA 47.1 and MMLU 76.1% map to Logic 55. LiveCodeBench 30.3 maps to Coding 45. MATH 62.6% maps to Math 65. As an 8B lightweight tier, it scores lower on reasoning but achieves 161 tok/s (Speed 95). |
| OpenAI: GPT-5.1 Chat | $150.00 | 54 | 85 | 78 | 95 | 80 | 128K | 80 | HumanEval 91% maps to 85 coding. GPQA 53.6% maps to 75 logic. MMMU 69.1% maps to 75 multimodal. As a lightweight 'Instant' tier model, speed is rated 95, with capabilities adjusted below flagship levels. |
| Baidu: Qianfan-OCR-Fast | $55.30 | 37 | 40 | 50 | 95 | 40 | 66K | 45 | No benchmarks provided. Inferred scores based on 'Fast' tier and OCR specialization. Speed is exceptional (claimed 1M tokens/s). Multimodal scored high for OCR focus; coding and logic scored lower as a specialized lightweight model. |
| Claude Haiku 4.5 | $90.00 | 59 | 92 | 85 | 95 | 80 | 200K | 86 | SWE-bench Verified 73.3% maps to 92 coding. Though a lightweight Haiku tier, evidence explicitly states it matches Sonnet 4's reasoning and coding, justifying high logic (85) alongside top-tier speed (95). |
| Qwen: Qwen3.5-Flash | $5.20 | 75 | 88 | 92 | 95 | 95 | 1.0M | 92 | SWE-bench Verified at 69.2% maps to 88 coding. GPQA Diamond at 84.2% maps to 92 logic. IFEval 91.9% maps to 92 instruction. As a Flash tier, speed is 95, though its reasoning capabilities rival flagship models. |
| Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview) | $50.00 | 47 | 60 | 65 | 95 | 60 | 131K | 63 | No text benchmarks provided; inferred Coding/Logic at 60 for Flash-tier. Speed set to 95 (102 tok/s). Multimodal set to 95 based on 'Pro-level visual quality' and 88% Graphic Design ELO. Vision price derived from $0.0005/K images. |
| Google: Gemini 2.5 Flash Lite Preview 09-2025 | $8.00 | 52 | 45 | 67 | 95 | 55 | 1.0M | 58 | Based on GPQA (65.1-70.9%) and LiveCodeBench (64.1-68.8%), logic and coding map to 68 and 45. As a 'Flash Lite' tier, speed is heavily weighted (95), reflecting its ultra-low latency design over flagship-level reasoning. |
| Meta: Llama 3.2 1B Instruct | $3.09 | 45 | 25 | 35 | 95 | 30 | 131K | 31 | MMLU 49.3%, GSM8K 44.4%, MATH 30.6%. As a 1B lightweight model, Logic (35) and Math (30) reflect sub-50% benchmarks. Coding (25) based on 0.6 index. Speed (95) is maximized for this ultra-small tier. |
| StepFun: Step 3.5 Flash | $6.60 | 70 | 88 | 83 | 95 | 95 | 262K | 87 | SWE-bench Verified 74.4% maps to 88 coding; AIME 99.8% maps to 95 math. Despite Flash tier, explicit evidence shows frontier-level SWE-bench Verified, elevating coding score. Speed is 95 (143 tok/s). |
| Mistral: Ministral 3 14B 2512 | $10.00 | 58 | 60 | 72 | 95 | 75 | 262K | 70 | GPQA Diamond 58.6% maps to Logic 65; IFEval 77.3% to Instruction 78; MMMU 55.3% to Multimodal 60. As a 14B lightweight tier, Coding is inferred at 60. Speed is 95 based on reported 2512 tokens/s. |
| OpenAI: GPT-5.4 Mini | $75.00 | 55 | 70 | 80 | 95 | 80 | 400K | 78 | No exact benchmarks for GPT-5.4 Mini in evidence. Inferred coding (70) and logic (75) based on its 'Mini' tier status and reasoning capabilities. Speed (95) reflects high-throughput optimization. Vision price estimated from $0.75/1M input cost. |
| Gemini 2.0 Flash (001) | $8.00 | 59 | 65 | 73 | 95 | 70 | 1.0M | 70 | MMLU 76.4%, MMMU 71.7%, and MATH 53.2% map to Logic 76, Multimodal 72, and Math 70. As a Flash-tier model, Coding (65) is adjusted lower than heavyweights, while Speed (95) reflects its highly optimized latency. |
| Anthropic Claude Haiku Latest | $90.00 | 56 | 88 | 83 | 95 | 75 | 200K | 82 | SWE-bench Verified at 73.3% maps to 88 coding. MATH at 69.4% maps to 75 math. As a lightweight tier, it excels in speed (95) while native reasoning boosts its logic (80) to near-frontier levels. |
| Morph: Morph V3 Large | $55.00 | 48 | 80 | 65 | 95 | 50 | 262K | 65 | No standard benchmarks available. Evidence cites 98% accuracy for code transformations and ~4,500 tok/s. Mapped coding to 80 for specialized code-edit focus, speed to 95 for extreme throughput. Logic/math inferred cautiously due to missing data. |
| Google: Gemma 3 4B | $3.00 | 64 | 45 | 68 | 95 | 75 | 131K | 64 | Based on IFEval (90.2%) mapped to 90 Instruction, MATH (75.6%) to 75 Math, and MMLU-Pro (43.6%) to 45 Logic. As a 4B lightweight tier, Coding (MBPP 63.2%) maps to 45, while Speed is rated 95. |
| Morph: Morph V3 Fast | $44.00 | 45 | 80 | 55 | 95 | 40 | 82K | 58 | Evidence lacks standard SWE-bench/GPQA scores, citing only 96% accuracy for rapid code transformations. Mapped coding to 80 for specialized apply tasks. As a 'Fast' tier model, speed is rated 95 (10,500 tok/s claimed), with logic/math inferred lower. |
| Google: Gemini 3 Flash Preview | $50.00 | 61 | 88 | 89 | 95 | 85 | 1.0M | 88 | GPQA Diamond 90.4% maps to Logic 92; SWE-bench 78% maps to Coding 88. As a Flash-tier model, Speed is rated very high (95). Multimodal inferred at 80 due to extensive video/audio/image support. |
| Anthropic: Claude 3.5 Haiku | $72.00 | 52 | 75 | 73 | 95 | 75 | 200K | 74 | SWE-bench Verified 40.6% (Coding ~75), MMLU-Pro 65% (Logic ~70), MATH 69.4% (Math ~75). As a lightweight Haiku tier, it scores lower than flagships on reasoning but achieves exceptional speed. |
| Meta: Llama 3.2 3B Instruct | $5.39 | 51 | 35 | 55 | 95 | 55 | 131K | 50 | Lightweight 3B tier. Mapped MMLU 63.4% to Logic 40, IFEval 73.9% to Instruction 70, and GSM8K 77.7% to Math 55. Coding inferred low (35) lacking SWE-bench. Speed rated 95 for 3B size. |
| StepFun: Step 3.7 Flash | $19.50 | 60 | 88 | 75 | 95 | 85 | 256K | 81 | SWE-bench Verified at 74.4% maps to 88 coding. AIME 2025 win implies strong math (85). Speed is 143 tok/s (95). As a Flash tier, logic/instruction are estimated ~75 despite high coding/math peaks. |
| Google: Nano Banana (Gemini 2.5 Flash Image) | $37.00 | 37 | 40 | 45 | 95 | 40 | 33K | 43 | Evidence explicitly states 'Benchmark not available' for MMLU/MMMU/GSM8K. Inferred Flash-tier baseline scores (40-50). Speed scored 95 due to 172 tok/s throughput. Multimodal inferred at 70 for a lightweight image model. |
| OpenAI: GPT-4o-mini (2024-07-18) | $12.00 | 58 | 70 | 73 | 95 | 75 | 128K | 73 | Evidence lacks raw benchmarks. Scores inferred cautiously from the 'Mini' lightweight tier profile. Speed is heavily weighted (95), while coding (70) and logic (70) are adjusted downward to reflect its distilled nature compared to flagship models. |
| Amazon: Nova Micro 1.0 | $2.80 | 66 | 68 | 63 | 95 | 75 | 128K | 67 | HumanEval 81.1% (Coding 68), GPQA 40% (Logic 45), IFEval 87.2% (Instruction 80), GSM8K 92.3% (Math 75). As a 'Micro' tier model, speed is rated very high (95) while coding and logic reflect its lightweight, text-only nature. |
| OpenAI: GPT-5 Nano | $6.00 | 58 | 60 | 65 | 95 | 65 | 400K | 64 | No exact GPT-5 Nano scores provided; inferred from predecessor GPT-4.1 Nano (GPQA 50.3%, MMLU 80.1%). Mapped Logic to 60, Coding to 60. As a Nano tier, it prioritizes speed (100 tok/s -> 95) over heavyweight reasoning. |
| OpenAI: gpt-oss-120b | $3.36 | 70 | 70 | 80 | 95 | 75 | 131K | 76 | HumanEval 71% maps to 70 coding. MMLU 66-90% maps to 80 logic. GSM8K 75% maps to 75 math. 500 tok/s throughput maps to 95 speed. Native reasoning supported via OpenRouter reasoning parameter. |
| Google: Gemini 2.0 Flash Lite | $6.00 | 58 | 65 | 65 | 95 | 65 | 1.0M | 65 | Evidence lacks exact percentages but confirms 2.0 Flash-Lite outperforms 1.5 Flash and trails 2.0 Flash on GPQA, MATH, and MMMU. Scores inferred cautiously for this lightweight tier, prioritizing its high speed and lower reasoning/coding capabilities. |
| OpenAI: GPT-4.1 Mini | $32.00 | 56 | 68 | 82 | 95 | 75 | 1.0M | 77 | SWE-bench Verified 23.6% (mapped to 68), GPQA Diamond 65% (mapped to 80). As a Mini tier, speed is rated high (95) while coding/logic reflect its lightweight nature compared to flagship models. |
| OpenAI: GPT-4.1 Nano | $8.00 | 56 | 65 | 65 | 95 | 65 | 1.0M | 65 | GPQA 50.3% and MMLU 80.1% map to ~60 logic. HumanEval 86.6% and Aider 9.8% map to ~65 coding. As a 'Nano' lightweight tier, it prioritizes speed (~95) over flagship reasoning. |
| Google: Gemini 2.5 Flash | $37.00 | 56 | 70 | 78 | 92 | 85 | 1.0M | 78 | GPQA Diamond 78.3% (Logic 80), LiveCodeBench 63.5% (Coding 70), MMMU 76.7%. As a Flash-tier model, it excels in speed (93 tok/s) and math (AIME 78%), but trails Pro in heavy coding. |
| MiniMax: MiniMax M2.1 | $21.10 | 59 | 75 | 83 | 92 | 75 | 205K | 79 | Multi-SWE-Bench (49.4%) maps to Coding 75; MMLU-Pro (88.0%) maps to Logic 85. As a 10B lightweight model, speed is heavily weighted (92), while coding and logic reflect its size class despite strong benchmark claims. |
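
One way to turn the leaderboard into a shortlist is to filter on a speed floor, a capability floor, and a budget ceiling, then sort by speed and cost. The sketch below is a minimal illustration: the five rows are copied from the table above, while the function name and the thresholds are hypothetical choices, not a recommendation from the leaderboard.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    model: str
    est_monthly: float  # USD, from the "Est. monthly" column
    speed: int          # 0-100 Speed score
    overall: int        # 0-100 Overall score


# A few rows copied from the leaderboard table.
ROWS = [
    Row("Inception: Mercury 2", 17.50, 100, 44),
    Row("Qwen: Qwen3.5-Flash", 5.20, 95, 92),
    Row("StepFun: Step 3.5 Flash", 6.60, 95, 87),
    Row("Claude Haiku 4.5", 90.00, 95, 86),
    Row("Google: Gemini 2.5 Flash", 37.00, 92, 78),
]


def shortlist(rows: List[Row], min_speed: int, min_overall: int,
              max_monthly: float) -> List[str]:
    """Return model names passing all three thresholds.

    Results are ordered fastest first, then cheapest first among ties.
    """
    kept = [
        r for r in rows
        if r.speed >= min_speed
        and r.overall >= min_overall
        and r.est_monthly <= max_monthly
    ]
    kept.sort(key=lambda r: (-r.speed, r.est_monthly))
    return [r.model for r in kept]


# Hypothetical thresholds: Speed >= 95, Overall >= 80, at most $50 per month.
picks = shortlist(ROWS, min_speed=95, min_overall=80, max_monthly=50.0)
print(picks)  # ['Qwen: Qwen3.5-Flash', 'StepFun: Step 3.5 Flash']
```

Filtering on Overall as well as Speed keeps very fast but narrowly capable models, such as Mercury 2 in this sample, from dominating the shortlist.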
AI ROI Leaderboard & Discovery by LeadsCalc