Interactive leaderboard

Best Open-Source LLMs 2026: Open-Weight Models vs. Hosted API Cost

Compare open-weight LLMs in 2026 for self-hosting and dedicated deployments, with pricing context for hosted variants. For teams in the US, Canada, and Australia evaluating open vs. proprietary APIs.

Open-source-friendly LLM comparison for 2026, with TCO in mind

Open-weight models can unlock on-prem and dedicated-cloud strategies for residency-sensitive workloads. This tab filters the catalog to its open-weight subset so platform teams in the United States, Canada, and Australia can compare capability signals while grounding decisions in realistic engineering and GPU spend rather than hype alone.

Workload & pricing toggles

Workload presets

Same three scenarios as the main AI API calculator: moderate traffic, large RAG-style context, or per-request max tokens with a lower request count.

Include Vision / Image Processing

Off — no image fees in cost estimates for vision-capable models.

Turn On to include image fees.


Use Cached Pricing

Enable to get 50% off input tokens where cached rates apply


Deep Reasoning / Thinking Mode

When enabled, the model's hidden reasoning / extended thinking tokens are charged like output tokens.


Batch Pricing

Enable for 50% off input & output where batch/async pricing applies


Cached / batch est. monthly values only change after the pipeline sets supports_caching or supports_batch in Supabase. The toggles here narrow the table to models whose catalog or provider typically supports those modes.
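The toggle behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the calculator's actual code: the `ModelPricing` class, the `estimate_monthly_cost` function, and the per-million-token rates are hypothetical, and stacking the cached and batch discounts when both apply is an assumption. Only the flag names `supports_caching` and `supports_batch` and the 50% figures come from the page.

```python
from dataclasses import dataclass


@dataclass
class ModelPricing:
    """Hypothetical pricing record; rates are USD per 1M tokens."""
    input_per_mtok: float
    output_per_mtok: float
    supports_caching: bool = False  # flag name taken from the page
    supports_batch: bool = False    # flag name taken from the page


def estimate_monthly_cost(
    pricing: ModelPricing,
    requests: int,
    input_tokens: int,
    output_tokens: int,
    reasoning_tokens: int = 0,
    use_cached: bool = False,
    use_batch: bool = False,
    thinking: bool = False,
) -> float:
    """Estimate monthly spend for one model under the selected toggles."""
    # Thinking mode: reasoning tokens are billed like output tokens.
    billed_output = output_tokens + (reasoning_tokens if thinking else 0)

    input_cost = requests * input_tokens / 1_000_000 * pricing.input_per_mtok
    output_cost = requests * billed_output / 1_000_000 * pricing.output_per_mtok

    # Cached pricing: 50% off input tokens, only if the model is flagged.
    if use_cached and pricing.supports_caching:
        input_cost *= 0.5

    # Batch pricing: 50% off input and output, only if the model is flagged.
    # Assumption: this stacks with the cached discount.
    if use_batch and pricing.supports_batch:
        input_cost *= 0.5
        output_cost *= 0.5

    return round(input_cost + output_cost, 2)


# Hypothetical model: $1.00 / $2.00 per 1M tokens, caching supported only.
example = ModelPricing(1.0, 2.0, supports_caching=True)
baseline = estimate_monthly_cost(example, 10_000, 8_000, 1_000)
cached = estimate_monthly_cost(example, 10_000, 8_000, 1_000, use_cached=True)
```

Because the discount is gated on the flag, switching a toggle on for a model without the flag leaves its estimate unchanged, which matches the note above.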

Magic quadrant (top 15)

X: est. monthly · Y: Open-weight · Dot: provider color · Hover for rank, model & details

Full leaderboard

Showing 48 of 74 models (open-weight / self-hostable catalog hints).

| Model | Est. monthly | ROI score | Coding | Reasoning | Speed | Math | Context | Overall |
|---|---|---|---|---|---|---|---|---|
| Google: Gemma 4 31B | $9.00 | 72 | 97 | 92 | 70 | 97 | 262K | 94 |
| Qwen: Qwen3.5 397B A17B | $39.00 | 62 | 85 | 89 | 60 | 92 | 262K | 89 |
| Qwen: Qwen3 32B | $5.60 | 73 | 85 | 89 | 60 | 92 | 41K | 89 |
| Qwen: Qwen3 Max | $70.20 | 61 | 93 | 87 | 70 | 89 | 262K | 89 |
| Mistral: Mistral Medium 3 | $36.00 | 63 | 92 | 87 | 70 | 91 | 131K | 89 |
| Qwen: Qwen3.5-27B | $23.40 | 64 | 80 | 91 | 70 | 92 | 262K | 88 |
| Qwen: Qwen3 VL 32B Instruct | $8.32 | 69 | 88 | 88 | 65 | 87 | 131K | 88 |
| Qwen: Qwen3.5-35B-A3B | $19.50 | 64 | 76 | 89 | 70 | 95 | 262K | 87 |
| Qwen: Qwen3.5-122B-A10B | $31.20 | 62 | 81 | 90 | 55 | 85 | 262K | 87 |
| Meta: Llama 3.3 70B Instruct | $7.20 | 69 | 88 | 89 | 70 | 77 | 131K | 86 |
| NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | $8.00 | 66 | 74 | 79 | 70 | 97 | 131K | 82 |
| Mistral Large 2411 | $140.00 | 56 | 87 | 85 | 70 | 72 | 131K | 82 |
| Mistral: Mistral Small 3.2 24B | $5.00 | 69 | 92 | 83 | 85 | 69 | 128K | 82 |
| Mistral: Mistral Large 3 2512 | $35.00 | 59 | 87 | 85 | 60 | 72 | 262K | 82 |
| Qwen: Qwen3 Coder Next | $13.60 | 62 | 93 | 73 | 65 | 85 | 262K | 81 |
| Mistral: Mistral Small Creative | $7.00 | 66 | 88 | 83 | 90 | 69 | 33K | 81 |
| DeepSeek: DeepSeek V4 Pro | $104.40 | 56 | 67 | 80 | 55 | 97 | 1.0M | 81 |
| Mistral: Mistral Small 4 | $12.00 | 63 | 88 | 83 | 55 | 69 | 262K | 81 |
| Qwen: Qwen3 30B A3B | $6.00 | 67 | 85 | 79 | 60 | 82 | 41K | 81 |
| Qwen: Qwen3.5-9B | $5.50 | 68 | 65 | 87 | 70 | 83 | 262K | 80 |
| Qwen: Qwen3 Coder Plus | $58.50 | 56 | 85 | 78 | 60 | 78 | 1.0M | 80 |
| Qwen: Qwen3.6 Plus | $32.50 | 57 | 79 | 79 | 65 | 78 | 1.0M | 79 |
| DeepSeek: DeepSeek V3.2 | $13.86 | 61 | 67 | 84 | 55 | 82 | 131K | 79 |
| Qwen: Qwen3 235B A22B Instruct 2507 | $3.84 | 70 | 73 | 83 | 55 | 78 | 262K | 79 |
| Mistral: Mistral 7B Instruct v0.1 | $6.30 | 65 | 85 | 79 | 90 | 70 | 3K | 78 |
| Mistral: Saba | $14.00 | 60 | 88 | 77 | 85 | 69 | 33K | 78 |
| DeepSeek: DeepSeek V3.1 Terminus | $16.30 | 58 | 68 | 78 | 65 | 75 | 164K | 75 |
| Qwen: Qwen3 Coder 480B A35B | $18.80 | 57 | 85 | 73 | 60 | 65 | 262K | 74 |
| Qwen: Qwen3 14B | $4.80 | 59 | 52 | 72 | 70 | 58 | 41K | 63 |
| Mistral: Mistral Medium 3.1 | $36.00 | 41 | 40 | 62 | 70 | 38 | 131K | 51 |
| Qwen: Qwen3 Coder 30B A3B Instruct | $5.50 | 47 | 52 | 44 | 55 | 31 | 160K | 43 |
| Meta: Llama 3.1 70B Instruct | $20.00 | 28 | 0 | 0 | 55 | 95 | 131K | 24 |
| DeepSeek: DeepSeek V3 0324 | $15.70 | 16 | 0 | 0 | 55 | 0 | 164K | 0 |
| Qwen: Qwen3.5 Plus 2026-02-15 | $26.00 | 13 | 0 | 0 | 60 | 0 | 1.0M | 0 |
| Qwen: Qwen Plus 0728 | $18.20 | 15 | 0 | 0 | 70 | 0 | 1.0M | 0 |
| Qwen: Qwen3 VL 235B A22B Instruct | $16.80 | 15 | 0 | 0 | 0 | 0 | 262K | 0 |
| Meta: Llama 4 Scout | $6.20 | 4 | 0 | 0 | 0 | 0 | 328K | 0 |
| Mistral: Mistral Small 3.1 24B | $19.60 | 3 | 0 | 0 | 0 | 0 | 128K | 0 |
| DeepSeek: DeepSeek V3.2 Speciale | $28.00 | 3 | 0 | 0 | 0 | 0 | 164K | 0 |
| Qwen: Qwen VL Max | $41.60 | 2 | 0 | 0 | 0 | 0 | 131K | 0 |
| Meta: Llama 4 Maverick | $12.00 | 3 | 0 | 0 | 0 | 0 | 1.0M | 0 |
| Qwen: Qwen3 VL 30B A3B Instruct | $10.40 | 18 | 0 | 0 | 70 | 0 | 131K | 0 |
| Mistral: Mistral Nemo | $0.70 | 9 | 0 | 0 | 0 | 0 | 131K | 0 |
| Qwen: Qwen VL Plus | $9.56 | 18 | 0 | 0 | 55 | 0 | 131K | 0 |
| Qwen2.5 72B Instruct | $8.70 | 4 | 0 | 0 | 0 | 0 | 33K | 0 |
| AlfredPros: CodeLLaMa 7B Instruct Solidity | $44.00 | 2 | 0 | 0 | 0 | 0 | 4K | 0 |
| Qwen: Qwen-Plus | $18.20 | 3 | 0 | 0 | 0 | 0 | 1.0M | 0 |
| NVIDIA: Llama 3.1 Nemotron 70B Instruct | $60.00 | 2 | 0 | 0 | 0 | 0 | 131K | 0 |

Need a shareable artifact?

Download a print-ready PDF from the leaderboard and workload above. No email step—lead capture is off.

Detailed analysis

PDF Breakdown

Receive a comprehensive native vector PDF of this leaderboard: your workload, filters, top rankings, and a table snapshot (sorted: Open-weight).


Agency accelerator

Whitelabel Open-weight Leaderboard for your site

Embed the interactive open-weight view on your own domain — whitelabel branding, lead capture, and the same workload sliders your prospects already use on LeadsCalc.

1-Click CRM sync
Custom branding
Branded reports
Lead analytics

Free to start

$0/mo*

How it works

Methodology: How we rank Open-Weight LLMs

Transparent, benchmark-driven rankings—same craft as our single-model deep dives.

What “open-weight” means on this leaderboard

Open-weight rankings filter to models tagged as open-weight / self-hostable in our qualitative catalog, then score within that subset. This helps teams that want to avoid proprietary API lock-in, a common goal among startups and enterprises in the US, Canada, and Australia evaluating on-prem or dedicated cloud deployments.

Battle Arena

Compare up to four LLMs side by side

Tick up to four models in the leaderboard table, then open Battle Arena for API pricing, benchmarks, and workload math in one view—perfect when you are shortlisting vendors for a pilot in the US, Canada, or Australia.

Prefer a head start? Jump into high-intent comparisons people search for every day—same interactive calculator, zero signup.

Open Battle Arena
Up to 4 models · Live estimates

Signals & spend

Value analysis

Benchmarks vs. estimated API cost—read the story your CFO cares about.

Hosted API vs. self-host: how to read the trade-offs

Self-hosting shifts cost from tokens to hardware, ops, and reliability. Use hosted estimates here as a directional anchor, then build your TCO model. Australian and Canadian buyers often start with sovereignty requirements; US buyers may weigh velocity of managed APIs against control.

Production deployment

Data Sovereignty & Custom Fine-Tuning

How teams in the US, Canada, and Australia deploy these models in production.

Air-gapped enterprise, HIPAA/SOC2 compliance, and domain adaptation

Open-weight models are often the only workable option for organizations with strict data residency requirements. Healthcare providers in the US, government agencies in Canada, and financial institutions in Australia deploy them in isolated VPCs or air-gapped environments so that prompts and outputs are never retained by a third party. They also serve as the base for LoRA fine-tuning, which lets teams adapt a model to proprietary domain data without retraining all of its weights.

Architecture

Self-Hosting vs Managed Endpoints

Strategies to reduce monthly API spend without sacrificing capability.

Total Cost of Ownership (TCO) and GPU provisioning

While the weights are free, GPU compute is not. Teams must weigh the Total Cost of Ownership (TCO) of provisioning AWS or Azure instances against using managed serverless endpoints. This leaderboard displays the API cost of hosted open-weight models, providing a baseline to determine if your token volume justifies the engineering overhead of managing your own vLLM or Ollama infrastructure.
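One way to frame that baseline is a break-even calculation: the monthly token volume at which a fixed self-hosting bill equals what the hosted API would charge. The sketch below is only a rough illustration. The function names, the 730-hour month, the blended per-million-token rate, and all example figures are hypothetical, and it ignores whether the provisioned GPUs can actually serve that volume.

```python
def self_host_monthly_cost(
    gpu_hourly_usd: float,
    gpu_count: int,
    ops_monthly_usd: float,
    hours_per_month: float = 730.0,  # assumption: always-on instances
) -> float:
    """Fixed monthly cost of self-hosting: GPU rental plus ops/engineering."""
    return gpu_hourly_usd * gpu_count * hours_per_month + ops_monthly_usd


def breakeven_tokens_per_month(
    self_host_monthly_usd: float,
    hosted_usd_per_mtok: float,
) -> float:
    """Token volume at which hosted API spend equals the self-host bill.

    hosted_usd_per_mtok is a blended (input + output) hosted rate per 1M
    tokens; below the returned volume the hosted API is cheaper.
    """
    if hosted_usd_per_mtok <= 0:
        raise ValueError("hosted_usd_per_mtok must be positive")
    return self_host_monthly_usd / hosted_usd_per_mtok * 1_000_000


# Hypothetical figures: one GPU at $2.00/hour plus $540/month of ops time,
# compared with a hosted rate of $0.50 per 1M tokens.
fixed_cost = self_host_monthly_cost(2.0, 1, 540.0)
breakeven = breakeven_tokens_per_month(fixed_cost, 0.5)
```

If your projected volume sits well below the break-even figure, the managed endpoint is likely the cheaper path; a real TCO model would also account for utilization, redundancy, and peak throughput.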

Embed-ready

Need this live Open-Weight data on your website?

Join 500+ agencies in the US and Australia using LeadsCalc to capture high-intent leads. Embed this interactive Open-Weight leaderboard on your site in about a minute—Canadian teams use the same flows for CAD-priced proposals and compliance-friendly landing pages.

Customize & Embed this Tool
White-label · No code required
United States · Canada · Australia
Live preview

Your visitors compare Open-Weight models without leaving your domain.

Support & clarity

Frequently Asked Questions

Focused on teams in the United States, Canada, and Australia.

Open-weight models are not automatically cheaper: you must add GPU, ops, and engineering time. Compare the hosted API estimates here against your self-host TCO; Australian and Canadian teams sometimes prefer open-weight for data sovereignty.