LLM Hosting Research — Session 2026-06-25
Goal: First cost comparison research pass — MiniMax-M3-class models on $10,000 AUD/year budget with >10 tok/s throughput.
Chunk 2026-06-25T01:24 (autopilot tick)
What was done:
- Gathered current GPU cloud pricing across major providers (RunPod, Vast.ai, Lambda, AWS, GCP, Spheron, JarvisLabs)
- Researched MiniMax-M3 model specifications and hardware requirements
- Compiled cost comparison for GPUs that can run M3
Key Findings
MiniMax-M3 Requirements
- Size: ~428B parameters (Mixture of Experts, 23B active params)
- Quantized sizes: UD-IQ1_M (128GB), UD-IQ3_XXS (159GB), UD-IQ4_XS (208GB), UD-Q4_K_XL (265GB)
- Minimum VRAM: ~80GB for smallest quantization (UD-IQ1_M), 133GB total memory needed
- Inference throughput: Depends on GPU; need >10 tok/s target
GPUs that fit the M3 model:
| GPU | VRAM | Can Run M3? | Min Quantization |
|---|---|---|---|
| RTX 4090 | 24GB | No (too small, even quantized) | — |
| A100 PCIe 40GB | 40GB | No | — |
| A100 SXM 80GB | 80GB | Marginal (UD-IQ1_M only) | UD-IQ1_M |
| H100 SXM 80GB | 80GB | Yes (UD-IQ3_XXS likely, tight for UD-IQ4_XS) | UD-IQ3_XXS / UD-IQ4_XS |
| H200 SXM 141GB | 141GB | Yes comfortably | UD-Q4_K_XL+ |
| B200 SXM 192GB | 192GB | Yes, plenty of headroom | Any quantization |
GPU Cloud Pricing (USD/hour, on-demand unless noted):
| Provider | A100 80GB | H100 SXM 80GB | H200 SXM | B200 SXM | RTX 4090 |
|---|---|---|---|---|---|
| Lambda | $1.99 | 3.99 | — | $6.69 | — |
| RunPod (Secure Cloud) | 1.64 | 3.89 | $4.39 | — | $0.69 |
| Vast.ai (marketplace) | 2.00 | 4.00 | — | — | 0.55 |
| Spheron | 0.60) | 1.03) | $4.54 | $6.02 | $0.55 |
| JarvisLabs | $1.49 | $2.69 | $3.80 | — | — |
| AWS p5 | ~$3.43 | ~$6.88 | — | — | N/A |
| GCP A3 | ~$5.78 | ~3.69) | — | — | N/A |
Cost Analysis for 6,400 USD at 0.64 rate)
Scenario 1: H100 SXM 80GB — Minimum viable for M3 inference
- Lambda Labs: 78.96/day → $5,400/yr ✅ (within budget at ~62% utilisation)
- RunPod Secure: $3.29/hr × 24h = same as Lambda above
- Spheron spot: 24.72/day → $1,800/yr ✅✅ (excellent value, but spot risk)
- Vast.ai marketplace:
2.27/hr → **4,050/yr** ✅ - AWS p5: 165.12/day → $60,269/yr ❌ (way over budget)
Scenario 2: H200 SXM — Comfortable for M3 with headroom
- JarvisLabs: 91.20/day → $33,288/yr ❌
- Spheron: 108.96/day → $39,770/yr ❌
Scenario 3: B200 — Overkill but available
- Lambda Labs: $6.69/hr → way over budget
- Spheron spot: 4,450/yr ✅ (surprisingly affordable)
Recommended Options (ranked):
| Rank | Option | GPU | Est. Cost/yr (USD) | Pros | Cons |
|---|---|---|---|---|---|
| 1 | Lambda Labs | H100 SXM 80GB | ~7,368 (11,600 AUD) | Reliable, no spot risk, good support | Price pushes budget slightly |
| 2 | Spheron (spot) | H100 SXM 80GB | ~2,800 AUD) | Cheapest viable option | Spot interruptions possible |
| 3 | Vast.ai | H100 marketplace | ~4,050 (6,300 AUD) | Flexible, cheap | Variable reliability, no SLA |
| 4 | Spheron (spot) | B200 SXM 192GB | ~7,000 AUD) | Huge headroom, fast inference | Spot risk, newer architecture |
Next chunk picks up:
- Validate throughput claims (>10 tok/s on H100 with M3 UD-IQ4_XS quantization)
- Check if multi-GPU RTX 4090 setups could work (dual 4090 = 48GB VRAM total, likely insufficient for any reasonable quant)
- Investigate serverless inference options (RunPod Serverless, Together AI API pricing at 1.20 per 1M tokens for M3)
Sources:
- Spheron GPU pricing: https://www.spheron.network/blog/gpu-cloud-pricing-comparison-2026/
- Medium Vast.ai vs RunPod comparison: https://medium.com/@velinxs/vast-ai-vs-runpod-pricing-in-2026-which-gpu-cloud-is-cheaper-bd4104aa591b
- Lambda pricing: https://lambda.ai/pricing
- Unsloth MiniMax M3 docs: https://unsloth.ai/docs/models/minimax-m3
- JarvisLabs providers: https://jarvislabs.ai/ai-faqs/best-cloud-gpu-providers-2026