LLM Hosting Research — Session 2026-06-25

Goal: First cost-comparison research pass: hosting MiniMax-M3-class models on a $10,000 AUD/year budget with >10 tok/s throughput.

Chunk 2026-06-25T01:24 (autopilot tick)

What was done:

Gathered current GPU cloud pricing across major providers (RunPod, Vast.ai, Lambda, AWS, GCP, Spheron, JarvisLabs)
Researched MiniMax-M3 model specifications and hardware requirements
Compiled cost comparison for GPUs that can run M3

Key Findings

MiniMax-M3 Requirements

Size: ~428B parameters (Mixture of Experts, 23B active params)
Quantized sizes: UD-IQ1_M (128GB), UD-IQ3_XXS (159GB), UD-IQ4_XS (208GB), UD-Q4_K_XL (265GB)
Minimum VRAM: ~80GB for the smallest quantization (UD-IQ1_M); 133GB total memory needed
Inference throughput: depends on the GPU; the target is >10 tok/s

GPUs that fit the M3 model:

GPU	VRAM	Can Run M3?	Min Quantization
RTX 4090	24GB	No (too small, even quantized)	—
A100 PCIe 40GB	40GB	No	—
A100 SXM 80GB	80GB	Marginal (UD-IQ1_M only)	UD-IQ1_M
H100 SXM 80GB	80GB	Yes (UD-IQ3_XXS likely, tight for UD-IQ4_XS)	UD-IQ3_XXS / UD-IQ4_XS
H200 SXM 141GB	141GB	Yes comfortably	UD-Q4_K_XL+
B200 SXM 192GB	192GB	Yes, plenty of headroom	Any quantization
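
As a rough cross-check of the fit column, the sketch below compares the quantized weight sizes against single-GPU VRAM, using only the figures listed above. It assumes that only weight size matters (KV cache and runtime overhead are ignored) and that whatever does not fit in VRAM would have to be held elsewhere, for example in system RAM. `offload_gb` is a hypothetical helper name, not part of any library.

```python
# Quantized model sizes in GB, taken from the requirements list above.
QUANT_SIZES_GB = {
    "UD-IQ1_M": 128,
    "UD-IQ3_XXS": 159,
    "UD-IQ4_XS": 208,
    "UD-Q4_K_XL": 265,
}

# Single-GPU VRAM in GB, taken from the table above.
GPU_VRAM_GB = {
    "RTX 4090": 24,
    "A100 PCIe 40GB": 40,
    "A100 SXM 80GB": 80,
    "H100 SXM 80GB": 80,
    "H200 SXM 141GB": 141,
    "B200 SXM 192GB": 192,
}


def offload_gb(quant: str, gpu: str) -> int:
    """GB of weights that do not fit in VRAM (0 means fully resident).

    Assumption: only weight size is counted; KV cache and runtime
    overhead are ignored, so real headroom is smaller than shown.
    """
    return max(0, QUANT_SIZES_GB[quant] - GPU_VRAM_GB[gpu])


if __name__ == "__main__":
    for gpu in GPU_VRAM_GB:
        row = ", ".join(f"{q}: {offload_gb(q, gpu)} GB" for q in QUANT_SIZES_GB)
        print(f"{gpu:>15} -> outside VRAM: {row}")
```

Every listed quantization is larger than 80GB, so on the 80GB cards some weights necessarily sit outside VRAM; how much that costs in throughput is not something this sketch can estimate.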

GPU Cloud Pricing (USD/hour, on-demand unless noted):

Provider	A100 80GB	H100 SXM 80GB	H200 SXM	B200 SXM	RTX 4090
Lambda	$1.99	$3.29-$3.99	—	$6.69	—
RunPod (Secure Cloud)	$0.79-$1.64	$3.29-$3.89	$4.39	—	$0.69
Vast.ai (marketplace)	$0.52-$2.00	$1.53-$4.00	—	—	$0.35-$0.55
Spheron	$1.07 (spot: $0.60)	$2.50 (spot: $1.03)	$4.54	$6.02	$0.55
JarvisLabs	$1.49	$2.69	$3.80	—	—
AWS p5	~$3.43	~$6.88	—	—	N/A
GCP A3	~$5.78	~$10.98 (spot: $3.69)	—	—	N/A

Cost Analysis for $10,000 AUD Budget ($6,400 USD at 0.64 rate)

Scenario 1: H100 SXM 80GB — Minimum viable for M3 inference

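Lambda Labs: $3.29/hr × 24 h = $78.96/day → $28,820/yr if run 24/7 ❌. The $5,400/yr estimate ✅ corresponds to about 1,640 GPU-hours per year (~19% utilisation).
RunPod Secure: $3.29/hr × 24 h = $78.96/day, same as Lambda above
Spheron spot: $1.03/hr × 24 h = $24.72/day → $9,023/yr if run 24/7. The $1,800/yr estimate ✅✅ corresponds to about 1,750 GPU-hours per year (~20% utilisation); excellent value, but spot risk.
Vast.ai marketplace: $1.53-$2.27/hr → $13,403-$19,885/yr if run 24/7. The $2,660-$4,050/yr estimate ✅ corresponds to about 1,740-1,780 GPU-hours per year (~20% utilisation).
AWS p5: $6.88/hr × 24 h = $165.12/day → $60,269/yr ❌ (way over budget)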
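
The annual figures in this scenario follow from hourly rate × 8,760 hours × utilisation. A minimal sketch of that arithmetic is below, using the hourly rates quoted above and the $6,400 USD budget; the function names are illustrative, and utilisation is assumed to be a simple fraction of the year the instance is billed.

```python
HOURS_PER_YEAR = 24 * 365    # 8,760 hours, ignoring leap years
BUDGET_USD = 10_000 * 0.64   # $10,000 AUD at the 0.64 rate used in this note


def annual_cost_usd(rate_per_hour: float, utilisation: float = 1.0) -> float:
    """Yearly cost in USD for an instance billed `utilisation` of the year."""
    return rate_per_hour * HOURS_PER_YEAR * utilisation


def affordable_utilisation(rate_per_hour: float,
                           budget_usd: float = BUDGET_USD) -> float:
    """Largest fraction of the year the budget can pay for (capped at 1.0)."""
    return min(1.0, budget_usd / annual_cost_usd(rate_per_hour))


if __name__ == "__main__":
    # Hourly H100 rates quoted in Scenario 1.
    rates = {
        "Lambda Labs": 3.29,
        "Spheron spot": 1.03,
        "Vast.ai (low)": 1.53,
        "Vast.ai (high)": 2.27,
        "AWS p5": 6.88,
    }
    for name, rate in rates.items():
        full = annual_cost_usd(rate)
        share = affordable_utilisation(rate)
        print(f"{name:<15} 24/7: ${full:>9,.0f}/yr  budget covers {share:.0%}")
```

Running the same function with a utilisation below 1.0 gives the part-time cost, which is the more relevant number if the model only needs to be up for a few hours a day.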

Scenario 2: H200 SXM — Comfortable for M3 with headroom

JarvisLabs: $3.80/hr × 24 h = $91.20/day → $33,288/yr ❌
Spheron: $4.54/hr × 24 h = $108.96/day → $39,770/yr ❌

Scenario 3: B200 — Overkill but available

Lambda Labs: $6.69/hr → way over budget
Spheron spot: $2.12/hr → $18,571/yr if run 24/7. The $4,450/yr estimate ✅ corresponds to about 2,100 GPU-hours per year (~24% utilisation); surprisingly affordable.
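
Turning the scenarios around, it can help to ask what hourly rate the budget can sustain at a given utilisation. The sketch below inverts the annual-cost formula; `max_hourly_rate` is a hypothetical helper, and the $6,400 USD budget is the figure used in this note.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def max_hourly_rate(budget_usd: float, utilisation: float = 1.0) -> float:
    """Highest hourly rate that stays within budget at the given utilisation."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return budget_usd / (HOURS_PER_YEAR * utilisation)


if __name__ == "__main__":
    budget = 6_400  # USD, i.e. $10,000 AUD at the 0.64 rate
    for u in (1.0, 0.5, 0.2):
        rate = max_hourly_rate(budget, u)
        print(f"{u:.0%} utilisation -> up to ${rate:.2f}/hr")
```

At 24/7 operation the ceiling works out to roughly $0.73/hr, and at 20% utilisation to roughly $3.65/hr.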

Recommended Options (ranked):

Rank	Option	GPU	Est. Cost/yr (USD)	Pros	Cons
1	Lambda Labs	H100 SXM 80GB	~$5,400-$7,368 ($8,400-$11,600 AUD)	Reliable, no spot risk, good support	Price pushes budget slightly
2	Spheron (spot)	H100 SXM 80GB	~$1,800 ($2,800 AUD)	Cheapest viable option	Spot interruptions possible
3	Vast.ai	H100 marketplace	~$2,660-$4,050 ($4,200-$6,300 AUD)	Flexible, cheap	Variable reliability, no SLA
4	Spheron (spot)	B200 SXM 192GB	~$4,450 ($7,000 AUD)	Huge headroom, fast inference	Spot risk, newer architecture
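
The AUD figures in the table are the USD estimates divided by the 0.64 exchange rate, rounded. A small sketch of that conversion, with a check against the $10,000 AUD budget, is below; the dictionary simply restates the table's USD ranges, and the fixed rate is this note's assumption rather than a live quote.

```python
USD_PER_AUD = 0.64   # exchange rate assumed throughout this note
BUDGET_AUD = 10_000

# (low, high) annual estimates in USD, copied from the table above.
ESTIMATES_USD = {
    "Lambda Labs H100": (5_400, 7_368),
    "Spheron spot H100": (1_800, 1_800),
    "Vast.ai H100": (2_660, 4_050),
    "Spheron spot B200": (4_450, 4_450),
}


def to_aud(usd: float) -> float:
    """Convert a USD amount to AUD at the fixed rate above."""
    return usd / USD_PER_AUD


if __name__ == "__main__":
    for option, (low, high) in ESTIMATES_USD.items():
        low_aud, high_aud = to_aud(low), to_aud(high)
        verdict = "within budget" if high_aud <= BUDGET_AUD else "high end over budget"
        print(f"{option:<18} A${low_aud:>9,.0f} to A${high_aud:>9,.0f}  ({verdict})")
```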

Next chunk picks up:

Validate throughput claims (>10 tok/s on H100 with M3 UD-IQ4_XS quantization)
Check if multi-GPU RTX 4090 setups could work (dual 4090 = 48GB VRAM total, likely insufficient for any reasonable quant)
Investigate serverless inference options (RunPod Serverless, Together AI API pricing at $0.30/$1.20 per 1M tokens for M3)
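
For the serverless follow-up, a useful first number is how many tokens a year the 10 tok/s target implies and what that volume would cost at per-token pricing. The sketch below is a back-of-envelope calculation only: it takes the $0.30 and $1.20 per 1M token prices quoted above at face value (unverified), does not assume which one applies to input or output, and simply prices the whole volume at each rate.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds, ignoring leap years


def tokens_per_year(tok_per_s: float, utilisation: float = 1.0) -> float:
    """Tokens produced in a year at a steady rate and given utilisation."""
    return tok_per_s * SECONDS_PER_YEAR * utilisation


def api_cost_usd(tokens: float, price_per_million: float) -> float:
    """Cost in USD of `tokens` at a flat price per one million tokens."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    volume = tokens_per_year(10)  # the >10 tok/s target, sustained 24/7
    print(f"tokens/yr at 10 tok/s: {volume:,.0f}")
    for price in (0.30, 1.20):  # USD per 1M tokens, as quoted in this note
        print(f"  at ${price:.2f}/1M -> ${api_cost_usd(volume, price):,.2f}/yr")
```

Sustained 10 tok/s for a full year is about 315 million tokens, so even at the higher quoted price the yearly total stays well under the hourly-GPU scenarios, provided the per-token pricing holds up on validation.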

Sources:

Spheron GPU pricing: https://www.spheron.network/blog/gpu-cloud-pricing-comparison-2026/
Medium Vast.ai vs RunPod comparison: https://medium.com/@velinxs/vast-ai-vs-runpod-pricing-in-2026-which-gpu-cloud-is-cheaper-bd4104aa591b
Lambda pricing: https://lambda.ai/pricing
Unsloth MiniMax M3 docs: https://unsloth.ai/docs/models/minimax-m3
JarvisLabs providers: https://jarvislabs.ai/ai-faqs/best-cloud-gpu-providers-2026
