MiniMax M3 Hosting Cost Comparison

Budget: ≤10,000 AUD/yr (~$7,060 USD at 1.416 AUD per USD)
Target: >10 tokens/sec throughput for MiniMax-M3-class inference

Model Specifications

Spec	Value
Parameters (total)	229.9B MoE (9.8B active/tok, 428B dense equivalent)
VRAM BF16	~460 GB
VRAM FP8	~230 GB
VRAM AWQ INT4	~115–233 GB (reports vary)
KV Cache @ 1M ctx (FP8)	~120 GB
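The weight-memory rows above follow from the parameter count multiplied by the storage size of each precision. A minimal sketch of that arithmetic, assuming weight-only memory (no KV cache, activations, or quantization metadata) and decimal gigabytes; the function and constant names are illustrative:

```python
PARAMS_TOTAL = 229.9e9  # total parameters, taken from the table above

# Standard storage size per parameter for each precision.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_vram_gb(params: float, bytes_per_param: float) -> float:
    """Weight-only memory in decimal GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision}: ~{weight_vram_gb(PARAMS_TOTAL, nbytes):.0f} GB")
```

This reproduces ~460 GB, ~230 GB, and ~115 GB. Real INT4 checkpoints carry scales and zero-points on top of the packed weights, which is one possible reason the reported INT4 figures sit above the 115 GB lower bound.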

Minimum GPU Requirements

Config	Precision	GPUs needed	Total VRAM
Datacenter (H-series)	INT4	1× H200 or 2× H100	~141–160 GB
Datacenter (H-series)	FP8	2× H200 or 4× H100	~280–320 GB
Multi-Retail	INT4	2× RTX 5090 (32 GB each) or similar	~64 GB, insufficient for ~115 GB of INT4 weights

Key finding: MiniMax M3 cannot fit on consumer GPUs (RTX 4090/5090), even at INT4. It requires at least 141 GB of VRAM (a single H200 SXM5) and typically needs a multi-GPU server configuration.
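The fit check behind this finding can be sketched as a comparison of pooled GPU memory against the weight footprint. The per-GPU memory sizes below are the vendors' published capacities; the 115 GB figure is the lower-bound INT4 estimate from the specs table. The `fits` helper is hypothetical, and it ignores KV cache, activations, and tensor-parallel overhead, so a passing result is necessary but not sufficient.

```python
# Published memory capacity per GPU, in GB.
GPU_VRAM_GB = {
    "RTX 4090": 24,
    "RTX 5090": 32,
    "H100 SXM5": 80,
    "H200 SXM5": 141,
}

INT4_WEIGHTS_GB = 115  # lower-bound INT4 weight size from the specs table


def fits(gpu: str, count: int, required_gb: float) -> bool:
    """True if the pooled VRAM of `count` GPUs can hold `required_gb`."""
    return GPU_VRAM_GB[gpu] * count >= required_gb


for gpu, count in [("RTX 4090", 2), ("RTX 5090", 2), ("H100 SXM5", 2), ("H200 SXM5", 1)]:
    total = GPU_VRAM_GB[gpu] * count
    verdict = "fits" if fits(gpu, count, INT4_WEIGHTS_GB) else "too small"
    print(f"{count}x {gpu}: {total} GB -> {verdict}")
```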

Cloud GPU Pricing (June 2026)

Spot Prices (best value for budget constraint)

Provider	GPU Config	Price/hr USD	Price/hr AUD	Annual cost @ 8,760 h AUD
Spheron	1× H200 SXM5 INT4	$1.82	$2.58	$22,574
Vast.ai	1× H200 SXM5	$1.00–$13.78	$1.42–$19.51	$12,415–$170,726
Vast.ai	1× H100 SXM5	$0.57	$0.81	$7,070 (at floor)
Runpod	1× A100 80GB	~$0.34–$1.19	$0.48–$1.68	$4,215–$14,700
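The annual column is the hourly USD rate multiplied by 8,760 hours and the 1.416 exchange rate. A small sketch of that conversion, assuming continuous 24/7 rental at a constant rate (spot prices fluctuate in practice); the function names are illustrative, and results can differ from the table by a small amount because of rounding:

```python
HOURS_PER_YEAR = 8760
AUD_PER_USD = 1.416   # exchange rate used throughout this note
BUDGET_AUD = 10_000


def annual_cost_aud(usd_per_hour: float) -> float:
    """Cost of renting one instance for a full year, in AUD."""
    return usd_per_hour * HOURS_PER_YEAR * AUD_PER_USD


def within_budget(usd_per_hour: float) -> bool:
    return annual_cost_aud(usd_per_hour) <= BUDGET_AUD


for label, rate in [("Vast.ai H100 floor", 0.57), ("Spheron H200", 1.82)]:
    cost = annual_cost_aud(rate)
    print(f"{label}: ~{cost:,.0f} AUD/yr, within budget: {within_budget(rate)}")
```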

Serverless API Pricing (per 1M tokens)

Provider	Input / 1M tok USD	Output / 1M tok USD	Input AUD	Output AUD
Together AI	$0.30	$1.20	$0.425	$1.70
Spheron	~$0.55 (est.)	—	~$0.78	—

At 1 tok/s average (3,600 tok/hr ≈ 31.5M tok/yr), billed at the Together AI output rate:

Together API: ~$54 AUD/yr (well within budget)
At 10 tok/s: ~$536 AUD/yr (still well within budget)
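The conversion from a sustained token rate to an annual API bill is a product of three numbers: tokens per year, price per million tokens, and the exchange rate. A minimal sketch, assuming every token is billed at the Together AI output rate of $1.20 USD per 1M tokens from the table above (input tokens would be billed in addition) and a 365-day year; the function name is illustrative:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def annual_api_cost_aud(
    tokens_per_sec: float,
    usd_per_million_tokens: float,
    aud_per_usd: float = 1.416,
) -> float:
    """Annual cost in AUD for a sustained average token rate."""
    tokens_per_year = tokens_per_sec * SECONDS_PER_YEAR
    return tokens_per_year / 1e6 * usd_per_million_tokens * aud_per_usd


for rate in (1, 10):
    print(f"{rate} tok/s: ~{annual_api_cost_aud(rate, 1.20):,.0f} AUD/yr")
```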

Self-Hosted Throughput Benchmarks

GPU	MiniMax M3 tok/s (vLLM FP8)	Spot price/hr USD	USD per 1M tok
1× H200 SXM5	~699 tok/s	$1.00–$3.64	~$0.40–$1.45
1× H100 SXM5	~537 tok/s	$0.57–$1.82	~$0.29–$0.94

H100 at the spot floor ($0.57/hr) on Vast.ai: annual cost ≈ $4,993 USD ≈ $7,070 AUD, within budget
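A per-token cost for a rented GPU follows from dividing the hourly price by the tokens produced in an hour. The sketch below assumes the GPU runs at the benchmark throughput for the whole hour, which is a best case since idle time raises the effective cost; the function name is illustrative:

```python
def usd_per_million_tokens(usd_per_hour: float, tokens_per_sec: float) -> float:
    """Effective USD cost per 1M generated tokens at full utilization."""
    tokens_per_hour = tokens_per_sec * 3600
    return usd_per_hour / tokens_per_hour * 1e6


# Throughput and spot-price figures from the benchmark table above.
for gpu, tok_s, low, high in [
    ("H200 SXM5", 699, 1.00, 3.64),
    ("H100 SXM5", 537, 0.57, 1.82),
]:
    lo_cost = usd_per_million_tokens(low, tok_s)
    hi_cost = usd_per_million_tokens(high, tok_s)
    print(f"{gpu}: ${lo_cost:.2f} to ${hi_cost:.2f} per 1M tokens")
```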

Conclusion & Recommendations

Feasibility: ✅ Barely feasible at spot pricing

Viable self-hosted paths within 10,000 AUD/yr:

Vast.ai H100 SXM5 spot at the $0.57/hr floor → ~$7,070 AUD/yr
- Caveat: spot instances may be preempted, and the actual price is often higher than the floor
- Provides ~537 tok/s (well above the 10 tok/s target)
Runpod A100 80GB → ~$4,215–$14,700 AUD/yr depending on the actual rate (only the low end of the range fits the budget)

Not viable within budget:

H200 configs (minimum ~$12,415 AUD/yr)
Serverless API at very high sustained throughput (at the Together AI output rate of $1.20 USD per 1M tokens, the budget is exceeded above roughly 185 tok/s sustained)
Consumer GPUs (RTX 4090/5090): insufficient VRAM for MiniMax M3 weights even at INT4

Key risks:

Spot instance preemption
Price floors change; the Vast.ai H100 rate of $0.57/hr may be a temporary low
Storage costs (230+ GB model checkpoint) are not included above
24/7 availability requires on-demand pricing (1.5–2× spot cost)
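The last risk can be quantified by working backwards from the budget to the highest hourly rate it can sustain year-round. A rough sketch, assuming the 1.416 exchange rate, 8,760 hours per year, and the $0.57/hr spot floor quoted for the Vast.ai H100; storage and egress are left out because no prices for them are given here, and the function name is illustrative:

```python
HOURS_PER_YEAR = 8760
AUD_PER_USD = 1.416
BUDGET_AUD = 10_000
SPOT_FLOOR_USD = 0.57  # Vast.ai H100 SXM5 spot floor quoted above


def max_usd_per_hour(budget_aud: float) -> float:
    """Highest hourly USD rate that stays within the annual AUD budget."""
    return budget_aud / AUD_PER_USD / HOURS_PER_YEAR


ceiling = max_usd_per_hour(BUDGET_AUD)
print(f"Break-even rate: ${ceiling:.3f}/hr USD")

# On-demand pricing is estimated above at 1.5x to 2x the spot price.
for multiplier in (1.0, 1.5, 2.0):
    rate = SPOT_FLOOR_USD * multiplier
    status = "within budget" if rate <= ceiling else "over budget"
    print(f"{multiplier:.1f}x spot floor = ${rate:.3f}/hr -> {status}")
```

Under these assumptions the break-even rate is about $0.81/hr USD, so the H100 fits the budget only while its price stays below roughly 1.4× the quoted floor; on-demand pricing at 1.5× or more would exceed it.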
