MiniMax M3 Hosting Cost Comparison
Budget: ≤10,000 AUD/yr (~$7,060 USD at 1.416 AUD/USD)
Target: >10 tokens/sec throughput for MiniMax-M3-class inference
Model Specifications
| Spec | Value |
|---|---|
| Parameters (total) | 229.9B MoE (9.8B active/tok, 428B dense equivalent) |
| VRAM BF16 | ~460 GB |
| VRAM FP8 | ~230 GB |
| VRAM AWQ INT4 | ~115 GB for weights at 4 bits/param (reported footprints range up to ~233 GB) |
| KV Cache @ 1M ctx (FP8) | ~120 GB |
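The VRAM rows follow from parameter count × bytes per parameter. MoE inference keeps all experts resident, so the estimate uses total (not active) parameters. This is a minimal sketch that assumes weights dominate memory. It ignores AWQ group scales, KV cache, activations and runtime overhead, so real footprints run somewhat higher:

```python
# Weight-memory estimate: parameters × bytes per parameter.
# Assumption: weights dominate; AWQ group scales, KV cache, activations and
# runtime overhead are ignored, so real footprints run somewhat higher.
PARAMS_TOTAL = 229.9e9  # total MoE parameters, from the spec table

BYTES_PER_PARAM = {
    "BF16": 2.0,
    "FP8": 1.0,
    "INT4": 0.5,  # 4 bits per weight
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight memory in decimal GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision}: ~{weight_memory_gb(PARAMS_TOTAL, nbytes):.0f} GB")
```

This prints ~460, ~230 and ~115 GB for BF16, FP8 and INT4. These figures match the lower bound of each table row.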
Minimum GPU Requirements
| Config | Precision | GPUs needed | Min VRAM total |
|---|---|---|---|
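| Datacenter GPUs | FP8 | 2× H200 (141 GB each) or 4× H100 (80 GB each) | ~282–320 GB |
| Datacenter GPUs | INT4 | 1× H200 or 2× H100 | ~141–160 GB |
| Consumer multi-GPU | INT4 | 2× RTX 5090 (32 GB each) or similar | 64 GB; insufficient (INT4 weights alone are ~115 GB) |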
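Key finding: MiniMax M3 does not fit on one or two consumer GPUs (RTX 4090 24 GB, RTX 5090 32 GB) even at INT4, because the INT4 weights alone are ~115 GB. The practical minimum is ~141 GB of VRAM (a single H200 SXM5 at INT4, leaving little room for KV cache). FP8 needs a multi-GPU server.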
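GPU counts follow from dividing the required memory by per-card VRAM. This is a minimal sketch. It assumes the weights split evenly across cards (tensor/expert parallelism, which real deployments only approximate). It also reserves a hypothetical 10% headroom for KV cache and runtime overhead:

```python
import math

# Per-card VRAM in GB (vendor specs).
GPU_VRAM_GB = {
    "RTX 4090": 24,
    "RTX 5090": 32,
    "A100 80GB": 80,
    "H100 SXM5": 80,
    "H200 SXM5": 141,
}


def min_gpus(weights_gb: float, vram_per_gpu_gb: float, headroom: float = 0.10) -> int:
    """Smallest count of identical GPUs whose pooled VRAM holds weights plus headroom.

    `headroom` is a hypothetical fraction reserved for KV cache and runtime
    overhead; long contexts need far more (the spec table lists ~120 GB of
    FP8 KV cache at 1M tokens).
    """
    required_gb = weights_gb * (1 + headroom)
    return math.ceil(required_gb / vram_per_gpu_gb)


for precision, weights_gb in (("FP8", 230), ("INT4", 115)):
    counts = {gpu: min_gpus(weights_gb, vram) for gpu, vram in GPU_VRAM_GB.items()}
    print(precision, counts)
```

Under these assumptions, FP8 needs 2× H200 or 4× H100, and INT4 needs 1× H200 or 2× H100. Consumer setups need at least four RTX 5090s.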
Cloud GPU Pricing (June 2026)
Spot Prices (best value for budget constraint)
| Provider | GPU Config | Price/hr USD | $/hr AUD | Annual cost @ 8760h AUD |
|---|---|---|---|---|
| Spheron | 1× H200 SXM5 INT4 | $1.82 | $2.58 | $22,576 |
| Vast.ai | 1× H200 SXM5 | $13.78 | $19.51 | $170,929 |
| Vast.ai | 1× H100 SXM5 | $0.57 | $0.81 | $7,070 (at floor) |
| Runpod | 1× A100 80GB | ~$1.19 | ~$1.69 | ~$14,761 |
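Each annual figure is the hourly USD price × 1.416 AUD/USD × 8,760 h. The minimal sketch below recomputes the rows. Its last row is a hypothetical extension that assumes the per-GPU Vast.ai floor price applies to each of two H100s:

```python
AUD_PER_USD = 1.416    # exchange rate used throughout this comparison
HOURS_PER_YEAR = 8760  # 24 h × 365 days, i.e. 24/7 uptime


def annual_cost_aud(usd_per_hour: float, num_gpus: int = 1) -> float:
    """24/7 annual rental cost in AUD for `num_gpus` GPUs at a per-GPU USD hourly rate."""
    return usd_per_hour * num_gpus * AUD_PER_USD * HOURS_PER_YEAR


rows = [
    ("Spheron 1× H200 SXM5 INT4", 1.82, 1),
    ("Vast.ai 1× H200 SXM5", 13.78, 1),
    ("Vast.ai 1× H100 SXM5", 0.57, 1),
    ("Runpod 1× A100 80GB", 1.19, 1),
    ("Vast.ai 2× H100 SXM5 (hypothetical pair)", 0.57, 2),
]
for name, rate, n in rows:
    print(f"{name}: ${annual_cost_aud(rate, n):,.0f} AUD/yr")
```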
Serverless API Pricing (per 1M tokens)
| Provider | Input / 1M tok USD | Output / 1M tok USD | Input AUD | Output AUD |
|---|---|---|---|---|
| Together AI | $0.30 | $1.20 | $0.425 | $1.70 |
| Spheron | ~$0.55 (est.) | — | ~$0.78 | — |
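At 1 tok/s sustained output (3,600 tok/hr × 8,760 h ≈ 31.5M tok/yr):
- Together AI: ~$54 AUD/yr in output tokens (31.5M × $1.70 per 1M), well within budget; input tokens add $0.425 AUD per 1M
- At 10 tok/s: ~315M tok/yr ≈ $536 AUD/yr in output tokens (still well within budget)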
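Per-token arithmetic is easy to get wrong by a factor of 3,600 (seconds per hour) or 8,760 (hours per year). This is a minimal sketch. It assumes a constant 24/7 token rate and uses the Together AI AUD prices from the table. The input-token arguments are hypothetical knobs, because the input-to-output ratio depends on the workload:

```python
SECONDS_PER_YEAR = 3600 * 8760  # 24/7 for a 365-day year = 31,536,000 s


def api_cost_aud_per_year(
    out_tok_per_s: float,
    out_aud_per_m: float = 1.70,   # Together AI output price from the table, in AUD
    in_tok_per_s: float = 0.0,
    in_aud_per_m: float = 0.425,   # Together AI input price from the table, in AUD
) -> float:
    """Annual serverless cost for a sustained token rate, priced per 1M tokens."""
    out_tokens = out_tok_per_s * SECONDS_PER_YEAR
    in_tokens = in_tok_per_s * SECONDS_PER_YEAR
    return out_tokens / 1e6 * out_aud_per_m + in_tokens / 1e6 * in_aud_per_m


print(round(api_cost_aud_per_year(1)))   # output tokens only, 1 tok/s  → 54
print(round(api_cost_aud_per_year(10)))  # output tokens only, 10 tok/s → 536
```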
Self-hosted throughput benchmarks
| GPU | MiniMax M3 tok/s (vLLM FP8) | $/hr spot USD | USD per 1M tok |
|---|---|---|---|
| 1× H200 SXM5 | ~699 tok/s | 3.64 | ~1.45 |
| 1× H100 SXM5 | ~537 tok/s | 1.82 | ~0.94 |
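One H100 at the Vast.ai spot floor ($0.57/hr × 8,760 h ≈ $4,993 USD ≈ $7,070 AUD/yr) is within budget. However, a single 80 GB card cannot hold the ~115 GB INT4 weights, and the 2× H100 minimum costs ~$14,141 AUD/yr.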
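Dividing the hourly price by tokens per second gives dollars per hour per (tok/s), not dollars per token. Converting to cost per 1M tokens needs the 3,600 s/hr factor. This is a minimal sketch. It assumes the benchmark throughput is sustained whenever the GPU is busy. `utilization` is a hypothetical parameter for the fraction of each paid hour spent generating:

```python
def usd_per_million_tokens(usd_per_hour: float, tok_per_s: float, utilization: float = 1.0) -> float:
    """Cost per 1M generated tokens for a rented GPU.

    `utilization` is the fraction of each paid hour spent generating at the
    benchmarked rate (a hypothetical knob; 1.0 means fully busy).
    """
    tokens_per_hour = tok_per_s * 3600 * utilization
    return usd_per_hour / tokens_per_hour * 1e6


print(f"{usd_per_million_tokens(3.64, 699):.2f}")  # H200 row → 1.45
print(f"{usd_per_million_tokens(1.82, 537):.2f}")  # H100 row → 0.94
```

Self-hosting only undercuts the $1.20 USD per 1M output price listed for Together AI when the GPU is kept busy. At 10% utilization, the same H100 row works out to ~$9.41 per 1M tokens.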
Conclusion & Recommendations
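Feasibility: ✅ Feasible via serverless API; ⚠️ 24/7 self-hosting exceeds budget at the listed spot prices
Viable paths within 10,000 AUD/yr:
- Serverless API (Together AI, $1.70 AUD per 1M output tokens): a sustained 10 tok/s is ~315M output tokens/yr ≈ $536 AUD/yr, plus input tokens at $0.425 AUD per 1M
- 2× H100 SXM5 on Vast.ai spot ($0.57/hr per GPU, the INT4 minimum) at partial uptime only: 24/7 costs ~$14,141 AUD/yr, so the budget covers ~6,195 h/yr (~71% uptime)
- Caveat: spot instances may be preempted, and the actual price is often higher than the floor
- The benchmark table lists ~537 tok/s on H100 (vLLM FP8), well above the 10 tok/s target
Not viable within budget (24/7):
- H200 configs (cheapest listed: Spheron 1× H200 INT4 at ~$22,576 AUD/yr)
- Single 80 GB GPUs (1× H100 at ~$7,070 AUD/yr, 1× A100 80GB at ~$14,761 AUD/yr): too little VRAM for the ~115 GB INT4 weights
- Consumer GPUs (RTX 4090/5090): insufficient VRAM for MiniMax M3 weights even at INT4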
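MiniMax M3 needs at least 2× H100 at INT4 (per the GPU requirements table). One way to stay within budget is therefore to run the pair for only part of the year. This is a minimal sketch of that trade-off. It assumes the Vast.ai $0.57/hr spot floor holds for both cards all year, and it ignores storage and egress:

```python
AUD_PER_USD = 1.416
HOURS_PER_YEAR = 8760


def affordable_hours(budget_aud: float, usd_per_hour_per_gpu: float, num_gpus: int) -> float:
    """Rental hours per year that fit in the budget (storage and egress ignored)."""
    aud_per_hour = usd_per_hour_per_gpu * num_gpus * AUD_PER_USD
    return budget_aud / aud_per_hour


hours = affordable_hours(10_000, 0.57, 2)  # 2× H100 at the Vast.ai spot floor
print(f"{hours:,.0f} h/yr ({hours / HOURS_PER_YEAR:.0%} uptime)")  # → 6,195 h/yr (71% uptime)
```

The 1.5–2× on-demand premium listed under the risks would cut these hours roughly in half, as would storage costs.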
Key risks:
- Spot instance preemption
- Price floors change; Vast.ai H100 at $0.57/hr may be a temporary low
- Storage costs (230+ GB model checkpoint) are not included above
- 24/7 availability requires on-demand pricing (1.5–2× spot cost)