MiniMax-M3 Hosting Cost Analysis
Research compiled 2026-06-24 by hermes-ralph. Budget: ≤10k AUD (~$6,500 USD). Target: >10 tok/s throughput.
Model Specs
| Param | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total params | 229.9B |
| Active params/token | 9.8B |
| Context window | 1M tokens |
| Weight size by precision | BF16: ~460 GB, FP8: ~230 GB, INT4-AWQ: ~115 GB |
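The weight sizes above follow from total parameters × bytes per parameter. Here is a minimal sketch. It assumes decimal GB and counts weights only, so it ignores quantization scales, zero-points, and any layers a real checkpoint keeps at higher precision.

```python
# Approximate weight memory for MiniMax-M3 at each precision.
# Assumption: weights only, decimal GB (1e9 bytes); quantization scales,
# zero-points and higher-precision layers are ignored.
TOTAL_PARAMS = 229.9e9

BYTES_PER_PARAM = {
    "BF16": 2.0,
    "FP8": 1.0,
    "INT4-AWQ": 0.5,
}


def weight_size_gb(params: float, bytes_per_param: float) -> float:
    """Return weight memory in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for precision, nbytes in BYTES_PER_PARAM.items():
        print(f"{precision:>9}: ~{weight_size_gb(TOTAL_PARAMS, nbytes):.0f} GB")
```

The results round to ~460, ~230 and ~115 GB, which matches the table.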
Option 1: API Providers (pay-per-token)
No upfront cost. Scale on demand. Cheapest below breakeven.
Tiered pricing by provider
| Provider | Blended price (USD/1M tokens) | Output speed (tok/s) | Latency (s) | Notes |
|---|---|---|---|---|
| Parasail (MXFP8) | $0.22 | 25 | 2.16 | Tied for lowest price, slowest throughput |
| Together AI | $0.22 | 76 | 0.92 | Good balance of price/speed |
| Novita | $0.22 | 76 | 2.58 | Matches Together on speed |
| MiniMax (native) | $0.22 | 87 | 3.12 | Direct API, fastest in the $0.22 tier |
| Makora (MXFP8) | $0.34 | 91 | 0.86 | Best performance, modestly pricier |
| SiliconFlow | $0.44 | 66 | 3.15 | Higher price for no clear advantage |
| GMI | $0.59 | 77 | 4.08 | Most expensive, avoid |
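One way to shortlist providers from the table is to drop any that miss the >10 tok/s target and then sort by price. The sketch below is illustrative only. Its tie-break order (higher output speed first, then lower latency) is an assumption and is not something the table prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    price_usd_per_m: float  # blended USD per 1M tokens
    output_tok_s: float     # output speed, tokens/second
    latency_s: float        # latency in seconds, as listed


# Figures copied from the provider table.
PROVIDERS = [
    Provider("Parasail (MXFP8)", 0.22, 25, 2.16),
    Provider("Together AI", 0.22, 76, 0.92),
    Provider("Novita", 0.22, 76, 2.58),
    Provider("MiniMax (native)", 0.22, 87, 3.12),
    Provider("Makora (MXFP8)", 0.34, 91, 0.86),
    Provider("SiliconFlow", 0.44, 66, 3.15),
    Provider("GMI", 0.59, 77, 4.08),
]


def rank(providers, min_tok_s: float = 10.0):
    """Drop providers at or below the throughput target, then sort cheapest first.

    Tie-break (an assumption): higher output speed, then lower latency.
    """
    eligible = [p for p in providers if p.output_tok_s > min_tok_s]
    return sorted(eligible, key=lambda p: (p.price_usd_per_m, -p.output_tok_s, p.latency_s))


if __name__ == "__main__":
    for p in rank(PROVIDERS):
        print(f"{p.name}: ${p.price_usd_per_m}/M, {p.output_tok_s} tok/s, {p.latency_s}s")
```

With these figures, MiniMax (native) leads the $0.22 tier on speed. Together AI and Novita tie on speed, and Together AI comes out ahead of Novita on latency.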
Cost projection at 10 tok/s sustained
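- Tokens/month at 10 tok/s (continuous): 10 × 86,400 s/day × 30 days = 25,920,000 tokens (~25.9M)
- Cheapest (Parasail/Together/Novita/MiniMax, $0.22/M): ~$5.70 USD/mo (~$9 AUD), well under budget
- Makora (best perf, $0.34/M): ~$8.81 USD/mo (~$14 AUD)
- At 100× that volume (~2.6B tokens/month, ~1,000 tok/s sustained): ~$887 AUD/mo cheapest, ~$1,346 AUD/mo on Makora
Verdict: API is the clear winner for ≤2.6B tokens/month. At the 10 tok/s target, the cheapest option costs under $10 AUD/mo. Even at ~2.6B tokens/month it is ~$887 AUD/mo, roughly 9% of the $10k AUD budget.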
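The projection is plain arithmetic: tokens/month = rate × 86,400 s × 30 days, and cost = tokens / 1M × blended price. Here is a minimal sketch. It assumes a 30-day month and takes the exchange rate from the budget line ($10k AUD ≈ $6,500 USD, about 1.54 AUD per USD).

```python
SECONDS_PER_MONTH = 86_400 * 30  # 30-day month
AUD_PER_USD = 10_000 / 6_500     # assumption: implied by the budget line (~1.54)


def tokens_per_month(tok_per_s: float) -> float:
    """Tokens generated in a month at a constant sustained rate."""
    return tok_per_s * SECONDS_PER_MONTH


def monthly_cost_usd(tokens: float, price_usd_per_m: float) -> float:
    """API spend for a token volume at a blended USD-per-1M-tokens price."""
    return tokens / 1e6 * price_usd_per_m


if __name__ == "__main__":
    tokens = tokens_per_month(10)
    for label, price in [("Cheapest tier", 0.22), ("Makora", 0.34)]:
        usd = monthly_cost_usd(tokens, price)
        print(f"{label}: {tokens:,.0f} tokens -> ${usd:.2f} USD (~${usd * AUD_PER_USD:.2f} AUD)")
```

Plugging in 1,000 tok/s instead gives 2,592,000,000 tokens, which costs about $570 USD at $0.22/M.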
Option 2: Cloud GPU Rental (self-hosted on rented hardware)
For MiniMax-M3, minimum viable configs:
| Config | Precision | VRAM | Spot (USD/hr) | On-demand (USD/hr) | Monthly on-demand (USD) |
|---|---|---|---|---|---|
| 2x H200 SXM5 | FP8 | 282 GB | $3.64 | $9.68 | ~7,000 |
| 4x H100 SXM5 | FP8 | 320 GB | $5.72 | $15.68 | ~11,290 |
| 1x H200 SXM5 | AWQ INT4 | 141 GB | $1.82 | $4.84 | ~3,500 |
Monthly = on-demand USD/hr × 24 h × 30 days (720 h).
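The monthly figures are the hourly rate times 720 hours (24 h × 30 days). This short sketch recomputes both the spot and on-demand monthly cost from the table's hourly rates.

```python
HOURS_PER_MONTH = 24 * 30  # 720 h

# (config, spot USD/hr, on-demand USD/hr) from the rental table.
GPU_CONFIGS = [
    ("2x H200 SXM5 FP8", 3.64, 9.68),
    ("4x H100 SXM5 FP8", 5.72, 15.68),
    ("1x H200 SXM5 AWQ INT4", 1.82, 4.84),
]


def monthly_usd(rate_per_hr: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running a config around the clock for one 30-day month."""
    return rate_per_hr * hours


if __name__ == "__main__":
    for name, spot, on_demand in GPU_CONFIGS:
        print(f"{name}: spot ${monthly_usd(spot):,.0f}/mo, "
              f"on-demand ${monthly_usd(on_demand):,.0f}/mo")
```

On-demand, this gives ~$6,970, ~$11,290 and ~$3,485 for the three configs. On spot, the 2x H200 config comes to ~$2,621 USD/month.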
Verdict: Spot pricing for the minimal config (2x H200 FP8 at $3.64/hr, ~$2,621 USD/mo) fits the budget. However, ~230 GB of FP8 weights on 282 GB of VRAM leaves only ~52 GB for KV cache, a severe constraint for the full 1M context.
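KV-cache room is whatever VRAM is left after loading the weights. The rough sketch below uses the weight sizes from Model Specs and ignores activations and framework overhead. Two parts of it are assumptions:

- `kv_bytes_per_token` is a hypothetical input, because this analysis doesn't give MiniMax-M3's per-token KV footprint.
- `kv_bytes_per_token_gqa` applies the standard formula for multi-head or grouped-query attention (2 × layers × KV heads × head dim × bytes per element). Whether that formula applies to MiniMax-M3's attention layers is not established here.

```python
import math

# Weight sizes (GB) from the Model Specs table.
WEIGHTS_GB = {"FP8": 230, "INT4-AWQ": 115}

# (config, precision, total VRAM in GB) from the rental table.
CONFIGS = [
    ("2x H200 SXM5", "FP8", 2 * 141),
    ("4x H100 SXM5", "FP8", 4 * 80),
    ("1x H200 SXM5", "INT4-AWQ", 141),
]


def kv_headroom_gb(vram_gb: float, weights_gb: float) -> float:
    """VRAM left after loading weights; ignores activations and runtime overhead."""
    return vram_gb - weights_gb


def kv_bytes_per_token_gqa(layers: int, kv_heads: int, head_dim: int,
                           bytes_per_elem: float) -> float:
    """Standard per-token KV size for multi-head/grouped-query attention (K and V)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(headroom_gb: float, kv_bytes_per_token: float) -> int:
    """Upper bound on tokens whose KV cache fits; kv_bytes_per_token is a hypothetical input."""
    return math.floor(headroom_gb * 1e9 / kv_bytes_per_token)


if __name__ == "__main__":
    for name, precision, vram in CONFIGS:
        print(f"{name} {precision}: ~{kv_headroom_gb(vram, WEIGHTS_GB[precision])} GB for KV cache")
```

To hold the full 1M-token window on 2x H200 (52 GB of headroom), the KV cache would need to stay under about 52 KB per token, before any runtime overhead.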
Option 3: Self-hosted hardware purchase
| GPU | Cost AUD (approx) | VRAM | Monthly electricity | Amortised/mo (24mo) |
|---|---|---|---|---|
| RTX 4090 x1 | ~$4,500 | 24 GB | $75 | $188 + $75 ≈ $263 |
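The amortised column combines straight-line depreciation with electricity. Here is a minimal sketch. It assumes zero resale value and no financing cost.

```python
def amortised_monthly_aud(hardware_aud: float, electricity_aud_per_month: float,
                          months: int = 24) -> float:
    """Straight-line amortisation (assumes zero resale value) plus monthly electricity."""
    return hardware_aud / months + electricity_aud_per_month


if __name__ == "__main__":
    print(f"RTX 4090 x1: ${amortised_monthly_aud(4_500, 75):.2f} AUD/month")
```

For one RTX 4090 this gives $4,500 / 24 + $75 = $262.50 AUD per month.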
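Single RTX 4090 is insufficient: the model won't fit even at INT4 quantization (~115 GB of weights vs 24 GB of VRAM). Multi-GPU options:
- 2x RTX 4090 (~$9,000 AUD, 48 GB): just under budget for hardware alone, but well short of the ~115 GB INT4 weights
- 4x RTX 4090 (~$18,000 AUD, 96 GB): exceeds budget and is still short; holding the INT4 weights alone needs at least 5 cards (120 GB)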
Verdict: Not feasible within $10k AUD budget. Single GPU insufficient, multi-GPU setup over budget even before electricity, networking, and cooling costs.
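A quick lower bound on card count is the weight size divided by per-card VRAM, rounded up. This sketch assumes the weights split evenly across GPUs (tensor or pipeline parallelism). It ignores per-GPU runtime overhead, so real deployments need more cards.

```python
import math


def gpus_needed(weights_gb: float, vram_per_gpu_gb: float, headroom_gb: float = 0.0) -> int:
    """Lower bound on card count: (weights + desired KV headroom) / per-card VRAM, rounded up.

    Assumes an even split across GPUs and ignores per-GPU runtime overhead.
    """
    return math.ceil((weights_gb + headroom_gb) / vram_per_gpu_gb)


if __name__ == "__main__":
    for precision, weights in [("INT4-AWQ", 115), ("FP8", 230)]:
        n = gpus_needed(weights, 24)
        print(f"{precision}: >= {n} x RTX 4090 (~${n * 4_500:,} AUD in cards)")
```

For 24 GB cards this gives 5 cards for the ~115 GB INT4 weights and 10 for FP8. At ~$4,500 AUD per card, five cards alone come to about $22,500 AUD.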
Breakeven Analysis
| API price (USD/1M tokens) | Self-hosting breakeven (tokens/month) |
|---|---|
| $2.00 (market avg) | ~23M |
| $0.73 (Qwen3.5-35B via Air) | ~155M |
| $0.22 (MiniMax-M3 cheapest) | ~469M |
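At MiniMax's API price ($0.22/M tokens), self-hosting only pays off above ~469M tokens/month, and even a 4x RTX 4090 rig means an $18k+ AUD investment.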
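Breakeven volume is the fixed monthly self-hosting cost divided by the API price per token. The table doesn't state which self-hosting cost it assumes, so the sketch takes that cost as an input. The 2x H200 spot figure in the usage lines is an illustrative assumption drawn from the rental table.

```python
def breakeven_tokens_per_month(self_host_usd_per_month: float, api_usd_per_m: float) -> float:
    """Token volume per month at which API spend equals a fixed self-hosting cost."""
    return self_host_usd_per_month / api_usd_per_m * 1e6


def implied_self_host_usd(breakeven_tokens: float, api_usd_per_m: float) -> float:
    """Inverse: the monthly self-hosting cost that a given breakeven volume implies."""
    return breakeven_tokens / 1e6 * api_usd_per_m


if __name__ == "__main__":
    spot_2x_h200 = 3.64 * 720  # assumption: 2x H200 spot for 720 h, in USD/month
    print(f"{breakeven_tokens_per_month(spot_2x_h200, 0.22) / 1e9:.1f}B tokens/month")
    print(f"${implied_self_host_usd(469e6, 0.22):.0f} USD/month implied by the $0.22 row")
```

With 2x H200 spot (~$2,621 USD/month), breakeven at $0.22/M is roughly 11.9B tokens/month. Read in reverse, the ~469M figure corresponds to a self-hosting cost of about $103 USD/month.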
Recommendation Summary
| Scenario | Best Option | Monthly cost AUD |
|---|---|---|
| Low volume (<469M tok/mo) | API (Together AI or MiniMax native) | < $300 |
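| Target: 10 tok/s sustained (~25.9M tok/mo) | API (Together AI or MiniMax native) | ~$9 |
| High volume (~2.6B tok/mo) | API (Together AI or MiniMax native) | ~$887 |
| High volume, best latency/throughput | API (Makora) | ~$1,346 |
| Self-hosting required (privacy/customisation) | Cloud GPU spot (2x H200 FP8) | ~$3,950 |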
Top recommendation: Use the native MiniMax API or Together AI at a blended $0.22/M tokens. That costs ~$9 AUD/month at the 10 tok/s target and ~$887 AUD/month even at ~2.6B tokens/month, well within budget, with zero infrastructure and capacity available immediately. Cloud GPU rental (2x H200 spot) is the next option if you need to self-host for privacy/customisation reasons (~$3,950 AUD/mo).
Spot pricing summary (consolidated from 2026-06-25 comparison)
| Provider | GPU Config | Price/hr (USD) | Price/hr (AUD) | Annual cost @ 8,760 h (AUD) |
|---|---|---|---|---|
| Spheron | 1× H200 SXM5 INT4 | $1.82 | $2.58 | $22,574 |
| Vast.ai | 1× H200 SXM5 | $13.78 | $19.51 | $170,726 |
| Vast.ai | 1× H100 SXM5 | $0.57 | $0.81 | $7,060 (at floor) |
| Runpod | 1× A100 80GB | ~$1.19 | $1.68 | $14,700 |
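H100 at the Vast.ai spot floor ($0.57/hr USD, ~$7,112 AUD/year) is the only entry within the $10k AUD budget, though a single 80 GB H100 cannot hold even the ~115 GB INT4 weights.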
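The annual column is the hourly AUD price × 8,760 h (24 h × 365 days). The sketch takes the exchange rate as a parameter. The AUD/hr column implies roughly 1.42 AUD per USD (e.g. $2.58 / $1.82), and rounding explains small differences from the table. Treating the $10k AUD budget as an annual cap is an assumption.

```python
HOURS_PER_YEAR = 8_760  # 24 h x 365 days

# Spot USD/hr from the comparison table.
SPOT_USD_PER_HR = {
    "Spheron 1x H200 SXM5 INT4": 1.82,
    "Vast.ai 1x H200 SXM5": 13.78,
    "Vast.ai 1x H100 SXM5": 0.57,
    "Runpod 1x A100 80GB": 1.19,
}


def annual_aud(usd_per_hr: float, aud_per_usd: float) -> float:
    """Annual cost in AUD for a GPU rented around the clock."""
    return usd_per_hr * aud_per_usd * HOURS_PER_YEAR


def within_budget(rates: dict, aud_per_usd: float, budget_aud: float = 10_000) -> list:
    """Names whose annual AUD cost fits the budget (budget treated as annual: an assumption)."""
    return [name for name, usd in rates.items() if annual_aud(usd, aud_per_usd) <= budget_aud]


if __name__ == "__main__":
    rate = 2.58 / 1.82  # AUD per USD implied by the Spheron row
    for name, usd in SPOT_USD_PER_HR.items():
        print(f"{name}: ~${annual_aud(usd, rate):,.0f} AUD/year")
    print("Within budget:", within_budget(SPOT_USD_PER_HR, rate))
```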