MiniMax M3 Hosting Cost Comparison

Budget: ≤10,000 AUD/yr (~$7,060 USD at 1.416 AUD/USD)
Target: >10 tokens/sec throughput for MiniMax-M3-class inference

Model Specifications

| Spec | Value |
|---|---|
| Parameters (total) | 229.9B MoE (9.8B active/tok, 428B dense equivalent) |
| VRAM, BF16 | ~460 GB |
| VRAM, FP8 | ~230 GB |
| VRAM, AWQ INT4 | ~115–233 GB (reports vary) |
| KV cache @ 1M ctx (FP8) | ~120 GB |
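The weight rows follow from total parameters × bytes per parameter. Below is a minimal Python sketch. It assumes 1 GB = 10^9 bytes and counts weights only. The helper name `weight_vram_gb` is hypothetical.

```python
# Bytes per parameter for each weight format (weights only).
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_vram_gb(total_params_billion: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, quantization scales and runtime overhead.
    """
    return total_params_billion * BYTES_PER_PARAM[precision]


# In standard MoE serving every expert stays resident in VRAM,
# so the total parameter count is used, not the 9.8B active per token.
for precision in ("BF16", "FP8", "INT4"):
    print(f"{precision}: ~{weight_vram_gb(229.9, precision):.0f} GB")
```

This prints ~460, ~230 and ~115 GB. Real INT4 checkpoints come in somewhat above the 0.5 bytes/param floor. AWQ stores per-group scales, and some layers (for example, embeddings) usually stay in 16-bit.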

Minimum GPU Requirements

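| Config | Precision | GPUs needed | Min VRAM total |
|---|---|---|---|
| Datacenter (H-series) | INT4 | 1× H200 (141 GB) or 2× H100 (80 GB each) | ~141–160 GB |
| Datacenter (H-series) | FP8 | 2× H200 or 4× H100 | ~282–320 GB |
| Consumer (retail) | INT4 | 2× RTX 5090 (32 GB each) or similar | 64 GB, insufficient |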

Key finding: MiniMax M3 cannot fit on one or two consumer GPUs (RTX 4090/5090) even at INT4. It needs at least ~141 GB of VRAM (a single H200 SXM5, INT4 only). FP8 weights or a long-context KV cache push it onto multi-GPU server configurations.
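A quick way to check a configuration is to compare total card VRAM against the weight footprint plus headroom. The sketch below is a minimal Python version. The per-card capacities are vendor specs, and the 115/230 GB weight figures come from the spec table. `fits` and `headroom_gb` are hypothetical names. Set `headroom_gb` to your KV-cache budget, for example the ~120 GB listed for 1M context.

```python
# Per-card VRAM in GB (vendor specs).
GPU_VRAM_GB = {"H100 SXM5": 80, "H200 SXM5": 141, "RTX 5090": 32}

# Weight-only footprints from the spec table (GB).
WEIGHTS_GB = {"FP8": 230, "INT4": 115}


def fits(gpu: str, count: int, precision: str, headroom_gb: float = 0.0) -> bool:
    """True if `count` cards hold the weights plus `headroom_gb` (KV cache, overhead)."""
    return GPU_VRAM_GB[gpu] * count >= WEIGHTS_GB[precision] + headroom_gb


configs = [("H200 SXM5", 1), ("H100 SXM5", 1), ("H100 SXM5", 2),
           ("H200 SXM5", 2), ("H100 SXM5", 4), ("RTX 5090", 2)]
for gpu, count in configs:
    verdicts = ", ".join(f"{p}: {'fits' if fits(gpu, count, p) else 'no'}" for p in WEIGHTS_GB)
    print(f"{count}x {gpu} ({GPU_VRAM_GB[gpu] * count} GB) -> {verdicts}")
```

Note that a single 80 GB H100 fails even at INT4 with zero headroom.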

Cloud GPU Pricing (June 2026)

Spot Prices (best value for budget constraint)

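| Provider | GPU config | USD/hr | AUD/hr | Annual cost @ 8,760 h (AUD) |
|---|---|---|---|---|
| Spheron | 1× H200 SXM5 (INT4) | $1.82 | $2.58 | ~$22,576 |
| Vast.ai | 1× H200 SXM5 | $13.78 | $19.51 | ~$170,929 |
| Vast.ai | 1× H100 SXM5 | $0.57 | $0.81 | ~$7,070 (at floor) |
| Runpod | 1× A100 80GB | ~$1.19 | ~$1.69 | ~$14,761 |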
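The annual-cost column is hourly USD × exchange rate × hours per year. The minimal Python sketch below recomputes it from the hourly rates. It assumes 24/7 running (8,760 h) and the document's 1.416 AUD/USD rate. `annual_cost_aud` is a hypothetical helper.

```python
AUD_PER_USD = 1.416    # exchange rate used throughout this comparison
HOURS_PER_YEAR = 8760  # 24 x 365, i.e. running 24/7
BUDGET_AUD = 10_000


def annual_cost_aud(usd_per_hour: float, hours: float = HOURS_PER_YEAR) -> float:
    """Convert an hourly USD rate into an annual AUD cost."""
    return usd_per_hour * AUD_PER_USD * hours


spot_rates_usd = {
    "Spheron 1x H200": 1.82,
    "Vast.ai 1x H200": 13.78,
    "Vast.ai 1x H100": 0.57,
    "Runpod 1x A100 80GB": 1.19,
}
for name, rate in spot_rates_usd.items():
    cost = annual_cost_aud(rate)
    status = "within" if cost <= BUDGET_AUD else "over"
    print(f"{name}: ~{cost:,.0f} AUD/yr ({status} budget)")
```

Passing a smaller `hours` value shows how much a part-time schedule would save.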

Serverless API Pricing (per 1M tokens)

| Provider | Input USD / 1M tok | Output USD / 1M tok | Input AUD / 1M tok | Output AUD / 1M tok |
|---|---|---|---|---|
| Together AI | $0.30 | $1.20 | $0.425 | $1.70 |
| Spheron | ~$0.55 (est.) | n/a | ~$0.78 | n/a |

At a sustained 1 tok/s (3,600 tok/hr, 86,400 tok/day, ~31.5M tok/yr), billing every token at the output rate:

  • Together API at 1 tok/s: ~$54 AUD/yr
  • At 10 tok/s: ~$536 AUD/yr (well within budget)
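Converting a sustained token rate into an annual API bill takes two unit steps: seconds to a year, then tokens to millions. The minimal Python sketch below assumes every token is billed at Together AI's $1.20/1M output rate. Input (prompt) tokens would add more at $0.30/1M. The helper names are hypothetical.

```python
AUD_PER_USD = 1.416
SECONDS_PER_YEAR = 3600 * 8760  # 31,536,000 s in a 365-day year


def tokens_per_year(tok_per_sec: float) -> float:
    """Tokens generated in a year at a constant rate."""
    return tok_per_sec * SECONDS_PER_YEAR


def api_cost_aud(tok_per_sec: float, usd_per_million: float) -> float:
    """Annual AUD cost if every generated token is billed at usd_per_million."""
    return tokens_per_year(tok_per_sec) / 1e6 * usd_per_million * AUD_PER_USD


# Together AI output rate from the serverless table; input tokens ignored.
for rate in (1, 10):
    print(f"{rate} tok/s: {tokens_per_year(rate) / 1e6:.1f}M tok/yr, "
          f"~{api_cost_aud(rate, 1.20):,.0f} AUD/yr")
```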

Self-Hosted Throughput Benchmarks

| GPU | MiniMax M3 tok/s (vLLM FP8) | Spot USD/hr | USD/hr per tok/s |
|---|---|---|---|
| 1× H200 SXM5 | ~699 | $3.64 | 0.0052 |
| 1× H100 SXM5 | ~537 | $1.82 | 0.0034 |

H100 at the Vast.ai spot floor ($0.57/hr, 24/7): ~4,993 USD/yr ≈ ~7,070 AUD/yr, within budget.
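The last column of the benchmark table is the hourly price divided by tok/s. That is a ratio, not a per-token price. The minimal Python sketch below converts the same inputs into USD per 1M tokens, which can be compared directly with the serverless table. It assumes the GPU stays busy at the benchmark rate for the whole billed hour. `usd_per_million_tokens` is a hypothetical helper.

```python
def usd_per_million_tokens(usd_per_hour: float, tok_per_sec: float) -> float:
    """USD per 1M generated tokens, assuming the GPU sustains tok_per_sec all hour."""
    tokens_per_hour = tok_per_sec * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


# (spot USD/hr, tok/s) taken from the benchmark table.
benchmarks = {"1x H200 SXM5": (3.64, 699), "1x H100 SXM5": (1.82, 537)}

for gpu, (usd_hr, tps) in benchmarks.items():
    ratio = usd_hr / tps  # the table's last column: USD/hr per tok/s
    print(f"{gpu}: {ratio:.4f} USD/hr per tok/s, "
          f"~${usd_per_million_tokens(usd_hr, tps):.2f} per 1M tok")
```

At full utilization this gives roughly $1.45 (H200) and $0.94 (H100) per 1M tokens, in the same range as Together AI's $1.20/1M output rate. Any idle time raises the self-hosted figure proportionally.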

Conclusion & Recommendations

Feasibility: ✅ Barely feasible at spot pricing

Self-hosted options, cheapest first (only option 1 fits within 10,000 AUD/yr):

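  1. Vast.ai H100 SXM5 spot at ~7,070 AUD/yr
    • Caveat: spot instances may be preempted, and the actual price is often higher than the floor
    • Caveat: one H100 has 80 GB of VRAM, below the ~115 GB INT4 weight footprint; the 2× H100 configuration from the requirements table would cost ~14,140 AUD/yr at the same floor price
    • Provides ~537 tok/s (well above the 10 tok/s target)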
  2. Runpod A100 80GB at the low end: ~14,761 AUD/yr (depending on actual rate), which exceeds the budget

Not viable within budget:

  • H200 configs (cheapest listed: Spheron spot at ~22,576 AUD/yr)
  • Consumer GPUs (RTX 4090/5090): insufficient VRAM for MiniMax M3 weights even at INT4

Not ruled out by cost:

  • Serverless API: at Together AI's listed output rate, a sustained 10 tok/s costs ~536 AUD/yr, far below budget

Key risks:

  • Spot instance preemption
  • Price floors change; the Vast.ai H100 rate of $0.57/hr may be a temporary low
  • Storage costs (230+ GB model checkpoint) are not included above
  • 24/7 availability requires on-demand pricing (1.5–2× spot cost)
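The on-demand risk can be quantified from numbers already in this document by applying the 1.5–2× multiplier to the H100 spot floor. The minimal Python sketch below does this. The function name `on_demand_range` is hypothetical. Storage is left out because no storage price is given above.

```python
BUDGET_AUD = 10_000
# Vast.ai 1x H100 spot floor from the pricing table: $0.57/hr x 1.416 AUD/USD x 8,760 h.
SPOT_FLOOR_AUD_PER_YEAR = 0.57 * 1.416 * 8760


def on_demand_range(spot_annual_aud: float, low: float = 1.5,
                    high: float = 2.0) -> tuple[float, float]:
    """Annual AUD cost range if 24/7 availability forces on-demand at low..high x spot."""
    return spot_annual_aud * low, spot_annual_aud * high


low, high = on_demand_range(SPOT_FLOOR_AUD_PER_YEAR)
print(f"Spot floor: ~{SPOT_FLOOR_AUD_PER_YEAR:,.0f} AUD/yr")
print(f"On-demand:  ~{low:,.0f} to ~{high:,.0f} AUD/yr (budget {BUDGET_AUD:,} AUD)")
print(f"Headroom at spot floor for storage etc.: "
      f"~{BUDGET_AUD - SPOT_FLOOR_AUD_PER_YEAR:,.0f} AUD/yr")
```

With these inputs, both ends of the on-demand range land above the 10,000 AUD budget, so staying within budget depends on remaining on spot.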