MiniMax-M3 Hosting Cost Analysis

Research compiled 2026-06-24 by hermes-ralph. Budget: ≤10k AUD (~$6,500 USD). Target: >10 tok/s throughput.

Model Specs

Param | Value
Architecture | Mixture-of-Experts (MoE)
Total params | 229.9B
Active params/token | 9.8B
Context window | 1M tokens
Weight sizes | BF16: ~460 GB, FP8: ~230 GB, INT4-AWQ: ~115 GB
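The weight sizes above follow from total parameters × bytes per parameter. Here is a minimal sketch of that arithmetic. It assumes 2 bytes/param for BF16, 1 for FP8 and 0.5 for INT4, and it ignores quantization metadata (scales, zero-points) and any layers kept at higher precision, so real checkpoints will run somewhat larger:

```python
# Rough weight footprint: total parameters x bytes per parameter.
# Assumption: quantization metadata and mixed-precision layers are not counted.
TOTAL_PARAMS = 229.9e9  # total (not active) parameters, from the spec table

BYTES_PER_PARAM = {
    "BF16": 2.0,
    "FP8": 1.0,
    "INT4-AWQ": 0.5,
}


def weight_size_gb(params: float, bytes_per_param: float) -> float:
    """Weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for precision, bpp in BYTES_PER_PARAM.items():
        print(f"{precision}: ~{weight_size_gb(TOTAL_PARAMS, bpp):.0f} GB")
```

All 229.9B parameters have to be resident for GPU-only serving even though only 9.8B are active per token, which is why VRAM sizing uses total rather than active parameters.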

Option 1: API Providers (pay-per-token)

No upfront cost. Scale on demand. Cheapest below breakeven.

Tiered pricing by provider

Provider | Blended price (USD/1M tokens) | Output speed (tok/s) | Latency (s) | Notes
Parasail (MXFP8) | $0.22 | 25 | 2.16 | Tied for lowest price, slowest throughput
Together AI | $0.22 | 76 | 0.92 | Good balance of price and speed
Novita | $0.22 | 76 | 2.58 | Matches Together on speed, higher latency
MiniMax (native) | $0.22 | 87 | 3.12 | Direct API, fastest of the $0.22 tier
Makora (MXFP8) | $0.34 | 91 | 0.86 | Fastest and lowest latency overall, ~55% pricier
SiliconFlow | $0.44 | 66 | 3.15 | Higher price for no clear advantage
GMI | $0.59 | 77 | 4.08 | Most expensive, avoid
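To pick from the table programmatically, one option is to filter on a throughput/latency floor and then sort by price. The sketch below uses the table's figures. The tie-break order (higher speed, then lower latency) is an assumption for illustration, not a rule stated in this analysis:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    usd_per_m_tokens: float  # blended price, USD per 1M tokens
    tok_per_s: float         # output speed, as listed
    latency_s: float         # latency in seconds, as listed


# Figures copied from the provider table above.
PROVIDERS = [
    Provider("Parasail (MXFP8)", 0.22, 25, 2.16),
    Provider("Together AI", 0.22, 76, 0.92),
    Provider("Novita", 0.22, 76, 2.58),
    Provider("MiniMax (native)", 0.22, 87, 3.12),
    Provider("Makora (MXFP8)", 0.34, 91, 0.86),
    Provider("SiliconFlow", 0.44, 66, 3.15),
    Provider("GMI", 0.59, 77, 4.08),
]


def shortlist(providers, min_tok_per_s=10.0, max_latency_s=float("inf")):
    """Providers meeting the speed/latency floor, cheapest first.

    Ties on price are broken by higher speed, then lower latency (assumed rule).
    """
    ok = [p for p in providers
          if p.tok_per_s >= min_tok_per_s and p.latency_s <= max_latency_s]
    return sorted(ok, key=lambda p: (p.usd_per_m_tokens, -p.tok_per_s, p.latency_s))


if __name__ == "__main__":
    for p in shortlist(PROVIDERS, max_latency_s=1.0):
        print(f"{p.name}: ${p.usd_per_m_tokens}/M, {p.tok_per_s} tok/s, {p.latency_s} s")
```

With the defaults, every provider clears 10 tok/s and MiniMax (native) ranks first; adding `max_latency_s=1.0` leaves Together AI, then Makora.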

Cost projection at 10 tok/s sustained

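  • Tokens/month at 10 tok/s (continuous, 30-day month): 10 × 86,400 × 30 = 25,920,000 tokens (~25.9M)
  • Cheapest (Parasail/Together/Novita/MiniMax, $0.22/M): ~$5.70 USD/mo (under $10 AUD)
  • Makora ($0.34/M, best performance): ~$8.81 USD/mo (under $15 AUD)
  • For scale, ~2.6B tokens/month corresponds to ~1,000 tok/s sustained and costs ~$570 USD (~$887 AUD) at the cheapest tier or ~$881 USD (~$1,346 AUD) at Makora

Verdict: API is the clear winner for volumes up to ~2.6B tokens/month. At 10 tok/s sustained, the cheapest option costs under $10 AUD/mo, and even ~2.6B tokens/month (~$887 AUD/mo) is under 9% of the 10k AUD budget.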
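The projection reduces to tokens/month = tok/s × seconds per month, then cost = tokens/month ÷ 1M × price per 1M tokens. A sketch assuming a 30-day month and continuous generation at a constant rate (USD only, so no exchange rate is assumed):

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 30-day month = 2,592,000 s


def tokens_per_month(tok_per_s: float) -> float:
    """Tokens generated in a 30-day month at a constant rate."""
    return tok_per_s * SECONDS_PER_MONTH


def monthly_api_cost_usd(tok_per_s: float, usd_per_m_tokens: float) -> float:
    """API spend for one month of continuous generation."""
    return tokens_per_month(tok_per_s) / 1e6 * usd_per_m_tokens


if __name__ == "__main__":
    for rate in (10, 1_000):
        print(f"{rate} tok/s: {tokens_per_month(rate):,.0f} tokens/month, "
              f"${monthly_api_cost_usd(rate, 0.22):,.2f} USD at $0.22/M")
```

The factor of 100 between the two rates carries straight through to cost, since price is linear in tokens.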


Option 2: Cloud GPU Rental (self-hosted on rented hardware)

For MiniMax-M3, minimum viable configs:

Config | Precision | VRAM | Spot (USD/hr) | On-demand (USD/hr) | Monthly cost, on-demand (USD)
2x H200 SXM5 | FP8 | 282 GB | $3.64 | $9.68 | ~7,000
4x H100 SXM5 | FP8 | 320 GB | $5.72 | $15.68 | ~11,290
1x H200 SXM5 | AWQ INT4 | 141 GB | $1.82 | $4.84 | ~3,500

Monthly cost = on-demand $/hr × 24 h × 30 days (720 h/month).
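The same 720 h/month convention applies to the spot column. A sketch that recomputes monthly cost from the table's hourly rates (USD, copied from the table above):

```python
HOURS_PER_MONTH = 24 * 30  # 720 h, matching the formula above

# (spot USD/hr, on-demand USD/hr) from the cloud GPU table
RATES = {
    "2x H200 SXM5 FP8": (3.64, 9.68),
    "4x H100 SXM5 FP8": (5.72, 15.68),
    "1x H200 SXM5 AWQ INT4": (1.82, 4.84),
}


def monthly_usd(rate_per_hr: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running one config continuously for a month."""
    return rate_per_hr * hours


if __name__ == "__main__":
    for name, (spot, on_demand) in RATES.items():
        print(f"{name}: spot ~${monthly_usd(spot):,.0f}/mo, "
              f"on-demand ~${monthly_usd(on_demand):,.0f}/mo")
```

On-demand rates give ~$6,970, ~$11,290 and ~$3,485 per month; spot rates come to roughly 37% of that (~$2,621, ~$4,118 and ~$1,310).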

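Verdict: The minimal FP8 config (2x H200) costs ~$2,620 USD/mo at spot ($3.64/hr × 720 h), which fits the budget. However, its 282 GB leaves only ~52 GB beyond the ~230 GB of FP8 weights, so KV cache for the full 1M context is severely constrained.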
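A KV-cache budget check shows why 1M context is tight. The per-token formula below assumes standard attention with grouped KV heads in every layer. The layer count, KV-head count and head dimension are hypothetical placeholders, since this analysis does not list MiniMax-M3's attention configuration; a hybrid or linear-attention design would need a different formula:

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: float) -> float:
    """KV-cache bytes per token: keys + values (x2) for every layer and KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(vram_gb: float, weights_gb: float, bytes_per_token: float,
                      reserve_gb: float = 0.0) -> int:
    """Tokens of KV cache that fit after weights and an optional runtime reserve."""
    free_bytes = (vram_gb - weights_gb - reserve_gb) * 1e9
    return max(0, int(free_bytes // bytes_per_token))


# HYPOTHETICAL architecture values, for illustration only (not MiniMax-M3's real config).
PER_TOKEN = kv_bytes_per_token(n_layers=60, n_kv_heads=8, head_dim=128,
                               bytes_per_elem=1)  # FP8 KV cache

if __name__ == "__main__":
    # 2x H200 (282 GB) holding ~230 GB of FP8 weights
    print(f"{PER_TOKEN:,} bytes/token")
    print(f"{max_cached_tokens(282, 230, PER_TOKEN):,} tokens fit, shared across all requests")
```

Under these placeholder values, ~423k tokens of cache fit in total (less than half of a single 1M-token sequence) before anything is reserved for activations.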


Option 3: Self-hosted hardware purchase

GPU | Cost AUD (approx) | VRAM | Monthly electricity | Amortised/mo (24 mo)
RTX 4090 x1 | ~$4,500 | 24 GB | $75 | ~$263 ($4,500 / 24 + $75)
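The amortised column is straight-line depreciation plus running cost. A sketch, assuming zero resale value and ignoring financing, cooling and networking:

```python
def amortised_monthly(hardware_cost: float, months: int,
                      electricity_per_month: float) -> float:
    """Straight-line hardware amortisation plus monthly electricity (same currency)."""
    if months <= 0:
        raise ValueError("months must be positive")
    return hardware_cost / months + electricity_per_month


if __name__ == "__main__":
    # RTX 4090 x1 from the table, AUD
    print(f"${amortised_monthly(4_500, 24, 75):.2f}/mo")
```

For the single RTX 4090 this gives $187.50 of hardware plus $75 of electricity, $262.50 per month.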

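A single RTX 4090 is insufficient: its 24 GB cannot hold the model even at INT4 (~115 GB). Multi-GPU is required:

  • Cheapest multi-GPU: 2x RTX 4090 (~$9,000 AUD, 48 GB), just under budget for hardware alone but holding less than half of the ~115 GB INT4 weights
  • Larger: 4x RTX 4090 (~$18,000 AUD, 96 GB), over budget and still short of 115 GB; GPU-only serving needs at least 5 cards (120 GB) for the weights alone, before any KV cache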
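The GPU-count shortfall can be checked with a simple lower bound: weight size divided by per-card VRAM, rounded up. This sketch counts weights only and assumes they shard evenly across cards. It ignores KV cache, activations and framework overhead, so real deployments need more:

```python
import math


def min_gpus_for_weights(weights_gb: float, vram_per_gpu_gb: float) -> int:
    """Lower bound on GPU count needed just to hold the weights."""
    return math.ceil(weights_gb / vram_per_gpu_gb)


if __name__ == "__main__":
    print(min_gpus_for_weights(115, 24))   # INT4-AWQ on RTX 4090 (24 GB)
    print(min_gpus_for_weights(230, 141))  # FP8 on H200 (141 GB)
```

This prints 5 for INT4 on RTX 4090 and 2 for FP8 on H200, consistent with the 2x H200 FP8 cloud config.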

Verdict: Not feasible within the $10k AUD budget. A single GPU is insufficient, and any multi-GPU setup large enough to hold the weights is over budget even before electricity, networking, and cooling costs.


Breakeven Analysis

API price/M tokens | Self-hosting breakeven (tokens/month)
$2.00 (market avg) | ~23M
$0.73 (Qwen3.5-35B via Air) | ~155M
$0.22 (MiniMax-M3 cheapest) | ~469M

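At MiniMax’s API price ($0.22/M tokens), self-hosting only pays off above ~469M tokens/month, and even then it requires an $18k+ investment.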
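A breakeven volume is usually the monthly self-hosting cost divided by the API price per token. The sketch below assumes a flat self-hosting cost that does not rise with volume until capacity runs out; the $110/month input is hypothetical. Multiplying the table's rows back out (~23M × $2.00 ≈ $46, ~155M × $0.73 ≈ $113, ~469M × $0.22 ≈ $103) gives different implied monthly costs, so the cost is kept as an explicit parameter here:

```python
def breakeven_tokens_per_month(self_host_usd_per_month: float,
                               api_usd_per_m_tokens: float) -> float:
    """Monthly token volume at which API spend equals a fixed self-hosting cost."""
    if api_usd_per_m_tokens <= 0:
        raise ValueError("API price must be positive")
    return self_host_usd_per_month / api_usd_per_m_tokens * 1e6


if __name__ == "__main__":
    HYPOTHETICAL_SELF_HOST_USD = 110.0  # illustrative only, not from this analysis
    for price in (2.00, 0.73, 0.22):
        tokens = breakeven_tokens_per_month(HYPOTHETICAL_SELF_HOST_USD, price)
        print(f"${price:.2f}/M -> breakeven ~{tokens / 1e6:,.0f}M tokens/month")
```

Below the breakeven volume the API is cheaper; above it, a fixed-cost deployment wins, provided it can actually serve that volume.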


Recommendation Summary

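Scenario | Best option | Monthly cost (AUD)
Low volume (<469M tok/mo) | API (Together AI or MiniMax native) | < $300
Target: 10 tok/s sustained (~25.9M tok/mo) | API (Together AI or MiniMax native) | < $10
Self-hosting required (privacy/customisation) | Cloud GPU spot (2x H200 FP8) | ~$3,950
Within $10k AUD budget, up to ~2.6B tok/mo | API at Together AI or MiniMax | ~$887 (~$1,346 via Makora)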

Top recommendation: Use the native MiniMax API or Together AI at a blended $0.22/M tokens: under $10 AUD/month at 10 tok/s sustained, and ~$887 AUD/month even at ~2.6B tokens/month. That is well within budget, with zero infrastructure and capacity available immediately. Cloud GPU rental (2x H200 spot) is the next option if you need to self-host for privacy/customisation reasons (~$3,950 AUD/mo).


Spot pricing summary (consolidated from 2026-06-25 comparison)

Provider | GPU config | Price/hr (USD) | Price/hr (AUD) | Annual cost @ 8,760 h (AUD)
Spheron | 1× H200 SXM5 INT4 | $1.82 | $2.58 | $22,574
Vast.ai | 1× H200 SXM5 | $13.78 | $19.51 | $170,726
Vast.ai | 1× H100 SXM5 | $0.57 | $0.81 | $7,060 (at floor)
Runpod | 1× A100 80GB | ~$1.19 | ~$1.68 | ~$14,700
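The annual column is the hourly rate × 8,760 h (24 × 365). The sketch below recomputes it from the AUD hourly column. Because that column is rounded to the cent, results land within about 1% of the table rather than matching exactly. The 10,000 AUD threshold is the budget from the header:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 h
BUDGET_AUD = 10_000

# (provider and config, AUD per hour) from the spot table
SPOT_AUD_PER_HR = [
    ("Spheron 1x H200 INT4", 2.58),
    ("Vast.ai 1x H200", 19.51),
    ("Vast.ai 1x H100", 0.81),
    ("Runpod 1x A100 80GB", 1.68),
]


def annual_cost(price_per_hr: float, hours: float = HOURS_PER_YEAR) -> float:
    """Cost of running continuously for a year at a constant hourly rate."""
    return price_per_hr * hours


def within_budget(rates, budget=BUDGET_AUD):
    """Names whose annual cost is at or under the budget."""
    return [name for name, rate in rates if annual_cost(rate) <= budget]


if __name__ == "__main__":
    for name, rate in SPOT_AUD_PER_HR:
        print(f"{name}: ~${annual_cost(rate):,.0f} AUD/yr")
    print("Within budget:", within_budget(SPOT_AUD_PER_HR))
```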

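H100 at the Vast.ai spot floor (~$7,100 AUD/yr) is the only option in this table within budget, though a single 80 GB H100 cannot hold the ~115 GB INT4 weights on its own.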