MiniMax-M3 Hosting Cost Analysis

Research compiled 2026-06-24 by hermes-ralph. Budget: ≤10k AUD (~$6,500 USD). Target: >10 tok/s throughput.

Model Specs

Param	Value
Architecture	Mixture-of-Experts (MoE)
Total params	229.9B
Active params/token	9.8B
Context window	1M tokens
Quantized sizes	BF16: ~460GB, FP8: ~230GB, INT4-AWQ: ~115GB
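
The size figures above follow from a simple weights-only estimate: parameter count times bytes per parameter. A minimal sketch, assuming nominal byte widths (2, 1 and 0.5 bytes) and ignoring quantization metadata such as scales and zero-points, which add a few percent in practice:

```python
# Rough weight-only footprint: parameters x bytes per parameter.
TOTAL_PARAMS = 229.9e9  # total parameter count from the spec table

# Nominal bytes per parameter (assumption: no per-group scale/zero-point overhead).
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_size_gb(total_params: float, bytes_per_param: float) -> float:
    """Return the approximate weight size in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9


sizes = {name: weight_size_gb(TOTAL_PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
for name, gb in sizes.items():
    print(f"{name}: ~{gb:.0f} GB")
```

Only the weights are counted here; KV cache and activations need additional memory on top.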

Option 1: API Providers (pay-per-token)

No upfront cost. Scale on demand. Cheapest below breakeven.

Tiered pricing by provider

Provider	Blended price (USD/1M tokens)	Output speed (tok/s)	Latency (s)	Notes
Parasail (MXFP8)	$0.22	25	2.16	Joint-lowest price, slowest throughput
Together AI	$0.22	76	0.92	Good balance of price/speed
Novita	$0.22	76	2.58	Matches Together on speed
MiniMax (native)	$0.22	87	3.12	Direct API, fastest throughput at the $0.22 price point
Makora (MXFP8)	$0.34	91	0.86	Best performance, modestly pricier
SiliconFlow	$0.44	66	3.15	Higher price for no clear advantage
GMI	$0.59	77	4.08	Most expensive, avoid
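
One way to read the table is to filter by the throughput target and then order by price, speed and latency. The sketch below does that with the table's own figures; the `Provider` type, the `rank` helper and the tie-break order are illustrative assumptions, not part of the original analysis.

```python
from typing import NamedTuple


class Provider(NamedTuple):
    name: str
    price_usd_per_m: float  # blended USD per 1M tokens
    tok_per_s: int          # output speed
    latency_s: float


# Figures copied from the provider table.
PROVIDERS = [
    Provider("Parasail (MXFP8)", 0.22, 25, 2.16),
    Provider("Together AI", 0.22, 76, 0.92),
    Provider("Novita", 0.22, 76, 2.58),
    Provider("MiniMax (native)", 0.22, 87, 3.12),
    Provider("Makora (MXFP8)", 0.34, 91, 0.86),
    Provider("SiliconFlow", 0.44, 66, 3.15),
    Provider("GMI", 0.59, 77, 4.08),
]


def rank(providers, min_tok_per_s=10):
    """Keep providers meeting the throughput floor; sort by price, then speed, then latency."""
    eligible = [p for p in providers if p.tok_per_s >= min_tok_per_s]
    return sorted(eligible, key=lambda p: (p.price_usd_per_m, -p.tok_per_s, p.latency_s))


ranked = rank(PROVIDERS)
for p in ranked:
    print(f"{p.name}: ${p.price_usd_per_m}/M, {p.tok_per_s} tok/s, {p.latency_s}s")
```

Every provider clears the 10 tok/s target, so the ordering is driven by price first. Putting latency ahead of speed in the sort key would favour Together AI within the $0.22 tier.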

Cost projection at 10 tok/s sustained

Tokens/month at 10 tok/s (continuous): 2,592,000,000 tokens (~2.6B)
Cheapest (Parasail/Together/Novita/MiniMax): ~$573 USD/month (~$887 AUD), well under budget
Makora (best perf): ~$869 USD/month (~$1,346 AUD)

Verdict: API is the clear winner for ≤2.6B tokens/month. At 10 tok/s sustained, the cheapest option costs ~$887 AUD/mo, roughly 9% of the 10k AUD budget.
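
The projection is monthly token volume times the blended price. A minimal sketch of that arithmetic, assuming a 30-day month; the function names are illustrative. Under this assumption a 2.592B monthly volume corresponds to 1,000 tok/s sustained, while 10 tok/s yields 25.92M tokens, so the rate behind the volume figure is worth double-checking before relying on the dollar amounts.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 s, assuming a 30-day month


def tokens_per_month(tok_per_s: float) -> float:
    """Tokens generated in a month at a continuously sustained rate."""
    return tok_per_s * SECONDS_PER_MONTH


def monthly_cost_usd(tokens: float, price_per_m_usd: float) -> float:
    """API cost for a monthly token volume at a blended per-million price."""
    return tokens / 1e6 * price_per_m_usd


volume = 2_592_000_000  # monthly volume used in the projection

print(f"$0.22 tier: ${monthly_cost_usd(volume, 0.22):,.2f}/month")
print(f"Makora:     ${monthly_cost_usd(volume, 0.34):,.2f}/month")
print(f"10 tok/s sustained:    {tokens_per_month(10):,.0f} tokens/month")
print(f"1,000 tok/s sustained: {tokens_per_month(1000):,.0f} tokens/month")
```

At the 2.592B volume this gives about $570 and $881 USD per month, in the same range as the rounded figures quoted in the projection.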

Option 2: Cloud GPU Rental (self-hosted on rented hardware)

For MiniMax-M3, minimum viable configs:

Config	Precision	VRAM	Spot ($/hr)	On-demand ($/hr)	Monthly cost, spot / on-demand ($)
2x H200 SXM5	FP8	282 GB	$3.64	$9.68	$2,640 / $7,000
4x H100 SXM5	FP8	320 GB	$5.72	$15.68	$4,130 / $11,390
1x H200 SXM5	AWQ INT4	141 GB	$1.82	$4.84	$1,320 / $3,500

Monthly cost = $/hr × 24 hours × 30 days.
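
The monthly figures can be reproduced from the hourly rates. A short sketch, assuming 720 billable hours (24 × 30) and no storage, egress or spot-interruption costs; results land within about 1% of the table's rounded values.

```python
HOURS_PER_MONTH = 24 * 30  # 720 h, assuming a 30-day month of continuous use

# (spot $/hr, on-demand $/hr) from the rental table
CONFIGS = {
    "2x H200 SXM5 (FP8)": (3.64, 9.68),
    "4x H100 SXM5 (FP8)": (5.72, 15.68),
    "1x H200 SXM5 (AWQ INT4)": (1.82, 4.84),
}


def monthly_cost(rate_per_hr: float) -> float:
    """Monthly rental cost for an instance that runs around the clock."""
    return rate_per_hr * HOURS_PER_MONTH


monthly = {name: (monthly_cost(s), monthly_cost(o)) for name, (s, o) in CONFIGS.items()}
for name, (spot, on_demand) in monthly.items():
    print(f"{name}: spot ${spot:,.0f}/mo, on-demand ${on_demand:,.0f}/mo")
```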

Verdict: Spot pricing for the minimal config (2x H200 FP8) at ~$2,640/mo is within budget but tight. On-demand is over budget. Single H200 INT4 ($1,320/mo spot) fits budget but has severe KV-cache constraints for full 1M context.
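
The KV-cache constraint comes from how little VRAM is left once the weights are loaded. A rough sketch of that headroom, using the weight sizes from the model specs; it assumes weights-only accounting and ignores activations, framework overhead and fragmentation, so real headroom is smaller.

```python
# Approximate weight sizes in GB, from the model specs.
WEIGHTS_GB = {"FP8": 229.9, "INT4": 114.95}

# (config, precision, total VRAM in GB) from the rental table
CONFIGS = [
    ("2x H200 SXM5", "FP8", 282),
    ("4x H100 SXM5", "FP8", 320),
    ("1x H200 SXM5", "INT4", 141),
]


def headroom_gb(vram_gb: float, precision: str) -> float:
    """VRAM left for KV cache and activations after loading weights."""
    return vram_gb - WEIGHTS_GB[precision]


headroom = {name: headroom_gb(vram, prec) for name, prec, vram in CONFIGS}
for name, gb in headroom.items():
    print(f"{name}: ~{gb:.0f} GB free after weights")
```

The single H200 leaves roughly 26 GB, about half of what the 2x H200 config offers. How many context tokens that holds depends on the model's layer count and KV head layout, which this note does not specify.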

Option 3: Self-hosted hardware purchase

GPU	Cost AUD (approx)	VRAM	Monthly electricity	Amortised/mo (24mo)
RTX 4090 x1	~$4,500	24 GB	$75	$183 + $75 = $258
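
The amortised column is purchase price spread over 24 months plus monthly electricity. A minimal sketch of that calculation; straight-line amortisation with no resale value or financing cost is an assumption. With the ~$4,500 figure it gives about $262/mo, and the table's $183 hardware share would correspond to a purchase price of roughly $4,400.

```python
def amortised_monthly(purchase_aud: float, electricity_aud: float, months: int = 24) -> float:
    """Straight-line hardware cost per month plus monthly electricity."""
    return purchase_aud / months + electricity_aud


single = amortised_monthly(4500, 75)
print(f"1x RTX 4090: ${single:.2f} AUD/month")

# Purchase price implied by a $183/month hardware share over 24 months.
implied_price = 183 * 24
print(f"Implied purchase price: ${implied_price:,} AUD")
```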

Single RTX 4090 is insufficient — model won’t fit even at INT4 quantization. Need multi-GPU:

Minimum viable: 2x RTX 4090 (~$9,000 AUD) — just barely under budget for hardware alone
Realistic: 4x RTX 4090 (~$18,000 AUD) — exceeds budget

Verdict: Not feasible within $10k AUD budget. Single GPU insufficient, multi-GPU setup over budget even before electricity, networking, and cooling costs.

Breakeven Analysis

API price/M tokens	Self-hosting breakeven (tokens/month)
$2.00 (market avg)	~23M
$0.73 (Qwen3.5-35B via Air)	~155M
$0.22 (MiniMax-M3 cheapest)	~469M

At MiniMax’s API price ($0.22/M tokens), self-hosting only becomes cheaper above 469M tokens/month. At the target throughput of 10 tok/s sustained (2.6B tokens/month), self-hosting is significantly cheaper, but requires an upfront investment of $18k+.
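
Breakeven volume is the fixed monthly self-hosting cost divided by the API price per million tokens. The sketch below shows the formula and its inverse; the function names are illustrative, and the self-hosting cost behind each table row is not stated in this note, so the implied costs are derived from the table rather than taken from the source.

```python
def breakeven_tokens_m(self_host_monthly_usd: float, api_price_per_m_usd: float) -> float:
    """Monthly volume (in millions of tokens) above which self-hosting is cheaper."""
    return self_host_monthly_usd / api_price_per_m_usd


def implied_self_host_cost(breakeven_m: float, api_price_per_m_usd: float) -> float:
    """Monthly self-hosting cost implied by a given breakeven volume."""
    return breakeven_m * api_price_per_m_usd


# Monthly self-hosting cost implied by each row of the breakeven table.
implied = {price: implied_self_host_cost(vol, price)
           for price, vol in [(2.00, 23), (0.73, 155), (0.22, 469)]}
for price, cost in implied.items():
    print(f"${price:.2f}/M -> implied self-hosting cost ~${cost:.0f}/month")

# Illustration only: breakeven if the fixed cost were the ~$2,640/mo spot rental.
rental_breakeven = breakeven_tokens_m(2640, 0.22)
print(f"Spot rental breakeven: ~{rental_breakeven:,.0f}M tokens/month")
```

The three rows imply different monthly costs (about $46, $113 and $103), so they presumably rest on different hardware assumptions. The breakeven point is very sensitive to which self-hosting cost is plugged in.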

Recommendation Summary

Scenario	Best Option	Monthly cost AUD
Low volume (<469M tok/mo)	API (Together AI or MiniMax native)	<$300
Target: 10 tok/s sustained (~2.6B tok/mo)	Cloud GPU spot (2x H200 FP8)	~$3,950
Within $10k AUD budget	API at Together AI or MiniMax	~$887-$1,346

Top recommendation: Use the native MiniMax API or Together AI at a blended $0.22/M tokens. At 10 tok/s sustained, this costs $887 AUD/month: well within budget, zero infrastructure, instant throughput. Cloud GPU rental (2x H200 spot) is the next option if you need to self-host for privacy/customisation reasons (~$3,950 AUD/mo).

Spot pricing summary (consolidated from 2026-06-25 comparison)

Provider	GPU Config	Price/hr USD	$/hr AUD	Annual cost @ 8760h AUD
Spheron	1× H200 SXM5 INT4	$1.82	$2.58	$22,574
Vast.ai	1× H200 SXM5	$1.00-$13.78	$1.42-$19.51	$12,415-$170,726
Vast.ai	1× H100 SXM5	$0.57	$0.81	$7,060 (at floor)
Runpod	1× A100 80GB	~$0.34-$1.19	$0.48-$1.68	$4,215-$14,700

H100 at the Vast.ai spot floor ($0.57/hr): annual cost of $7,112 AUD, within budget.
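
The annual column is the hourly AUD rate times 8,760 hours. A short sketch of that conversion; the USD-to-AUD rate is not stated in this note, so the rate below is inferred from the Spheron row ($1.82 USD = $2.58 AUD) and should be treated as an assumption.

```python
HOURS_PER_YEAR = 8760

# Assumption: exchange rate inferred from the Spheron row ($1.82 USD -> $2.58 AUD).
USD_TO_AUD = 2.58 / 1.82


def annual_cost(rate_per_hr: float) -> float:
    """Cost of running one instance continuously for a year."""
    return rate_per_hr * HOURS_PER_YEAR


h100_aud = annual_cost(0.81)
spheron_aud = annual_cost(2.58)
h100_from_usd = annual_cost(0.57 * USD_TO_AUD)

print(f"H100 floor, from $0.81 AUD/hr: ${h100_aud:,.0f} AUD/year")
print(f"H100 floor, from $0.57 USD/hr: ${h100_from_usd:,.0f} AUD/year")
print(f"Spheron H200 INT4:             ${spheron_aud:,.0f} AUD/year")
```

The results differ from the table by well under 1%, which is consistent with the hourly AUD rates having been rounded to two decimals.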
