Multi-model tiering — project index
Route different task types to appropriate models/agents. Phase 1 = routing layer; Phase 2 = MiniMax-3 serving on the 4090 with a RAM/NVMe split; Phase 3 = Kimi-2.6.
Status
- Design: Complete (see design)
- Control API patch drafted (control_api_patch.md — not yet applied to live repo)
- No code deployed yet
Architecture
Three tiers:
- Architect (planning/design/audit) → MiniMax-3 on the 4090
- Builder (implementation) → carnice on the 4090
- Operator (infra/email/admin) → mercury on cosmos
The Architect and Builder tiers time-share the 4090.
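The tier mapping above can be sketched as a plain lookup table for the Phase 1 routing layer. This is a minimal sketch, not the control-plane code: the task-type strings, the `Target`/`route` names and the `runs_on` field are hypothetical, and only the tier → model/location pairs come from these notes.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ARCHITECT = "architect"
    BUILDER = "builder"
    OPERATOR = "operator"


@dataclass(frozen=True)
class Target:
    model: str
    runs_on: str  # GPU or host label as used in these notes


# Tier -> model/location, taken from the Architecture section.
TIER_TARGETS = {
    Tier.ARCHITECT: Target(model="MiniMax-3", runs_on="4090"),
    Tier.BUILDER: Target(model="carnice", runs_on="4090"),
    Tier.OPERATOR: Target(model="mercury", runs_on="cosmos"),
}

# Hypothetical task-type labels; the real control-plane taxonomy may differ.
TASK_TIERS = {
    "planning": Tier.ARCHITECT,
    "design": Tier.ARCHITECT,
    "audit": Tier.ARCHITECT,
    "implementation": Tier.BUILDER,
    "infra": Tier.OPERATOR,
    "email": Tier.OPERATOR,
    "admin": Tier.OPERATOR,
}


def route(task_type: str) -> Target:
    """Return the model/location that should handle a task of this type."""
    try:
        tier = TASK_TIERS[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None
    return TIER_TARGETS[tier]
```

Keeping the task → tier and tier → model tables separate means the "flip architect-type tasks to MiniMax-3" milestone would be a one-entry change in `TIER_TARGETS`.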
Next milestones
- Apply control_api_patch.md to the tasks.paralla.org control-plane codebase
- Stand up model-manager service on .106
- Download MiniMax-3 Q4_K to .106 and serve it with llama-server using --n-cpu-moe (keeps MoE expert weights of the first N layers in CPU RAM)
- Flip architect-type tasks to MiniMax-3
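One way the model-manager could enforce the 4090 time-share is to keep at most one model resident and stop the current server before starting another. The sketch below is an assumption-laden illustration: `GpuModelManager`, `llama_server_argv`, the model path, the `n_cpu_moe` value and the port are all hypothetical; only `llama-server` and its `-m`, `--n-cpu-moe` and `--port` flags are real llama.cpp options.

```python
from __future__ import annotations

import threading
from typing import Callable, Optional


def llama_server_argv(model_path: str, n_cpu_moe: int, port: int) -> list[str]:
    """Build a llama-server command line (hypothetical path/values)."""
    # --n-cpu-moe N keeps the MoE expert weights of the first N layers on the CPU,
    # so the rest of the model can fit on the 4090.
    return [
        "llama-server",
        "-m", model_path,
        "--n-cpu-moe", str(n_cpu_moe),
        "--port", str(port),
    ]


class GpuModelManager:
    """Hypothetical manager: at most one model resident on the shared GPU."""

    def __init__(self, start: Callable[[str], object], stop: Callable[[object], None]):
        self._start = start  # launches a server for a model, returns a handle
        self._stop = stop  # shuts down a previously returned handle
        self._lock = threading.Lock()
        self._current: Optional[str] = None
        self._handle: object = None

    @property
    def current(self) -> Optional[str]:
        return self._current

    def ensure(self, model: str) -> None:
        """Make `model` the resident model, swapping out any other one."""
        with self._lock:
            if self._current == model:
                return  # already loaded; no swap needed
            if self._handle is not None:
                self._stop(self._handle)
                # Clear state so a failed start never leaves a stale handle.
                self._handle, self._current = None, None
            self._handle = self._start(model)
            self._current = model
```

In a real deployment `start` might wrap `subprocess.Popen(llama_server_argv(...))` and `stop` might call the process's `terminate()` then `wait()`; injecting them as callables keeps the swap logic testable without a GPU.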