Multi-model tiering — project index

Route different task types to appropriate models/agents. Phase 1 = routing layer; Phase 2 = MiniMax-3 serving on 4090 + RAM/NVMe split; Phase 3 = Kimi-2.6.

Status

  • Design: Complete (see design)
  • Control API patch: Drafted (control_api_patch.md; not yet applied to live repo)
  • No code deployed yet

Architecture

Three tiers: Architect (planning/design/audit → MiniMax-3 on 4090), Builder (implementation → carnice on 4090), Operator (infra/email/admin → mercury on cosmos). Architects and Builders time-share the 4090.
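
The tier mapping above can be sketched as a small lookup table. This is only an illustration of the routing idea, not the planned routing layer: the Tier class, the route() and shares_gpu() helpers, and the exact task-type strings are hypothetical names chosen for the example; only the tier names, models, and hosts come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str   # tier label from the architecture description
    model: str  # model that serves this tier
    host: str   # where the model runs


ARCHITECT = Tier("architect", "MiniMax-3", "4090")
BUILDER = Tier("builder", "carnice", "4090")
OPERATOR = Tier("operator", "mercury", "cosmos")

# Task-type strings are assumptions derived from the tier descriptions.
TASK_TYPE_TO_TIER = {
    "planning": ARCHITECT,
    "design": ARCHITECT,
    "audit": ARCHITECT,
    "implementation": BUILDER,
    "infra": OPERATOR,
    "email": OPERATOR,
    "admin": OPERATOR,
}


def route(task_type: str) -> Tier:
    """Return the tier for a task type, or raise if none is configured."""
    key = task_type.strip().lower()
    try:
        return TASK_TYPE_TO_TIER[key]
    except KeyError:
        raise ValueError(f"no tier configured for task type {task_type!r}") from None


def shares_gpu(a: Tier, b: Tier) -> bool:
    """True when both tiers run on the 4090 and therefore must time-share it."""
    return a.host == b.host == "4090"
```

Failing loudly on an unknown task type, rather than falling back to a default tier, is a design choice made for this sketch; the real routing layer may prefer a default.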

Next milestones

  1. Apply control_api_patch.md to the tasks.paralla.org control-plane codebase
  2. Stand up model-manager service on .106
  3. Download MiniMax-3 Q4_K to .106 and serve it with llama-server using --n-cpu-moe
  4. Flip architect-type tasks to MiniMax-3
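
As a rough sketch of milestone 3, the snippet below assembles a llama-server command line. In recent llama.cpp builds, --n-cpu-moe N is understood to keep the MoE expert weights of the first N layers in CPU memory, which is what makes the 4090 + RAM split possible. The model path, the layer count, and the host/port values are placeholders invented for the example, not settings taken from this project.

```python
import shlex


def llama_server_argv(model_path, n_cpu_moe, host="127.0.0.1", port=8080):
    """Build an argv list for llama-server with MoE experts partly on CPU."""
    if n_cpu_moe < 0:
        raise ValueError("n_cpu_moe must be non-negative")
    return [
        "llama-server",
        "-m", model_path,               # path to the GGUF file
        "--n-cpu-moe", str(n_cpu_moe),  # layers whose expert weights stay on CPU
        "--host", host,
        "--port", str(port),
    ]


# Placeholder values: tune n_cpu_moe until the remaining weights fit in VRAM.
argv = llama_server_argv("/models/minimax-3-q4_k.gguf", n_cpu_moe=40)
print(shlex.join(argv))
```

The list form can be passed directly to subprocess.Popen by a model-manager process, which avoids shell quoting issues.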

hermes-k8s-deployment, model-routing