Refactor architecture per the plan

Packages:
- core-py: lean model-boss core with GPUBoss, RAMBoss, and model path resolution
- core-ts: TypeScript client (@lilith/model-boss)
- loaders-py: Extracted direct model loaders (optional, for dev/testing)

New in core-py:
- InferenceRouter for service discovery and routing
- Typed HTTP clients: LlamaHttpClient and DiffusionHttpClient
- VRAM estimation utilities
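A VRAM estimate for a decoder-only transformer is roughly its weights plus the KV cache for the requested context. The sketch below shows that arithmetic under the following assumptions: ModelSpec and estimate_vram_bytes are hypothetical names rather than the actual core-py API, and the 10% overhead margin is an arbitrary placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    # Hypothetical description of a transformer checkpoint (not the core-py type).
    n_params: int           # total parameter count
    bytes_per_param: float  # e.g. 2.0 for fp16 weights
    n_layers: int
    n_kv_heads: int
    head_dim: int


def estimate_vram_bytes(spec: ModelSpec, context_len: int,
                        kv_bytes: int = 2, overhead_frac: float = 0.10) -> int:
    """Rough estimate: weights + KV cache, scaled by an assumed safety margin."""
    weights = spec.n_params * spec.bytes_per_param
    # One K and one V tensor per layer, each context_len * n_kv_heads * head_dim elements.
    kv_cache = 2 * spec.n_layers * context_len * spec.n_kv_heads * spec.head_dim * kv_bytes
    return int((weights + kv_cache) * (1.0 + overhead_frac))


# Example: a 7B fp16 model with 32 layers, 32 KV heads of dim 128, 4096-token context.
spec = ModelSpec(7_000_000_000, 2.0, 32, 32, 128)
print(estimate_vram_bytes(spec, 4096, overhead_frac=0.0))  # 14e9 weights + 2 GiB KV cache
```

Real estimates also depend on activation buffers, quantization format, and backend allocator behaviour, which this sketch folds into the single overhead fraction.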

Services use model-boss for:
- VRAM lease coordination (GPUBoss)
- Model path resolution
- Service discovery (InferenceRouter)
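This commit does not describe GPUBoss's lease protocol, so the following is only a minimal in-process sketch of the bookkeeping a VRAM lease coordinator performs: reserve bytes against a fixed per-GPU budget with first-fit placement, and free them on release. VramLeaseCoordinator, acquire, and release are hypothetical names, and a real cross-service coordinator would need IPC and lease expiry on top of this.

```python
from __future__ import annotations

import threading
import uuid


class VramLeaseCoordinator:
    """Hypothetical sketch: hands out VRAM leases against fixed per-GPU budgets."""

    def __init__(self, capacity_bytes: dict[int, int]):
        self._capacity = dict(capacity_bytes)           # gpu index -> total bytes
        self._leases: dict[str, tuple[int, int]] = {}   # lease id -> (gpu, bytes)
        self._lock = threading.Lock()

    def _used(self, gpu: int) -> int:
        # Sum of bytes currently leased on this GPU.
        return sum(b for g, b in self._leases.values() if g == gpu)

    def acquire(self, nbytes: int) -> tuple[str, int] | None:
        """Reserve nbytes on the first GPU with room; return (lease_id, gpu) or None."""
        with self._lock:
            for gpu, cap in self._capacity.items():
                if cap - self._used(gpu) >= nbytes:
                    lease_id = uuid.uuid4().hex
                    self._leases[lease_id] = (gpu, nbytes)
                    return lease_id, gpu
            return None

    def release(self, lease_id: str) -> None:
        """Free a lease; unknown ids are ignored."""
        with self._lock:
            self._leases.pop(lease_id, None)


coord = VramLeaseCoordinator({0: 10, 1: 20})
lease = coord.acquire(8)  # fits on GPU 0
```

First-fit keeps the example short; a production scheduler might prefer best-fit or take the estimated model size and current utilisation into account.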

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>