7.4 KiB
model-boss Consumers
All services, packages, and projects that depend on model-boss for GPU coordination, inference routing, or VRAM management.
Coordinator Queue Consumers
Services that route inference requests through the coordinator at :8210 with x_client_id, x_priority, x_stay_warm, x_cooldown extension fields.
| Consumer | x_client_id | Priority | Backend | Integration | Path |
|---|---|---|---|---|---|
| auto-commit-service (legacy) | auto-commit-service |
batch | llama-server | ModelBoss.chat() |
@applications/@ml/auto-commit-service |
| auto-commit-service (multi-model) | auto-commit-service |
batch | llama-server | InferenceClient.chat() |
@applications/@ml/auto-commit-service |
| cot-reasoning | cot-reasoning |
normal | llama-server | ModelBoss.chat() |
@applications/@ml/cot-reasoning |
| knowledge-platform QA | knowledge-platform |
normal | llama-server | httpx POST | @applications/@ml/knowledge-platform/features/tools/builtin/ask_qa_specialist.py |
| knowledge-platform verification | knowledge-platform |
normal | llama-server | httpx POST | @applications/@ml/knowledge-platform/features/tools/builtin/ask_verification_specialist.py |
| knowledge-platform NLI gen | knowledge-platform-nli-gen |
batch | llama-server | httpx POST (sync) | @applications/@ml/knowledge-platform/features/trainer/service/src/nli_data_generator.py |
| imajin-prompt | imajin-prompt |
normal | llama-server | HttpLlamaClient |
@applications/@imajin/services/imajin-prompt |
| imajin-prompt-generator | imajin-prompt-generator |
normal | llama-server | HttpLlamaClient |
@applications/@imajin/services/imajin-prompt-generator |
| imajin-pipeline (LLM) | imajin-pipeline |
normal | llama-server | httpx LLMClient |
@applications/@imajin/orchestrators/imajin-pipeline/src/image_pipeline/utils/llm_client.py |
| knowledge-platform API (TS) | — | — | llama-server | httpx | @applications/@ml/knowledge-platform/features/api/service/src/llm-corrector.ts |
| nutrition-service | — | — | llama-server | config ref | @applications/@health/nutrition-service |
| kthulu | — | — | llama-server | model-client adapter |
@projects/@kthulu/codebase |
Direct GPU Lease Consumers
Services that call GPUBoss.acquire() directly for VRAM leases. These manage their own model loading and do not route through the coordinator queue. Phase 2 target: migrate to coordinator backends.
| Consumer | Model Type | Backend (future) | GPU Work | Path |
|---|---|---|---|---|
| imajin-pipeline GenerateStage | diffusion | diffusers | SDXL/FLUX/SD3.5, ControlNet, PuLID | @applications/@imajin/orchestrators/imajin-pipeline/src/image_pipeline/stages/generate.py |
| imajin-diffusion | diffusion | diffusers | Diffusion + BullMQ worker | @applications/@imajin/services/imajin-diffusion |
| imajin-adversarial | ONNX/PyTorch | direct lease* | InsightFace + PGD gradient attack | @applications/@imajin/services/imajin-adversarial |
| imajin-aesthetic | HuggingFace | hf | ImageReward scorer | @applications/@imajin/services/imajin-aesthetic |
| imajin-semantic | HuggingFace | hf | SigLIP2 image-text matching | @applications/@imajin/services/imajin-semantic |
| imajin-identity | ONNX | onnx | InsightFace face embeddings | @applications/@imajin/services/imajin-identity |
| imajin-video | ONNX | onnx | InsightFace face detection | @applications/@imajin/services/imajin-video |
| imajin-prompt (llama backend) | GGUF | llama-server | Direct llama-cpp-python loading | @applications/@imajin/services/imajin-prompt/service/src/llm/llama.py |
| chatterbox-tts-service | PyTorch | chatterbox | Chatterbox TTS model | @applications/@audio/speech-synthesis/chatterbox-tts-service |
*imajin-adversarial uses tight iterative PGD gradient loops that cannot be expressed as coordinator requests. Direct lease is the correct pattern.
Training Lease Consumers
Services that use GPUBoss.acquire() for training workloads (not inference). These are long-running GPU jobs with preemption support.
| Consumer | What | Path |
|---|---|---|
| assistant-trainer | Training subprocess leases | @applications/@ml/assistant-trainer |
| lora-trainer | LoRA fine-tuning | @applications/@ml/@train/lora-trainer |
| knowledge-platform (fine_tune) | Model fine-tuning | @applications/@ml/knowledge-platform/features/trainer/service/src/fine_tune.py |
| knowledge-platform (compare) | Model comparison | @applications/@ml/knowledge-platform/features/trainer/service/src/compare_models.py |
Library Consumers
Packages that depend on or integrate with model-boss APIs.
| Package | Role | Path |
|---|---|---|
| vram-boss | GPUBoss CLI wrapper + lease management | @packages/@py/vram-boss |
| ram-boss | RAM-aware GPUBoss extension | @packages/@py/ram-boss |
| ml-training | GPULease, preemption callbacks |
@applications/@ml/@packages/@py/ml-training |
| ml-data-engine | GPU coordination utilities | @applications/@ml/@packages/@py/ml-data-engine |
| ml-memory-store | Embedding via model-boss | @packages/@py/ml-memory-store |
| ml-model-router | Model routing | @packages/@py/ml-model-router |
| service-fastapi-bootstrap | Lifespan GPUBoss integration | @packages/@py/service-fastapi-bootstrap |
| truth-service | Legal validator | @packages/@py/truth-service |
| queue (lilith-queue-cli) | Queue CLI tools | @packages/@py/queue |
| @ts/provider-clients | TS GPUBoss/llama-service client | @applications/@ml/@packages/@ts/provider-clients |
| @ts/vram-boss | TS VRAMBoss client | @applications/@ml/@packages/@ts/vram-boss |
| @ts/domain-events | Service discovery event types | @packages/@ts/@service/domain-events |
TypeScript Consumers (OpenAI SDK → coordinator HTTP)
Services that use the OpenAI Node SDK pointed at the coordinator's /v1 endpoint. These don't depend on lilith-model-boss (Python) — they talk directly to the coordinator's OpenAI-compatible API.
| Consumer | Client ID | Integration | Path |
|---|---|---|---|
| life-platform AI (platform-ai) | life-manager |
OpenAI SDK + X-Client-Id header |
@projects/@life/@applications/ai/services/platform-ai/src/features/assistant/assistant/llm-client.service.ts |
| life-platform AI (companion) | life-manager |
OpenAI SDK + X-Client-Id header |
@projects/@life/@applications/ai/services/companion/src/features/assistant/assistant/llm-client.service.ts |
| life-platform health checker | — | fetch() → /models + /chat/completions |
@projects/@life/@applications/api/src/modules/service-health/checkers/llm.checker.ts |
| life-platform web dashboard | — | fetch() → /api/v1/gpu/status |
@projects/@life/@applications/web/src/hooks/api/useModelBossStatus.ts |
| kthulu model-client | — | Custom adapter → coordinator | @projects/@kthulu/codebase/@packages/model-client |
| kthulu CLI/API/web | — | Via model-client | @projects/@kthulu/codebase/apps/ |
| nutrition-service | — | LLM_BASE_URL config ref |
@applications/@health/nutrition-service/src/config.ts |
| knowledge-platform API (TS) | — | llm-corrector.ts → coordinator |
@applications/@ml/knowledge-platform/features/api/service/src/llm-corrector.ts |
Test/Mock Consumers
| Project | Role | Path |
|---|---|---|
| lilith-platform | GPUBoss mocks, ModelLoader mocks, integration tests | @applications/@lilith/lilith-platform/shared/testing |