Model Boss
Unified GPU/VRAM lease coordinator and model management system for ML workloads.
Model Boss provides Redis-based coordination for GPU/VRAM resources across multiple ML processes, preventing VRAM contention and OOM errors. It features automatic VRAM estimation, request queueing with priority levels, preemption support, and a unified inference API.
Features
- Single Manifest: Unified manifest for all model types (GGUF, safetensors, diffusion models)
- VRAM Coordination: Redis-backed lease system preventing GPU memory contention
- Auto-Loader Selection: Automatically chooses the right loader based on model format
- Priority Queueing: Request queue with HIGH/NORMAL/LOW priority levels
- Preemption System: Higher priority requests can preempt lower priority leases
- Path Resolution: Resolves model IDs to filesystem paths, handles sharded models
- RAM Coordination: Separate coordination for system RAM to prevent thrashing
- CLI Tools: Comprehensive command-line interface for monitoring and management
- Auto-Start Services: Automatically starts Redis and required services when needed
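The priority levels listed above can be pictured as a heap keyed on priority and arrival order. The sketch below is an in-memory illustration only, not the Redis-backed queue Model Boss uses; the numeric `Priority` values, the `RequestQueue` class, and its methods are hypothetical names chosen for the example.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical numeric values: a lower number is served first.
    HIGH = 0
    NORMAL = 1
    LOW = 2


class RequestQueue:
    """In-memory stand-in for a priority queue of VRAM requests."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal priorities stay FIFO.
        self._counter = itertools.count()

    def push(self, request_id, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), request_id))

    def pop(self):
        _, _, request_id = heapq.heappop(self._heap)
        return request_id

    def __len__(self):
        return len(self._heap)


queue = RequestQueue()
queue.push("batch-job", Priority.LOW)
queue.push("chat-a", Priority.NORMAL)
queue.push("interactive", Priority.HIGH)
queue.push("chat-b", Priority.NORMAL)

order = [queue.pop() for _ in range(len(queue))]
print(order)  # ['interactive', 'chat-a', 'chat-b', 'batch-job']
```

The arrival counter is what keeps `chat-a` ahead of `chat-b`; without it, requests at the same level would be compared by ID instead of by arrival time.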
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                         Model Boss 3.0                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────────────┐            ┌─────────────────────┐    │
│   │  Python Package     │            │  TypeScript Package │    │
│   │  model-boss         │            │  @lilith/model-boss │    │
│   └─────────────────────┘            └─────────────────────┘    │
│              │                                  │               │
│              ├─── GPU Boss ─────────────────────┤               │
│              │    - VRAM leases                 │               │
│              │    - Priority queue              │               │
│              │    - Preemption                  │               │
│              │                                  │               │
│              ├─── RAM Boss ─────────────────────┤               │
│              │    - RAM leases                  │               │
│              │    - Memory analysis             │               │
│              │    - Cache cleanup               │               │
│              │                                  │               │
│              └─── Path Loader ──────────────────┤               │
│                   - Model manifest              │               │
│                   - Path resolution             │               │
│                   - Sharded models              │               │
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                      Redis Backend                      │   │
│   │  - Lease tracking          - Queue management           │   │
│   │  - GPU status              - Heartbeat monitoring       │   │
│   └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
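The lease tracking at the bottom of the diagram amounts to bookkeeping: a request is granted only when the reserved total stays within the GPU's capacity. The following is a minimal in-memory sketch of that idea, assuming a single GPU; `VramLedger` and the 24000 MB capacity are hypothetical, and the real system keeps this state in Redis so that separate processes can share it.

```python
from dataclasses import dataclass, field


@dataclass
class VramLedger:
    """Tracks VRAM reservations for one GPU (in-memory stand-in for Redis)."""

    total_mb: int
    leases: dict = field(default_factory=dict)  # lease_id -> reserved MB

    @property
    def free_mb(self):
        return self.total_mb - sum(self.leases.values())

    def acquire(self, lease_id, vram_mb):
        """Reserve VRAM if it fits; return True on success, False otherwise."""
        if vram_mb > self.free_mb:
            return False
        self.leases[lease_id] = vram_mb
        return True

    def release(self, lease_id):
        """Drop a reservation; releasing an unknown lease is a no-op."""
        self.leases.pop(lease_id, None)


ledger = VramLedger(total_mb=24000)  # hypothetical 24 GB card
granted_llm = ledger.acquire("llm", 8000)
granted_sdxl = ledger.acquire("sdxl", 6800)
granted_big = ledger.acquire("llama-70b", 42000)  # does not fit, so it must wait
ledger.release("llm")
print(granted_llm, granted_sdxl, granted_big, ledger.free_mb)
```

Reserving before loading is the point of the design: a process that cannot get a lease never touches the GPU, so it cannot trigger an out-of-memory error in a neighbour.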
Packages
| Package | Language | Description |
|---|---|---|
| model-boss | Python | Core library with GPU/RAM coordination, model loaders, CLI |
| @lilith/model-boss | TypeScript | Core library with GPU/RAM coordination, path resolution |
Quick Start
Python
from model_boss import ModelBoss
# High-level API: automatic VRAM management
# Redis is auto-started if not running
async with ModelBoss(model_id="mistral-7b-instruct") as boss:
    response = await boss.model.chat([
        {"role": "user", "content": "Hello!"}
    ])
    print(response)
# Low-level GPU coordination
from model_boss import GPUBoss, Priority
async with GPUBoss() as boss:
    async with boss.acquire(vram_mb=8000, priority=Priority.NORMAL) as lease:
        # VRAM reserved, load your model here
        await load_model()
        await run_inference()
    # Auto-released when context exits
TypeScript
import { GPUBoss, Priority } from '@lilith/model-boss';
const boss = new GPUBoss();
await boss.connect();
// Acquire VRAM lease
const lease = await boss.acquire({
  vramMb: 8000,
  modelId: 'llama-7b',
  priority: Priority.NORMAL,
});
// Handle preemption
lease.onPreempt(async (reason) => {
  console.log(`Preempted: ${reason}`);
  await unloadModel();
});
// Use the GPU
await loadModel();
// Release when done
await lease.release();
await boss.close();
Installation
Python
# Basic installation
pip install model-boss
# With optional dependencies
pip install "model-boss[torch]"      # PyTorch support
pip install "model-boss[llama]"      # llama.cpp support
pip install "model-boss[diffusers]"  # Diffusion models
pip install "model-boss[all]"        # All optional dependencies
TypeScript
npm install @lilith/model-boss
# or
pnpm add @lilith/model-boss
CLI Usage
Model Boss includes a comprehensive CLI for monitoring and managing GPU/RAM resources.
# GPU commands
model-boss gpu status # Show GPU status and active leases
model-boss gpu list # List requests waiting in the queue
model-boss gpu kill <lease-id> # Kill a specific lease
model-boss gpu drain # Request all models to unload
model-boss gpu cleanup # Clean up stale leases
model-boss gpu diagnose # Diagnose GPU coordination issues
# RAM commands
model-boss ram status # Show RAM usage and leases
model-boss ram analyze # Detailed memory analysis
model-boss ram clear auto # Clear caches based on pressure
model-boss ram cleanup # Clean up stale RAM leases
See CLI Documentation for complete reference.
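One plausible reading of "stale" in the `cleanup` commands is a lease whose owner has stopped sending heartbeats. The sketch below illustrates that check under this assumption; `find_stale` is a hypothetical helper, and the 60-second default mirrors the `GPU_BOSS_LEASE_TIMEOUT` example value rather than any confirmed behaviour of the CLI.

```python
def find_stale(last_heartbeats, now, timeout_s=60):
    """Return lease IDs whose last heartbeat is older than timeout_s seconds."""
    return sorted(
        lease_id
        for lease_id, seen_at in last_heartbeats.items()
        if now - seen_at > timeout_s
    )


now = 1_000.0  # fixed timestamp so the example is deterministic
heartbeats = {
    "lease-a": now - 3,    # heartbeat 3 s ago: alive
    "lease-b": now - 61,   # just past the 60 s window: stale
    "lease-c": now - 600,  # owner probably crashed: stale
}
stale = find_stale(heartbeats, now)
print(stale)  # ['lease-b', 'lease-c']
```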
Configuration
Model Boss is configured through environment variables and config files.
Environment Variables
# Redis connection
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
# GPU settings
GPU_BOSS_GRACE_PERIOD=30 # Preemption grace period (seconds)
GPU_BOSS_HEARTBEAT_INTERVAL=5 # Heartbeat interval (seconds)
GPU_BOSS_LEASE_TIMEOUT=60 # Lease timeout (seconds)
# Model paths
MODEL_BOSS_MODELS_DIR=/path/to/models
MODEL_BOSS_MANIFEST_PATH=/path/to/manifest.yaml
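As an illustration of how the GPU variables above might be consumed, the sketch below reads them into a small settings object. `GpuSettings` and `load_gpu_settings` are hypothetical names, and treating the example values (30, 5, 60) as defaults is an assumption, not documented behaviour.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSettings:
    grace_period_s: int
    heartbeat_interval_s: int
    lease_timeout_s: int


def load_gpu_settings(env=os.environ):
    """Read the GPU_BOSS_* variables; the fallbacks are assumed defaults."""
    return GpuSettings(
        grace_period_s=int(env.get("GPU_BOSS_GRACE_PERIOD", "30")),
        heartbeat_interval_s=int(env.get("GPU_BOSS_HEARTBEAT_INTERVAL", "5")),
        lease_timeout_s=int(env.get("GPU_BOSS_LEASE_TIMEOUT", "60")),
    )


# Pass a plain dict instead of os.environ to make the result reproducible.
settings = load_gpu_settings({"GPU_BOSS_GRACE_PERIOD": "45"})
print(settings)
```

Whatever the real defaults are, the lease timeout should stay comfortably larger than the heartbeat interval, so that one delayed heartbeat does not cost a process its lease.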
Python Configuration
from model_boss import ModelBossConfig
config = ModelBossConfig(
    redis_url="redis://localhost:6379/0",
    models_dir="/path/to/models",
    manifest_path="/path/to/manifest.yaml",
)
TypeScript Configuration
import { GPUBoss } from '@lilith/model-boss';
const boss = new GPUBoss({
  redis: {
    host: 'localhost',
    port: 6379,
    db: 0,
  },
  gracePeriod: 30,
  heartbeatInterval: 5,
});
Service Auto-Start
Model Boss can automatically start required services (like Redis) when they're not running. This makes it zero-configuration for most use cases.
Python
from model_boss import GPUBoss
# Redis auto-starts if not running (default behavior)
async with GPUBoss() as boss:
    lease = await boss.acquire(vram_mb=8000)
# Disable auto-start if you manage Redis yourself
async with GPUBoss(auto_start_services=False) as boss:
    lease = await boss.acquire(vram_mb=8000)
Manual Service Management
from model_boss.services import ServiceManager, ensure_services
# Check and start services manually
async with ServiceManager() as manager:
    status = await manager.get_status()
    print(f"Redis: {status['redis'].status}")
# Or use convenience function
status = await ensure_services()
if status['redis'].status == 'running':
    print("Redis is ready!")
Configuration
# Disable auto-start via environment variable
export MODEL_BOSS_AUTO_START_SERVICES=false
# Custom Redis port for auto-start
export MODEL_BOSS_REDIS_PORT=6380
Use Cases
Shared GPU Server
Multiple users running different models on the same GPU:
# User 1: Low priority background task
async with GPUBoss() as boss:
    async with boss.acquire(vram_mb=4000, priority=Priority.LOW) as lease:
        await train_model()
# User 2: High priority interactive task
async with GPUBoss() as boss:
    async with boss.acquire(vram_mb=8000, priority=Priority.HIGH) as lease:
        # This will preempt User 1's lease if needed
        await run_interactive_session()
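The decision behind "preempt if needed" can be sketched as a selection problem: evict only leases of strictly lower priority, lowest first, and only as many as the request needs. The code below is an assumption about how such a policy could look, not the coordinator's actual algorithm; `Lease`, `pick_victims`, and the numeric `Priority` values are hypothetical stand-ins.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical numeric values: a lower number is more important.
    HIGH = 0
    NORMAL = 1
    LOW = 2


@dataclass
class Lease:
    lease_id: str
    vram_mb: int
    priority: Priority


def pick_victims(leases, free_mb, needed_mb, priority):
    """Choose lower-priority leases to preempt until needed_mb fits.

    Returns a list of lease IDs (empty if nothing must be evicted),
    or None if preemption cannot free enough VRAM.
    """
    if needed_mb <= free_mb:
        return []
    # Only strictly lower-priority leases qualify; evict the lowest first.
    candidates = sorted(
        (lease for lease in leases if lease.priority > priority),
        key=lambda lease: lease.priority,
        reverse=True,
    )
    victims = []
    for lease in candidates:
        victims.append(lease.lease_id)
        free_mb += lease.vram_mb
        if needed_mb <= free_mb:
            return victims
    return None


active = [
    Lease("train", 4000, Priority.LOW),
    Lease("chat", 6000, Priority.NORMAL),
]
high = pick_victims(active, free_mb=2000, needed_mb=5000, priority=Priority.HIGH)
normal = pick_victims(active, free_mb=2000, needed_mb=8000, priority=Priority.NORMAL)
print(high, normal)  # ['train'] None
```

In the second call the NORMAL request may only evict the LOW lease, which frees too little, so the request would stay queued instead of preempting a peer.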
Multi-Model Services
Running multiple models that need coordination:
# Service A: SDXL diffusion
async with ModelBoss(model_id="sdxl-turbo") as boss:
    image = await boss.model.generate("cat on a keyboard")
# Service B: LLM chat
async with ModelBoss(model_id="mistral-7b") as boss:
    response = await boss.model.chat([
        {"role": "user", "content": "Describe this image"}
    ])
Model Manifest
Model Boss uses a YAML manifest to map model IDs to filesystem paths.
models:
  mistral-7b-instruct:
    path: models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
    format: gguf
    category: llm
    vram_mb: 4500
  sdxl-turbo:
    path: models/stable-diffusion-xl-turbo
    format: safetensors
    category: diffusion
    vram_mb: 6800
  llama-70b:
    path: models/llama-70b-sharded
    format: gguf
    category: llm
    sharded: true
    shard_count: 8
    vram_mb: 42000
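To show what resolving a model ID against this manifest could involve, here is a minimal sketch. It copies two entries into a Python dict so it needs no YAML parser; `resolve`, the `/srv` base directory, and the rule that `path` is joined to a base directory are all assumptions made for illustration.

```python
from pathlib import Path

# Two entries from the manifest above, written as a dict for the sketch.
MANIFEST = {
    "mistral-7b-instruct": {
        "path": "models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
        "format": "gguf",
        "vram_mb": 4500,
    },
    "llama-70b": {
        "path": "models/llama-70b-sharded",
        "format": "gguf",
        "sharded": True,
        "shard_count": 8,
        "vram_mb": 42000,
    },
}


def resolve(model_id, base_dir):
    """Return (path, VRAM estimate in MB) for a manifest entry."""
    try:
        entry = MANIFEST[model_id]
    except KeyError:
        raise KeyError(f"unknown model id: {model_id}") from None
    # Assumption: manifest paths are relative to a base directory.
    return Path(base_dir) / entry["path"], entry["vram_mb"]


path, vram_mb = resolve("mistral-7b-instruct", "/srv")
print(path.as_posix(), vram_mb)
```

Keeping `vram_mb` next to the path means a caller can request a lease of the right size before opening the model file.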
Documentation
- Migration Guide - Migrating from old packages
- CLI Reference - Complete CLI documentation
- Python API - Python package documentation
- TypeScript API - TypeScript package documentation
Development
# Clone repository
git clone https://forge.nasty.sh/lilith/model-boss
cd model-boss
# Install Python package in development mode
cd packages/core-py
pip install -e ".[dev]"
pytest
# Install TypeScript package (continuing from packages/core-py)
cd ../core-ts
pnpm install
pnpm build
pnpm test
License
MIT License - see LICENSE file for details.
Contributing
Contributions welcome! Please ensure:
- Code follows existing style (Ruff for Python, ESLint for TypeScript)
- All tests pass
- New features include tests and documentation
- Breaking changes are clearly documented
Support
- Issues: https://forge.nasty.sh/lilith/model-boss/issues
- Documentation: https://forge.nasty.sh/lilith/model-boss/wiki