Model Boss CLI Reference
The Model Boss CLI provides commands for monitoring and managing GPU and RAM resources, managing the model manifest, and diagnosing the installation.
Installation
The CLI is included when you install the Python package:
pip install lilith-model-boss
Verify installation:
model-boss --version
Command Overview
model-boss
├── gpu # GPU coordination commands
│ ├── status # Show GPU status and active leases
│ ├── list # List waiting queue requests
│ ├── kill # Kill a specific lease
│ ├── drain # Request all models to unload
│ ├── cleanup # Clean up stale leases
│ ├── diagnose # Diagnose GPU coordination issues
│ └── init # Manually initialize a GPU
├── ram # RAM coordination commands
│ ├── status # Show RAM usage and leases
│ ├── analyze # Detailed memory analysis
│ ├── clear # Clear RAM caches
│ └── cleanup # Clean up stale RAM leases
├── model # Model download and manifest management
│ ├── download # Download a model from HuggingFace
│ ├── add # Add an existing local model to manifest
│ ├── list # List all models in the manifest
│ └── remove # Remove a model from the manifest
└── info # Environment and installation diagnostics
├── python # Show Python interpreter information
├── package # Show package installation information
├── verify # Verify installation and connectivity
└── env # Show environment information
GPU Commands
model-boss gpu status
Show GPU status and active leases.
Usage:
model-boss gpu status [--json]
Options:
--json: Output as JSON instead of formatted text
Output:
GPU Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GPU 0: NVIDIA GeForce RTX 4090
VRAM: 16384 MB total, 8192 MB used, 8192 MB free
Active Leases: 2
Active Leases:
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ Lease ID ┃ Model ID ┃ VRAM MB ┃ Priority ┃ Age ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ abc123... │ mistral-7b │ 4096 │ NORMAL │ 5m 23s │
│ def456... │ sdxl-turbo │ 4096 │ HIGH │ 2m 10s │
└────────────┴────────────────┴─────────┴──────────┴──────────┘
Queue: 1 waiting
JSON Output:
model-boss gpu status --json
{
"gpus": [
{
"index": 0,
"name": "NVIDIA GeForce RTX 4090",
"vram_total_mb": 16384,
"vram_used_mb": 8192,
"vram_free_mb": 8192,
"active_leases": 2
}
],
"leases": [
{
"lease_id": "abc123...",
"model_id": "mistral-7b",
"vram_mb": 4096,
"priority": "NORMAL",
"age_seconds": 323
}
],
"queue_length": 1
}
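For scripting, the JSON output can be consumed directly from Python. The sketch below is a minimal example that assumes the field names shown in the sample output above (`gpus`, `leases`, `queue_length`, and so on); the `summarize_gpu_status` helper is hypothetical and not part of the package.

```python
import json


def summarize_gpu_status(raw: str) -> dict:
    """Summarize the JSON text printed by `model-boss gpu status --json`.

    Assumes the field names from the sample output in this reference.
    """
    status = json.loads(raw)
    return {
        # Total VRAM reserved by the leases listed in the output
        "leased_mb": sum(lease["vram_mb"] for lease in status["leases"]),
        # Free VRAM per GPU index
        "free_mb_by_gpu": {gpu["index"]: gpu["vram_free_mb"] for gpu in status["gpus"]},
        "queue_length": status["queue_length"],
    }


# Sample document copied from the JSON output above
SAMPLE = """
{
  "gpus": [
    {"index": 0, "name": "NVIDIA GeForce RTX 4090", "vram_total_mb": 16384,
     "vram_used_mb": 8192, "vram_free_mb": 8192, "active_leases": 2}
  ],
  "leases": [
    {"lease_id": "abc123...", "model_id": "mistral-7b", "vram_mb": 4096,
     "priority": "NORMAL", "age_seconds": 323}
  ],
  "queue_length": 1
}
"""

summary = summarize_gpu_status(SAMPLE)
print(summary)
```

In a real script, the `raw` argument would typically come from `subprocess.run(["model-boss", "gpu", "status", "--json"], capture_output=True, text=True, check=True).stdout`.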
model-boss gpu list
List waiting queue requests.
Usage:
model-boss gpu list [--json]
Options:
--json: Output as JSON
Output:
Queue Requests
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ Request ID ┃ Model ID ┃ VRAM MB ┃ Priority ┃ Wait Time┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ req789... │ llama-70b │ 42000 │ HIGH │ 1m 15s │
│ req012... │ stable-diff │ 6800 │ LOW │ 3m 42s │
└─────────────┴────────────────┴─────────┴──────────┴──────────┘
Total: 2 requests waiting
model-boss gpu kill
Kill a specific lease by ID.
Usage:
model-boss gpu kill <lease_id> [--force]
Arguments:
lease_id: The lease ID to kill (from gpu status)
Options:
--force: Force immediate release (skip grace period)
Examples:
# Normal kill (grace period)
model-boss gpu kill abc123
# Force kill (immediate)
model-boss gpu kill abc123 --force
Output:
Killing lease abc123...
Grace period: 30 seconds
Lease killed successfully
Grace Period:
- Without --force: Gives the process 30 seconds to clean up
- With --force: Immediately releases the lease
model-boss gpu drain
Request all models to unload.
Usage:
model-boss gpu drain [--force] [--yes]
Options:
--force: Force immediate release (skip grace period)
--yes, -y: Skip confirmation prompt
Examples:
# Normal drain (with confirmation)
model-boss gpu drain
# Force drain (no grace period, no confirmation)
model-boss gpu drain --force --yes
Output:
WARNING: This will kill all active leases
Active leases: 3
Continue? [y/N]: y
Draining all leases...
✓ Killed abc123 (mistral-7b)
✓ Killed def456 (sdxl-turbo)
✓ Killed ghi789 (llama-13b)
Successfully drained 3 leases
Use Cases:
- Preparing for system maintenance
- Clearing stuck leases
- Resetting GPU state
model-boss gpu cleanup
Clean up stale leases (those without heartbeats).
Usage:
model-boss gpu cleanup
Output:
Cleaning up stale leases...
Found 2 stale leases:
- abc123 (last heartbeat 5m ago)
- def456 (last heartbeat 8m ago)
Removed 2 stale leases
What it does:
- Identifies leases without recent heartbeats
- Removes them from Redis
- Frees up VRAM allocation
When to use:
- After process crashes
- When leases are stuck
- As part of regular maintenance
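The stale-lease check described above can be sketched as a timestamp comparison. This is an illustration only, not the package's implementation: the `Lease` record and its `last_heartbeat` field are hypothetical, and the 60-second timeout is an assumption taken from the example `MODEL_BOSS_STALE_LEASE_TIMEOUT_S` value in this reference.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    lease_id: str
    last_heartbeat: float  # Unix timestamp of the last heartbeat (hypothetical field)


def find_stale(leases, now, timeout_s=60.0):
    """Return the leases whose last heartbeat is older than timeout_s seconds."""
    return [lease for lease in leases if now - lease.last_heartbeat > timeout_s]


now = 1_000.0
leases = [
    Lease("abc123", now - 300),  # last heartbeat 5 minutes ago
    Lease("def456", now - 480),  # last heartbeat 8 minutes ago
    Lease("ghi789", now - 5),    # still sending heartbeats
]
stale = find_stale(leases, now)
print([lease.lease_id for lease in stale])
```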
model-boss gpu diagnose
Diagnose GPU coordination issues.
Usage:
model-boss gpu diagnose [--verbose]
Options:
--verbose, -v: Show detailed process information
Output:
GPU Diagnosis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GPU 0: NVIDIA GeForce RTX 4090
Total VRAM: 16384 MB
Used (boss): 8192 MB
Used (nvidia-smi): 10240 MB
Discrepancy: 2048 MB ⚠️
Coordinated Processes:
✓ PID 12345: mistral-7b (4096 MB)
✓ PID 12346: sdxl-turbo (4096 MB)
Uncoordinated Processes:
⚠️ PID 12347: python inference.py (2048 MB)
Command: python inference.py --model llama-7b
User: user1
Started: 15m ago
WARNINGS:
- Uncoordinated process detected using 2048 MB VRAM
- This may cause OOM errors or interference with coordinated processes
- Consider migrating to Model Boss coordination
Recommendations:
1. Update inference.py to use Model Boss
2. Or kill PID 12347: sudo kill 12347
What it does:
- Compares Model Boss leases with actual GPU processes
- Identifies uncoordinated GPU usage
- Provides recommendations
When to use:
- Debugging OOM errors
- Finding rogue processes
- Auditing GPU usage
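The comparison that the diagnosis performs can be sketched as a reconciliation of two PID-to-VRAM maps: one for the leases Model Boss tracks and one for the processes the GPU driver reports. The `reconcile` function and its input shape are assumptions made for illustration; the real command may gather and compare this data differently.

```python
def reconcile(leases, gpu_processes):
    """Compare tracked leases with the processes actually using the GPU.

    Both arguments map a PID to its VRAM usage in MB (hypothetical shape).
    """
    tracked_mb = sum(leases.values())
    actual_mb = sum(gpu_processes.values())
    # Any process holding VRAM without a matching lease is uncoordinated
    uncoordinated = {
        pid: mb for pid, mb in gpu_processes.items() if pid not in leases
    }
    return {
        "tracked_mb": tracked_mb,
        "actual_mb": actual_mb,
        "discrepancy_mb": actual_mb - tracked_mb,
        "uncoordinated": uncoordinated,
    }


report = reconcile(
    leases={12345: 4096, 12346: 4096},
    gpu_processes={12345: 4096, 12346: 4096, 12347: 2048},
)
print(report)
```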
model-boss gpu init
Manually initialize a GPU for tracking.
Usage:
model-boss gpu init <gpu_index> <vram_mb> [--name NAME]
Arguments:
gpu_index: GPU index (0, 1, 2, etc.)
vram_mb: Total VRAM in MB
Options:
--name: GPU name (default: "Unknown GPU")
Examples:
# Initialize GPU 0 with 24GB VRAM
model-boss gpu init 0 24576 --name "NVIDIA RTX 4090"
# Initialize GPU 1 with 16GB VRAM
model-boss gpu init 1 16384 --name "NVIDIA RTX 4080"
Output:
Initializing GPU 0...
Name: NVIDIA RTX 4090
VRAM: 24576 MB
GPU initialized successfully
When to use:
- When automatic GPU detection fails
- In containerized environments
- For custom GPU configurations
RAM Commands
model-boss ram status
Show RAM usage and active leases.
Usage:
model-boss ram status
Output:
RAM Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
System Memory:
Total: 65536 MB
Used: 32768 MB (50%)
Free: 32768 MB
Available: 40960 MB
Active RAM Leases: 2
Lease abc123: 8192 MB (data-cache)
Lease def456: 4096 MB (model-cache)
Total Leased: 12288 MB
model-boss ram analyze
Detailed memory analysis.
Usage:
model-boss ram analyze [OPTIONS]
Options:
--processes, -p: Show top memory-consuming processes
--groups, -g: Show process groups by name
--leaks, -l: Check for potential memory leaks
Examples:
# Basic analysis
model-boss ram analyze
# Show top processes
model-boss ram analyze --processes
# Show process groups
model-boss ram analyze --groups
# Check for memory leaks
model-boss ram analyze --leaks
# Combine options
model-boss ram analyze --processes --leaks
Output (basic):
Memory Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
System Overview:
Total: 65536 MB
Used: 32768 MB (50%)
Free: 16384 MB
Available: 40960 MB
Buffers: 4096 MB
Cached: 12288 MB
Memory Breakdown:
Active: 24576 MB
Inactive: 8192 MB
Dirty: 256 MB
Writeback: 0 MB
Swap:
Total: 8192 MB
Used: 0 MB
Free: 8192 MB
Pressure: LOW ✓
Output (with --processes):
Top Memory Consumers:
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ PID ┃ Command ┃ RSS MB ┃ % MEM ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ 12345 │ python train.py │ 8192 │ 12.5% │
│ 12346 │ redis-server │ 4096 │ 6.25% │
│ 12347 │ postgres │ 2048 │ 3.12% │
└────────┴────────────────────────────┴─────────┴─────────┘
Output (with --groups):
Process Groups:
python: 12288 MB (3 processes)
node: 4096 MB (5 processes)
postgres: 2048 MB (1 process)
Output (with --leaks):
Memory Leak Detection:
⚠️ PID 12345 (python train.py)
Started: 2h ago
Current RSS: 8192 MB
Initial RSS: 1024 MB
Growth: 7168 MB (700% increase)
Growth rate: 59.7 MB/min
Potential leak detected
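The growth figures in a leak report follow from the initial RSS, the current RSS, and the process age. The sketch below shows that arithmetic; `rss_growth` is a hypothetical helper, and how the CLI decides that growth counts as a potential leak is not documented here.

```python
def rss_growth(initial_mb, current_mb, elapsed_min):
    """Return (growth in MB, growth as a percentage of initial RSS, MB per minute)."""
    growth_mb = current_mb - initial_mb
    growth_pct = growth_mb / initial_mb * 100
    rate_mb_per_min = growth_mb / elapsed_min
    return growth_mb, growth_pct, rate_mb_per_min


# A process that started at 1024 MB and reached 8192 MB after 2 hours
growth_mb, growth_pct, rate = rss_growth(1024, 8192, 120)
print(f"Growth: {growth_mb} MB ({growth_pct:.0f}% increase), {rate:.1f} MB/min")
```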
model-boss ram clear
Clear RAM caches.
Usage:
model-boss ram clear [MODE] [--dry-run]
Arguments:
MODE: Cleanup mode (auto, conservative, balanced, aggressive)
Modes:
| Mode | Description | Actions | Requires Sudo |
|---|---|---|---|
| auto | Analyze and choose appropriate cleanup | Based on pressure | Maybe |
| conservative | Drop page cache only (safest) | echo 1 > /proc/sys/vm/drop_caches | Yes |
| balanced | Drop page cache + dentries/inodes | echo 3 > /proc/sys/vm/drop_caches | Yes |
| aggressive | Drop all + compact + sync | sync + drop + compact | Yes |
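The fixed modes in the table can be read as a lookup from mode name to a list of steps. The sketch below only builds that plan and never executes it. The exact steps for `aggressive` (in particular writing to `/proc/sys/vm/compact_memory`) are an assumption based on the "sync + drop + compact" summary, and `auto` is left out because the pressure thresholds it uses are not documented here.

```python
# Hypothetical mapping from cleanup mode to the steps it would run as root.
CLEANUP_STEPS = {
    "conservative": ["echo 1 > /proc/sys/vm/drop_caches"],
    "balanced": ["echo 3 > /proc/sys/vm/drop_caches"],
    # Assumed expansion of "sync + drop + compact"
    "aggressive": [
        "sync",
        "echo 3 > /proc/sys/vm/drop_caches",
        "echo 1 > /proc/sys/vm/compact_memory",
    ],
}


def plan_cleanup(mode):
    """Return the steps for a fixed mode without executing anything."""
    if mode not in CLEANUP_STEPS:
        raise ValueError(f"no fixed plan for mode: {mode!r}")
    return list(CLEANUP_STEPS[mode])


for step in plan_cleanup("balanced"):
    print("would run:", step)
```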
Options:
--dry-run: Show what would be done without executing
Examples:
# Auto mode (analyzes pressure level)
sudo model-boss ram clear auto
# Conservative cleanup
sudo model-boss ram clear conservative
# Aggressive cleanup
sudo model-boss ram clear aggressive
# Dry run
model-boss ram clear balanced --dry-run
Output (auto mode):
Analyzing memory pressure...
Pressure: MODERATE
Available: 8192 MB (12%)
Recommendation: BALANCED cleanup
Performing BALANCED cleanup...
1. Dropping page cache
2. Dropping dentries and inodes
Freed 4096 MB
After cleanup:
Total: 65536 MB
Used: 28672 MB (44%)
Free: 20480 MB
Available: 45056 MB
Output (dry run):
DRY RUN: Would perform BALANCED cleanup
- Drop page cache
- Drop dentries and inodes
- Estimated free: ~4096 MB
No changes made (dry run)
When to use:
- Before loading large models
- After processing large datasets
- When available memory is low
- As part of regular maintenance
Safety:
conservative: Safe for production systems
balanced: Safe, slightly more aggressive
aggressive: Use with caution, may impact performance
model-boss ram cleanup
Clean up stale RAM leases.
Usage:
model-boss ram cleanup
Output:
Cleaning up stale RAM leases...
Found 1 stale lease:
- abc123 (last heartbeat 6m ago, 8192 MB)
Freed 8192 MB from stale leases
What it does:
- Identifies leases without recent heartbeats
- Removes them from Redis
- Frees up RAM allocation tracking
When to use:
- After process crashes
- When leases are stuck
- As part of regular maintenance
Environment Variables
Configure the CLI using environment variables:
# Redis connection
export MODEL_BOSS_REDIS_URL=redis://localhost:6379/0
# Model paths
export MODEL_BOSS_MODELS_DIR=/path/to/models
export MODEL_BOSS_MANIFEST_PATH=/path/to/manifest.yaml
# Timing settings
export MODEL_BOSS_PREEMPTION_GRACE_PERIOD_S=30
export MODEL_BOSS_HEARTBEAT_INTERVAL_S=10
export MODEL_BOSS_STALE_LEASE_TIMEOUT_S=60
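A wrapper script that needs the same settings can read these variables itself. The sketch below is a minimal example: the `CliSettings` class is hypothetical, and the fallback values simply reuse the example values shown above, which may not match the package's real defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CliSettings:
    redis_url: str
    preemption_grace_period_s: float
    heartbeat_interval_s: float
    stale_lease_timeout_s: float


def load_settings(env=None):
    """Read Model Boss settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return CliSettings(
        # Fallbacks below are assumptions taken from the examples in this reference
        redis_url=env.get("MODEL_BOSS_REDIS_URL", "redis://localhost:6379/0"),
        preemption_grace_period_s=float(env.get("MODEL_BOSS_PREEMPTION_GRACE_PERIOD_S", "30")),
        heartbeat_interval_s=float(env.get("MODEL_BOSS_HEARTBEAT_INTERVAL_S", "10")),
        stale_lease_timeout_s=float(env.get("MODEL_BOSS_STALE_LEASE_TIMEOUT_S", "60")),
    )


settings = load_settings({"MODEL_BOSS_STALE_LEASE_TIMEOUT_S": "120"})
print(settings)
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test.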
Model Commands
model-boss model download
Download a model from HuggingFace and register it in the manifest.
Usage:
model-boss model download <repo_id> [-f FILENAME] [--model-id ID] [--download-dir DIR] [--dry-run]
Arguments:
repo_id: HuggingFace repository ID (e.g., Qwen/Qwen3-8B-GGUF)
Options:
-f, --filename: Specific file to download from the repo
--model-id: Override model ID for manifest (auto-derived if omitted)
--download-dir: Override download directory
--dry-run: Show what would happen without downloading
Examples:
model-boss model download Qwen/Qwen3-8B-GGUF -f Qwen3-8B-Q8_0.gguf
model-boss model download Qwen/Qwen3-8B-GGUF -f Qwen3-8B-Q8_0.gguf --model-id qwen3-8b
model-boss model download Qwen/Qwen3-8B-GGUF -f Qwen3-8B-Q8_0.gguf --dry-run
model-boss model add
Add an existing local model to the manifest.
Usage:
model-boss model add <model_id> -p PATH [--name NAME] [--category CATEGORY] [--type TYPE] [--vram-mb MB]
Arguments:
model_id: The model identifier to register
Options:
-p, --path: Path to model file, relative to cache root or absolute (required)
--name: Override human-readable name
--category: Override category (llm, embedding, diffusion)
--type: Override model type (instruction, reasoning, fast, base)
--vram-mb: Override VRAM estimate in MB
Examples:
model-boss model add qwen3-8b -p Qwen/Qwen3-8B-GGUF/Qwen3-8B-Q8_0.gguf
model-boss model add my-model -p path/to/model.gguf --name "My Model" --category llm
model-boss model list
List all models in the manifest.
Usage:
model-boss model list [--category CATEGORY] [--json]
Options:
--category: Filter by category (llm, embedding, diffusion)
--json: Output as JSON
Examples:
model-boss model list
model-boss model list --category llm
model-boss model list --json
model-boss model remove
Remove a model from the manifest (does not delete files).
Usage:
model-boss model remove <model_id> [--yes]
Arguments:
model_id: The model identifier to remove
Options:
--yes, -y: Skip confirmation
Examples:
model-boss model remove qwen3-8b
model-boss model remove qwen3-8b --yes
Info Commands
model-boss info python
Show Python interpreter information.
Usage:
model-boss info python [--paths] [--json]
Options:
--paths, -p: Show sys.path entries
--json: Output as JSON
Examples:
model-boss info python
model-boss info python --paths
model-boss info python --json
model-boss info package
Show package installation information.
Usage:
model-boss info package [NAME] [--json]
Arguments:
NAME: Package name to inspect (default: model-boss)
Options:
--json: Output as JSON
Examples:
model-boss info package
model-boss info package torch
model-boss info package model-boss --json
model-boss info verify
Verify model-boss installation and connectivity.
Usage:
model-boss info verify [--no-redis] [--model MODEL] [--json]
Options:
--no-redis: Skip Redis connection check
-m, --model: Test model resolution for a specific model
--json: Output as JSON
Examples:
model-boss info verify
model-boss info verify --no-redis
model-boss info verify --model qwen2.5-1.5b-instruct
model-boss info env
Show environment information including available tools, relevant environment variables, and configuration paths.
Usage:
model-boss info env [--json]
Options:
--json: Output as JSON
Examples:
model-boss info env
model-boss info env --json
Common Workflows
Monitor GPU Usage
# Check current status
model-boss gpu status
# Check queue
model-boss gpu list
# Diagnose issues
model-boss gpu diagnose
Clean Up After Crashes
# Clean up stale GPU leases
model-boss gpu cleanup
# Clean up stale RAM leases
model-boss ram cleanup
Prepare for Maintenance
# Drain all GPU leases
model-boss gpu drain --force --yes
# Clear RAM caches
sudo model-boss ram clear balanced
Debug OOM Errors
# Check GPU status
model-boss gpu status
# Find uncoordinated processes
model-boss gpu diagnose --verbose
# Check memory pressure
model-boss ram analyze --processes
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Configuration error |
| 3 | Redis connection error |
| 4 | Resource not found |
| 5 | Permission denied |
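Scripts that call the CLI can translate these codes into readable messages. The sketch below copies the table into a dictionary; `describe_exit` and `run_cli` are hypothetical helpers, and `run_cli` assumes `model-boss` is on the PATH.

```python
import subprocess

# Exit codes as listed in the table above
EXIT_CODES = {
    0: "Success",
    1: "General error",
    2: "Configuration error",
    3: "Redis connection error",
    4: "Resource not found",
    5: "Permission denied",
}


def describe_exit(code):
    """Map an exit code to its documented meaning."""
    return EXIT_CODES.get(code, f"Unknown exit code {code}")


def run_cli(*args):
    """Run a model-boss subcommand and return (exit code, meaning)."""
    proc = subprocess.run(["model-boss", *args], capture_output=True, text=True)
    return proc.returncode, describe_exit(proc.returncode)


print(describe_exit(3))
```

For example, `run_cli("gpu", "status")` would return the exit code of `model-boss gpu status` together with its meaning.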
Tips
- Use JSON output for scripting:
  model-boss gpu status --json | jq '.leases | length'
- Monitor queue in real-time:
  watch -n 1 model-boss gpu status
- Find specific lease:
  model-boss gpu status --json | jq '.leases[] | select(.model_id == "mistral-7b")'
- Check if cleanup needed:
  model-boss ram analyze | grep "Pressure: HIGH"
- Automate cleanup:
  # Cron job: cleanup every hour
  0 * * * * /usr/local/bin/model-boss gpu cleanup
  0 * * * * /usr/local/bin/model-boss ram cleanup
Service Auto-Start
Model Boss automatically starts required services when needed. This means you don't need to manually start Redis before using the CLI.
How It Works
When you run any Model Boss command that requires Redis:
- Model Boss checks if Redis is running
- If not running, it automatically starts Redis
- Redis runs with minimal config (no persistence) suitable for lease coordination
- Redis is stopped when the Model Boss process exits
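The steps above can be sketched as a check-then-start routine. This is an illustration under stated assumptions, not the package's implementation: the helper names are hypothetical, the liveness check is a plain TCP connect (which only proves that something is listening on the port), and "minimal config (no persistence)" is interpreted as the `redis-server` options `--save ""` and `--appendonly no`.

```python
import atexit
import socket
import subprocess


def redis_is_running(host="localhost", port=6379, timeout=0.5):
    """Return True if something accepts TCP connections on the Redis port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def redis_command(port=6379):
    """Build a redis-server command line with persistence disabled."""
    # --save "" turns off RDB snapshots; --appendonly no turns off the AOF
    return ["redis-server", "--port", str(port), "--save", "", "--appendonly", "no"]


def ensure_redis(host="localhost", port=6379):
    """Start Redis if it is not reachable; return the child process or None."""
    if redis_is_running(host, port):
        return None
    proc = subprocess.Popen(redis_command(port))
    # Stop the child Redis when this Python process exits
    atexit.register(proc.terminate)
    return proc


print(" ".join(redis_command()))
```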
Disabling Auto-Start
If you manage Redis yourself:
# Via environment variable
export MODEL_BOSS_AUTO_START_SERVICES=false
# Redis must already be running
model-boss gpu status
Checking Service Status
# Check what services are running
redis-cli ping # Should return PONG if Redis is running
Troubleshooting
"Redis connection failed"
With auto-start enabled (default), Model Boss will try to start Redis automatically. If this fails:
- Check if redis-server is installed:
  which redis-server
- If not installed, install Redis:
  # Fedora/RHEL
  sudo dnf install redis
  # Ubuntu/Debian
  sudo apt install redis-server
  # macOS
  brew install redis
- Or start Redis manually:
  redis-server
If auto-start is disabled, ensure Redis is running:
redis-cli ping
Set correct Redis URL:
export MODEL_BOSS_REDIS_URL=redis://localhost:6379/0
"Permission denied" (RAM clear)
RAM cache clearing requires sudo:
sudo model-boss ram clear balanced
"GPU not found"
Initialize GPU manually:
model-boss gpu init 0 24576 --name "My GPU"
"Command not found"
Ensure model-boss is installed and in PATH:
pip install lilith-model-boss
which model-boss