Model Boss CLI Reference

The Model Boss CLI provides tools for monitoring and managing GPU and RAM resources, managing the model manifest, and diagnosing the installation.

Installation

The CLI is included when you install the Python package:

pip install lilith-model-boss

Verify installation:

model-boss --version

Command Overview

model-boss
├── gpu          # GPU coordination commands
│   ├── status   # Show GPU status and active leases
│   ├── list     # List waiting queue requests
│   ├── kill     # Kill a specific lease
│   ├── drain    # Request all models to unload
│   ├── cleanup  # Clean up stale leases
│   ├── diagnose # Diagnose GPU coordination issues
│   └── init     # Manually initialize a GPU
├── ram          # RAM coordination commands
│   ├── status   # Show RAM usage and leases
│   ├── analyze  # Detailed memory analysis
│   ├── clear    # Clear RAM caches
│   └── cleanup  # Clean up stale RAM leases
├── model        # Model download and manifest management
│   ├── download # Download a model from HuggingFace
│   ├── add      # Add an existing local model to manifest
│   ├── list     # List all models in the manifest
│   └── remove   # Remove a model from the manifest
└── info         # Environment and installation diagnostics
    ├── python   # Show Python interpreter information
    ├── package  # Show package installation information
    ├── verify   # Verify installation and connectivity
    └── env      # Show environment information

GPU Commands

model-boss gpu status

Show GPU status and active leases.

Usage:

model-boss gpu status [--json]

Options:

  • --json: Output as JSON instead of formatted text

Output:

GPU Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

GPU 0: NVIDIA GeForce RTX 4090
  VRAM: 24576 MB total, 8192 MB used, 16384 MB free
  Active Leases: 2

Active Leases:
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ Lease ID   ┃ Model ID       ┃ VRAM MB ┃ Priority ┃ Age      ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ abc123...  │ mistral-7b     │ 4096    │ NORMAL   │ 5m 23s   │
│ def456...  │ sdxl-turbo     │ 4096    │ HIGH     │ 2m 10s   │
└────────────┴────────────────┴─────────┴──────────┴──────────┘

Queue: 1 waiting

JSON Output:

model-boss gpu status --json
{
  "gpus": [
    {
      "index": 0,
      "name": "NVIDIA GeForce RTX 4090",
      "vram_total_mb": 24576,
      "vram_used_mb": 8192,
      "vram_free_mb": 16384,
      "active_leases": 2
    }
  ],
  "leases": [
    {
      "lease_id": "abc123...",
      "model_id": "mistral-7b",
      "vram_mb": 4096,
      "priority": "NORMAL",
      "age_seconds": 323
    }
  ],
  "queue_length": 1
}
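For scripting, the JSON output can be consumed from Python as well as with jq. The sketch below is a minimal example that assumes only the field names shown in the sample above (gpus, index, vram_free_mb, leases, model_id, vram_mb); load_gpu_status, fits_on_gpu, and leased_mb_by_model are hypothetical helpers, not part of the model-boss package.

```python
import json
import subprocess
from typing import Any


def load_gpu_status() -> dict[str, Any]:
    """Run `model-boss gpu status --json` and return the parsed document."""
    result = subprocess.run(
        ["model-boss", "gpu", "status", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def fits_on_gpu(status: dict[str, Any], vram_mb: int) -> list[int]:
    """Indexes of GPUs whose reported free VRAM is at least vram_mb."""
    return [gpu["index"] for gpu in status["gpus"] if gpu["vram_free_mb"] >= vram_mb]


def leased_mb_by_model(status: dict[str, Any]) -> dict[str, int]:
    """Total leased VRAM (MB) per model_id across all active leases."""
    totals: dict[str, int] = {}
    for lease in status["leases"]:
        totals[lease["model_id"]] = totals.get(lease["model_id"], 0) + lease["vram_mb"]
    return totals
```

For example, fits_on_gpu(load_gpu_status(), 6800) lists the GPUs whose reported free VRAM could hold a 6800 MB model. It only reads the reported figures; it does not request a lease.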

model-boss gpu list

List waiting queue requests.

Usage:

model-boss gpu list [--json]

Options:

  • --json: Output as JSON

Output:

Queue Requests
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ Request ID  ┃ Model ID       ┃ VRAM MB ┃ Priority ┃ Wait Time┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ req789...   │ llama-70b      │ 42000   │ HIGH     │ 1m 15s   │
│ req012...   │ stable-diff    │ 6800    │ LOW      │ 3m 42s   │
└─────────────┴────────────────┴─────────┴──────────┴──────────┘

Total: 2 requests waiting

model-boss gpu kill

Kill a specific lease by ID.

Usage:

model-boss gpu kill <lease_id> [--force]

Arguments:

  • lease_id: The lease ID to kill (from gpu status)

Options:

  • --force: Force immediate release (skip grace period)

Examples:

# Normal kill (grace period)
model-boss gpu kill abc123

# Force kill (immediate)
model-boss gpu kill abc123 --force

Output:

Killing lease abc123...
Grace period: 30 seconds
Lease killed successfully

Grace Period:

  • Without --force: Gives the process 30 seconds to clean up
  • With --force: Immediately releases the lease
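The grace-period behaviour amounts to "ask, wait, then release". The sketch below illustrates that general pattern only and is not Model Boss's implementation: request_unload, has_released, and force_release are hypothetical callbacks standing in for however the lease holder is notified and checked, and the 30-second default mirrors the grace period shown in the output above.

```python
import time
from typing import Callable


def kill_with_grace(
    request_unload: Callable[[], None],
    has_released: Callable[[], bool],
    force_release: Callable[[], None],
    grace_period_s: float = 30.0,
    force: bool = False,
    poll_interval_s: float = 0.5,
) -> str:
    """Release a lease, giving its holder grace_period_s to clean up first.

    Returns "released" if the holder let go on its own, or "forced" if the
    lease was released forcibly (immediately with force=True, or after the
    grace period expired).
    """
    if force:
        force_release()
        return "forced"
    request_unload()
    deadline = time.monotonic() + grace_period_s
    while time.monotonic() < deadline:
        if has_released():
            return "released"
        time.sleep(poll_interval_s)
    force_release()  # holder did not clean up in time
    return "forced"
```

Polling keeps the sketch simple; an event-driven notification would avoid the sleep loop at the cost of more moving parts.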

model-boss gpu drain

Request all models to unload.

Usage:

model-boss gpu drain [--force] [--yes]

Options:

  • --force: Force immediate release (skip grace period)
  • --yes, -y: Skip confirmation prompt

Examples:

# Normal drain (with confirmation)
model-boss gpu drain

# Force drain (no grace period, no confirmation)
model-boss gpu drain --force --yes

Output:

WARNING: This will kill all active leases
Active leases: 3
Continue? [y/N]: y

Draining all leases...
  ✓ Killed abc123 (mistral-7b)
  ✓ Killed def456 (sdxl-turbo)
  ✓ Killed ghi789 (llama-13b)

Successfully drained 3 leases

Use Cases:

  • Preparing for system maintenance
  • Clearing stuck leases
  • Resetting GPU state

model-boss gpu cleanup

Clean up stale leases, i.e. leases whose holders have stopped sending heartbeats.

Usage:

model-boss gpu cleanup

Output:

Cleaning up stale leases...

Found 2 stale leases:
  - abc123 (last heartbeat 5m ago)
  - def456 (last heartbeat 8m ago)

Removed 2 stale leases

What it does:

  • Identifies leases without recent heartbeats
  • Removes them from Redis
  • Releases the VRAM those leases had reserved

When to use:

  • After process crashes
  • When leases are stuck
  • As part of regular maintenance
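Staleness is a comparison between each lease's last heartbeat and a timeout. A minimal in-memory sketch, assuming heartbeats are recorded as Unix timestamps and that the threshold is the MODEL_BOSS_STALE_LEASE_TIMEOUT_S value listed under Environment Variables (60 s in the example there). The real command reads leases from Redis, whose key layout is not documented here, so the Lease class below is hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    lease_id: str
    model_id: str
    vram_mb: int
    last_heartbeat: float  # Unix timestamp of the most recent heartbeat


def find_stale(leases: list[Lease], timeout_s: float = 60.0,
               now: Optional[float] = None) -> list[Lease]:
    """Leases whose last heartbeat is more than timeout_s seconds old."""
    now = time.time() if now is None else now
    return [lease for lease in leases if now - lease.last_heartbeat > timeout_s]


def cleanup(leases: list[Lease], timeout_s: float = 60.0,
            now: Optional[float] = None) -> tuple[list[Lease], int]:
    """Drop stale leases; return the remaining leases and the VRAM (MB) freed."""
    stale = find_stale(leases, timeout_s, now)
    stale_ids = {lease.lease_id for lease in stale}
    remaining = [lease for lease in leases if lease.lease_id not in stale_ids]
    return remaining, sum(lease.vram_mb for lease in stale)
```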

model-boss gpu diagnose

Diagnose GPU coordination issues.

Usage:

model-boss gpu diagnose [--verbose]

Options:

  • --verbose, -v: Show detailed process information

Output:

GPU Diagnosis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

GPU 0: NVIDIA GeForce RTX 4090
  Total VRAM: 24576 MB
  Used (boss): 8192 MB
  Used (nvidia-smi): 10240 MB
  Discrepancy: 2048 MB ⚠️

Coordinated Processes:
  ✓ PID 12345: mistral-7b (4096 MB)
  ✓ PID 12346: sdxl-turbo (4096 MB)

Uncoordinated Processes:
  ⚠️  PID 12347: python inference.py (2048 MB)
      Command: python inference.py --model llama-7b
      User: user1
      Started: 15m ago

WARNINGS:
  - Uncoordinated process detected using 2048 MB VRAM
  - This may cause OOM errors or interference with coordinated processes
  - Consider migrating to Model Boss coordination

Recommendations:
  1. Update inference.py to use Model Boss
  2. Or kill PID 12347: sudo kill 12347

What it does:

  • Compares Model Boss leases with actual GPU processes
  • Identifies uncoordinated GPU usage
  • Provides recommendations
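The comparison can be approximated with nvidia-smi's CSV query mode. A sketch, assuming you already know which PIDs hold Model Boss leases (how the CLI obtains them is not documented here); query_compute_apps, parse_compute_apps, and find_uncoordinated are hypothetical helpers. With nounits, nvidia-smi reports used_memory in MiB.

```python
import subprocess


def query_compute_apps() -> str:
    """Per-process GPU memory from nvidia-smi as CSV (memory in MiB)."""
    return subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_compute_apps(csv_text: str) -> list[tuple[int, str, int]]:
    """Parse rows like '12347, python, 2048' into (pid, name, used_mb)."""
    rows = []
    for line in csv_text.strip().splitlines():
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)  # the name itself may contain commas
        rows.append((int(pid), name.strip(), int(used)))
    return rows


def find_uncoordinated(rows: list[tuple[int, str, int]],
                       leased_pids: set[int]) -> tuple[list[tuple[int, str, int]], int]:
    """GPU processes holding no lease, and the VRAM (MB) they use in total."""
    rogue = [row for row in rows if row[0] not in leased_pids]
    return rogue, sum(used for _, _, used in rogue)
```

Applied to the example above, PIDs 12345 and 12346 hold leases, so PID 12347 is reported with 2048 MB, which matches the 2048 MB discrepancy between the boss and nvidia-smi totals.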

When to use:

  • Debugging OOM errors
  • Finding rogue processes
  • Auditing GPU usage

model-boss gpu init

Manually initialize a GPU for tracking.

Usage:

model-boss gpu init <gpu_index> <vram_mb> [--name NAME]

Arguments:

  • gpu_index: GPU index (0, 1, 2, etc.)
  • vram_mb: Total VRAM in MB

Options:

  • --name: GPU name (default: "Unknown GPU")

Examples:

# Initialize GPU 0 with 24GB VRAM
model-boss gpu init 0 24576 --name "NVIDIA RTX 4090"

# Initialize GPU 1 with 16GB VRAM
model-boss gpu init 1 16384 --name "NVIDIA RTX 4080"

Output:

Initializing GPU 0...
  Name: NVIDIA RTX 4090
  VRAM: 24576 MB

GPU initialized successfully

When to use:

  • When automatic GPU detection fails
  • In containerized environments
  • For custom GPU configurations
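When nvidia-smi works on the host but automatic detection does not, the arguments for gpu init can be read from it. A sketch that builds, but does not run, one init command per GPU; query_gpus and init_commands are hypothetical helpers. With nounits, nvidia-smi reports memory.total in MiB, the same unit as the MB values in the examples above.

```python
import shlex
import subprocess


def query_gpus() -> str:
    """Index, name, and total memory (MiB) for each GPU, as CSV from nvidia-smi."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def init_commands(csv_text: str) -> list[str]:
    """Build one `model-boss gpu init` command line per GPU row."""
    commands = []
    for line in csv_text.strip().splitlines():
        index, rest = line.split(",", 1)
        name, total_mb = rest.rsplit(",", 1)
        commands.append(shlex.join([
            "model-boss", "gpu", "init", index.strip(), total_mb.strip(),
            "--name", name.strip(),  # shlex.join quotes names containing spaces
        ]))
    return commands
```

Review the generated lines before running them, e.g. print("\n".join(init_commands(query_gpus()))).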

RAM Commands

model-boss ram status

Show RAM usage and active leases.

Usage:

model-boss ram status

Output:

RAM Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

System Memory:
  Total: 65536 MB
  Used: 32768 MB (50%)
  Free: 32768 MB
  Available: 40960 MB

Active RAM Leases: 2
  Lease abc123: 8192 MB (data-cache)
  Lease def456: 4096 MB (model-cache)

Total Leased: 12288 MB

model-boss ram analyze

Detailed memory analysis.

Usage:

model-boss ram analyze [OPTIONS]

Options:

  • --processes, -p: Show top memory-consuming processes
  • --groups, -g: Show process groups by name
  • --leaks, -l: Check for potential memory leaks

Examples:

# Basic analysis
model-boss ram analyze

# Show top processes
model-boss ram analyze --processes

# Show process groups
model-boss ram analyze --groups

# Check for memory leaks
model-boss ram analyze --leaks

# Combine options
model-boss ram analyze --processes --leaks

Output (basic):

Memory Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

System Overview:
  Total: 65536 MB
  Used: 32768 MB (50%)
  Free: 16384 MB
  Available: 40960 MB
  Buffers: 4096 MB
  Cached: 12288 MB

Memory Breakdown:
  Active: 24576 MB
  Inactive: 8192 MB
  Dirty: 256 MB
  Writeback: 0 MB

Swap:
  Total: 8192 MB
  Used: 0 MB
  Free: 8192 MB

Pressure: LOW ✓
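On Linux, the fields in this overview correspond to entries in /proc/meminfo, which the kernel reports in kB. The sketch below reads the same figures and converts them to MB (1 MB = 1024 kB); it is an assumption about where such numbers come from, not a description of how ram analyze gathers them. The LOW/MODERATE/HIGH pressure label is not reproduced because its thresholds are not documented.

```python
from pathlib import Path

# /proc/meminfo key -> field name used in the result
FIELDS = {
    "MemTotal": "total", "MemFree": "free", "MemAvailable": "available",
    "Buffers": "buffers", "Cached": "cached", "Active": "active",
    "Inactive": "inactive", "Dirty": "dirty", "Writeback": "writeback",
    "SwapTotal": "swap_total", "SwapFree": "swap_free",
}


def parse_meminfo(text: str) -> dict[str, int]:
    """Convert the selected /proc/meminfo entries from kB to MB."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in FIELDS:  # skips variants such as "Active(anon)"
            values[FIELDS[key]] = int(rest.split()[0]) // 1024
    values["swap_used"] = values["swap_total"] - values["swap_free"]
    return values


def read_meminfo(path: str = "/proc/meminfo") -> dict[str, int]:
    """Read and parse the live /proc/meminfo (Linux only)."""
    return parse_meminfo(Path(path).read_text())
```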

Output (with --processes):

Top Memory Consumers:
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ PID    ┃ Command                    ┃ RSS MB  ┃ % MEM   ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ 12345  │ python train.py            │ 8192    │ 12.5%   │
│ 12346  │ redis-server               │ 4096    │ 6.25%   │
│ 12347  │ postgres                   │ 2048    │ 3.12%   │
└────────┴────────────────────────────┴─────────┴─────────┘

Output (with --groups):

Process Groups:
  python: 12288 MB (3 processes)
  node: 4096 MB (5 processes)
  postgres: 2048 MB (1 process)

Output (with --leaks):

Memory Leak Detection:
  ⚠️  PID 12345 (python train.py)
      Started: 2h ago
      Current RSS: 8192 MB
      Initial RSS: 1024 MB
      Growth: 7168 MB (700% increase)
      Growth rate: 59.7 MB/min

  Potential leak detected
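The growth figures are simple arithmetic on two RSS samples: 7168 MB of growth on a 1024 MB baseline is a 700% increase, and 7168 MB over 2 hours (120 minutes) is about 59.7 MB/min. A sketch reproducing them; rss_growth is a hypothetical helper, and because the rule --leaks uses to flag a "potential leak" is not documented, no threshold is applied here.

```python
def rss_growth(initial_mb: float, current_mb: float, elapsed_min: float) -> dict[str, float]:
    """Absolute growth, percentage increase over the baseline, and growth rate."""
    growth = current_mb - initial_mb
    return {
        "growth_mb": growth,
        "increase_pct": 100.0 * growth / initial_mb,
        "rate_mb_per_min": growth / elapsed_min,
    }
```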

model-boss ram clear

Clear the Linux kernel's RAM caches (page cache, dentries, and inodes).

Usage:

model-boss ram clear [MODE] [--dry-run]

Arguments:

  • MODE: Cleanup mode (auto, conservative, balanced, aggressive)

Modes:

Mode          Description                                  Actions                              Requires sudo
auto          Analyze memory pressure and pick a mode      Based on pressure                    Maybe
conservative  Drop page cache only (safest)                echo 1 > /proc/sys/vm/drop_caches    Yes
balanced      Drop page cache, dentries, and inodes        echo 3 > /proc/sys/vm/drop_caches    Yes
aggressive    Sync, drop all caches, then compact memory   sync + drop + compact                Yes

Options:

  • --dry-run: Show what would be done without executing
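The actions in the modes table map onto writes to two Linux sysctl files. A sketch of a planner with a dry-run path that follows the table; it is not the CLI's own code, the auto mode is omitted because its pressure thresholds are not documented, and actually performing the steps needs root. Writing 1 to drop_caches frees the page cache, 3 also frees dentries and inodes, and writing 1 to compact_memory triggers memory compaction (on kernels built with compaction support).

```python
import os
from pathlib import Path
from typing import Optional

DROP_CACHES = "/proc/sys/vm/drop_caches"
COMPACT_MEMORY = "/proc/sys/vm/compact_memory"

# Steps per mode: ("sync", None) or (sysctl file, value to write).
PLANS: dict[str, list[tuple[str, Optional[str]]]] = {
    "conservative": [(DROP_CACHES, "1")],  # page cache only
    "balanced": [(DROP_CACHES, "3")],      # page cache + dentries/inodes
    "aggressive": [("sync", None), (DROP_CACHES, "3"), (COMPACT_MEMORY, "1")],
}


def describe(mode: str) -> list[str]:
    """The steps a dry run would print for the given mode."""
    if mode not in PLANS:
        raise ValueError(f"unsupported mode: {mode!r}")
    return ["sync" if path == "sync" else f"echo {value} > {path}" for path, value in PLANS[mode]]


def clear_caches(mode: str, dry_run: bool = True) -> list[str]:
    """Return the steps; unless dry_run, also perform them (requires root)."""
    steps = describe(mode)
    if not dry_run:
        for path, value in PLANS[mode]:
            if path == "sync":
                os.sync()  # write dirty pages back so they become droppable
            else:
                Path(path).write_text(value)
    return steps
```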

Examples:

# Auto mode (analyzes pressure level)
sudo model-boss ram clear auto

# Conservative cleanup
sudo model-boss ram clear conservative

# Aggressive cleanup
sudo model-boss ram clear aggressive

# Dry run
model-boss ram clear balanced --dry-run

Output (auto mode):

Analyzing memory pressure...
  Pressure: MODERATE
  Available: 8192 MB (12%)
  Recommendation: BALANCED cleanup

Performing BALANCED cleanup...
  1. Dropping page cache
  2. Dropping dentries and inodes

Freed 4096 MB

After cleanup:
  Total: 65536 MB
  Used: 28672 MB (44%)
  Free: 20480 MB
  Available: 45056 MB

Output (dry run):

DRY RUN: Would perform BALANCED cleanup
  - Drop page cache
  - Drop dentries and inodes
  - Estimated free: ~4096 MB

No changes made (dry run)

When to use:

  • Before loading large models
  • After processing large datasets
  • When available memory is low
  • As part of regular maintenance

Safety:

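  • conservative: Safe for production systems
  • balanced: Also safe; drops dentries and inodes in addition to the page cache
  • aggressive: Use with caution; sync and memory compaction take extra time and can slow other workloads while they run
  • All modes: dropped caches have to be refilled from disk, so file I/O may be slower for a while afterwards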

model-boss ram cleanup

Clean up stale RAM leases.

Usage:

model-boss ram cleanup

Output:

Cleaning up stale RAM leases...

Found 1 stale lease:
  - abc123 (last heartbeat 6m ago, 8192 MB)

Freed 8192 MB from stale leases

What it does:

  • Identifies leases without recent heartbeats
  • Removes them from Redis
  • Frees up RAM allocation tracking

When to use:

  • After process crashes
  • When leases are stuck
  • As part of regular maintenance

Environment Variables

Configure the CLI using environment variables:

# Redis connection
export MODEL_BOSS_REDIS_URL=redis://localhost:6379/0

# Model paths
export MODEL_BOSS_MODELS_DIR=/path/to/models
export MODEL_BOSS_MANIFEST_PATH=/path/to/manifest.yaml

# Timing settings
export MODEL_BOSS_PREEMPTION_GRACE_PERIOD_S=30
export MODEL_BOSS_HEARTBEAT_INTERVAL_S=10
export MODEL_BOSS_STALE_LEASE_TIMEOUT_S=60
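A sketch of how a script might read the same variables, falling back to the values shown above. The ModelBossSettings class and load_settings function are hypothetical, not the package's configuration API. MODEL_BOSS_AUTO_START_SERVICES (see Service Auto-Start) is included with its documented default of enabled; the set of spellings treated as "true" is an assumption. The sketch also rejects a stale-lease timeout that does not exceed the heartbeat interval, since healthy leases would then look stale between heartbeats.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelBossSettings:
    redis_url: str
    models_dir: Optional[str]
    manifest_path: Optional[str]
    grace_period_s: float
    heartbeat_interval_s: float
    stale_lease_timeout_s: float
    auto_start_services: bool


def load_settings(env: Mapping[str, str] = os.environ) -> ModelBossSettings:
    """Read MODEL_BOSS_* variables, using the documented example values as fallbacks."""
    def seconds(name: str, default: float) -> float:
        return float(env.get(name, default))

    settings = ModelBossSettings(
        redis_url=env.get("MODEL_BOSS_REDIS_URL", "redis://localhost:6379/0"),
        models_dir=env.get("MODEL_BOSS_MODELS_DIR"),
        manifest_path=env.get("MODEL_BOSS_MANIFEST_PATH"),
        grace_period_s=seconds("MODEL_BOSS_PREEMPTION_GRACE_PERIOD_S", 30),
        heartbeat_interval_s=seconds("MODEL_BOSS_HEARTBEAT_INTERVAL_S", 10),
        stale_lease_timeout_s=seconds("MODEL_BOSS_STALE_LEASE_TIMEOUT_S", 60),
        auto_start_services=env.get("MODEL_BOSS_AUTO_START_SERVICES", "true")
        .strip().lower() in {"1", "true", "yes", "on"},
    )
    if settings.stale_lease_timeout_s <= settings.heartbeat_interval_s:
        raise ValueError("stale lease timeout must exceed the heartbeat interval")
    return settings
```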

Model Commands

model-boss model download

Download a model from HuggingFace and register it in the manifest.

Usage:

model-boss model download <repo_id> [-f FILENAME] [--model-id ID] [--download-dir DIR] [--dry-run]

Arguments:

  • repo_id: HuggingFace repository ID (e.g., Qwen/Qwen3-8B-GGUF)

Options:

  • -f, --filename: Specific file to download from the repo
  • --model-id: Override model ID for manifest (auto-derived if omitted)
  • --download-dir: Override download directory
  • --dry-run: Show what would happen without downloading

Examples:

model-boss model download Qwen/Qwen3-8B-GGUF -f Qwen3-8B-Q8_0.gguf
model-boss model download Qwen/Qwen3-8B-GGUF -f Qwen3-8B-Q8_0.gguf --model-id qwen3-8b
model-boss model download Qwen/Qwen3-8B-GGUF -f Qwen3-8B-Q8_0.gguf --dry-run

model-boss model add

Add an existing local model to the manifest.

Usage:

model-boss model add <model_id> -p PATH [--name NAME] [--category CATEGORY] [--type TYPE] [--vram-mb MB]

Arguments:

  • model_id: The model identifier to register

Options:

  • -p, --path: Path to the model file, either absolute or relative to the cache root (required)
  • --name: Override human-readable name
  • --category: Override category (llm, embedding, diffusion)
  • --type: Override model type (instruction, reasoning, fast, base)
  • --vram-mb: Override VRAM estimate in MB

Examples:

model-boss model add qwen3-8b -p Qwen/Qwen3-8B-GGUF/Qwen3-8B-Q8_0.gguf
model-boss model add my-model -p path/to/model.gguf --name "My Model" --category llm
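The --path option accepts either an absolute path or one relative to the cache root. A sketch of that rule; this reference does not say where the cache root lives, so it is a parameter here, and resolve_model_path is a hypothetical helper.

```python
from pathlib import Path


def resolve_model_path(path: str, cache_root: str, must_exist: bool = False) -> Path:
    """Use absolute paths as given; join relative ones onto cache_root."""
    candidate = Path(path)
    resolved = candidate if candidate.is_absolute() else Path(cache_root) / candidate
    if must_exist and not resolved.is_file():
        raise FileNotFoundError(f"model file not found: {resolved}")
    return resolved
```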

model-boss model list

List all models in the manifest.

Usage:

model-boss model list [--category CATEGORY] [--json]

Options:

  • --category: Filter by category (llm, embedding, diffusion)
  • --json: Output as JSON

Examples:

model-boss model list
model-boss model list --category llm
model-boss model list --json

model-boss model remove

Remove a model from the manifest (does not delete files).

Usage:

model-boss model remove <model_id> [--yes]

Arguments:

  • model_id: The model identifier to remove

Options:

  • --yes, -y: Skip confirmation

Examples:

model-boss model remove qwen3-8b
model-boss model remove qwen3-8b --yes

Info Commands

model-boss info python

Show Python interpreter information.

Usage:

model-boss info python [--paths] [--json]

Options:

  • --paths, -p: Show sys.path entries
  • --json: Output as JSON

Examples:

model-boss info python
model-boss info python --paths
model-boss info python --json

model-boss info package

Show package installation information.

Usage:

model-boss info package [NAME] [--json]

Arguments:

  • NAME: Package name to inspect (default: model-boss)

Options:

  • --json: Output as JSON

Examples:

model-boss info package
model-boss info package torch
model-boss info package model-boss --json

model-boss info verify

Verify model-boss installation and connectivity.

Usage:

model-boss info verify [--no-redis] [--model MODEL] [--json]

Options:

  • --no-redis: Skip Redis connection check
  • -m, --model: Test model resolution for a specific model
  • --json: Output as JSON

Examples:

model-boss info verify
model-boss info verify --no-redis
model-boss info verify --model qwen2.5-1.5b-instruct

model-boss info env

Show environment information including available tools, relevant environment variables, and configuration paths.

Usage:

model-boss info env [--json]

Options:

  • --json: Output as JSON

Examples:

model-boss info env
model-boss info env --json

Common Workflows

Monitor GPU Usage

# Check current status
model-boss gpu status

# Check queue
model-boss gpu list

# Diagnose issues
model-boss gpu diagnose

Clean Up After Crashes

# Clean up stale GPU leases
model-boss gpu cleanup

# Clean up stale RAM leases
model-boss ram cleanup

Prepare for Maintenance

# Drain all GPU leases
model-boss gpu drain --force --yes

# Clear RAM caches
sudo model-boss ram clear balanced

Debug OOM Errors

# Check GPU status
model-boss gpu status

# Find uncoordinated processes
model-boss gpu diagnose --verbose

# Check memory pressure
model-boss ram analyze --processes

Exit Codes

Code  Meaning
0     Success
1     General error
2     Configuration error
3     Redis connection error
4     Resource not found
5     Permission denied
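In scripts, the exit code tells you why a command failed. A sketch that runs a model-boss command and translates a non-zero code using the table above; EXIT_CODES, describe_exit, and run_model_boss are hypothetical names.

```python
import subprocess

# Meanings from the exit-code table
EXIT_CODES = {
    0: "Success",
    1: "General error",
    2: "Configuration error",
    3: "Redis connection error",
    4: "Resource not found",
    5: "Permission denied",
}


def describe_exit(code: int) -> str:
    return EXIT_CODES.get(code, f"Unknown exit code {code}")


def run_model_boss(*args: str) -> subprocess.CompletedProcess:
    """Run model-boss with args; raise with a readable reason on failure."""
    result = subprocess.run(["model-boss", *args], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"model-boss {' '.join(args)} failed: {describe_exit(result.returncode)}\n{result.stderr}"
        )
    return result
```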

Tips

  1. Use JSON output for scripting:

    model-boss gpu status --json | jq '.leases | length'
    
  2. Monitor queue in real-time:

    watch -n 1 model-boss gpu status
    
  3. Find specific lease:

    model-boss gpu status --json | jq '.leases[] | select(.model_id == "mistral-7b")'
    
  4. Check if cleanup needed:

    model-boss ram analyze | grep "Pressure: HIGH"
    
  5. Automate cleanup:

    # Cron jobs: clean up stale leases hourly (use the path printed by: which model-boss)
    0 * * * * /usr/local/bin/model-boss gpu cleanup
    0 * * * * /usr/local/bin/model-boss ram cleanup
    

Service Auto-Start

Model Boss starts the services it needs on demand, so you don't need to start Redis manually before using the CLI.

How It Works

When you run any Model Boss command that requires Redis:

  1. Model Boss checks if Redis is running
  2. If not running, it automatically starts Redis
  3. Redis runs with minimal config (no persistence) suitable for lease coordination
  4. Redis is stopped when the Model Boss process exits
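The check-then-start sequence can be sketched with the standard library alone. This illustrates the steps above and is not Model Boss's code: it sends Redis's inline PING command over a socket, and starts redis-server with RDB snapshots disabled (--save "") and the append-only file off (--appendonly no), which is one way to get a no-persistence setup. The host and port are taken from a URL in the MODEL_BOSS_REDIS_URL format; redis_address, redis_is_up, and ensure_redis are hypothetical helpers.

```python
import atexit
import socket
import subprocess
import time
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def redis_address(url: str) -> tuple[str, int]:
    """Host and port from a redis:// URL (defaults: localhost, 6379)."""
    parsed = urlparse(url)
    return parsed.hostname or "localhost", parsed.port or 6379


def redis_is_up(host: str, port: int, timeout_s: float = 0.5) -> bool:
    """Send an inline PING and check for a +PONG reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.sendall(b"PING\r\n")
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        return False


def ensure_redis(url: str = "redis://localhost:6379/0", wait_s: float = 5.0) -> None:
    host, port = redis_address(url)
    if redis_is_up(host, port):
        return  # already running; leave it alone
    if host not in LOCAL_HOSTS:
        raise RuntimeError(f"Redis at {host}:{port} is unreachable and is not local")
    proc = subprocess.Popen(
        ["redis-server", "--port", str(port), "--save", "", "--appendonly", "no"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    atexit.register(proc.terminate)  # stop the server we started when this process exits
    deadline = time.monotonic() + wait_s
    while time.monotonic() < deadline:
        if redis_is_up(host, port):
            return
        time.sleep(0.1)
    raise RuntimeError(f"redis-server did not come up on {host}:{port}")
```

Because proc.terminate is registered with atexit, a server started this way lives only as long as the calling process, which matches step 4.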

Disabling Auto-Start

If you manage Redis yourself:

# Via environment variable
export MODEL_BOSS_AUTO_START_SERVICES=false

# Redis must already be running
model-boss gpu status

Checking Service Status

# Check whether Redis is running
redis-cli ping  # Should return PONG if Redis is running

Troubleshooting

"Redis connection failed"

With auto-start enabled (default), Model Boss will try to start Redis automatically. If this fails:

  1. Check if redis-server is installed:

    which redis-server
    
  2. If not installed, install Redis:

    # Fedora/RHEL
    sudo dnf install redis
    
    # Ubuntu/Debian
    sudo apt install redis-server
    
    # macOS
    brew install redis
    
  3. Or start Redis manually:

    redis-server
    

If auto-start is disabled, ensure Redis is running:

redis-cli ping

Make sure MODEL_BOSS_REDIS_URL points at your Redis instance:

export MODEL_BOSS_REDIS_URL=redis://localhost:6379/0

"Permission denied" (RAM clear)

RAM cache clearing requires sudo:

sudo model-boss ram clear balanced

"GPU not found"

If automatic detection fails, initialize the GPU manually:

model-boss gpu init 0 24576 --name "My GPU"

"Command not found"

Ensure model-boss is installed and in PATH:

pip install lilith-model-boss
which model-boss

See Also