Lilith 76f525fb7f docs(docs): 📝 Update deprecation warnings, migration paths, and CLI documentation to reflect changes in GPUBOSS_INTEGRATION.md, LLAMA_SEGFAULT_FIX.md, MIGRATION.md, and CLI.md while cleaning TODO.md

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

2026-02-28 20:04:41 -08:00

Model Boss CLI Reference

The Model Boss CLI monitors and manages GPU and RAM leases, maintains the model manifest, and reports installation and environment diagnostics.

Installation

The CLI is included when you install the Python package:

pip install lilith-model-boss

Verify installation:

model-boss --version

Command Overview

model-boss
├── gpu          # GPU coordination commands
│   ├── status   # Show GPU status and active leases
│   ├── list     # List waiting queue requests
│   ├── kill     # Kill a specific lease
│   ├── drain    # Request all models to unload
│   ├── cleanup  # Clean up stale leases
│   ├── diagnose # Diagnose GPU coordination issues
│   └── init     # Manually initialize a GPU
├── ram          # RAM coordination commands
│   ├── status   # Show RAM usage and leases
│   ├── analyze  # Detailed memory analysis
│   ├── clear    # Clear RAM caches
│   └── cleanup  # Clean up stale RAM leases
├── model        # Model download and manifest management
│   ├── download # Download a model from HuggingFace
│   ├── add      # Add an existing local model to manifest
│   ├── list     # List all models in the manifest
│   └── remove   # Remove a model from the manifest
└── info         # Environment and installation diagnostics
    ├── python   # Show Python interpreter information
    ├── package  # Show package installation information
    ├── verify   # Verify installation and connectivity
    └── env      # Show environment information

GPU Commands

`model-boss gpu status`

Show GPU status and active leases.

Usage:

model-boss gpu status [--json]

Options:

--json: Output as JSON instead of formatted text

Output:

GPU Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

GPU 0: NVIDIA GeForce RTX 4090
  VRAM: 24576 MB total, 8192 MB used, 16384 MB free
  Active Leases: 2

Active Leases:
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ Lease ID   ┃ Model ID       ┃ VRAM MB ┃ Priority ┃ Age      ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ abc123...  │ mistral-7b     │ 4096    │ NORMAL   │ 5m 23s   │
│ def456...  │ sdxl-turbo     │ 4096    │ HIGH     │ 2m 10s   │
└────────────┴────────────────┴─────────┴──────────┴──────────┘

Queue: 1 waiting

JSON Output:

model-boss gpu status --json

{
  "gpus": [
    {
      "index": 0,
      "name": "NVIDIA GeForce RTX 4090",
      "vram_total_mb": 24576,
      "vram_used_mb": 8192,
      "vram_free_mb": 16384,
      "active_leases": 2
    }
  ],
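  "leases": [
    {
      "lease_id": "abc123...",
      "model_id": "mistral-7b",
      "vram_mb": 4096,
      "priority": "NORMAL",
      "age_seconds": 323
    },
    {
      "lease_id": "def456...",
      "model_id": "sdxl-turbo",
      "vram_mb": 4096,
      "priority": "HIGH",
      "age_seconds": 130
    }
  ],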
  "queue_length": 1
}
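
For scripts that consume this output, a minimal Python sketch follows. It relies only on the field names shown in the sample above; `summarize_status` and `format_age` are hypothetical helpers, not part of the package, and the age format mirrors the minutes-and-seconds form in the table (how the CLI renders longer ages is not shown here).

```python
import json


def format_age(seconds: int) -> str:
    """Render seconds in the table's style, e.g. 323 -> '5m 23s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}s"


def summarize_status(raw_json: str) -> dict:
    """Reduce `gpu status --json` output to a few numbers for alerting."""
    status = json.loads(raw_json)
    leases = status.get("leases", [])
    oldest = max((lease["age_seconds"] for lease in leases), default=0)
    return {
        # Free VRAM keyed by GPU index.
        "free_mb": {gpu["index"]: gpu["vram_free_mb"] for gpu in status.get("gpus", [])},
        # Total VRAM currently held by leases.
        "leased_mb": sum(lease["vram_mb"] for lease in leases),
        "oldest_lease": format_age(oldest),
        "queue_length": status.get("queue_length", 0),
    }
```

In a script, `raw_json` would typically be the captured standard output of `model-boss gpu status --json`, for example via `subprocess.run([...], capture_output=True, text=True)`.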

`model-boss gpu list`

List waiting queue requests.

Usage:

model-boss gpu list [--json]

Options:

--json: Output as JSON

Output:

Queue Requests
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ Request ID  ┃ Model ID       ┃ VRAM MB ┃ Priority ┃ Wait Time┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ req789...   │ llama-70b      │ 42000   │ HIGH     │ 1m 15s   │
│ req012...   │ stable-diff    │ 6800    │ LOW      │ 3m 42s   │
└─────────────┴────────────────┴─────────┴──────────┴──────────┘

Total: 2 requests waiting

`model-boss gpu kill`

Kill a specific lease by ID.

Usage:

model-boss gpu kill <lease_id> [--force]

Arguments:

lease_id: The lease ID to kill (from gpu status)

Options:

--force: Force immediate release (skip grace period)

Examples:

# Normal kill (grace period)
model-boss gpu kill abc123

# Force kill (immediate)
model-boss gpu kill abc123 --force

Output:

Killing lease abc123...
Grace period: 30 seconds
Lease killed successfully

Grace Period:

Without --force: Gives the process 30 seconds to clean up
With --force: Immediately releases the lease

`model-boss gpu drain`

Request all models to unload.

Usage:

model-boss gpu drain [--force] [--yes]

Options:

--force: Force immediate release (skip grace period)
--yes, -y: Skip confirmation prompt

Examples:

# Normal drain (with confirmation)
model-boss gpu drain

# Force drain (no grace period, no confirmation)
model-boss gpu drain --force --yes

Output:

WARNING: This will kill all active leases
Active leases: 3
Continue? [y/N]: y

Draining all leases...
  ✓ Killed abc123 (mistral-7b)
  ✓ Killed def456 (sdxl-turbo)
  ✓ Killed ghi789 (llama-13b)

Successfully drained 3 leases

Use Cases:

Preparing for system maintenance
Clearing stuck leases
Resetting GPU state

`model-boss gpu cleanup`

Clean up stale leases (those without a recent heartbeat).

Usage:

model-boss gpu cleanup

Output:

Cleaning up stale leases...

Found 2 stale leases:
  - abc123 (last heartbeat 5m ago)
  - def456 (last heartbeat 8m ago)

Removed 2 stale leases

What it does:

Identifies leases without recent heartbeats
Removes them from Redis
Frees up VRAM allocation
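
The staleness check can be pictured with a short Python sketch. It is an illustration under assumptions, not the actual implementation: heartbeats are taken to be Unix timestamps keyed by lease ID, and the 60-second threshold mirrors the example value of `MODEL_BOSS_STALE_LEASE_TIMEOUT_S` in the Environment Variables section. The Redis layout Model Boss really uses is not documented here, and `find_stale` is a hypothetical name.

```python
import time
from typing import Dict, List, Optional


def find_stale(last_heartbeats: Dict[str, float],
               timeout_s: float = 60.0,
               now: Optional[float] = None) -> List[str]:
    """Return lease IDs whose last heartbeat is older than timeout_s."""
    now = time.time() if now is None else now
    return sorted(
        lease_id
        for lease_id, seen in last_heartbeats.items()
        if now - seen > timeout_s
    )
```

With heartbeats last seen 5 and 8 minutes ago, as in the sample output, both leases exceed a 60-second timeout and would be reported; a lease that sent a heartbeat a few seconds ago would not.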

When to use:

After process crashes
When leases are stuck
As part of regular maintenance

`model-boss gpu diagnose`

Diagnose GPU coordination issues.

Usage:

model-boss gpu diagnose [--verbose]

Options:

--verbose, -v: Show detailed process information

Output:

GPU Diagnosis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

GPU 0: NVIDIA GeForce RTX 4090
  Total VRAM: 24576 MB
  Used (boss): 8192 MB
  Used (nvidia-smi): 10240 MB
  Discrepancy: 2048 MB ⚠️

Coordinated Processes:
  ✓ PID 12345: mistral-7b (4096 MB)
  ✓ PID 12346: sdxl-turbo (4096 MB)

Uncoordinated Processes:
  ⚠️  PID 12347: python inference.py (2048 MB)
      Command: python inference.py --model llama-7b
      User: user1
      Started: 15m ago

WARNINGS:
  - Uncoordinated process detected using 2048 MB VRAM
  - This may cause OOM errors or interference with coordinated processes
  - Consider migrating to Model Boss coordination

Recommendations:
  1. Update inference.py to use Model Boss
  2. Or kill PID 12347: sudo kill 12347

What it does:

Compares Model Boss leases with actual GPU processes
Identifies uncoordinated GPU usage
Provides recommendations
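
The comparison at the heart of the diagnosis can be sketched in a few lines of Python. This is a simplified model under assumptions: both sides are represented as mappings from PID to VRAM in MB, and `find_uncoordinated` is a hypothetical helper. How Model Boss actually associates leases with processes is not described in this reference.

```python
from typing import Dict, Tuple


def find_uncoordinated(leased_mb_by_pid: Dict[int, int],
                       driver_mb_by_pid: Dict[int, int]) -> Tuple[Dict[int, int], int]:
    """Compare lease bookkeeping with what the GPU driver reports.

    Returns the processes that use VRAM without holding a lease, and the
    difference between driver-reported and leased VRAM in MB.
    """
    uncoordinated = {
        pid: mb for pid, mb in driver_mb_by_pid.items()
        if pid not in leased_mb_by_pid
    }
    discrepancy_mb = sum(driver_mb_by_pid.values()) - sum(leased_mb_by_pid.values())
    return uncoordinated, discrepancy_mb
```

Fed the PIDs and sizes from the sample report (two leased processes at 4096 MB each, plus PID 12347 at 2048 MB seen only by the driver), the sketch flags PID 12347 and a 2048 MB discrepancy.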

When to use:

Debugging OOM errors
Finding rogue processes
Auditing GPU usage

`model-boss gpu init`

Manually initialize a GPU for tracking.

Usage:

model-boss gpu init <gpu_index> <vram_mb> [--name NAME]

Arguments:

gpu_index: GPU index (0, 1, 2, etc.)
vram_mb: Total VRAM in MB

Options:

--name: GPU name (default: "Unknown GPU")

Examples:

# Initialize GPU 0 with 24GB VRAM
model-boss gpu init 0 24576 --name "NVIDIA RTX 4090"

# Initialize GPU 1 with 16GB VRAM
model-boss gpu init 1 16384 --name "NVIDIA RTX 4080"

Output:

Initializing GPU 0...
  Name: NVIDIA RTX 4090
  VRAM: 24576 MB

GPU initialized successfully

When to use:

When automatic GPU detection fails
In containerized environments
For custom GPU configurations
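
Where `nvidia-smi` is still usable, the arguments for `gpu init` can be derived from its CSV output rather than typed by hand. The Python sketch below assumes the input comes from `nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits`, which reports memory in MiB, and that this is the unit the CLI calls MB. `init_commands` is a hypothetical helper that only builds command strings; it does not run them.

```python
import shlex
from typing import List


def init_commands(nvidia_smi_csv: str) -> List[str]:
    """Turn `index, name, memory.total` CSV rows into `gpu init` commands."""
    commands = []
    for line in nvidia_smi_csv.strip().splitlines():
        # Index is the first field and memory the last; the name sits between.
        index, rest = line.split(",", 1)
        name, total_mib = rest.rsplit(",", 1)
        commands.append(
            "model-boss gpu init "
            f"{int(index)} {int(total_mib)} --name {shlex.quote(name.strip())}"
        )
    return commands
```

Printing the returned strings gives a reviewable list of commands to paste into a shell or a provisioning script.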

RAM Commands

`model-boss ram status`

Show RAM usage and active leases.

Usage:

model-boss ram status

Output:

RAM Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

System Memory:
  Total: 65536 MB
  Used: 32768 MB (50%)
  Free: 16384 MB
  Available: 40960 MB

Active RAM Leases: 2
  Lease abc123: 8192 MB (data-cache)
  Lease def456: 4096 MB (model-cache)

Total Leased: 12288 MB

`model-boss ram analyze`

Detailed memory analysis.

Usage:

model-boss ram analyze [OPTIONS]

Options:

--processes, -p: Show top memory-consuming processes
--groups, -g: Show process groups by name
--leaks, -l: Check for potential memory leaks

Examples:

# Basic analysis
model-boss ram analyze

# Show top processes
model-boss ram analyze --processes

# Show process groups
model-boss ram analyze --groups

# Check for memory leaks
model-boss ram analyze --leaks

# Combine options
model-boss ram analyze --processes --leaks

Output (basic):

Memory Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

System Overview:
  Total: 65536 MB
  Used: 32768 MB (50%)
  Free: 16384 MB
  Available: 40960 MB
  Buffers: 4096 MB
  Cached: 12288 MB

Memory Breakdown:
  Active: 24576 MB
  Inactive: 8192 MB
  Dirty: 256 MB
  Writeback: 0 MB

Swap:
  Total: 8192 MB
  Used: 0 MB
  Free: 8192 MB

Pressure: LOW ✓

Output (with --processes):

Top Memory Consumers:
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ PID    ┃ Command                    ┃ RSS MB  ┃ % MEM   ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ 12345  │ python train.py            │ 8192    │ 12.5%   │
│ 12346  │ redis-server               │ 4096    │ 6.25%   │
│ 12347  │ postgres                   │ 2048    │ 3.12%   │
└────────┴────────────────────────────┴─────────┴─────────┘

Output (with --groups):

Process Groups:
  python: 12288 MB (3 processes)
  node: 4096 MB (5 processes)
  postgres: 2048 MB (1 process)

Output (with --leaks):

Memory Leak Detection:
  ⚠️  PID 12345 (python train.py)
      Started: 2h ago
      Current RSS: 8192 MB
      Initial RSS: 1024 MB
      Growth: 7168 MB (700% increase)
      Growth rate: 59.7 MB/min

  Potential leak detected
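
The derived figures in the leak report follow from the two RSS readings and the process age. The Python sketch below shows that arithmetic, assuming the growth rate is a plain average over the process lifetime. The heuristic that decides when growth counts as a potential leak is not documented here, and `growth_stats` is a hypothetical name.

```python
from typing import Tuple


def growth_stats(initial_mb: float, current_mb: float,
                 elapsed_min: float) -> Tuple[float, float, float]:
    """Return (growth in MB, growth in percent of initial, MB per minute)."""
    growth_mb = current_mb - initial_mb
    percent = 100.0 * growth_mb / initial_mb
    rate_mb_per_min = growth_mb / elapsed_min
    return growth_mb, percent, rate_mb_per_min
```

For the sample process, the inputs are an initial RSS of 1024 MB, a current RSS of 8192 MB, and 120 minutes of runtime.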

`model-boss ram clear`

Clear RAM caches.

Usage:

model-boss ram clear [MODE] [--dry-run]

Arguments:

MODE: Cleanup mode (auto, conservative, balanced, aggressive)

Modes:

| Mode | Description | Actions | Requires Sudo |
|------|-------------|---------|---------------|
| `auto` | Analyze and choose appropriate cleanup | Based on pressure | Maybe |
| `conservative` | Drop page cache only (safest) | `echo 1 > /proc/sys/vm/drop_caches` | Yes |
| `balanced` | Drop page cache + dentries/inodes | `echo 3 > /proc/sys/vm/drop_caches` | Yes |
| `aggressive` | Drop all + compact + sync | sync + drop + compact | Yes |
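
To make the modes concrete, the Python sketch below maps each fixed mode to the kernel-level steps listed in the table, in the spirit of `--dry-run`: it returns the steps without executing anything. Two details are assumptions rather than documented behavior: that "drop all" in `aggressive` means writing `3` to `drop_caches`, and that compaction is triggered through `/proc/sys/vm/compact_memory`. `plan_cleanup` is a hypothetical helper.

```python
from typing import List

# Steps per mode, following the Modes table. The aggressive entries are
# assumptions; the table only says "sync + drop + compact".
CLEANUP_STEPS = {
    "conservative": ["echo 1 > /proc/sys/vm/drop_caches"],
    "balanced": ["echo 3 > /proc/sys/vm/drop_caches"],
    "aggressive": [
        "sync",
        "echo 3 > /proc/sys/vm/drop_caches",
        "echo 1 > /proc/sys/vm/compact_memory",
    ],
}


def plan_cleanup(mode: str) -> List[str]:
    """Return the shell steps a mode would run, without executing them."""
    if mode not in CLEANUP_STEPS:
        # "auto" picks a mode from memory pressure; the thresholds it uses
        # are not documented, so it is deliberately not modeled here.
        raise ValueError(f"no fixed plan for mode {mode!r}")
    return list(CLEANUP_STEPS[mode])
```

Every step writes under `/proc/sys/vm`, which is why the fixed modes require root.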

Options:

--dry-run: Show what would be done without executing

Examples:

# Auto mode (analyzes pressure level)
sudo model-boss ram clear auto

# Conservative cleanup
sudo model-boss ram clear conservative

# Aggressive cleanup
sudo model-boss ram clear aggressive

# Dry run
model-boss ram clear balanced --dry-run

Output (auto mode):

Analyzing memory pressure...
  Pressure: MODERATE
  Available: 8192 MB (12%)
  Recommendation: BALANCED cleanup

Performing BALANCED cleanup...
  1. Dropping page cache
  2. Dropping dentries and inodes

Freed 4096 MB

After cleanup:
  Total: 65536 MB
  Used: 28672 MB (44%)
  Free: 20480 MB
  Available: 45056 MB

Output (dry run):

DRY RUN: Would perform BALANCED cleanup
  - Drop page cache
  - Drop dentries and inodes
  - Estimated free: ~4096 MB

No changes made (dry run)

When to use:

Before loading large models
After processing large datasets
When available memory is low
As part of regular maintenance

Safety:

conservative: Safe for production systems
balanced: Safe, slightly more aggressive
aggressive: Use with caution, may impact performance

`model-boss ram cleanup`

Clean up stale RAM leases.

Usage:

model-boss ram cleanup

Output:

Cleaning up stale RAM leases...

Found 1 stale lease:
  - abc123 (last heartbeat 6m ago, 8192 MB)

Freed 8192 MB from stale leases

What it does:

Identifies leases without recent heartbeats
Removes them from Redis
Frees up RAM allocation tracking

When to use:

After process crashes
When leases are stuck
As part of regular maintenance

Environment Variables

Configure the CLI using environment variables:

# Redis connection
export MODEL_BOSS_REDIS_URL=redis://localhost:6379/0

# Model paths
export MODEL_BOSS_MODELS_DIR=/path/to/models
export MODEL_BOSS_MANIFEST_PATH=/path/to/manifest.yaml

# Timing settings
export MODEL_BOSS_PREEMPTION_GRACE_PERIOD_S=30
export MODEL_BOSS_HEARTBEAT_INTERVAL_S=10
export MODEL_BOSS_STALE_LEASE_TIMEOUT_S=60
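
A wrapper script may want to read the same variables. The Python sketch below loads the connection and timing settings into a small dataclass. The fallback values are simply the example values shown above; whether they match the package's real defaults is an assumption. `BossSettings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BossSettings:
    redis_url: str
    grace_period_s: int
    heartbeat_interval_s: int
    stale_lease_timeout_s: int

    @property
    def heartbeats_before_stale(self) -> int:
        """How many heartbeats can be missed before a lease counts as stale."""
        return self.stale_lease_timeout_s // self.heartbeat_interval_s


def load_settings(env: Mapping[str, str] = os.environ) -> BossSettings:
    """Read MODEL_BOSS_* variables, falling back to the example values."""
    return BossSettings(
        redis_url=env.get("MODEL_BOSS_REDIS_URL", "redis://localhost:6379/0"),
        grace_period_s=int(env.get("MODEL_BOSS_PREEMPTION_GRACE_PERIOD_S", "30")),
        heartbeat_interval_s=int(env.get("MODEL_BOSS_HEARTBEAT_INTERVAL_S", "10")),
        stale_lease_timeout_s=int(env.get("MODEL_BOSS_STALE_LEASE_TIMEOUT_S", "60")),
    )
```

With the example values, a lease holder can miss six consecutive heartbeats before its lease becomes eligible for cleanup, which is a useful sanity check when tuning the two timing variables together.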

Model Commands

`model-boss model download`

Download a model from HuggingFace and register it in the manifest.

Usage:

model-boss model download <repo_id> [-f FILENAME] [--model-id ID] [--download-dir DIR] [--dry-run]

Arguments:

repo_id: HuggingFace repository ID (e.g., Qwen/Qwen3-8B-GGUF)

Options:

-f, --filename: Specific file to download from the repo
--model-id: Override model ID for manifest (auto-derived if omitted)
--download-dir: Override download directory
--dry-run: Show what would happen without downloading

Examples:

model-boss model download Qwen/Qwen3-8B-GGUF -f Qwen3-8B-Q8_0.gguf
model-boss model download Qwen/Qwen3-8B-GGUF -f Qwen3-8B-Q8_0.gguf --model-id qwen3-8b
model-boss model download Qwen/Qwen3-8B-GGUF -f Qwen3-8B-Q8_0.gguf --dry-run

`model-boss model add`

Add an existing local model to the manifest.

Usage:

model-boss model add <model_id> -p PATH [--name NAME] [--category CATEGORY] [--type TYPE] [--vram-mb MB]

Arguments:

model_id: The model identifier to register

Options:

-p, --path: Path to model file (relative to cache root or absolute) (required)
--name: Override human-readable name
--category: Override category (llm, embedding, diffusion)
--type: Override model type (instruction, reasoning, fast, base)
--vram-mb: Override VRAM estimate in MB

Examples:

model-boss model add qwen3-8b -p Qwen/Qwen3-8B-GGUF/Qwen3-8B-Q8_0.gguf
model-boss model add my-model -p path/to/model.gguf --name "My Model" --category llm

`model-boss model list`

List all models in the manifest.

Usage:

model-boss model list [--category CATEGORY] [--json]

Options:

--category: Filter by category (llm, embedding, diffusion)
--json: Output as JSON

Examples:

model-boss model list
model-boss model list --category llm
model-boss model list --json

`model-boss model remove`

Remove a model from the manifest (does not delete files).

Usage:

model-boss model remove <model_id> [--yes]

Arguments:

model_id: The model identifier to remove

Options:

--yes, -y: Skip confirmation

Examples:

model-boss model remove qwen3-8b
model-boss model remove qwen3-8b --yes

Info Commands

`model-boss info python`

Show Python interpreter information.

Usage:

model-boss info python [--paths] [--json]

Options:

--paths, -p: Show sys.path entries
--json: Output as JSON

Examples:

model-boss info python
model-boss info python --paths
model-boss info python --json

`model-boss info package`

Show package installation information.

Usage:

model-boss info package [NAME] [--json]

Arguments:

NAME: Package name to inspect (default: model-boss)

Options:

--json: Output as JSON

Examples:

model-boss info package
model-boss info package torch
model-boss info package model-boss --json

`model-boss info verify`

Verify model-boss installation and connectivity.

Usage:

model-boss info verify [--no-redis] [--model MODEL] [--json]

Options:

--no-redis: Skip Redis connection check
-m, --model: Test model resolution for a specific model
--json: Output as JSON

Examples:

model-boss info verify
model-boss info verify --no-redis
model-boss info verify --model qwen2.5-1.5b-instruct

`model-boss info env`

Show environment information including available tools, relevant environment variables, and configuration paths.

Usage:

model-boss info env [--json]

Options:

--json: Output as JSON

Examples:

model-boss info env
model-boss info env --json

Common Workflows

Monitor GPU Usage

# Check current status
model-boss gpu status

# Check queue
model-boss gpu list

# Diagnose issues
model-boss gpu diagnose

Clean Up After Crashes

# Clean up stale GPU leases
model-boss gpu cleanup

# Clean up stale RAM leases
model-boss ram cleanup

Prepare for Maintenance

# Drain all GPU leases
model-boss gpu drain --force --yes

# Clear RAM caches
sudo model-boss ram clear balanced

Debug OOM Errors

# Check GPU status
model-boss gpu status

# Find uncoordinated processes
model-boss gpu diagnose --verbose

# Check memory pressure
model-boss ram analyze --processes

Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error |
| 2 | Configuration error |
| 3 | Redis connection error |
| 4 | Resource not found |
| 5 | Permission denied |
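
Automation can branch on these codes instead of parsing error text. The Python sketch below wraps a CLI call and translates the exit code using the table above. `describe_exit` and `run_boss` are hypothetical helpers, and the wrapper assumes `model-boss` is on `PATH`.

```python
import subprocess
from typing import Tuple

# Meanings copied from the exit code table.
EXIT_MEANINGS = {
    0: "Success",
    1: "General error",
    2: "Configuration error",
    3: "Redis connection error",
    4: "Resource not found",
    5: "Permission denied",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into the meaning given in the table."""
    return EXIT_MEANINGS.get(code, f"Unknown exit code {code}")


def run_boss(*args: str) -> Tuple[int, str]:
    """Run `model-boss <args>` and return (exit code, meaning)."""
    proc = subprocess.run(["model-boss", *args], capture_output=True, text=True)
    return proc.returncode, describe_exit(proc.returncode)
```

A caller might, for instance, retry after a short delay on code 3 and stop immediately on code 5.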

Tips

Use JSON output for scripting:

model-boss gpu status --json | jq '.leases | length'

Monitor queue in real-time:
```
watch -n 1 model-boss gpu status
```

Find specific lease:

model-boss gpu status --json | jq '.leases[] | select(.model_id == "mistral-7b")'

Check if cleanup needed:

model-boss ram analyze | grep "Pressure: HIGH"

Automate cleanup:

# Cron job: cleanup every hour
0 * * * * /usr/local/bin/model-boss gpu cleanup
0 * * * * /usr/local/bin/model-boss ram cleanup

Service Auto-Start

Model Boss automatically starts required services when needed. This means you don't need to manually start Redis before using the CLI.

How It Works

When you run any Model Boss command that requires Redis:

1. Model Boss checks if Redis is running
2. If not running, it automatically starts Redis
3. Redis runs with minimal config (no persistence) suitable for lease coordination
4. Redis is stopped when the Model Boss process exits
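
The first step, checking whether Redis is running, can be approximated from Python without any Redis client. The sketch below sends an inline `PING` over a plain socket and looks for the `+PONG` reply, the same check `redis-cli ping` performs. It illustrates the idea only: how Model Boss itself probes Redis is not documented here, `redis_responds` is a hypothetical name, and a server that requires authentication answers with an error, so the function reports `False` for it.

```python
import socket


def redis_responds(host: str = "localhost", port: int = 6379,
                   timeout: float = 1.0) -> bool:
    """Return True if something at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts inline commands terminated by CRLF.
            conn.sendall(b"PING\r\n")
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False
```

A wrapper that manages Redis itself (with auto-start disabled) could call this before invoking any `model-boss` command and fail early with a clear message.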

Disabling Auto-Start

If you manage Redis yourself:

# Via environment variable
export MODEL_BOSS_AUTO_START_SERVICES=false

# Redis must already be running
model-boss gpu status

Checking Service Status

# Check what services are running
redis-cli ping  # Should return PONG if Redis is running

Troubleshooting

"Redis connection failed"

With auto-start enabled (default), Model Boss will try to start Redis automatically. If this fails:

Check if redis-server is installed:
```
which redis-server
```

If not installed, install Redis:

# Fedora/RHEL
sudo dnf install redis

# Ubuntu/Debian
sudo apt install redis-server

# macOS
brew install redis

Or start Redis manually:
```
redis-server
```

If auto-start is disabled, ensure Redis is running:

redis-cli ping

Set correct Redis URL:

export MODEL_BOSS_REDIS_URL=redis://localhost:6379/0

"Permission denied" (RAM clear)

RAM cache clearing requires sudo:

sudo model-boss ram clear balanced

"GPU not found"

Initialize GPU manually:

model-boss gpu init 0 24576 --name "My GPU"

"Command not found"

Ensure model-boss is installed and in PATH:

pip install lilith-model-boss
which model-boss
