@lilith/ml-vram-boss
GPU/VRAM lease coordinator for preventing race conditions in multi-model ML systems.
Features
- Lease-based coordination: Acquire exclusive VRAM allocations via Redis
- Priority queuing: Support for URGENT, HIGH, NORMAL, LOW, and BATCH priorities
- Automatic heartbeat: Keep leases alive automatically
- Preemption support: Gracefully handle resource preemption
- Stale lease cleanup: Automatically clean up crashed processes
- Multi-GPU support: Coordinate across multiple GPUs
Installation
pnpm add @lilith/ml-vram-boss
Quick Start
import { GPUBoss, Priority } from '@lilith/ml-vram-boss';
const boss = new GPUBoss();
await boss.connect();
// Initialize GPUs (index, total VRAM in MB, name)
await boss.initializeGpu(0, 24000, 'NVIDIA RTX 4090');
// Acquire a lease
const lease = await boss.acquire({
  vramMb: 8000,
  modelId: 'llama-7b',
  priority: Priority.NORMAL,
  timeoutMs: 60000,
});
// Handle preemption (unloadModel is your application's own function)
lease.onPreempt(async (reason) => {
  console.log(`Preempted: ${reason}`);
  await unloadModel();
});
// Use the GPU (loadModel is your application's own function)
await loadModel();
// Release when done
await lease.release();
await boss.close();
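If model loading or inference throws, the lease in the example above is not released until stale-lease cleanup catches it. One way to guard against that is a small try/finally wrapper. This is a sketch, not part of the package: `withLease` and `ReleasableLease` are hypothetical names, and the code relies only on the `release()` method and `isReleased` property documented in the API section.

```typescript
// Hypothetical helper, not exported by @lilith/ml-vram-boss.
// It relies only on the documented release() method and isReleased property.
interface ReleasableLease {
  readonly isReleased: boolean;
  release(): Promise<boolean>;
}

async function withLease<L extends ReleasableLease, T>(
  acquireLease: () => Promise<L>,
  work: (lease: L) => Promise<T>,
): Promise<T> {
  const lease = await acquireLease();
  try {
    return await work(lease);
  } finally {
    // Skip the call if the work callback already released the lease itself.
    if (!lease.isReleased) {
      await lease.release();
    }
  }
}
```

With the Quick Start objects, a call would look like `await withLease(() => boss.acquire({ vramMb: 8000, modelId: 'llama-7b' }), async (lease) => { /* load and run the model */ })`.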
Configuration
const boss = new GPUBoss({
  redisUrl: 'redis://localhost:6379',
  heartbeatIntervalMs: 10000,
  staleLeaseTimeoutMs: 60000,
  preemptionGracePeriodMs: 30000,
  defaultTimeoutMs: 300000,
  keyPrefix: 'gpu',
  autoCleanup: true,
  cleanupIntervalSeconds: 30,
});
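The ratio between `heartbeatIntervalMs` and `staleLeaseTimeoutMs` determines how many heartbeats a process can miss before its lease may be treated as abandoned. The sketch below illustrates that arithmetic. `isLeaseStale` and `missedHeartbeatBudget` are hypothetical helpers, and the comparison rule is an assumption about how cleanup might decide, not the package's confirmed logic.

```typescript
// Assumed rule (not confirmed by the package): a lease is stale when its
// last heartbeat is older than staleLeaseTimeoutMs.
function isLeaseStale(
  lastHeartbeatMs: number,
  nowMs: number,
  staleLeaseTimeoutMs: number,
): boolean {
  return nowMs - lastHeartbeatMs > staleLeaseTimeoutMs;
}

// Number of whole heartbeat intervals that fit inside the stale timeout.
function missedHeartbeatBudget(
  heartbeatIntervalMs: number,
  staleLeaseTimeoutMs: number,
): number {
  return Math.floor(staleLeaseTimeoutMs / heartbeatIntervalMs);
}
```

With the values shown above, six heartbeat intervals fit inside the stale timeout. If you shorten `staleLeaseTimeoutMs`, keeping it at several multiples of `heartbeatIntervalMs` leaves room for a brief pause without losing the lease.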
API
GPUBoss
connect(): Promise<void>
Connect to Redis and start the background cleanup task.
initializeGpu(gpuIndex: number, vramTotalMb: number, gpuName?: string): Promise<void>
Initialize a GPU for tracking.
acquire(options: AcquireOptions): Promise<GPULease>
Acquire a GPU lease.
Options:
- vramMb: Required VRAM in megabytes
- priority: Priority level (default: NORMAL)
- modelId: Identifier for the model
- timeoutMs: Max wait time
- gpuPreference: Preferred GPU indices
- serviceName: Service identifier
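As a reading aid, the option list can be written as a TypeScript shape. This is inferred from the descriptions above, not copied from the package's type definitions. Which fields are optional, the `number[]` type for `gpuPreference`, the fallback of `timeoutMs` to the configured `defaultTimeoutMs`, and the `withDefaults` helper are all assumptions.

```typescript
// Inferred shape; the real AcquireOptions exported by the package may differ.
interface AcquireOptionsSketch {
  vramMb: number;            // VRAM to reserve, in megabytes
  priority?: number;         // documented default: NORMAL
  modelId?: string;          // assumed optional
  timeoutMs?: number;        // assumed to fall back to defaultTimeoutMs
  gpuPreference?: number[];  // assumed: list of GPU indices
  serviceName?: string;      // assumed optional
}

const NORMAL_PRIORITY = 10; // value of Priority.NORMAL in the Priority enum

// Hypothetical helper: fill in the defaults described in this README.
function withDefaults(opts: AcquireOptionsSketch, defaultTimeoutMs = 300000) {
  return { priority: NORMAL_PRIORITY, timeoutMs: defaultTimeoutMs, ...opts };
}
```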
getStatus(): Promise<BossStatus>
Get current status of all GPUs and queues.
forceRelease(leaseId: string): Promise<boolean>
Force release a lease (for admin operations).
drainAll(reason?: string): Promise<string[]>
Request all models to unload gracefully.
GPULease
onPreempt(callback: (reason: string) => Promise<void>): void
Register a callback for preemption signals.
release(): Promise<boolean>
Release the lease and free VRAM.
Properties
- leaseId: Unique lease identifier
- gpuIndex: GPU index
- vramMb: Reserved VRAM
- priority: Lease priority
- modelId: Model identifier
- isReleased: Whether lease has been released
Priority Levels
enum Priority {
  URGENT = 1,   // Immediate, bypasses queue
  HIGH = 5,     // Critical paths
  NORMAL = 10,  // Default
  LOW = 20,     // Background tasks
  BATCH = 50,   // Bulk operations
}
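The enum values suggest that a lower number means a higher priority. A minimal sketch of how a waiting queue might be ordered under that convention follows. The first-come-first-served tie-break and the `QueuedRequest` type are assumptions for illustration, not the package's documented behavior.

```typescript
interface QueuedRequest {
  requestId: string;
  priority: number;     // Priority value: lower number is served first
  enqueuedAtMs: number; // assumed tie-breaker: earlier request wins
}

function sortQueue(requests: QueuedRequest[]): QueuedRequest[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...requests].sort(
    (a, b) => a.priority - b.priority || a.enqueuedAtMs - b.enqueuedAtMs,
  );
}
```

Under this ordering a BATCH request enqueued first still waits behind a later HIGH request, while two NORMAL requests are served in arrival order.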
Redis Key Structure
gpu:{index}:leases - Hash of active leases
gpu:{index}:vram:total - Total VRAM for GPU
gpu:{index}:vram:used - Currently used VRAM
gpu:{index}:name - GPU name
gpu:count - Number of GPUs
gpu:leases:all - Mapping of lease IDs to GPU indices
gpu:queue - Sorted set of queued requests
gpu:queue:requests - Hash of request details
gpu:heartbeat:{leaseId} - Heartbeat timestamp
gpu:preempt:{leaseId} - Preemption channel
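When inspecting state with `redis-cli` or writing admin tooling, it can help to build these key names in one place. The helper below is a hypothetical sketch that mirrors the list above. It assumes the leading `gpu` segment is the configured `keyPrefix`, which the README does not state explicitly.

```typescript
// Hypothetical key builder; names mirror the key list in this README.
function redisKeys(keyPrefix = 'gpu') {
  return {
    leases: (gpuIndex: number) => `${keyPrefix}:${gpuIndex}:leases`,
    vramTotal: (gpuIndex: number) => `${keyPrefix}:${gpuIndex}:vram:total`,
    vramUsed: (gpuIndex: number) => `${keyPrefix}:${gpuIndex}:vram:used`,
    name: (gpuIndex: number) => `${keyPrefix}:${gpuIndex}:name`,
    count: `${keyPrefix}:count`,
    allLeases: `${keyPrefix}:leases:all`,
    queue: `${keyPrefix}:queue`,
    queueRequests: `${keyPrefix}:queue:requests`,
    heartbeat: (leaseId: string) => `${keyPrefix}:heartbeat:${leaseId}`,
    preempt: (leaseId: string) => `${keyPrefix}:preempt:${leaseId}`,
  };
}
```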
Related Packages
- @lilith/ml-model-boss - Full model loading system (uses this package)
- lilith-vram-boss - Python implementation
License
MIT