NVIDIA GPU Overclocking Control Panel (@infrastructure/nvidia-oc)
Network-accessible GPU overclocking control panel with CLI, REST API, and real-time web dashboard.
Features
- NVML-based GPU monitoring - Read GPU metrics on any display server
- Dual-backend overclocking - Clock/fan control via nvidia-settings on X11 or nvidia-smi on Wayland (see Display Server Requirements)
- CLI tool - nvidia-oc command for terminal operations
- FastAPI backend - REST + WebSocket API for remote access
- React webapp - Live telemetry dashboard using @ui components
- Profile management - Pre-configured profiles (quiet, balanced, performance)
- Multi-GPU support - Independent control of multiple GPUs
- Safety mechanisms - Automatic thermal protection and validation
Hardware Support
- NVIDIA GPUs - RTX 30-series, RTX 40-series, and newer
- Requires - NVIDIA proprietary drivers (not Nouveau)
- Coolbits - Must enable Coolbits for overclocking support
Display Server Requirements
Dual-Backend Architecture
nvidia-oc automatically selects the appropriate overclocking backend based on your display server:
| Backend | Display Server | Method | Features |
|---|---|---|---|
| nvidia-settings | X11 | Offset-based (+150 MHz) | Clock offsets, fan curves, full control |
| nvidia-smi | Wayland/Any | Clock locking (absolute freq) | Works everywhere, requires sudo |
All features work on both X11 and Wayland:
| Feature | Wayland (nvidia-smi) | X11 (nvidia-settings) |
|---|---|---|
| GPU monitoring | ✅ Works | ✅ Works |
| Clock control | ✅ Works (via nvidia-smi) | ✅ Works (via nvidia-settings) |
| Fan speed control | ✅ Works | ✅ Works |
| Profile application | ✅ Works | ✅ Works |
Backend Differences
nvidia-settings (X11):
- Offset-based: +150 MHz added to base clocks
- More flexible with GPU boost behavior
- Requires Coolbits in Xorg configuration
nvidia-smi (Wayland):
- Absolute locking: Locks clocks to 2265 MHz
- Works on any display server (Wayland, X11, headless)
- Requires sudo/root permissions
Both backends provide full overclocking functionality - the choice is automatic based on your session type.
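The selection described above could be sketched roughly as follows. This is an illustration, not nvidia-oc's actual implementation: it assumes the session type is read from the XDG_SESSION_TYPE environment variable, and the names select_backend and build_clock_command are hypothetical. The nvidia-settings attribute and the nvidia-smi flag are ones those tools expose on recent drivers; whether nvidia-oc invokes them in exactly this form is an assumption.

```python
from typing import Mapping


def select_backend(env: Mapping[str, str]) -> str:
    """Pick a backend name from session environment variables (assumed logic)."""
    session = env.get("XDG_SESSION_TYPE", "").lower()
    # Only an X11 session with a DISPLAY set can use nvidia-settings.
    if session == "x11" and env.get("DISPLAY"):
        return "nvidia-settings"
    # Wayland, headless, or unknown sessions fall back to nvidia-smi.
    return "nvidia-smi"


def build_clock_command(backend: str, gpu: int, mhz: int) -> list[str]:
    """Build the argv for a core clock change.

    For nvidia-settings, mhz is an offset added to the base clocks;
    for nvidia-smi, mhz is the absolute frequency to lock to.
    """
    if backend == "nvidia-settings":
        attr = f"[gpu:{gpu}]/GPUGraphicsClockOffsetAllPerformanceLevels={mhz}"
        return ["nvidia-settings", "-a", attr]
    if backend == "nvidia-smi":
        lock = f"--lock-gpu-clocks={mhz},{mhz}"
        return ["sudo", "nvidia-smi", "-i", str(gpu), lock]
    raise ValueError(f"unknown backend: {backend}")
```

Passing os.environ to select_backend gives the backend for the current session. Note that the same number means different things per backend: 150 is an offset for nvidia-settings, while 2265 is an absolute clock for nvidia-smi.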
Installation
Prerequisites
# 1. Enable Coolbits (one-time setup)
sudo nvidia-xconfig -a --cool-bits=28
# 2. Switch to X11 session (see "Display Server Requirements" above)
# 3. Restart display manager or reboot
sudo systemctl restart display-manager # or reboot
Install Package
pip install lilith-nvidia-oc
Verify Installation
nvidia-oc status
CLI Usage
Show GPU Status
# One-time status
nvidia-oc status
# Live monitoring
nvidia-oc status --watch
Overclocking
# Set clock offsets
nvidia-oc set-clock --gpu 0 --core +100 --memory +500
# Reset to defaults
nvidia-oc set-clock --gpu 0 --reset
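For scripting, the set-clock commands above can be assembled from Python. The sketch below only uses the flags shown in this section (--gpu, --core, --memory, --reset); the helper names set_clock_argv and run_set_clock are hypothetical and not part of the package.

```python
import subprocess
from typing import Optional


def set_clock_argv(gpu: int, core: Optional[int] = None,
                   memory: Optional[int] = None,
                   reset: bool = False) -> list[str]:
    """Build the nvidia-oc set-clock argument list."""
    argv = ["nvidia-oc", "set-clock", "--gpu", str(gpu)]
    if reset:
        return argv + ["--reset"]
    if core is not None:
        # Format with an explicit sign, e.g. 100 -> "+100".
        argv += ["--core", f"{core:+d}"]
    if memory is not None:
        argv += ["--memory", f"{memory:+d}"]
    return argv


def run_set_clock(**kwargs) -> int:
    """Run the command and return its exit code."""
    return subprocess.run(set_clock_argv(**kwargs), check=False).returncode
```

Building the argument list separately from running it keeps the command easy to log or inspect before anything touches the GPU.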
Fan Control
# Manual fan speed
nvidia-oc set-fan --gpu 0 --speed 70
# Enable automatic control
nvidia-oc set-fan --gpu 0 --auto
Profile Management
# List profiles
nvidia-oc profile list
# Apply profile
nvidia-oc profile apply balanced
# Save current settings as profile
nvidia-oc profile save my-profile
Web UI Usage
Development vs Production
The application supports two separate deployment modes with different port configurations:
| Mode | Backend Port | Frontend | Use Case |
|---|---|---|---|
| Development | 9421 | Vite dev server (3420) | Local development with hot reload |
| Production | 9420 | Static files served by backend | System service on boot |
Port separation benefits:
- Run development and production simultaneously without conflicts
- Clear separation between testing and production environments
- Production uses standard port (9420) for consistency
Development Mode
Use the convenient startup script:
./run
# Backend starts on http://localhost:9421
# Frontend starts on http://localhost:3420
Access the development dashboard at: http://localhost:3420
Production Mode
First-Time Setup
- Install systemd service:
sudo ./scripts/install-service.sh
This will:
- Copy service file to /etc/systemd/system/
- Enable the service to start on boot
- Create necessary directories in /var/lib/nvidia-oc/
- Deploy and start:
./upgrade
This will:
- Build the frontend for production
- Deploy static files to /var/lib/nvidia-oc/static/
- Sync backend dependencies
- Restart the systemd service
- Verify the deployment with health checks
Subsequent Updates
After making changes to code:
./upgrade
The upgrade script handles the complete deployment pipeline automatically.
Service Management
# Check service status
sudo systemctl status nvidia-oc
# View live logs
sudo journalctl -u nvidia-oc -f
# Restart service
sudo systemctl restart nvidia-oc
# Stop service
sudo systemctl stop nvidia-oc
Access Dashboard
Development:
http://localhost:3420 # Frontend dev server
http://localhost:9421/health # Backend health check
Production:
http://localhost:9420 # Production dashboard
http://192.168.x.x:9420 # From other machines on LAN
Features
- Real-time telemetry - Live GPU metrics updated every second
- Interactive controls - Sliders for clock and fan adjustments
- Temperature charts - Historical temperature and power draw graphs
- Profile switcher - Quick switching between performance modes
- Multi-GPU view - Side-by-side monitoring of all GPUs
Default Profiles
Quiet Profile
- Core offset: 0 MHz (stock)
- Memory offset: 0 MHz (stock)
- Fan curve: Low (40% at 60°C, 60% at 75°C)
Balanced Profile
- Core offset: +100 MHz
- Memory offset: +500 MHz
- Fan curve: Moderate (50% at 60°C, 70% at 70°C, 85% at 75°C)
Performance Profile
- Core offset: +150 MHz
- Memory offset: +700 MHz
- Fan curve: Aggressive (70% at 60°C, 85% at 70°C, 100% at 75°C)
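To make the fan curves concrete, the sketch below turns the three curves listed above into data and computes a fan speed for a given temperature. The curve points come from the profiles; everything else is an assumption: linear interpolation between points, clamping to the first and last point outside the listed range, and the names FAN_CURVES and fan_speed_for are hypothetical. The actual tool may use stepped curves instead.

```python
from bisect import bisect_right
from typing import Sequence

# (temperature in °C, fan speed in %) points taken from the profiles above.
FAN_CURVES = {
    "quiet": [(60, 40), (75, 60)],
    "balanced": [(60, 50), (70, 70), (75, 85)],
    "performance": [(60, 70), (70, 85), (75, 100)],
}


def fan_speed_for(curve: Sequence[tuple[int, int]], temp_c: float) -> float:
    """Return the fan speed (%) for temp_c using linear interpolation."""
    temps = [t for t, _ in curve]
    # Clamp below the first point and above the last point.
    if temp_c <= temps[0]:
        return float(curve[0][1])
    if temp_c >= temps[-1]:
        return float(curve[-1][1])
    # Find the segment [t0, t1) that contains temp_c.
    i = bisect_right(temps, temp_c)
    (t0, s0), (t1, s1) = curve[i - 1], curve[i]
    return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)
```

Under these assumptions, the balanced curve at 65°C sits halfway between its 60°C and 70°C points, so fan_speed_for(FAN_CURVES["balanced"], 65) returns 60.0.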
Safety Features
- Max temperature threshold: 85°C (fans forced to 100% as an emergency measure)
- Clock validation: Rejects unsafe offsets (above 200 MHz core or above 1000 MHz memory)
- Profile validation: Pydantic schemas prevent invalid configurations
- Coolbits check: Warns if overclocking not enabled
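The limits above can be expressed as a small validation sketch. The project itself validates with Pydantic schemas; this version uses a standard-library dataclass instead so it runs without dependencies. The names ClockOffsets and effective_fan_speed are hypothetical, and treating exactly 85°C as the trigger point is an assumption.

```python
from dataclasses import dataclass

MAX_CORE_OFFSET_MHZ = 200     # offsets above this are rejected
MAX_MEMORY_OFFSET_MHZ = 1000  # offsets above this are rejected
MAX_TEMP_C = 85               # emergency fan threshold


@dataclass(frozen=True)
class ClockOffsets:
    core_mhz: int = 0
    memory_mhz: int = 0

    def __post_init__(self) -> None:
        # Reject offsets beyond the documented safety limits.
        if self.core_mhz > MAX_CORE_OFFSET_MHZ:
            raise ValueError(f"core offset {self.core_mhz} MHz exceeds {MAX_CORE_OFFSET_MHZ} MHz")
        if self.memory_mhz > MAX_MEMORY_OFFSET_MHZ:
            raise ValueError(f"memory offset {self.memory_mhz} MHz exceeds {MAX_MEMORY_OFFSET_MHZ} MHz")


def effective_fan_speed(requested_percent: int, temp_c: float) -> int:
    """Force the fan to 100% once the maximum temperature is reached."""
    return 100 if temp_c >= MAX_TEMP_C else requested_percent
```

With these limits, the Performance profile's +150 MHz core and +700 MHz memory offsets pass validation, while a +201 MHz core offset raises ValueError.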
API Reference
REST Endpoints
- GET /api/gpus - List all GPUs
- GET /api/gpus/{gpu_id}/status - Get GPU metrics
- POST /api/gpus/{gpu_id}/clock - Set clock offsets
- POST /api/gpus/{gpu_id}/fan - Set fan speed
- GET /api/profiles - List profiles
- POST /api/profiles/{name}/apply - Apply profile
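A minimal client sketch for these endpoints, using only the Python standard library. The paths and the production port come from this document; the JSON field names core_offset and memory_offset are assumptions, since the request schema is not documented here, and the helper names clock_request and get_json are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9420"  # production backend port


def clock_request(gpu_id: int, core: int, memory: int,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /api/gpus/{gpu_id}/clock."""
    # Field names are assumed; check the API schema before relying on them.
    body = json.dumps({"core_offset": core, "memory_offset": memory}).encode()
    return urllib.request.Request(
        f"{base_url}/api/gpus/{gpu_id}/clock",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0):
    """GET a path such as /api/gpus and decode the JSON response."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=timeout) as resp:
        return json.load(resp)
```

A request built with clock_request(0, 100, 500) can be sent with urllib.request.urlopen, and get_json("/api/gpus") lists the GPUs when the service is running.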
WebSocket
- WS /ws/telemetry - Stream live telemetry at 1 Hz
Development
Setup
cd @infrastructure/nvidia-oc
# Install Python dependencies
uv sync
# Install frontend dependencies
cd frontend && pnpm install
Run Development Servers
Quick start (recommended):
./run
Manual start (two terminals):
# Terminal 1: Backend on port 9421
uv run python -m uvicorn nvidia_oc.api.main:app --host 0.0.0.0 --port 9421 --reload
# Terminal 2: Frontend on port 3420
cd frontend && pnpm dev
Access at: http://localhost:3420
Run Tests
# Python tests
uv run pytest backend/tests/
# TypeScript typecheck
cd frontend && pnpm typecheck
Project Structure
nvidia-oc/
├── run                        # Development startup script
├── upgrade                    # Production deployment script
├── scripts/
│   └── install-service.sh     # Systemd service installer
├── systemd/
│   └── nvidia-oc.service      # Systemd service definition
├── backend/                   # Python FastAPI backend
│   └── nvidia_oc/
│       ├── core/              # GPU control logic
│       ├── api/               # REST API endpoints
│       ├── cli/               # CLI commands
│       └── daemon/            # Service daemon
├── frontend/                  # React TypeScript frontend
│   └── src/
│       ├── components/        # React components
│       └── api/               # API client
└── configs/                   # OC profile YAML files
Architecture
See ARCHITECTURE.md for detailed technical design.
Troubleshooting
"Could not initialize NVML" Error
- Ensure NVIDIA proprietary drivers are installed
- Check NVIDIA kernel modules are loaded: lsmod | grep nvidia
- Try running with sudo: sudo nvidia-oc status
"Coolbits not enabled" Warning
sudo nvidia-xconfig -a --cool-bits=28
sudo systemctl restart display-manager
"Permission denied" on Clock/Fan Control
GPU control requires root privileges:
sudo nvidia-oc set-clock --gpu 0 --core +100
License
MIT