ProfileManager Usage Guide
Overview
The ProfileManager class loads, saves, applies, captures, and lists GPU overclocking profiles. Profiles are stored as YAML files and validated against Pydantic schemas.
Quick Start
from pathlib import Path
from nvidia_oc.core.profile import ProfileManager, ProfileConfig
from nvidia_oc.core.gpu import GPUManager
# Initialize managers
gpu_manager = GPUManager()
profile_manager = ProfileManager()
# Get GPU device
gpu = gpu_manager.get_device(0)
# Load and apply profile
profile = profile_manager.load(Path("profiles/balanced.yaml"))
profile_manager.apply(gpu, profile)
Profile Structure
YAML Format
name: Balanced
description: Balanced performance and noise profile for everyday use
core_offset: 100 # MHz (±200 limit)
memory_offset: 500 # MHz (±1000 limit)
power_limit: 100 # Percentage (50-150)
fan_curve: # Optional, null = automatic
- [60, 50] # [temperature_C, fan_percent]
- [70, 70]
- [80, 90]
- [85, 100]
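The guide does not say how a fan speed is chosen for temperatures that fall between curve points. A minimal sketch, assuming linear interpolation between neighbouring points and clamping outside the curve's range; the helper name `fan_percent_at` is hypothetical and not part of nvidia_oc:

```python
from typing import Sequence, Tuple

Point = Tuple[int, int]  # (temperature_C, fan_percent)


def fan_percent_at(curve: Sequence[Point], temp_c: float) -> float:
    """Hypothetical helper: interpolate a fan speed from a sorted curve."""
    # Clamp below the first point and above the last point.
    if temp_c <= curve[0][0]:
        return float(curve[0][1])
    if temp_c >= curve[-1][0]:
        return float(curve[-1][1])
    # Find the segment containing temp_c and interpolate linearly.
    for (t0, f0), (t1, f1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)
    raise ValueError("curve must be sorted by temperature")


# The Balanced curve from the YAML above.
balanced = [(60, 50), (70, 70), (80, 90), (85, 100)]
print(fan_percent_at(balanced, 65))
```

If the real implementation uses step lookup instead of interpolation, the same curve would hold 50% until 70°C is reached, so it is worth confirming the behavior before tuning a curve around it.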
Validation Rules
- core_offset: ±200 MHz safety limit
- memory_offset: ±1000 MHz safety limit
- power_limit: 50-150% of TDP
- fan_curve:
  - Must have ≥2 points (or be null)
  - Sorted by temperature (ascending)
  - Temperatures: 30-100°C
  - Fan speeds: 0-100%
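The rules above can be illustrated without Pydantic. The following stdlib-only stand-in shows the checks themselves; it is not the library's `ProfileConfig`, and the class name `ProfileSketch` and the use of `ValueError` are assumptions made for this example:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProfileSketch:
    """Illustrative stand-in for the documented validation rules."""

    name: str
    core_offset: int = 0
    memory_offset: int = 0
    power_limit: int = 100
    fan_curve: Optional[List[Tuple[int, int]]] = None

    def __post_init__(self) -> None:
        if abs(self.core_offset) > 200:
            raise ValueError("core_offset must be within ±200 MHz")
        if abs(self.memory_offset) > 1000:
            raise ValueError("memory_offset must be within ±1000 MHz")
        if not 50 <= self.power_limit <= 150:
            raise ValueError("power_limit must be 50-150% of TDP")
        if self.fan_curve is None:
            return  # null curve means automatic fan control
        if len(self.fan_curve) < 2:
            raise ValueError("fan_curve needs at least 2 points")
        temps = [temp for temp, _ in self.fan_curve]
        if temps != sorted(temps):
            raise ValueError("fan_curve must be sorted by temperature")
        for temp, fan in self.fan_curve:
            if not 30 <= temp <= 100:
                raise ValueError("temperatures must be 30-100°C")
            if not 0 <= fan <= 100:
                raise ValueError("fan speeds must be 0-100%")
```

Constructing `ProfileSketch(name="x", core_offset=300)` raises immediately, which mirrors how a validated model rejects bad values at construction time rather than at apply time.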
ProfileConfig Model
Creating Profiles Programmatically
from nvidia_oc.core.profile import ProfileConfig
# Performance profile
performance = ProfileConfig(
    name="Performance",
    description="Maximum performance with aggressive cooling",
    core_offset=150,
    memory_offset=800,
    power_limit=120,
    fan_curve=[(50, 60), (70, 80), (80, 100)]
)
# Quiet profile
quiet = ProfileConfig(
    name="Quiet",
    description="Prioritize low noise over performance",
    core_offset=50,
    memory_offset=200,
    power_limit=90,
    fan_curve=[(60, 40), (75, 60), (85, 80)]
)
# Auto profile (stock clocks, automatic fan)
auto = ProfileConfig(
    name="Auto",
    description="Stock clocks with automatic fan control",
    core_offset=0,
    memory_offset=0,
    power_limit=100,
    fan_curve=None  # None = automatic fan control
)
Validation Errors
from nvidia_oc.core.profile import ProfileConfig, ProfileValidationError
try:
    # This will raise ProfileValidationError
    invalid = ProfileConfig(
        name="Invalid",
        core_offset=300,  # Exceeds ±200 MHz limit
        memory_offset=0,
        power_limit=100,
        fan_curve=None
    )
except ProfileValidationError as e:
    print(f"Validation failed: {e}")
ProfileManager Operations
1. Load Profile
manager = ProfileManager()
# Load from file
profile = manager.load(Path("profiles/balanced.yaml"))
# Handle errors
from nvidia_oc.core.profile import ProfileIOError, ProfileValidationError
try:
    profile = manager.load(Path("missing.yaml"))
except ProfileIOError as e:
    print(f"Failed to load profile: {e}")
except ProfileValidationError as e:
    print(f"Profile validation failed: {e}")
2. Save Profile
manager = ProfileManager()
profile = ProfileConfig(
    name="Custom",
    description="My custom configuration",
    core_offset=125,
    memory_offset=625,
    power_limit=110,
    fan_curve=[(60, 45), (75, 75), (85, 95)]
)
# Save to file (creates parent directories if needed)
# expanduser() resolves "~"; pathlib does not expand it on its own
manager.save(profile, Path("~/.config/nvidia-oc/custom.yaml").expanduser())
3. Apply Profile
manager = ProfileManager()
gpu_manager = GPUManager()
# Load profile
profile = manager.load(Path("profiles/performance.yaml"))
# Apply to GPU
gpu = gpu_manager.get_device(0)
manager.apply(gpu, profile)
# Profile application includes:
# 1. Setting clock offsets (synchronous)
# 2. Applying fan curve (async background task) or enabling auto
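Because the fan curve is applied as a background task, the call needs a running asyncio event loop. A rough sketch of what such a task could look like, assuming a simple periodic poll; `pick_speed`, `fan_curve_loop`, `read_temp`, and `set_fan` are hypothetical names for illustration and are not nvidia_oc APIs:

```python
import asyncio
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]  # (temperature_C, fan_percent)


def pick_speed(curve: Sequence[Point], temp_c: float) -> int:
    """Step lookup: speed of the highest point at or below temp_c."""
    speed = curve[0][1]  # below the first point, use its speed
    for threshold, fan in curve:
        if temp_c >= threshold:
            speed = fan
    return speed


async def fan_curve_loop(
    curve: Sequence[Point],
    read_temp: Callable[[], float],
    set_fan: Callable[[int], None],
    interval_s: float = 2.0,
) -> None:
    """Poll the temperature and update the fan until the task is cancelled."""
    while True:
        set_fan(pick_speed(curve, read_temp()))
        await asyncio.sleep(interval_s)


async def demo() -> List[int]:
    """Run the loop briefly against fake sensor readings."""
    applied: List[int] = []
    temps = iter([55, 72, 86])
    task = asyncio.create_task(
        fan_curve_loop(
            [(60, 50), (70, 70), (80, 90), (85, 100)],
            lambda: next(temps, 86),  # fake sensor, settles at 86°C
            applied.append,           # fake fan control, records speeds
            interval_s=0.01,
        )
    )
    await asyncio.sleep(0.2)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return applied


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a plain synchronous script, `asyncio.run(...)` is one way to provide the event loop; inside an async framework the loop already exists.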
4. Capture Current Settings
manager = ProfileManager()
gpu_manager = GPUManager()
gpu = gpu_manager.get_device(0)
# Capture current GPU state
current = manager.capture(gpu)
# Note: Offsets default to 0 due to NVML limitation
# NVML can read absolute clocks but not offsets
print(f"Current: {current.name}")
print(f"Description: {current.description}")
# Save captured state
manager.save(current, Path("captured-state.yaml"))
5. List Profiles
manager = ProfileManager()
# List all profiles in directory
profiles = manager.list_profiles(Path("profiles/"))
for profile in profiles:
    print(f"- {profile.name}")
    print(f" {profile.description}")
    print(f" Core: {profile.core_offset:+d} MHz")
    print(f" Memory: {profile.memory_offset:+d} MHz")
    print(f" Power: {profile.power_limit}%")
    if profile.fan_curve:
        print(f" Fan curve: {len(profile.fan_curve)} points")
    else:
        print(" Fan curve: Automatic")
    print()
Example Profiles
Gaming Profile
name: Gaming
description: Optimized for gaming with good noise/performance balance
core_offset: 120
memory_offset: 600
power_limit: 110
fan_curve:
- [60, 50]
- [70, 70]
- [80, 90]
- [85, 100]
Rendering Profile
name: Rendering
description: Maximum sustained performance for compute workloads
core_offset: 100
memory_offset: 400
power_limit: 115
fan_curve:
- [50, 60]
- [65, 75]
- [75, 90]
- [80, 100]
Silent Profile
name: Silent
description: Minimal noise for office/development work
core_offset: 0
memory_offset: 0
power_limit: 85
fan_curve:
- [60, 35]
- [70, 50]
- [80, 70]
- [85, 85]
Stock Profile
name: Stock
description: Factory defaults with automatic fan control
core_offset: 0
memory_offset: 0
power_limit: 100
fan_curve: null # Automatic fan control
Integration with Application
CLI Usage
import click
from pathlib import Path
from nvidia_oc.core.profile import ProfileManager
from nvidia_oc.core.gpu import GPUManager
@click.command("apply-profile")
@click.argument("profile_path", type=click.Path(exists=True, path_type=Path))
@click.option("--gpu", type=int, default=0, help="GPU index")
def apply_profile_cli(profile_path: Path, gpu: int):
    """Apply overclocking profile to GPU."""
    try:
        # Initialize managers
        gpu_manager = GPUManager()
        profile_manager = ProfileManager()
        # Load profile
        profile = profile_manager.load(profile_path)
        click.echo(f"Loaded profile: {profile.name}")
        # Get GPU
        device = gpu_manager.get_device(gpu)
        # Apply profile
        profile_manager.apply(device, profile)
        click.echo(f"✓ Applied {profile.name} to GPU {gpu} ({device.name})")
    except Exception as e:
        click.echo(f"✗ Error: {e}", err=True)
        raise click.Abort()
API Endpoint
from fastapi import FastAPI, HTTPException
from pathlib import Path
from nvidia_oc.core.profile import ProfileManager, ProfileValidationError, ProfileIOError
from nvidia_oc.core.gpu import GPUManager
app = FastAPI()
gpu_manager = GPUManager()
profile_manager = ProfileManager()
@app.post("/api/gpus/{gpu_id}/profile")
async def apply_profile(gpu_id: int, profile_name: str):
    """Apply named profile to GPU."""
    try:
        # Load profile
        profile_path = Path(f"profiles/{profile_name}.yaml")
        profile = profile_manager.load(profile_path)
        # Apply to GPU
        device = gpu_manager.get_device(gpu_id)
        profile_manager.apply(device, profile)
        return {
            "status": "success",
            "gpu": gpu_id,
            "profile": profile.name,
            "settings": {
                "core_offset": profile.core_offset,
                "memory_offset": profile.memory_offset,
                "power_limit": profile.power_limit,
                "fan_mode": "curve" if profile.fan_curve else "auto"
            }
        }
    except ProfileIOError as e:
        raise HTTPException(status_code=404, detail=str(e))
    except ProfileValidationError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
@app.get("/api/profiles")
async def list_profiles():
    """List available profiles."""
    try:
        profiles = profile_manager.list_profiles(Path("profiles/"))
        return [
            {
                "name": p.name,
                "description": p.description,
                "core_offset": p.core_offset,
                "memory_offset": p.memory_offset,
                "power_limit": p.power_limit,
                "has_fan_curve": p.fan_curve is not None
            }
            for p in profiles
        ]
    except ProfileIOError as e:
        raise HTTPException(status_code=500, detail=str(e))
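The apply endpoint builds a file path from `profile_name`, which comes from the request. One possible guard, sketched with only the standard library, rejects names that could escape the profiles directory (for example `../secrets`). The helper name `resolve_profile_path` and the allowed character set are assumptions for this example, not part of nvidia_oc:

```python
import re
from pathlib import Path

# Letters, digits, underscore, and hyphen only: no separators, no dots.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def resolve_profile_path(profiles_dir: Path, profile_name: str) -> Path:
    """Hypothetical helper: map a profile name to a file inside profiles_dir."""
    if not _NAME_RE.fullmatch(profile_name):
        raise ValueError(f"invalid profile name: {profile_name!r}")
    return profiles_dir / f"{profile_name}.yaml"


print(resolve_profile_path(Path("profiles"), "balanced"))
```

In the endpoint, the resulting `ValueError` could be mapped to a 400 response before any file access happens.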
Error Handling
Exception Hierarchy
- ProfileValidationError: Profile data validation failed
- ProfileIOError: File I/O operations failed
Common Error Scenarios
from pathlib import Path
from nvidia_oc.core.profile import ProfileConfig, ProfileManager, ProfileValidationError, ProfileIOError
manager = ProfileManager()
# Handle file not found
try:
    profile = manager.load(Path("nonexistent.yaml"))
except ProfileIOError as e:
    print(f"File error: {e}")
# Handle validation errors
try:
    profile = ProfileConfig(
        name="Test",
        core_offset=500,  # Invalid: exceeds ±200 limit
        memory_offset=0,
        power_limit=100,
        fan_curve=None
    )
except ProfileValidationError as e:
    print(f"Validation error: {e}")
# Handle YAML parsing errors
try:
    profile = manager.load(Path("malformed.yaml"))
except ProfileIOError as e:
    print(f"YAML parsing error: {e}")
Testing
The implementation includes comprehensive validation tests in test_profile_validation.py:
python3 test_profile_validation.py
Tests cover:
- Profile loading from YAML
- Directory listing
- Validation of invalid values
- Save/load roundtrip
- Error handling
Best Practices
- Always validate user input: The Pydantic model provides automatic validation
- Use descriptive profile names: Makes profile selection easier
- Start conservative: Test with small offsets before applying aggressive settings
- Include descriptions: Document what each profile is optimized for
- Version control profiles: Keep profiles in git for team sharing
- Test stability: Run stress tests after applying new profiles
- Fan curves: Start with conservative curves and adjust based on temperature monitoring
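"Start conservative" can be made concrete by stepping toward a target offset and stress-testing at each step. A small sketch of generating those steps; the function name `offset_steps` and the 25 MHz default step are illustrative assumptions, not recommendations from the library:

```python
from typing import List


def offset_steps(target: int, step: int = 25) -> List[int]:
    """Return offsets from one step up to target, ending exactly at target."""
    if step <= 0:
        raise ValueError("step must be positive")
    sign = 1 if target >= 0 else -1
    # Intermediate steps, strictly short of the target.
    steps = [sign * value for value in range(step, abs(target), step)]
    if target != 0:
        steps.append(target)
    return steps


print(offset_steps(120))  # [25, 50, 75, 100, 120]
```

Each value could be written into a temporary profile, applied, and stress-tested before moving to the next one.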
Limitations
- Clock offset capture: NVML cannot read current offsets, only absolute frequencies
- Power limit capture: Not implemented due to NVML API limitations
- Fan curve capture: No API to read configured curves
- Async requirement: Fan curve application requires a running asyncio event loop
Future Enhancements
Potential improvements for future versions:
- Power limit reading/writing support
- Multiple fan curve profiles per configuration
- Profile validation against specific GPU models
- Profile import/export in multiple formats
- Profile templates for common GPU models
- Automatic profile switching based on workload detection