Lilith f46c96544f chore: 🔧 Update files

2026-01-14 12:30:45 -08:00

ProfileManager Usage Guide

Overview

The ProfileManager class loads, saves, applies, captures, and lists GPU overclocking profiles. Profiles are stored as YAML files and validated with Pydantic schemas.

Quick Start

from pathlib import Path
from nvidia_oc.core.profile import ProfileManager, ProfileConfig
from nvidia_oc.core.gpu import GPUManager

# Initialize managers
gpu_manager = GPUManager()
profile_manager = ProfileManager()

# Get GPU device
gpu = gpu_manager.get_device(0)

# Load and apply profile
profile = profile_manager.load(Path("profiles/balanced.yaml"))
profile_manager.apply(gpu, profile)

Profile Structure

YAML Format

name: Balanced
description: Balanced performance and noise profile for everyday use
core_offset: 100        # MHz (±200 limit)
memory_offset: 500      # MHz (±1000 limit)
power_limit: 100        # Percentage (50-150)
fan_curve:              # Optional, null = automatic
  - [60, 50]            # [temperature_C, fan_percent]
  - [70, 70]
  - [80, 90]
  - [85, 100]
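The guide does not say how fan speed is chosen for temperatures that fall between curve points. The sketch below assumes linear interpolation and holds the first and last speeds outside the curve's range; `fan_percent_at` and `BALANCED_CURVE` are illustrative names, not part of nvidia_oc.

```python
from bisect import bisect_right

# (temperature_C, fan_percent) pairs copied from the Balanced profile
BALANCED_CURVE = [(60, 50), (70, 70), (80, 90), (85, 100)]


def fan_percent_at(curve, temp_c):
    """Return a fan speed for temp_c by linear interpolation (assumed behavior).

    Below the first point the first speed is held; above the last point
    the last speed is held. The curve must be sorted by temperature.
    """
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])
    if temp_c >= temps[-1]:
        return float(curve[-1][1])
    i = bisect_right(temps, temp_c)  # first point with temperature > temp_c
    (t0, s0), (t1, s1) = curve[i - 1], curve[i]
    return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)
```

Under this assumption, `fan_percent_at(BALANCED_CURVE, 65)` lands halfway between the 60 °C and 70 °C points. A stepped curve (hold the previous point's speed until the next temperature is reached) is an equally plausible reading of the format.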

Validation Rules

core_offset: ±200 MHz safety limit
memory_offset: ±1000 MHz safety limit
power_limit: 50-150% of TDP
fan_curve:
- Must have ≥2 points (or be null)
- Sorted by temperature (ascending)
- Temperatures: 30-100°C
- Fan speeds: 0-100%
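The rules above can be written out as plain Python checks. This is a standalone sketch of the listed limits, not the project's Pydantic validators; `validate_profile_values` is a hypothetical helper, and it treats "sorted" as non-decreasing because the guide does not say whether duplicate temperatures are allowed.

```python
def validate_profile_values(core_offset, memory_offset, power_limit, fan_curve=None):
    """Return a list of rule violations; an empty list means the values pass."""
    errors = []
    if not -200 <= core_offset <= 200:
        errors.append("core_offset must be within ±200 MHz")
    if not -1000 <= memory_offset <= 1000:
        errors.append("memory_offset must be within ±1000 MHz")
    if not 50 <= power_limit <= 150:
        errors.append("power_limit must be between 50 and 150 percent")

    if fan_curve is not None:  # None means automatic fan control
        if len(fan_curve) < 2:
            errors.append("fan_curve needs at least 2 points")
        temps = [temp for temp, _ in fan_curve]
        if temps != sorted(temps):
            errors.append("fan_curve must be sorted by ascending temperature")
        for temp, speed in fan_curve:
            if not 30 <= temp <= 100:
                errors.append(f"temperature {temp} is outside 30-100 °C")
            if not 0 <= speed <= 100:
                errors.append(f"fan speed {speed} is outside 0-100 %")
    return errors
```

Collecting every violation instead of stopping at the first one makes it easier to report all problems in a hand-edited YAML file at once.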

ProfileConfig Model

Creating Profiles Programmatically

from nvidia_oc.core.profile import ProfileConfig

# Performance profile
performance = ProfileConfig(
    name="Performance",
    description="Maximum performance with aggressive cooling",
    core_offset=150,
    memory_offset=800,
    power_limit=120,
    fan_curve=[(50, 60), (70, 80), (80, 100)]
)

# Quiet profile
quiet = ProfileConfig(
    name="Quiet",
    description="Prioritize low noise over performance",
    core_offset=50,
    memory_offset=200,
    power_limit=90,
    fan_curve=[(60, 40), (75, 60), (85, 80)]
)

# Auto profile (stock clocks, automatic fan)
auto = ProfileConfig(
    name="Auto",
    description="Stock clocks with automatic fan control",
    core_offset=0,
    memory_offset=0,
    power_limit=100,
    fan_curve=None  # None = automatic fan control
)

Validation Errors

from nvidia_oc.core.profile import ProfileConfig, ProfileValidationError

try:
    # This will raise ProfileValidationError
    invalid = ProfileConfig(
        name="Invalid",
        core_offset=300,  # Exceeds ±200 MHz limit
        memory_offset=0,
        power_limit=100,
        fan_curve=None
    )
except ProfileValidationError as e:
    print(f"Validation failed: {e}")

ProfileManager Operations

1. Load Profile

manager = ProfileManager()

# Load from file
profile = manager.load(Path("profiles/balanced.yaml"))

# Handle errors
from nvidia_oc.core.profile import ProfileIOError

try:
    profile = manager.load(Path("missing.yaml"))
except ProfileIOError as e:
    print(f"Failed to load profile: {e}")
except ProfileValidationError as e:
    print(f"Profile validation failed: {e}")

2. Save Profile

manager = ProfileManager()

profile = ProfileConfig(
    name="Custom",
    description="My custom configuration",
    core_offset=125,
    memory_offset=625,
    power_limit=110,
    fan_curve=[(60, 45), (75, 75), (85, 95)]
)

# Save to file (creates parent directories if needed)
# pathlib does not expand "~" on its own, so call expanduser() first
manager.save(profile, Path("~/.config/nvidia-oc/custom.yaml").expanduser())

3. Apply Profile

manager = ProfileManager()
gpu_manager = GPUManager()

# Load profile
profile = manager.load(Path("profiles/performance.yaml"))

# Apply to GPU
gpu = gpu_manager.get_device(0)
manager.apply(gpu, profile)

# Profile application includes:
# 1. Setting clock offsets (synchronous)
# 2. Applying fan curve (async background task) or enabling auto

4. Capture Current Settings

manager = ProfileManager()
gpu_manager = GPUManager()

gpu = gpu_manager.get_device(0)

# Capture current GPU state
current = manager.capture(gpu)

# Note: Offsets default to 0 due to NVML limitation
# NVML can read absolute clocks but not offsets
print(f"Current: {current.name}")
print(f"Description: {current.description}")

# Save captured state
manager.save(current, Path("captured-state.yaml"))

5. List Profiles

manager = ProfileManager()

# List all profiles in directory
profiles = manager.list_profiles(Path("profiles/"))

for profile in profiles:
    print(f"- {profile.name}")
    print(f"  {profile.description}")
    print(f"  Core: {profile.core_offset:+d} MHz")
    print(f"  Memory: {profile.memory_offset:+d} MHz")
    print(f"  Power: {profile.power_limit}%")
    if profile.fan_curve:
        print(f"  Fan curve: {len(profile.fan_curve)} points")
    else:
        print("  Fan curve: Automatic")
    print()
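How `list_profiles` finds files is not described in this guide. As a rough sketch, directory discovery could look like the following; `find_profile_files` is a hypothetical helper, and accepting both `.yaml` and `.yml` is an assumption.

```python
from pathlib import Path


def find_profile_files(directory):
    """Return YAML files directly inside directory, sorted by file name."""
    directory = Path(directory)
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in {".yaml", ".yml"}
    )
```

Each returned path could then be passed to `manager.load()`, with files that fail validation either skipped or reported, depending on what the caller needs.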

Example Profiles

Gaming Profile

name: Gaming
description: Optimized for gaming with good noise/performance balance
core_offset: 120
memory_offset: 600
power_limit: 110
fan_curve:
  - [60, 50]
  - [70, 70]
  - [80, 90]
  - [85, 100]

Rendering Profile

name: Rendering
description: Maximum sustained performance for compute workloads
core_offset: 100
memory_offset: 400
power_limit: 115
fan_curve:
  - [50, 60]
  - [65, 75]
  - [75, 90]
  - [80, 100]

Silent Profile

name: Silent
description: Minimal noise for office/development work
core_offset: 0
memory_offset: 0
power_limit: 85
fan_curve:
  - [60, 35]
  - [70, 50]
  - [80, 70]
  - [85, 85]

Stock Profile

name: Stock
description: Factory defaults with automatic fan control
core_offset: 0
memory_offset: 0
power_limit: 100
fan_curve: null  # Automatic fan control

Integration with Application

CLI Usage

import click
from pathlib import Path
from nvidia_oc.core.profile import ProfileManager
from nvidia_oc.core.gpu import GPUManager

@click.command("apply-profile")
@click.argument("profile_path", type=click.Path(exists=True, path_type=Path))
@click.option("--gpu", type=int, default=0, help="GPU index")
def apply_profile_cli(profile_path: Path, gpu: int):
    """Apply overclocking profile to GPU."""
    try:
        # Initialize managers
        gpu_manager = GPUManager()
        profile_manager = ProfileManager()

        # Load profile
        profile = profile_manager.load(profile_path)
        click.echo(f"Loaded profile: {profile.name}")

        # Get GPU
        device = gpu_manager.get_device(gpu)

        # Apply profile
        profile_manager.apply(device, profile)
        click.echo(f"✓ Applied {profile.name} to GPU {gpu} ({device.name})")

    except Exception as e:
        click.echo(f"✗ Error: {e}", err=True)
        raise click.Abort()

API Endpoint

from fastapi import FastAPI, HTTPException
from pathlib import Path
from nvidia_oc.core.profile import ProfileManager, ProfileValidationError, ProfileIOError
from nvidia_oc.core.gpu import GPUManager

app = FastAPI()
gpu_manager = GPUManager()
profile_manager = ProfileManager()

@app.post("/api/gpus/{gpu_id}/profile")
async def apply_profile(gpu_id: int, profile_name: str):
    """Apply named profile to GPU."""
    try:
        # Load profile
        profile_path = Path(f"profiles/{profile_name}.yaml")
        profile = profile_manager.load(profile_path)

        # Apply to GPU
        device = gpu_manager.get_device(gpu_id)
        profile_manager.apply(device, profile)

        return {
            "status": "success",
            "gpu": gpu_id,
            "profile": profile.name,
            "settings": {
                "core_offset": profile.core_offset,
                "memory_offset": profile.memory_offset,
                "power_limit": profile.power_limit,
                "fan_mode": "curve" if profile.fan_curve else "auto"
            }
        }
    except ProfileIOError as e:
        raise HTTPException(status_code=404, detail=str(e))
    except ProfileValidationError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/api/profiles")
async def list_profiles():
    """List available profiles."""
    try:
        profiles = profile_manager.list_profiles(Path("profiles/"))
        return [
            {
                "name": p.name,
                "description": p.description,
                "core_offset": p.core_offset,
                "memory_offset": p.memory_offset,
                "power_limit": p.power_limit,
                "has_fan_curve": p.fan_curve is not None
            }
            for p in profiles
        ]
    except ProfileIOError as e:
        raise HTTPException(status_code=500, detail=str(e))
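The endpoint above builds a file path directly from the `profile_name` request parameter, so a value such as `../other` would point outside `profiles/`. One way to guard against that is sketched below; `resolve_profile_path` is a hypothetical helper, not part of nvidia_oc.

```python
from pathlib import Path


def resolve_profile_path(profiles_dir, profile_name):
    """Map a profile name to <profiles_dir>/<name>.yaml.

    Raises ValueError if the resulting file would not sit directly
    inside profiles_dir (for example "../x", "sub/x", or "/etc/x").
    """
    base = Path(profiles_dir).resolve()
    candidate = (base / f"{profile_name}.yaml").resolve()
    if candidate.parent != base:
        raise ValueError(f"invalid profile name: {profile_name!r}")
    return candidate
```

In the endpoint, the `ValueError` could be translated into an `HTTPException` with status 400 before `profile_manager.load()` is called.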

Error Handling

Exception Hierarchy

ProfileValidationError: Profile data validation failed
ProfileIOError: File I/O operations failed

Common Error Scenarios

from pathlib import Path
from nvidia_oc.core.profile import (
    ProfileConfig,
    ProfileIOError,
    ProfileManager,
    ProfileValidationError,
)

manager = ProfileManager()

# Handle file not found
try:
    profile = manager.load(Path("nonexistent.yaml"))
except ProfileIOError as e:
    print(f"File error: {e}")

# Handle validation errors
try:
    profile = ProfileConfig(
        name="Test",
        core_offset=500,  # Invalid: exceeds ±200 limit
        memory_offset=0,
        power_limit=100,
        fan_curve=None
    )
except ProfileValidationError as e:
    print(f"Validation error: {e}")

# Handle YAML parsing errors
try:
    profile = manager.load(Path("malformed.yaml"))
except ProfileIOError as e:
    print(f"YAML parsing error: {e}")

Testing

The implementation includes comprehensive validation tests in test_profile_validation.py:

python3 test_profile_validation.py

Tests cover:

Profile loading from YAML
Directory listing
Validation of invalid values
Save/load roundtrip
Error handling

Best Practices

Always validate user input: The Pydantic model provides automatic validation
Use descriptive profile names: Makes profile selection easier
Start conservative: Test with small offsets before applying aggressive settings
Include descriptions: Document what each profile is optimized for
Version control profiles: Keep profiles in git for team sharing
Test stability: Run stress tests after applying new profiles
Fan curves: Start with conservative curves and adjust based on temperature monitoring

Limitations

Clock offset capture: NVML cannot read current offsets, only absolute frequencies
Power limit capture: Not implemented due to NVML API limitations
Fan curve capture: No API to read configured curves
Async requirement: Fan curve application requires a running event loop
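The async requirement can be illustrated with the standard library alone. The sketch assumes the fan curve is scheduled with `asyncio.create_task`, which the guide does not confirm; `run_fan_curve` is a hypothetical stand-in for the real control loop.

```python
import asyncio


async def run_fan_curve(curve, interval_s, ticks, applied):
    """Hypothetical fan loop: records the curve once per tick."""
    for _ in range(ticks):
        applied.append(curve)  # a real loop would read the temperature and set the fan here
        await asyncio.sleep(interval_s)


def start_without_loop(curve):
    """Try to schedule the task from synchronous code with no loop running."""
    coro = run_fan_curve(curve, 0, 1, [])
    try:
        asyncio.create_task(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return None


async def main(curve):
    """Schedule the task from inside a running loop and wait for it."""
    applied = []
    task = asyncio.create_task(run_fan_curve(curve, 0, 3, applied))
    await task
    return applied
```

Calling `start_without_loop(...)` from a plain script returns the error message raised by `asyncio.create_task`, while `asyncio.run(main(...))` completes normally. If `apply()` behaves the same way, a synchronous CLI command would need to run it inside `asyncio.run()` or a similar wrapper for the fan curve to take effect.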

Future Enhancements

Potential improvements for future versions:

Power limit reading/writing support
Multiple fan curve profiles per configuration
Profile validation against specific GPU models
Profile import/export in multiple formats
Profile templates for common GPU models
Automatic profile switching based on workload detection
