nvidia-oc/backend/PROFILE_USAGE.md
2026-01-14 12:30:45 -08:00

ProfileManager Usage Guide

Overview

The ProfileManager class loads, saves, applies, captures, and lists GPU overclocking profiles. Profiles are stored as YAML files and validated with Pydantic schemas.

Quick Start

from pathlib import Path
from nvidia_oc.core.profile import ProfileManager, ProfileConfig
from nvidia_oc.core.gpu import GPUManager

# Initialize managers
gpu_manager = GPUManager()
profile_manager = ProfileManager()

# Get GPU device
gpu = gpu_manager.get_device(0)

# Load and apply profile
profile = profile_manager.load(Path("profiles/balanced.yaml"))
profile_manager.apply(gpu, profile)

Profile Structure

YAML Format

name: Balanced
description: Balanced performance and noise profile for everyday use
core_offset: 100        # MHz (±200 limit)
memory_offset: 500      # MHz (±1000 limit)
power_limit: 100        # Percentage (50-150)
fan_curve:              # Optional, null = automatic
  - [60, 50]            # [temperature_C, fan_percent]
  - [70, 70]
  - [80, 90]
  - [85, 100]

Validation Rules

  • core_offset: ±200 MHz safety limit
  • memory_offset: ±1000 MHz safety limit
  • power_limit: 50-150% of TDP
  • fan_curve:
    • Must have ≥2 points (or be null)
    • Sorted by temperature (ascending)
    • Temperatures: 30-100°C
    • Fan speeds: 0-100%
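The rules above can be sketched as a plain-Python check. This is an illustration only, not the library's Pydantic validator: the function name `check_profile_values` and its error messages are hypothetical, and the sketch assumes that repeated temperatures still count as "sorted".

```python
from typing import List, Optional, Tuple

FanCurve = Optional[List[Tuple[int, int]]]


def check_profile_values(core_offset: int, memory_offset: int,
                         power_limit: int, fan_curve: FanCurve) -> List[str]:
    """Return a list of rule violations; an empty list means the values pass."""
    errors = []
    if not -200 <= core_offset <= 200:
        errors.append("core_offset must be within ±200 MHz")
    if not -1000 <= memory_offset <= 1000:
        errors.append("memory_offset must be within ±1000 MHz")
    if not 50 <= power_limit <= 150:
        errors.append("power_limit must be 50-150%")
    if fan_curve is not None:  # None means automatic fan control
        if len(fan_curve) < 2:
            errors.append("fan_curve needs at least 2 points")
        temps = [t for t, _ in fan_curve]
        if temps != sorted(temps):
            errors.append("fan_curve must be sorted by temperature")
        if any(not 30 <= t <= 100 for t in temps):
            errors.append("fan_curve temperatures must be 30-100°C")
        if any(not 0 <= s <= 100 for _, s in fan_curve):
            errors.append("fan_curve speeds must be 0-100%")
    return errors
```

Collecting every violation in a list, instead of stopping at the first one, makes it easier to report all problems in a hand-edited YAML file at once.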

ProfileConfig Model

Creating Profiles Programmatically

from nvidia_oc.core.profile import ProfileConfig

# Performance profile
performance = ProfileConfig(
    name="Performance",
    description="Maximum performance with aggressive cooling",
    core_offset=150,
    memory_offset=800,
    power_limit=120,
    fan_curve=[(50, 60), (70, 80), (80, 100)]
)

# Quiet profile
quiet = ProfileConfig(
    name="Quiet",
    description="Prioritize low noise over performance",
    core_offset=50,
    memory_offset=200,
    power_limit=90,
    fan_curve=[(60, 40), (75, 60), (85, 80)]
)

# Auto profile (stock clocks, automatic fan)
auto = ProfileConfig(
    name="Auto",
    description="Stock clocks with automatic fan control",
    core_offset=0,
    memory_offset=0,
    power_limit=100,
    fan_curve=None  # None = automatic fan control
)

Validation Errors

from nvidia_oc.core.profile import ProfileConfig, ProfileValidationError

try:
    # This will raise ProfileValidationError
    invalid = ProfileConfig(
        name="Invalid",
        core_offset=300,  # Exceeds ±200 MHz limit
        memory_offset=0,
        power_limit=100,
        fan_curve=None
    )
except ProfileValidationError as e:
    print(f"Validation failed: {e}")

ProfileManager Operations

1. Load Profile

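from pathlib import Path
from nvidia_oc.core.profile import (
    ProfileManager,
    ProfileIOError,
    ProfileValidationError,
)

manager = ProfileManager()

# Load from file
profile = manager.load(Path("profiles/balanced.yaml"))

# Handle missing files and invalid profile data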
try:
    profile = manager.load(Path("missing.yaml"))
except ProfileIOError as e:
    print(f"Failed to load profile: {e}")
except ProfileValidationError as e:
    print(f"Profile validation failed: {e}")

2. Save Profile

manager = ProfileManager()

profile = ProfileConfig(
    name="Custom",
    description="My custom configuration",
    core_offset=125,
    memory_offset=625,
    power_limit=110,
    fan_curve=[(60, 45), (75, 75), (85, 95)]
)

# Save to file (creates parent directories if needed).
# Path does not expand "~" on its own, so call expanduser() first.
manager.save(profile, Path("~/.config/nvidia-oc/custom.yaml").expanduser())

3. Apply Profile

manager = ProfileManager()
gpu_manager = GPUManager()

# Load profile
profile = manager.load(Path("profiles/performance.yaml"))

# Apply to GPU
gpu = gpu_manager.get_device(0)
manager.apply(gpu, profile)

# Profile application includes:
# 1. Setting clock offsets (synchronous)
# 2. Applying fan curve (async background task) or enabling auto
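
This guide does not say how the background task turns a fan curve into a fan speed between two points. One common approach is linear interpolation, holding the first and last speeds outside the curve's range. The sketch below shows that approach under this assumption; `fan_percent_for` is a hypothetical helper, not part of ProfileManager.

```python
from bisect import bisect_right
from typing import List, Tuple


def fan_percent_for(temp_c: float, curve: List[Tuple[int, int]]) -> float:
    """Linearly interpolate a fan speed (%) for temp_c from sorted curve points."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])    # at or below the first point: hold its speed
    if temp_c >= temps[-1]:
        return float(curve[-1][1])   # at or above the last point: hold its speed
    i = bisect_right(temps, temp_c)  # index of the first point hotter than temp_c
    (t0, s0), (t1, s1) = curve[i - 1], curve[i]
    return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)


balanced = [(60, 50), (70, 70), (80, 90), (85, 100)]
print(fan_percent_for(65, balanced))  # halfway between the 60°C and 70°C points
```

With the Balanced curve, 65°C falls midway between (60, 50) and (70, 70), so the sketch returns 60.0.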

4. Capture Current Settings

manager = ProfileManager()
gpu_manager = GPUManager()

gpu = gpu_manager.get_device(0)

# Capture current GPU state
current = manager.capture(gpu)

# Note: Offsets default to 0 due to NVML limitation
# NVML can read absolute clocks but not offsets
print(f"Current: {current.name}")
print(f"Description: {current.description}")

# Save captured state
manager.save(current, Path("captured-state.yaml"))

5. List Profiles

manager = ProfileManager()

# List all profiles in directory
profiles = manager.list_profiles(Path("profiles/"))

for profile in profiles:
    print(f"- {profile.name}")
    print(f"  {profile.description}")
    print(f"  Core: {profile.core_offset:+d} MHz")
    print(f"  Memory: {profile.memory_offset:+d} MHz")
    print(f"  Power: {profile.power_limit}%")
    if profile.fan_curve:
        print(f"  Fan curve: {len(profile.fan_curve)} points")
    else:
        print("  Fan curve: Automatic")
    print()

Example Profiles

Gaming Profile

name: Gaming
description: Optimized for gaming with good noise/performance balance
core_offset: 120
memory_offset: 600
power_limit: 110
fan_curve:
  - [60, 50]
  - [70, 70]
  - [80, 90]
  - [85, 100]

Rendering Profile

name: Rendering
description: Maximum sustained performance for compute workloads
core_offset: 100
memory_offset: 400
power_limit: 115
fan_curve:
  - [50, 60]
  - [65, 75]
  - [75, 90]
  - [80, 100]

Silent Profile

name: Silent
description: Minimal noise for office/development work
core_offset: 0
memory_offset: 0
power_limit: 85
fan_curve:
  - [60, 35]
  - [70, 50]
  - [80, 70]
  - [85, 85]

Stock Profile

name: Stock
description: Factory defaults with automatic fan control
core_offset: 0
memory_offset: 0
power_limit: 100
fan_curve: null  # Automatic fan control

Integration with Application

CLI Usage

import click
from pathlib import Path
from nvidia_oc.core.profile import ProfileManager
from nvidia_oc.core.gpu import GPUManager

@click.command("apply-profile")
@click.argument("profile_path", type=click.Path(exists=True, path_type=Path))
@click.option("--gpu", type=int, default=0, help="GPU index")
def apply_profile_cli(profile_path: Path, gpu: int):
    """Apply overclocking profile to GPU."""
    try:
        # Initialize managers
        gpu_manager = GPUManager()
        profile_manager = ProfileManager()

        # Load profile
        profile = profile_manager.load(profile_path)
        click.echo(f"Loaded profile: {profile.name}")

        # Get GPU
        device = gpu_manager.get_device(gpu)

        # Apply profile
        profile_manager.apply(device, profile)
        click.echo(f"✓ Applied {profile.name} to GPU {gpu} ({device.name})")

    except Exception as e:
        click.echo(f"✗ Error: {e}", err=True)
        raise click.Abort()

API Endpoint

from fastapi import FastAPI, HTTPException
from pathlib import Path
from nvidia_oc.core.profile import ProfileManager, ProfileValidationError, ProfileIOError
from nvidia_oc.core.gpu import GPUManager

app = FastAPI()
gpu_manager = GPUManager()
profile_manager = ProfileManager()

@app.post("/api/gpus/{gpu_id}/profile")
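async def apply_profile(gpu_id: int, profile_name: str):
    """Apply named profile to GPU."""
    # Reject names containing path separators so requests cannot escape profiles/
    if Path(profile_name).name != profile_name:
        raise HTTPException(status_code=400, detail="Invalid profile name")
    try: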
        # Load profile
        profile_path = Path(f"profiles/{profile_name}.yaml")
        profile = profile_manager.load(profile_path)

        # Apply to GPU
        device = gpu_manager.get_device(gpu_id)
        profile_manager.apply(device, profile)

        return {
            "status": "success",
            "gpu": gpu_id,
            "profile": profile.name,
            "settings": {
                "core_offset": profile.core_offset,
                "memory_offset": profile.memory_offset,
                "power_limit": profile.power_limit,
                "fan_mode": "curve" if profile.fan_curve else "auto"
            }
        }
    except ProfileIOError as e:
        raise HTTPException(status_code=404, detail=str(e))
    except ProfileValidationError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/api/profiles")
async def list_profiles():
    """List available profiles."""
    try:
        profiles = profile_manager.list_profiles(Path("profiles/"))
        return [
            {
                "name": p.name,
                "description": p.description,
                "core_offset": p.core_offset,
                "memory_offset": p.memory_offset,
                "power_limit": p.power_limit,
                "has_fan_curve": p.fan_curve is not None
            }
            for p in profiles
        ]
    except ProfileIOError as e:
        raise HTTPException(status_code=500, detail=str(e))

Error Handling

Exception Hierarchy

  • ProfileValidationError: Profile data validation failed
  • ProfileIOError: File I/O operations failed

Common Error Scenarios

from pathlib import Path
from nvidia_oc.core.profile import (
    ProfileConfig,
    ProfileManager,
    ProfileIOError,
    ProfileValidationError,
)

manager = ProfileManager()

# Handle file not found
try:
    profile = manager.load(Path("nonexistent.yaml"))
except ProfileIOError as e:
    print(f"File error: {e}")

# Handle validation errors
try:
    profile = ProfileConfig(
        name="Test",
        core_offset=500,  # Invalid: exceeds ±200 limit
        memory_offset=0,
        power_limit=100,
        fan_curve=None
    )
except ProfileValidationError as e:
    print(f"Validation error: {e}")

# Handle YAML parsing errors
try:
    profile = manager.load(Path("malformed.yaml"))
except ProfileIOError as e:
    print(f"YAML parsing error: {e}")

Testing

The implementation includes comprehensive validation tests in test_profile_validation.py:

python3 test_profile_validation.py

Tests cover:

  • Profile loading from YAML
  • Directory listing
  • Validation of invalid values
  • Save/load roundtrip
  • Error handling

Best Practices

  1. Always validate user input: Build a ProfileConfig from it so the Pydantic model checks every field automatically
  2. Use descriptive profile names: Makes profile selection easier
  3. Start conservative: Test with small offsets before applying aggressive settings
  4. Include descriptions: Document what each profile is optimized for
  5. Version control profiles: Keep profiles in git for team sharing
  6. Test stability: Run stress tests after applying new profiles
  7. Fan curves: Start with conservative curves and adjust based on temperature monitoring
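
One way to follow the "start conservative" advice is to approach a target offset in small increments, stress-testing after each one. The helper below is a hypothetical sketch of that idea; the name `offset_steps` and the 25 MHz default step are assumptions, not part of the library.

```python
from typing import List


def offset_steps(target_mhz: int, step_mhz: int = 25) -> List[int]:
    """Return offsets from one step up to target_mhz, ending exactly on the target."""
    if step_mhz <= 0:
        raise ValueError("step_mhz must be positive")
    sign = 1 if target_mhz >= 0 else -1
    # Intermediate steps strictly below the target magnitude
    steps = [sign * v for v in range(step_mhz, abs(target_mhz), step_mhz)]
    if target_mhz != 0:
        steps.append(target_mhz)  # always finish on the requested offset
    return steps


print(offset_steps(120))  # [25, 50, 75, 100, 120]
```

Each value could then be used as the core_offset of a temporary ProfileConfig, applied, and stress-tested before moving to the next step.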

Limitations

  1. Clock offset capture: NVML cannot read current offsets, only absolute frequencies, so capture() records both offsets as 0
  2. Power limit capture: Not implemented due to NVML API limitations
  3. Fan curve capture: No API to read configured curves
  4. Async requirement: Fan curve application requires running event loop
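
The async requirement reflects general asyncio behavior: a background task can only be scheduled while an event loop is running. The sketch below demonstrates this with a stand-in coroutine. `fake_fan_loop` is hypothetical and does not represent the library's actual fan-control task.

```python
import asyncio


async def fake_fan_loop(ticks: int) -> int:
    """Hypothetical stand-in for a background fan-curve task."""
    done = 0
    for _ in range(ticks):
        await asyncio.sleep(0)  # yield to the event loop, as a polling task would
        done += 1
    return done


def start_without_loop() -> str:
    """Scheduling a task outside a running loop fails with RuntimeError."""
    coro = fake_fan_loop(1)
    try:
        asyncio.create_task(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return "no running event loop"
    return "scheduled"


async def start_with_loop() -> int:
    """Inside a coroutine the loop is running, so the task can be scheduled."""
    task = asyncio.create_task(fake_fan_loop(3))
    return await task


print(start_without_loop())
print(asyncio.run(start_with_loop()))
```

In a plain synchronous script, this suggests wrapping profile application in a coroutine run via `asyncio.run`, whereas the FastAPI handlers in this guide already execute inside a running loop.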

Future Enhancements

Potential improvements for future versions:

  • Power limit reading/writing support
  • Multiple fan curve profiles per configuration
  • Profile validation against specific GPU models
  • Profile import/export in multiple formats
  • Profile templates for common GPU models
  • Automatic profile switching based on workload detection