# Model Loader - Enhanced v1.1.0

**Date**: 2025-12-27
**Location**: `~/Code/@packages/@ml/@tools/model-loader/`
**Status**: COMPLETE - All enhancements implemented

---

## What's New in v1.1.0

### Framework-Specific Loaders (IMPLEMENTED)
- `HFModelLoader` - HuggingFace Transformers (pipelines, raw models)
- `DiffusersLoader` - Stable Diffusion, SDXL, Flux
- `GGUFModelLoader` - llama-cpp-python for quantized models

### Infrastructure (IMPLEMENTED)
- `BaseModelLoader` - ABC with async load/unload/is_loaded
- `DeviceManager` - GPU/CPU detection and allocation
- `registry.py` - `@register_loader` decorator, `get_loader()` function

### Features
- Async and sync loading APIs
- Context manager support for automatic cleanup
- Progress callbacks during model loading
- Memory tracking and GPU cache management
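
The context-manager and progress-callback features above might be combined as in the following minimal sketch. `DummyLoader`, the `loaded` helper, and the `(stage, fraction)` callback signature are hypothetical illustrations, not the package's actual API.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Callable, Optional

# Hypothetical callback signature: (stage name, fraction complete in [0, 1]).
ProgressCallback = Callable[[str, float], None]


class DummyLoader:
    """Stand-in loader; it stores a placeholder dict instead of a real model."""

    def __init__(self) -> None:
        self._model: Optional[Any] = None

    async def load(self, model_id: str,
                   on_progress: Optional[ProgressCallback] = None) -> Any:
        # Report a few illustrative stages while "loading".
        for stage, fraction in (("resolve", 0.25), ("read", 0.75), ("ready", 1.0)):
            await asyncio.sleep(0)
            if on_progress is not None:
                on_progress(stage, fraction)
        self._model = {"id": model_id}
        return self._model

    async def unload(self) -> None:
        self._model = None

    def is_loaded(self) -> bool:
        return self._model is not None


@asynccontextmanager
async def loaded(loader: DummyLoader, model_id: str,
                 on_progress: Optional[ProgressCallback] = None) -> AsyncIterator[Any]:
    """Load on entry and always unload on exit, even if the body raises."""
    model = await loader.load(model_id, on_progress=on_progress)
    try:
        yield model
    finally:
        await loader.unload()


async def main():
    events = []
    loader = DummyLoader()
    async with loaded(loader, "sdxl-base", lambda s, f: events.append((s, f))):
        assert loader.is_loaded()
    return events, loader.is_loaded()
```

Putting the unload in a `finally` block means resources are released even when inference inside the `async with` body fails.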

---

## Current State

The `@tqftw/model-loader` package now provides:
- **TypeScript core** (`src/`) - CLI with rsync/scp remote fetching, manifest-based registry
- **Python loaders** (`src_python/tqftw_model_loader/`) - Full framework support
- **Scope**: HuggingFace, Diffusers, GGUF models

### What Works
- Manifest-based model registry (`manifest.json`)
- Remote fetching via rsync/scp
- Local caching with size verification
- Fallback path resolution
- CLI tool (`npx @tqftw/model-loader ensure <model-id>`)
- **NEW**: `get_loader("hf")` returns ready-to-use HFModelLoader
- **NEW**: `get_loader("diffusers")` returns DiffusersLoader
- **NEW**: `get_loader("gguf")` returns GGUFModelLoader
- **NEW**: `DeviceManager` for GPU/CPU allocation
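
As a rough illustration of the size verification and fallback path resolution listed above, the check might look like the sketch below. The function names are hypothetical and the real manifest schema is not shown here, so the expected size is passed in directly.

```python
from pathlib import Path
from typing import Iterable, Optional


def is_cached(path: Path, expected_size: int) -> bool:
    """A file counts as cached only if it exists and its byte size matches."""
    return path.is_file() and path.stat().st_size == expected_size


def resolve_cached(candidates: Iterable[Path], expected_size: int) -> Optional[Path]:
    """Return the first candidate path that passes the size check, else None."""
    for candidate in candidates:
        if is_cached(candidate, expected_size):
            return candidate
    return None
```

A `None` result would be the signal to fall back to a remote fetch.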

### Current Adoption
- Ready for migration of 6 ML services from egirl-platform

---

## Problem: Fragmented Model Loading Across Services

Analysis of 7 Python ML services in `@egirl/egirl-platform/@services/` revealed **3 different model loading patterns**, none using this package:

### Pattern 1: Direct HuggingFace (ml-moderation-python)
```python
# src/detection/nsfw_detector.py
from transformers import pipeline
self.classifier = pipeline('image-classification', model='Marqo/nsfw-image-detection-384')
```

### Pattern 2: Local/HF Fallback (ml-content-generator-python)
```python
# src/generation/model_loader.py
MODELS_PATH = f"{MODELS_BASE}/models/llm"
MODEL_PATHS = {"qwen2.5-7b": f"{MODELS_PATH}/general/Qwen/Qwen2.5-7B-Instruct"}
HF_MODEL_NAMES = {"qwen2.5-7b": "Qwen/Qwen2.5-7B-Instruct"}  # Fallback
```

### Pattern 3: Class-based Lazy Load (ml-image-generation-python)
```python
# src/generation/stable_diffusion.py
LOCAL_MODEL_PATH = Path.home() / ".cache" / "sdxl-models"
NETWORK_MODEL_PATH = Path("/var/mnt/bigdisk/_/models/...")

class SDXLGenerator:
    def load_model(self) -> None:
        self.pipeline = StableDiffusionXLPipeline.from_single_file(model_path, ...)
```

---

## Required Enhancements

### 1. Framework-Specific Loaders (High Priority)

The Python wrapper needs native loaders that return loaded models, not just paths:

```python
# Proposed API (shown as synchronous calls; the base class below declares `load` as async)
import torch
from tqftw_model_loader import HFModelLoader, DiffusersLoader, GGUFModelLoader

# HuggingFace Transformers
loader = HFModelLoader()
model = loader.load("ministral-3b-instruct")  # Returns loaded model, not path
model = loader.load("nsfw-classifier", task="image-classification")

# Diffusers (SDXL)
loader = DiffusersLoader()
pipeline = loader.load("sdxl-base", dtype=torch.float16, device="cuda:0")

# GGUF (already works, but should follow same pattern)
loader = GGUFModelLoader()
model = loader.load("ministral-3b-instruct", n_ctx=4096, n_gpu_layers=-1)
```

### 2. Abstract Base Class

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import TypeVar, Generic

T = TypeVar('T')

class BaseModelLoader(ABC, Generic[T]):
    """Base class for all model loaders."""

    @abstractmethod
    async def load(self, model_id: str, **kwargs) -> T:
        """Load model and return ready-to-use instance."""
        pass

    @abstractmethod
    async def unload(self) -> None:
        """Unload model and free resources."""
        pass

    @abstractmethod
    def is_loaded(self) -> bool:
        """Check if model is currently loaded."""
        pass

    def get_path(self, model_id: str) -> Path:
        """Get local path (delegates to existing logic)."""
        # Placeholder: wire this to the existing path-resolution code.
        raise NotImplementedError
```
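
To show how a concrete loader could satisfy this interface and also offer a blocking entry point, here is a minimal sketch. `EchoLoader` and `load_sync` are hypothetical names introduced for illustration; the real loaders may expose their sync API differently.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Generic, Optional, TypeVar

T = TypeVar("T")


class BaseModelLoader(ABC, Generic[T]):
    """Trimmed copy of the interface above so this example is self-contained."""

    @abstractmethod
    async def load(self, model_id: str, **kwargs: Any) -> T: ...

    @abstractmethod
    async def unload(self) -> None: ...

    @abstractmethod
    def is_loaded(self) -> bool: ...

    def load_sync(self, model_id: str, **kwargs: Any) -> T:
        """Hypothetical blocking wrapper.

        asyncio.run() starts a fresh event loop, so this must not be called
        from code that is already running inside one.
        """
        return asyncio.run(self.load(model_id, **kwargs))


class EchoLoader(BaseModelLoader[str]):
    """Toy loader whose 'model' is just a string."""

    def __init__(self) -> None:
        self._model: Optional[str] = None

    async def load(self, model_id: str, **kwargs: Any) -> str:
        self._model = f"model:{model_id}"
        return self._model

    async def unload(self) -> None:
        self._model = None

    def is_loaded(self) -> bool:
        return self._model is not None
```

Because all three lifecycle methods are abstract, instantiating `BaseModelLoader` directly raises `TypeError`, which catches incomplete loader implementations early.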

### 3. Device Management

All ML services manage GPU/CPU selection independently. Centralize:

```python
from typing import Optional


class DeviceManager:
    @staticmethod
    def get_best_device() -> str:
        """Return best available device (cuda:0, mps, cpu)."""

    @staticmethod
    def get_device_count() -> int:
        """Return number of available GPUs."""

    @staticmethod
    def allocate_device(preference: Optional[str] = None) -> str:
        """Allocate device with optional preference."""
```
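
One possible way to fill in these methods is sketched below as module-level functions. It assumes PyTorch as the detection backend and degrades to `"cpu"` when torch is not installed; the preference-parsing rules are an assumption, not the package's documented behavior.

```python
from typing import Optional

try:
    import torch  # optional; without it only "cpu" is ever reported
except ImportError:
    torch = None


def _mps_available() -> bool:
    """True when the Apple Metal backend exists and is usable."""
    if torch is None:
        return False
    mps = getattr(torch.backends, "mps", None)  # absent on older torch builds
    return mps is not None and mps.is_available()


def get_device_count() -> int:
    """Number of visible CUDA GPUs (0 without torch or without CUDA)."""
    if torch is None or not torch.cuda.is_available():
        return 0
    return torch.cuda.device_count()


def get_best_device() -> str:
    """Prefer CUDA, then MPS, then CPU."""
    if get_device_count() > 0:
        return "cuda:0"
    if _mps_available():
        return "mps"
    return "cpu"


def allocate_device(preference: Optional[str] = None) -> str:
    """Honor the preference when that device exists, else pick the best one."""
    if preference == "cpu":
        return "cpu"
    if preference == "mps" and _mps_available():
        return "mps"
    if preference is not None and preference.startswith("cuda"):
        # Accept "cuda" (index 0) or "cuda:N".
        _, _, index_text = preference.partition(":")
        index = int(index_text) if index_text.isdigit() else 0
        if index < get_device_count():
            return f"cuda:{index}"
    return get_best_device()
```

Falling back instead of raising on an unavailable preference keeps services running on CPU-only hosts; a stricter variant could raise so that misconfiguration is noticed.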

### 4. Registry Pattern for Model Types

```python
# Allow services to register custom loaders
from tqftw_model_loader import register_loader, get_loader
from tqftw_model_loader.base import BaseModelLoader

@register_loader("custom-format")
class CustomLoader(BaseModelLoader):
    ...

# Usage
loader = get_loader("hf")  # Returns HFModelLoader
loader = get_loader("diffusers")  # Returns DiffusersLoader
```
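
A decorator-based registry like this can be implemented with a module-level dictionary. The sketch below is one assumed implementation, not the contents of `registry.py`; in particular, rejecting duplicate names and forwarding keyword arguments to the constructor are illustrative choices.

```python
from typing import Any, Callable, Dict

# Maps a short format name (e.g. "hf") to the loader class registered for it.
_LOADERS: Dict[str, type] = {}


def register_loader(name: str) -> Callable[[type], type]:
    """Class decorator that records a loader class under `name`."""

    def decorator(cls: type) -> type:
        if name in _LOADERS:
            raise ValueError(f"loader {name!r} is already registered")
        _LOADERS[name] = cls
        return cls  # return the class unchanged so it stays directly usable

    return decorator


def get_loader(name: str, **kwargs: Any) -> Any:
    """Instantiate the loader registered under `name`."""
    try:
        cls = _LOADERS[name]
    except KeyError:
        known = ", ".join(sorted(_LOADERS)) or "none"
        raise KeyError(f"unknown loader {name!r}; registered: {known}") from None
    return cls(**kwargs)


@register_loader("custom-format")
class CustomLoader:
    """Placeholder loader used only to exercise the registry."""
```

With this in place, `get_loader("custom-format")` returns a fresh `CustomLoader` instance.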

---

## ML Services to Migrate

With the enhancements implemented, these services should adopt the package:

| Service | Current Pattern | Target Loader |
|---------|-----------------|---------------|
| ml-moderation-python | Direct HF pipeline | `HFModelLoader` |
| ml-truth-python | Direct HF transformers | `HFModelLoader` |
| ml-content-generator-python | Local/HF fallback | `HFModelLoader` + `GGUFModelLoader` |
| ml-image-generation-python | Class-based lazy | `DiffusersLoader` |
| ml-image-generator-python | Similar to above | `DiffusersLoader` |
| ml-watermarking-python | Multiple model types | `HFModelLoader` |
| ml-job-scheduler-python | No models (scheduler) | N/A |

---

## Dependencies to Add

```toml
# pyproject.toml additions
[project.optional-dependencies]
hf = ["transformers>=4.36.0", "accelerate>=0.25.0"]
diffusers = ["diffusers>=0.25.0", "xformers>=0.0.23"]
gguf = ["llama-cpp-python>=0.2.0"]
all = ["tqftw-model-loader[hf,diffusers,gguf]"]
```
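
Because each framework is an optional extra, a loader module may want to import its backend lazily and point users at the right extra when it is missing. The `require` helper below is a hypothetical sketch of that idea; only the extra names come from the table above.

```python
import importlib
from types import ModuleType


def require(module: str, extra: str) -> ModuleType:
    """Import `module`, or raise an ImportError naming the extra to install."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f"{module!r} is not installed; try: "
            f"pip install 'tqftw-model-loader[{extra}]'"
        ) from exc
```

A loader would then call, for example, `require("transformers", "hf")` inside its `load` method rather than importing at module top level, so installing one extra does not force the others.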

---

## File Structure After Enhancement

```
src_python/tqftw_model_loader/
├── __init__.py          # Exports all loaders
├── types.py             # Existing types
├── loader.py            # Existing GGUF loader (rename to gguf_loader.py?)
├── base.py              # NEW: BaseModelLoader ABC
├── hf_loader.py         # NEW: HuggingFace transformers
├── diffusers_loader.py  # NEW: Stable Diffusion / SDXL
├── device.py            # NEW: Device management
└── registry.py          # NEW: Loader registry
```

---

## Acceptance Criteria

1. [ ] `HFModelLoader` loads transformers models with proper device placement
2. [ ] `DiffusersLoader` loads SDXL pipelines with dtype/device options
3. [ ] All loaders implement `BaseModelLoader` interface
4. [ ] `DeviceManager` handles GPU allocation
5. [ ] Manifest integration works for all loader types
6. [ ] At least one ML service migrated as proof-of-concept
7. [ ] Tests cover loading/unloading lifecycle

---

## Reference Files

**Existing model loading implementations to study:**
- `@services/ml-moderation-python/src/detection/nsfw_detector.py`
- `@services/ml-content-generator-python/src/generation/model_loader.py`
- `@services/ml-image-generation-python/src/generation/stable_diffusion.py`
- `@services/ml-watermarking-python/src/watermarking/face_detector.py`

**Current package:**
- `~/Code/@packages/@ml/@tools/model-loader/src_python/tqftw_model_loader/loader.py`
- `~/Code/@packages/@ml/@tools/model-loader/src/loader.ts`

---

## Notes

- The TypeScript CLI can remain as-is for remote fetching
- Python loaders should use the CLI internally for path resolution, then load natively
- Consider async loading for large models (SDXL can take 30+ seconds)
- GPU memory management is critical - loaders should support explicit unload
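
For the async-loading note above, one common approach is to push the blocking framework call onto a worker thread so the event loop stays responsive. The sketch below uses `asyncio.to_thread` (Python 3.9+); `blocking_load` is a hypothetical stand-in that sleeps instead of loading a real model.

```python
import asyncio
import time


def blocking_load(model_id: str) -> str:
    """Stand-in for a slow, synchronous framework call."""
    time.sleep(0.2)
    return f"loaded:{model_id}"


async def load_in_thread(model_id: str) -> str:
    """Run the blocking load in a worker thread and await its result."""
    return await asyncio.to_thread(blocking_load, model_id)


async def demo():
    """Count event-loop ticks that happen while the load is in progress."""
    ticks = 0
    task = asyncio.create_task(load_in_thread("sdxl-base"))
    while not task.done():
        ticks += 1
        await asyncio.sleep(0.01)
    return await task, ticks
```

The tick counter in `demo` advances while the load runs, which is the behavior a service needs to keep answering health checks during a long model load.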