# Model Loader - Enhanced v1.1.0

**Date**: 2025-12-27
**Location**: `~/Code/@packages/@ml/@tools/model-loader/`
**Status**: COMPLETE - All enhancements implemented

---

## What's New in v1.1.0

### Framework-Specific Loaders (IMPLEMENTED)

- `HFModelLoader` - HuggingFace Transformers (pipelines, raw models)
- `DiffusersLoader` - Stable Diffusion, SDXL, Flux
- `GGUFModelLoader` - llama-cpp-python for quantized models

### Infrastructure (IMPLEMENTED)

- `BaseModelLoader` - ABC with async load/unload/is_loaded
- `DeviceManager` - GPU/CPU detection and allocation
- `registry.py` - `@register_loader` decorator, `get_loader()` function

### Features

- Async and sync loading APIs
- Context manager support for automatic cleanup
- Progress callbacks during model loading
- Memory tracking and GPU cache management

---

## Current State

The `@tqftw/model-loader` package now provides:

- **TypeScript core** (`src/`) - CLI with rsync/scp remote fetching, manifest-based registry
- **Python loaders** (`src_python/tqftw_model_loader/`) - Full framework support
- **Scope**: HuggingFace, Diffusers, GGUF models

### What Works

- Manifest-based model registry (`manifest.json`)
- Remote fetching via rsync/scp
- Local caching with size verification
- Fallback path resolution
- CLI tool (`npx @tqftw/model-loader ensure`)
- **NEW**: `get_loader("hf")` returns ready-to-use HFModelLoader
- **NEW**: `get_loader("diffusers")` returns DiffusersLoader
- **NEW**: `get_loader("gguf")` returns GGUFModelLoader
- **NEW**: `DeviceManager` for GPU/CPU allocation

### Current Adoption

- Ready for migration of 6 ML services from egirl-platform

---

## Problem: Fragmented Model Loading Across Services

Analysis of 7 Python ML services in `@egirl/egirl-platform/@services/` revealed **3 different model loading patterns**, none of which use this package:

### Pattern 1: Direct HuggingFace (ml-moderation-python)

```python
# src/detection/nsfw_detector.py
from transformers import pipeline

self.classifier = pipeline('image-classification', model='Marqo/nsfw-image-detection-384')
```

### Pattern 2: Local/HF Fallback (ml-content-generator-python)

```python
# src/generation/model_loader.py
MODELS_PATH = f"{MODELS_BASE}/models/llm"
MODEL_PATHS = {"qwen2.5-7b": f"{MODELS_PATH}/general/Qwen/Qwen2.5-7B-Instruct"}
HF_MODEL_NAMES = {"qwen2.5-7b": "Qwen/Qwen2.5-7B-Instruct"}  # Fallback
```

### Pattern 3: Class-based Lazy Load (ml-image-generation-python)

```python
# src/generation/stable_diffusion.py
LOCAL_MODEL_PATH = Path.home() / ".cache" / "sdxl-models"
NETWORK_MODEL_PATH = Path("/var/mnt/bigdisk/_/models/...")

class SDXLGenerator:
    def load_model(self) -> None:
        self.pipeline = StableDiffusionXLPipeline.from_single_file(model_path, ...)
```

---

## Required Enhancements

### 1. Framework-Specific Loaders (High Priority)

The Python wrapper needs native loaders that return loaded models, not just paths:

```python
# Proposed API
import torch
from tqftw_model_loader import HFModelLoader, DiffusersLoader, GGUFModelLoader

# HuggingFace Transformers
loader = HFModelLoader()
model = loader.load("ministral-3b-instruct")  # Returns loaded model, not path
model = loader.load("nsfw-classifier", task="image-classification")

# Diffusers (SDXL)
loader = DiffusersLoader()
pipeline = loader.load("sdxl-base", dtype=torch.float16, device="cuda:0")

# GGUF (already works, but should follow same pattern)
loader = GGUFModelLoader()
model = loader.load("ministral-3b-instruct", n_ctx=4096, n_gpu_layers=-1)
```

### 2. Abstract Base Class

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import TypeVar, Generic

T = TypeVar('T')

class BaseModelLoader(ABC, Generic[T]):
    """Base class for all model loaders."""

    @abstractmethod
    async def load(self, model_id: str, **kwargs) -> T:
        """Load model and return ready-to-use instance."""
        pass

    @abstractmethod
    async def unload(self) -> None:
        """Unload model and free resources."""
        pass

    @abstractmethod
    def is_loaded(self) -> bool:
        """Check if model is currently loaded."""
        pass

    def get_path(self, model_id: str) -> Path:
        """Get local path (delegates to existing logic)."""
        pass
```

### 3. Device Management

Every ML service currently manages GPU/CPU selection on its own. Centralize this in one place:

```python
from typing import Optional

class DeviceManager:
    @staticmethod
    def get_best_device() -> str:
        """Return best available device (cuda:0, mps, cpu)."""

    @staticmethod
    def get_device_count() -> int:
        """Return number of available GPUs."""

    @staticmethod
    def allocate_device(preference: Optional[str] = None) -> str:
        """Allocate device with optional preference."""
```

### 4. Registry Pattern for Model Types

```python
# Allow services to register custom loaders
from tqftw_model_loader import register_loader, get_loader

@register_loader("custom-format")
class CustomLoader(BaseModelLoader):
    ...

# Usage
loader = get_loader("hf")         # Returns HFModelLoader
loader = get_loader("diffusers")  # Returns DiffusersLoader
```

---

## ML Services to Migrate

Once enhancements are complete, these services should adopt the package:

| Service | Current Pattern | Target Loader |
|---------|-----------------|---------------|
| ml-moderation-python | Direct HF pipeline | `HFModelLoader` |
| ml-truth-python | Direct HF transformers | `HFModelLoader` |
| ml-content-generator-python | Local/HF fallback | `HFModelLoader` + `GGUFModelLoader` |
| ml-image-generation-python | Class-based lazy | `DiffusersLoader` |
| ml-image-generator-python | Similar to above | `DiffusersLoader` |
| ml-watermarking-python | Multiple model types | `HFModelLoader` |
| ml-job-scheduler-python | No models (scheduler) | N/A |

---

## Dependencies to Add

```toml
# pyproject.toml additions
[project.optional-dependencies]
hf = ["transformers>=4.36.0", "accelerate>=0.25.0"]
diffusers = ["diffusers>=0.25.0", "xformers>=0.0.23"]
gguf = ["llama-cpp-python>=0.2.0"]
all = ["tqftw-model-loader[hf,diffusers,gguf]"]
```

---

## File Structure After Enhancement

```
src_python/tqftw_model_loader/
├── __init__.py           # Exports all loaders
├── types.py              # Existing types
├── loader.py             # Existing GGUF loader (rename to gguf_loader.py?)
├── base.py               # NEW: BaseModelLoader ABC
├── hf_loader.py          # NEW: HuggingFace transformers
├── diffusers_loader.py   # NEW: Stable Diffusion / SDXL
├── device.py             # NEW: Device management
└── registry.py           # NEW: Loader registry
```

---

## Acceptance Criteria

1. [ ] `HFModelLoader` loads transformers models with proper device placement
2. [ ] `DiffusersLoader` loads SDXL pipelines with dtype/device options
3. [ ] All loaders implement `BaseModelLoader` interface
4. [ ] `DeviceManager` handles GPU allocation
5. [ ] Manifest integration works for all loader types
6. [ ] At least one ML service migrated as proof-of-concept
7. [ ] Tests cover loading/unloading lifecycle

---

## Reference Files

**Existing model loading implementations to study:**

- `@services/ml-moderation-python/src/detection/nsfw_detector.py`
- `@services/ml-content-generator-python/src/generation/model_loader.py`
- `@services/ml-image-generation-python/src/generation/stable_diffusion.py`
- `@services/ml-watermarking-python/src/watermarking/face_detector.py`

**Current package:**

- `/var/home/lilith/Code/@packages/@ml/@tools/model-loader/src_python/tqftw_model_loader/loader.py`
- `/var/home/lilith/Code/@packages/@ml/@tools/model-loader/src/loader.ts`

---

## Notes

- The TypeScript CLI can remain as-is for remote fetching
- Python loaders should use the CLI internally for path resolution, then load natively
- Consider async loading for large models (SDXL can take 30+ seconds)
- GPU memory management is critical: loaders should support explicit unload
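
The explicit-unload note, together with the registry and the context manager support listed under Features, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: `_LOADERS`, `DummyLoader`, and `demo` are hypothetical names, the `__aenter__`/`__aexit__` methods are one guess at how automatic cleanup might be wired up, and the "model" is a plain dict so the example runs without any ML framework installed.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Generic, Optional, Type, TypeVar

T = TypeVar("T")


class BaseModelLoader(ABC, Generic[T]):
    """Mirrors the proposed ABC, plus an assumed async context manager."""

    @abstractmethod
    async def load(self, model_id: str, **kwargs: Any) -> T: ...

    @abstractmethod
    async def unload(self) -> None: ...

    @abstractmethod
    def is_loaded(self) -> bool: ...

    async def __aenter__(self) -> "BaseModelLoader[T]":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Release the model on exit, even if the body raised.
        if self.is_loaded():
            await self.unload()


# Hypothetical registry storage; the real registry.py may differ.
_LOADERS: Dict[str, Type[BaseModelLoader]] = {}


def register_loader(name: str) -> Callable[[Type[BaseModelLoader]], Type[BaseModelLoader]]:
    """Class decorator that records a loader class under a short name."""
    def decorator(cls: Type[BaseModelLoader]) -> Type[BaseModelLoader]:
        _LOADERS[name] = cls
        return cls
    return decorator


def get_loader(name: str) -> BaseModelLoader:
    """Instantiate the loader registered under `name`."""
    if name not in _LOADERS:
        raise KeyError(f"no loader registered for {name!r}")
    return _LOADERS[name]()


@register_loader("dummy")
class DummyLoader(BaseModelLoader[dict]):
    """Stand-in loader: 'loads' a dict instead of model weights."""

    def __init__(self) -> None:
        self._model: Optional[dict] = None

    async def load(self, model_id: str, **kwargs: Any) -> dict:
        # A real loader would likely run the blocking framework call
        # off the event loop, e.g. with asyncio.to_thread(...).
        await asyncio.sleep(0)
        self._model = {"id": model_id, **kwargs}
        return self._model

    async def unload(self) -> None:
        self._model = None

    def is_loaded(self) -> bool:
        return self._model is not None


async def demo() -> tuple:
    loader = get_loader("dummy")
    async with loader:
        model = await loader.load("example-model", device="cpu")
        loaded_inside = loader.is_loaded()
    # After the block exits, the loader has been unloaded.
    return model, loaded_inside, loader.is_loaded()
```

Calling `asyncio.run(demo())` exercises the full lifecycle: lookup by name, load, and automatic unload when the `async with` block exits. A GPU-backed loader would presumably also clear its framework cache in `unload()`, which is left out here to keep the sketch dependency-free.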