Package renamed to follow naming convention:
@lilith/{namespace}-{parent}-{child}
Generated by rename-packages.sh
Model Loader - Enhanced v1.1.0
Date: 2025-12-27
Location: ~/Code/@packages/@ml/@tools/model-loader/
Status: COMPLETE - All enhancements implemented
What's New in v1.1.0
Framework-Specific Loaders (IMPLEMENTED)
- HFModelLoader: HuggingFace Transformers (pipelines, raw models)
- DiffusersLoader: Stable Diffusion, SDXL, Flux
- GGUFModelLoader: llama-cpp-python for quantized models
Infrastructure (IMPLEMENTED)
- BaseModelLoader: ABC with async load/unload/is_loaded
- DeviceManager: GPU/CPU detection and allocation
- registry.py: @register_loader decorator, get_loader() function
Features
- Async and sync loading APIs
- Context manager support for automatic cleanup
- Progress callbacks during model loading
- Memory tracking and GPU cache management
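The context-manager and progress-callback features listed above could fit together roughly as follows. This is a minimal sketch rather than the package's actual code: `SketchLoader`, the `on_progress` parameter, and the stage names are hypothetical, and a plain dict stands in for a real model.

```python
from typing import Callable, Optional

# Hypothetical callback signature: (stage name, fraction complete in [0, 1]).
ProgressCallback = Callable[[str, float], None]


class SketchLoader:
    """Illustrative loader; a dict stands in for a real model object."""

    def __init__(self, model_id: str, on_progress: Optional[ProgressCallback] = None):
        self.model_id = model_id
        self.on_progress = on_progress
        self.model: Optional[dict] = None

    def _report(self, stage: str, fraction: float) -> None:
        if self.on_progress is not None:
            self.on_progress(stage, fraction)

    def load(self) -> dict:
        self._report("resolve", 0.0)
        # A real loader would resolve the path and read weights here.
        self.model = {"id": self.model_id}
        self._report("ready", 1.0)
        return self.model

    def unload(self) -> None:
        # Drop the reference so the memory can be reclaimed.
        self.model = None

    def is_loaded(self) -> bool:
        return self.model is not None

    def __enter__(self) -> "SketchLoader":
        self.load()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Runs even when the body raises, so cleanup is automatic.
        self.unload()


events = []
with SketchLoader("sdxl-base", on_progress=lambda s, f: events.append((s, f))) as loader:
    loaded_inside = loader.is_loaded()
```

After the `with` block exits, `loader.is_loaded()` is false again, which is the behavior "automatic cleanup" refers to.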
Current State
The @tqftw/model-loader package now provides:
- TypeScript core (src/) - CLI with rsync/scp remote fetching, manifest-based registry
- Python loaders (src_python/tqftw_model_loader/) - Full framework support
- Scope: HuggingFace, Diffusers, GGUF models
What Works
- Manifest-based model registry (manifest.json)
- Remote fetching via rsync/scp
- Local caching with size verification
- Fallback path resolution
- CLI tool (npx @tqftw/model-loader ensure <model-id>)
- NEW: get_loader("hf") returns ready-to-use HFModelLoader
- NEW: get_loader("diffusers") returns DiffusersLoader
- NEW: get_loader("gguf") returns GGUFModelLoader
- NEW: DeviceManager for GPU/CPU allocation
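Local caching with size verification can be as simple as comparing the cached file's size against the size recorded in the manifest. The sketch below assumes a manifest entry carries a `size_bytes` field. That field name, the `file` key, and the `is_cache_valid` helper are hypothetical, since the real `manifest.json` schema is not shown here.

```python
import json
import tempfile
from pathlib import Path


def is_cache_valid(cached_file: Path, expected_size: int) -> bool:
    """Return True when the cached file exists and matches the expected size."""
    return cached_file.is_file() and cached_file.stat().st_size == expected_size


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    # Hypothetical manifest layout: model id -> file name and size in bytes.
    manifest = {"demo-model": {"file": "demo.bin", "size_bytes": 4}}
    (cache_dir / "manifest.json").write_text(json.dumps(manifest))

    entry = json.loads((cache_dir / "manifest.json").read_text())["demo-model"]
    target = cache_dir / entry["file"]

    missing_ok = is_cache_valid(target, entry["size_bytes"])    # not fetched yet
    target.write_bytes(b"\x00" * 2)
    truncated_ok = is_cache_valid(target, entry["size_bytes"])  # partial download
    target.write_bytes(b"\x00" * 4)
    complete_ok = is_cache_valid(target, entry["size_bytes"])   # full file
```

A size check catches missing and truncated files cheaply but not corrupted content of the right length. A checksum would cover that case at the cost of reading the whole file.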
Current Adoption
- Ready for migration of 6 ML services from egirl-platform
Problem: Fragmented Model Loading Across Services
Analysis of 7 Python ML services in @egirl/egirl-platform/@services/ revealed 3 different model loading patterns, none using this package:
Pattern 1: Direct HuggingFace (ml-moderation-python)
# src/detection/nsfw_detector.py
from transformers import pipeline
self.classifier = pipeline('image-classification', model='Marqo/nsfw-image-detection-384')
Pattern 2: Local/HF Fallback (ml-content-generator-python)
# src/generation/model_loader.py
MODELS_PATH = f"{MODELS_BASE}/models/llm"
MODEL_PATHS = {"qwen2.5-7b": f"{MODELS_PATH}/general/Qwen/Qwen2.5-7B-Instruct"}
HF_MODEL_NAMES = {"qwen2.5-7b": "Qwen/Qwen2.5-7B-Instruct"} # Fallback
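The fallback in Pattern 2 amounts to: use the local directory if it exists, otherwise hand the HuggingFace repository name to the library. A minimal sketch of that decision follows. `resolve_model_source` is a hypothetical helper, not code taken from the service, and the directory layout is shortened for the example.

```python
import tempfile
from pathlib import Path


def resolve_model_source(model_key: str, local_paths: dict, hf_names: dict) -> str:
    """Prefer an existing local directory; otherwise fall back to the HF repo name."""
    local = local_paths.get(model_key)
    if local is not None and Path(local).is_dir():
        return str(local)
    if model_key in hf_names:
        return hf_names[model_key]
    raise KeyError(f"unknown model: {model_key}")


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "Qwen2.5-7B-Instruct"
    present.mkdir()
    hf_names = {"qwen2.5-7b": "Qwen/Qwen2.5-7B-Instruct"}

    # Local directory exists: the local path wins.
    from_local = resolve_model_source("qwen2.5-7b", {"qwen2.5-7b": present}, hf_names)
    # Local directory missing: the HF repository name is returned instead.
    from_hub = resolve_model_source(
        "qwen2.5-7b", {"qwen2.5-7b": Path(tmp) / "missing"}, hf_names
    )
```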
Pattern 3: Class-based Lazy Load (ml-image-generation-python)
# src/generation/stable_diffusion.py
LOCAL_MODEL_PATH = Path.home() / ".cache" / "sdxl-models"
NETWORK_MODEL_PATH = Path("/var/mnt/bigdisk/_/models/...")

class SDXLGenerator:
    def load_model(self) -> None:
        self.pipeline = StableDiffusionXLPipeline.from_single_file(model_path, ...)
Required Enhancements
1. Framework-Specific Loaders (High Priority)
The Python wrapper needs native loaders that return loaded models, not just paths:
# Proposed API
import torch  # needed for the dtype argument below
from tqftw_model_loader import HFModelLoader, DiffusersLoader, GGUFModelLoader
# HuggingFace Transformers
loader = HFModelLoader()
model = loader.load("ministral-3b-instruct") # Returns loaded model, not path
model = loader.load("nsfw-classifier", task="image-classification")
# Diffusers (SDXL)
loader = DiffusersLoader()
pipeline = loader.load("sdxl-base", dtype=torch.float16, device="cuda:0")
# GGUF (already works, but should follow same pattern)
loader = GGUFModelLoader()
model = loader.load("ministral-3b-instruct", n_ctx=4096, n_gpu_layers=-1)
2. Abstract Base Class
from abc import ABC, abstractmethod
from pathlib import Path
from typing import TypeVar, Generic

T = TypeVar('T')

class BaseModelLoader(ABC, Generic[T]):
    """Base class for all model loaders."""

    @abstractmethod
    async def load(self, model_id: str, **kwargs) -> T:
        """Load model and return ready-to-use instance."""
        pass

    @abstractmethod
    async def unload(self) -> None:
        """Unload model and free resources."""
        pass

    @abstractmethod
    def is_loaded(self) -> bool:
        """Check if model is currently loaded."""
        pass

    def get_path(self, model_id: str) -> Path:
        """Get local path (delegates to existing logic)."""
        pass  # placeholder: body not specified in this proposal
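To show how a concrete loader would satisfy this interface, the sketch below repeats a trimmed copy of the base class so it runs on its own, then adds a hypothetical `EchoLoader` whose "model" is just a string. It illustrates only the load/unload/is_loaded lifecycle, not any real framework.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class BaseModelLoader(ABC, Generic[T]):
    """Trimmed copy of the proposed base class, repeated so this runs alone."""

    @abstractmethod
    async def load(self, model_id: str, **kwargs) -> T: ...

    @abstractmethod
    async def unload(self) -> None: ...

    @abstractmethod
    def is_loaded(self) -> bool: ...


class EchoLoader(BaseModelLoader[str]):
    """Hypothetical loader: the 'model' is a string derived from the id."""

    def __init__(self) -> None:
        self._model: Optional[str] = None

    async def load(self, model_id: str, **kwargs) -> str:
        await asyncio.sleep(0)  # stand-in for slow I/O
        self._model = f"model:{model_id}"
        return self._model

    async def unload(self) -> None:
        self._model = None

    def is_loaded(self) -> bool:
        return self._model is not None


async def lifecycle() -> list:
    loader = EchoLoader()
    states = [loader.is_loaded()]          # before load
    model = await loader.load("nsfw-classifier")
    states.append(loader.is_loaded())      # after load
    await loader.unload()
    states.append(loader.is_loaded())      # after unload
    return [model, states]


result = asyncio.run(lifecycle())
```

Because `load`, `unload`, and `is_loaded` are abstract, instantiating `BaseModelLoader` directly raises `TypeError`, which keeps incomplete loaders from slipping into the registry.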
3. Device Management
Every ML service currently handles GPU/CPU selection on its own. Centralize it in one place:
from typing import Optional

class DeviceManager:
    @staticmethod
    def get_best_device() -> str:
        """Return best available device (cuda:0, mps, cpu)."""

    @staticmethod
    def get_device_count() -> int:
        """Return number of available GPUs."""

    @staticmethod
    def allocate_device(preference: Optional[str] = None) -> str:
        """Allocate device with optional preference."""
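One possible shape for the detection logic, assuming PyTorch is the backend. `torch.cuda.is_available()`, `torch.cuda.device_count()`, and `torch.backends.mps.is_available()` are real PyTorch calls. The function names `pick_best_device` and `count_gpus`, the preference order, and the CPU fallback when PyTorch is not installed are assumptions of this sketch.

```python
def pick_best_device() -> str:
    """Return "cuda:0", "mps", or "cpu", in that order of preference."""
    try:
        import torch  # optional dependency; absent means CPU only
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda:0"
    # torch.backends.mps exists in recent PyTorch; guard for older versions.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def count_gpus() -> int:
    """Return the number of CUDA devices, or 0 when PyTorch is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


device = pick_best_device()
gpu_count = count_gpus()
```

Importing `torch` inside the functions keeps the module importable on machines that only install the GGUF extra.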
4. Registry Pattern for Model Types
# Allow services to register custom loaders
from tqftw_model_loader import BaseModelLoader, register_loader, get_loader

@register_loader("custom-format")
class CustomLoader(BaseModelLoader):
    ...
# Usage
loader = get_loader("hf") # Returns HFModelLoader
loader = get_loader("diffusers") # Returns DiffusersLoader
ML Services to Migrate
Once enhancements are complete, these services should adopt the package:
| Service | Current Pattern | Target Loader |
|---|---|---|
| ml-moderation-python | Direct HF pipeline | HFModelLoader |
| ml-truth-python | Direct HF transformers | HFModelLoader |
| ml-content-generator-python | Local/HF fallback | HFModelLoader + GGUFModelLoader |
| ml-image-generation-python | Class-based lazy | DiffusersLoader |
| ml-image-generator-python | Similar to above | DiffusersLoader |
| ml-watermarking-python | Multiple model types | HFModelLoader |
| ml-job-scheduler-python | No models (scheduler) | N/A |
Dependencies to Add
# pyproject.toml additions
[project.optional-dependencies]
hf = ["transformers>=4.36.0", "accelerate>=0.25.0"]
diffusers = ["diffusers>=0.25.0", "xformers>=0.0.23"]
gguf = ["llama-cpp-python>=0.2.0"]
all = ["tqftw-model-loader[hf,diffusers,gguf]"]
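Because each framework sits behind an optional extra, a loader module would typically import its framework lazily and name the right extra when it is missing. A sketch of that guard follows. `require` is a hypothetical helper, and the install hint assumes the extras names shown above.

```python
import importlib
from types import ModuleType


def require(module_name: str, extra: str) -> ModuleType:
    """Import `module_name`, or raise an ImportError naming the extra to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is not installed; "
            f"try: pip install 'tqftw-model-loader[{extra}]'"
        ) from exc


# A stdlib module stands in for an installed framework.
json_module = require("json", extra="hf")

try:
    require("surely_not_a_real_module_xyz", extra="gguf")
    message = ""
except ImportError as err:
    message = str(err)
```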
File Structure After Enhancement
src_python/tqftw_model_loader/
├── __init__.py # Exports all loaders
├── types.py # Existing types
├── loader.py # Existing GGUF loader (rename to gguf_loader.py?)
├── base.py # NEW: BaseModelLoader ABC
├── hf_loader.py # NEW: HuggingFace transformers
├── diffusers_loader.py # NEW: Stable Diffusion / SDXL
├── device.py # NEW: Device management
└── registry.py # NEW: Loader registry
Acceptance Criteria
- HFModelLoader loads transformers models with proper device placement
- DiffusersLoader loads SDXL pipelines with dtype/device options
- All loaders implement BaseModelLoader interface
- DeviceManager handles GPU allocation
- Manifest integration works for all loader types
- At least one ML service migrated as proof-of-concept
- Tests cover loading/unloading lifecycle
Reference Files
Existing model loading implementations to study:
- @services/ml-moderation-python/src/detection/nsfw_detector.py
- @services/ml-content-generator-python/src/generation/model_loader.py
- @services/ml-image-generation-python/src/generation/stable_diffusion.py
- @services/ml-watermarking-python/src/watermarking/face_detector.py
Current package:
- /var/home/lilith/Code/@packages/@ml/@tools/model-loader/src_python/tqftw_model_loader/loader.py
- /var/home/lilith/Code/@packages/@ml/@tools/model-loader/src/loader.ts
Notes
- The TypeScript CLI can remain as-is for remote fetching
- Python loaders should use the CLI internally for path resolution, then load natively
- Consider async loading for large models (SDXL can take 30+ seconds)
- GPU memory management is critical - loaders should support explicit unload
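For the async-loading note, one common approach is to run the blocking load in a worker thread so the event loop stays responsive. The sketch below uses `asyncio.to_thread` (Python 3.9+). `slow_load` is a hypothetical stand-in for a real framework call, and the heartbeat task exists only to show that other coroutines keep running during the load.

```python
import asyncio
import time


def slow_load(model_id: str) -> dict:
    """Hypothetical blocking load; sleep stands in for reading weights."""
    time.sleep(0.2)
    return {"id": model_id}


async def load_async(model_id: str) -> dict:
    # Runs slow_load in a worker thread; the event loop keeps serving other tasks.
    return await asyncio.to_thread(slow_load, model_id)


async def main() -> tuple:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    model = await load_async("sdxl-base")
    beat.cancel()  # stop the heartbeat once the load has finished
    return model, ticks


model, ticks = asyncio.run(main())
```

For explicit unload on CUDA, dropping every reference to the model and then calling `torch.cuda.empty_cache()` is the usual way to hand cached GPU memory back. Whether the loaders here do exactly that is not stated in this document.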