No description
Find a file
Claude Code e986d16d45
Some checks failed
Publish to PyPI / Build and Publish (push) Failing after 46s
deps-upgrade(deps): ⬆️ Update dependencies to latest compatible versions in pyproject.toml
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-03-25 03:51:41 -07:00
.forgejo/workflows ci(pypi-publish): 👷 Update PyPI publishing workflow for development with new security/compliance steps 2026-03-17 19:00:17 -07:00
src/ml_trainer_lm feat(ml-trainer): Implement ML language model trainer with dataset loading, model architecture, training loop, tests, and PyPI publishing pipeline 2026-03-17 17:32:47 -07:00
tests feat(ml-trainer): Implement ML language model trainer with dataset loading, model architecture, training loop, tests, and PyPI publishing pipeline 2026-03-17 17:32:47 -07:00
.gitignore chore(gitignore): 🔧 Update patterns in .gitignore to exclude build artifacts, secrets, and logs 2026-03-17 17:32:47 -07:00
pyproject.toml deps-upgrade(deps): ⬆️ Update dependencies to latest compatible versions in pyproject.toml 2026-03-25 03:51:41 -07:00
README.md feat(ml-trainer): Implement ML language model trainer with dataset loading, model architecture, training loop, tests, and PyPI publishing pipeline 2026-03-17 17:32:47 -07:00

ml-trainer-lm — Shared LoRA/QLoRA Fine-Tuning Utilities

Canonical library for LoRA and QLoRA fine-tuning of HuggingFace causal language models.

Version: 0.1.0 Status: Stable License: Proprietary

Features

  • Model Loading: Load HF causal LMs with 4-bit QLoRA or fp16 precision
  • LoRA Application: Apply PEFT LoRA adapters with automatic multimodal support (vision-language models)
  • Dataset Handling: Load JSONL files, format chat-style messages, batch tokenization
  • Training Loop: Unified HF Trainer wrapper with gradient checkpointing and memory optimization
  • DDP Support: Automatic distributed training setup via LOCAL_RANK environment variable

API

Model Loading

from ml_trainer_lm import load_model_for_training, apply_lora

# Load base model with 4-bit QLoRA (default)
model, tokenizer = load_model_for_training(config)

# Apply LoRA adapters
model = apply_lora(model, config)

print(model)
# PeftModelForCausalLM
#   (base_model): AutoModelForCausalLM
#   (lora_target_modules): ['q_proj', 'v_proj', ...]

Arguments (config object):

  • base_model (str): HF model ID (e.g., "meta-llama/Llama-2-7b")
  • quantize (bool): Apply 4-bit QLoRA (default: True)
  • local_rank (int): DDP rank for device assignment (default: -1)
  • target_modules (list[str]): LoRA target modules (default: ["q_proj", "v_proj"])
  • lora_r (int): LoRA rank (default: 8)
  • lora_alpha (int): LoRA scaling (default: 16)
  • lora_dropout (float): LoRA dropout (default: 0.05)

Dataset Utilities

from ml_trainer_lm import load_jsonl, format_chat_messages, tokenize_dataset

# Load JSONL file
examples = load_jsonl(Path("data/train.jsonl"))

# Format messages (supports tokenizer.apply_chat_template or manual ChatML)
texts = format_chat_messages(examples, tokenizer)

# Tokenize dataset
dataset = tokenize_dataset(texts, tokenizer, max_length=2048)

print(dataset.keys())
# dict_keys(['input_ids', 'attention_mask', 'labels'])

Training Loop

from ml_trainer_lm import run_training

adapters_dir = run_training(
    model=model,
    tokenizer=tokenizer,
    dataset=dataset,
    config=config,
    resume_from=None,  # Optional checkpoint dir to resume from
)

print(f"Adapters saved to: {adapters_dir}")
# Adapters saved to: /output/lora-adapters

Configuration Example

from dataclasses import dataclass
from pathlib import Path

@dataclass
class LoraConfig:
    # Model
    base_model: str = "meta-llama/Llama-2-7b-hf"
    quantize: bool = True
    local_rank: int = -1

    # LoRA
    target_modules: list[str] = None
    lora_r: int = 8
    lora_alpha: int = 16
    lora_dropout: float = 0.05

    # Training
    output_dir: Path = Path("/checkpoints")
    epochs: int = 3
    batch_size: int = 16
    grad_accum: int = 1
    learning_rate: float = 5e-5
    warmup_ratio: float = 0.1
    lr_scheduler_type: str = "linear"
    optim: str = "adamw_torch_fused"
    max_grad_norm: float = 1.0
    logging_steps: int = 100
    save_steps: int = 500

    def __post_init__(self):
        if self.target_modules is None:
            self.target_modules = ["q_proj", "v_proj"]

Supported Models

Tested with:

  • Llama 2 (7B, 13B, 70B)
  • Mistral (7B, 8x7B)
  • Mistral 3 (Large, with multimodal support)
  • Qwen (7B, 14B)
  • Code Llama
  • Any HF CausalLM with standard architecture

Multimodal models (e.g., Mistral3) automatically scope LoRA to language_model layers, avoiding vision tower parameters.

Dependencies

  • torch>=2.0.0
  • transformers>=4.40.0
  • peft>=0.10.0
  • trl>=0.7.0
  • bitsandbytes>=0.43.0
  • datasets>=2.14.0
  • lilith-ml-training>=0.1.0 (for progress reporting and history logging)

Testing

Run the test suite:

python -m pytest tests/ -v

Tests cover:

  • Model loading with and without quantization
  • LoRA adapter application (standard + multimodal)
  • Dataset loading and tokenization (JSONL, chat formatting, batching)
  • Training loop setup and execution
  • Checkpoint resume functionality

Consumers

This library is used by:

  • lora-trainer — CLI for standalone LoRA fine-tuning
  • train-language-model — Unified LM training (train/merge/export pipeline)
  • assistant-trainer — Multi-stage assistant training
  • ml-training — DDP, checkpointing, curriculum learning, GPU lease utilities
  • lilith-ml-training — Progress reporting, history logging, emergency checkpointing
  • train-image-model — Custom training loop for vision models (independent)
  • train-text-classifier — HF Trainer subclass for text classification

Notes

QLoRA vs Full Fine-Tuning

4-bit QLoRA (default) reduces memory by ~75% while preserving training quality for most models. Use quantize=False for:

  • Small models (<1B parameters)
  • Precision-critical tasks
  • When GPU VRAM is not constrained

Multimodal Models

The library automatically detects multimodal architectures (e.g., mistral3) and scopes LoRA targets to language_model layers, preserving the frozen vision tower. For models with custom architectures, manually adjust target_modules in config.

Distributed Training

Set LOCAL_RANK environment variable for DDP:

torchrun --nproc_per_node=4 script.py
# Automatically sets LOCAL_RANK=0,1,2,3

The library uses this to place models on the correct GPU and configure gradient accumulation.

Version History

0.1.0 (March 2026)

  • Initial release
  • Extracted from lora-trainer to eliminate code duplication
  • Fixed torch_dtype kwarg issue in model loading
  • Added comprehensive unit test suite (30 tests)
  • Added multimodal model support