|
Some checks failed
Publish to PyPI / Build and Publish (push) Failing after 46s
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com> |
||
|---|---|---|
| .forgejo/workflows | ||
| src/ml_trainer_lm | ||
| tests | ||
| .gitignore | ||
| pyproject.toml | ||
| README.md | ||
ml-trainer-lm — Shared LoRA/QLoRA Fine-Tuning Utilities
Canonical library for LoRA and QLoRA fine-tuning of HuggingFace causal language models.
Version: 0.1.0 Status: Stable License: Proprietary
Features
- Model Loading: Load HF causal LMs with 4-bit QLoRA or fp16 precision
- LoRA Application: Apply PEFT LoRA adapters with automatic multimodal support (vision-language models)
- Dataset Handling: Load JSONL files, format chat-style messages, batch tokenization
- Training Loop: Unified HF Trainer wrapper with gradient checkpointing and memory optimization
- DDP Support: Automatic distributed training setup via LOCAL_RANK environment variable
API
Model Loading
from ml_trainer_lm import load_model_for_training, apply_lora
# Load base model with 4-bit QLoRA (default)
model, tokenizer = load_model_for_training(config)
# Apply LoRA adapters
model = apply_lora(model, config)
print(model)
# PeftModelForCausalLM
# (base_model): AutoModelForCausalLM
# (lora_target_modules): ['q_proj', 'v_proj', ...]
Arguments (config object):
base_model(str): HF model ID (e.g.,"meta-llama/Llama-2-7b")quantize(bool): Apply 4-bit QLoRA (default:True)local_rank(int): DDP rank for device assignment (default:-1)target_modules(list[str]): LoRA target modules (default:["q_proj", "v_proj"])lora_r(int): LoRA rank (default:8)lora_alpha(int): LoRA scaling (default:16)lora_dropout(float): LoRA dropout (default:0.05)
Dataset Utilities
from ml_trainer_lm import load_jsonl, format_chat_messages, tokenize_dataset
# Load JSONL file
examples = load_jsonl(Path("data/train.jsonl"))
# Format messages (supports tokenizer.apply_chat_template or manual ChatML)
texts = format_chat_messages(examples, tokenizer)
# Tokenize dataset
dataset = tokenize_dataset(texts, tokenizer, max_length=2048)
print(dataset.keys())
# dict_keys(['input_ids', 'attention_mask', 'labels'])
Training Loop
from ml_trainer_lm import run_training
adapters_dir = run_training(
model=model,
tokenizer=tokenizer,
dataset=dataset,
config=config,
resume_from=None, # Optional checkpoint dir to resume from
)
print(f"Adapters saved to: {adapters_dir}")
# Adapters saved to: /output/lora-adapters
Configuration Example
from dataclasses import dataclass
from pathlib import Path
@dataclass
class LoraConfig:
# Model
base_model: str = "meta-llama/Llama-2-7b-hf"
quantize: bool = True
local_rank: int = -1
# LoRA
target_modules: list[str] = None
lora_r: int = 8
lora_alpha: int = 16
lora_dropout: float = 0.05
# Training
output_dir: Path = Path("/checkpoints")
epochs: int = 3
batch_size: int = 16
grad_accum: int = 1
learning_rate: float = 5e-5
warmup_ratio: float = 0.1
lr_scheduler_type: str = "linear"
optim: str = "adamw_torch_fused"
max_grad_norm: float = 1.0
logging_steps: int = 100
save_steps: int = 500
def __post_init__(self):
if self.target_modules is None:
self.target_modules = ["q_proj", "v_proj"]
Supported Models
Tested with:
- ✅ Llama 2 (7B, 13B, 70B)
- ✅ Mistral (7B, 8x7B)
- ✅ Mistral 3 (Large, with multimodal support)
- ✅ Qwen (7B, 14B)
- ✅ Code Llama
- ✅ Any HF CausalLM with standard architecture
Multimodal models (e.g., Mistral3) automatically scope LoRA to language_model layers, avoiding vision tower parameters.
Dependencies
torch>=2.0.0transformers>=4.40.0peft>=0.10.0trl>=0.7.0bitsandbytes>=0.43.0datasets>=2.14.0lilith-ml-training>=0.1.0(for progress reporting and history logging)
Testing
Run the test suite:
python -m pytest tests/ -v
Tests cover:
- Model loading with and without quantization
- LoRA adapter application (standard + multimodal)
- Dataset loading and tokenization (JSONL, chat formatting, batching)
- Training loop setup and execution
- Checkpoint resume functionality
Consumers
This library is used by:
lora-trainer— CLI for standalone LoRA fine-tuningtrain-language-model— Unified LM training (train/merge/export pipeline)assistant-trainer— Multi-stage assistant training
Related Packages
ml-training— DDP, checkpointing, curriculum learning, GPU lease utilitieslilith-ml-training— Progress reporting, history logging, emergency checkpointingtrain-image-model— Custom training loop for vision models (independent)train-text-classifier— HF Trainer subclass for text classification
Notes
QLoRA vs Full Fine-Tuning
4-bit QLoRA (default) reduces memory by ~75% while preserving training quality for most models. Use quantize=False for:
- Small models (<1B parameters)
- Precision-critical tasks
- When GPU VRAM is not constrained
Multimodal Models
The library automatically detects multimodal architectures (e.g., mistral3) and scopes LoRA targets to language_model layers, preserving the frozen vision tower. For models with custom architectures, manually adjust target_modules in config.
Distributed Training
Set LOCAL_RANK environment variable for DDP:
torchrun --nproc_per_node=4 script.py
# Automatically sets LOCAL_RANK=0,1,2,3
The library uses this to place models on the correct GPU and configure gradient accumulation.
Version History
0.1.0 (March 2026)
- Initial release
- Extracted from
lora-trainerto eliminate code duplication - Fixed
torch_dtypekwarg issue in model loading - Added comprehensive unit test suite (30 tests)
- Added multimodal model support