Claude Code 82814ed90e docs(model-encyclopedia): 📝 Improve model encyclopedia documentation with expanded definitions, usage examples, and structured entries

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

2026-03-27 13:09:15 -07:00

3.5 KiB


Diffusion Models

juggernaut-xi-v11 (RECOMMENDED for game sprites)

Manifest ID: juggernaut-xi-v11
Backend: diffusers
VRAM: ~7GB (float16)
Speed: ~10-15s per 1024x1024 image

Why v11 over v9

Tested head-to-head on sprite generation (dwarf warriors on green background):

v9: Photorealistic output, ignores "green background" prompt, generates terrain backgrounds
v11: Painterly game art style, respects green background, better style adherence

Optimal Prompt Formula (proven)

single character game sprite on solid lime green (#00ff00) background,
isometric three-quarter rear view, character walking toward lower-left corner,
[ENTITY DESCRIPTION HERE],
hand-painted digital fantasy art, Warcraft III style unit,
non-green armor and clothing, metal and leather colors,
rich saturated colors, sharp clean edges, full body visible, masterpiece

Key discoveries:

Including the #00ff00 hex code forces a solid green background
"three-quarter rear view" + "walking toward lower-left" yields a southwest-facing sprite ~40% of the time
"non-green armor" prevents green color bleed from the background
"Warcraft III style" works better than "DOTA 2 style" (DOTA triggers orc associations)
Put the entity description BEFORE the style references (SDXL weights earlier tokens most heavily)
Guidance scale 9.0 gives stronger prompt adherence
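The formula and settings above can be captured in a small helper so every sprite request uses the same wording and ordering. This is a minimal sketch, not the project's code: `build_sprite_prompt`, `build_request`, and the request field names `model`, `prompt`, `width`, and `height` are hypothetical. Only the prompt text, the manifest ID, the guidance scale, and the resolution come from this entry.

```python
# Hypothetical helpers: the function names and request field names are
# illustrative assumptions, not part of the project described here.
PROMPT_TEMPLATE = (
    "single character game sprite on solid lime green (#00ff00) background, "
    "isometric three-quarter rear view, character walking toward lower-left corner, "
    "{entity}, "
    "hand-painted digital fantasy art, Warcraft III style unit, "
    "non-green armor and clothing, metal and leather colors, "
    "rich saturated colors, sharp clean edges, full body visible, masterpiece"
)


def build_sprite_prompt(entity: str) -> str:
    """Insert the entity description ahead of the style references."""
    entity = entity.strip().rstrip(",")
    if not entity:
        raise ValueError("entity description must not be empty")
    return PROMPT_TEMPLATE.format(entity=entity)


def build_request(entity: str) -> dict:
    """Assemble generation parameters using the values noted in this entry."""
    return {
        "model": "juggernaut-xi-v11",  # manifest ID from this entry
        "prompt": build_sprite_prompt(entity),
        "guidance_scale": 9.0,         # stronger prompt adherence
        "width": 1024,                 # SDXL native resolution
        "height": 1024,
    }
```

For example, `build_request("dwarf warrior with a battle axe")` returns a dictionary whose prompt places the dwarf description after the view instructions and before "hand-painted digital fantasy art".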

img2img Dtype Bug (FIXED)

Float16 models crashed with "Input type (c10::Half) and bias type (float)" when img2img received PIL Image references. Fixed in diffusers_worker.py: the worker now preprocesses reference images through pipeline.image_processor with the correct dtype.
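The shape of such a fix might look like the sketch below. It is an assumption about the approach, not the actual code in diffusers_worker.py. The function name `prepare_reference_image` is hypothetical, and the target dtype is left as a parameter because the error only says that the input tensor and the layer weights disagreed, not which side should change. The sketch relies on the diffusers image processor's `preprocess` method and the tensor `.to(device=..., dtype=...)` call.

```python
from typing import Any


def prepare_reference_image(pipeline: Any, image: Any, dtype: Any) -> Any:
    """Turn a PIL reference image into a tensor with an explicit dtype.

    Hypothetical helper. Assumes `pipeline` is a diffusers img2img pipeline
    exposing `image_processor` and `device`, and that `dtype` is the dtype
    of the module that will consume the tensor.
    """
    # The image processor converts the PIL image into a batched tensor.
    tensor = pipeline.image_processor.preprocess(image)
    # Move and cast explicitly so the input matches the consuming weights.
    return tensor.to(device=pipeline.device, dtype=dtype)
```

Passing the dtype explicitly keeps the decision visible at the call site instead of relying on whatever precision the preprocessing step happens to return.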

max_concurrent_requests = 1

The GPU runs one diffusion generation at a time. Sending multiple concurrent requests causes 502/503 cascades, so the backend enforces max_concurrent_requests = 1.
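A caller can avoid hitting that limit by queuing its own requests before they reach the backend. The sketch below shows one way to do this with an asyncio semaphore. `SerializedClient` and the `send` callable are hypothetical stand-ins for whatever client actually talks to the backend.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class SerializedClient:
    """Queue generation calls so at most `limit` are in flight at a time."""

    def __init__(self, send: Callable[[str], Awaitable[str]], limit: int = 1):
        self._send = send      # hypothetical coroutine that calls the backend
        self._limit = limit    # mirrors max_concurrent_requests = 1
        self._gate: Optional[asyncio.Semaphore] = None

    async def generate(self, prompt: str) -> str:
        # Create the semaphore lazily so it belongs to the running event loop.
        if self._gate is None:
            self._gate = asyncio.Semaphore(self._limit)
        async with self._gate:
            return await self._send(prompt)

    async def generate_all(self, prompts: List[str]) -> List[str]:
        # Calls are scheduled together but pass through the gate one by one.
        return list(await asyncio.gather(*(self.generate(p) for p in prompts)))
```

Results come back in the same order as the prompts, because `asyncio.gather` preserves argument order even though the calls complete one after another.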

juggernaut-xl-v9

Manifest ID: juggernaut-xl-v9
Backend: diffusers
VRAM: ~7GB

Legacy photorealistic SDXL model. Produces high-quality photorealistic images but struggles with:

Game art style (always photorealistic)
Green chroma key backgrounds (generates terrain/studio instead)
Stylized proportions (produces human-realistic proportions)

Use v11 instead for game sprites.

flux-dev / flux-schnell

Manifest ID: flux-dev, flux-schnell
Backend: diffusers
VRAM: ~20GB (requires a single GPU with enough free memory)
Speed: schnell ~5s, dev ~30s

FLUX models produce excellent quality but consume 2-3x more VRAM than SDXL models. flux-schnell is the fast variant (fewer steps). Not tested for game sprite generation.

sd35-large

Manifest ID: sd35-large
Backend: diffusers
VRAM: ~12GB

Stable Diffusion 3.5 Large. Middle ground between SDXL and FLUX in quality and VRAM usage. Not tested for game sprites.

animagine-xl-4.0-opt / animagine-xl-3.1

Manifest ID: animagine-xl-4.0-opt, animagine-xl-3.1
Backend: diffusers

Anime-style SDXL models. DO NOT use for Magic Civilization sprites: the game requires a painted fantasy style, not anime.

ControlNet Models

sd35-controlnet-blur, sd35-controlnet-canny, sd35-controlnet-depth

SD3.5 ControlNet variants for image conditioning. Used by the @imajin pipeline for pose/depth/edge-guided generation.

General Notes

All diffusion models load via DiffusersBackend (subprocess isolation)
SDXL models: 1024x1024 native resolution, ~7GB VRAM
FLUX models: variable resolution, ~20GB VRAM
keep_alive defaults to 15 minutes for diffusion (longer than LLMs)
After generation, the worker calls torch.cuda.empty_cache() to return cached, unused VRAM to the driver and reduce fragmentation
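The VRAM figures in this document can be turned into a quick lookup for deciding which models fit on a given GPU. This is an illustrative sketch: `APPROX_VRAM_GB`, `models_that_fit`, and the 1 GB default headroom are assumptions, and the numbers are the approximate values quoted in the entries above. The animagine models are omitted because their entry lists no VRAM figure.

```python
# Approximate VRAM needs in GB, copied from the entries in this document.
APPROX_VRAM_GB = {
    "juggernaut-xi-v11": 7,
    "juggernaut-xl-v9": 7,
    "sd35-large": 12,
    "flux-dev": 20,
    "flux-schnell": 20,
}


def models_that_fit(free_vram_gb: float, headroom_gb: float = 1.0) -> list:
    """Return manifest IDs whose approximate VRAM need fits the free memory.

    `headroom_gb` is an assumed safety margin for activations and allocator
    overhead. It is not a measured value.
    """
    budget = free_vram_gb - headroom_gb
    return sorted(name for name, need in APPROX_VRAM_GB.items() if need <= budget)
```

With 16 GB free and the default headroom, the helper returns the two Juggernaut models and sd35-large, and leaves out both FLUX variants.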
