Diffusion Models
juggernaut-xi-v11 (RECOMMENDED for game sprites)
Manifest ID: juggernaut-xi-v11
Backend: diffusers
VRAM: ~7GB (float16)
Speed: ~10-15s per 1024x1024 image
Why v11 over v9
Tested head-to-head on sprite generation (dwarf warriors on green background):
- v9: Photorealistic output, ignores "green background" prompt, generates terrain backgrounds
- v11: Painterly game art style, respects green background, better style adherence
Optimal Prompt Formula (proven)
single character game sprite on solid lime green (#00ff00) background,
isometric three-quarter rear view, character walking toward lower-left corner,
[ENTITY DESCRIPTION HERE],
hand-painted digital fantasy art, Warcraft III style unit,
non-green armor and clothing, metal and leather colors,
rich saturated colors, sharp clean edges, full body visible, masterpiece
Key discoveries:
- #00ff00 hex code forces solid green background
- "three-quarter rear view" + "walking toward lower-left" gets southwest facing ~40% of the time
- "non-green armor" prevents green color bleed from background
- "Warcraft III style" better than "DOTA 2 style" (DOTA triggers orc associations)
- Entity description BEFORE style references (SDXL weights early tokens most heavily)
- Guidance scale 9.0 for stronger prompt adherence
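The formula and discoveries above can be turned into a small prompt builder. This is a minimal sketch: the function name build_sprite_prompt and the GENERATION_SETTINGS dict are hypothetical and not part of the backend. The prompt fragments are copied from the formula, and the guidance scale and resolution come from these notes.

```python
# Fixed fragments copied from the prompt formula in this document.
PREFIX = (
    "single character game sprite on solid lime green (#00ff00) background, "
    "isometric three-quarter rear view, character walking toward lower-left corner"
)
STYLE = (
    "hand-painted digital fantasy art, Warcraft III style unit, "
    "non-green armor and clothing, metal and leather colors, "
    "rich saturated colors, sharp clean edges, full body visible, masterpiece"
)


def build_sprite_prompt(entity_description: str) -> str:
    """Place the entity description before the style references."""
    entity = entity_description.strip().rstrip(",")
    if not entity:
        raise ValueError("entity_description must not be empty")
    return ", ".join([PREFIX, entity, STYLE])


# Hypothetical request settings; the key names are assumptions,
# the values are the ones recorded in these notes.
GENERATION_SETTINGS = {"guidance_scale": 9.0, "width": 1024, "height": 1024}

prompt = build_sprite_prompt("dwarf warrior with a round shield and war hammer")
print(prompt)
```

Keeping the fragments as constants makes it harder to accidentally reorder them, which matters because the entity description is meant to come before the style references.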
img2img Dtype Bug (FIXED)
Float16 models crashed with "Input type (c10::Half) and bias type (float)" when receiving PIL Image references for img2img. Fixed in diffusers_worker.py, which now preprocesses reference images through pipeline.image_processor with the correct dtype.
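The shape of that fix can be sketched as follows. This is not the actual code from diffusers_worker.py; the helper name prepare_reference is hypothetical. It assumes a diffusers pipeline object that exposes image_processor.preprocess(), device, and dtype, and it simply casts the preprocessed tensor to the pipeline's own dtype so a float16 model never sees float32 input.

```python
def prepare_reference(pipeline, image):
    """Convert a PIL reference image into a tensor matching the pipeline.

    Hypothetical helper: `pipeline` is assumed to be a loaded diffusers
    img2img pipeline, `image` a PIL image.
    """
    # preprocess() turns the PIL image into a normalized float tensor.
    tensor = pipeline.image_processor.preprocess(image)
    # Move to the pipeline's device and dtype (e.g. float16) so the input
    # type matches the model weights.
    return tensor.to(device=pipeline.device, dtype=pipeline.dtype)
```

The returned tensor would then be passed as the image argument in place of the raw PIL image.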
max_concurrent_requests = 1
The GPU worker runs one diffusion job at a time. Sending multiple concurrent requests causes cascading 502/503 errors, so the backend enforces max_concurrent_requests = 1.
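Callers can avoid hitting that limit by serializing their own requests. The sketch below is a client-side illustration only: the SerializedClient class and the send callable are hypothetical, and the real request code is whatever the caller already uses. It wraps any async send function in an asyncio.Semaphore so that only one request is in flight at a time.

```python
import asyncio


class SerializedClient:
    """Allow at most `max_concurrent` in-flight requests (default 1)."""

    def __init__(self, send, max_concurrent=1):
        self._send = send              # hypothetical async callable: payload -> result
        self._max = max_concurrent
        self._semaphore = None         # created lazily inside the running event loop

    async def generate(self, payload):
        if self._semaphore is None:
            self._semaphore = asyncio.Semaphore(self._max)
        async with self._semaphore:
            # Only one coroutine at a time reaches the backend.
            return await self._send(payload)
```

Extra calls to generate() wait in the client instead of being rejected by the backend.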
juggernaut-xl-v9
Manifest ID: juggernaut-xl-v9
Backend: diffusers
VRAM: ~7GB
Legacy photorealistic SDXL model. Produces high-quality photorealistic images but struggles with:
- Game art style (always photorealistic)
- Green chroma key backgrounds (generates terrain/studio instead)
- Stylized proportions (produces human-realistic proportions)
Use v11 instead for game sprites.
flux-dev / flux-schnell
Manifest ID: flux-dev, flux-schnell
Backend: diffusers
VRAM: ~20GB (requires a single GPU with enough free memory)
Speed: schnell ~5s, dev ~30s
FLUX models produce excellent quality but consume 2-3x more VRAM than SDXL models. flux-schnell is the fast variant (fewer steps). Not tested for game sprite generation.
sd35-large
Manifest ID: sd35-large
Backend: diffusers
VRAM: ~12GB
Stable Diffusion 3.5 Large. Middle ground between SDXL and FLUX in quality and VRAM usage. Not tested for game sprites.
animagine-xl-4.0-opt / animagine-xl-3.1
Manifest ID: animagine-xl-4.0-opt, animagine-xl-3.1
Backend: diffusers
Anime-style SDXL models. DO NOT use for Magic Civilization sprites: the game requires a painted fantasy style, not anime.
ControlNet Models
sd35-controlnet-blur, sd35-controlnet-canny, sd35-controlnet-depth
SD3.5 ControlNet variants for image conditioning. Used by the @imajin pipeline for pose/depth/edge-guided generation.
General Notes
- All diffusion models load via DiffusersBackend (subprocess isolation)
- SDXL models: 1024x1024 native resolution, ~7GB VRAM
- FLUX models: variable resolution, ~20GB VRAM
- keep_alive defaults to 15 minutes for diffusion (longer than LLMs)
- After generation, worker calls torch.cuda.empty_cache() to release fragmented VRAM
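The VRAM figures in this document can be collected into a quick lookup for checking which models fit on a given GPU. This is a sketch: the APPROX_VRAM_GB table and models_that_fit function are hypothetical helpers, the numbers are the approximate values listed above, and real usage will vary with resolution and dtype. The animagine models are omitted because no figure is recorded for them.

```python
from typing import List

# Approximate VRAM needs in GB, taken from the model entries in these notes.
APPROX_VRAM_GB = {
    "juggernaut-xi-v11": 7,
    "juggernaut-xl-v9": 7,
    "sd35-large": 12,
    "flux-dev": 20,
    "flux-schnell": 20,
}


def models_that_fit(free_vram_gb: float) -> List[str]:
    """Return manifest IDs whose approximate VRAM need fits in free_vram_gb."""
    return sorted(
        model for model, need in APPROX_VRAM_GB.items() if need <= free_vram_gb
    )


print(models_that_fit(8))
print(models_that_fit(16))
```

The check does not leave headroom for other processes on the GPU, so treat a tight fit as a likely out-of-memory risk.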