model-boss/docs/model-encyclopedia/diffusion.md
2026-03-27 13:09:15 -07:00

Diffusion Models

  • Manifest ID: juggernaut-xi-v11
  • Backend: diffusers
  • VRAM: ~7GB (float16)
  • Speed: ~10-15s per 1024x1024 image

Why v11 over v9

Tested head-to-head on sprite generation (dwarf warriors on green background):

  • v9: Photorealistic output, ignores "green background" prompt, generates terrain backgrounds
  • v11: Painterly game art style, respects green background, better style adherence

Optimal Prompt Formula (proven)

single character game sprite on solid lime green (#00ff00) background,
isometric three-quarter rear view, character walking toward lower-left corner,
[ENTITY DESCRIPTION HERE],
hand-painted digital fantasy art, Warcraft III style unit,
non-green armor and clothing, metal and leather colors,
rich saturated colors, sharp clean edges, full body visible, masterpiece

Key discoveries:

  • The #00ff00 hex code forces a solid green background
  • "three-quarter rear view" + "walking toward lower-left" yields a southwest-facing sprite ~40% of the time
  • "non-green armor" prevents green color bleed from the background
  • "Warcraft III style" works better than "DOTA 2 style" (DOTA triggers orc associations)
  • Put the entity description BEFORE the style references (SDXL weights early tokens most heavily)
  • Use guidance scale 9.0 for stronger prompt adherence
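
The formula and settings above can be wrapped in a small helper. This is a minimal sketch, not part of model-boss: `build_sprite_request`, `SPRITE_PROMPT_TEMPLATE`, and the returned dict shape are hypothetical. The keys `prompt`, `guidance_scale`, `width`, and `height` mirror the usual diffusers SDXL call arguments; whether the backend forwards them unchanged is an assumption.

```python
# Prompt formula from this page; {entity} sits before the style references.
SPRITE_PROMPT_TEMPLATE = (
    "single character game sprite on solid lime green (#00ff00) background, "
    "isometric three-quarter rear view, character walking toward lower-left corner, "
    "{entity}, "
    "hand-painted digital fantasy art, Warcraft III style unit, "
    "non-green armor and clothing, metal and leather colors, "
    "rich saturated colors, sharp clean edges, full body visible, masterpiece"
)


def build_sprite_request(entity_description, guidance_scale=9.0, size=1024):
    """Return generation settings for one sprite (hypothetical helper)."""
    # Collapse whitespace and trim stray commas so the template stays well formed.
    entity = " ".join(entity_description.split()).strip(" ,")
    if not entity:
        raise ValueError("entity_description must not be empty")
    return {
        "prompt": SPRITE_PROMPT_TEMPLATE.format(entity=entity),
        "guidance_scale": guidance_scale,  # 9.0 for stronger prompt adherence
        "width": size,                     # SDXL native resolution is 1024x1024
        "height": size,
    }
```

For example, `build_sprite_request("dwarf warrior with war hammer")` returns a dict whose prompt places the dwarf description ahead of "Warcraft III style unit".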

img2img Dtype Bug (FIXED)

Float16 models crashed with "Input type (c10::Half) and bias type (float)" when given PIL Image references for img2img. Fixed in diffusers_worker.py, which now preprocesses reference images through pipeline.image_processor so they reach the model with the correct dtype.
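
The shape of such a fix might look like the sketch below. It is not the actual code in diffusers_worker.py: `prepare_reference_image` is a hypothetical name, and using the pipeline's own `dtype` and `device` as the cast target is an assumption. It relies only on the `image_processor.preprocess(...)` method and the `device`/`dtype` properties that diffusers pipelines expose.

```python
def prepare_reference_image(pipeline, image, dtype=None):
    """Turn a PIL reference image into a tensor the pipeline can consume.

    Hypothetical helper: `pipeline` is expected to be a diffusers img2img
    pipeline; `dtype` defaults to the pipeline's own dtype (assumption).
    """
    # preprocess() converts the PIL image into a batched tensor; its dtype is
    # typically float32, which is why an explicit cast is needed afterwards.
    tensor = pipeline.image_processor.preprocess(image)
    target_dtype = pipeline.dtype if dtype is None else dtype
    # Move and cast in one step so the tensor matches the model weights.
    return tensor.to(device=pipeline.device, dtype=target_dtype)
```

The returned tensor would then be passed as the `image` argument of the img2img call in place of the raw PIL image.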

max_concurrent_requests = 1

Diffusion generation runs one request at a time on the GPU. Sending multiple concurrent requests causes cascading 502/503 errors, so the backend enforces max_concurrent_requests = 1.
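
Callers can avoid the 502/503 cascade by queuing on their side as well. The sketch below is one way to do that with an asyncio semaphore; `SerializedDiffusionClient` and the `send_request` coroutine are hypothetical stand-ins for whatever client code actually talks to the backend.

```python
import asyncio


class SerializedDiffusionClient:
    """Client-side gate that mirrors the backend's max_concurrent_requests = 1."""

    def __init__(self, send_request, max_concurrent_requests=1):
        # send_request: an async callable that performs the actual call (hypothetical).
        self._send_request = send_request
        self._max = max_concurrent_requests
        self._slots = None  # created lazily, inside the running event loop

    async def generate(self, payload):
        if self._slots is None:
            self._slots = asyncio.Semaphore(self._max)
        # Extra callers wait here instead of hitting the backend concurrently.
        async with self._slots:
            return await self._send_request(payload)
```

With this wrapper, `asyncio.gather` over many `generate(...)` calls still issues them to the backend one at a time.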


juggernaut-xl-v9

  • Manifest ID: juggernaut-xl-v9
  • Backend: diffusers
  • VRAM: ~7GB

Legacy photorealistic SDXL model. Produces high-quality photorealistic images but struggles with:

  • Game art style (always photorealistic)
  • Green chroma key backgrounds (generates terrain/studio instead)
  • Stylized proportions (produces human-realistic proportions)

Use v11 instead for game sprites.


flux-dev / flux-schnell

  • Manifest ID: flux-dev, flux-schnell
  • Backend: diffusers
  • VRAM: ~20GB (requires a single GPU with enough free memory)
  • Speed: schnell ~5s, dev ~30s

FLUX models produce excellent quality but need roughly 3x the VRAM of SDXL models (~20GB vs ~7GB). flux-schnell is the fast variant (fewer steps). Not tested for game sprite generation.


sd35-large

  • Manifest ID: sd35-large
  • Backend: diffusers
  • VRAM: ~12GB

Stable Diffusion 3.5 Large. Middle ground between SDXL and FLUX in quality and VRAM usage. Not tested for game sprites.


animagine-xl-4.0-opt / animagine-xl-3.1

  • Manifest ID: animagine-xl-4.0-opt, animagine-xl-3.1
  • Backend: diffusers

Anime-style SDXL models. DO NOT use for Magic Civilization sprites: the game requires a painted fantasy style, not anime.


ControlNet Models

sd35-controlnet-blur, sd35-controlnet-canny, sd35-controlnet-depth

SD3.5 ControlNet variants for image conditioning. Used by the @imajin pipeline for pose/depth/edge-guided generation.


General Notes

  • All diffusion models load via DiffusersBackend (subprocess isolation)
  • SDXL models: 1024x1024 native resolution, ~7GB VRAM
  • FLUX models: variable resolution, ~20GB VRAM
  • keep_alive defaults to 15 minutes for diffusion models (longer than for LLMs)
  • After generation, worker calls torch.cuda.empty_cache() to release fragmented VRAM
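
The VRAM figures on this page can be turned into a quick pre-flight check. This is an illustrative sketch only: `APPROX_VRAM_GB`, `models_that_fit`, and the `headroom_gb` safety margin are hypothetical, and the numbers are the approximate values quoted above, not measurements taken by the code.

```python
# Approximate VRAM needs in GB, copied from the sections above.
APPROX_VRAM_GB = {
    "juggernaut-xi-v11": 7,
    "juggernaut-xl-v9": 7,
    "sd35-large": 12,
    "flux-dev": 20,
    "flux-schnell": 20,
}


def models_that_fit(free_vram_gb, headroom_gb=1.0):
    """List manifest IDs whose approximate VRAM need fits on a single GPU."""
    # Keep some headroom for activations and allocator overhead (assumed margin).
    budget = free_vram_gb - headroom_gb
    return sorted(name for name, need in APPROX_VRAM_GB.items() if need <= budget)
```

On a GPU with 16GB free, for instance, this lists the two Juggernaut models and sd35-large but neither FLUX variant.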