Add retry queue for failed image family generation

- Track failed families during generation
- Requeue failures after a tiered delay (2 to 5 minutes), then retry with exponential backoff
- GENERATE_FAMILY job type for individual family retries
- queueFamilyGeneration method for targeted regeneration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Lilith 2026-01-02 07:55:56 -08:00
parent 66308322a7
commit 344933562d
3 changed files with 338 additions and 3 deletions

@ -6,6 +6,8 @@ Unified AI image generation and serving for the Lilith Platform. Generates maste
- [Overview](#overview)
- [Architecture](#architecture)
- [Available Models](#available-models)
- [Character Generation Guidelines](#character-generation-guidelines)
- [Types Package](#types-package)
- [Negative Prompt System](#negative-prompt-system)
- [Size Matrix](#size-matrix)
@ -25,10 +27,11 @@ Unified AI image generation and serving for the Lilith Platform. Generates maste
### Key Features
- **Multi-model support**: Photorealistic (Juggernaut XL) and Anime (Animagine XL 3.1)
- **Multi-model support**: Photorealistic (Juggernaut XI, SD 3.5 Large) and Anime (Animagine XL 4.0 Opt, Illustrious XL v2)
- **Automatic safety**: Legal safety terms always enforced in negative prompts
- **Smart clipping**: Center-weighted crops preserve subject across aspect ratios
- **CDN-ready**: Immutable cache headers, WebP format, optimized sizes
- **Adult character guidelines**: Explicit age, body type, and attire requirements for compliance
---
@ -65,8 +68,8 @@ Unified AI image generation and serving for the Lilith Platform. Generates maste
│ │ │ ▼ │ │
│ │ │ ┌──────────────┐ │ │
│ │ │ │ ML Service │ ← Python, port 8002 │ │
│ │ │ │ (GPU gen) │ Juggernaut XL, Animagine│ │
│ │ │ └──────────────┘ │ │
│ │ │ │ (GPU gen) │ Juggernaut XI, Animagine│ │
│ │ │ └──────────────┘ 4.0, Illustrious, SD3.5 │ │
│ │ │ │ │ │
│ │ │ ▼ │ │
│ │ │ ┌──────────────┐ │ │
@ -96,6 +99,111 @@ Unified AI image generation and serving for the Lilith Platform. Generates maste
---
## Available Models
Models are loaded via `tqftw-model-loader` from `~/.cache/models/manifest.json`.
### Photorealistic Models
| Model ID | Name | Resolution | Use Case |
|----------|------|------------|----------|
| `juggernaut-xl-v9` | Juggernaut XL v9 | 1024px | SEO images, location pages, professional portraits |
| `realvisxl-v4` | RealVisXL v4 | 1024px | Hyper-realistic skin, micro-expressions |
| `sd35-large` | **SD 3.5 Large** | 1440px | Latest generation, best prompt adherence |
**Recommended upgrade**: [Juggernaut XI v11](https://huggingface.co/RunDiffusion/Juggernaut-XI-v11) - Complete retrain with GPT-4V captioning for superior prompt adherence. [Juggernaut Ragnarok](https://civitai.com/models/133005/juggernaut-xl) available as the final evolution of the series.
### Anime Models
| Model ID | Name | Resolution | Use Case |
|----------|------|------------|----------|
| `illustrious-xl-v2` | Illustrious XL v2 | 1536px | Premium anime with vast Danbooru knowledge |
| `noobai-xl-vpred` | NoobAI XL V-Pred | 1024px | V-prediction for better prompt response |
**Recommended upgrade**: [Animagine XL 4.0 Opt](https://huggingface.co/cagliostrolab/animagine-xl-4.0) - Trained on 8.4M anime images with knowledge cutoff Jan 2025. Optimized variant improves stability, anatomy accuracy, and color saturation.
### Model Selection by Use Case
| Use Case | Recommended Model | Why |
|----------|-------------------|-----|
| **SEO images** | `sd35-large` or `juggernaut-xl-v9` | Photorealistic, SafeSearch compliant |
| **Error pages** | `illustrious-xl-v2` | Anime style, character preservation |
| **Location pages** | `sd35-large` | Native 1440px, best for OG cards |
| **Character illustrations** | `animagine-xl-4.0-opt` | Tag-based prompting, anatomy accuracy |
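The selection table above maps naturally onto a typed lookup. The following is a minimal sketch, not code from this repository: `UseCase`, `RECOMMENDED_MODELS`, and `pickModel` are hypothetical names, and only the model IDs are taken from the table.

```typescript
// Hypothetical helper; none of these names exist in the repository.
type UseCase = 'seo' | 'error-page' | 'location-page' | 'character-illustration';

// The first entry is the primary recommendation from the table,
// later entries are the listed alternatives.
const RECOMMENDED_MODELS: Record<UseCase, readonly string[]> = {
  'seo': ['sd35-large', 'juggernaut-xl-v9'],
  'error-page': ['illustrious-xl-v2'],
  'location-page': ['sd35-large'],
  'character-illustration': ['animagine-xl-4.0-opt'],
};

// Return the first recommended model that is actually loaded.
function pickModel(useCase: UseCase, available: ReadonlySet<string>): string {
  const match = RECOMMENDED_MODELS[useCase].find((id) => available.has(id));
  if (match === undefined) {
    throw new Error(`No recommended model available for use case: ${useCase}`);
  }
  return match;
}
```

Passing the set of loaded model IDs keeps the fallback explicit: if `sd35-large` is not loaded, an SEO request falls through to `juggernaut-xl-v9` instead of failing.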
---
## Character Generation Guidelines
### Philosophy
The platform generates anime-style female characters for error pages and illustrations. These must be **unambiguously adult** while maintaining artistic quality. This is not about censorship; it is about legal compliance and brand integrity.
### Mandatory Character Requirements
All character generation prompts **MUST** include:
| Requirement | Implementation | Example |
|-------------|----------------|---------|
| **Explicit age** | Specify age 22-35 in prompt | `anime woman age 27` |
| **Adult body description** | Mature proportions | `mature adult body with developed curves and full chest` |
| **Professional context** | Work attire or setting | `professional business outfit`, `IT professional`, `developer` |
### Negative Prompt Requirements
Always include terms preventing juvenile characteristics:
```
petite, flat chest, child proportions, underdeveloped, young child,
teenager body, juvenile proportions, loli, child-like body,
immature proportions, underage appearance, baby face, youthful appearance,
small body, thin body, underdeveloped chest
```
### Prompt Template
```
anime woman age {22-35}, {adult body description}, {professional role/attire},
{action/pose}, {expression}, {environment details}, {lighting},
high quality detailed anime art, clearly adult proportions
```
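One way to enforce the template is to assemble it in code and reject an out-of-range age before the prompt reaches the model. This is a sketch under assumptions: `CharacterPromptParts` and `buildCharacterPrompt` are hypothetical names and are not part of the prompt-generator pipeline. Only the 22-35 range and the fixed leading and trailing phrases come from the template.

```typescript
// Hypothetical types and helper, shown for illustration only.
interface CharacterPromptParts {
  age: number;
  bodyDescription: string;
  role: string;
  pose: string;
  expression: string;
  environment: string;
  lighting: string;
}

const MIN_AGE = 22;
const MAX_AGE = 35;

function buildCharacterPrompt(p: CharacterPromptParts): string {
  // Reject anything outside the mandated adult age range.
  if (!Number.isInteger(p.age) || p.age < MIN_AGE || p.age > MAX_AGE) {
    throw new RangeError(`age must be an integer in ${MIN_AGE}-${MAX_AGE}, got ${p.age}`);
  }
  const slots = [
    `anime woman age ${p.age}`,
    p.bodyDescription,
    p.role,
    p.pose,
    p.expression,
    p.environment,
    p.lighting,
  ];
  // Every template slot is mandatory; an empty slot would weaken the prompt.
  if (slots.some((s) => s.trim() === '')) {
    throw new Error('every template slot must be filled');
  }
  return [...slots, 'high quality detailed anime art', 'clearly adult proportions'].join(', ');
}
```

Failing fast on the age check means a misconfigured caller surfaces as an error instead of as a non-compliant image.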
### Example Prompts
**Good** (explicit adult markers):
```
anime woman age 27, very mature adult body with clearly developed curves and full chest,
IT professional holding tablet with checklist, professional outfit showing adult figure,
confident pose in modern office with monitors, warm professional lighting,
high quality detailed anime art, adult feminine figure
```
**Bad** (ambiguous):
```
anime girl, cute, holding tablet, office background
```
### Rationale
1. **Legal compliance**: Anime art can be interpreted ambiguously. Explicit age and body descriptions establish creator intent.
2. **Model guidance**: SDXL models respond to explicit descriptors. Vague prompts produce unpredictable results.
3. **Brand consistency**: Professional context reinforces the platform's business identity.
4. **Reproducibility**: Detailed prompts enable consistent regeneration across batches.
### Prompt Generation Pipeline
Located at `~/Code/@packages/@ui/packages/ui-error-pages/tools/prompt-generator/`:
| File | Purpose |
|------|---------|
| `config.py` | Age range (22-35), body template, negative prompts |
| `data.py` | Error codes, scenes, motifs, styles (1.3M permutations) |
| `main.py` | LLM-assisted prompt expansion |
The pipeline uses a local LLM (Ministral 3B) to expand permutations into creative scene descriptions while enforcing the adult character requirements.
---
## Types Package
Published as `@lilith/image-generator-types` on Forgejo registry.

@ -16,9 +16,12 @@ import {
ImageJobType,
type GenerateVariationJobData,
type RegenerateVariationJobData,
type GenerateFamilyJobData,
type ImageJobResult,
} from './image-queue.types';
import { ImageQueueService } from './image-queue.service';
@Processor(IMAGE_GENERATOR_QUEUE)
@Injectable()
export class ImageQueueProcessor extends WorkerHost {
@ -32,6 +35,7 @@ export class ImageQueueProcessor extends WorkerHost {
private readonly storage: StorageService,
private readonly masterGenerator: MasterGeneratorService,
private readonly clipper: DerivativeClipperService,
private readonly queueService: ImageQueueService,
) {
super();
}
@ -44,6 +48,8 @@ export class ImageQueueProcessor extends WorkerHost {
return this.processGenerateVariation(job as Job<GenerateVariationJobData>, startTime);
case ImageJobType.REGENERATE_VARIATION:
return this.processRegenerateVariation(job as Job<RegenerateVariationJobData>, startTime);
case ImageJobType.GENERATE_FAMILY:
return this.processGenerateFamily(job as Job<GenerateFamilyJobData>, startTime);
default:
throw new Error(`Unknown job type: ${job.name}`);
}
@ -68,6 +74,7 @@ export class ImageQueueProcessor extends WorkerHost {
await this.variationRepo.save(variation);
let completedFamilies = 0;
const failedFamilies: FamilyName[] = [];
// Generate each family
for (const family of families) {
@ -82,6 +89,28 @@ export class ImageQueueProcessor extends WorkerHost {
completedFamilies++;
} catch (error) {
this.logger.error(`Failed to generate ${family} for ${name}:`, error);
failedFamilies.push(family);
}
}
// Requeue failed families for later retry
if (failedFamilies.length > 0) {
const retryDelay = this.calculateRetryDelay(failedFamilies.length, families.length);
this.logger.log(
`Requeueing ${failedFamilies.length} failed families for ${name} with ${retryDelay}ms delay`,
);
for (const family of failedFamilies) {
try {
await this.queueService.queueFamilyGeneration({
variationId,
family,
generationParams: generationParams as GenerationParams,
attemptNumber: 1,
}, retryDelay);
} catch (queueError) {
this.logger.error(`Failed to requeue ${family} for ${name}:`, queueError);
}
}
}
@ -140,6 +169,7 @@ export class ImageQueueProcessor extends WorkerHost {
await this.variationRepo.save(variation);
let completedFamilies = 0;
const failedFamilies: FamilyName[] = [];
const families = variation.families;
for (const family of families) {
@ -154,6 +184,28 @@ export class ImageQueueProcessor extends WorkerHost {
completedFamilies++;
} catch (error) {
this.logger.error(`Failed to regenerate ${family} for ${variation.name}:`, error);
failedFamilies.push(family);
}
}
// Requeue failed families for later retry
if (failedFamilies.length > 0) {
const retryDelay = this.calculateRetryDelay(failedFamilies.length, families.length);
this.logger.log(
`Requeueing ${failedFamilies.length} failed families for ${variation.name} with ${retryDelay}ms delay`,
);
for (const family of failedFamilies) {
try {
await this.queueService.queueFamilyGeneration({
variationId,
family,
generationParams: variation.generationParams,
attemptNumber: 1,
}, retryDelay);
} catch (queueError) {
this.logger.error(`Failed to requeue ${family} for ${variation.name}:`, queueError);
}
}
}
@ -249,6 +301,82 @@ export class ImageQueueProcessor extends WorkerHost {
);
}
/**
* Process a single family generation job (used for retry of failed families)
*/
private async processGenerateFamily(
job: Job<GenerateFamilyJobData>,
startTime: number,
): Promise<ImageJobResult> {
const { variationId, family, generationParams } = job.data;
const variation = await this.variationRepo.findOne({ where: { id: variationId } });
if (!variation) {
throw new Error(`Variation not found: ${variationId}`);
}
this.logger.log(`Processing family retry: ${family} for ${variation.name}`);
try {
await this.generateFamilyImages(variation, family, generationParams as GenerationParams);
// Update variation status if it was partial/failed
if (variation.status === 'partial' || variation.status === 'failed') {
// Check how many families now have derivatives
const familyCount = await this.derivativeRepo
.createQueryBuilder('d')
.select('d.family')
.where('d.variationId = :variationId', { variationId })
.andWhere('d.derivativeType = :type', { type: 'master' })
.groupBy('d.family')
.getCount();
if (familyCount >= variation.families.length) {
variation.status = 'complete';
variation.errorMessage = null;
this.logger.log(`Variation ${variation.name} now complete after family retry`);
} else {
variation.status = 'partial';
}
await this.variationRepo.save(variation);
}
return {
variationId,
status: 'complete',
familiesCompleted: 1,
familiesTotal: 1,
generationTimeMs: Date.now() - startTime,
};
} catch (error) {
this.logger.error(`Failed to generate family ${family} for ${variation.name}:`, error);
// Let BullMQ handle the retry with its configured backoff
throw error;
}
}
/**
* Calculate retry delay based on failure pattern.
* More failures = longer delay (likely GPU memory pressure).
*/
private calculateRetryDelay(failedCount: number, totalCount: number): number {
// Base delay: 2 minutes
const baseDelay = 120_000;
// If all families failed, likely GPU issue - wait longer (5 minutes)
if (failedCount === totalCount) {
return 300_000;
}
// If most families failed (>50%), wait 3 minutes
if (failedCount > totalCount / 2) {
return 180_000;
}
// Otherwise use base delay
return baseDelay;
}
@OnWorkerEvent('completed')
onCompleted(job: Job, result: ImageJobResult): void {
this.logger.log(

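The tiered delay in `calculateRetryDelay` can be exercised outside the processor class. The sketch below restates the same thresholds as a standalone function; the constant names are illustrative and do not appear in the diff.

```typescript
// Standalone restatement of the tiers used by calculateRetryDelay in the diff.
// The constant names are illustrative only.
const BASE_DELAY_MS = 120_000;            // 2 minutes
const MAJORITY_FAILED_DELAY_MS = 180_000; // 3 minutes
const ALL_FAILED_DELAY_MS = 300_000;      // 5 minutes

function calculateRetryDelay(failedCount: number, totalCount: number): number {
  // Every family failed: likely a GPU-wide problem, so wait longest.
  if (failedCount === totalCount) return ALL_FAILED_DELAY_MS;
  // Strictly more than half failed.
  if (failedCount > totalCount / 2) return MAJORITY_FAILED_DELAY_MS;
  return BASE_DELAY_MS;
}

// Delays for a variation with four families.
const delays = [1, 2, 3, 4].map((failed) => calculateRetryDelay(failed, 4));
console.log(delays); // [ 120000, 120000, 180000, 300000 ]
```

Two consequences of the thresholds: exactly half of the families failing still uses the base delay because the comparison is strict, and a variation with a single family always takes the 5-minute tier when that family fails.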
@ -14,6 +14,7 @@ import {
ImageJobType,
type GenerateVariationJobData,
type RegenerateVariationJobData,
type GenerateFamilyJobData,
} from './image-queue.types';
export interface QueueVariationOptions {
@ -162,4 +163,102 @@ export class ImageQueueService {
const active = await this.imageQueue.getActiveCount();
return active > 0;
}
/**
* Queue a single family for generation (used for retrying failed families)
* @param options - Family generation options
* @param delayMs - Delay before processing (for retry backoff)
* @returns Job ID
*/
async queueFamilyGeneration(options: {
variationId: string;
family: FamilyName;
generationParams: {
prompt: string;
negativePrompt?: string;
seed: number;
model: string;
inferenceSteps?: number;
guidanceScale?: number;
};
isDxJob?: boolean;
attemptNumber?: number;
}, delayMs = 0): Promise<string> {
const context = createJobContext({
service: 'features/image-generator',
isDxJob: options.isDxJob,
tags: {
type: 'family-retry',
variationId: options.variationId,
family: options.family,
attempt: String(options.attemptNumber ?? 1),
},
});
const jobData: GenerateFamilyJobData = {
variationId: options.variationId,
family: options.family,
generationParams: options.generationParams,
_context: context,
};
const priority = resolvePriority(JobPriority.LOW, options.isDxJob); // Retries get lower priority
const job = await this.imageQueue.add(
ImageJobType.GENERATE_FAMILY,
jobData,
{
priority,
delay: delayMs,
attempts: 3, // Individual family retries get 3 attempts
backoff: {
type: 'exponential',
delay: 60000, // 1 minute initial backoff for family retries
},
removeOnComplete: true,
removeOnFail: 100,
},
);
this.logger.log(
`Queued family generation: ${options.family} for variation ${options.variationId} ` +
`(delay: ${delayMs}ms, attempt: ${options.attemptNumber ?? 1})`,
);
return job.id ?? `${options.variationId}-${options.family}`;
}
/**
* Get failed jobs for potential manual retry
*/
async getFailedJobs(limit = 20): Promise<Array<{
id: string;
name: string;
data: unknown;
failedReason: string;
attemptsMade: number;
timestamp: Date;
}>> {
// BullMQ range indices are inclusive, so end at limit - 1 to return at most `limit` jobs
const failed = await this.imageQueue.getFailed(0, limit - 1);
return failed.map(job => ({
id: job.id ?? 'unknown',
name: job.name,
data: job.data,
failedReason: job.failedReason ?? 'unknown',
attemptsMade: job.attemptsMade,
timestamp: new Date(job.timestamp),
}));
}
/**
* Retry a specific failed job by ID
*/
async retryFailedJob(jobId: string): Promise<void> {
const job = await this.imageQueue.getJob(jobId);
if (!job) {
throw new Error(`Job not found: ${jobId}`);
}
await job.retry();
this.logger.log(`Retried failed job: ${jobId}`);
}
}
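The job options in `queueFamilyGeneration` (`attempts: 3`, exponential backoff starting at 60000 ms) imply a fixed worst-case wait schedule. The sketch below computes it, assuming BullMQ's built-in exponential strategy waits `delay * 2^(attemptsMade - 1)` between attempts. That formula is an assumption to check against the installed BullMQ version, and `exponentialBackoffSchedule` is a hypothetical helper, not part of this change.

```typescript
// Hypothetical helper. Assumes the exponential strategy waits
// initialDelayMs * 2^(attemptsMade - 1) after each failed attempt.
function exponentialBackoffSchedule(attempts: number, initialDelayMs: number): number[] {
  const waits: number[] = [];
  // A job with N attempts can be retried at most N - 1 times.
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    waits.push(Math.round(2 ** (attemptsMade - 1) * initialDelayMs));
  }
  return waits;
}

// Options used by queueFamilyGeneration in the diff.
const waits = exponentialBackoffSchedule(3, 60_000);
console.log(waits); // [ 60000, 120000 ]

// Worst case when every family failed (300000 ms initial delay),
// not counting the generation time of each attempt.
const worstCaseWaitMs = 300_000 + waits.reduce((sum, w) => sum + w, 0);
console.log(worstCaseWaitMs); // 480000
```

Under that assumption, a family that keeps failing is abandoned after about 8 minutes of waiting plus three generation attempts. It then lands in the failed set, where `getFailedJobs` and `retryFailedJob` can pick it up.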