06 — Architecture
Three pipelines, one system
The SEO engine combines three independent pipelines, each valuable on its own. The Content Pipeline generates and validates pages. The Verification Pipeline checks generated claims against client source documents using semantic RAG. The Image Pipeline generates contextual imagery with GPU-accelerated diffusion models. Together they produce complete, verified, illustrated pages at scale.
CITY DATABASE
GeoNames + OSM
ATTRIBUTE SCHEMA
Client-defined filters
CLIENT DOCS
700+ source files for RAG
SEO CONTENT PIPELINE
Route Generator
locales × categories × cities × filters
Content Generator
Self-hosted LLM (no API calls)
Source Verification + Schema Generation
RAG source-checking (economics, claims, competitors) → Schema.org JSON-LD structured data
Translation Engine
40+ languages
Static Builder
Static Build → CDN
SOURCE VERIFICATION PIPELINE
Embedding Service
Embedding Model
Vector Index
Fast Semantic Search
Semantic Source Validator
Pattern matching + RAG retrieval + claim verification
IMAGE GENERATION PIPELINE
(GPU-accelerated)
Prompt Builder
LLM → Diffusion prompt craft
Image Diffusion
GPU-powered generation
Processing
Resize, optimize, export image families
API Orchestrator
API service layer
STATIC HTML + IMAGES
CDN-ready pages with structured data, sitemaps, contextual imagery
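As a rough illustration of the structured-data step, the sketch below builds a Schema.org JSON-LD object for one generated page. The function name, the choice of the WebPage and City types, and the example URLs are assumptions made for illustration, not the engine's actual schema.

```python
import json

def build_json_ld(title, description, url, language, city, image_url):
    """Return a JSON-LD string describing one generated page (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebPage",          # assumed page type
        "name": title,
        "description": description,
        "url": url,
        "inLanguage": language,
        "about": {"@type": "City", "name": city},
        "image": image_url,
    }
    return json.dumps(data, ensure_ascii=False, indent=2)

snippet = build_json_ld(
    title="Family-friendly hotels in Berlin",
    description="Verified overview of family-friendly hotels in Berlin.",
    url="https://example.com/en/hotels/berlin/family-friendly/",
    language="en",
    city="Berlin",
    image_url="https://example.com/img/berlin-hero.jpg",
)
print(snippet)
```

In a static build, a string like this would typically be embedded in the page inside a script element of type application/ld+json.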
KEY ARCHITECTURAL DECISIONS
✔ Self-hosted LLM = near-zero marginal content cost at scale
✔ Self-hosted diffusion = near-zero marginal image cost (GPU amortized)
✔ RAG source verification = every claim checked against client sources, unverifiable claims flagged for review
✔ Static output = CDN-friendly, sub-second loads, high Lighthouse scores
✔ No cloud lock-in = runs on any Linux box with a GPU, consumer or server
✔ Each pipeline independently valuable and independently licensable
Each pipeline is independently valuable
⚙
SEO Content Pipeline
The orchestrator. Coordinates the other two pipelines and produces CDN-ready static HTML.
Route generation (locales × cities × categories × attributes)
Brand-aware prompt building with voice presets
Self-hosted LLM content creation
Source verification & schema generation
40+ language translation
Static build to CDN
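The route generation step above is a Cartesian product over four dimensions. A minimal sketch, assuming each route maps to one URL path; the path layout, function name, and sample values are hypothetical:

```python
from itertools import product

def generate_routes(locales, categories, cities, filters):
    """Yield one URL path per (locale, category, city, filter) combination."""
    for locale, category, city, flt in product(locales, categories, cities, filters):
        # Assumed path layout; the real URL scheme is not specified in the source.
        yield f"/{locale}/{category}/{city}/{flt}/"

routes = list(generate_routes(
    locales=["en", "de"],
    categories=["hotels", "restaurants"],
    cities=["berlin", "paris", "rome"],
    filters=["family-friendly"],
))
print(len(routes))  # 2 * 2 * 3 * 1 = 12
print(routes[0])    # /en/hotels/berlin/family-friendly/
```

Because the count is the product of the four list sizes, adding one locale or one filter multiplies the number of pages, which is why the marginal cost per page matters at scale.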
✔
Source Verification Pipeline
Every LLM-generated claim checked against client source documents via semantic RAG. Ensures the pipeline says what the client says.
Ingests client source docs (700+ files in current deployment)
Generates semantic embeddings for source matching
High-performance vector index for fast retrieval
Semantic matching validates economics, claims, competitors
Unverifiable claims flagged for human review
Independently licensable
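The verification flow above (embed sources, retrieve the closest passage, flag what cannot be matched) can be sketched as follows. This is a toy under stated assumptions: a bag-of-words vector stands in for the real embedding model, a linear scan stands in for the vector index, and the 0.5 threshold and all names are hypothetical. Similarity to a source passage is only a first filter, not proof that a claim is correct.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def verify_claims(claims, source_passages, threshold=0.5):
    """Match each claim to its closest source passage; flag weak matches for review."""
    index = [(p, embed(p)) for p in source_passages]  # linear scan instead of a vector index
    results = []
    for claim in claims:
        vec = embed(claim)
        best_passage, best_score = max(
            ((p, cosine(vec, pv)) for p, pv in index),
            key=lambda item: item[1],
            default=(None, 0.0),
        )
        status = "supported" if best_score >= threshold else "needs_review"
        results.append({"claim": claim, "status": status, "source": best_passage})
    return results

sources = [
    "Delivery takes two business days in all supported cities.",
    "The service is available in 40 languages.",
]
claims = ["Delivery takes two business days.", "We offer a lifetime warranty."]
results = verify_claims(claims, sources)
for r in results:
    print(r["status"], "-", r["claim"])
```

The second claim has no overlap with any source passage, so it falls below the threshold and is routed to human review rather than published.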
🎨
Image Generation Pipeline
GPU-accelerated diffusion generates 9 image families per page from a single seed. Not stock photos.
9 families: square, hero, portrait, OG, compact, tall, ultrawide, sidebar, header
Same seed = visual cohesion across all layouts
Different aspect ratios = optimized composition per context
LLM-crafted category-aware, city-atmospheric prompts
Art-directed per viewport and device
Independently licensable
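One way to picture the nine families is as a list of render jobs that share a prompt and seed but differ in dimensions. The sketch below only builds the job descriptions; the pixel sizes, class name, and function name are assumptions, and the call into the diffusion model is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImageJob:
    family: str
    width: int
    height: int
    seed: int
    prompt: str

# Family names come from the list above; the pixel sizes are hypothetical.
FAMILY_SIZES = {
    "square": (1024, 1024),
    "hero": (1920, 1080),
    "portrait": (1080, 1350),
    "og": (1200, 630),
    "compact": (640, 480),
    "tall": (1080, 1920),
    "ultrawide": (2560, 1080),
    "sidebar": (600, 1200),
    "header": (1600, 400),
}

def build_jobs(prompt, seed):
    """Create one job per family, all sharing the same prompt and seed."""
    return [ImageJob(name, w, h, seed, prompt) for name, (w, h) in FAMILY_SIZES.items()]

jobs = build_jobs("evening street market in Lisbon, warm light", seed=1234)
print(len(jobs))                    # 9
print({job.seed for job in jobs})   # {1234}
```

Keeping the seed and prompt fixed while varying only width and height mirrors the idea of one visual concept rendered for each layout slot.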