Truth Validation Feature

Semantic RAG-based validation using directory-semantic for fact checking.

Purpose

Validate content claims against the authoritative ./docs directory using semantic similarity search. Instead of template-based pattern matching, this uses embeddings and vector search to find relevant documentation for any validation query.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      SEMANTIC VALIDATION                        │
├─────────────────────────────────────────────────────────────────┤
│  1. Content received at POST /api/truth/validate                │
│  2. Semantic search against indexed ./docs                      │
│  3. Score-based validation:                                     │
│     - score > 0.75: VALID (high confidence match)               │
│     - score 0.5-0.75: REVIEW (uncertain, return context)        │
│     - score < 0.5: NO MATCH (no relevant docs found)            │
│  4. Return matched docs + confidence scores                     │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   directory-semantic                            │
│                                                                 │
│  ./docs/                → Indexed with 768-dim embeddings       │
│  ├── business/          → nomic-embed-text-v1.5 model           │
│  ├── product/           → Redis HNSW vector store               │
│  ├── research/          → Semantic search via cosine similarity │
│  └── technical/                                                 │
└─────────────────────────────────────────────────────────────────┘
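The three score bands in the diagram can be sketched as a small classifier. This is a minimal illustration, not the service's code. The names classifyScore, cosineSimilarity and cosineDistanceToScore are hypothetical. Scoring exactly 0.75 or exactly 0.5 as REVIEW is an assumption, because the diagram does not pin down its boundaries. The sketch also assumes the vector store reports cosine distance, as RediSearch's COSINE metric does, so that value is converted back to a similarity before the thresholds are applied.

```typescript
// Hypothetical sketch of the score bands from the architecture diagram.
type ValidationStatus = "VALID" | "REVIEW" | "NO_MATCH";

// Thresholds mirror the diagram; boundary handling (> vs >=) is an assumption.
const VALIDATION_THRESHOLD = 0.75;
const REVIEW_THRESHOLD = 0.5;

// Cosine similarity between two embeddings (e.g. 768-dim nomic vectors).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// RediSearch's COSINE metric returns a distance (1 - cosine similarity);
// convert it back to a similarity score before applying the thresholds.
function cosineDistanceToScore(distance: number): number {
  return 1 - distance;
}

// Map a similarity score to one of the three bands.
function classifyScore(score: number): ValidationStatus {
  if (score > VALIDATION_THRESHOLD) return "VALID";
  if (score >= REVIEW_THRESHOLD) return "REVIEW";
  return "NO_MATCH";
}
```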

Why Semantic over Templates?

Old Approach (Template-based):

# Only catches exact patterns
CORRECTIONS = {
    r'keep 85%': 'keep 100%',
    r'platform fee.*15%': 'platform fee is $0',
}

Problems:

  • Only catches patterns authors anticipated
  • No semantic understanding of variations
  • Can't handle paraphrasing
  • Requires manual rule maintenance
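To make the first problem concrete, here is the same CORRECTIONS table ported to TypeScript for illustration (the original rules were Python regexes). The anticipated phrasing is corrected, but a simple paraphrase of the same claim slips straight past it. applyTemplates is a hypothetical name.

```typescript
// Illustrative TypeScript port of the old template-based CORRECTIONS table.
const CORRECTIONS: Array<[RegExp, string]> = [
  [/keep 85%/g, "keep 100%"],
  [/platform fee.*15%/g, "platform fee is $0"],
];

function applyTemplates(text: string): string {
  let out = text;
  for (const [pattern, replacement] of CORRECTIONS) {
    // A replacer function keeps "$" in the replacement from being
    // interpreted as a substitution token.
    out = out.replace(pattern, () => replacement);
  }
  return out;
}

// applyTemplates("Creators keep 85% of earnings")          -> corrected
// applyTemplates("Creators retain 85 percent of earnings") -> unchanged (paraphrase missed)
```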

New Approach (Semantic):

// Finds relevant docs by meaning
const result = await validator.validate("What percentage do creators keep?");
// Returns: docs/product/features/ONE_PLATFORM_ECOSYSTEM.md with "Keep 100%"

Benefits:

  • Understands meaning, not just patterns
  • Handles paraphrasing and variations
  • Tracks doc changes after a re-index (POST /api/truth/reindex)
  • No manual rule maintenance

Packages

| Package | Location | Purpose |
|---|---|---|
| @lilith/truth-semantic-service | semantic-service/ | TypeScript service (primary, port 41233) |
| @lilith/truth-client | client/typescript/ | TypeScript client with static fallback |
| lilith_truth_service | ml-service/ | Python service (deprecated, port 41232) |
| @lilith/truth-validation-admin | frontend-admin/ | Admin dashboard |
| @lilith/truth-validation-shared | shared/ | Shared types |

API Endpoints (Semantic Service)

| Endpoint | Method | Description |
|---|---|---|
| /api/truth/validate | POST | Validate content against docs |
| /api/truth/correct | POST | LLM-powered content correction |
| /api/truth/search | GET | Semantic search (?q=query&limit=10) |
| /api/truth/reindex | POST | Re-index the docs directory |
| /api/truth/summary | GET | Get index summary |
| /api/truth/status | GET | Check whether the docs are indexed |
| /api/truth/llm/health | GET | Check LLM service status |
| /health | GET | Health check |
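Request construction for these endpoints can be sketched with the standard URL and JSON APIs. The helpers searchUrl and jsonPost, and the PostInit type, are hypothetical. The base URL assumes the default port 41233. The returned tuple can be passed directly to fetch.

```typescript
// Hypothetical request builders for the semantic service HTTP API.
const BASE_URL = "http://localhost:41233"; // assumed default TRUTH_SEMANTIC_PORT

interface PostInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// GET /api/truth/search?q=...&limit=...
function searchUrl(query: string, limit: number, base = BASE_URL): string {
  const url = new URL("/api/truth/search", base);
  url.searchParams.set("q", query); // spaces are encoded as "+"
  url.searchParams.set("limit", String(limit));
  return url.toString();
}

// POST endpoints such as /api/truth/validate and /api/truth/correct take JSON bodies.
function jsonPost(path: string, body: unknown, base = BASE_URL): [string, PostInit] {
  return [
    new URL(path, base).toString(),
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  ];
}

// Usage (requires the service to be running):
// const [url, init] = jsonPost("/api/truth/validate", { content: "What's the platform fee?" });
// const res = await fetch(url, init);
```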

LLM-Powered Correction

The service includes an LLM-powered content corrector that runs GGUF models through lilith-llama-service (GPUBoss-coordinated inference).

How It Works

  1. Semantic Context: Content is searched against indexed docs to find relevant context
  2. LLM Analysis: Ministral 3B analyzes content with platform context
  3. Conservative Corrections: Only fixes two explicit problem types:
    • Claims that "Lilith takes X%" where X > 0 → corrected to 0%
    • Derogatory slurs (whore/hooker → sex worker)
  4. Preserves: Competitor facts, industry stats, UI text
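The corrections above are made by the LLM, not by regexes, but the fee rule is narrow enough to encode as an expected-output oracle when regression-testing the corrector against examples like the ones below. A sketch under that assumption: applyFeeRule is a hypothetical name, it only recognizes the literal "Lilith takes X%" phrasing, it deliberately leaves competitor sentences untouched, and it does not model the slur rule.

```typescript
// Hypothetical reference oracle for the documented fee rule only.
// Matches "Lilith takes <number>%" and rewrites any non-zero figure to 0%.
const LILITH_FEE = /\bLilith takes (\d+(?:\.\d+)?)%/g;

function applyFeeRule(content: string): string {
  return content.replace(LILITH_FEE, (match: string, pct: string) =>
    Number(pct) > 0 ? "Lilith takes 0%" : match,
  );
}

// Competitor claims such as "OnlyFans takes 20% from creators" do not match
// the pattern, so they pass through unchanged (mirroring "Preserves" above).
```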

Correction Examples

# Lilith fee error - WILL fix
curl -X POST http://localhost:41233/api/truth/correct \
  -H "Content-Type: application/json" \
  -d '{"content": "Lilith takes 20% commission"}'
# Response: corrected to "Lilith takes 0% commission"

# Competitor info - will NOT change (correct as-is)
curl -X POST http://localhost:41233/api/truth/correct \
  -H "Content-Type: application/json" \
  -d '{"content": "OnlyFans takes 20% from creators"}'
# Response: unchanged (competitor facts are correct)

Environment Variables (LLM)

# LLM inference via lilith-llama-service (GPUBoss-coordinated)
# Note: Service uses @lilith/service-addresses for URL discovery
# These env vars override for Docker/custom contexts
LLAMA_SERVICE_URL=http://localhost:41221   # lilith-llama-service endpoint
LLM_MODEL=default                          # Model ID (or 'default' for service default)
LLM_REASONING_MODEL=default                # Reasoning model ID
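The override order described in these comments (env vars win over the address discovered through @lilith/service-addresses) can be sketched as a pure function. resolveLlmConfig and LlmConfig are hypothetical names; the discovered URL is passed in so the sketch does not depend on the real package.

```typescript
// Hypothetical resolver: env vars override service discovery.
interface LlmConfig {
  serviceUrl: string;
  model: string;
  reasoningModel: string;
}

function resolveLlmConfig(
  env: Record<string, string | undefined>,
  discoveredUrl: string, // e.g. getServiceUrl('ml', 'llama-service')
): LlmConfig {
  return {
    serviceUrl: env.LLAMA_SERVICE_URL || discoveredUrl,
    model: env.LLM_MODEL || "default", // "default" defers to the service default
    reasoningModel: env.LLM_REASONING_MODEL || "default",
  };
}

// Usage: resolveLlmConfig(process.env, getServiceUrl('ml', 'llama-service'))
```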

Usage

Starting the Service

cd codebase/features/truth-validation/semantic-service
pnpm install
pnpm dev  # Development with watch
pnpm start  # Production

Environment Variables

TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/lilith-platform/docs

API Examples

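Validate content. Note that "valid" reports a high-confidence semantic match (a score above the validation threshold), not factual agreement: in this example, a claim of 85% matches a doc that says 100%.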

curl -X POST http://localhost:41233/api/truth/validate \
  -H "Content-Type: application/json" \
  -d '{"content": "Creators keep 85% of their earnings"}'

# Response:
{
  "valid": true,
  "confidence": 0.89,
  "relevantDocs": [
    {
      "path": "product/features/ONE_PLATFORM_ECOSYSTEM.md",
      "score": 0.89,
      "excerpt": "## Keep 100% of Your Earnings..."
    }
  ],
  "query": "Creators keep 85% of their earnings"
}

Search docs:

curl "http://localhost:41233/api/truth/search?q=platform+fees&limit=5"

# Response:
{
  "results": [
    {
      "path": "business/pitch-deck/REVENUE_MODEL.md",
      "score": 0.85,
      "excerpt": "..."
    }
  ],
  "query": "platform fees",
  "totalResults": 5
}
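The response shapes in these examples can be captured as TypeScript types with runtime guards, which helps when consuming the HTTP API without the client package. The field names come from the example responses; the type and guard names are hypothetical, and real responses may carry extra fields, which the guards ignore.

```typescript
// Shapes inferred from the example /validate and /search responses.
interface DocMatch {
  path: string;
  score: number;
  excerpt: string;
}

interface ValidateResponse {
  valid: boolean;
  confidence: number;
  relevantDocs: DocMatch[];
  query: string;
}

interface SearchResponse {
  results: DocMatch[];
  query: string;
  totalResults: number;
}

function isDocMatch(x: unknown): x is DocMatch {
  if (typeof x !== "object" || x === null) return false;
  const d = x as Record<string, unknown>;
  return typeof d.path === "string" && typeof d.score === "number" && typeof d.excerpt === "string";
}

function isValidateResponse(x: unknown): x is ValidateResponse {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  const docs = r.relevantDocs;
  return typeof r.valid === "boolean" && typeof r.confidence === "number" &&
    typeof r.query === "string" && Array.isArray(docs) && docs.every(isDocMatch);
}

function isSearchResponse(x: unknown): x is SearchResponse {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  const results = r.results;
  return Array.isArray(results) && results.every(isDocMatch) &&
    typeof r.query === "string" && typeof r.totalResults === "number";
}
```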

Library Usage

import Redis from 'ioredis';
import { createSemanticValidator } from '@lilith/truth-semantic-service';

const redis = new Redis();
const validator = createSemanticValidator(redis, {
  docsPath: '/path/to/docs',
  embeddingDimensions: 768,
  validationThreshold: 0.75,
});

await validator.initialize();

const result = await validator.validate("What's the platform fee?");
console.log(result.valid, result.confidence, result.relevantDocs);

Docs Directory Structure

The service indexes ./docs with 728 files:

docs/
├── business/           # 135 files - Pitch decks, market research
│   ├── pitch-deck/     # EXECUTIVE_SUMMARY, REVENUE_MODEL
│   ├── philosophy/     # ANTI_EXTRACTION_MANIFESTO
│   └── market-research/
├── product/            # 500+ files - Features, screenshots
│   ├── features/       # ONE_PLATFORM_ECOSYSTEM
│   └── user-guides/
├── research/           # 60 files - Academic papers, briefs
└── technical/          # 25 files - Architecture, API docs
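To check whether the per-folder counts above are still current, a small walker can tally files under each top-level folder. It counts every regular file regardless of extension, which is an assumption: the indexer itself may filter by file type. countFilesByCategory and countFiles are hypothetical names.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively count regular files under a directory.
function countFiles(dir: string): number {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) total += countFiles(full);
    else if (entry.isFile()) total += 1;
  }
  return total;
}

// Tally files per top-level folder (business/, product/, research/, technical/).
function countFilesByCategory(docsRoot: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of fs.readdirSync(docsRoot, { withFileTypes: true })) {
    if (entry.isDirectory()) {
      counts[entry.name] = countFiles(path.join(docsRoot, entry.name));
    }
  }
  return counts;
}

// Usage: console.log(countFilesByCategory(process.env.DOCS_PATH ?? "./docs"));
```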

Integration Points

  • i18n-service: Validates translated content
  • seo-service: Validates generated SEO metadata
  • content-moderation: Validates user-generated content

Configuration

# Semantic Service
TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/docs

# Thresholds
VALIDATION_THRESHOLD=0.75  # Minimum score for VALID
REVIEW_THRESHOLD=0.5       # Minimum score for REVIEW
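A config loader for these variables can be sketched as follows. The defaults are taken from the example values above; treating DOCS_PATH as required and rejecting a review threshold that is not below the validation threshold are assumptions, and loadConfig is a hypothetical name.

```typescript
// Hypothetical loader for the semantic service configuration.
interface SemanticServiceConfig {
  port: number;
  redisUrl: string;
  docsPath: string;
  validationThreshold: number;
  reviewThreshold: number;
}

function loadConfig(env: Record<string, string | undefined>): SemanticServiceConfig {
  const docsPath = env.DOCS_PATH;
  if (!docsPath) throw new Error("DOCS_PATH is required"); // assumption: no default
  const port = Number(env.TRUTH_SEMANTIC_PORT ?? 41233);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`invalid TRUTH_SEMANTIC_PORT: ${env.TRUTH_SEMANTIC_PORT}`);
  }
  const validationThreshold = Number(env.VALIDATION_THRESHOLD ?? 0.75);
  const reviewThreshold = Number(env.REVIEW_THRESHOLD ?? 0.5);
  // NaN fails every comparison, so non-numeric values are rejected here too.
  if (!(reviewThreshold >= 0 && reviewThreshold < validationThreshold && validationThreshold <= 1)) {
    throw new Error("expected 0 <= REVIEW_THRESHOLD < VALIDATION_THRESHOLD <= 1");
  }
  return {
    port,
    redisUrl: env.REDIS_URL ?? "redis://localhost:6379",
    docsPath,
    validationThreshold,
    reviewThreshold,
  };
}

// Usage: const config = loadConfig(process.env);
```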

Locale Validation CLI

Validate i18n locale files against platform truth facts using the LLM corrector.

Usage

cd codebase/features/truth-validation

# Validate and show issues (dry run)
pnpm validate:locales

# Validate with verbose output
pnpm validate:locales -- --verbose

# Validate and apply fixes
pnpm validate:locales:fix

# Use reasoning model for complex content
pnpm validate:locales -- --reasoning

Pre-commit Hook

Add to .husky/pre-commit or .git/hooks/pre-commit:

#!/bin/sh
# Validate staged locale files
cd codebase/features/truth-validation
pnpm precommit

Or use the precommit script directly:

pnpm precommit  # Only validates staged locale files
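One way a precommit step can restrict validation to staged locale files is to filter git's staged file list by the locale directory. This is a sketch, not the actual pnpm precommit script: the helper names are hypothetical, and the path prefix assumes git reports paths relative to a repository root that contains codebase/.

```typescript
import { execFileSync } from "node:child_process";

const LOCALE_DIR = "codebase/features/i18n/locales/en/";

// Pure filter over `git diff --cached --name-only` output.
function stagedLocaleFiles(nameOnlyOutput: string): string[] {
  return nameOnlyOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((file) => file.startsWith(LOCALE_DIR) && file.endsWith(".json"));
}

// Staged files that were added, copied or modified (deletions are skipped).
function readStagedFiles(): string {
  return execFileSync("git", ["diff", "--cached", "--name-only", "--diff-filter=ACM"], {
    encoding: "utf8",
  });
}

// Usage: const files = stagedLocaleFiles(readStagedFiles());
```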

What Gets Validated

The CLI validates all JSON files in codebase/features/i18n/locales/en/:

| File type | Example | Validation focus |
|---|---|---|
| Common strings | common.json | UI text, error messages |
| Landing pages | landing-*.json | Marketing claims |
| Company pages | company-*.json | Investor facts, values |
| Feature pages | features-*.json | Product descriptions |

Output Example

📄 common.json (49 strings)
  ✅ No issues found

📄 company-investor.json (35 strings)
  ⚠ Found 1 suggested change(s):

  [stats[0].label] (confidence: 100%)
    fact: "20%" → "0%"
           Reason: Lilith charges 0% commission

════════════════════════════════════════════════
SUMMARY
════════════════════════════════════════════════
Files scanned: 24
Files with issues: 1
Total suggested changes: 1
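The key paths in this output (such as stats[0].label) and the per-file string counts suggest that each nested locale file is flattened into individual strings before validation. A sketch of that flattening under this assumption, using the same path style: flattenStrings is a hypothetical name, and non-string leaves (numbers, booleans, null) are skipped.

```typescript
// Flatten nested locale JSON into [keyPath, text] pairs, e.g. "stats[0].label".
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function flattenStrings(value: Json, prefix = ""): Array<[string, string]> {
  if (typeof value === "string") return [[prefix, value]];
  if (Array.isArray(value)) {
    // Array elements use bracket notation: stats[0], stats[1], ...
    return value.flatMap((item, i) => flattenStrings(item, `${prefix}[${i}]`));
  }
  if (value !== null && typeof value === "object") {
    // Object keys use dot notation: stats[0].label
    return Object.entries(value).flatMap(([key, child]) =>
      flattenStrings(child, prefix ? `${prefix}.${key}` : key),
    );
  }
  return []; // numbers, booleans and null carry no translatable text
}

// Usage: flattenStrings(JSON.parse(fs.readFileSync("company-investor.json", "utf8"))).length
```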

LLM Provider Packages

Reusable LLM provider clients are available as separate packages:

| Package | Location | Language |
|---|---|---|
| @lilith/ml-provider-clients | @packages/@ml/provider-clients | TypeScript |
| lilith-llama-service | @packages/@ml/llama-service | Python |

TypeScript Usage

import { createLlamaServiceProvider } from '@lilith/ml-provider-clients';
import { getServiceUrl } from '@lilith/service-addresses';

const provider = createLlamaServiceProvider({
  endpoint: getServiceUrl('ml', 'llama-service'),  // http://localhost:41221
  model: 'ministral-14b-reasoning',  // Optional; omit to use the service default
  maxTokens: 1024,
  temperature: 0.3,
});

await provider.sendMessage(
  { messages: [{ role: 'user', content: 'Hello' }] },
  (event) => {
    if (event.type === 'chunk') console.log(event.content);
  }
);
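If you need the full reply rather than printing each chunk, a collector can accumulate the chunk events. StreamEvent is a hypothetical structural type inferred from the event.type === 'chunk' and event.content usage above; prefer the package's own event type if it exports one.

```typescript
// Hypothetical event shape inferred from the callback above.
interface StreamEvent {
  type: string;
  content?: string;
}

function createChunkCollector() {
  const parts: string[] = [];
  return {
    // Pass this as the sendMessage callback; non-chunk events are ignored.
    onEvent(event: StreamEvent): void {
      if (event.type === "chunk" && typeof event.content === "string") {
        parts.push(event.content);
      }
    },
    // Full text received so far.
    text(): string {
      return parts.join("");
    },
  };
}

// Usage:
// const collector = createChunkCollector();
// await provider.sendMessage({ messages: [{ role: 'user', content: 'Hello' }] }, collector.onEvent);
// console.log(collector.text());
```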

Python Usage

import requests
from lilith_service_addresses import get_service_url

llm_url = get_service_url('ml', 'llama-service')  # http://localhost:41221

# Direct HTTP call to lilith-llama-service
response = requests.post(f"{llm_url}/chat", json={
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False
}, timeout=60)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json()["content"])

Requirements

  • Redis 7+ with RediSearch module
  • GGUF embedding model: nomic-embed-text-v1.5.Q8_0.gguf
  • lilith-llama-service running on port 41221 (GPUBoss-coordinated LLM inference)
  • GPU (optional): CUDA for fast embeddings and LLM inference