Truth Validation Feature

Semantic, RAG-based fact checking built on directory-semantic.

Purpose

Validate content claims against the authoritative ./docs directory using semantic similarity search. Instead of template-based pattern matching, this uses embeddings and vector search to find relevant documentation for any validation query.
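To make the idea of similarity search concrete, here is a minimal sketch of ranking documents by cosine similarity. It is a brute-force stand-in for illustration only: the real service uses a Redis HNSW index, and the names `cosineSimilarity`, `rankDocs`, `IndexedDoc`, and `ScoredDoc` are hypothetical, not part of the service's API.

```typescript
// Hypothetical shapes for illustration; the real index lives in Redis.
interface IndexedDoc {
  path: string;
  embedding: number[];
}

interface ScoredDoc {
  path: string;
  score: number;
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  // A zero vector has no direction, so treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query and keep the best `limit` results.
function rankDocs(query: number[], docs: IndexedDoc[], limit: number): ScoredDoc[] {
  return docs
    .map((d) => ({ path: d.path, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

In practice the vectors would be 768-dimensional embeddings rather than the toy inputs a test would use, and an approximate index replaces the linear scan.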

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      SEMANTIC VALIDATION                        │
├─────────────────────────────────────────────────────────────────┤
│  1. Content received at POST /api/truth/validate                │
│  2. Semantic search against indexed ./docs                      │
│  3. Score-based validation:                                     │
│     - score > 0.75: VALID (high confidence match)               │
│     - score 0.5-0.75: REVIEW (uncertain, return context)        │
│     - score < 0.5: NO MATCH (no relevant docs found)            │
│  4. Return matched docs + confidence scores                     │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   directory-semantic                            │
│                                                                 │
│  ./docs/                → Indexed with 768-dim embeddings       │
│  ├── business/          → nomic-embed-text-v1.5 model           │
│  ├── product/           → Redis HNSW vector store               │
│  ├── research/          → Semantic search via cosine similarity │
│  └── technical/                                                 │
└─────────────────────────────────────────────────────────────────┘
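The score bands in the diagram can be expressed as a small function. This is a sketch under one assumption the diagram leaves open: scores exactly equal to 0.75 or 0.5 are treated as REVIEW. The names `Verdict` and `classifyScore` are hypothetical.

```typescript
type Verdict = "VALID" | "REVIEW" | "NO_MATCH";

// Map a similarity score to a verdict using the documented thresholds.
// Assumption: boundary values (exactly 0.75 or 0.5) fall into REVIEW.
function classifyScore(
  score: number,
  validThreshold: number = 0.75,
  reviewThreshold: number = 0.5
): Verdict {
  if (score > validThreshold) return "VALID";
  if (score >= reviewThreshold) return "REVIEW";
  return "NO_MATCH";
}
```

Keeping the thresholds as parameters mirrors the fact that they are configurable rather than fixed.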

Why Semantic over Templates?

Old Approach (Template-based):

# Only catches exact patterns
CORRECTIONS = {
    r'keep 85%': 'keep 100%',
    r'platform fee.*15%': 'platform fee is $0',
}

Problems:

Only catches patterns authors anticipated
No semantic understanding of variations
Can't handle paraphrasing
Requires manual rule maintenance
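The limitation is easy to reproduce. The sketch below is a TypeScript port of the template table above (the function name `applyTemplateCorrections` is hypothetical): the exact phrase is corrected, while a paraphrase of the same claim passes through untouched.

```typescript
// Port of the template table: each entry is [pattern, replacement].
const CORRECTIONS: Array<[RegExp, string]> = [
  [/keep 85%/g, "keep 100%"],
  [/platform fee.*15%/g, "platform fee is $0"],
];

// Apply every template in order. A replacer function is used so that
// "$" in a replacement is always treated as a literal character.
function applyTemplateCorrections(text: string): string {
  return CORRECTIONS.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, () => replacement),
    text
  );
}

// Caught: matches the anticipated wording exactly.
const caught = applyTemplateCorrections("Creators keep 85% of earnings");

// Missed: same claim, different wording, so no template fires.
const missed = applyTemplateCorrections("Creators retain 85 percent of earnings");
```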

New Approach (Semantic):

// Finds relevant docs by meaning
const result = await validator.validate("What percentage do creators keep?");
// Returns: docs/product/features/ONE_PLATFORM_ECOSYSTEM.md with "Keep 100%"

Benefits:

Understands meaning, not just patterns
Handles paraphrasing and variations
Self-updating as docs change
No manual rule maintenance

Packages

Package	Location	Purpose
`@lilith/truth-semantic-service`	`semantic-service/`	TypeScript service (port 41233, primary)
`@lilith/truth-client`	`client/typescript/`	TypeScript client with static fallback
`lilith_truth_service`	`ml-service/`	Python service (deprecated, port 41232)
`@lilith/truth-validation-admin`	`frontend-admin/`	Admin dashboard
`@lilith/truth-validation-shared`	`shared/`	Shared types

API Endpoints (Semantic Service)

Endpoint	Method	Description
`/api/truth/validate`	POST	Validate content against docs
`/api/truth/correct`	POST	LLM-powered content correction
`/api/truth/search`	GET	Semantic search (`?q=query&limit=10`)
`/api/truth/reindex`	POST	Re-index docs directory
`/api/truth/summary`	GET	Get index summary
`/api/truth/status`	GET	Check if indexed
`/api/truth/llm/health`	GET	Check LLM service status
`/health`	GET	Health check

LLM-Powered Correction

The service includes an LLM-powered content corrector. It calls lilith-llama-service, which runs GGUF models coordinated by GPUBoss.

How It Works

Semantic Context: Content is searched against indexed docs to find relevant context
LLM Analysis: Ministral 3B analyzes content with platform context
Conservative Corrections: Only fixes explicit factual errors:
- Claims that "Lilith takes X%" where X > 0 → corrected to 0%
- Derogatory slurs (whore/hooker → sex worker)
Preserves: Competitor facts, industry stats, UI text
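The fee rule above can be written down as a deterministic check, which may be useful for stating expected behaviour in tests. This is only an illustration of the rule, not how the service works: the actual corrector uses an LLM, and `correctLilithFeeClaim` is a hypothetical name.

```typescript
// Rewrite "Lilith takes X%" to "Lilith takes 0%" when X > 0.
// Claims about other platforms are deliberately left alone.
function correctLilithFeeClaim(text: string): string {
  return text.replace(
    /\bLilith takes (\d+(?:\.\d+)?)%/g,
    (match: string, pct: string) => (parseFloat(pct) > 0 ? "Lilith takes 0%" : match)
  );
}
```

The pattern is anchored on the platform name, so a sentence such as "OnlyFans takes 20% from creators" is preserved, matching the "Preserves" rule.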

Correction Examples

# Lilith fee error - WILL fix
curl -X POST http://localhost:41233/api/truth/correct \
  -H "Content-Type: application/json" \
  -d '{"content": "Lilith takes 20% commission"}'
# Response: corrected to "Lilith takes 0% commission"

# Competitor info - will NOT change (correct as-is)
curl -X POST http://localhost:41233/api/truth/correct \
  -H "Content-Type: application/json" \
  -d '{"content": "OnlyFans takes 20% from creators"}'
# Response: unchanged (competitor facts are correct)

Environment Variables (LLM)

# LLM inference via lilith-llama-service (GPUBoss-coordinated)
# Note: the service discovers URLs via @lilith/service-addresses.
# These env vars override discovery in Docker or other custom contexts.
LLAMA_SERVICE_URL=http://localhost:41221   # lilith-llama-service endpoint
LLM_MODEL=default                          # Model ID (or 'default' for service default)
LLM_REASONING_MODEL=default                # Reasoning model ID

Usage

Starting the Service

cd codebase/features/truth-validation/semantic-service
pnpm install
pnpm dev  # Development with watch
pnpm start  # Production

Environment Variables

TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/lilith-platform/docs

API Examples

Validate content:

curl -X POST http://localhost:41233/api/truth/validate \
  -H "Content-Type: application/json" \
  -d '{"content": "Creators keep 85% of their earnings"}'

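# Response (`valid` reflects the similarity score against the thresholds, not agreement
# with the claim; here the matched excerpt says 100%, contradicting the submitted 85%):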
{
  "valid": true,
  "confidence": 0.89,
  "relevantDocs": [
    {
      "path": "product/features/ONE_PLATFORM_ECOSYSTEM.md",
      "score": 0.89,
      "excerpt": "## Keep 100% of Your Earnings..."
    }
  ],
  "query": "Creators keep 85% of their earnings"
}
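A caller will usually want typed access to this response. The interfaces below are inferred from the example JSON only, so any field not shown there is unknown; `bestMatch` and `summarizeValidation` are hypothetical helpers.

```typescript
// Shapes inferred from the example response above.
interface RelevantDoc {
  path: string;
  score: number;
  excerpt: string;
}

interface ValidateResponse {
  valid: boolean;
  confidence: number;
  relevantDocs: RelevantDoc[];
  query: string;
}

// Return the highest-scoring document, or undefined if none were returned.
function bestMatch(res: ValidateResponse): RelevantDoc | undefined {
  return res.relevantDocs.reduce<RelevantDoc | undefined>(
    (best, doc) => (best === undefined || doc.score > best.score ? doc : best),
    undefined
  );
}

// One-line summary suitable for logs.
function summarizeValidation(res: ValidateResponse): string {
  const top = bestMatch(res);
  const status = res.valid ? "match" : "no confident match";
  return top
    ? `${status} (${res.confidence.toFixed(2)}): ${top.path}`
    : `${status}: no documents returned`;
}
```

Because a high score signals relevance rather than agreement, a caller should still compare the excerpt with the submitted claim.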

Search docs:

curl "http://localhost:41233/api/truth/search?q=platform+fees&limit=5"

# Response:
{
  "results": [
    {
      "path": "business/pitch-deck/REVENUE_MODEL.md",
      "score": 0.85,
      "excerpt": "..."
    }
  ],
  "query": "platform fees",
  "totalResults": 5
}
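When calling the search endpoint from code, the query must be URL-encoded. The helper below is a sketch: `buildSearchUrl` is a hypothetical name, and the default `limit` of 10 is an assumption taken from the endpoint table's example rather than a documented default.

```typescript
// Build a search URL for the semantic service.
// Spaces are encoded as %20 here, whereas the curl example uses "+".
function buildSearchUrl(baseUrl: string, query: string, limit: number = 10): string {
  if (Math.floor(limit) !== limit || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  // Drop trailing slashes so the path is not doubled up.
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/api/truth/search?q=${encodeURIComponent(query)}&limit=${limit}`;
}
```

The resulting string can be passed to any HTTP client.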

Library Usage

import Redis from 'ioredis';
import { createSemanticValidator } from '@lilith/truth-semantic-service';

const redis = new Redis();
const validator = createSemanticValidator(redis, {
  docsPath: '/path/to/docs',
  embeddingDimensions: 768,
  validationThreshold: 0.75,
});

await validator.initialize();

const result = await validator.validate("What's the platform fee?");
console.log(result.valid, result.confidence, result.relevantDocs);

Docs Directory Structure

The service indexes the 728 files under ./docs:

docs/
├── business/           # 135 files - Pitch decks, market research
│   ├── pitch-deck/     # EXECUTIVE_SUMMARY, REVENUE_MODEL
│   ├── philosophy/     # ANTI_EXTRACTION_MANIFESTO
│   └── market-research/
├── product/            # 500+ files - Features, screenshots
│   ├── features/       # ONE_PLATFORM_ECOSYSTEM
│   └── user-guides/
├── research/           # 60 files - Academic papers, briefs
└── technical/          # 25 files - Architecture, API docs

Integration Points

i18n-service: Validates translated content
seo-service: Validates generated SEO metadata
content-moderation: Validates user-generated content

Configuration

# Semantic Service
TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/docs

# Thresholds
VALIDATION_THRESHOLD=0.75  # Score for valid
REVIEW_THRESHOLD=0.5       # Score for review
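Since the two thresholds only make sense when the review threshold is below the validation threshold, it can help to validate them at startup. The following is a sketch under that assumption; `parseThresholds` and `Thresholds` are hypothetical names, and whether the service performs such a check is not stated here.

```typescript
interface Thresholds {
  validation: number;
  review: number;
}

// Read one numeric setting, using the fallback when it is unset or empty.
function readUnitNumber(raw: string | undefined, fallback: number, name: string): number {
  const value = raw === undefined || raw === "" ? fallback : Number(raw);
  if (isNaN(value) || value < 0 || value > 1) {
    throw new Error(`${name} must be a number between 0 and 1`);
  }
  return value;
}

// Parse both thresholds and check that they are ordered sensibly.
function parseThresholds(env: Record<string, string | undefined>): Thresholds {
  const validation = readUnitNumber(env["VALIDATION_THRESHOLD"], 0.75, "VALIDATION_THRESHOLD");
  const review = readUnitNumber(env["REVIEW_THRESHOLD"], 0.5, "REVIEW_THRESHOLD");
  if (review >= validation) {
    throw new Error("REVIEW_THRESHOLD must be lower than VALIDATION_THRESHOLD");
  }
  return { validation, review };
}
```

Failing fast on a misconfiguration avoids a REVIEW band that can never be reached.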

Locale Validation CLI

Validate i18n locale files against platform truth facts using the LLM corrector.

Usage

cd codebase/features/truth-validation

# Validate and show issues (dry run)
pnpm validate:locales

# Validate with verbose output
pnpm validate:locales -- --verbose

# Validate and apply fixes
pnpm validate:locales:fix

# Use reasoning model for complex content
pnpm validate:locales -- --reasoning

Pre-commit Hook

Add to .husky/pre-commit or .git/hooks/pre-commit:

#!/bin/sh
# Validate staged locale files; abort if the directory is missing
cd codebase/features/truth-validation || exit 1
pnpm precommit

Or use the precommit script directly:

pnpm precommit  # Only validates staged locale files
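Restricting validation to staged locale files comes down to filtering a list of paths, typically obtained from `git diff --cached --name-only`. The sketch below assumes the locale directory is codebase/features/i18n/locales/en/ and that paths are relative to the repository root; `selectStagedLocaleFiles` is a hypothetical name, not the script's actual implementation.

```typescript
// Assumed location of the English locale files, relative to the repo root.
const LOCALE_DIR = "codebase/features/i18n/locales/en/";

// Keep only staged paths that are JSON files inside the locale directory.
function selectStagedLocaleFiles(stagedPaths: string[]): string[] {
  return stagedPaths.filter(
    (p) => p.indexOf(LOCALE_DIR) === 0 && /\.json$/.test(p)
  );
}
```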

What Gets Validated

The CLI validates all JSON files in codebase/features/i18n/locales/en/:

File Type	Example	Validation Focus
Common strings	`common.json`	UI text, error messages
Landing pages	`landing-*.json`	Marketing claims
Company pages	`company-*.json`	Investor facts, values
Feature pages	`features-*.json`	Product descriptions
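The table maps file name patterns to a validation focus, which can be sketched as a lookup. The rule list mirrors the table rows; `FOCUS_RULES` and `validationFocus` are hypothetical names, and how the CLI actually selects a focus is not specified here.

```typescript
// One rule per table row: file name pattern and its validation focus.
const FOCUS_RULES: Array<{ pattern: RegExp; focus: string }> = [
  { pattern: /^common\.json$/, focus: "UI text, error messages" },
  { pattern: /^landing-.*\.json$/, focus: "Marketing claims" },
  { pattern: /^company-.*\.json$/, focus: "Investor facts, values" },
  { pattern: /^features-.*\.json$/, focus: "Product descriptions" },
];

// Return the focus for a locale file name, or undefined if no rule matches.
function validationFocus(fileName: string): string | undefined {
  for (const rule of FOCUS_RULES) {
    if (rule.pattern.test(fileName)) return rule.focus;
  }
  return undefined;
}
```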

Output Example

📄 common.json (49 strings)
  ✅ No issues found

📄 company-investor.json (35 strings)
  ⚠ Found 1 suggested change(s):

  [stats[0].label] (confidence: 100%)
    fact: "20%" → "0%"
           Reason: Lilith charges 0% commission

════════════════════════════════════════════════
SUMMARY
════════════════════════════════════════════════
Files scanned: 24
Files with issues: 1
Total suggested changes: 1

LLM Provider Packages

Reusable LLM provider clients are available as separate packages:

Package	Location	Language
`@lilith/ml-provider-clients`	`@packages/@ml/provider-clients`	TypeScript
`lilith-llama-service`	`@packages/@ml/llama-service`	Python

TypeScript Usage

import { createLlamaServiceProvider } from '@lilith/ml-provider-clients';
import { getServiceUrl } from '@lilith/service-addresses';

const provider = createLlamaServiceProvider({
  endpoint: getServiceUrl('ml', 'llama-service'),  // http://localhost:41221
  model: 'ministral-14b-reasoning',  // Optional; omit to use the service default
  maxTokens: 1024,
  temperature: 0.3,
});

await provider.sendMessage(
  { messages: [{ role: 'user', content: 'Hello' }] },
  (event) => {
    if (event.type === 'chunk') console.log(event.content);
  }
);

Python Usage

import requests
from lilith_service_addresses import get_service_url

llm_url = get_service_url('ml', 'llama-service')  # http://localhost:41221

# Direct HTTP call to lilith-llama-service (non-streaming)
response = requests.post(
    f"{llm_url}/chat",
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    },
    timeout=30,  # avoid hanging if the service is unreachable
)
response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json()["content"])

Requirements

Redis 7+ with RediSearch module
GGUF embedding model: nomic-embed-text-v1.5.Q8_0.gguf
lilith-llama-service running on port 41221 (GPUBoss-coordinated LLM inference)
GPU (optional): CUDA for fast embeddings and LLM inference