platform-codebase/features/ML_INTEGRATION.md

# ML Features Integration Plan

## Overview

Three ML-powered features (translation, semantic validation, and SEO generation) work together to provide intelligent content management with **semantic RAG-based validation**:

```
┌─────────────────────────────────────────────────────────────────┐
│                   SEMANTIC RAG ARCHITECTURE                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ./docs (728 files)                                             │
│  ├── business/          ─┐                                      │
│  ├── product/            │── Indexed with 768-dim embeddings    │
│  ├── research/           │   nomic-embed-text-v1.5 model        │
│  └── technical/         ─┘   Redis HNSW vector store            │
│                                                                 │
│  Content → Semantic Search → Score-Based Validation             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                  directory-semantic (ML)                        │
│                ~/Code/@packages/@ml/directory-semantic          │
└─────────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐   ┌─────────────────┐   ┌───────────────┐
│  i18n-service │   │ truth-semantic  │   │  seo-service  │
│   Port 3300   │   │   Port 41233    │   │   Port 3014   │
│               │   │                 │   │               │
│  6 providers  │   │  Semantic RAG   │   │  Geographic   │
│  Auto-fallback│   │  Score-based    │   │  hierarchy    │
└───────┬───────┘   └────────┬────────┘   └───────┬───────┘
        │                    │                    │
        │          ┌─────────┴─────────┐          │
        └──────────►  validates both   ◄──────────┘
                   └───────────────────┘
```

## Service Dependencies

| Service | Port | Depends On | Used By |
|---------|------|------------|---------|
| directory-semantic | - | Redis + RediSearch, GPU | truth-semantic |
| truth-semantic | 41233 | directory-semantic | i18n, seo |
| i18n-service | 3300 | llama-service, truth-semantic | React apps |
| seo-service | 3014 | llama-service, truth-semantic | All frontends |

## Integration Flows

### Flow 1: Translation with Semantic Validation

```
User requests translation
        │
        ▼
┌───────────────┐
│ i18n-service  │──── 1. Get translation from LLM
│               │◄─── llama-service returns translation
│               │
│               │──── 2. Semantic validate translation
│               │◄─── truth-semantic returns confidence
│               │
│               │──── 3. Return (flag if low confidence)
└───────────────┘
        │
        ▼
   React app displays
```
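The translation flow above can be sketched as a function that takes its two upstream calls as parameters. This is a minimal sketch, not the service's actual code: `translateWithValidation`, `TranslateFn`, `ValidateFn`, and `TranslationOutcome` are hypothetical names, and the 0.75 cut-off is an assumption borrowed from the "Valid" band used for content publishing, since the flow itself only says "flag if low confidence".

```typescript
// Hypothetical shapes; none of these names come from the real services.
type TranslateFn = (text: string, locale: string) => Promise<string>;
type ValidateFn = (text: string) => Promise<{ confidence: number }>;

interface TranslationOutcome {
  text: string;
  confidence: number;
  needsReview: boolean; // true when confidence is not above the threshold
}

// Assumed cut-off: anything not in the "> 0.75" band is flagged.
const REVIEW_THRESHOLD = 0.75;

async function translateWithValidation(
  source: string,
  locale: string,
  translate: TranslateFn, // step 1: stands in for the llama-service call
  validate: ValidateFn, // step 2: stands in for the truth-semantic call
): Promise<TranslationOutcome> {
  const text = await translate(source, locale);
  const { confidence } = await validate(text);
  // Step 3: return the translation either way, flagging low confidence.
  return { text, confidence, needsReview: confidence <= REVIEW_THRESHOLD };
}
```

Passing the upstream calls in as functions keeps the sketch independent of any HTTP client and makes the flagging rule easy to test with stubs.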

### Flow 2: SEO Generation with Semantic Validation

```
User configures SEO
        │
        ▼
┌───────────────┐
│  seo-service  │──── 1. Generate metadata from LLM
│               │◄─── llama-service returns SEO
│               │
│               │──── 2. Semantic validate against docs
│               │◄─── truth-semantic returns relevant docs
│               │
│               │──── 3. Cache and return
└───────────────┘
        │
        ▼
   HTML <head> tags
```
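The generate, validate, and cache steps above might look like the following sketch. All names (`createSeoGenerator`, `SeoMetadata`, `SeoDocMatch`, `ValidatedSeo`) are hypothetical, the `Map` is only an in-memory stand-in for whatever cache seo-service actually uses, and validating the generated description (rather than some other field) is an assumption.

```typescript
// Hypothetical shapes for illustration only.
interface SeoMetadata { title: string; description: string; }
interface SeoDocMatch { path: string; score: number; }
interface ValidatedSeo { metadata: SeoMetadata; relevantDocs: SeoDocMatch[]; }

type GenerateSeoFn = (pageId: string) => Promise<SeoMetadata>;
type FindDocsFn = (text: string) => Promise<SeoDocMatch[]>;

function createSeoGenerator(generate: GenerateSeoFn, findDocs: FindDocsFn) {
  // In-memory stand-in for the service cache.
  const seoCache = new Map<string, ValidatedSeo>();

  return async function getSeo(pageId: string): Promise<ValidatedSeo> {
    const cached = seoCache.get(pageId);
    if (cached) return cached;

    // Step 1: generate metadata (stands in for the llama-service call).
    const metadata = await generate(pageId);
    // Step 2: look up supporting docs (stands in for the truth-semantic call).
    const relevantDocs = await findDocs(metadata.description);
    // Step 3: cache and return.
    const entry: ValidatedSeo = { metadata, relevantDocs };
    seoCache.set(pageId, entry);
    return entry;
  };
}
```

Two concurrent requests for the same uncached page would both reach the LLM in this sketch; caching the in-flight promise instead of the resolved value is one way to avoid that.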

### Flow 3: Content Publishing

```
Creator writes content
        │
        ▼
┌───────────────────┐
│  truth-semantic   │◄─── Semantic search for relevant facts
│                   │
│ Score-based:      │
│   > 0.75: Valid   │
│   0.5-0.75: Review│
│   < 0.5: No match │
└───────────────────┘
        │
        ▼
┌─────────────────┐
│  i18n-service   │◄─── Translate to other locales
└─────────────────┘
        │
        ▼
   Published in all locales
```

## Semantic Validation Details

### How It Works

1. **Index docs directory** on startup
   - 728 files, including 135 markdown files, 447 images, and 54 code files
   - Chunked and embedded with nomic-embed-text-v1.5 (768 dimensions)
   - Stored in Redis with HNSW indexing

2. **Validate content** by semantic search
   - Content → embedding → KNN search → top matches
   - Returns relevant docs with similarity scores

3. **Score-based decisions**
   - `score > 0.75`: Content matches docs = VALID
   - `0.5 <= score <= 0.75`: Uncertain, return context for review
   - `score < 0.5`: No matching documentation
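The search and scoring steps above can be illustrated with a small in-memory sketch. It assumes the similarity score is cosine similarity and uses a brute-force scan in place of the Redis HNSW KNN query; `classifyScore`, `cosineSimilarity`, `topMatches`, and `IndexedChunk` are hypothetical names, and the tiny 2-dimensional vectors in the usage note stand in for 768-dimensional embeddings.

```typescript
type Verdict = "VALID" | "REVIEW" | "NO_MATCH";

// Thresholds from the list above; scores of exactly 0.5 or 0.75 land in REVIEW.
function classifyScore(score: number): Verdict {
  if (score > 0.75) return "VALID";
  if (score >= 0.5) return "REVIEW";
  return "NO_MATCH";
}

// Cosine similarity of two equal-length vectors (assumed non-zero).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface IndexedChunk { path: string; embedding: number[]; }

// Brute-force nearest neighbours, highest score first.
// The pipeline described above delegates this step to a Redis HNSW index.
function topMatches(query: number[], chunks: IndexedChunk[], k: number) {
  return chunks
    .map((c) => ({ path: c.path, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For example, `topMatches([1, 0], chunks, 2)` over chunks with embeddings `[1, 0]`, `[0, 1]`, and `[1, 1]` returns the first and third chunk, and `classifyScore` applied to the best score yields the final decision.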

### Example Validation

```typescript
// Input: marketing claim to validate.
// `validator` is an already-constructed truth-semantic client (setup not shown).
const result = await validator.validate("Creators keep 100% of earnings");

// Output: `result` holds the matches found in the docs index:
// {
//   valid: true,
//   confidence: 0.92,
//   relevantDocs: [
//     {
//       path: "product/features/ONE_PLATFORM_ECOSYSTEM.md",
//       score: 0.92,
//       excerpt: "## Keep 100% of Your Earnings..."
//     },
//     {
//       path: "business/pitch-deck/EXECUTIVE_SUMMARY.md",
//       score: 0.87,
//       excerpt: "...creators retain all earnings..."
//     }
//   ]
// }
```
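The result shape in the example can be written down as types, together with one plausible way of deriving the top-level fields. This is a sketch under assumptions: the interface names and `toValidationResult` are hypothetical, and treating `confidence` as the best match's score is inferred only from the example, where both are 0.92.

```typescript
// Field names mirror the example output; the type names are hypothetical.
interface RelevantDoc { path: string; score: number; excerpt: string; }

interface ValidationResult {
  valid: boolean;
  confidence: number;
  relevantDocs: RelevantDoc[];
}

// Assumption: confidence is the highest match score, and `valid` applies the
// "> 0.75" threshold to it. With no matches, confidence falls back to 0.
function toValidationResult(matches: RelevantDoc[]): ValidationResult {
  const relevantDocs = [...matches].sort((a, b) => b.score - a.score);
  const confidence = relevantDocs.length > 0 ? relevantDocs[0].score : 0;
  return { valid: confidence > 0.75, confidence, relevantDocs };
}
```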

## Deployment Order

1. **Redis with RediSearch** - Vector store
2. **truth-semantic-service** - Indexes docs, provides validation API
3. **i18n-service** - Uses truth-semantic for translation validation (also requires llama-service)
4. **seo-service** - Uses truth-semantic for SEO validation (also requires llama-service)
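A deployment order like the one above can be derived from the Service Dependencies table with a depth-first topological sort. The sketch below is illustrative only: `serviceDeps` and `startOrder` are hypothetical names, `redis` and `llama-service` are written as plain nodes, and directory-semantic is assumed to be a library loaded inside truth-semantic rather than a separately deployed service.

```typescript
// Dependencies as listed in the Service Dependencies table (simplified).
const serviceDeps: Record<string, string[]> = {
  "redis": [],
  "llama-service": [],
  "truth-semantic": ["redis"],
  "i18n-service": ["llama-service", "truth-semantic"],
  "seo-service": ["llama-service", "truth-semantic"],
};

// Depth-first topological sort: every service appears after its dependencies.
function startOrder(deps: Record<string, string[]>): string[] {
  const ordered: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (svc: string): void => {
    if (state.get(svc) === "done") return;
    if (state.get(svc) === "visiting") {
      throw new Error(`Dependency cycle at ${svc}`);
    }
    state.set(svc, "visiting");
    for (const dep of deps[svc] ?? []) visit(dep);
    state.set(svc, "done");
    ordered.push(svc);
  };

  for (const svc of Object.keys(deps)) visit(svc);
  return ordered;
}
```

Calling `startOrder(serviceDeps)` places `redis` before `truth-semantic`, and both `llama-service` and `truth-semantic` before the two consumer services; a cyclic dependency raises an error instead of producing a partial order.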

## Health Check Chain

```
GET /health on each service should verify:

truth-semantic-service:
  - Redis reachable
  - Embedding model loaded
  - Docs directory indexed

i18n-service:
  - llama-service reachable
  - truth-semantic reachable
  - Glossary loaded

seo-service:
  - llama-service reachable
  - truth-semantic reachable
  - Cache initialized
```
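One way to implement such a `/health` handler is to run a named probe per dependency and aggregate the outcomes. The sketch below assumes each probe is an async function returning a boolean; `runHealthChecks`, `HealthProbe`, and `HealthReport` are hypothetical names, and the probe keys in the usage note are illustrative.

```typescript
// A probe resolves to true when its dependency is usable.
type HealthProbe = () => Promise<boolean>;

interface HealthReport {
  healthy: boolean;
  checks: Record<string, "ok" | "fail">;
}

async function runHealthChecks(
  probes: Record<string, HealthProbe>,
): Promise<HealthReport> {
  const entries = Object.entries(probes);
  // Run all probes in parallel; a probe that throws counts as a failure.
  const outcomes = await Promise.all(
    entries.map(async ([, probe]) => {
      try {
        return await probe();
      } catch {
        return false;
      }
    }),
  );
  const checks: Record<string, "ok" | "fail"> = {};
  entries.forEach(([label], i) => {
    checks[label] = outcomes[i] ? "ok" : "fail";
  });
  return { healthy: outcomes.every(Boolean), checks };
}
```

For truth-semantic-service the probes might be keyed `redis`, `embeddingModel`, and `docsIndexed`; the service would then answer 200 when `healthy` is true and an error status otherwise.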

## API Gateway Routing

```nginx
# ML Services
location /api/i18n/ {
    proxy_pass http://i18n-service:3300/api/i18n/;
}

location /api/truth/ {
    proxy_pass http://truth-semantic-service:41233/api/truth/;
}

location /api/seo/ {
    proxy_pass http://seo-service:3014/api/seo/;
}
```

## Monitoring

Each service exposes Prometheus metrics:
- Request count/latency
- Semantic search latency
- Cache hit rates
- Validation confidence distribution

The platform-admin dashboard shows:
- Service health status
- Docs index statistics
- Validation activity
- Confidence score distribution
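The confidence score distribution could be tracked as a Prometheus histogram with bucket bounds at the validation thresholds. The sketch below hand-rolls the counting and the text exposition output to stay self-contained; a real service would more likely use a Prometheus client library. `createConfidenceHistogram` and the metric name `validation_confidence` are hypothetical.

```typescript
function createConfidenceHistogram(metricName: string, upperBounds: number[]) {
  const bucketCounts = upperBounds.map(() => 0);
  let sum = 0;
  let count = 0;

  return {
    observe(value: number): void {
      sum += value;
      count += 1;
      // Prometheus buckets are cumulative: increment every bucket
      // whose upper bound is greater than or equal to the value.
      upperBounds.forEach((bound, i) => {
        if (value <= bound) bucketCounts[i] += 1;
      });
    },
    // Render in the Prometheus text exposition format.
    render(): string {
      const lines = [`# TYPE ${metricName} histogram`];
      upperBounds.forEach((bound, i) => {
        lines.push(`${metricName}_bucket{le="${bound}"} ${bucketCounts[i]}`);
      });
      lines.push(`${metricName}_bucket{le="+Inf"} ${count}`);
      lines.push(`${metricName}_sum ${sum}`);
      lines.push(`${metricName}_count ${count}`);
      return lines.join("\n");
    },
  };
}
```

Note that `le` bounds are inclusive, so a score of exactly 0.5 is counted in the `le="0.5"` bucket even though the validation rules treat it as "review" rather than "no match".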