📝 Add ML feature endpoints documentation

Document suggested replies, conversation memory, style learning,
and message triage APIs in both API.md and ML service README.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Lilith 2026-01-02 06:25:21 -08:00
parent 923ac0e07b
commit c7ef8dfb0b
2 changed files with 573 additions and 4 deletions

@@ -406,6 +406,186 @@ DELETE /cache?pattern=*
---
### Suggested Replies
Generate themed response options.
```http
POST /suggestions
Content-Type: application/json

{
  "conversation_id": "conv-123",
  "messages": [
    {"role": "user", "content": "Hey, are you free Saturday?"}
  ],
  "count": 8,
  "themes": ["casual", "brief", "empathetic"]
}
```
**Response:**
```json
{
  "request_id": "uuid",
  "conversation_id": "conv-123",
  "options": [
    {"text": "Yes! What did you have in mind?", "descriptor": "Enthusiastic", "theme": "casual", "confidence": 0.92, "quality_score": 0.88}
  ],
  "has_more": true,
  "total_count": 8
}
```
Get remaining suggestions:
```http
GET /suggestions/more/:request_id
```
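As a rough illustration, the two requests above could be assembled from Python with only the standard library. This is a sketch, not an official client: `BASE_URL` assumes the service runs on localhost (port 8100 is taken from the ML service README), and the helper names `build_suggestions_request` and `more_suggestions_url` are invented for this example.

```python
import json
import urllib.request

# Assumption: host is localhost; port 8100 comes from the ML service README.
BASE_URL = "http://localhost:8100"


def build_suggestions_request(conversation_id, messages, count=8, themes=None):
    """Build (but do not send) a POST /suggestions request."""
    payload = {
        "conversation_id": conversation_id,
        "messages": messages,
        "count": count,
        "themes": themes or ["casual", "brief", "empathetic"],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/suggestions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def more_suggestions_url(request_id):
    """URL for GET /suggestions/more/:request_id."""
    return f"{BASE_URL}/suggestions/more/{request_id}"


req = build_suggestions_request(
    "conv-123",
    [{"role": "user", "content": "Hey, are you free Saturday?"}],
)
# To send it, pass `req` to urllib.request.urlopen() and json.load() the body.
```

Building the request separately from sending it keeps the payload easy to inspect and test without a running service.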
---
### Conversation Memory
Store and recall conversations via semantic search.
#### Store Memory
```http
POST /memory/store
Content-Type: application/json

{
  "user_id": "user-123",
  "contact_id": "contact-456",
  "conversation_id": "conv-789",
  "messages": [{"role": "user", "content": "How was the concert?"}],
  "summary": null
}
```
#### Recall Memories
```http
POST /memory/recall
Content-Type: application/json

{
  "user_id": "user-123",
  "contact_id": "contact-456",
  "query": "concert last month",
  "top_k": 3
}
```
#### Inject Memories
```http
POST /memory/inject
Content-Type: application/json

{
  "messages": [...],
  "memories": [...]
}
```
#### Other Memory Endpoints
```http
GET /memory/stats
DELETE /memory/:memory_id
```
---
### Style Learning
Learn and apply user communication styles.
#### Learn Style
```http
POST /style/learn
Content-Type: application/json

{
  "user_id": "user-123",
  "contact_id": "contact-456",
  "samples": [
    {"input": "How are you?", "output": "Good! You?"}
  ]
}
```
#### Get Style Profile
```http
GET /style/:user_id/:contact_id
```
#### Apply Style
```http
POST /style/apply
Content-Type: application/json

{
  "user_id": "user-123",
  "contact_id": "contact-456",
  "response": "I am doing well, thank you for asking.",
  "use_llm": false
}
```
#### Delete Style Profile
```http
DELETE /style/:user_id/:contact_id
```
---
### Message Triage
Score message urgency and classify intent.
#### Triage Single Message
```http
POST /triage
Content-Type: application/json

{
  "message": "Hey, can you call me ASAP?",
  "contact_classification": "friend",
  "message_id": "msg-123"
}
```
**Response:**
```json
{
  "urgency_score": 0.85,
  "adjusted_urgency": 0.90,
  "priority": "urgent",
  "intent": "request",
  "emotional_tone": "concerned",
  "suggested_response_time": "immediate",
  "is_urgent": true,
  "needs_action": true
}
```
Contact Classifications: `friend`, `family`, `work`, `acquaintance`, `unknown`
Priority Levels: `urgent`, `time-sensitive`, `routine`, `low`
#### Batch Triage
```http
POST /triage/batch
Content-Type: application/json

{
  "messages": [
    {"message": "Hey!", "contact_classification": "friend"},
    {"message": "URGENT!", "contact_classification": "work"}
  ]
}
```
---
## Error Responses
All endpoints return errors in this format:

@@ -1,6 +1,6 @@
# Conversation Assistant ML Service
FastAPI-based ML inference service with LoRA fine-tuning, Redis caching, and model hot-swapping.
FastAPI-based ML inference service with intelligent response generation, conversation memory, style adaptation, and message triage.
## Architecture
@@ -8,18 +8,25 @@ FastAPI-based ML inference service with LoRA fine-tuning, Redis caching, and mod
┌─────────────────────────────────────────────────────────────┐
│ ML Service (Port 8100) │
├─────────────────────────────────────────────────────────────┤
│ FastAPI Application │
│ Core Endpoints │
│ ├── /generate - Sync text generation │
│ ├── /generate/async - Async job queue │
│ ├── /training/start - Start LoRA fine-tuning │
│ ├── /training/status - Training progress │
│ ├── /model/deploy - Hot-swap trained model │
│ └── /health - Health status │
├─────────────────────────────────────────────────────────────┤
│ ML Feature Endpoints │
│ ├── /suggestions - Multi-option response generation │
│ ├── /memory/* - Conversation memory (RAG) │
│ ├── /style/* - Style learning & adaptation │
│ └── /triage - Message urgency scoring │
├─────────────────────────────────────────────────────────────┤
│ Components │
│ ├── LLM Manager - GGUF model loading (llama-cpp) │
│ ├── LoRA Trainer - QLoRA fine-tuning (peft/trl) │
│ ├── GGUF Converter - HuggingFace → GGUF │
│ ├── Memory Store - Redis VSS + nomic-embed │
│ ├── Style Adapter - Per-contact style profiles │
│ ├── Intent Classifier - Message understanding │
│ └── Redis Client - Caching + job queuing │
└─────────────────────────────────────────────────────────────┘
```
@@ -155,6 +162,388 @@ GET /generate/status/{job_id}
}
```
---
## ML Feature Endpoints
### Suggested Replies
Generate themed response options for conversations.
#### Generate Suggestions
```
POST /suggestions
```
Generate multiple suggested response options with themes.
**Request:**
```json
{
  "conversation_id": "conv-123",
  "messages": [
    {"role": "user", "content": "Hey, are you free Saturday?", "timestamp": "2024-12-28T10:00:00Z"}
  ],
  "count": 8,
  "themes": ["casual", "brief", "empathetic"]
}
```
**Response:**
```json
{
  "request_id": "req-uuid",
  "conversation_id": "conv-123",
  "options": [
    {
      "text": "Yes! What did you have in mind?",
      "descriptor": "Enthusiastic",
      "theme": "casual",
      "confidence": 0.92,
      "quality_score": 0.88
    }
  ],
  "has_more": true,
  "total_count": 8
}
```
#### Get More Suggestions
```
GET /suggestions/more/{request_id}
```
Retrieve remaining suggestions from a previous generation.
**Response:**
```json
{
  "options": [
    {
      "text": "Let me check my calendar",
      "descriptor": "Noncommittal",
      "theme": "brief",
      "confidence": 0.85,
      "quality_score": 0.82
    }
  ]
}
```
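A client would typically show the first batch of options right away and fetch the rest only when `has_more` is true. The sketch below shows that flow under two assumptions: a single call to `/suggestions/more/{request_id}` returns all remaining options, and the HTTP call is passed in as a plain function (`fetch_more`, a hypothetical name) so the logic can run without a live service.

```python
from typing import Callable, Dict, List


def collect_all_options(first_page: Dict, fetch_more: Callable[[str], Dict]) -> List[Dict]:
    """Merge the initial options with the remaining ones, if any."""
    options = list(first_page["options"])
    if first_page.get("has_more"):
        # Assumption: one follow-up call returns everything that is left.
        remaining = fetch_more(first_page["request_id"])
        options.extend(remaining["options"])
    return options


# Stand-ins for real HTTP responses, shaped like the examples above.
first_page = {
    "request_id": "req-uuid",
    "options": [{"text": "Yes! What did you have in mind?", "theme": "casual"}],
    "has_more": True,
    "total_count": 8,
}


def fake_fetch_more(request_id: str) -> Dict:
    # A real client would issue GET /suggestions/more/{request_id} here.
    return {"options": [{"text": "Let me check my calendar", "theme": "brief"}]}


all_options = collect_all_options(first_page, fake_fetch_more)
```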
---
### Conversation Memory (RAG)
Store and recall past conversations via semantic similarity.
#### Store Memory
```
POST /memory/store
```
Store a conversation segment with auto-summarization.
**Request:**
```json
{
  "user_id": "user-123",
  "contact_id": "contact-456",
  "conversation_id": "conv-789",
  "messages": [
    {"role": "user", "content": "How was the concert?"},
    {"role": "assistant", "content": "It was amazing! The opening act was great."}
  ],
  "summary": null,
  "metadata": {"event": "concert-discussion"}
}
```
**Response:**
```json
{
  "memory_id": "mem-uuid",
  "summary": "Discussion about a concert, positive feedback about the opening act.",
  "stored_at": "2024-12-28T10:00:00Z"
}
```
#### Recall Memories
```
POST /memory/recall
```
Retrieve relevant past conversations via semantic search.
**Request:**
```json
{
  "user_id": "user-123",
  "contact_id": "contact-456",
  "query": "concert last month",
  "top_k": 3
}
```
**Response:**
```json
{
  "memories": [
    {
      "memory_id": "mem-uuid",
      "user_id": "user-123",
      "contact_id": "contact-456",
      "summary": "Discussion about a concert...",
      "similarity_score": 0.87,
      "stored_at": "2024-12-28T10:00:00Z",
      "messages": [...],
      "metadata": {}
    }
  ],
  "query": "concert last month",
  "total_found": 1,
  "search_time_ms": 42.5
}
```
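To make the ranking idea concrete: semantic recall scores each stored memory by how close its embedding is to the query embedding and keeps the best `top_k`. The architecture diagram lists Redis VSS and nomic-embed for this. The sketch below is only a pure-Python stand-in, assuming cosine similarity as the metric and using hand-written three-dimensional vectors; the `embedding` field and the function names are not part of the documented API.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall(query_vec: Sequence[float], memories: List[Dict], top_k: int = 3) -> List[Dict]:
    """Return the top_k memories ranked by similarity to the query vector."""
    scored = [
        {
            "memory_id": m["memory_id"],
            "similarity_score": cosine_similarity(query_vec, m["embedding"]),
        }
        for m in memories
    ]
    scored.sort(key=lambda m: m["similarity_score"], reverse=True)
    return scored[:top_k]


# Toy embeddings; a real deployment would get these from an embedding model.
memories = [
    {"memory_id": "mem-concert", "embedding": [0.9, 0.1, 0.0]},
    {"memory_id": "mem-work", "embedding": [0.0, 0.2, 0.9]},
    {"memory_id": "mem-festival", "embedding": [0.7, 0.6, 0.1]},
]
top = recall([1.0, 0.0, 0.0], memories, top_k=2)
```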
#### Inject Memories
```
POST /memory/inject
```
Inject recalled memories into conversation context.
**Request:**
```json
{
  "messages": [
    {"role": "user", "content": "Remember that concert?"}
  ],
  "memories": [...]
}
```
**Response:**
```json
{
  "messages": [
    {"role": "system", "content": "# Relevant Past Conversations..."},
    {"role": "user", "content": "Remember that concert?"}
  ],
  "injected_count": 2
}
```
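The response above suggests that injection prepends a single system message built from the recalled memories. A minimal sketch of that behavior follows. The bullet-per-summary layout under the heading is an assumption (the docs elide the actual content), and `inject_memories` is a hypothetical helper, not the service's code.

```python
from typing import Dict, List


def inject_memories(messages: List[Dict], memories: List[Dict]) -> Dict:
    """Prepend a system message summarizing recalled memories."""
    if not memories:
        return {"messages": list(messages), "injected_count": 0}
    lines = ["# Relevant Past Conversations"]
    for memory in memories:
        # Assumed layout: one bullet per memory summary.
        lines.append(f"- {memory['summary']}")
    system_message = {"role": "system", "content": "\n".join(lines)}
    return {
        "messages": [system_message, *messages],
        "injected_count": len(memories),
    }


result = inject_memories(
    [{"role": "user", "content": "Remember that concert?"}],
    [
        {"summary": "Discussion about a concert..."},
        {"summary": "Made plans for the next show."},  # invented sample memory
    ],
)
```

Keeping the original messages untouched and adding context in front of them means the caller can pass `result["messages"]` straight to a generation request.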
#### Get Memory Stats
```
GET /memory/stats
```
Get memory store statistics.
**Response:**
```json
{
  "total_memories": 150,
  "unique_users": 3,
  "unique_contacts": 12,
  "index_size_bytes": 1048576,
  "oldest_memory": "2024-01-01T00:00:00Z",
  "newest_memory": "2024-12-28T10:00:00Z"
}
```
#### Delete Memory
```
DELETE /memory/{memory_id}
```
Delete a specific memory.
**Response:**
```json
{
  "deleted": true
}
```
---
### Style Learning & Adaptation
Learn and apply user communication styles.
#### Learn Style
```
POST /style/learn
```
Learn style from training samples.
**Request:**
```json
{
  "user_id": "user-123",
  "contact_id": "contact-456",
  "samples": [
    {"input": "How are you?", "output": "Good! You?"},
    {"input": "Meeting tomorrow?", "output": "yep, see you there"}
  ]
}
```
**Response:**
```json
{
  "formality": 0.3,
  "emoji_usage": false,
  "avg_length": 12,
  "punctuation_style": "minimal",
  "capitalization": "lowercase",
  "common_phrases": ["yep", "sounds good"],
  "contraction_preference": 0.8,
  "response_brevity": 0.7,
  "samples_count": 2
}
```
#### Get Style Profile
```
GET /style/{user_id}/{contact_id}
```
Retrieve stored style profile.
**Response:** Same as Learn Style response.
#### Apply Style
```
POST /style/apply
```
Apply learned style to a response.
**Request:**
```json
{
  "user_id": "user-123",
  "contact_id": "contact-456",
  "response": "I am doing well, thank you for asking.",
  "use_llm": false
}
```
**Response:**
```json
{
  "styled_response": "good! you?",
  "original_response": "I am doing well, thank you for asking.",
  "profile_used": {...}
}
```
#### Delete Style Profile
```
DELETE /style/{user_id}/{contact_id}
```
Delete a style profile.
**Response:**
```json
{
  "deleted": true
}
```
---
### Message Triage
Score message urgency and classify intent.
#### Triage Single Message
```
POST /triage
```
**Request:**
```json
{
  "message": "Hey, can you call me ASAP? It's urgent!",
  "contact_classification": "friend",
  "message_id": "msg-123"
}
```
**Response:**
```json
{
  "urgency_score": 0.85,
  "adjusted_urgency": 0.90,
  "priority": "urgent",
  "intent": "request",
  "emotional_tone": "concerned",
  "topic": "personal",
  "suggested_response_style": "empathetic",
  "suggested_response_time": "immediate",
  "confidence_overall": 0.88,
  "raw_message": "Hey, can you call me ASAP? It's urgent!",
  "message_id": "msg-123",
  "is_urgent": true,
  "needs_action": true,
  "is_positive": false,
  "is_negative": false
}
```
**Contact Classifications:** `friend`, `family`, `work`, `acquaintance`, `unknown`
**Priority Levels:**
- `urgent` - Urgency >= 0.8, respond immediately
- `time-sensitive` - Urgency >= 0.6, respond within the hour
- `routine` - Urgency >= 0.3, respond today
- `low` - Urgency < 0.3, respond whenever
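The thresholds above translate directly into a lookup. A minimal sketch, checking the bands from highest to lowest; the function name `priority_for` is hypothetical, and the docs do not say whether the service applies the bands to `urgency_score` or to `adjusted_urgency`.

```python
def priority_for(urgency: float) -> str:
    """Map an urgency value in [0, 1] to a documented priority level."""
    if urgency >= 0.8:
        return "urgent"  # respond immediately
    if urgency >= 0.6:
        return "time-sensitive"
    if urgency >= 0.3:
        return "routine"  # respond today
    return "low"  # respond whenever


# Values from the example response above; both fall in the "urgent" band.
example_priority = priority_for(0.85)
adjusted_priority = priority_for(0.90)
```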
#### Batch Triage
```
POST /triage/batch
```
Triage multiple messages in one request. Results are returned sorted by urgency.
**Request:**
```json
{
  "messages": [
    {"message": "Hey!", "contact_classification": "friend"},
    {"message": "URGENT: Server is down!", "contact_classification": "work"}
  ]
}
```
**Response:**
```json
{
  "results": [...],
  "total": 2
}
```
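Because batch results are ordered by urgency, a client can treat the first entry as the most pressing message. The sketch below reproduces that ordering locally, assuming descending order by `adjusted_urgency` (the docs name neither the sort key nor the direction). The scores are invented for illustration, and `sort_batch` is a hypothetical helper.

```python
from typing import Dict, List


def sort_batch(results: List[Dict]) -> Dict:
    """Order triage results from most to least urgent."""
    ordered = sorted(results, key=lambda r: r["adjusted_urgency"], reverse=True)
    return {"results": ordered, "total": len(ordered)}


# Invented scores for the two messages in the request above.
batch = sort_batch(
    [
        {"raw_message": "Hey!", "adjusted_urgency": 0.10},
        {"raw_message": "URGENT: Server is down!", "adjusted_urgency": 0.95},
    ]
)
most_urgent = batch["results"][0]["raw_message"]
```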
---
## LoRA Fine-Tuning
### Training Pipeline