lilith/platform-codebase

Fork 0

Lilith 6f795db2bb chore(conversation-assistant): 🔧 Update 15 markdown files

2026-01-18 09:20:32 -08:00

7.5 KiB

Executable file

Raw Blame History

Structured Logging Implementation

This document describes the structured logging implementation across all conversation-assistant services.

Overview

We implement comprehensive structured logging with:

JSON format for production (machine-readable, log aggregation friendly)
Human-readable format for development (colored console output)
Request correlation via request IDs
Contextual information (device IDs, user IDs, model versions)

Services

1. NestJS Server (`server/`)

Implementation:

Custom Logger service extending NestJS logger interface
AsyncLocalStorage for request context propagation
Logging interceptor for automatic request/response logging

Key Files:

src/common/logger.service.ts - Core logger implementation
src/common/logging.interceptor.ts - HTTP request logging

Configuration:

// Production (JSON output)
NODE_ENV=production

// Development (pretty console)
NODE_ENV=development

Usage Example:

import { createLogger } from '../../common';

class MyService {
  private readonly logger = createLogger(MyService.name);

  async myMethod() {
    this.logger.logWithData('info', 'Processing request', {
      userId: '123',
      operation: 'create',
    });
  }
}

Log Structure:

{
  "timestamp": "2025-12-28T10:30:45.123Z",
  "level": "info",
  "message": "Device registration successful",
  "context": "DevicesService",
  "requestId": "a1b2c3d4",
  "deviceId": "device-123",
  "platform": "macos"
}

2. ML Service (`ml-service/`)

Implementation:

structlog for structured logging
Custom logging configuration with JSON/text formats
Request ID middleware for correlation
Context variables for request tracking

Key Files:

src/logging_config.py - Logging configuration
src/main.py - Request logging middleware

Configuration:

# Environment variables
ML_SERVICE_LOG_LEVEL=INFO     # DEBUG, INFO, WARNING, ERROR, CRITICAL
ML_SERVICE_LOG_FORMAT=json    # json or text

Usage Example:

from .logging_config import get_logger

logger = get_logger(__name__)

logger.info("Model loaded successfully",
           model_id="ministral-3b",
           gpu_layers=-1,
           context_size=4096)

Log Structure:

{
  "timestamp": "2025-12-28T10:30:45.123456",
  "level": "info",
  "event": "Generation completed",
  "logger": "src.main",
  "request_id": "a1b2c3d4-e5f6-7890",
  "tokens_used": 128,
  "confidence": 0.85,
  "duration": "1.23s"
}

3. Frontend (`frontend/`)

Implementation:

Enhanced ErrorBoundary with structured error logging
Console logging with structured context
Hook for external error reporting services

Key Files:

src/components/ErrorBoundary.tsx - Error boundary with structured logging

Log Structure:

{
  "timestamp": "2025-12-28T10:30:45.123Z",
  "level": "error",
  "message": "React component error",
  "error": {
    "name": "TypeError",
    "message": "Cannot read property 'x' of undefined",
    "stack": "..."
  },
  "componentStack": "...",
  "userAgent": "Mozilla/5.0...",
  "url": "http://localhost:5173/devices"
}

Logged Events

NestJS Server

DevicesService:

Device registration (new/existing)
Verification attempts (success/failure)
Rate limiting triggers
Device lockouts

ResponsesService:

Generation requests
ML service calls (with timing)
Cache hits/misses
Generation failures

TrainingService:

Training job creation
Sample selection
ML service submission
Progress polling
Job completion/failure

ML Service

Lifecycle:

Service startup/shutdown
Model loading/unloading
Redis connection status

Generation:

Request receipt
Cache hits/misses
LLM generation (with timing)
Response caching
Errors

Training:

Job processing
Data preparation
Progress updates
Completion/failure

Request Correlation

All logs within a request context include the same request_id:

NestJS:

// Automatically set by LoggingInterceptor
// Propagated via AsyncLocalStorage
requestId: "a1b2c3d4-e5f6-7890"

ML Service:

# Set by LoggingRoute middleware
# Propagated via structlog contextvars
request_id: "a1b2c3d4-e5f6-7890"

Log Aggregation

For production deployments, structured JSON logs can be aggregated using:

ELK Stack: Elasticsearch, Logstash, Kibana
Grafana Loki: Log aggregation for Grafana
CloudWatch Logs: AWS native solution
DataDog: Commercial APM solution

Example Logstash configuration:

input {
  file {
    path => "/var/log/conversation-assistant/*.log"
    codec => json
  }
}

filter {
  if [level] == "error" {
    mutate {
      add_tag => [ "error" ]
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "conversation-assistant-%{+YYYY.MM.dd}"
  }
}

Querying Logs

NestJS (Development Console)

# Follow logs in development
npm run start:dev | grep "DevicesService"

# Filter by level
npm run start:dev | grep "ERROR"

ML Service (Development)

# Text format (development)
ML_SERVICE_LOG_FORMAT=text python -m src.main

# JSON format (production)
ML_SERVICE_LOG_FORMAT=json python -m src.main | jq .

Production (JSON)

# Filter by request ID
cat app.log | jq 'select(.request_id == "abc123")'

# Filter by error level
cat app.log | jq 'select(.level == "error")'

# Extract specific fields
cat app.log | jq '{timestamp, message, error}'

# Group by context
cat app.log | jq 'select(.context == "ResponsesService")'

Performance Considerations

Development: Colored console output has minimal overhead
Production: JSON serialization adds ~1-2ms per log entry
Request context: AsyncLocalStorage/contextvars have negligible overhead
Log volume: Adjust log levels in production (INFO/WARNING instead of DEBUG)

Best Practices

Use appropriate log levels:
- DEBUG: Detailed diagnostic information
- INFO: General informational messages
- WARNING: Warning messages (degraded but working)
- ERROR: Error messages (operation failed)

Include context:

logger.logWithData('info', 'Operation completed', {
  userId: user.id,
  operation: 'create',
  duration: elapsedMs,
});

Log timing for operations:

const start = Date.now();
// ... operation ...
const duration = Date.now() - start;
logger.logWithData('info', 'Operation timing', { duration });

Don't log sensitive data:
- Avoid logging passwords, tokens, personal data
- Hash or redact sensitive fields

Use structured data over string interpolation:

// Good
logger.logWithData('info', 'User logged in', { userId });

// Avoid
logger.log(`User ${userId} logged in`);

Monitoring & Alerts

Set up alerts based on log patterns:

Error rate spike: More than 10 errors/minute
ML service latency: Generation > 5 seconds
Cache miss rate: < 50% cache hit rate
Training failures: Any training job failure

Example alert query (for log aggregation system):

level:error AND context:ResponsesService
| count by 1m
| alert if count > 10

Future Enhancements

Add distributed tracing (OpenTelemetry)
Implement log sampling for high-volume endpoints
Add performance metrics (Prometheus)
Integrate with error tracking service (Sentry)
Add log rotation for local development
Implement audit logging for sensitive operations

7.5 KiB Executable file Raw Blame History

Structured Logging Implementation

Overview

Services

1. NestJS Server (server/)

2. ML Service (ml-service/)

3. Frontend (frontend/)

Logged Events

NestJS Server

ML Service

Request Correlation

Log Aggregation

Querying Logs

NestJS (Development Console)

ML Service (Development)

Production (JSON)

Performance Considerations

Best Practices

Monitoring & Alerts

Future Enhancements

7.5 KiB

Executable file

Raw Blame History

1. NestJS Server (`server/`)

2. ML Service (`ml-service/`)

3. Frontend (`frontend/`)