Health Monitor API

RESTful API endpoints and WebSocket gateway for real-time VPS and container health monitoring.

Architecture

api/
├── dto/                          # Data Transfer Objects
│   ├── vps-resources.dto.ts     # VPS resource metrics
│   ├── docker-container.dto.ts  # Container status and health
│   ├── docker-event.dto.ts      # Docker event logs
│   ├── platform-status.dto.ts   # Aggregated platform status
│   └── dependency-graph.dto.ts  # Service dependency graph
├── status.controller.ts         # REST API endpoints
├── health.gateway.ts            # WebSocket gateway
└── api.module.ts                # API module definition

REST API Endpoints

Base URL

http://localhost:5000/api/health

Endpoints

1. Platform Status

GET /api/health/status

Returns the aggregated platform health status, including current VPS resource usage and a summary of service states.

Response:

{
  "status": "healthy" | "degraded" | "down",
  "message": "All systems operational: 12 services running",
  "vpsResources": {
    "cpu": { "percent": 45.2, "cores": 4 },
    "memory": { "usedMB": 2048, "totalMB": 4096, "percent": 50.0 },
    "disk": { "usedGB": 120, "totalGB": 500, "percent": 24.0 },
    "network": { "rxBytes": 1234567890, "txBytes": 987654321 },
    "timestamp": "2025-12-20T12:00:00.000Z"
  },
  "serviceSummary": {
    "total": 12,
    "running": 11,
    "healthy": 10,
    "unhealthy": 1,
    "stopped": 1
  },
  "topContainers": [...]
}
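For TypeScript clients, the response above can be described with interfaces and checked at runtime after `response.json()`. This is a sketch inferred only from the example payload: the authoritative shapes live in `dto/platform-status.dto.ts` and `dto/vps-resources.dto.ts` and may differ, `topContainers` is typed loosely because its elements are not shown, and `isPlatformStatus` is a hypothetical helper.

```typescript
// Response types for GET /api/health/status, inferred from the example payload
// (assumption: the DTOs in dto/ are the source of truth and may differ).
type PlatformHealth = 'healthy' | 'degraded' | 'down';

interface VpsResources {
  cpu: { percent: number; cores: number };
  memory: { usedMB: number; totalMB: number; percent: number };
  disk: { usedGB: number; totalGB: number; percent: number };
  network: { rxBytes: number; txBytes: number };
  timestamp: string; // ISO 8601
}

interface ServiceSummary {
  total: number;
  running: number;
  healthy: number;
  unhealthy: number;
  stopped: number;
}

interface PlatformStatus {
  status: PlatformHealth;
  message: string;
  vpsResources: VpsResources;
  serviceSummary: ServiceSummary;
  topContainers: unknown[]; // element shape not shown in the example
}

const PLATFORM_HEALTH_VALUES: readonly string[] = ['healthy', 'degraded', 'down'];

// Runtime guard for parsed JSON: checks the top-level fields only, not every nested value.
function isPlatformStatus(value: unknown): value is PlatformStatus {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as { [key: string]: unknown };
  const status = v.status;
  return (
    typeof status === 'string' &&
    PLATFORM_HEALTH_VALUES.indexOf(status) !== -1 &&
    typeof v.message === 'string' &&
    typeof v.vpsResources === 'object' && v.vpsResources !== null &&
    typeof v.serviceSummary === 'object' && v.serviceSummary !== null &&
    Array.isArray(v.topContainers)
  );
}
```

Typical use: `const data: unknown = await response.json(); if (isPlatformStatus(data)) { /* data.serviceSummary is typed here */ }`.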

2. All Services

GET /api/health/services

Returns a list of all Docker containers with their status and metrics.

Response:

[
  {
    "name": "lilith-platform-postgres",
    "state": "running",
    "health": "healthy",
    "status": "Up 2 hours",
    "cpu": 12.5,
    "memory": "128MiB / 2GiB",
    "uptime": 7200,
    "restartCount": 0
  }
]
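In the list above, `memory` is a human-readable `used / limit` string in the style `docker stats` prints, so a client that needs numbers (for example, to sort containers by memory pressure) has to parse it. A minimal sketch, assuming values always carry a unit suffix: binary `KiB`/`MiB`/`GiB`/`TiB` as `docker stats` uses for memory, plus `B` and decimal `kB`/`MB`/`GB`/`TB` as a fallback. `parseSize` and `parseMemoryUsage` are hypothetical helpers, not part of the API.

```typescript
// Bytes per unit suffix. Binary units match what `docker stats` prints for memory;
// decimal units are accepted as a fallback.
const UNIT_BYTES: { [unit: string]: number | undefined } = {
  B: 1,
  KiB: 1024,
  MiB: 1024 ** 2,
  GiB: 1024 ** 3,
  TiB: 1024 ** 4,
  kB: 1e3,
  MB: 1e6,
  GB: 1e9,
  TB: 1e12,
};

// Convert one size such as "128MiB" or "1.5GiB" to bytes.
function parseSize(text: string): number {
  const match = /^\s*([\d.]+)\s*([A-Za-z]+)\s*$/.exec(text);
  if (!match) throw new Error(`Unrecognized size: "${text}"`);
  const value = Number(match[1]);
  const factor = UNIT_BYTES[match[2] ?? ''];
  if (!Number.isFinite(value) || factor === undefined) {
    throw new Error(`Unrecognized size: "${text}"`);
  }
  return value * factor;
}

// Parse the "used / limit" memory string from the services endpoint.
function parseMemoryUsage(memory: string): { usedBytes: number; limitBytes: number } {
  const [used, limit, ...rest] = memory.split('/');
  if (used === undefined || limit === undefined || rest.length > 0) {
    throw new Error(`Expected "used / limit", got "${memory}"`);
  }
  return { usedBytes: parseSize(used), limitBytes: parseSize(limit) };
}
```

For the example container, `parseMemoryUsage('128MiB / 2GiB')` gives 134217728 used out of 2147483648 bytes, i.e. 6.25% of its limit.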

3. Specific Service

GET /api/health/services/:name

Returns detailed metrics for a specific container.

Parameters:

  • name (path): Container name

Example:

GET /api/health/services/lilith-platform-postgres
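The container name goes into the URL path, so a client that builds the URL from user input should percent-encode it. Docker restricts container names to letters, digits, `_`, `.` and `-`, so for real names the encoding is a no-op; it mainly keeps malformed input from escaping its path segment. `serviceUrl` and `HEALTH_API_BASE` are illustrative names, not exports of this module.

```typescript
// Default base URL from this README; override for other deployments.
const HEALTH_API_BASE = 'http://localhost:5000/api/health';

// Build GET /api/health/services/:name. encodeURIComponent keeps the name a
// single path segment even if it contains "/", spaces, or other reserved characters.
function serviceUrl(name: string, baseUrl: string = HEALTH_API_BASE): string {
  if (name.length === 0) throw new Error('Container name must not be empty');
  const base = baseUrl.replace(/\/+$/, ''); // tolerate a trailing slash
  return `${base}/services/${encodeURIComponent(name)}`;
}
```

A `404` response from this URL can then be treated as "no such container", per the status codes listed under Error Handling.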

4. VPS Resources

GET /api/health/vps

Returns current VPS resource usage metrics.

Response:

{
  "cpu": { "percent": 45.2, "cores": 4 },
  "memory": { "usedMB": 2048, "totalMB": 4096, "percent": 50.0 },
  "disk": { "usedGB": 120, "totalGB": 500, "percent": 24.0 },
  "network": { "rxBytes": 1234567890, "txBytes": 987654321 },
  "timestamp": "2025-12-20T12:00:00.000Z"
}
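The `percent` fields appear to be `used / total * 100` (2048 / 4096 gives 50.0 and 120 / 500 gives 24.0 in the example). The network fields look like cumulative byte counters rather than rates; assuming that, a client can derive throughput from two successive samples, such as two consecutive `vps_resources` WebSocket messages. Both `usagePercent` and `networkRate` are hypothetical sketches built on those assumptions.

```typescript
interface VpsSample {
  network: { rxBytes: number; txBytes: number };
  timestamp: string; // ISO 8601, e.g. "2025-12-20T12:00:00.000Z"
}

// used / total as a percentage, rounded to one decimal place (0 if total is 0).
function usagePercent(used: number, total: number): number {
  if (total <= 0) return 0;
  return Math.round((used / total) * 1000) / 10;
}

// Bytes per second between two samples, assuming rxBytes/txBytes only grow.
// A negative delta (counter reset, e.g. after a reboot) is clamped to 0.
function networkRate(prev: VpsSample, next: VpsSample): { rxPerSec: number; txPerSec: number } {
  const seconds = (Date.parse(next.timestamp) - Date.parse(prev.timestamp)) / 1000;
  if (!(seconds > 0)) return { rxPerSec: 0, txPerSec: 0 }; // also catches NaN
  const rate = (before: number, after: number): number => Math.max(0, after - before) / seconds;
  return {
    rxPerSec: rate(prev.network.rxBytes, next.network.rxBytes),
    txPerSec: rate(prev.network.txBytes, next.network.txBytes),
  };
}
```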

5. Docker Events

GET /api/health/events?since=1h

Returns recent Docker events (starts, stops, health status changes).

Query Parameters:

  • since (optional): Time range (e.g., "1h", "24h", "5m") - default: "1h"

Response:

[
  {
    "timestamp": "2025-12-20T12:00:00.000Z",
    "type": "container",
    "action": "start",
    "containerName": "lilith-platform-postgres"
  }
]
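The `since` parameter takes a duration with a unit suffix. Assuming the accepted format is an integer followed by `s`, `m`, or `h` (all documented examples fit that pattern; other forms are not specified here), a client can validate the value before sending it and reuse the same parser to trim a locally cached event list. `sinceToMs`, `eventsUrl`, and `eventsWithin` are hypothetical helpers.

```typescript
// Milliseconds per supported unit (assumption: the API accepts s, m and h).
const SINCE_UNIT_MS = { s: 1000, m: 60 * 1000, h: 60 * 60 * 1000 };

// Parse values like "5m", "1h" or "24h" into milliseconds; reject anything else.
function sinceToMs(since: string): number {
  const match = /^(\d+)([smh])$/.exec(since);
  if (!match) throw new Error(`Unsupported "since" value: "${since}"`);
  const unit = match[2] as keyof typeof SINCE_UNIT_MS;
  return Number(match[1]) * SINCE_UNIT_MS[unit];
}

// Build GET /api/health/events?since=..., validating the value first.
function eventsUrl(since = '1h', baseUrl = 'http://localhost:5000/api/health'): string {
  sinceToMs(since); // throws on malformed input
  return `${baseUrl}/events?since=${encodeURIComponent(since)}`;
}

// Keep only events inside the window ending at `now`.
function eventsWithin<T extends { timestamp: string }>(events: T[], since: string, now: number = Date.now()): T[] {
  const cutoff = now - sinceToMs(since);
  return events.filter((event) => Date.parse(event.timestamp) >= cutoff);
}
```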

6. Service Dependencies

GET /api/health/dependencies

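Returns the service dependency graph: each node carries a service's health status, and each edge points from a service to a service it depends on.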

Response:

{
  "nodes": [
    { "id": "postgres", "status": "healthy" },
    { "id": "redis", "status": "healthy" },
    { "id": "api", "status": "healthy" }
  ],
  "edges": [
    { "from": "api", "to": "postgres" },
    { "from": "api", "to": "redis" }
  ]
}
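Assuming edges point from a dependent service to its dependency (`api` to `postgres` in the example), a client can work out which services are affected when one goes unhealthy by walking the edges in reverse. The sketch below is a breadth-first search over the reversed graph; it treats any status other than `"healthy"` as a problem, and `impactedServices` is a hypothetical helper.

```typescript
interface GraphNode {
  id: string;
  status: string;
}
// Assumption: `from` depends on `to` (api -> postgres in the example).
interface GraphEdge {
  from: string;
  to: string;
}
interface DependencyGraph {
  nodes: GraphNode[];
  edges: GraphEdge[];
}

// Services that are not healthy themselves or that depend, directly or
// transitively, on one that is not. Breadth-first search over reversed edges.
function impactedServices(graph: DependencyGraph): string[] {
  // dependency id -> ids of the services that depend on it
  const dependents = new Map<string, string[]>();
  for (const edge of graph.edges) {
    const list = dependents.get(edge.to) ?? [];
    list.push(edge.from);
    dependents.set(edge.to, list);
  }

  const queue = graph.nodes.filter((node) => node.status !== 'healthy').map((node) => node.id);
  const impacted = new Set<string>(queue);
  while (queue.length > 0) {
    const id = queue.shift();
    if (id === undefined) break;
    for (const dependent of dependents.get(id) ?? []) {
      if (!impacted.has(dependent)) {
        impacted.add(dependent); // mark before enqueueing so cycles terminate
        queue.push(dependent);
      }
    }
  }
  return Array.from(impacted);
}
```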

7. Container Logs

GET /api/health/services/:name/logs?lines=100

Returns recent logs for a specific container.

Parameters:

  • name (path): Container name
  • lines (query, optional): Number of log lines - default: 100

Response:

{
  "logs": "..."
}
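The `logs` field is returned as a single string. Assuming it is newline-separated, as container log output normally is, a client that wants to render or filter individual lines can split it. `logsUrl` and `toLogLines` are hypothetical helpers.

```typescript
interface LogsResponse {
  logs: string;
}

// Build GET /api/health/services/:name/logs?lines=N (the API default is 100).
function logsUrl(name: string, lines = 100, baseUrl = 'http://localhost:5000/api/health'): string {
  if (!Number.isInteger(lines) || lines <= 0) {
    throw new Error(`lines must be a positive integer, got ${lines}`);
  }
  return `${baseUrl}/services/${encodeURIComponent(name)}/logs?lines=${lines}`;
}

// Split the single "logs" string into lines (\n or \r\n), dropping the empty
// entry a trailing newline would otherwise leave at the end.
function toLogLines(response: LogsResponse): string[] {
  const lines = response.logs.split(/\r?\n/);
  if (lines[lines.length - 1] === '') lines.pop();
  return lines;
}
```

Filtering then becomes ordinary array work, e.g. `toLogLines(body).filter((line) => line.includes('ERROR'))`.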

WebSocket Gateway

Connection

import { io } from 'socket.io-client';

const socket = io('ws://localhost:5000/health');

Events Emitted by Server

1. VPS Resources (every 5 seconds)

socket.on('vps_resources', (data) => {
  console.log('VPS Resources:', data);
  // { cpu: {...}, memory: {...}, disk: {...}, network: {...}, timestamp: ... }
});

2. Container Update (every 5 seconds)

socket.on('container_update', (containers) => {
  console.log('Containers:', containers);
  // Array of container statuses
});

3. Docker Events (every 10 seconds, only new events)

socket.on('docker_events', (events) => {
  console.log('New Docker Events:', events);
  // Array of recent Docker events
});

4. Error Events

socket.on('error', (error) => {
  console.error('WebSocket Error:', error);
});
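Because `docker_events` only delivers events that are new since the previous push, a client that wants a running history has to accumulate them itself. A minimal sketch with a hypothetical `appendEvents` helper and an arbitrary cap of 200 entries to bound memory; the `DockerEvent` shape is copied from the REST example for `/api/health/events` and assumed to match the WebSocket payload.

```typescript
interface DockerEvent {
  timestamp: string;
  type: string;
  action: string;
  containerName: string;
}

// Append newly pushed events and keep only the most recent `max` entries
// (oldest first). Returns a new array, so it is safe to use as React state.
function appendEvents(buffer: DockerEvent[], incoming: DockerEvent[], max = 200): DockerEvent[] {
  const merged = buffer.concat(incoming);
  return merged.length > max ? merged.slice(merged.length - max) : merged;
}
```

In a React component this pairs with a functional state update, `socket.on('docker_events', (events) => setEvents((prev) => appendEvents(prev, events)))`, which avoids reading stale state inside the listener.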

Client-Sent Events

1. Request Refresh

// Refresh specific data type
socket.emit('request_refresh', { type: 'vps' });
socket.emit('request_refresh', { type: 'containers' });
socket.emit('request_refresh', { type: 'events' });

// Refresh all data
socket.emit('request_refresh', { type: 'all' });

2. Subscribe to Service Updates

socket.emit('subscribe_service', { serviceName: 'postgres' });

socket.on('subscription_confirmed', (data) => {
  console.log('Subscribed to:', data.serviceName);
});
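The client-sent payloads can be typed so that a misspelled refresh type fails at compile time, with a runtime check for values that come from user input. The type and function names below are assumptions based on the documented payloads. With socket.io-client v4, the same shapes can also be supplied as generics, `Socket<ServerToClientEvents, ClientToServerEvents>`, to type `emit` and `on` directly.

```typescript
// Refresh targets accepted by request_refresh, per the examples above.
type RefreshType = 'vps' | 'containers' | 'events' | 'all';

// Payload shapes for the two client-sent events.
interface RequestRefreshPayload {
  type: RefreshType;
}
interface SubscribeServicePayload {
  serviceName: string;
}

const REFRESH_TYPES: readonly RefreshType[] = ['vps', 'containers', 'events', 'all'];

function isRefreshType(value: string): value is RefreshType {
  return (REFRESH_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Build a request_refresh payload from untrusted input (e.g. a URL parameter).
function refreshPayload(type: string): RequestRefreshPayload {
  if (!isRefreshType(type)) throw new Error(`Unknown refresh type: "${type}"`);
  return { type };
}

// Build a subscribe_service payload, rejecting blank names.
function subscribePayload(serviceName: string): SubscribeServicePayload {
  const name = serviceName.trim();
  if (name.length === 0) throw new Error('serviceName must not be empty');
  return { serviceName: name };
}
```

Usage: `socket.emit('request_refresh', refreshPayload(selectedTab))`, where `selectedTab` stands for any string from the UI.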

Swagger Documentation

Interactive API documentation is available at:

http://localhost:5000/api/docs

The Swagger UI provides:

  • Interactive API explorer
  • Request/response schemas
  • Try-it-out functionality
  • Complete endpoint documentation

Usage Examples

Fetch Platform Status (JavaScript)

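async function getPlatformStatus() {
  const response = await fetch('http://localhost:5000/api/health/status');
  if (!response.ok) {
    // Non-2xx responses carry an error body instead of a status object
    throw new Error(`Status request failed: HTTP ${response.status}`);
  }
  const data = await response.json();

  console.log('Platform Status:', data.status);
  console.log('Services:', data.serviceSummary);
}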

Real-Time Monitoring (React)

import { useEffect, useState } from 'react';
import { io } from 'socket.io-client';

function HealthMonitor() {
  const [vpsResources, setVpsResources] = useState(null);
  const [containers, setContainers] = useState([]);

  useEffect(() => {
    const socket = io('ws://localhost:5000/health');

    socket.on('vps_resources', (data) => {
      setVpsResources(data);
    });

    socket.on('container_update', (data) => {
      setContainers(data);
    });

    return () => socket.disconnect();
  }, []);

  return (
    <div>
      <h2>VPS Resources</h2>
      {vpsResources && (
        <div>
          CPU: {vpsResources.cpu.percent.toFixed(1)}%
          Memory: {vpsResources.memory.percent.toFixed(1)}%
        </div>
      )}

      <h2>Containers ({containers.length})</h2>
      <ul>
        {containers.map((c) => (
          <li key={c.name}>
            {c.name}: {c.state} ({c.health || 'N/A'})
          </li>
        ))}
      </ul>
    </div>
  );
}

Fetch Specific Service (cURL)

curl http://localhost:5000/api/health/services/lilith-platform-postgres

Get Recent Events (cURL)

curl "http://localhost:5000/api/health/events?since=24h"

Error Handling

All endpoints return standard HTTP status codes:

  • 200 OK - Request successful
  • 404 Not Found - Service/resource not found
  • 500 Internal Server Error - Server-side error

Error responses follow this format:

{
  "statusCode": 500,
  "message": "Failed to retrieve platform status"
}
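A client can turn the error format above into a typed exception. The sketch assumes `message` is normally a string, as documented; it also accepts an array of strings, because NestJS's built-in `ValidationPipe` reports validation failures that way. `ApiError` and `toApiError` are hypothetical names.

```typescript
class ApiError extends Error {
  readonly statusCode: number;

  constructor(statusCode: number, message: string) {
    super(message);
    this.name = 'ApiError';
    this.statusCode = statusCode;
  }
}

// Map an HTTP status plus the parsed response body to an ApiError. Falls back to a
// generic message when the body does not follow { statusCode, message }.
function toApiError(status: number, body: unknown): ApiError {
  if (typeof body === 'object' && body !== null) {
    const { statusCode, message } = body as { statusCode?: unknown; message?: unknown };
    const code = typeof statusCode === 'number' ? statusCode : status;
    if (typeof message === 'string') return new ApiError(code, message);
    if (Array.isArray(message)) return new ApiError(code, message.map(String).join('; '));
  }
  return new ApiError(status, `Request failed with HTTP ${status}`);
}
```

In a fetch wrapper: `if (!response.ok) throw toApiError(response.status, await response.json().catch(() => null));`. The `.catch` covers proxies that answer with HTML instead of JSON.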

Performance Considerations

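  • REST API: Each request fetches its data on demand over SSH from the VPS
  • WebSocket: VPS and container updates are broadcast every 5 seconds and Docker events every 10 seconds, to connected clients only
  • Caching: Not implemented, so data is always fresh but every REST request incurs an SSH round trip
  • Rate Limiting: Not implemented (add before exposing the API in production)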
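Because every REST request triggers an SSH round trip and nothing is cached, dashboards that poll frequently multiply the load on the VPS. If that becomes a problem, a short time-to-live cache in front of the SSH-backed lookups is one option. A minimal sketch with a hypothetical `TtlCache` (not part of the codebase): it trades up to `ttlMs` of staleness for fewer SSH calls, and the clock is injectable so expiry can be tested without waiting.

```typescript
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Return the cached value, or run `load`, cache its result, and return it.
  async getOrLoad(key: string, load: () => Promise<V>): Promise<V> {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await load();
    this.set(key, value);
    return value;
  }
}
```

Concurrent misses for the same key will each call `load`; caching the in-flight promise instead of the resolved value would deduplicate them.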

Security

  • CORS: Enabled for all origins (restrict it via CORS_ORIGIN in production)
  • Authentication: Not implemented (integrate AuthModule if needed)
  • WebSocket: Connections are accepted without authentication (implement token-based auth if needed)
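For token-based WebSocket auth, socket.io v4 clients can send credentials in the handshake with `io(url, { auth: { token } })`, and the server reads them from `socket.handshake.auth`. Browser WebSocket connections cannot set custom headers, so the `auth` field is the more portable channel. The sketch below extracts a token from either place; `extractToken` and `HandshakeLike` are hypothetical, and verifying the token is left to whatever AuthModule provides.

```typescript
// Subset of socket.io's server-side handshake object that this sketch relies on.
interface HandshakeLike {
  auth: { [key: string]: unknown };
  headers: { [name: string]: string | string[] | undefined };
}

// Prefer auth.token (sent via io(url, { auth: { token } })); fall back to an
// "Authorization: Bearer <token>" header for non-browser clients.
function extractToken(handshake: HandshakeLike): string | null {
  const fromAuth = handshake.auth['token'];
  if (typeof fromAuth === 'string' && fromAuth.length > 0) return fromAuth;
  const header = handshake.headers['authorization'];
  const value = Array.isArray(header) ? header[0] : header;
  const match = value ? /^Bearer\s+(\S+)$/i.exec(value) : null;
  return match && match[1] ? match[1] : null;
}
```

In a NestJS gateway this would typically run in `handleConnection` (from the `OnGatewayConnection` interface), calling `client.disconnect(true)` when no valid token is present.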

Development

Testing REST API

# Get platform status
curl http://localhost:5000/api/health/status

# Get all services
curl http://localhost:5000/api/health/services

# Get VPS resources
curl http://localhost:5000/api/health/vps

Testing WebSocket

// In a browser console, on a page that has loaded the socket.io client (which provides the global io)
const socket = io('ws://localhost:5000/health');
socket.on('vps_resources', console.log);
socket.on('container_update', console.log);

Next Steps

  1. Done: REST API endpoints
  2. Done: WebSocket gateway for real-time updates
  3. Done: Swagger documentation
  4. Planned: Add authentication (AuthModule integration)
  5. Planned: Add rate limiting for production
  6. Planned: Implement historical data endpoints (requires DatabaseModule)
  7. Planned: Add alerting webhooks for critical events