OS Trading Engine
Health & Metrics

Monitor system health and collect metrics for observability.


Endpoints Overview

Method  Endpoint               Description                 Auth
GET     /api/v1/health         Comprehensive health check  None
GET     /api/v1/health/ready   Readiness probe             None
GET     /api/v1/health/live    Liveness probe              None
GET     /api/v1/metrics        Prometheus metrics          None

Health and metrics endpoints do not require authentication, making them suitable for load balancers, Kubernetes probes, and monitoring systems.


Health Check

Returns the overall health of the application together with the status of each dependency (PostgreSQL, Redis, and the BullMQ queue).

GET /api/v1/health

Response

Healthy (200)

{
  "status": "healthy",
  "timestamp": "2025-01-20T10:30:00.000Z",
  "uptime": 86400,
  "services": {
    "database": {
      "status": "healthy",
      "latency": 5
    },
    "redis": {
      "status": "healthy",
      "latency": 2
    },
    "queue": {
      "status": "healthy",
      "latency": 3
    }
  }
}

Degraded (200)

{
  "status": "degraded",
  "timestamp": "2025-01-20T10:30:00.000Z",
  "uptime": 86400,
  "services": {
    "database": {
      "status": "healthy",
      "latency": 5
    },
    "redis": {
      "status": "unhealthy"
    },
    "queue": {
      "status": "healthy",
      "latency": 3
    }
  }
}

Unhealthy (503)

{
  "status": "unhealthy",
  "timestamp": "2025-01-20T10:30:00.000Z",
  "uptime": 86400,
  "services": {
    "database": {
      "status": "unhealthy"
    },
    "redis": {
      "status": "unhealthy"
    },
    "queue": {
      "status": "unhealthy"
    }
  }
}

Response Fields

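Field                      Type      Description
status                     string    Overall status: healthy, degraded, or unhealthy
timestamp                  ISO 8601  Current server time
uptime                     integer   Server uptime in seconds
services.database          object    PostgreSQL status
services.redis             object    Redis status
services.queue             object    BullMQ status
services.<name>.status     string    Per-service status: healthy or unhealthy
services.<name>.latency    integer   Check latency; omitted when the service is unhealthy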
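A client can map this payload onto typed structures. The sketch below uses only the Python standard library; the names `ServiceHealth`, `HealthReport`, and `parse_health` are hypothetical and not part of the API. Because the examples omit `latency` for unhealthy services, the field is modelled as optional.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class ServiceHealth:
    status: str                     # "healthy" or "unhealthy"
    latency: Optional[int] = None   # absent in the response when the service is unhealthy


@dataclass
class HealthReport:
    status: str                     # "healthy", "degraded", or "unhealthy"
    timestamp: datetime
    uptime: int                     # seconds
    services: Dict[str, ServiceHealth]


def parse_health(payload: dict) -> HealthReport:
    """Convert a decoded GET /api/v1/health body into a HealthReport (hypothetical helper)."""
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z", so normalise it.
    ts = payload["timestamp"].replace("Z", "+00:00")
    services = {
        name: ServiceHealth(status=info["status"], latency=info.get("latency"))
        for name, info in payload.get("services", {}).items()
    }
    return HealthReport(
        status=payload["status"],
        timestamp=datetime.fromisoformat(ts),
        uptime=int(payload["uptime"]),
        services=services,
    )
```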

Status Logic

Condition                                          Status     HTTP code
All services healthy                               healthy    200
Database healthy, one or more others unhealthy     degraded   200
Database unhealthy                                 unhealthy  503
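The table reduces to a small decision function. This is a sketch of the documented rules for client-side use (for example in tests or dashboards), not the server's actual implementation; `derive_status` is a hypothetical name, and treating a missing `database` entry as unhealthy is an assumption.

```python
from typing import Mapping, Tuple


def derive_status(services: Mapping[str, str]) -> Tuple[str, int]:
    """Apply the documented status rules to per-service statuses.

    `services` maps a service name ("database", "redis", "queue") to
    "healthy" or "unhealthy". Returns (overall_status, http_code).
    """
    # The database is the hard dependency: if it is down (or missing), the app is unhealthy.
    if services.get("database") != "healthy":
        return "unhealthy", 503
    # Database is up; any other failing dependency only degrades the service.
    if any(status != "healthy" for status in services.values()):
        return "degraded", 200
    return "healthy", 200
```

Because `healthy` and `degraded` share HTTP 200, a check that inspects only the status code (such as `curl -f`) cannot tell them apart; read the `status` field when the difference matters.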

Readiness Probe

Kubernetes-style readiness check. Returns 200 when the application can accept traffic, including when it is running in degraded mode, and 503 when it cannot.

GET /api/v1/health/ready

Response

Ready (200)

{
  "ready": true,
  "timestamp": "2025-01-20T10:30:00.000Z"
}

Ready but Degraded (200)

{
  "ready": true,
  "degraded": true,
  "message": "Redis unavailable, running in degraded mode",
  "timestamp": "2025-01-20T10:30:00.000Z"
}

Not Ready (503)

{
  "ready": false,
  "message": "Application not ready",
  "timestamp": "2025-01-20T10:30:00.000Z"
}
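Deployment scripts and integration tests often need to block until the service reports ready. The sketch below polls this endpoint using only the standard library; `wait_until_ready` and `http_fetch` are hypothetical helpers, and the injectable `fetch` parameter exists so the HTTP call can be replaced in tests. A degraded-but-ready response counts as ready, mirroring the endpoint's 200.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple

Fetch = Callable[[str], Tuple[int, dict]]


def http_fetch(url: str) -> Tuple[int, dict]:
    """GET `url` and return (status_code, decoded JSON body), including for 503s."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the 503 "not ready" body is still JSON.
        return err.code, json.loads(err.read() or b"{}")


def wait_until_ready(url: str, timeout: float = 60.0, interval: float = 2.0,
                     fetch: Fetch = http_fetch) -> dict:
    """Poll the readiness endpoint until it reports ready or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            code, body = fetch(url)
            if code == 200 and body.get("ready"):
                return body  # may include "degraded": true
        except (urllib.error.URLError, OSError, ValueError):
            pass  # not accepting connections yet, or a non-JSON body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url} not ready after {timeout}s")
        time.sleep(interval)
```

For example, a deploy script could call `wait_until_ready("http://localhost:3001/api/v1/health/ready", timeout=120)` and fail the rollout on `TimeoutError`.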

Use Case

Configure a Kubernetes readiness probe:

readinessProbe:
  httpGet:
    path: /api/v1/health/ready
    port: 3001
  initialDelaySeconds: 5
  periodSeconds: 10

Liveness Probe

Simple liveness check. Returns 200 if the process is alive.

GET /api/v1/health/live

Response

Alive (200)

{
  "live": true,
  "timestamp": "2025-01-20T10:30:00.000Z"
}

Use Case

Configure a Kubernetes liveness probe:

livenessProbe:
  httpGet:
    path: /api/v1/health/live
    port: 3001
  initialDelaySeconds: 15
  periodSeconds: 20

Prometheus Metrics

Exposes metrics in the Prometheus text exposition format for scraping by Prometheus or a compatible collector.

GET /api/v1/metrics

Response

# HELP process_cpu_user_seconds_total Total user CPU time spent
# TYPE process_cpu_user_seconds_total counter
process_cpu_user_seconds_total 12.34

# HELP process_resident_memory_bytes Resident memory size in bytes
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 125829120

# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1523
http_requests_total{method="POST",status="201"} 89

# ... more metrics
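For a quick smoke test that reads a counter without running a Prometheus server, the simple `name{labels} value` lines shown above can be parsed directly. This is a minimal, hypothetical parser (`parse_metrics` is not part of any library): it skips `# HELP`/`# TYPE` comments, ignores optional timestamps, and does not unescape label values. For anything beyond a smoke test, the official `prometheus_client` Python package provides `prometheus_client.parser.text_string_to_metric_families`.

```python
import re
from typing import Dict, List, Tuple

# name{label="value",...} value   (an optional trailing timestamp is ignored)
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse simple Prometheus text-format samples into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float() also accepts "+Inf" and "NaN"
    return samples
```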

Prometheus Configuration

scrape_configs:
  - job_name: 'nexgent'
    static_configs:
      - targets: ['localhost:3001']
    metrics_path: /api/v1/metrics
    scrape_interval: 15s
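After adding the scrape job, one way to confirm the target is being scraped is to run `up{job="nexgent"}` through Prometheus's own HTTP query API (`/api/v1/query`). The sketch assumes Prometheus listens on its default port 9090 on localhost; `query_up` and `target_up_values` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def target_up_values(prom_response: dict) -> List[float]:
    """Extract sample values from a Prometheus instant-query (vector) response."""
    if prom_response.get("status") != "success":
        raise ValueError(f"query failed: {prom_response.get('error')}")
    # Each vector element carries "value": [<unix timestamp>, "<value as string>"].
    return [float(item["value"][1]) for item in prom_response["data"]["result"]]


def query_up(prometheus_url: str = "http://localhost:9090", job: str = "nexgent") -> List[float]:
    """Run up{job="..."}; 1.0 means the last scrape of that target succeeded."""
    query = urllib.parse.urlencode({"query": f'up{{job="{job}"}}'})
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/query?{query}", timeout=5) as resp:
        return target_up_values(json.loads(resp.read()))
```

An empty list means Prometheus has no series for the job yet, which usually points to a mismatched `job_name` or target.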

Monitoring Examples

Load Balancer Health Check

# Simple health check for load balancers
curl -f http://localhost:3001/api/v1/health/live || exit 1
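Slim container images do not always ship `curl`. Where a Python interpreter is available, a standard-library probe with similar pass/fail semantics can stand in for it. `is_live` and `main` are hypothetical names; like `curl -f`, the probe treats HTTP 4xx/5xx responses and connection errors as failures.

```python
import urllib.error
import urllib.request
from typing import List

DEFAULT_URL = "http://localhost:3001/api/v1/health/live"


def is_live(url: str, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx), connection refused, DNS failure, and timeouts all land here.
        return False


def main(argv: List[str]) -> int:
    """Exit-code style result: 0 when live, 1 otherwise."""
    target = argv[1] if len(argv) > 1 else DEFAULT_URL
    return 0 if is_live(target) else 1
```

Saved as a script (for example a hypothetical `healthcheck.py`) with `if __name__ == "__main__": sys.exit(main(sys.argv))` appended, it can replace curl in a container healthcheck command such as `["CMD", "python", "healthcheck.py", "http://localhost:3001/api/v1/health/ready"]`.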

Docker Compose Healthcheck

services:
  backend:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3001/api/v1/health/ready"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

Alerting Rules

# Prometheus alerting rules
groups:
  - name: nexgent
    rules:
      - alert: NexgentDown
        expr: up{job="nexgent"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Nexgent is down"
      
      - alert: NexgentDegraded
        expr: nexgent_health_status == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Nexgent running in degraded mode"

Related Endpoints