Health & Metrics

Monitor system health and collect metrics for observability.

Endpoints Overview

Method	Endpoint	Description	Auth
`GET`	`/api/v1/health`	Comprehensive health check	None
`GET`	`/api/v1/health/ready`	Readiness probe	None
`GET`	`/api/v1/health/live`	Liveness probe	None
`GET`	`/api/v1/metrics`	Prometheus metrics	None

Health and metrics endpoints do not require authentication, making them suitable for load balancers, Kubernetes probes, and monitoring systems.

Health Check

Comprehensive health status of the application and dependencies.

GET /api/v1/health

Response

Healthy (200)

{
  "status": "healthy",
  "timestamp": "2025-01-20T10:30:00.000Z",
  "uptime": 86400,
  "services": {
    "database": {
      "status": "healthy",
      "latency": 5
    },
    "redis": {
      "status": "healthy",
      "latency": 2
    },
    "queue": {
      "status": "healthy",
      "latency": 3
    }
  }
}

Degraded (200)

{
  "status": "degraded",
  "timestamp": "2025-01-20T10:30:00.000Z",
  "uptime": 86400,
  "services": {
    "database": {
      "status": "healthy",
      "latency": 5
    },
    "redis": {
      "status": "unhealthy"
    },
    "queue": {
      "status": "healthy",
      "latency": 3
    }
  }
}

Unhealthy (503)

{
  "status": "unhealthy",
  "timestamp": "2025-01-20T10:30:00.000Z",
  "uptime": 86400,
  "services": {
    "database": {
      "status": "unhealthy"
    },
    "redis": {
      "status": "unhealthy"
    },
    "queue": {
      "status": "unhealthy"
    }
  }
}

Response Fields

Field	Type	Description
`status`	string	Overall status: `healthy`, `degraded`, `unhealthy`
`timestamp`	string (ISO 8601)	Current server time in UTC
`uptime`	integer	Server uptime in seconds
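`services.database`	object	PostgreSQL status (`status`; `latency` is omitted in the unhealthy samples)
`services.redis`	object	Redis status, same shape as `services.database`
`services.queue`	object	BullMQ status, same shape as `services.database`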
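
A monitoring script may want the per-service detail rather than only the overall status. The Python sketch below parses a response body shaped like the samples above and lists the services that are not healthy. The names `ServiceHealth`, `parse_services`, and `unhealthy_services` are illustrative and not part of the API. `latency` is treated as optional because the samples omit it for unhealthy services.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceHealth:
    name: str
    status: str
    latency: Optional[float]  # None when the response carries no latency field


def parse_services(body: str) -> List[ServiceHealth]:
    """Turn a /api/v1/health response body into per-service records."""
    payload = json.loads(body)
    return [
        ServiceHealth(name=name, status=info["status"], latency=info.get("latency"))
        for name, info in payload["services"].items()
    ]


def unhealthy_services(body: str) -> List[str]:
    """Names of services whose status is anything other than "healthy"."""
    return [s.name for s in parse_services(body) if s.status != "healthy"]
```

Passing the Degraded sample body to `unhealthy_services` would return a list containing only `redis`.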

Status Logic

Condition	Status	HTTP Code
All services healthy	`healthy`	200
Database healthy, one or more other services unhealthy	`degraded`	200
Database unhealthy	`unhealthy`	503
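
The table can be read as a small decision function: the database decides between `unhealthy` and everything else, and the remaining services decide between `healthy` and `degraded`. The Python sketch below restates that logic for illustration. `overall_status` is a hypothetical name, not the server's implementation, and treating a missing `database` entry as unhealthy is an assumption.

```python
from typing import Dict, Tuple


def overall_status(services: Dict[str, str]) -> Tuple[str, int]:
    """Map per-service statuses to (overall status, HTTP code) per the table above."""
    # Assumption: a missing "database" entry counts as unhealthy.
    if services.get("database") != "healthy":
        return "unhealthy", 503
    if all(status == "healthy" for status in services.values()):
        return "healthy", 200
    # Database is fine but at least one other service is not.
    return "degraded", 200
```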

Readiness Probe

Kubernetes-style readiness check. Returns 200 when the application is ready to accept traffic, including when it runs in degraded mode, and 503 otherwise.

GET /api/v1/health/ready

Response

Ready (200)

{
  "ready": true,
  "timestamp": "2025-01-20T10:30:00.000Z"
}

Ready but Degraded (200)

{
  "ready": true,
  "degraded": true,
  "message": "Redis unavailable, running in degraded mode",
  "timestamp": "2025-01-20T10:30:00.000Z"
}

Not Ready (503)

{
  "ready": false,
  "message": "Application not ready",
  "timestamp": "2025-01-20T10:30:00.000Z"
}
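
Because a degraded instance still answers 200, a client that wants to distinguish the three cases has to look at the body as well as the status code. The Python sketch below shows one way to do that. `classify_readiness` and its return labels are illustrative names, and treating any non-200 code as not ready is an assumption.

```python
import json


def classify_readiness(http_status: int, body: str) -> str:
    """Return "ready", "degraded", or "not_ready" for a /api/v1/health/ready response."""
    # Assumption: anything other than 200 means the instance should not get traffic.
    if http_status != 200:
        return "not_ready"
    payload = json.loads(body)
    if not payload.get("ready", False):
        return "not_ready"
    # "degraded" only appears in the body when the instance is in degraded mode.
    return "degraded" if payload.get("degraded", False) else "ready"
```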

Use Case

Configure Kubernetes readiness probe:

readinessProbe:
  httpGet:
    path: /api/v1/health/ready
    port: 3001
  initialDelaySeconds: 5
  periodSeconds: 10

Liveness Probe

Simple liveness check. Returns 200 if the process is alive.

GET /api/v1/health/live

Response

Alive (200)

{
  "live": true,
  "timestamp": "2025-01-20T10:30:00.000Z"
}

Use Case

Configure Kubernetes liveness probe:

livenessProbe:
  httpGet:
    path: /api/v1/health/live
    port: 3001
  initialDelaySeconds: 15
  periodSeconds: 20

Prometheus Metrics

Exposes metrics in the Prometheus text format for scraping by a monitoring system.

GET /api/v1/metrics

Response

# HELP process_cpu_user_seconds_total Total user CPU time spent
# TYPE process_cpu_user_seconds_total counter
process_cpu_user_seconds_total 12.34

# HELP process_resident_memory_bytes Resident memory size in bytes
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 125829120

# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1523
http_requests_total{method="POST",status="201"} 89

# ... more metrics
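
For a quick check without a Prometheus server, the text format is simple enough to read directly. The Python sketch below sums every sample of one metric across its label sets. `sum_metric` is a hypothetical helper. It assumes sample lines carry no trailing timestamp, as in the output above, so it is not a full parser for the exposition format.

```python
def sum_metric(exposition: str, metric_name: str) -> float:
    """Sum all samples of one metric in Prometheus text output."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Sample lines look like `name{labels} value` or `name value`.
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name == metric_name:
            total += float(value)
    return total
```

Applied to the sample output, `sum_metric(text, "http_requests_total")` adds the GET and POST series together (1523 + 89).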

Prometheus Configuration

scrape_configs:
  - job_name: 'nexgent'
    static_configs:
      - targets: ['localhost:3001']
    metrics_path: /api/v1/metrics
    scrape_interval: 15s

Monitoring Examples

Load Balancer Health Check

# Simple health check for load balancers
curl -f http://localhost:3001/api/v1/health/live || exit 1

Docker Compose Healthcheck

services:
  backend:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3001/api/v1/health/ready"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

Alerting Rules

# Prometheus alerting rules
groups:
  - name: nexgent
    rules:
      - alert: NexgentDown
        expr: up{job="nexgent"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Nexgent is down"
      
      - alert: NexgentDegraded
        # Assumes the metrics endpoint exports nexgent_health_status with 1 meaning degraded
        expr: nexgent_health_status == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Nexgent running in degraded mode"

Related Endpoints

Data Sources - Check data source status
Price Feeds - Check price feed health
