feat: queue checker (Message Queue Health Monitor) #222

Open
opened 2026-03-28 04:22:28 +00:00 by mfreeman451 · 0 comments

Imported from GitHub.

Original GitHub issue: #609
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/609
Original created: 2025-04-14T14:12:28Z


Monitors message queue health (e.g., RabbitMQ, Redis Streams, Kafka) by checking queue depth, consumer lag, and publish rate. Reports messages queued, lag (ms), and publish rate (messages/sec); the metrics are stored in SQLite for dashboard visualization.

Visuals:
Gauge: Show queue depth with warning thresholds (e.g., green <1000 messages, red >5000).

Line Chart: Plot consumer lag and publish rate over time, highlighting backlog trends.

Stacked Bar: Compare queue depths across services, spotting bottlenecks.

Engagement: Queues are critical for async systems (e.g., microservices, ETL pipelines). Gauges and charts showing backlog or lag feel urgent and actionable, rivaling SolarWinds’ application monitoring.

Data Appeal: Bright gauges and trending lines make the dashboard lively, emphasizing system health.
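
The gauge thresholds listed under Visuals could be expressed as a small classification helper. This is only a sketch: the green and red limits come from the example above, while the intermediate "yellow" band and the function name gaugeColor are assumptions, since the issue does not say what happens between 1000 and 5000 messages.

```go
package main

import "fmt"

// gaugeColor maps a queue depth to a gauge band.
// The green (<1000) and red (>5000) limits are the issue's example values;
// the "yellow" band in between is an assumption made for this sketch.
func gaugeColor(depth int64) string {
	switch {
	case depth < 1000:
		return "green"
	case depth > 5000:
		return "red"
	default:
		return "yellow"
	}
}

func main() {
	for _, d := range []int64{500, 3000, 7500} {
		fmt.Printf("%d -> %s\n", d, gaugeColor(d))
	}
}
```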

Value Proposition:
Unique Niche: Targets message queues, untouched by sysmon (system), rperf (network), snmp (devices), or dusk (blockchain). Essential for event-driven apps on Proxmox.

Lightweight: ~200 bytes per check every 60s (e.g., queue_depth: 500, lag_ms: 100). At 1,440 checks/day that is ~0.3 MB/day/host, which fits within SQLite’s 24 GB/day.

Proxmox Fit: Monitors queues in containerized apps (e.g., RabbitMQ in LXC), complementing sysmon’s resource metrics.

Security: mTLS for gRPC, TLS/auth for queue APIs (tls-security.md).

SolarWinds/Nagios Fit: Matches SolarWinds’ app monitoring (e.g., RabbitMQ plugins) but is simpler, and is more visual than Nagios’ check_rabbitmq.

Dashboard Impact: Gauges and charts for queue health are intuitive, showing system bottlenecks instantly.

Implementation:
Logic (pkg/checker/queue/queue.go):
Connect via client libraries (e.g., github.com/rabbitmq/amqp091-go, github.com/redis/go-redis/v9).

Fetch queue depth (messages ready), consumer lag (time since last consume), and publish rate (messages/sec).

Implement checker.HealthChecker:
Check: Returns true if queue depth is below the target's depth_threshold (e.g., 10000).

GetStatusData: JSON with {queue_depth, lag_ms, publish_rate}.

Data: Store in timeseries_metrics:

INSERT INTO timeseries_metrics (poller_id, name, value, type, timestamp, metadata)
VALUES ('host1', 'queue_rabbit', 500.0, 'queue', '2025-04-14T12:00:00Z', '{"lag_ms": 100, "publish_rate": 10.0}');

Config (/etc/serviceradar/checkers/queue.json):


{
  "listen_addr": ":50091",
  "security": {
    "mode": "mtls",
    "cert_dir": "/etc/serviceradar/certs",
    "role": "checker",
    "tls": {
      "cert_file": "queue-checker.pem",
      "key_file": "queue-checker-key.pem",
      "ca_file": "root.pem"
    }
  },
  "targets": [
    {
      "name": "rabbit-queue",
      "type": "rabbitmq",
      "endpoint": "amqp://user:pass@localhost:5672",
      "queue": "tasks",
      "poll_interval": "60s",
      "timeout": "5s",
      "depth_threshold": 10000
    }
  ]
}

Poller:


{
  "agents": {
    "local-agent": {
      "checks": [
        {
          "service_type": "queue",
          "service_name": "rabbit-queue",
          "details": "{\"type\": \"rabbitmq\", \"endpoint\": \"amqp://user:pass@localhost:5672\", \"queue\": \"tasks\"}"
        }
      ]
    }
  }
}

Storage: Add processQueueMetrics to core/server.go, storing depth/lag/rate.

Dashboard Integration:
API Endpoint: Add /api/metrics/queue?poller_id=host1&name=rabbit-queue to pkg/core/api/server.go.
