feat: uptime checker #221

Open

opened 2026-03-28 04:22:27 +00:00 by mfreeman451 · 0 comments

mfreeman451 commented

2026-03-28 04:22:27 +00:00

Owner

Imported from GitHub.

Original GitHub issue: #608
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/608
Original created: 2025-04-14T13:55:15Z

Tracks uptime and availability of services (e.g., HTTP APIs, TCP ports, gRPC endpoints) by periodically checking connectivity and calculating uptime percentage over sliding windows (e.g., 1h, 24h). Reports uptime (%), downtime incidents, and availability streaks for dashboard display.

Why Exciting for Dashboard?

Visuals:

Uptime Gauge: Show real-time uptime % (e.g., 99.95%) with green/yellow/red zones.

Bar Chart: Display uptime % across services over 24h, highlighting reliable vs. flaky services.

Timeline: Visualize downtime incidents (e.g., red bars for outages) over time.

Engagement: Uptime is a KPI everyone cares about—admins love seeing 99.99% in green. Downtime timelines add drama, showing when services faltered.

Data Appeal: Gauges and bars are eye-catching, making the dashboard feel alive with mission-critical metrics.

Value Proposition:

Unique Niche: Focuses on availability tracking, complementing sysmon (resource usage), rperf (performance), and snmp (devices). Unlike dusk, it’s generic for any service.

Lightweight: ~150 bytes per check every 60s (e.g., uptime_percent: 99.95, downtime_count: 0). At 1,440 checks per day that comes to ~0.2 MB/day/host, ideal for SQLite’s 24 GB/day.

Proxmox Fit: Tracks uptime of containerized apps or VMs, ensuring critical services (e.g., APIs, DBs) are reliable.

Security: mTLS for gRPC (tls-security.md), optional TLS/auth for service checks.

Dashboard Impact: Uptime gauges and downtime timelines are instantly understandable, boosting user confidence in service reliability.

Implementation:

Logic (pkg/checker/uptime/uptime.go):
Check service via HTTP HEAD, TCP connect, or gRPC health (google.golang.org/grpc/health).

Track successes/failures in-memory, calculate uptime % over sliding windows (1h, 24h).

Count downtime incidents (consecutive failures).
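The tracking logic above could be sketched roughly as follows. This is a minimal in-memory sketch, not the final uptime.go: the names Tracker, Record, and UptimePercent are hypothetical, and it assumes that a run of consecutive failures counts as a single downtime incident and that an empty window reports 100%.

```go
package main

import (
	"fmt"
	"time"
)

// sample is one check result. All names in this sketch are hypothetical.
type sample struct {
	at time.Time
	ok bool
}

// Tracker keeps check results in memory, oldest first.
type Tracker struct {
	retain        time.Duration // longest window to answer for, e.g. 24h
	samples       []sample
	DowntimeCount int       // incidents, i.e. runs of consecutive failures
	LastDowntime  time.Time // start of the most recent incident
	lastOK        bool
}

func NewTracker(retain time.Duration) *Tracker {
	return &Tracker{retain: retain, lastOK: true}
}

// Record stores one result and drops samples older than the retention period.
func (t *Tracker) Record(at time.Time, ok bool) {
	// An incident starts on the transition from success to failure, so a
	// run of consecutive failures is counted once (an assumption).
	if !ok && t.lastOK {
		t.DowntimeCount++
		t.LastDowntime = at
	}
	t.lastOK = ok
	t.samples = append(t.samples, sample{at: at, ok: ok})

	cutoff := at.Add(-t.retain)
	i := 0
	for i < len(t.samples) && t.samples[i].at.Before(cutoff) {
		i++
	}
	t.samples = t.samples[i:]
}

// UptimePercent returns the share of successful checks in (now-window, now].
// With no samples in the window it returns 100, an arbitrary choice.
func (t *Tracker) UptimePercent(now time.Time, window time.Duration) float64 {
	cutoff := now.Add(-window)
	total, up := 0, 0
	for _, s := range t.samples {
		if s.at.After(cutoff) && !s.at.After(now) {
			total++
			if s.ok {
				up++
			}
		}
	}
	if total == 0 {
		return 100
	}
	return 100 * float64(up) / float64(total)
}

// demo replays two hours of one-per-minute checks with two outages
// (minutes 30-32 and 100-105) and reports both windows.
func demo() (pct1h, pct24h float64, incidents int) {
	start := time.Date(2025, 4, 14, 0, 0, 0, 0, time.UTC)
	tr := NewTracker(24 * time.Hour)
	var now time.Time
	for m := 0; m < 120; m++ {
		now = start.Add(time.Duration(m) * time.Minute)
		failed := (m >= 30 && m <= 32) || (m >= 100 && m <= 105)
		tr.Record(now, !failed)
	}
	return tr.UptimePercent(now, time.Hour), tr.UptimePercent(now, 24*time.Hour), tr.DowntimeCount
}

func main() {
	p1, p24, n := demo()
	fmt.Printf("1h=%.2f%% 24h=%.2f%% incidents=%d\n", p1, p24, n)
}
```

Keeping raw samples makes both windows exact at the cost of up to 1,440 entries per target at a 60s interval; a bucketed counter would use less memory if that matters.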

Implement checker.HealthChecker:
Check: True if service responds.

GetStatusData: JSON with {uptime_percent_1h, uptime_percent_24h, downtime_count, last_downtime}.
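A rough sketch of what Check and GetStatusData might do, using only the standard library. The exact checker.HealthChecker method signatures are not shown in this issue, so probe, Target, StatusData, and statusJSON below are hypothetical stand-ins. It assumes "responds" means any HTTP status below 500 for HEAD checks and a successful connect for TCP; the gRPC health check is left out because it needs the external grpc module.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// Target is a hypothetical subset of one "targets" config entry.
type Target struct {
	Type     string // "http" or "tcp"
	Endpoint string // URL for http, host:port for tcp
	Timeout  time.Duration
}

// probe reports whether the target responded within its timeout.
func probe(t Target) bool {
	ctx, cancel := context.WithTimeout(context.Background(), t.Timeout)
	defer cancel()

	switch t.Type {
	case "http":
		req, err := http.NewRequestWithContext(ctx, http.MethodHead, t.Endpoint, nil)
		if err != nil {
			return false
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return false
		}
		resp.Body.Close()
		// Assumption: 5xx means the service is down, anything else is "up".
		return resp.StatusCode < 500
	case "tcp":
		var d net.Dialer
		conn, err := d.DialContext(ctx, "tcp", t.Endpoint)
		if err != nil {
			return false
		}
		conn.Close()
		return true
	}
	return false // unknown check type
}

// StatusData carries the fields listed for GetStatusData.
type StatusData struct {
	UptimePercent1h  float64 `json:"uptime_percent_1h"`
	UptimePercent24h float64 `json:"uptime_percent_24h"`
	DowntimeCount    int     `json:"downtime_count"`
	LastDowntime     string  `json:"last_downtime,omitempty"` // RFC 3339 string, an assumption
}

// statusJSON renders the status payload as a JSON string.
func statusJSON(s StatusData) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "{}"
	}
	return string(b)
}

func main() {
	// Local test server so the example does not depend on a real endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}))
	defer srv.Close()

	up := probe(Target{Type: "http", Endpoint: srv.URL, Timeout: 2 * time.Second})
	fmt.Println("http up:", up)
	fmt.Println(statusJSON(StatusData{UptimePercent1h: 99.95, UptimePercent24h: 99.9}))
}
```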

Data: Store in timeseries_metrics:

INSERT INTO timeseries_metrics (poller_id, name, value, type, timestamp, metadata)
VALUES ('host1', 'uptime_api', 99.95, 'uptime', '2025-04-14T12:00:00Z', '{"uptime_24h": 99.90, "downtime_count": 0}');

Config (/etc/serviceradar/checkers/uptime.json):

{
  "listen_addr": ":50088",
  "security": {
    "mode": "mtls",
    "cert_dir": "/etc/serviceradar/certs",
    "role": "checker",
    "tls": {
      "cert_file": "uptime-checker.pem",
      "key_file": "uptime-checker-key.pem",
      "ca_file": "root.pem"
    }
  },
  "targets": [
    {
      "name": "api-uptime",
      "endpoint": "http://api.example.com/health",
      "type": "http",
      "poll_interval": "60s",
      "timeout": "5s",
      "window_1h": "1h",
      "window_24h": "24h"
    }
  ]
}
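One way the checker might load this file, as a sketch. The struct names, the Duration wrapper, and the validation rule are assumptions; only the JSON keys come from the config above. The security block is skipped here, and encoding/json ignores keys the struct does not declare.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Duration lets JSON strings such as "60s" decode into a time.Duration.
type Duration time.Duration

func (d *Duration) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	v, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	*d = Duration(v)
	return nil
}

// Target mirrors one entry of "targets".
type Target struct {
	Name         string   `json:"name"`
	Endpoint     string   `json:"endpoint"`
	Type         string   `json:"type"`
	PollInterval Duration `json:"poll_interval"`
	Timeout      Duration `json:"timeout"`
	Window1h     Duration `json:"window_1h"`
	Window24h    Duration `json:"window_24h"`
}

// Config holds the parts of uptime.json this sketch cares about.
type Config struct {
	ListenAddr string   `json:"listen_addr"`
	Targets    []Target `json:"targets"`
}

// parseConfig decodes the file contents and rejects non-positive timings
// (a hypothetical validation rule, not something the issue specifies).
func parseConfig(data []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return c, err
	}
	for _, t := range c.Targets {
		if t.PollInterval <= 0 || t.Timeout <= 0 {
			return c, fmt.Errorf("target %q: poll_interval and timeout must be positive", t.Name)
		}
	}
	return c, nil
}

const sampleConfig = `{
  "listen_addr": ":50088",
  "targets": [
    {
      "name": "api-uptime",
      "endpoint": "http://api.example.com/health",
      "type": "http",
      "poll_interval": "60s",
      "timeout": "5s",
      "window_1h": "1h",
      "window_24h": "24h"
    }
  ]
}`

func main() {
	c, err := parseConfig([]byte(sampleConfig))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	t := c.Targets[0]
	fmt.Println(c.ListenAddr, t.Name, time.Duration(t.PollInterval), time.Duration(t.Window24h))
}
```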

Storage: Add processUptimeMetrics to core/server.go, storing uptime and incidents.

Dashboard Integration:
API Endpoint: Add /api/metrics/uptime?poller_id=host1&name=api-uptime to pkg/core/api/server.go.
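A possible shape for that endpoint, sketched with net/http. The store map stands in for the real timeseries_metrics query, and UptimeMetric, uptimeHandler, and get are hypothetical names; the response is an array because the UI snippets index and map over metrics.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// UptimeMetric is the assumed per-target payload returned to the dashboard.
type UptimeMetric struct {
	PollerID         string  `json:"poller_id"`
	Name             string  `json:"name"`
	UptimePercent1h  float64 `json:"uptime_percent_1h"`
	UptimePercent24h float64 `json:"uptime_percent_24h"`
	DowntimeCount    int     `json:"downtime_count"`
}

// store is a stand-in for the database, keyed by poller_id + "/" + name.
type store map[string][]UptimeMetric

// uptimeHandler serves GET /api/metrics/uptime?poller_id=...&name=...
func uptimeHandler(s store) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		q := r.URL.Query()
		poller, name := q.Get("poller_id"), q.Get("name")
		if poller == "" || name == "" {
			http.Error(w, "poller_id and name are required", http.StatusBadRequest)
			return
		}
		metrics, ok := s[poller+"/"+name]
		if !ok {
			http.Error(w, "no uptime metrics for that target", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(metrics)
	}
}

// get runs one request against the handler in memory and returns the
// status code and body.
func get(h http.Handler, url string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, url, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h := uptimeHandler(store{
		"host1/api-uptime": {{PollerID: "host1", Name: "api-uptime", UptimePercent1h: 99.95, UptimePercent24h: 99.9}},
	})
	code, body := get(h, "/api/metrics/uptime?poller_id=host1&name=api-uptime")
	fmt.Print(code, " ", body)
}
```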

Next.js UI:
Add UptimeDashboard component:
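import { RadialBarChart, RadialBar } from 'recharts';

function UptimeDashboard({ metrics }) {
  // Render nothing until at least one metric has loaded.
  if (!metrics || metrics.length === 0) return null;
  return (
    <RadialBarChart width={300} height={300} data={[{ name: 'Uptime', value: metrics[0].uptime_percent_1h }]}>
      <RadialBar minAngle={15} background clockWise dataKey="value" fill="#4caf50" />
    </RadialBarChart>
  );
}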

Bar chart for 24h uptime: <BarChart data={metrics.map(m => ({name: m.name, uptime: m.uptime_percent_24h}))} />.

Timeline for downtime: <Timeline data={metrics.filter(m => m.downtime_count > 0)} />.

Visual:
Radial gauge for 1h uptime % (e.g., 99.95% in green).

Bar chart comparing services’ 24h uptime.

Red timeline bars for downtime events.

Pros:
Visual Appeal: Uptime gauges and timelines are engaging and critical.

Unique: Focuses on availability, distinct from performance (rperf) or system (sysmon) metrics.

Ultra-Lightweight: ~0.2 MB/day/host, perfect for SQLite.

Proxmox: Ensures app reliability in containers/VMs.

Simple: No dependencies, just net/http or grpc/health.

mfreeman451 added the good first issue, plug-in, and wasm labels 2026-03-28 04:22:27 +00:00