carverauto/serviceradar

feat: resource checker (Container/VM Resource Efficiency Tracker) #223

Open

opened 2026-03-28 04:22:28 +00:00 by mfreeman451 · 0 comments

Imported from GitHub.

Original GitHub issue: #610
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/610
Original created: 2025-04-14T14:30:29Z

Monitors resource efficiency for containers or VMs on Proxmox (e.g., LXC, QEMU) by querying runtime metrics (e.g., CPU %, memory %, I/O wait) via Proxmox APIs or lxc commands. Reports efficiency scores (e.g., CPU usage vs. allocated) and bottlenecks for dashboard visualization, stored in SQLite.

Visuals:
Radar Chart: Show efficiency across CPU, memory, I/O for each container/VM, comparing usage vs. limits.

Donut Chart: Display resource utilization % (e.g., 70% CPU, 50% memory), highlighting over/under-provisioning.

Gantt Chart: Track bottleneck events (e.g., CPU throttling, I/O wait spikes) over time.

Engagement: Resource efficiency is a hot topic for virtualization admins, especially on Proxmox. Radar and donut charts feel analytical and modern, beating Nagios’ text outputs and rivaling SolarWinds’ VM monitoring.

Data Appeal: Multi-dimensional charts and colorful donuts make the dashboard a go-to for optimizing Proxmox clusters.

Value Proposition:

Unique Niche: Enhances sysmon (host-level CPU/disk/ZFS) by focusing on container/VM granularity, not covered by rperf, snmp, or dusk. Critical for Proxmox optimization.

Lightweight: ~300 bytes per check every 5m (e.g., cpu_efficiency: 0.7, mem_efficiency: 0.5). At 288 checks per day that is ~0.09 MB/day/container, or about 8.6 GB/day for 100,000 containers, which still fits SQLite’s 24 GB/day.

Proxmox Fit: Native fit for LXC/QEMU, leveraging libzetta or Proxmox APIs, complementing sysmon’s ZFS metrics.

Security: mTLS for gRPC, API token auth for Proxmox (tls-security.md).

SolarWinds/Nagios Fit: Matches SolarWinds’ VM Manager visuals but simpler, more automated than Nagios’ check_lxc or manual scripts.

Dashboard Impact: Radar and donut charts offer clear insights into resource usage, encouraging efficient provisioning.
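
A quick check of the storage estimate in the Lightweight item, taking the 300-byte payload and the 5-minute interval at face value. This is only a back-of-the-envelope sketch: it ignores SQLite row and index overhead, and the function and constant names are illustrative, not part of ServiceRadar.

```go
package main

import "fmt"

// Figures taken from the Lightweight item above.
const (
	bytesPerCheck    = 300 // approximate payload per check
	checkIntervalMin = 5   // one check every 5 minutes
)

// checksPerDay returns how many checks run in 24 hours at the given interval.
func checksPerDay(intervalMin int) int {
	return 24 * 60 / intervalMin
}

// bytesPerDay returns the raw daily payload volume for n containers.
func bytesPerDay(n int) int64 {
	return int64(n) * int64(bytesPerCheck) * int64(checksPerDay(checkIntervalMin))
}

func main() {
	fmt.Println("checks per container per day:", checksPerDay(checkIntervalMin))
	fmt.Println("bytes per container per day: ", bytesPerDay(1))
	fmt.Println("bytes per day, 100k containers:", bytesPerDay(100000))
}
```

That works out to 288 checks and about 86 KB per container per day, so 100,000 containers land near 8.6 GB/day of raw payload.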

Implementation:

Logic (pkg/checker/resource/resource.go):
Query Proxmox API (github.com/Telmate/proxmox-api-go) or lxc commands for CPU %, memory %, I/O wait.

Calculate efficiency: usage / allocated (e.g., 700m CPU used / 1000m allocated = 0.7).

Detect bottlenecks (e.g., I/O wait > 10%).

Implement checker.HealthChecker:
Check: True if efficiency > 0.2 (not idle) and no bottlenecks.

GetStatusData: JSON with {cpu_efficiency, mem_efficiency, io_wait_percent, bottlenecks}.
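
The logic above could be sketched roughly as follows. This is not the real checker.HealthChecker interface, whose signatures are not shown in this issue; Sample, Status, Evaluate and Healthy are hypothetical names. It also assumes "efficiency > 0.2" applies to both CPU and memory, which the issue does not spell out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Thresholds from the issue text; tune them per deployment.
const (
	idleThreshold   = 0.2  // efficiency at or below this counts as idle
	ioWaitThreshold = 10.0 // percent I/O wait above this counts as a bottleneck
)

// Sample is a hypothetical snapshot of one container or VM.
type Sample struct {
	CPUUsedMilli  float64 // e.g. 700 (millicores)
	CPUAllocMilli float64 // e.g. 1000
	MemUsedBytes  float64
	MemAllocBytes float64
	IOWaitPercent float64
}

// Status mirrors the JSON fields named for GetStatusData.
type Status struct {
	CPUEfficiency float64  `json:"cpu_efficiency"`
	MemEfficiency float64  `json:"mem_efficiency"`
	IOWaitPercent float64  `json:"io_wait_percent"`
	Bottlenecks   []string `json:"bottlenecks"`
}

// efficiency returns used/allocated, or 0 when nothing is allocated.
func efficiency(used, allocated float64) float64 {
	if allocated <= 0 {
		return 0
	}
	return used / allocated
}

// Evaluate computes the efficiency ratios and flags bottlenecks.
func Evaluate(s Sample) Status {
	st := Status{
		CPUEfficiency: efficiency(s.CPUUsedMilli, s.CPUAllocMilli),
		MemEfficiency: efficiency(s.MemUsedBytes, s.MemAllocBytes),
		IOWaitPercent: s.IOWaitPercent,
		Bottlenecks:   []string{}, // non-nil so it marshals as [] rather than null
	}
	if s.IOWaitPercent > ioWaitThreshold {
		st.Bottlenecks = append(st.Bottlenecks, "io_wait")
	}
	return st
}

// Healthy applies the Check rule: not idle and no bottlenecks.
// Assumption: both CPU and memory must clear the idle threshold.
func Healthy(st Status) bool {
	return st.CPUEfficiency > idleThreshold &&
		st.MemEfficiency > idleThreshold &&
		len(st.Bottlenecks) == 0
}

func main() {
	st := Evaluate(Sample{
		CPUUsedMilli: 700, CPUAllocMilli: 1000,
		MemUsedBytes: 512, MemAllocBytes: 1024,
		IOWaitPercent: 3,
	})
	out, err := json.Marshal(st)
	if err != nil {
		panic(err)
	}
	fmt.Println(Healthy(st), string(out))
}
```

One thing to decide before implementing: with this rule an idle but otherwise fine container reports as unhealthy, which may or may not be what we want for alerting.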

Data: New resource_metrics table (pkg/db/db.go):


CREATE TABLE resource_metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    poller_id TEXT NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    container_id TEXT NOT NULL,
    metric_name TEXT NOT NULL,
    value REAL NOT NULL,
    FOREIGN KEY (poller_id) REFERENCES pollers(poller_id) ON DELETE CASCADE
);

Example: INSERT INTO resource_metrics (poller_id, timestamp, container_id, metric_name, value) VALUES ('host1', '2025-04-14T12:00:00Z', 'ct101', 'cpu_efficiency', 0.7);
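
Because the table is narrow (one row per metric), each check result fans out into several rows. A rough sketch of that step, assuming the status arrives as a name-to-value map; MetricRow, ToRows and InsertSQL are illustrative names, and the database call itself is left out so the example has no driver dependency.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// InsertSQL is the example INSERT above with placeholders instead of literals.
const InsertSQL = `INSERT INTO resource_metrics
  (poller_id, timestamp, container_id, metric_name, value)
  VALUES (?, ?, ?, ?, ?)`

// MetricRow matches the columns of resource_metrics (id is assigned by SQLite).
type MetricRow struct {
	PollerID    string
	Timestamp   time.Time
	ContainerID string
	MetricName  string
	Value       float64
}

// Args returns the row values in placeholder order, with an RFC 3339 UTC timestamp.
func (r MetricRow) Args() []any {
	return []any{r.PollerID, r.Timestamp.UTC().Format(time.RFC3339), r.ContainerID, r.MetricName, r.Value}
}

// ToRows expands one check result into one row per metric, sorted by metric name.
func ToRows(pollerID, containerID string, ts time.Time, metrics map[string]float64) []MetricRow {
	names := make([]string, 0, len(metrics))
	for name := range metrics {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	rows := make([]MetricRow, 0, len(names))
	for _, name := range names {
		rows = append(rows, MetricRow{pollerID, ts, containerID, name, metrics[name]})
	}
	return rows
}

// sampleTime matches the timestamp used in the example INSERT.
var sampleTime = time.Date(2025, time.April, 14, 12, 0, 0, 0, time.UTC)

func main() {
	rows := ToRows("host1", "ct101", sampleTime, map[string]float64{
		"cpu_efficiency": 0.7, "mem_efficiency": 0.5, "io_wait_percent": 3,
	})
	fmt.Println(InsertSQL)
	for _, r := range rows {
		fmt.Println(r.Args()...)
	}
}
```

Each Args() slice can be passed to a prepared statement built from InsertSQL, ideally inside one transaction per check so the rows for a timestamp land together.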

Config (/etc/serviceradar/checkers/resource.json):


{
  "listen_addr": ":50092",
  "security": {
    "mode": "mtls",
    "cert_dir": "/etc/serviceradar/certs",
    "role": "checker",
    "tls": {
      "cert_file": "resource-checker.pem",
      "key_file": "resource-checker-key.pem",
      "ca_file": "root.pem"
    }
  },
  "targets": [
    {
      "name": "lxc-ct101",
      "type": "lxc",
      "container_id": "ct101",
      "poll_interval": "5m",
      "timeout": "10s",
      "proxmox_api": {
        "endpoint": "https://proxmox.example.com:8006",
        "token": "user@pve!token=uuid"
      }
    }
  ]
}
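
A minimal sketch of loading this config in Go, mainly to show the "5m"/"10s" strings decoding into durations. The struct names are illustrative, fields not needed for the example (security, proxmox_api) are simply ignored by the decoder, and the two validation rules (type must be lxc or qemu, timeout shorter than poll_interval) are assumptions rather than anything the issue requires.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Duration lets "5m"-style strings decode from JSON.
type Duration time.Duration

// UnmarshalJSON parses a quoted Go duration string such as "5m" or "10s".
func (d *Duration) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	v, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	*d = Duration(v)
	return nil
}

// Target is one monitored container or VM.
type Target struct {
	Name         string   `json:"name"`
	Type         string   `json:"type"`
	ContainerID  string   `json:"container_id"`
	PollInterval Duration `json:"poll_interval"`
	Timeout      Duration `json:"timeout"`
}

// Config covers only the fields this sketch needs.
type Config struct {
	ListenAddr string   `json:"listen_addr"`
	Targets    []Target `json:"targets"`
}

// ParseConfig decodes the JSON and applies the assumed validation rules.
func ParseConfig(data []byte) (*Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	for _, t := range c.Targets {
		if t.Type != "lxc" && t.Type != "qemu" {
			return nil, fmt.Errorf("target %q: unsupported type %q", t.Name, t.Type)
		}
		if time.Duration(t.Timeout) >= time.Duration(t.PollInterval) {
			return nil, fmt.Errorf("target %q: timeout must be shorter than poll_interval", t.Name)
		}
	}
	return &c, nil
}

// sampleConfig is a trimmed copy of the config above.
const sampleConfig = `{
  "listen_addr": ":50092",
  "targets": [
    {"name": "lxc-ct101", "type": "lxc", "container_id": "ct101",
     "poll_interval": "5m", "timeout": "10s",
     "proxmox_api": {"endpoint": "https://proxmox.example.com:8006"}}
  ]
}`

func main() {
	c, err := ParseConfig([]byte(sampleConfig))
	if err != nil {
		panic(err)
	}
	t := c.Targets[0]
	fmt.Println(c.ListenAddr, t.Name, time.Duration(t.PollInterval), time.Duration(t.Timeout))
}
```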

Poller:


{
  "agents": {
    "local-agent": {
      "checks": [
        {
          "service_type": "resource",
          "service_name": "lxc-ct101",
          "details": "{\"type\": \"lxc\", \"container_id\": \"ct101\"}"
        }
      ]
    }
  }
}
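
Note that details is a JSON document stored as a string, so the checker has to decode it a second time after the poller config itself is parsed. A small sketch of that step; Check and ResourceDetails are hypothetical types modeled on the fields shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Check mirrors one entry of the "checks" array in the poller config.
type Check struct {
	ServiceType string `json:"service_type"`
	ServiceName string `json:"service_name"`
	Details     string `json:"details"` // JSON document encoded as a string
}

// ResourceDetails is the payload carried inside Check.Details.
type ResourceDetails struct {
	Type        string `json:"type"`
	ContainerID string `json:"container_id"`
}

// DecodeDetails parses the embedded JSON string of a check.
func DecodeDetails(c Check) (ResourceDetails, error) {
	var d ResourceDetails
	if err := json.Unmarshal([]byte(c.Details), &d); err != nil {
		return d, fmt.Errorf("check %q: invalid details: %w", c.ServiceName, err)
	}
	return d, nil
}

func main() {
	raw := `{"service_type": "resource", "service_name": "lxc-ct101",
	         "details": "{\"type\": \"lxc\", \"container_id\": \"ct101\"}"}`
	var c Check
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		panic(err)
	}
	d, err := DecodeDetails(c)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.ServiceName, d.Type, d.ContainerID)
}
```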

Storage: Add processResourceMetrics to core/server.go, storing efficiency metrics.

Dashboard Integration:
API Endpoint: Add /api/metrics/resource?poller_id=host1&container_id=ct101 to pkg/core/api/server.go.
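
The endpoint could look something like the handler below. It is a sketch using only net/http, not the router pkg/core/api/server.go actually uses; Point, LookupFunc and ResourceHandler are hypothetical, and the lookup function stands in for the real database query.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Point is a hypothetical response item; field names follow GetStatusData.
type Point struct {
	Timestamp     string   `json:"timestamp"`
	CPUEfficiency float64  `json:"cpu_efficiency"`
	MemEfficiency float64  `json:"mem_efficiency"`
	IOWaitPercent float64  `json:"io_wait_percent"`
	Bottlenecks   []string `json:"bottlenecks"`
}

// LookupFunc stands in for the database query behind the endpoint.
type LookupFunc func(pollerID, containerID string) ([]Point, error)

// ResourceHandler serves /api/metrics/resource and requires both query parameters.
func ResourceHandler(lookup LookupFunc) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		q := r.URL.Query()
		pollerID, containerID := q.Get("poller_id"), q.Get("container_id")
		if pollerID == "" || containerID == "" {
			http.Error(w, "poller_id and container_id are required", http.StatusBadRequest)
			return
		}
		points, err := lookup(pollerID, containerID)
		if err != nil {
			http.Error(w, "query failed", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(points) // headers are already sent; nothing useful to do on error
	})
}

// get runs one in-memory request against h and returns the status code and body.
func get(h http.Handler, url string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, url, nil))
	return rec.Code, rec.Body.String()
}

// fakeLookup returns one fixed sample regardless of its arguments.
func fakeLookup(pollerID, containerID string) ([]Point, error) {
	return []Point{{
		Timestamp: "2025-04-14T12:00:00Z", CPUEfficiency: 0.7,
		MemEfficiency: 0.5, IOWaitPercent: 3, Bottlenecks: []string{},
	}}, nil
}

func main() {
	h := ResourceHandler(fakeLookup)
	code, body := get(h, "/api/metrics/resource?poller_id=host1&container_id=ct101")
	fmt.Println(code, body)
	code, _ = get(h, "/api/metrics/resource?poller_id=host1")
	fmt.Println(code)
}
```

Injecting the lookup as a function keeps the handler testable without SQLite, which is the only reason for that design choice here.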

Next.js UI:
Add ResourceDashboard component:


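import { RadarChart, PolarGrid, PolarAngleAxis, PolarRadiusAxis, Radar } from 'recharts';

// Renders the first sample (metrics[0]) as a radar chart on a 0-100 scale.
function ResourceDashboard({ metrics }) {
  // Guard against an empty or missing response before indexing metrics[0].
  if (!metrics || metrics.length === 0) {
    return <p>No resource metrics available.</p>;
  }
  const latest = metrics[0];
  const radarData = [
    { metric: 'CPU', value: latest.cpu_efficiency * 100 },
    { metric: 'Memory', value: latest.mem_efficiency * 100 },
    // Lower I/O wait is better, so invert it: 10% wait plots as 90.
    { metric: 'I/O', value: 100 - latest.io_wait_percent }
  ];
  return (
    <RadarChart width={400} height={400} data={radarData}>
      <PolarGrid />
      <PolarAngleAxis dataKey="metric" />
      <PolarRadiusAxis domain={[0, 100]} />
      <Radar name="Efficiency" dataKey="value" stroke="#4caf50" fill="#4caf50" fillOpacity={0.6} />
    </RadarChart>
  );
}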

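Donut chart: <PieChart width={400} height={400}><Pie data={[{name: 'CPU', value: metrics[0].cpu_efficiency}, {name: 'Memory', value: metrics[0].mem_efficiency}]} dataKey="value" nameKey="name" innerRadius={60} outerRadius={80} /></PieChart>. In Recharts the data goes on <Pie>, not <PieChart>, and innerRadius is what turns the pie into a donut.

Gantt chart: <GanttChart data={metrics.filter(m => m.bottlenecks.length > 0)} />. Recharts does not ship a Gantt component, so GanttChart would be a custom component.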

Visual:
Radar chart comparing CPU/memory/I/O efficiency.

Donut chart showing resource usage balance.

Gantt bars for bottleneck events.
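
For the Gantt bars, individual bottleneck samples need to be collapsed into intervals first. One way to do that, sketched under the assumption that samples arrive sorted by time and that two observations of the same issue no more than one poll interval apart belong to the same event; Event, Interval and Merge are illustrative names.

```go
package main

import (
	"fmt"
	"time"
)

// Event is one bottleneck observation from a single check.
type Event struct {
	Timestamp time.Time
	Issue     string // e.g. "io_wait" or "cpu_throttle"
}

// Interval is one Gantt bar: a continuous run of the same issue.
type Interval struct {
	Issue      string
	Start, End time.Time
}

// Merge collapses observations of the same issue that are at most maxGap apart.
// Events must be sorted by timestamp in ascending order.
func Merge(events []Event, maxGap time.Duration) []Interval {
	var out []Interval
	latest := map[string]int{} // issue -> index in out of its most recent interval
	for _, e := range events {
		if i, ok := latest[e.Issue]; ok && e.Timestamp.Sub(out[i].End) <= maxGap {
			out[i].End = e.Timestamp // extend the running interval
			continue
		}
		out = append(out, Interval{Issue: e.Issue, Start: e.Timestamp, End: e.Timestamp})
		latest[e.Issue] = len(out) - 1
	}
	return out
}

// pollInterval matches the 5m poll_interval used in the example config.
const pollInterval = 5 * time.Minute

var base = time.Date(2025, time.April, 14, 12, 0, 0, 0, time.UTC)

// at returns the base time plus m minutes; a helper for building sample data.
func at(m int) time.Time { return base.Add(time.Duration(m) * time.Minute) }

func main() {
	events := []Event{
		{at(0), "io_wait"}, {at(5), "io_wait"}, {at(10), "io_wait"},
		{at(30), "io_wait"},
	}
	for _, iv := range Merge(events, pollInterval) {
		fmt.Println(iv.Issue, iv.Start.Format(time.RFC3339), iv.End.Format(time.RFC3339))
	}
}
```

An interval built from a single sample has Start equal to End, so the UI would probably want to pad each bar by one poll interval to keep it visible.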

SolarWinds/Nagios Fit:
SolarWinds: Matches Virtualization Manager’s VM visuals but leaner, no heavy agents.

Nagios: Easier than check_lxc or check_vm, with richer charts.

ServiceRadar: Optimizes Proxmox resources, visual like SolarWinds, automated like Nagios.

Pros:
Visual Appeal: Radar and donut charts are analytical and engaging.

Unique: Container/VM focus enhances sysmon without overlap.

Ultra-Lightweight: ~0.09 MB/day/container (300 bytes × 288 checks), SQLite-friendly.

Proxmox: Core strength, optimizing LXC/QEMU.

Targeted: Actionable for virtualization admins.

Implementation Plan for resource-checker
Checker Logic (pkg/checker/resource/resource.go):
Use github.com/Telmate/proxmox-api-go for LXC/QEMU metrics (CPU, memory, I/O).

Calculate efficiency: usage / allocated (e.g., 0.7 for 700m/1000m CPU).

Detect bottlenecks (e.g., I/O wait > 10%, CPU throttle events).

Implement checker.HealthChecker:
Check: True if efficiency > 0.2 and no bottlenecks.

GetStatusData: {cpu_efficiency, mem_efficiency, io_wait_percent, bottlenecks}.

Register with checker.Registry.

gRPC Service (cmd/checkers/resource/main.go):
gRPC server on :50092.

Register proto.AgentServiceServer with health/status.

Use lifecycle.RunServer for systemd.

Storage:
Add resource_metrics table to pkg/db/db.go.

Add processResourceMetrics to core/server.go, storing efficiency metrics.

Build Script (scripts/setup-deb-resource.sh):
Model after rperf-client.sh:
Build Go binary with go build.

Create .deb with serviceradar-resource-checker.service.

Include /etc/serviceradar/checkers/resource.json.example.

Update Makefile: add build-resource-checker, deb-resource-checker.

Dashboard UI:
Extend pkg/core/api/server.go with /api/metrics/resource.

Add ResourceDashboard to pkg/core/api/web/src/pages:


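import { RadarChart, PolarGrid, PolarAngleAxis, PolarRadiusAxis, Radar } from 'recharts';

// Renders the first sample (metrics[0]) as a radar chart on a 0-100 scale.
function ResourceDashboard({ metrics }) {
  // Guard against an empty or missing response before indexing metrics[0].
  if (!metrics || metrics.length === 0) {
    return <p>No resource metrics available.</p>;
  }
  const latest = metrics[0];
  const radarData = [
    { metric: 'CPU', value: latest.cpu_efficiency * 100 },
    { metric: 'Memory', value: latest.mem_efficiency * 100 },
    // Lower I/O wait is better, so invert it: 10% wait plots as 90.
    { metric: 'I/O', value: 100 - latest.io_wait_percent }
  ];
  return (
    <RadarChart width={400} height={400} data={radarData}>
      <PolarGrid />
      <PolarAngleAxis dataKey="metric" />
      <PolarRadiusAxis domain={[0, 100]} />
      <Radar name="Efficiency" dataKey="value" stroke="#4caf50" fill="#4caf50" fillOpacity={0.6} />
    </RadarChart>
  );
}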
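
The component reads wide records (metrics[0].cpu_efficiency and so on), while resource_metrics stores one row per metric, so the API has to pivot rows before returning them. A rough sketch of that pivot; Row, Record and Pivot are hypothetical names, and it assumes timestamps are RFC 3339 UTC strings (which sort lexicographically) and that the UI wants the newest sample first.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Row is one narrow row read from resource_metrics for a single container.
type Row struct {
	Timestamp  string // RFC 3339 UTC, e.g. "2025-04-14T12:00:00Z"
	MetricName string
	Value      float64
}

// Record is the wide shape the dashboard component reads.
type Record struct {
	Timestamp     string  `json:"timestamp"`
	CPUEfficiency float64 `json:"cpu_efficiency"`
	MemEfficiency float64 `json:"mem_efficiency"`
	IOWaitPercent float64 `json:"io_wait_percent"`
}

// Pivot groups rows by timestamp into one record each, newest first.
// Unknown metric names are ignored.
func Pivot(rows []Row) []Record {
	byTS := map[string]*Record{}
	for _, r := range rows {
		rec, ok := byTS[r.Timestamp]
		if !ok {
			rec = &Record{Timestamp: r.Timestamp}
			byTS[r.Timestamp] = rec
		}
		switch r.MetricName {
		case "cpu_efficiency":
			rec.CPUEfficiency = r.Value
		case "mem_efficiency":
			rec.MemEfficiency = r.Value
		case "io_wait_percent":
			rec.IOWaitPercent = r.Value
		}
	}
	out := make([]Record, 0, len(byTS))
	for _, rec := range byTS {
		out = append(out, *rec)
	}
	// Descending by timestamp so index 0 is the latest sample.
	sort.Slice(out, func(i, j int) bool { return out[i].Timestamp > out[j].Timestamp })
	return out
}

func main() {
	rows := []Row{
		{"2025-04-14T12:00:00Z", "cpu_efficiency", 0.7},
		{"2025-04-14T12:00:00Z", "mem_efficiency", 0.5},
		{"2025-04-14T12:00:00Z", "io_wait_percent", 3},
		{"2025-04-14T12:05:00Z", "cpu_efficiency", 0.9},
	}
	out, err := json.Marshal(Pivot(rows))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The bottlenecks list is not covered here, since the proposed table only holds numeric values; it would need its own column, table, or encoding.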

Sample Dashboard Visuals
Radar Chart:


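<RadarChart width={400} height={400} data={[{metric: 'CPU', value: 70}, {metric: 'Memory', value: 50}, {metric: 'I/O', value: 90}]}>
  <PolarGrid />
  <PolarAngleAxis dataKey="metric" />
  <Radar dataKey="value" stroke="#4caf50" fill="#4caf50" fillOpacity={0.6} />
</RadarChart>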

Triangular plot showing efficiency balance.

Donut Chart:


<PieChart width={200} height={200}>
  <Pie data={[{name: 'CPU', value: 70}, {name: 'Memory', value: 50}]} dataKey="value" nameKey="name" innerRadius={60} outerRadius={80} fill="#82ca9d" />
</PieChart>

Circular view of resource usage.

Gantt Chart:


<GanttChart data={metrics.filter(m => m.bottlenecks.length > 0).map(m => ({timestamp: m.timestamp, issue: m.bottlenecks[0]}))} />

Timeline of CPU/I/O issues.
