feat: resource checker (Container/VM Resource Efficiency Tracker) #223
Imported from GitHub.
Original GitHub issue: #610
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/610
Original created: 2025-04-14T14:30:29Z
Monitors resource efficiency for containers or VMs on Proxmox (e.g., LXC, QEMU) by querying runtime metrics (e.g., CPU %, memory %, I/O wait) via Proxmox APIs or lxc commands. Reports efficiency scores (e.g., CPU usage vs. allocated) and bottlenecks for dashboard visualization, stored in SQLite.
Visuals:
Radar Chart: Show efficiency across CPU, memory, I/O for each container/VM, comparing usage vs. limits.
Donut Chart: Display resource utilization % (e.g., 70% CPU, 50% memory), highlighting over/under-provisioning.
Gantt Chart: Track bottleneck events (e.g., CPU throttling, I/O wait spikes) over time.
Engagement: Resource efficiency is a hot topic for virtualization admins, especially on Proxmox. Radar and donut charts feel analytical and modern, beating Nagios’ text outputs and rivaling SolarWinds’ VM monitoring.
Data Appeal: Multi-dimensional charts and colorful donuts make the dashboard a go-to for optimizing Proxmox clusters.
Value Proposition:
Unique Niche: Enhances sysmon (host-level CPU/disk/ZFS) by focusing on container/VM granularity, not covered by rperf, snmp, or dusk. Critical for Proxmox optimization.
Lightweight: ~300 bytes per check every 5 minutes (288 checks/day; e.g., cpu_efficiency: 0.7, mem_efficiency: 0.5), which stays under 0.1 MB/day per container. Even 100,000 containers come to less than 10 GB/day, within SQLite’s 24 GB/day.
Proxmox Fit: Native fit for LXC/QEMU, leveraging libzetta or Proxmox APIs, complementing sysmon’s ZFS metrics.
Security: mTLS for gRPC, API token auth for Proxmox (tls-security.md).
SolarWinds/Nagios Fit: Matches SolarWinds’ Virtualization Manager visuals but is simpler, and is more automated than Nagios’ check_lxc or manual scripts.
Dashboard Impact: Radar and donut charts offer clear insights into resource usage, encouraging efficient provisioning.
Implementation:
Logic (pkg/checker/resource/resource.go):
Query Proxmox API (github.com/Telmate/proxmox-api-go) or lxc commands for CPU %, memory %, I/O wait.
Calculate efficiency: usage / allocated (e.g., 700m CPU used / 1000m allocated = 0.7).
Detect bottlenecks (e.g., I/O wait > 10%).
Implement checker.HealthChecker:
Check: True if efficiency > 0.2 (not idle) and no bottlenecks.
GetStatusData: JSON with {cpu_efficiency, mem_efficiency, io_wait_percent, bottlenecks}.
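The logic above could be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the Sample and Status types and the Evaluate and Healthy names are hypothetical, the real checker.HealthChecker interface may have a different method set, and requiring both CPU and memory efficiency to exceed 0.2 is one possible reading of "efficiency > 0.2".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Thresholds taken from the issue text; the constant names are illustrative.
const (
	idleThreshold   = 0.2  // efficiency at or below this is treated as idle
	ioWaitThreshold = 10.0 // percent; I/O wait above this counts as a bottleneck
)

// Sample holds one raw reading for a container or VM. The field set is an
// assumption; a real checker would fill it from the Proxmox API or lxc.
type Sample struct {
	CPUUsedMilli  float64
	CPUAllocMilli float64
	MemUsedBytes  float64
	MemAllocBytes float64
	IOWaitPercent float64
}

// Status mirrors the JSON keys listed for GetStatusData.
type Status struct {
	CPUEfficiency float64  `json:"cpu_efficiency"`
	MemEfficiency float64  `json:"mem_efficiency"`
	IOWaitPercent float64  `json:"io_wait_percent"`
	Bottlenecks   []string `json:"bottlenecks"`
}

// efficiency returns used/allocated, or 0 when nothing is allocated.
func efficiency(used, allocated float64) float64 {
	if allocated <= 0 {
		return 0
	}
	return used / allocated
}

// Evaluate turns a raw sample into the status payload.
func Evaluate(s Sample) Status {
	st := Status{
		CPUEfficiency: efficiency(s.CPUUsedMilli, s.CPUAllocMilli),
		MemEfficiency: efficiency(s.MemUsedBytes, s.MemAllocBytes),
		IOWaitPercent: s.IOWaitPercent,
		Bottlenecks:   []string{}, // non-nil so JSON encodes [] rather than null
	}
	if s.IOWaitPercent > ioWaitThreshold {
		st.Bottlenecks = append(st.Bottlenecks, "io_wait")
	}
	return st
}

// Healthy applies the Check rule: not idle and no bottlenecks.
// Assumption: both CPU and memory efficiency must exceed the idle threshold.
func (st Status) Healthy() bool {
	return st.CPUEfficiency > idleThreshold &&
		st.MemEfficiency > idleThreshold &&
		len(st.Bottlenecks) == 0
}

func main() {
	st := Evaluate(Sample{
		CPUUsedMilli: 700, CPUAllocMilli: 1000,
		MemUsedBytes: 512, MemAllocBytes: 1024,
		IOWaitPercent: 3.5,
	})
	out, err := json.Marshal(st)
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	fmt.Println(st.Healthy(), string(out))
}
```

Keeping the calculation in a pure function like Evaluate makes the thresholds easy to unit test without a Proxmox host.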
Data: New resource_metrics table (pkg/db/db.go):
Example: INSERT INTO resource_metrics (poller_id, timestamp, container_id, metric_name, value) VALUES ('host1', '2025-04-14T12:00:00Z', 'ct101', 'cpu_efficiency', 0.7);
Config (/etc/serviceradar/checkers/resource.json):
Poller:
Storage: Add processResourceMetrics to core/server.go, storing efficiency metrics.
Dashboard Integration:
API Endpoint: Add /api/metrics/resource?poller_id=host1&container_id=ct101 to pkg/core/api/server.go.
Next.js UI:
Add ResourceDashboard component:
Donut chart: <PieChart data={[{name: 'CPU', value: metrics[0].cpu_efficiency}, {name: 'Memory', value: metrics[0].mem_efficiency}]} />.
Gantt chart: <GanttChart data={metrics.filter(m => m.bottlenecks.length > 0)} />.
Visual:
Radar chart comparing CPU/memory/I/O efficiency.
Donut chart showing resource usage balance.
Gantt bars for bottleneck events.
SolarWinds/Nagios Fit:
SolarWinds: Matches Virtualization Manager’s VM visuals but leaner, no heavy agents.
Nagios: Easier than check_lxc or check_vm, with richer charts.
ServiceRadar: Optimizes Proxmox resources, visual like SolarWinds, automated like Nagios.
Pros:
Visual Appeal: Radar and donut charts are analytical and engaging.
Unique: Container/VM focus enhances sysmon without overlap.
Ultra-Lightweight: under 0.1 MB/day/container, SQLite-friendly.
Proxmox: Core strength, optimizing LXC/QEMU.
Targeted: Actionable for virtualization admins.
Implementation Plan for resource-checker
Checker Logic (pkg/checker/resource/resource.go):
Use github.com/Telmate/proxmox-api-go for LXC/QEMU metrics (CPU, memory, I/O).
Calculate efficiency: usage / allocated (e.g., 0.7 for 700m/1000m CPU).
Detect bottlenecks (e.g., I/O wait > 10%, CPU throttle events).
Implement checker.HealthChecker:
Check: True if efficiency > 0.2 and no bottlenecks.
GetStatusData: {cpu_efficiency, mem_efficiency, io_wait_percent, bottlenecks}.
Register with checker.Registry.
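To illustrate the metric-gathering step without guessing at the proxmox-api-go client's method signatures, the sketch below decodes a raw JSON body of the kind the Proxmox status/current endpoint for a guest is understood to return. The field names (cpu, mem, maxmem) and the reading of cpu as a 0..1 fraction of allocated CPU are assumptions to verify against the Proxmox version in use; guestStatus and parseGuestStatus are hypothetical names. Per-guest I/O wait is not covered here and would need a separate source.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// guestStatus picks a few fields out of a status/current style response.
// Field names and semantics are assumptions; check them against your API.
type guestStatus struct {
	Data struct {
		CPU    float64 `json:"cpu"`    // assumed: fraction of allocated CPU in use
		Mem    float64 `json:"mem"`    // bytes in use
		MaxMem float64 `json:"maxmem"` // bytes allocated
	} `json:"data"`
}

// parseGuestStatus returns CPU and memory efficiency (usage / allocated).
func parseGuestStatus(body []byte) (cpuEff, memEff float64, err error) {
	var gs guestStatus
	if err := json.Unmarshal(body, &gs); err != nil {
		return 0, 0, fmt.Errorf("decode status: %w", err)
	}
	if gs.Data.MaxMem <= 0 {
		return 0, 0, errors.New("maxmem missing or zero")
	}
	return gs.Data.CPU, gs.Data.Mem / gs.Data.MaxMem, nil
}

func main() {
	// Hand-written sample body, not captured from a real host.
	body := []byte(`{"data":{"cpu":0.7,"mem":536870912,"maxmem":1073741824,"status":"running"}}`)
	cpuEff, memEff, err := parseGuestStatus(body)
	if err != nil {
		fmt.Println("parse:", err)
		return
	}
	fmt.Printf("cpu_efficiency=%.2f mem_efficiency=%.2f\n", cpuEff, memEff)
}
```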
gRPC Service (cmd/checkers/resource/main.go):
gRPC server on :50092.
Register proto.AgentServiceServer with health/status.
Use lifecycle.RunServer for systemd.
Storage:
Add resource_metrics table to pkg/db/db.go.
Add processResourceMetrics to core/server.go, storing efficiency metrics.
Build Script (scripts/setup-deb-resource.sh):
Model after rperf-client.sh:
Build Go binary with go build.
Create .deb with serviceradar-resource-checker.service.
Include /etc/serviceradar/checkers/resource.json.example.
Update Makefile: add build-resource-checker, deb-resource-checker.
Dashboard UI:
Extend /pkg/core/api/server.go with /api/metrics/resource.
Add ResourceDashboard to /pkg/core/api/web/src/pages:
Sample Dashboard Visuals
Radar Chart:
Triangular plot showing efficiency balance.
Donut Chart:
Circular view of resource usage.
Gantt Chart:
Timeline of CPU/I/O issues.