feat: Agent Fleet Management & Secure Self-Update System #805
Imported from GitHub.
Original GitHub issue: #2406
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2406
Original created: 2026-01-20T07:21:24Z
This needs to be framed as part of a much larger change, one that takes full advantage of a bi-directional, persistent gRPC/WebSocket connection between agents and the agent-gateway. Instead of having agents poll every 5 minutes for config changes, we can signal agents when a change exists, reducing traffic.
We might also be able to use the AshOban job scheduler to trigger collections on an agent, but that could get tricky and verbose. We may be better off leaving that part alone: generate the config for the agent on demand or on a change, signal down to the agent, and keep the polling interval in the config as we do today.
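The signal-instead-of-poll idea can be sketched roughly as follows. This is only an illustration: a Go channel stands in for the persistent stream, and the names ConfigChanged, Config, fetchConfig, and runAgent are hypothetical, not existing ServiceRadar APIs.

```go
package main

import "fmt"

// ConfigChanged is a hypothetical signal the gateway pushes down the
// persistent connection when a new config exists (name is an assumption).
type ConfigChanged struct {
	Version int
}

// Config stands in for the generated agent config, which still carries
// the polling interval as it does today.
type Config struct {
	Version         int
	PollIntervalSec int
}

// fetchConfig stands in for the on-demand config request to the gateway.
func fetchConfig(version int) Config {
	return Config{Version: version, PollIntervalSec: 60}
}

// runAgent blocks on the signal channel and fetches config only when
// signaled, instead of polling on a timer. It returns the versions it
// applied, in order, and skips stale or duplicate signals.
func runAgent(signals <-chan ConfigChanged) []int {
	var applied []int
	current := 0
	for sig := range signals {
		if sig.Version <= current {
			continue // already running this version or newer
		}
		cfg := fetchConfig(sig.Version)
		current = cfg.Version
		applied = append(applied, cfg.Version)
	}
	return applied
}

func main() {
	signals := make(chan ConfigChanged, 3)
	signals <- ConfigChanged{Version: 1}
	signals <- ConfigChanged{Version: 1} // duplicate signal, ignored
	signals <- ConfigChanged{Version: 2}
	close(signals)
	fmt.Println(runAgent(signals)) // prints [1 2]
}
```

In a real deployment the channel would be replaced by reads from the gRPC or WebSocket stream; the version check keeps the agent idempotent if a signal is delivered more than once.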
https://github.com/werf/trdl
https://github.com/jpillora/overseer
PRD: Agent Fleet Management & Secure Self-Update System
1. Executive Summary
To support large-scale deployments (2,400+ nodes), our platform requires a centralized way to manage, configure, and update agents without relying on external orchestration tools like Ansible. This project introduces a Bi-directional gRPC Control Plane and an Automated Self-Update Mechanism that allows the Core Engine to orchestrate fleet-wide upgrades securely and reliably.
2. Problem Statement
Current agent updates require manual intervention or 3rd-party automation. At a scale of 2,400+ machines, manual updates are impossible. Furthermore, the current "polling" architecture (Agent -> Gateway) creates latency in management actions. We need a "Push" capability to trigger immediate actions (like emergency patches) across the fleet.
3. Goals & Objectives
4. Technical Architecture
4.1. Communication: Persistent gRPC Stream
The Agent-Gateway communication will be upgraded to use gRPC Bi-directional Streaming.
... UpdateInstruction message down the stream.
4.2. The "Sidecar Updater" Pattern
To avoid file-locking issues on Linux, the update process will involve two components:
4.3. Directory & Package Strategy
To remain compatible with RPM/DEB:
... /opt/our-platform/.
... /opt/our-platform/bin/current, which is a symlink to /opt/our-platform/bin/v1.0.0/agent.
... v1.1.0, verifies it, and updates the current symlink.
5. Functional Requirements
5.1. Agent Capabilities
... updater process and exit gracefully.
5.2. Core Engine / Gateway Capabilities
... env:production or os:ubuntu)
5.3. Web UI Capabilities
6. Security Specification
6.1. Cryptographic Signing
All update binaries must be signed during the CI/CD build process.
6.2. Rollback Mechanism
... updater (or the new agent itself) must revert the /current symlink to the previous version and restart.
7. User Stories
8. Success Metrics
9. Implementation Phases
... Updater sidecar and symlink logic for Linux.