feat(agent): native add-on pushed-artifact delivery + resilience (#3425) #3451
Reference: carverauto/serviceradar!3451
## Summary

Implements the agent-side pushed-artifact delivery model and its resilience for native add-ons: the OpenSpec change `add-native-addon-delivery-models` (issue #3425), the first follow-up to the framework PR #3447.

Until now the agent only ran `agent_sidecar` add-ons whose binary already existed on the host. This branch lets the control plane deliver a signed add-on artifact that the agent fetches, verifies, stages, and runs.

## What's in this PR

- `go/pkg/agent/addon_activation.go`: fetch the artifact from object storage → verify `sha256` (and ed25519 signature when present) → stage under `<runtime-root>/addons/<id>/versions/<version>` with an atomic `current` symlink → hand the resolved binary to the go-plugin supervisor.
- `agent_config_generator.ex`: `AddonAssignmentConfig` gains `artifact_object_key` / `artifact_sha256` / `artifact_signature` / `target_os` / `target_arch`; the generator selects the per-arch artifact from the package's `artifacts` map by the agent's `metadata.os/arch` and emits it; the reference joins the config version hash.
- Resilience: on a delivery failure the agent falls back to the last-known-good `current` binary instead of tearing down a running add-on (and survives reboots while the store is down).
- Local override (`addons.local.json`): operator break-glass / dev pinning that takes precedence over pushed assignments by `addon_id`; a malformed file is ignored so it can't break pushed delivery.

## Reuse over reinvention

Per review guidance, this leans on existing infrastructure rather than reimplementing it: `ObjectStore.DownloadObject` (the bumblebee catalog-staging interface), `hashutil` for digests, the agent release ed25519 trust root for signature verification, and `resolveReleaseRuntimeRoot` for the staging root.

## Validation

- `go build` / `go vet` / `go test` + `golangci-lint` (0 issues) + `gofmt`: green.
- `mix compile --warnings-as-errors`: green.
- `openspec validate add-native-addon-delivery-models --strict`: passes.

## Deferred (need root / systemd / build-packaging / signing keys)

Tracked in the change's `tasks.md`: file-capability application via `agent-updater` (`setcap`, needs root), version rollback (health-feedback loop), systemd-service/timer/ephemeral-helper supervision, `config-toggle` / `os-package` dispatch (best landed with the remote-access migration), and the base-agent packaging carve. These are better implemented and verified in an environment with that infrastructure.

DCO: I can add `Signed-off-by` trailers if the DCO check requires them.

🤖 Generated with Claude Code
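The verify-then-stage flow described under "What's in this PR" can be sketched roughly as follows. This is a minimal illustration, not the code in `addon_activation.go`: the function names `verifyArtifact` and `stageArtifact` are hypothetical, the signature is assumed to cover the raw artifact bytes, and the `current` symlink is assumed to live at `<runtime-root>/addons/<id>/current`. The download step is left out; the bytes are passed in directly.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// verifyArtifact checks the digest and, when a signature is supplied, the
// ed25519 signature. It fails closed: a signature without a usable key is an
// error. Assumption: the signature covers the raw artifact bytes.
func verifyArtifact(data []byte, wantSHA256 string, sig []byte, pub ed25519.PublicKey) error {
	sum := sha256.Sum256(data)
	if !strings.EqualFold(hex.EncodeToString(sum[:]), wantSHA256) {
		return errors.New("sha256 mismatch")
	}
	if len(sig) == 0 {
		return nil // unsigned artifact: digest check only
	}
	if len(pub) != ed25519.PublicKeySize {
		return errors.New("signature supplied but no verification key configured")
	}
	if !ed25519.Verify(pub, data, sig) {
		return errors.New("ed25519 signature invalid")
	}
	return nil
}

// stageArtifact writes the binary under addons/<id>/versions/<version> and
// then repoints the "current" symlink. id and version are assumed to have
// been validated as safe path segments by the caller.
func stageArtifact(root, id, version, name string, data []byte) (string, error) {
	addonDir := filepath.Join(root, "addons", id)
	versionDir := filepath.Join(addonDir, "versions", version)
	if err := os.MkdirAll(versionDir, 0o755); err != nil {
		return "", err
	}
	if err := os.WriteFile(filepath.Join(versionDir, name), data, 0o755); err != nil {
		return "", err
	}
	// Create the new link under a temporary name, then rename it over
	// "current". On POSIX systems rename replaces the old link atomically.
	tmp := filepath.Join(addonDir, "current.tmp")
	_ = os.Remove(tmp) // clear a leftover from an interrupted run
	if err := os.Symlink(filepath.Join("versions", version), tmp); err != nil {
		return "", err
	}
	if err := os.Rename(tmp, filepath.Join(addonDir, "current")); err != nil {
		return "", err
	}
	return filepath.Join(addonDir, "current", name), nil
}

func main() {
	root, err := os.MkdirTemp("", "addon-stage")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	// Throwaway key pair standing in for the release trust root.
	pub, priv, err := ed25519.GenerateKey(nil)
	if err != nil {
		panic(err)
	}
	data := []byte("#!/bin/sh\necho hello\n")
	sum := sha256.Sum256(data)
	sig := ed25519.Sign(priv, data)

	if err := verifyArtifact(data, hex.EncodeToString(sum[:]), sig, pub); err != nil {
		panic(err)
	}
	bin, err := stageArtifact(root, "demo", "1.0.0", "demo-addon", data)
	if err != nil {
		panic(err)
	}
	fmt.Println("staged binary:", bin)
}
```

Verifying before any write keeps unverified bytes out of the staging tree, and the rename-over-symlink step means a reader of `current` sees either the old version or the new one, never a missing link.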
Addresses the confirmed findings from a multi-agent review of this branch:

- P1 path traversal: addon_id and version arrive from the control plane and become path segments under the staging root. They are now validated as safe single segments (no separators, no . or ..) via safeAddonSegment/ErrAddonUnsafePath BEFORE any fetch or filesystem write, in both stageAddonArtifact and lastKnownGoodAddonBinary, so a crafted addon_id=../../etc or version=.. can no longer escape the staging root.
- P1 last-known-good correctness: on a delivery/verification failure the agent previously paired the OLD staged binary with the NEW assignment config/args. It now caches the last fully-verified spec per add-on (rememberAddonSpec/lastGoodAddonSpec) and reuses it verbatim on fallback, so a transient failure keeps the add-on running exactly as before instead of reconfiguring an old binary with incompatible config.
- P2 incomplete artifact reference: the generator now emits an artifact reference only when both object_key and sha256 are present, so the agent is never told to fetch something it cannot verify.

Tests: path-traversal rejection (5 cases, asserts nothing created under root), fail-closed signature when a signature is supplied but no verification key is configured, local override enabled=false disables a pushed add-on, last-known-good spec cache, and the generator incomplete-artifact case.

Go build/vet/test + golangci-lint (0 issues) + gofmt green; generator suite 37 tests / 0 failures on srql-fixtures.

Reviewed and dismissed as false positives: the resolve_agent_platform nil-metadata guard (is_map/2 already routes nil to {nil,nil}), proto field numbering (per-message, no collision), reuse of the release ed25519 trust root (intentional and fails closed), and the symlink TOCTOU (atomic rename).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Second review pass (xhigh recall) on the pushed-artifact delivery branch surfaced real issues beyond the first review; fixed:

- P1 binary_path basename traversal: addonBinaryName accepted a basename of '..', which filepath.Join(versionDir, '..') would resolve outside the version dir. The derived name is now validated with safeAddonSegment and falls back to the synthesized name when unsafe.
- P1 cache-miss fallback paired old binary with new config: the lastKnownGoodAddonBinary path reused the on-disk binary but built the spec from the NEW assignment (new version/args/config, and a re-derived filename that could miss). Removed that brittle path entirely: on a delivery failure the agent reuses the cached fully-verified spec verbatim, or skips when there is no cached spec (a running add-on always has one). Single, consistent fallback semantic.
- P2 local override blanked omitted fields: the override now patches only the fields it specifies onto the matching pushed assignment (config/capabilities/etc. inherited) instead of replacing the whole assignment with zeros.
- P2 cache eviction + empty-configDir guard: the last-good cache is pruned to currently-assigned ids so a removed/re-added add-on cannot reuse a stale spec; the local override is skipped when no config dir is known (avoids a relative-path read).

Dismissed as false positives (with reasoning): the order-of-validation claim (validation IS before download), the resolve_agent_platform nil guard (is_map/2 routes nil to {nil,nil}), and the reuse of the release ed25519 trust root (intentional, fails closed).

Tests: binary_path traversal fallback, addonBinaryName unsafe-base rejection, override merge preserves pushed config/capabilities, override enabled=false disables-but-preserves, cache prune.

go build/vet/test + golangci-lint (0 issues) + gofmt green; openspec validate --strict green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
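The single-segment check behind the path-traversal fixes can be illustrated with a short sketch. The names `safeSegment`, `errUnsafePath`, and `versionDir` are hypothetical stand-ins; the branch's own `safeAddonSegment` / `ErrAddonUnsafePath` may differ in detail. The idea is to reject any control-plane value that is empty, is `.` or `..`, or contains a path separator before it is ever joined onto the staging root.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// errUnsafePath is a hypothetical sentinel for rejected path segments.
var errUnsafePath = errors.New("addon: unsafe path segment")

// safeSegment accepts only a plain single path segment: non-empty, not "."
// or "..", and free of separators and NUL bytes.
func safeSegment(s string) error {
	if s == "" || s == "." || s == ".." {
		return errUnsafePath
	}
	if strings.ContainsAny(s, `/\`) || strings.ContainsRune(s, 0) {
		return errUnsafePath
	}
	return nil
}

// versionDir validates both segments BEFORE building the path, so nothing is
// fetched or written for a crafted addon_id or version.
func versionDir(root, id, version string) (string, error) {
	if err := safeSegment(id); err != nil {
		return "", fmt.Errorf("addon_id %q: %w", id, err)
	}
	if err := safeSegment(version); err != nil {
		return "", fmt.Errorf("version %q: %w", version, err)
	}
	return filepath.Join(root, "addons", id, "versions", version), nil
}

func main() {
	// The staging root here is an arbitrary example path.
	for _, id := range []string{"netflow", "../../etc", "..", "a/b", ""} {
		dir, err := versionDir("/var/lib/agent", id, "1.2.3")
		fmt.Printf("id=%q dir=%q err=%v\n", id, dir, err)
	}
}
```

Rejecting separators outright is stricter than cleaning the path and checking that it stays under the root, but it is easier to reason about when every input is supposed to be exactly one directory name.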
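The "patch only the specified fields" behaviour of the local override can be sketched with pointer-typed override fields, which let the code tell an omitted field apart from an explicit zero value. Everything here is an assumption made for illustration: the struct shapes, the JSON layout of `addons.local.json` (taken to be an array keyed by `addon_id`), and the name `applyOverrides` are not taken from the branch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Assignment is a simplified stand-in for a pushed add-on assignment.
type Assignment struct {
	AddonID string
	Version string
	Enabled bool
	Config  map[string]string
}

// Override uses pointers so that an omitted field (nil) is distinguishable
// from an explicit value such as enabled=false.
type Override struct {
	AddonID string  `json:"addon_id"`
	Version *string `json:"version,omitempty"`
	Enabled *bool   `json:"enabled,omitempty"`
}

// applyOverrides patches matching pushed assignments with the fields the
// override sets. A malformed override file is ignored entirely.
func applyOverrides(pushed []Assignment, raw []byte) []Assignment {
	var overrides []Override
	if err := json.Unmarshal(raw, &overrides); err != nil {
		return pushed // malformed file must not break pushed delivery
	}
	byID := make(map[string]Override, len(overrides))
	for _, o := range overrides {
		byID[o.AddonID] = o
	}
	out := make([]Assignment, 0, len(pushed))
	for _, a := range pushed { // a is a copy; pushed itself is not modified
		if o, ok := byID[a.AddonID]; ok {
			if o.Version != nil {
				a.Version = *o.Version
			}
			if o.Enabled != nil {
				a.Enabled = *o.Enabled
			}
		}
		out = append(out, a)
	}
	return out
}

func main() {
	pushed := []Assignment{{
		AddonID: "netflow",
		Version: "1.0.0",
		Enabled: true,
		Config:  map[string]string{"port": "2055"},
	}}
	local := []byte(`[{"addon_id": "netflow", "enabled": false}]`)
	for _, a := range applyOverrides(pushed, local) {
		fmt.Printf("%+v\n", a)
	}
}
```

In this sketch an override that only sets `enabled` leaves the pushed version and config untouched, which mirrors the "disables-but-preserves" test case named in the commit message.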