Secure edge poller onboarding flow #628
Imported from GitHub.
Original GitHub issue: #1903
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1903
Original created: 2025-10-28T20:18:39Z
PRD: Secure Edge Poller Onboarding
Background
docker/compose/edge-poller-restart.sh, which calls kubectl exec against the demo SPIRE server to mint a join token and downstream entry.
Problem Statement
Provide an operator-approved workflow—accessible via Core UI and API—that issues short-lived credentials to edge poller deployments, tracks their lifecycle, and prevents unauthorized enrollment. The solution must remove direct kube dependencies from the edge host while preserving SPIRE trust guarantees.
Goals
Non-Goals
Personas
User Stories
Functional Requirements
edge-poller.env (or overrides), readme.
config:write scope.
./edge-poller-install.sh --package edge-site-x.tar.gz).
docker/compose/spire/, updates .env, runs restart helper.
Non-Functional Requirements
UX / Flow
edge-poller-install.sh --package pkg.tar.gz (script does extraction + restart helper).
API/Backend Tasks
edge_onboarding_packages with fields (id, name, spiffe_id, join_token_id, entry_id, ttl, status, created_by, created_at, activated_at, revoked_at, metadata JSON).
POST /api/edge-packages, GET /api/edge-packages, POST /api/edge-packages/{id}/revoke, GET /api/edge-packages/{id}/download.
Frontend Tasks
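For the API/Backend item above, the edge_onboarding_packages record and the revoke transition behind POST /api/edge-packages/{id}/revoke might look roughly like the sketch below. The field list comes from the PRD; the Go type names, the status values, and the Revoke method are assumptions made for illustration and are not taken from the ServiceRadar code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// PackageStatus values are hypothetical; the PRD does not enumerate them.
type PackageStatus string

const (
	StatusIssued    PackageStatus = "issued"
	StatusActivated PackageStatus = "activated"
	StatusRevoked   PackageStatus = "revoked"
)

// EdgeOnboardingPackage mirrors the fields listed for edge_onboarding_packages.
type EdgeOnboardingPackage struct {
	ID          string          `json:"id"`
	Name        string          `json:"name"`
	SpiffeID    string          `json:"spiffe_id"`
	JoinTokenID string          `json:"join_token_id"`
	EntryID     string          `json:"entry_id"`
	TTL         time.Duration   `json:"ttl"` // serialized as nanoseconds by encoding/json
	Status      PackageStatus   `json:"status"`
	CreatedBy   string          `json:"created_by"`
	CreatedAt   time.Time       `json:"created_at"`
	ActivatedAt *time.Time      `json:"activated_at,omitempty"`
	RevokedAt   *time.Time      `json:"revoked_at,omitempty"`
	Metadata    json.RawMessage `json:"metadata,omitempty"`
}

// Revoke marks the package as revoked at the given time.
// Revoking an already revoked package returns an error.
func (p *EdgeOnboardingPackage) Revoke(now time.Time) error {
	if p.Status == StatusRevoked {
		return fmt.Errorf("package %q is already revoked", p.ID)
	}
	p.Status = StatusRevoked
	p.RevokedAt = &now
	return nil
}

func main() {
	pkg := EdgeOnboardingPackage{
		ID:        "pkg-1",
		Name:      "edge-site-x",
		Status:    StatusIssued,
		TTL:       15 * time.Minute,
		CreatedBy: "operator",
		CreatedAt: time.Now().UTC(),
	}
	if err := pkg.Revoke(time.Now().UTC()); err != nil {
		fmt.Println("revoke failed:", err)
		return
	}
	out, err := json.MarshalIndent(pkg, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Keeping activated_at and revoked_at as pointers lets an unset timestamp drop out of the JSON instead of serializing as a zero time.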
Installer Tooling
docker/compose/edge-poller-install.sh (or update restart helper) to accept package archive, extract to appropriate directories, optionally prompt for CORE/KV override if different from package defaults.
Security Considerations
Metrics / Success Criteria
Open Questions
Milestones
Dependencies
Risks / Mitigations
Acceptance Criteria
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1903#issuecomment-3458668536
Original created: 2025-10-28T21:54:34Z
I am wondering if we can automate the bundle download. This is tricky because the edge devices aren't necessarily supposed to be able to talk directly to the core. If we put the bundle on the KV, the device wouldn't have the credentials to get it from there before onboarding either, unless we exposed a normal TLS interface on a different port for gRPC API calls. We would also have to keep the request / token generation information in the KV so we could verify it somehow. This might be more complicated. I would still like a good way to do easier, hands-off deployments. The way it is today can still easily be automated.
We could just try to hit the public/core API from the edge and make that a requirement (network access from edge agents to the core API).
Another thought is to let the poller orchestrate a lot of this. Since the poller has to be onboarded first (we need a way to onboard pollers the same as edge agents, btw), the poller can talk to the core gRPC API directly and get information about clients that are waiting to be onboarded. Part of the UI onboarding workflow should also be selecting the poller that we expect the new edge agent/checker to be associated with. The poller would gather the onboarding information from the core and make it available to agents/checkers. Packages could be stored in the NATS JetStream object store; a reaper routine would watch the TTL/expiry on the token and then clean them up.
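A minimal sketch of the reaper idea from the comment above, assuming each stored package carries its token expiry. The storedPackage type and the deleteFn callback are hypothetical stand-ins; a real version would list and delete entries through the object store client rather than an in-memory slice.

```go
package main

import (
	"fmt"
	"time"
)

// storedPackage is a hypothetical record for a package held in the object store.
type storedPackage struct {
	ID        string
	ExpiresAt time.Time // expiry of the join token that the package wraps
}

// expiredIDs returns the IDs of packages whose expiry is at or before now.
func expiredIDs(pkgs []storedPackage, now time.Time) []string {
	var ids []string
	for _, p := range pkgs {
		if !p.ExpiresAt.After(now) {
			ids = append(ids, p.ID)
		}
	}
	return ids
}

// reap deletes every expired package and reports how many were removed.
// deleteFn stands in for the object store delete call.
func reap(pkgs []storedPackage, now time.Time, deleteFn func(id string) error) (int, error) {
	removed := 0
	for _, id := range expiredIDs(pkgs, now) {
		if err := deleteFn(id); err != nil {
			return removed, fmt.Errorf("delete %s: %w", id, err)
		}
		removed++
	}
	return removed, nil
}

func main() {
	now := time.Now()
	pkgs := []storedPackage{
		{ID: "expired", ExpiresAt: now.Add(-time.Minute)},
		{ID: "live", ExpiresAt: now.Add(time.Hour)},
	}
	n, err := reap(pkgs, now, func(id string) error {
		fmt.Println("deleting", id)
		return nil
	})
	fmt.Println("removed:", n, "err:", err)
}
```

Passing now in as a parameter keeps the expiry check deterministic, so the routine can be driven by a ticker in production and by fixed timestamps in tests.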
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1903#issuecomment-3458840163
Original created: 2025-10-28T22:38:10Z
We also need to make sure that we are generating internal events that get published to NATS and show up in the UI under the events dashboard/console, for all of these onboarding events.
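One way the onboarding events could be emitted is sketched below. The event fields, the subject prefix events.edge_onboarding, and the memoryPublisher test double are all assumptions for illustration; the Publisher interface is shaped so that a NATS connection with a Publish(subject, data) method could be passed in, but nothing here is taken from the existing event pipeline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// OnboardingEvent is a hypothetical envelope for onboarding lifecycle events.
type OnboardingEvent struct {
	Type      string    `json:"type"` // e.g. "issued", "activated", "revoked"
	PackageID string    `json:"package_id"`
	Actor     string    `json:"actor"`
	Time      time.Time `json:"time"`
}

// Publisher is the minimal capability the emitter needs.
type Publisher interface {
	Publish(subject string, data []byte) error
}

// memoryPublisher records messages in memory; it is only a stand-in for a broker.
type memoryPublisher struct {
	subjects []string
	payloads [][]byte
}

func (m *memoryPublisher) Publish(subject string, data []byte) error {
	m.subjects = append(m.subjects, subject)
	m.payloads = append(m.payloads, data)
	return nil
}

// emit serializes the event as JSON and publishes it on a per-type subject.
func emit(p Publisher, ev OnboardingEvent) error {
	data, err := json.Marshal(ev)
	if err != nil {
		return fmt.Errorf("marshal event: %w", err)
	}
	return p.Publish("events.edge_onboarding."+ev.Type, data)
}

func main() {
	pub := &memoryPublisher{}
	ev := OnboardingEvent{Type: "revoked", PackageID: "pkg-1", Actor: "operator", Time: time.Now().UTC()}
	if err := emit(pub, ev); err != nil {
		fmt.Println("emit failed:", err)
		return
	}
	fmt.Println(pub.subjects[0], string(pub.payloads[0]))
}
```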
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1903#issuecomment-3459492711
Original created: 2025-10-29T03:38:27Z
Edge onboarding backend is now lifecycle-aware:
Remaining work: trim (backfill path) into lifecycle options, finish UI/CLI flows, and script proton token automation.
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1903#issuecomment-3459493022
Original created: 2025-10-29T03:38:40Z
Edge onboarding backend is now lifecycle-aware:
edgeOnboardingService implements Start/Stop, runs a bounded refresh loop (5s timeout, 5m interval), and exposes a callback so the API’s dynamic poller list stays in sync.
pkg/lifecycle, broadcasts updates via SetDynamicPollers, and no longer blocks startup on a streaming query.
SELECT ... FROM table(edge_onboarding_packages) FINAL to satisfy changelog_kv semantics.
Remaining work: trim cmd/core/main.go (backfill path) into lifecycle options, finish UI/CLI flows, and script proton token automation.
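For readers unfamiliar with the pattern, a bounded refresh loop with Start/Stop and an update callback can be sketched as follows. This is not the actual edgeOnboardingService; the refresher type, loadFunc, and staticLoader are hypothetical names, and only the 5s timeout and 5m interval are taken from the comment above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// loadFunc fetches the current poller IDs; it should honor ctx cancellation.
type loadFunc func(ctx context.Context) ([]string, error)

// refresher periodically reloads poller IDs and hands them to a callback.
type refresher struct {
	interval time.Duration
	timeout  time.Duration
	load     loadFunc
	onUpdate func(pollers []string)
	cancel   context.CancelFunc
	done     chan struct{}
}

func newRefresher(load loadFunc, onUpdate func([]string)) *refresher {
	return &refresher{interval: 5 * time.Minute, timeout: 5 * time.Second, load: load, onUpdate: onUpdate}
}

// staticLoader returns a loader with fixed results, standing in for a database query.
func staticLoader(pollers ...string) loadFunc {
	return func(context.Context) ([]string, error) { return pollers, nil }
}

// refreshOnce runs a single load bounded by r.timeout.
func (r *refresher) refreshOnce(ctx context.Context) error {
	ctx, cancel := context.WithTimeout(ctx, r.timeout)
	defer cancel()
	pollers, err := r.load(ctx)
	if err != nil {
		return fmt.Errorf("refresh pollers: %w", err)
	}
	r.onUpdate(pollers)
	return nil
}

// Start launches the loop in the background, so startup is not blocked.
func (r *refresher) Start() {
	ctx, cancel := context.WithCancel(context.Background())
	r.cancel = cancel
	r.done = make(chan struct{})
	go func() {
		defer close(r.done)
		ticker := time.NewTicker(r.interval)
		defer ticker.Stop()
		for {
			if err := r.refreshOnce(ctx); err != nil {
				fmt.Println("warn:", err) // a failed refresh is logged, not fatal
			}
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
			}
		}
	}()
}

// Stop cancels the loop and waits for it to exit. Start must be called first.
func (r *refresher) Stop() {
	r.cancel()
	<-r.done
}

func main() {
	r := newRefresher(staticLoader("edge-poller-1"), func(p []string) {
		fmt.Println("dynamic pollers:", p)
	})
	r.Start()
	r.Stop()
}
```

The per-refresh timeout keeps one slow query from stalling the loop, and because the loop refreshes once before waiting on the ticker, the callback receives an initial poller list shortly after Start.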