E2E edge onboarding with docker poller/agent #633
Imported from GitHub.
Original GitHub issue: #1911
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911
Original created: 2025-10-30T23:36:36Z
Summary
Exercise the edge onboarding flow end-to-end by issuing packages from the demo namespace and bootstrapping docker-compose poller/agent workloads locally.
Details
- […] demo namespace secrets to authenticate against /api/admin/edge-packages.
- […] serviceradar-core surfaces both docker poller/agent entries in device inventory; investigate registry merge/clobber behavior when multiple services share the same IP.
Resolves bd: serviceradar-56
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3470980300
Original created: 2025-10-31T01:44:22Z
Update: refreshed the demo-issued poller/agent packages and rebuilt the docker stack with the new edge-poller setup tweaks (POLLERS_POLLER_ID + KV disabled). The poller now registers with core as docker-poller-e2e-01 and gRPC health checks against the local agent succeed, but both packages are still stuck in delivered. Core never flips to activated because the agent is serving the poller SPIFFE ID so the TLS authorizer accepts it, while the package metadata still claims spiffe://carverauto.dev/services/agent. Next step is to reconcile the SPIFFE IDs (either regenerate the agent package with the poller SVID or relax the client expectation) so activation events fire and device inventory can be revalidated.
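The mismatch described above (the ID the agent actually serves versus the ID stamped in the package metadata) can be illustrated with a small Go sketch. This is not the ServiceRadar authorizer: the names `parseSPIFFEID` and `checkPeer` are hypothetical, and the comparison is a plain match on trust domain and path using only the standard library.

```go
package main

import (
	"fmt"
	"net/url"
)

// spiffeID is a simplified, hypothetical view of a SPIFFE ID.
type spiffeID struct {
	TrustDomain string
	Path        string
}

// parseSPIFFEID splits a spiffe:// URI into trust domain and path.
func parseSPIFFEID(raw string) (spiffeID, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return spiffeID{}, fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Scheme != "spiffe" || u.Host == "" {
		return spiffeID{}, fmt.Errorf("%q is not a SPIFFE ID", raw)
	}
	return spiffeID{TrustDomain: u.Host, Path: u.Path}, nil
}

// checkPeer reports an error when the ID a peer presents differs from
// the ID the package metadata says it should have.
func checkPeer(expected, presented string) error {
	want, err := parseSPIFFEID(expected)
	if err != nil {
		return err
	}
	got, err := parseSPIFFEID(presented)
	if err != nil {
		return err
	}
	if want != got {
		return fmt.Errorf("unexpected ID %q, want %q", presented, expected)
	}
	return nil
}

func main() {
	// Metadata claims the agent ID while the workload serves the poller ID.
	err := checkPeer(
		"spiffe://carverauto.dev/services/agent",
		"spiffe://carverauto.dev/services/poller",
	)
	fmt.Println("mismatch:", err)
}
```

Either remedy named in the comment (reissuing the package with the matching SVID, or relaxing the expectation) amounts to making the two arguments of `checkPeer` agree.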
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3470980549
Original created: 2025-10-31T01:44:33Z
Update: refreshed the demo-issued poller/agent packages and rebuilt the docker stack with the new edge-poller setup tweaks (POLLERS_POLLER_ID + KV disabled). The poller now registers with core as docker-poller-e2e-01 and gRPC health checks against the local agent succeed, but both packages are still stuck in delivered. Core never flips to activated because the agent is serving the poller SPIFFE ID so the TLS authorizer accepts it, while the package metadata still claims spiffe://carverauto.dev/services/agent. Next step is to reconcile the SPIFFE IDs (either regenerate the agent package with the poller SVID or relax the client expectation) so activation events fire and device inventory can be revalidated.
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471074531
Original created: 2025-10-31T02:31:31Z
Dug into the "delivered" state on docker poller/agent packages (fc348035-9092-4903-aaa6-c3e9105ce719 and bce4363a-da28-43bd-afce-9aa68f365b04). Even though the poller is reporting to core, we never mark the packages as activated – the core service only persisted the issued/delivered states. Added an activation hook in core that runs during device registration so the first poller/agent heartbeat promotes the package to activated, captures the source IP, and emits an audit event. Also added unit coverage in pkg/core to exercise the new flow. Once we cut a fresh serviceradar-core build and roll it into demo we should see both packages flip to activated in the admin UI.
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471285022
Original created: 2025-10-31T04:19:28Z
- […] 0f61dd1e-bfc7-4448-8461-bb65e9f71509, agent 5bd01c6d-91a2-4821-8154-bdce639ae894.
- […] edge-poller-restart.sh using the new artifacts. Poller successfully enrolls once upstream join token is refreshed.
- […] invalid server SPIFFE ID). Temporarily cleared KV_ADDRESS/KV_SPIFFE_ID to get the container to come up; agent now serves gRPC (serviceradar-agent broadcasts ready state) but runs without KV linkage.
- […] transport: authentication handshake failed: unexpected ID "spiffe://carverauto.dev/services/poller" during /grpc.health.v1.Health/Check. Package metadata currently stamps the agent server SPIFFE ID as spiffe://carverauto.dev/ns/edge/docker-agent-e2e-01; need to reconcile what the agent actually expects versus what the poller presents.
- […] status:"delivered"; core never records activation. Device inventory still missing the docker agent entry.
- […] spiffe://carverauto.dev/services/poller caller is accepted. Also need a follow-up pass to re-enable KV once the SPIFFE mismatch is sorted.
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471316742
Original created: 2025-10-31T04:39:48Z
Update: Edge package tooling now threads the nested SPIRE IDs all the way through. Fresh archives include NESTED_SPIRE_PARENT_ID/NESTED_SPIRE_DOWNSTREAM_SPIFFE_ID/NESTED_SPIRE_AGENT_SPIFFE_ID, the compose setup script rewrites poller-spire/env to match, and docker compose passes NESTED_SPIRE_AGENT_SPIFFE_ID into the poller entrypoint so it provisions the right workload entries. Once the demo packages are reissued we can rerun setup-edge-poller.sh, restore KV in edge-poller.env, bring the stack back up, and validate that both poller and agent packages reach activated and appear in device inventory.
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471316961
Original created: 2025-10-31T04:39:57Z
Update: Edge package tooling now threads the nested SPIRE IDs end-to-end. Fresh archives include NESTED_SPIRE_PARENT_ID/NESTED_SPIRE_DOWNSTREAM_SPIFFE_ID/NESTED_SPIRE_AGENT_SPIFFE_ID, the compose setup script rewrites poller-spire/env to match, and docker compose passes NESTED_SPIRE_AGENT_SPIFFE_ID into the poller entrypoint so it provisions the right workload entries. Next step: reissue the demo packages, rerun setup-edge-poller.sh with the new env, restore KV in edge-poller.env, bring the stack back up, and verify both poller and agent packages transition to activated and appear in device inventory.
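The env rewrite step mentioned above can be sketched in Go as follows. The real setup script is shell and its logic is not shown in this thread, so `rewriteEnv` is a hypothetical helper and the values in `main` are placeholders; only the three NESTED_SPIRE_* key names come from the comment.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rewriteEnv replaces KEY=VALUE lines whose key appears in updates and
// appends any keys that were missing, in sorted order. Running it twice
// with the same updates yields the same output.
func rewriteEnv(content string, updates map[string]string) string {
	seen := make(map[string]bool, len(updates))
	var out []string

	trimmed := strings.TrimRight(content, "\n")
	if trimmed != "" {
		for _, line := range strings.Split(trimmed, "\n") {
			key, _, found := strings.Cut(line, "=")
			if val, ok := updates[key]; found && ok {
				out = append(out, key+"="+val)
				seen[key] = true
				continue
			}
			out = append(out, line) // comments and unrelated keys pass through
		}
	}

	var missing []string
	for key := range updates {
		if !seen[key] {
			missing = append(missing, key)
		}
	}
	sort.Strings(missing)
	for _, key := range missing {
		out = append(out, key+"="+updates[key])
	}
	return strings.Join(out, "\n") + "\n"
}

func main() {
	// Placeholder values; real IDs come from the downloaded package.
	current := "NESTED_SPIRE_PARENT_ID=old-parent\nLOG_LEVEL=info\n"
	updates := map[string]string{
		"NESTED_SPIRE_PARENT_ID":            "new-parent",
		"NESTED_SPIRE_DOWNSTREAM_SPIFFE_ID": "new-downstream",
		"NESTED_SPIRE_AGENT_SPIFFE_ID":      "new-agent",
	}
	fmt.Print(rewriteEnv(current, updates))
}
```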
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471461156
Original created: 2025-10-31T06:06:03Z
Edge onboarding compose stack is now running cleanly with the new packages. I revoked the previous artifacts and reissued poller ce492405-a4ed-404d-bece-7044a0bb7798 and agent 91bcd977-11da-4fc4-bf01-213d4c9a549b, adding dedicated unix:path selectors so the nested SPIRE server hands different SVIDs to the poller and agent processes. After downloading the archives and replaying docker/compose/edge-poller-restart.sh, the poller gRPC checks started passing (docker logs serviceradar-poller shows the agent service check completing successfully) and both packages report status:"activated" via /api/admin/edge-packages/*. Device inventory now surfaces default:172.19.0.2 with poller_id docker-poller-e2e-02, so the docker agent is visible in the UI. I left KV disabled in edge-poller.env for the moment: re-enabling it still crashes the agent with "invalid server SPIFFE ID" when it builds the KV client; we should follow up on that mismatch separately.
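The status check above can be automated with a small Go helper. This sketch only assumes that the JSON returned under /api/admin/edge-packages/* carries a top-level `status` field, as quoted in the comment; the rest of the response shape is unknown, and `isActivated` and the sample bodies in `main` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// packageView decodes only the field this check needs.
type packageView struct {
	Status string `json:"status"`
}

// isActivated reports whether a package response body has
// status "activated".
func isActivated(body []byte) (bool, error) {
	var p packageView
	if err := json.Unmarshal(body, &p); err != nil {
		return false, fmt.Errorf("decode package response: %w", err)
	}
	return p.Status == "activated", nil
}

func main() {
	// Sample bodies; real ones would come from the admin API.
	for _, body := range []string{
		`{"status":"activated"}`,
		`{"status":"delivered"}`,
	} {
		ok, err := isActivated([]byte(body))
		fmt.Println(body, ok, err)
	}
}
```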
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471868791
Original created: 2025-10-31T08:33:01Z
Status update (edge onboarding validation)
- […] pkg/grpc/security.go; mixed-case server IDs (SPIFFE://…) now parse correctly. go test ./pkg/grpc/... passes.
- […] ce492405-a4ed-404d-bece-7044a0bb7798 and agent 91bcd977-11da-4fc4-bf01-213d4c9a549b packages. Once the compose stack is stable both show status:"activated", and /api/devices lists docker-poller-e2e-02 plus docker-agent-e2e-02.
- […] docker cp bin/serviceradar-agent …. That diverged from the GHCR image we deploy in demo and left the container restarting. We will not hand-copy binaries again.
- […] bazel build --config=remote //docker/images:agent_image_amd64 then bazel run --config=remote //docker/images:agent_image_amd64_push, followed by docker/compose/edge-poller-restart.sh --env-file edge-poller.env so compose pulls the pushed digest.
- […] serviceradar-56 as well):
  - kubectl port-forward svc/serviceradar-core -n demo 18090:8090
  - API_KEY=$(kubectl -n demo get secret serviceradar-secrets -o jsonpath='{.data.edge_api_key}' | base64 -d)
  - ADMIN_PASS=$(kubectl -n demo get secret serviceradar-secrets -o jsonpath='{.data.admin_password}' | base64 -d)
  - TOKEN=$(curl -s -X POST http://localhost:18090/auth/login -d '{"username":"admin","password":"'"$ADMIN_PASS"'"}' | jq -r .token)
  - curl -s -X POST http://localhost:18090/api/admin/edge-packages -H "Authorization: Bearer $TOKEN" -H "X-Edge-Api-Key: $API_KEY" -H 'Content-Type: application/json' -d '{"label":"Docker E2E Poller 03","poller_id":"docker-poller-e2e-02"}'
  - curl -s -X POST http://localhost:18090/api/admin/edge-packages -H ... -d '{"poller_id":"docker-poller-e2e-02","component_type":"agent","metadata":{"agent_spiffe_id":"spiffe://carverauto.dev/ns/edge/docker-agent-e2e-02"}}'
  - curl -s -X POST http://localhost:18090/api/admin/edge-packages/$PACKAGE_ID/download -H ... -d '{"download_token":"..."}' --output edge-package.tar.gz
  - […] docker/compose/edge-poller-restart.sh --env-file edge-poller.env to bootstrap.
  - […] /api/admin/edge-packages/$AGENT_ID stays activated while KV sweeps run.
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3474787185
Original created: 2025-10-31T20:28:15Z
Additional Edge Onboarding Improvements
Following up on the initial E2E validation, we've made significant improvements to the edge onboarding process:
🎯 Completed Work
1. Organized Edge E2E Structure
- […] /docker/compose/edge-e2e/ directory structure
2. Idempotent Setup Script
Created docker/compose/edge-e2e/setup-edge-e2e.sh:
- […] ./setup-edge-e2e.sh --package-id <uuid>
3. Package Management Utility
Created docker/compose/edge-e2e/manage-packages.sh:
- […] /api/admin/edge/packages to /api/admin/edge-packages
4. Critical Bug Fixes
SQL DELETE Syntax Error (pkg/db/pollers.go:195):
- […] DELETE FROM table(pollers) to ALTER TABLE pollers DELETE
Linting Errors:
- […] RecordActivation() function to reduce cyclomatic complexity (31 → <15)
- […] findPackageForActivation(), updatePackageActivation(), recordActivationEvent()
5. Comprehensive Documentation
Created three documentation files:
- README.md - Quick reference and troubleshooting
- SETUP_GUIDE.md - Complete setup process with all configuration details
- FRICTION_POINTS.md - Detailed analysis of issues and proposed solutions
🔍 Friction Points Identified
Critical (Fixed):
Critical (Needs Implementation):
- […] known_pollers ConfigMap
Medium (Automated):
Low Priority:
📁 Files Created/Modified
docker/compose/edge-e2e/
├── setup-edge-e2e.sh ✅ NEW - Idempotent setup automation
├── manage-packages.sh ✅ NEW - Package management CLI
├── README.md ✅ NEW - Quick reference
├── SETUP_GUIDE.md ✅ NEW - Complete documentation
├── FRICTION_POINTS.md ✅ NEW - Detailed friction analysis
└── docker-compose.edge-e2e.yml ✅ MOVED - From project root
pkg/db/pollers.go ✅ FIXED - SQL DELETE syntax
pkg/core/edge_onboarding.go ✅ FIXED - Linting errors
🎯 Recommended Next Steps
- […] allowed_poller_id to edge_packages table to eliminate manual ConfigMap updates
- […] serviceradar-cli edge commands for package management
✅ What Works Now
The edge onboarding process is now well-organized, documented, and largely automated!
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3474876153
Original created: 2025-10-31T20:55:35Z
✅ E2E validation completed successfully.
Follow-up work to eliminate edge onboarding friction points is tracked in:
Closing this issue as the validation work is complete.