E2E edge onboarding with docker poller/agent #633

Closed
opened 2026-03-28 04:26:41 +00:00 by mfreeman451 · 10 comments
Owner

Imported from GitHub.

Original GitHub issue: #1911
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911
Original created: 2025-10-30T23:36:36Z


Summary

Exercise the edge onboarding flow end-to-end by issuing packages from the demo namespace and bootstrapping docker-compose poller/agent workloads locally.

Details

  • Use the API key from the demo namespace secrets to authenticate against /api/admin/edge-packages.
  • Issue a poller package, consume it from a local docker poller, and confirm successful activation.
  • Issue an agent package tied to the docker poller and onboard it as well.
  • Verify serviceradar-core surfaces both docker poller/agent entries in device inventory; investigate registry merge/clobber behavior when multiple services share the same IP.
  • Expect follow-up changes in the device registry if inventory records are overwritten.

Resolves bd: serviceradar-56

Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3470980300
Original created: 2025-10-31T01:44:22Z


Update: refreshed the demo-issued poller/agent packages and rebuilt the docker stack with the new edge-poller setup tweaks (POLLERS_POLLER_ID + KV disabled). The poller now registers with core as docker-poller-e2e-01 and gRPC health checks against the local agent succeed, but both packages are still stuck in delivered. Core never flips to activated because the agent is serving the poller SPIFFE ID so the TLS authorizer accepts it, while the package metadata still claims spiffe://carverauto.dev/services/agent. Next step is to reconcile the SPIFFE IDs (either regenerate the agent package with the poller SVID or relax the client expectation) so activation events fire and device inventory can be revalidated.

Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3470980549
Original created: 2025-10-31T01:44:33Z


Update: refreshed the demo-issued poller/agent packages and rebuilt the docker stack with the new edge-poller setup tweaks (POLLERS_POLLER_ID + KV disabled). The poller now registers with core as docker-poller-e2e-01 and gRPC health checks against the local agent succeed, but both packages are still stuck in delivered. Core never flips to activated because the agent is serving the poller SPIFFE ID so the TLS authorizer accepts it, while the package metadata still claims spiffe://carverauto.dev/services/agent. Next step is to reconcile the SPIFFE IDs (either regenerate the agent package with the poller SVID or relax the client expectation) so activation events fire and device inventory can be revalidated.

Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471074531
Original created: 2025-10-31T02:31:31Z


Dug into the "delivered" state on docker poller/agent packages (fc348035-9092-4903-aaa6-c3e9105ce719 and bce4363a-da28-43bd-afce-9aa68f365b04). Even though the poller is reporting to core, we never mark the packages as activated – the core service only persisted the issued/delivered states. Added an activation hook in core that runs during device registration so the first poller/agent heartbeat promotes the package to activated, captures the source IP, and emits an audit event. Also added unit coverage in pkg/core to exercise the new flow. Once we cut a fresh serviceradar-core build and roll it into demo we should see both packages flip to activated in the admin UI.
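
The promotion rule described above can be sketched as a small state function. This is an illustration in shell, not the Go hook that was added to core: the name `promote_on_heartbeat` is hypothetical, and treating `issued` as promotable alongside `delivered` is an assumption, since the comment only says that the first heartbeat promotes the package.

```shell
#!/bin/sh
# Hypothetical sketch: decide a package's next status when its
# poller/agent sends a heartbeat. Not the actual core implementation.
promote_on_heartbeat() {
  current_status=$1
  case "$current_status" in
    issued|delivered)
      # First heartbeat after issuance/delivery: promote (assumed rule).
      printf 'activated\n'
      ;;
    *)
      # Already activated, revoked, or unknown: leave unchanged.
      printf '%s\n' "$current_status"
      ;;
  esac
}

promote_on_heartbeat delivered   # prints: activated
promote_on_heartbeat revoked     # prints: revoked
```

The real hook also records the source IP and emits an audit event; those side effects are omitted here.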

Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471285022
Original created: 2025-10-31T04:19:28Z


  • Re-issued demo edge packages via Core admin API after revoking legacy poller/agent IDs; new package ids: poller 0f61dd1e-bfc7-4448-8461-bb65e9f71509, agent 5bd01c6d-91a2-4821-8154-bdce639ae894.
  • Reset compose stack (wiped volumes, refreshed nested SPIRE creds) and replayed edge-poller-restart.sh using the new artifacts. Poller successfully enrolls once upstream join token is refreshed.
  • Agent bootstrap was blocked by KV SPIFFE validation (invalid server SPIFFE ID). Temporarily cleared KV_ADDRESS/KV_SPIFFE_ID to get the container to come up; agent now serves gRPC (serviceradar-agent broadcasts ready state) but runs without KV linkage.
  • Poller↔agent handshake still fails: downstream TLS reports transport: authentication handshake failed: unexpected ID "spiffe://carverauto.dev/services/poller" during /grpc.health.v1.Health/Check. Package metadata currently stamps the agent server SPIFFE ID as spiffe://carverauto.dev/ns/edge/docker-agent-e2e-01; need to reconcile what the agent actually expects versus what the poller presents.
  • Because of the handshake failure both packages remain status:"delivered"; core never records activation. Device inventory still missing the docker agent entry.
  • Next step: decide whether to adjust package metadata (agent server/parent SPIFFE IDs) or relax the agent-side client allowlist so the poller’s spiffe://carverauto.dev/services/poller caller is accepted. Also need a follow-up pass to re-enable KV once the SPIFFE mismatch is sorted.
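
To make the allowlist option concrete, the kind of check an agent-side client allowlist performs can be sketched as follows. This is a hypothetical shell illustration: `spiffe_id_allowed` is an invented name, the allowlist entry is a placeholder, and the real check happens inside the gRPC TLS authorizer, not in a script.

```shell
#!/bin/sh
# spiffe_id_allowed PRESENTED_ID ALLOWED_ID...
# Succeeds if the caller's SPIFFE ID exactly matches one allowlist entry.
spiffe_id_allowed() {
  presented=$1
  shift
  for allowed in "$@"; do
    if [ "$presented" = "$allowed" ]; then
      return 0
    fi
  done
  return 1
}

# The poller's ID is taken from the error in this thread; the allowlist
# entry below is a placeholder, not the agent's real configuration.
caller="spiffe://carverauto.dev/services/poller"
if spiffe_id_allowed "$caller" "spiffe://carverauto.dev/ns/edge/example-poller"; then
  echo "accepted"
else
  echo "rejected: unexpected ID \"$caller\""
fi
```

Relaxing the allowlist amounts to adding the poller's actual ID to the accepted set. Adjusting the package metadata instead changes which ID the poller presents.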
Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471316742
Original created: 2025-10-31T04:39:48Z


Update: Edge package tooling now threads the nested SPIRE IDs all the way through. Fresh archives include NESTED_SPIRE_PARENT_ID/NESTED_SPIRE_DOWNSTREAM_SPIFFE_ID/NESTED_SPIRE_AGENT_SPIFFE_ID, the compose setup script rewrites poller-spire/env to match, and docker compose passes NESTED_SPIRE_AGENT_SPIFFE_ID into the poller entrypoint so it provisions the right workload entries. Once the demo packages are reissued we can rerun setup-edge-poller.sh, restore KV in edge-poller.env, bring the stack back up, and validate that both poller and agent packages reach activated and appear in device inventory.

Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471316961
Original created: 2025-10-31T04:39:57Z


Update: Edge package tooling now threads the nested SPIRE IDs end-to-end. Fresh archives include NESTED_SPIRE_PARENT_ID/NESTED_SPIRE_DOWNSTREAM_SPIFFE_ID/NESTED_SPIRE_AGENT_SPIFFE_ID, the compose setup script rewrites poller-spire/env to match, and docker compose passes NESTED_SPIRE_AGENT_SPIFFE_ID into the poller entrypoint so it provisions the right workload entries. Next step: reissue the demo packages, rerun setup-edge-poller.sh with the new env, restore KV in edge-poller.env, bring the stack back up, and verify both poller and agent packages transition to activated and appear in device inventory.

Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471461156
Original created: 2025-10-31T06:06:03Z


Edge onboarding compose stack is now running cleanly with the new packages. I revoked the previous artifacts and reissued poller ce492405-a4ed-404d-bece-7044a0bb7798 and agent 91bcd977-11da-4fc4-bf01-213d4c9a549b, adding dedicated unix:path selectors so the nested SPIRE server hands different SVIDs to the poller and agent processes. After downloading the archives and replaying docker/compose/edge-poller-restart.sh, the poller gRPC checks started passing (docker logs serviceradar-poller shows the agent service check completing successfully) and both packages report status:"activated" via /api/admin/edge-packages/*. Device inventory now surfaces default:172.19.0.2 with poller_id docker-poller-e2e-02, so the docker agent is visible in the UI. I left KV disabled in edge-poller.env for the moment: re-enabling it still crashes the agent with "invalid server SPIFFE ID" when it builds the KV client; we should follow up on that mismatch separately.
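
A check like "both packages report status:"activated"" can be scripted as a small polling loop. The sketch below is a hedged illustration, not part of the repo: `wait_for_activation` and `fetch_status_http` are hypothetical names, `localhost:18090` assumes a local port-forward to core, and it assumes that a GET on the package URL returns JSON with a top-level `status` field.

```shell
#!/bin/sh
# wait_for_activation ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it prints "activated" or the attempts run out.
wait_for_activation() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$("$@" 2>/dev/null) || status=""
    if [ "$status" = "activated" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Assumed fetcher: expects TOKEN and API_KEY in the environment and a
# top-level "status" field in the response (both are assumptions).
fetch_status_http() {
  curl -s "http://localhost:18090/api/admin/edge-packages/$1" \
    -H "Authorization: Bearer $TOKEN" \
    -H "X-Edge-Api-Key: $API_KEY" |
    jq -r '.status'
}

# Usage (needs a reachable core):
#   wait_for_activation 30 2 fetch_status_http "$PACKAGE_ID"
```

Passing the fetch command as arguments keeps the loop testable with a stub and independent of how the status is actually retrieved.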

Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3471868791
Original created: 2025-10-31T08:33:01Z


Status update (edge onboarding validation)

  • Fixed the SPIFFE normalization bug in pkg/grpc/security.go; mixed-case server IDs (SPIFFE://…) now parse correctly. go test ./pkg/grpc/... passes.
  • Reissued poller ce492405-a4ed-404d-bece-7044a0bb7798 and agent 91bcd977-11da-4fc4-bf01-213d4c9a549b packages. Once the compose stack is stable both show status:"activated", and /api/devices lists docker-poller-e2e-02 plus docker-agent-e2e-02.
  • Tried to hot-swap the agent binary with docker cp bin/serviceradar-agent …. That diverged from the GHCR image we deploy in demo and left the container restarting. We will not hand-copy binaries again.
  • Going forward we rebuild/push images only via Bazel: bazel build --config=remote //docker/images:agent_image_amd64 then bazel run --config=remote //docker/images:agent_image_amd64_push, followed by docker/compose/edge-poller-restart.sh --env-file edge-poller.env so compose pulls the pushed digest.
  • Manual package issuance playbook (documented in bd serviceradar-56 as well):
    1. kubectl port-forward svc/serviceradar-core -n demo 18090:8090
    2. API_KEY=$(kubectl -n demo get secret serviceradar-secrets -o jsonpath='{.data.edge_api_key}' | base64 -d)
    3. ADMIN_PASS=$(kubectl -n demo get secret serviceradar-secrets -o jsonpath='{.data.admin_password}' | base64 -d)
    4. TOKEN=$(curl -s -X POST http://localhost:18090/auth/login -d '{"username":"admin","password":"'"$ADMIN_PASS"'"}' | jq -r .token)
    5. Issue poller: curl -s -X POST http://localhost:18090/api/admin/edge-packages -H "Authorization: Bearer $TOKEN" -H "X-Edge-Api-Key: $API_KEY" -H 'Content-Type: application/json' -d '{"label":"Docker E2E Poller 03","poller_id":"docker-poller-e2e-02"}'
    6. Issue agent tied to the poller: curl -s -X POST http://localhost:18090/api/admin/edge-packages -H ... -d '{"poller_id":"docker-poller-e2e-02","component_type":"agent","metadata":{"agent_spiffe_id":"spiffe://carverauto.dev/ns/edge/docker-agent-e2e-02"}}'
    7. Download archives: curl -s -X POST http://localhost:18090/api/admin/edge-packages/$PACKAGE_ID/download -H ... -d '{"download_token":"..."}' --output edge-package.tar.gz
    8. Extract and run docker/compose/edge-poller-restart.sh --env-file edge-poller.env to bootstrap.
  • Next: rebuild + push agent image via Bazel, restart compose to pick up the digest, and re-enable KV once the stock container is green so we can confirm /api/admin/edge-packages/$AGENT_ID stays activated while KV sweeps run.
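
For reference, the idea behind the SPIFFE normalization fix (accepting `SPIFFE://…` as well as `spiffe://…`) can be sketched in shell. This is not the Go code in pkg/grpc/security.go: `normalize_spiffe_id` is a hypothetical helper that only lowercases the scheme and leaves the trust domain and path untouched.

```shell
#!/bin/sh
# normalize_spiffe_id ID
# Prints ID with a lowercase "spiffe" scheme; fails for anything that
# is not a SPIFFE URI. Trust domain and path are passed through as-is.
normalize_spiffe_id() {
  id=$1
  case "$id" in
    *://*) ;;
    *) return 1 ;;
  esac
  # Everything before the first "://" is the scheme.
  scheme=$(printf '%s' "${id%%://*}" | tr '[:upper:]' '[:lower:]')
  [ "$scheme" = "spiffe" ] || return 1
  rest=${id#*://}
  [ -n "$rest" ] || return 1
  printf 'spiffe://%s\n' "$rest"
}

normalize_spiffe_id 'SPIFFE://carverauto.dev/services/agent'
# prints: spiffe://carverauto.dev/services/agent
```

Comparing normalized IDs on both sides of a handshake avoids rejecting a peer purely because of scheme casing.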
Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3474787185
Original created: 2025-10-31T20:28:15Z


Additional Edge Onboarding Improvements

Following up on the initial E2E validation, we've made significant improvements to the edge onboarding process:

🎯 Completed Work

1. Organized Edge E2E Structure

  • Created /docker/compose/edge-e2e/ directory structure
  • Moved all edge onboarding tools to organized location
  • Created comprehensive documentation suite

2. Idempotent Setup Script

Created docker/compose/edge-e2e/setup-edge-e2e.sh:

  • Fully automated edge deployment from package ID
  • Handles downloads, extraction, configuration, and deployment
  • Automatic DNS to LoadBalancer IP conversion
  • Automatic readable poller ID extraction from SPIFFE IDs
  • Supports clean restarts and credential refresh
  • Usage: ./setup-edge-e2e.sh --package-id <uuid>
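
The readable-ID extraction can be illustrated with a short helper. This is a sketch under the assumption that the readable ID is the final path segment of the SPIFFE ID; `readable_id_from_spiffe` is a hypothetical name and may not match what setup-edge-e2e.sh actually does.

```shell
#!/bin/sh
# readable_id_from_spiffe SPIFFE_ID
# Prints the last path segment, e.g. the poller or agent name.
readable_id_from_spiffe() {
  id=${1%/}   # tolerate a single trailing slash
  case "$id" in
    spiffe://*/*)
      printf '%s\n' "${id##*/}"
      ;;
    *)
      # No path component, or not a SPIFFE ID at all.
      return 1
      ;;
  esac
}

readable_id_from_spiffe 'spiffe://carverauto.dev/ns/edge/docker-agent-e2e-01'
# prints: docker-agent-e2e-01
```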

3. Package Management Utility

Created docker/compose/edge-e2e/manage-packages.sh:

  • CLI tool for package operations: list, create, revoke, delete, activate
  • Fixed API paths from /api/admin/edge/packages to /api/admin/edge-packages
  • Automatic authentication handling

4. Critical Bug Fixes

SQL DELETE Syntax Error (pkg/db/pollers.go:195):

  • Fixed ClickHouse DELETE syntax causing Core crashes
  • Changed from DELETE FROM table(pollers) to ALTER TABLE pollers DELETE
  • Built and ready for deployment

Linting Errors:

  • Fixed exhaustive switch cases for component types and package statuses
  • Refactored RecordActivation() function to reduce cyclomatic complexity (31 → <15)
  • Extracted helper methods: findPackageForActivation(), updatePackageActivation(), recordActivationEvent()
  • All linting checks now pass: 0 issues

5. Comprehensive Documentation

Created three documentation files:

  • README.md - Quick reference and troubleshooting
  • SETUP_GUIDE.md - Complete setup process with all configuration details
  • FRICTION_POINTS.md - Detailed analysis of issues and proposed solutions

🔍 Friction Points Identified

Critical (Fixed):

  • SQL DELETE syntax error - Fixed, awaiting deployment
  • Linting errors - All resolved

Critical (Needs Implementation):

  • Manual poller registration - Pollers must be manually added to Core's known_pollers ConfigMap
  • API authentication complexity - Hard to create packages programmatically

Medium (Automated):

  • DNS resolution from Docker - Automated in setup script
  • UUID poller IDs - Automated extraction of readable IDs

Low Priority:

  • SPIRE SQLite usage - Actually correct for edge deployments (no change needed)
  • Join token expiration - Documented, easy to regenerate

📁 Files Created/Modified

docker/compose/edge-e2e/
├── setup-edge-e2e.sh NEW - Idempotent setup automation
├── manage-packages.sh NEW - Package management CLI
├── README.md NEW - Quick reference
├── SETUP_GUIDE.md NEW - Complete documentation
├── FRICTION_POINTS.md NEW - Detailed friction analysis
└── docker-compose.edge-e2e.yml MOVED - From project root

pkg/db/pollers.go FIXED - SQL DELETE syntax
pkg/core/edge_onboarding.go FIXED - Linting errors

🎯 Recommended Next Steps

  1. Deploy Core with fixes - Update k8s deployment with SQL DELETE fix
  2. Implement auto-registration - Add allowed_poller_id to edge_packages table to eliminate manual ConfigMap updates
  3. Create CLI commands - Add serviceradar-cli edge commands for package management
  4. Test full flow - Validate complete onboarding process with new automation

What Works Now

  • Package download and extraction
  • SPIRE credential bootstrap
  • Configuration generation (automated)
  • Network namespace sharing
  • Agent-poller communication
  • SPIFFE/mTLS authentication
  • Status reporting to Core (after manual registration)
  • Idempotent deployments via setup script

The edge onboarding process is now well-organized, documented, and largely automated!

Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1911#issuecomment-3474876153
Original created: 2025-10-31T20:55:35Z


E2E validation completed successfully.

Follow-up work to eliminate edge onboarding friction points is tracked in:

  • bd issue: serviceradar-57
  • GitHub issue: #1915

Closing this issue as the validation work is complete.

Reference
carverauto/serviceradar#633