Edge onboarding: support agents and checkers #632

Closed
opened 2026-03-28 04:26:40 +00:00 by mfreeman451 · 3 comments

Imported from GitHub.

Original GitHub issue: #1909
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1909
Original created: 2025-10-29T06:00:06Z


Summary

Expand the edge onboarding experience so operators can issue packages for pollers, agents, and checkers from a single flow. The UI must capture the relationships between these components and automatically update KV configuration so Core and the pollers/agents start exchanging work without manual edits.

Problem Statement

Today the edge onboarding service only creates poller installers. Operators who need to onboard an agent or checker must clone the poller package, hand-edit environment files, and manually patch KV entries so pollers know about their downstream agents and checkers. This manual path is error-prone, undocumented, and blocks repeatable onboarding for sites with multiple agents and device checkers.

Goals & Success Criteria

  • UI and API support issuing onboarding packages for three asset types: poller, agent, checker.
  • Agents must be associated with an existing poller; checkers must be associated with an existing agent.
  • KV configuration is updated automatically when a package is created so pollers and agents begin polling new children once they activate.
  • Operators can review package lineage (parent poller/agent linkage) and status in both the UI and API responses.
  • Documentation and CLI tooling reflect the new flows.

Non-Goals

  • Changing how SPIRE join tokens and bundles are minted (reuse existing onboarding package machinery).
  • Replacing the poller restart scripts—only ensure they consume the enriched metadata.
  • Supporting bulk onboarding or site templates (captured under future work).

User Stories

  • As an admin, I can create a poller package and optionally pre-register agents/checkers that should be deployed immediately after the poller comes online.
  • As an admin onboarding a new agent, I can select the poller it reports to, receive a package with SPIRE credentials plus the poller association, and expect Core/KV to route tasks automatically on activation.
  • As an admin onboarding a checker, I can specify the agent it depends on, capture device-specific metadata, and have the agent’s KV configuration updated so the checker starts polling as soon as it joins.

Functional Requirements

  1. UI updates

    • Introduce a required Component type selector (Poller, Agent, Checker) in the Edge Onboarding create modal.
    • When Agent is selected, surface an Associated poller dropdown populated from active pollers and pending poller packages. Require selection before submission.
    • When Checker is selected, surface an Associated agent dropdown (active agents + pending agent packages) and a Checker kind selector (SNMP, sysmon-vm, custom, etc.).
    • Dynamically adjust metadata form hints: poller metadata unchanged; agent metadata highlights the poller channel; checker metadata prompts for device credentials/targets.
    • Reflect parent-child relationships in the listing table (e.g., "Agent ➝ Poller sea-edge-01").
    • Allow revocation/activation views to show linked resources.
  2. API and backend

    • Extend POST /api/admin/edge-packages to accept component_type (poller|agent|checker), parent_id (poller id or agent id depending on type), and optional checker_kind plus checker_config JSON blob.
    • Validate that the referenced parent resource exists or is pending activation; reject if the parent is revoked or unknown.
    • Persist parent linkage fields in edge_onboarding_packages and audit events.
    • For agents: on package creation, update the parent poller’s KV config under config/pollers/<poller-id>/agents/<agent-id> with metadata required for activation and mark status pending.
    • For checkers: apply a similar KV update under config/agents/<agent-id>/checkers/<checker-id> (the structure already used manually today), with status pending.
    • When agents/checkers activate via status reports, flip their KV entries to active and emit audit events.
    • Ensure download archives contain type-specific env files (edge-agent.env, edge-checker.env) and any additional metadata (checker credentials, device targets).
  3. CLI enhancements

    • Update serviceradar-cli edge package create to accept --component-type, --parent-id, and type-specific payload flags.
    • Provide serviceradar-cli edge package list output columns for component type and parent linkage.
  4. Observability & security

    • Emit metrics for counts by component type and activation status.
    • Ensure logs continue to suppress cleartext tokens; include parent identifiers for traceability.
    • Apply the same admin role enforcement to all new fields.
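
The parent rules in the API and backend requirements above can be sketched as follows. This is a minimal illustration in Python, not the project's Go implementation: the `Package` record, the `validate_request` function, and the in-memory `known` lookup are all hypothetical names introduced for the example. Only the component types and the pending/active/revoked states come from the issue text.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Parent type each component type must reference (None means no parent).
REQUIRED_PARENT = {"poller": None, "agent": "poller", "checker": "agent"}


@dataclass
class Package:
    """Hypothetical record for an issued package or a known component."""
    component_id: str
    component_type: str  # "poller" | "agent" | "checker"
    status: str          # "pending" | "active" | "revoked"


def validate_request(component_type: str, parent_id: Optional[str],
                     known: Dict[str, Package]) -> None:
    """Raise ValueError if a create request violates the parent rules."""
    if component_type not in REQUIRED_PARENT:
        raise ValueError(f"unknown component_type: {component_type!r}")
    parent_type = REQUIRED_PARENT[component_type]
    if parent_type is None:
        return  # pollers have no parent to validate
    if not parent_id:
        raise ValueError(f"{component_type} requires a parent_id")
    parent = known.get(parent_id)
    if parent is None:
        raise ValueError(f"unknown parent: {parent_id}")
    if parent.component_type != parent_type:
        raise ValueError(f"{component_type} parent must be a {parent_type}")
    if parent.status == "revoked":
        raise ValueError(f"parent {parent_id} is revoked")
    # Both "pending" and "active" parents are accepted at this point.
```

Accepting pending parents matches the requirement as written; the open question further down about blocking on unactivated pollers would change only the final status check.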

Acceptance Criteria

  • Demo scenario: Operator issues a poller package, then issues an agent package linked to that poller, then issues a checker linked to the agent; after running the installers in order, Core reflects the hierarchy and the poller begins scheduling work without manual KV edits.
  • Documentation (docs/docs/edge-onboarding.md and a new agent/poller onboarding doc) explains the new options and automation.
  • All new behavior is covered by API/unit tests for KV updates and parent validation.
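
As a rough illustration of the KV behavior these tests would exercise, the sketch below models the pending-to-active lifecycle with a plain dict standing in for the KV store. The key paths follow the functional requirements; the JSON value shape (`status` and `metadata` fields) and the function names are assumptions made for the example.

```python
import json


def kv_key(component_type: str, parent_id: str, component_id: str) -> str:
    """Build the KV path for a child entry, per the functional requirements."""
    if component_type == "agent":
        return f"config/pollers/{parent_id}/agents/{component_id}"
    if component_type == "checker":
        return f"config/agents/{parent_id}/checkers/{component_id}"
    raise ValueError(f"{component_type!r} has no parent-scoped KV entry")


def register_pending(kv: dict, component_type: str, parent_id: str,
                     component_id: str, metadata: dict) -> str:
    """Write the child entry with status pending at package creation time."""
    key = kv_key(component_type, parent_id, component_id)
    # Value shape is assumed for illustration only.
    kv[key] = json.dumps({"status": "pending", "metadata": metadata})
    return key


def mark_active(kv: dict, key: str) -> None:
    """Flip an existing entry to active once the component reports status."""
    entry = json.loads(kv[key])
    entry["status"] = "active"
    kv[key] = json.dumps(entry)
```

A unit test can then create an agent entry, assert that it is pending, call `mark_active`, and assert that only the status changed.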

Open Questions

  • Do we need to support pre-seeding checkers during poller creation (single request)?
  • Should we block agent onboarding if the parent poller package has not yet been activated, or is pending acceptable?
  • How are secrets for checker credentials managed—do we store encrypted fields in the package metadata or expect operators to provide them on installation?

References

  • serviceradar-54 (closed) – Secure edge poller onboarding flow
  • Documentation draft: docs/docs/edge-onboarding.md

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1909#issuecomment-3476840087
Original created: 2025-11-01T20:59:22Z


Checker Configuration Automation - Implemented

Part of the solution for automated checker onboarding has been implemented:

Checker Template Registration System

Problem Solved: Eliminates manual KV configuration for checkers during edge onboarding.

How it Works:

  1. Checkers self-register templates on startup to templates/checkers/{kind}.json
  2. Core fetches templates during edge package creation (if checker_config_json is not provided)
  3. Variable substitution replaces placeholders with instance values (SPIFFE IDs, addresses, etc.)
  4. Instance configs are written to agents/{agent_id}/checkers/{kind}.json (only if the key does not already exist)
  5. User modifications are protected: once created, instance configs are never overwritten
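
Steps 4 and 5 (write once, never overwrite) can be sketched as below. This is an illustrative Python model with a dict in place of the KV store, not the code in pkg/core/edge_onboarding.go. The function names are hypothetical; the key layout is the one given in step 4.

```python
def instance_key(agent_id: str, kind: str) -> str:
    """KV key for a checker's instance config, as described in step 4."""
    return f"agents/{agent_id}/checkers/{kind}.json"


def write_instance_config_once(kv: dict, agent_id: str, kind: str,
                               rendered_config: str) -> bool:
    """Store the rendered config only if no instance config exists yet.

    Returns True when the config was written and False when an existing
    (possibly user-edited) config was left untouched.
    """
    key = instance_key(agent_id, kind)
    if key in kv:
        return False  # protect user modifications
    kv[key] = rendered_config
    return True
```

Against a real KV store, the existence check and the write would need to be one atomic create operation so that two concurrent package creations cannot both pass the check. The dict version does not model that.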

Components Deployed

  • ✅ Core Service - Template fetching, substitution, overwrite protection (deployed to k8s demo)
  • ✅ Design Document - docs/checker-template-registration.md (complete architecture)
  • 🔄 Sysmon Checker - Template registration (Rust implementation building)
  • 📋 Go Checkers - sweep, rperf (pending)

What This Means for Edge Onboarding

Before:

# Create edge package (the entire checker config must be supplied by hand)
curl -X POST .../edge-packages -d '{
  "checker_kind": "sysmon",
  "checker_config_json": "{...entire config...}",
  ...
}'

Now:

# Create edge package - no config needed!
# Config is auto-populated from the template with variable substitution.
curl -X POST .../edge-packages -d '{
  "checker_kind": "sysmon",
  "parent_id": "agent-id",
  ...
}'

Architecture Details

KV Key Structure:

  • templates/checkers/sysmon.json - Factory default (auto-registered by checker)
  • agents/{agent-id}/checkers/sysmon.json - Instance config (one-time write)

Supported Variables:

  • {{DOWNSTREAM_SPIFFE_ID}} - Checker's SPIFFE ID
  • {{TRUST_DOMAIN}} - SPIFFE trust domain
  • {{AGENT_ADDRESS}}, {{CORE_ADDRESS}}, {{KV_ADDRESS}} - Service endpoints
  • {{LOG_LEVEL}}, {{AGENT_ID}}, {{CHECKER_KIND}} - Metadata
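
The placeholder substitution itself might look like the following sketch. It assumes placeholders are upper-case names wrapped in double braces, as in the list above. Raising an error on unresolved placeholders is a design choice of this example, not confirmed behavior of Core, and the `substitute` name is hypothetical.

```python
import re

# Matches placeholders such as {{AGENT_ID}} or {{TRUST_DOMAIN}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def substitute(template: str, values: dict) -> str:
    """Replace every {{NAME}} in template with values[NAME].

    Raises KeyError listing any placeholders that have no value, so a
    half-rendered config is never returned.
    """
    missing = set()

    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            missing.add(name)
            return match.group(0)  # leave as-is; reported below
        return values[name]

    rendered = PLACEHOLDER.sub(repl, template)
    if missing:
        raise KeyError(f"unresolved placeholders: {sorted(missing)}")
    return rendered
```

Because values are inserted as raw text, a value containing a double quote or backslash would produce invalid JSON. A production version would need to escape values or substitute into parsed JSON instead.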

Remaining Work

  • Complete sysmon checker build/test
  • Implement template registration for Go checkers
  • Integration with checker package creation API
  • E2E testing

Files:

  • Core: pkg/core/edge_onboarding.go:1453-1743
  • Docs: docs/checker-template-registration.md
  • Sysmon: cmd/checkers/sysmon/src/template.rs

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1909#issuecomment-3477241255
Original created: 2025-11-02T02:39:43Z


Service Device Registration Implementation Complete

Successfully implemented device registration for pollers, agents, and checkers to ensure they show up in the inventory as distinct devices.

What Was Implemented

1. Service-Aware Device IDs

  • Each service type now gets a unique device ID format: serviceradar:service_type:service_id
    • Pollers: serviceradar:poller:poller-id
    • Agents: serviceradar:agent:agent-id
    • Checkers: serviceradar:checker:checker-id
  • Network devices continue to use: partition:ip
  • This prevents services on the same IP from merging into a single device
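
A small sketch of the two ID formats shows why services sharing an IP no longer collide. The function names are hypothetical and this is not the code in pkg/models/service_device.go; only the formats are taken from the list above.

```python
SERVICE_TYPES = ("poller", "agent", "checker")


def service_device_id(service_type: str, service_id: str) -> str:
    """Build a service-aware device ID: serviceradar:<type>:<id>."""
    if service_type not in SERVICE_TYPES:
        raise ValueError(f"unknown service type: {service_type!r}")
    return f"serviceradar:{service_type}:{service_id}"


def network_device_id(partition: str, ip: str) -> str:
    """Build the ID used for ordinary network devices: <partition>:<ip>."""
    return f"{partition}:{ip}"
```

Under the `partition:ip` scheme, a poller and an agent on the same host would map to one ID. With service-aware IDs they differ in both the type and the ID segment, so they stay distinct regardless of IP.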

2. Self-Registration

  • Pollers register themselves when they send status reports (pkg/core/pollers.go:393, 427, 571)
  • Agents register when they report status (pkg/core/services.go:532-540)
  • Checkers register when they report status (pkg/core/services.go:547-556)
  • All registrations are best-effort (log warnings on failure, don't block operations)
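
The best-effort behavior can be illustrated as below. This is a generic Python sketch of the pattern (log a warning and carry on), with a hypothetical `register` callable standing in for the device registry; it is not the code in pkg/core.

```python
import logging
from typing import Callable

logger = logging.getLogger("registration")


def register_best_effort(register: Callable[[str], None],
                         device_id: str) -> bool:
    """Try to register a service device without blocking the caller.

    Any failure is logged as a warning and reported through the return
    value, so status-report handling can continue either way.
    """
    try:
        register(device_id)
        return True
    except Exception as exc:  # deliberately broad: registration is optional
        logger.warning("device registration failed for %s: %s", device_id, exc)
        return False
```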

3. Parent-Child Relationships

  • Metadata includes relationship information:
    • Agents: include poller_id in metadata
    • Checkers: include both agent_id and poller_id in metadata
  • Stable checker IDs use format: serviceName@agentID
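
To make the relationship metadata concrete, here is a hedged sketch. The `poller_id` and `agent_id` keys and the `serviceName@agentID` format come from the comment above; the function names and the dict representation are assumptions.

```python
from typing import Dict, Optional


def stable_checker_id(service_name: str, agent_id: str) -> str:
    """Build a stable checker ID in the serviceName@agentID format."""
    return f"{service_name}@{agent_id}"


def relationship_metadata(service_type: str, poller_id: str,
                          agent_id: Optional[str] = None) -> Dict[str, str]:
    """Return the parent-linkage metadata attached to a service device."""
    if service_type == "agent":
        return {"poller_id": poller_id}
    if service_type == "checker":
        if agent_id is None:
            raise ValueError("checkers require an agent_id")
        return {"agent_id": agent_id, "poller_id": poller_id}
    return {}  # pollers are the root of the hierarchy
```

Because the checker ID embeds the agent ID, the same checker service on two agents yields two different IDs, and many checkers on one agent stay unique as long as their service names differ.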

4. Cleanup/Revocation Support

  • When a package is revoked via API, the system emits a tombstone device update
  • Service devices are marked as unavailable (IsAvailable=false)
  • Includes revocation timestamp and metadata
  • Implemented in pkg/core/edge_onboarding.go:1847-1895
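
A tombstone update could be modeled roughly as follows. Only the unavailable flag and the presence of a revocation timestamp come from the comment above. The dict layout, the field names (`device_id`, `is_available`, `revoked_at`), and the `tombstone_update` function are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def tombstone_update(service_type: str, service_id: str,
                     revoked_at: Optional[datetime] = None) -> dict:
    """Build a device update marking a revoked service as unavailable."""
    if revoked_at is None:
        revoked_at = datetime.now(timezone.utc)
    return {
        "device_id": f"serviceradar:{service_type}:{service_id}",
        "is_available": False,  # mirrors IsAvailable=false
        "metadata": {
            # Field name is hypothetical; the source only says a
            # revocation timestamp is included.
            "revoked_at": revoked_at.isoformat(),
        },
    }
```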

5. High Cardinality Support

  • Successfully tested with 100 checkers on a single agent/IP
  • Each gets a unique device ID - no collisions

Files Modified

  • pkg/models/service_device.go (new): Device ID generation and service types
  • pkg/models/service_registration.go (new): Helper functions for creating DeviceUpdates
  • pkg/models/unified_device.go: Added ServiceType and ServiceID fields
  • pkg/registry/registry.go: Updated normalization to generate service-aware IDs
  • pkg/db/db.go: Database layer updates for service devices
  • pkg/core/pollers.go: Poller self-registration
  • pkg/core/services.go: Agent and checker self-registration
  • pkg/core/edge_onboarding.go: Revocation cleanup
  • pkg/core/server.go: Wired up device registry callback

Test Coverage

Created comprehensive test suite:

  • pkg/models/service_device_test.go: 15 test cases covering ID generation, helper functions, high cardinality, multiple services on same IP
  • pkg/registry/service_device_test.go: 8 test cases covering registry integration
  • All models tests passing

Verification

Deployed to k8s demo cluster and confirmed:

  • Pollers registering with device IDs: serviceradar:poller:k8s-poller, serviceradar:poller:k8s-demo-datasvc
  • No errors or warnings in logs
  • Device updates successfully published to NATS stream
  • Empty IP validation allows service components (they're identified by service-aware IDs, not IPs)

Next Steps

This addresses the device registration and cleanup aspects of #1909. Still needed for full completion:

  • UI updates to show parent-child relationships in device listing
  • Query enhancements to retrieve agents by poller, checkers by agent
  • Dashboard views showing service hierarchy

Closes the device registration portion of #1909.

🤖 Generated with Claude Code


Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1909#issuecomment-3813913295
Original created: 2026-01-28T21:01:30Z


closing, stale
