1915 create common onboarding library to eliminate edge deployment friction #2397
Imported from GitHub pull request.
Original GitHub pull request: #1916
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/1916
Original created: 2025-11-03T00:59:10Z
Original updated: 2025-11-03T04:53:25Z
Original head: carverauto/serviceradar:1915-create-common-onboarding-library-to-eliminate-edge-deployment-friction
Original base: main
Original merged: 2025-11-03T04:52:34Z by @mfreeman451
User description
PR Type
Enhancement, Tests
Description
Implements comprehensive edge onboarding library to eliminate deployment friction for edge-deployed services (pollers, agents, checkers)
Creates service registry infrastructure with registration, lifecycle management, and heartbeat tracking for pollers, agents, and checkers
Adds device registry integration for service components with auto-registration and tombstone filtering for deleted devices
Implements SPIRE credential configuration and deployment environment detection (Kubernetes, Docker, bare-metal) for edge services
Integrates edge onboarding into all service components (poller, agent, checkers, datasvc, consumers) with automatic configuration generation
Adds DataSvc Core registration mechanism with periodic heartbeat updates via gRPC
Extends API server with new endpoints for device registry queries, agent discovery, datasvc instance listing, and device deletion
Implements device lifecycle event publishing for audit trail and lifecycle tracking
Adds Kong configuration support for web service routing to device endpoints
Updates database layer with agent discovery methods and poller registration fields
Regenerates mocks with standardized parameters and new edge onboarding support
Diagram Walkthrough
File Walkthrough
5 files
mock_db.go
Regenerate mocks with standardized parameters and edge onboarding support
pkg/db/mock_db.go
pkg/db/ prefix
isgomock struct fields from mock types (MockService, MockSysmonMetricsProvider, MockRows, MockQueryExecutor)
(e.g., ctx, pollerID) to generic names (e.g., arg0, arg1)
DeleteEdgeOnboardingPackage, GetEdgeOnboardingPackage, InsertEdgeOnboardingEvent, ListEdgeOnboardingEvents, ListEdgeOnboardingPackages, ListEdgeOnboardingPollerIDs
ListAgentsByPoller, ListAgentsWithPollers
service_device_test.go
Device registration test coverage for all service types
pkg/registry/service_device_test.go
validation
operations
devices require IPs
service_device_test.go
Unit tests for service device model functions
pkg/models/service_device_test.go
IsServiceDevice detection logic for poller/agent/checker prefixes
edge_onboarding_test.go
Update edge onboarding service test signatures
pkg/core/edge_onboarding_test.go
newEdgeOnboardingService() to include new nil parameter
signature
registry_test.go
Tests for device tombstone filtering logic
pkg/registry/registry_test.go
TestProcessBatchDeviceUpdates_DropsSelfReportedAfterDelete() verifying self-reported updates are dropped for deleted devices
TestProcessBatchDeviceUpdates_AllowsFreshNonSelfReportedAfterDelete() verifying fresh non-self-reported updates bypass deletion filter
timestamp metadata
32 files
edge_onboarding.go
Add service registry integration and checker template support
pkg/core/edge_onboarding.go
ServiceManager interface and registration type structs (PollerRegistration, AgentRegistration, CheckerRegistration) to avoid import cycles
ErrUnsupportedComponentType error constant for unsupported component types
edgeOnboardingService with deviceRegistryCallback and serviceRegistry fields
SetDeviceRegistryCallback method to register device registry callbacks
registerServiceComponent method to register components in the service registry based on type
CreatePackage to inject datasvc_endpoint into metadata and register services
markServiceDeviceUnavailable method to emit tombstone updates when packages are revoked
applyComponentKVUpdates
putKVDocument, substituteTemplateVariables, and substituteInMap helper methods for template variable replacement
kvKeyForPackage to use correct KV path for checker configs (agents/{agent_id}/checkers/{checker_kind}.json)
edge_onboarding.go
Add DataSvc endpoint configuration to edge package creation
pkg/core/api/edge_onboarding.go
DataSvcEndpoint field to edgePackageCreateRequest struct for DataSvc gRPC endpoint configuration
handleCreateEdgePackage to extract and trim DataSvcEndpoint from request and pass to CreatePackage
service_registry.go
Core service registry implementation with registration and lifecycle management
pkg/registry/service_registry.go
ServiceRegistry struct managing lifecycle of pollers, agents, and checkers
(RegisterPoller, RegisterAgent, RegisterChecker) with validation and event emission
services
services
service_registry_queries.go
Query and retrieval operations for service registry
pkg/registry/service_registry_queries.go
(GetPoller, GetAgent, GetChecker)
source
IsKnownPoller with TTL-based caching for performance
pollers.go
Service registry integration into poller status handling
pkg/core/pollers.go
ensurePollerRegistered, ensureAgentRegistered, ensureCheckerRegistered methods
isKnownPoller to check service registry as primary path with fallback to legacy methods
device_registry.go
HTTP API endpoints for device registry queries and deletion
pkg/core/api/device_registry.go
retrieval
getDeviceRegistryInfo endpoint to query service registration details
deleteDevice endpoint for tombstoning devices with audit trail
config.go
Edge onboarding configuration generation for service components
pkg/edgeonboarding/config.go
(poller, agent, checker)
SPIRE integration
deployment types
fields
service_registration.go
Service device update creation helper functions
pkg/models/service_registration.go
components
CreatePollerDeviceUpdate, CreateAgentDeviceUpdate, CreateCheckerDeviceUpdate functions
bootstrap.go
Core edge onboarding bootstrapper implementation
pkg/edgeonboarding/bootstrap.go
Bootstrapper struct and Config for edge service onboarding workflow
Bootstrap() method orchestrating package download, SPIRE configuration, and service config generation
configurations
spire.go
SPIRE credential configuration for edge components
pkg/edgeonboarding/spire.go
agents, and checkers
registry.go
Support service components in device registry
pkg/registry/registry.go
checkers) alongside network devices
filterObsoleteUpdates() to prevent stale updates for deleted devices
device IDs
devices
core_registration.go
DataSvc Core registration and heartbeat service
pkg/datasvc/core_registration.go
with Core service
mTLS
ReportStatus RPC
integration.go
Edge onboarding integration layer for services
pkg/edgeonboarding/integration.go
TryOnboard() entry point for services to attempt edge onboarding via environment variables
IntegrationResult struct for returning onboarding artifacts to services
path retrieval
server.go
API server enhancements for service and event management
pkg/core/api/server.go
WithEventPublisher() and WithServiceRegistry() option functions for API server configuration
and agent listing
handleDeviceByID() router for GET/DELETE methods on device endpoints
kong.go
Kong configuration support for web service routing
pkg/cli/kong.go
web-service flag for optional Web service URL routing to /api/devices endpoints
JWT authentication
exist
renderKongDBLess() to support separate web service routing
deployment.go
Deployment environment detection for edge services
pkg/edgeonboarding/deployment.go
detectDeploymentType() to identify Kubernetes, Docker, or bare-metal environments
(isKubernetes(), isDocker()) checking env vars and filesystem
service discovery
pollers.go
Poller registration and agent discovery database methods
pkg/db/pollers.go
insertPollerStatus() to include new poller registration fields (component_id, registration_source, status, etc.)
AgentInfo struct for agent-poller relationship queries
ListAgentsWithPollers() and ListAgentsByPoller() methods for agent discovery
types
service_models.go
Service registry domain models and types
pkg/registry/service_models.go
ServiceStatus, RegistrationSource, and registration request types
RegisteredPoller, RegisteredAgent, RegisteredChecker structs for service state
ServiceHeartbeat, ServiceFilter, and RegistrationEvent for service management
datasvc_registry.go
DataSvc instance registry API endpoint
pkg/core/api/datasvc_registry.go
handleListDataSvcInstances() endpoint for listing registered datasvc instances
availability information
implementation
events.go
Event publishing refactor and device lifecycle events
pkg/natsutil/events.go
PublishPollerHealthEvent() to use new publishEvent() helper method
PublishDeviceLifecycleEvent() for device lifecycle events (delete, restore, etc.)
publishEvent() method handling stream creation and event marshaling
interfaces.go
Service manager interface definition
pkg/registry/interfaces.go
ServiceManager interface defining service registration and lifecycle operations
operations
ServiceManager interface
server.go
Core server service registry initialization
pkg/core/server.go
ServiceRegistry in NewServer() with type assertion to *db.DB
revocation
EventPublisher() getter method for accessing event publisher
types.go
API server type enhancements for service and event support
pkg/core/api/types.go
SetDeviceRegistryCallback() method to EdgeOnboardingService interface
serviceRegistry field to APIServer struct for service management
eventPublisher field to APIServer struct for lifecycle events
natsutil package for event publishing support
services.go
Auto-registration of agents and checkers as devices
pkg/core/services.go
AgentId is present in service status
(snmp, sysmon, rperf, mapper)
isCheckerService() to identify checker service types
generateCheckerID() to create stable checker identifiers
server.go
DataSvc logging and Core registration initialization
pkg/datasvc/server.go
logger field to datasvc Server struct
datasvcLogger wrapper adapting zerolog to logger.Logger interface
NewServer() with service name and timestamp context
StartCoreRegistration() in Start() method for Core registration
agent_registry.go
Agent registry API endpoints
pkg/core/api/agent_registry.go
handleListAgents() endpoint for retrieving all agents with poller associations
handleListAgentsByPoller() endpoint for filtering agents by poller ID
agentInfoView struct for API responses with agent, poller, and service type information
main.go
Edge onboarding integration for db-event-writer consumer
cmd/consumers/db-event-writer/main.go
edgeonboarding.TryOnboard() to attempt edge onboarding before loading config
processor.go
Device lifecycle event processing
pkg/consumers/db-event-writer/processor.go
tryDeviceLifecycleEvent() handler for device lifecycle CloudEvents
unified_device.go
Device model support for service components
pkg/models/unified_device.go
DiscoverySourceServiceRadar constant for ServiceRadar infrastructure components
ServiceType and ServiceID fields to DeviceUpdate struct for service component identification
GetSourceConfidence() to return high confidence for ServiceRadar source
main.go
Edge onboarding integration for sysmon-vm checker
cmd/checkers/sysmon-vm/main.go
edgeonboarding.TryOnboard() with EdgeOnboardingComponentTypeChecker
main.go
Edge onboarding integration for faker agent
cmd/faker/main.go
edgeonboarding.TryOnboard() with EdgeOnboardingComponentTypeAgent
main.go
Edge onboarding integration for dusk checker
cmd/checkers/dusk/main.go
edgeonboarding.TryOnboard() with EdgeOnboardingComponentTypeChecker
1 file
app.go
Wire service registry and event publisher to API server
cmd/core/app/app.go
api.WithServiceRegistry(server.ServiceRegistry) option to pass service registry to API server
api.WithEventPublisher(server.EventPublisher()) option to pass event publisher to API server
63 files
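The deployment.go entry in the walkthrough describes detectDeploymentType() choosing between Kubernetes, Docker, and bare metal by checking environment variables and the filesystem. A minimal sketch of that idea follows. The function names are taken from the walkthrough, but the specific signals used here (the KUBERNETES_SERVICE_HOST variable, the service account token path, and /.dockerenv) are common conventions assumed for illustration, not confirmed details of the PR. The probe type is a hypothetical helper added so the logic can be exercised without a real container.

```go
package main

import (
	"fmt"
	"os"
)

// DeploymentType labels the environment a service appears to run in.
type DeploymentType string

const (
	DeploymentKubernetes DeploymentType = "kubernetes"
	DeploymentDocker     DeploymentType = "docker"
	DeploymentBareMetal  DeploymentType = "bare-metal"
)

// probe is a hypothetical seam over env and filesystem lookups so the
// detection logic can be tested with fakes.
type probe struct {
	getenv func(string) string
	exists func(string) bool
}

// osProbe wires the probe to the real process environment and filesystem.
func osProbe() probe {
	return probe{
		getenv: os.Getenv,
		exists: func(path string) bool {
			_, err := os.Stat(path)
			return err == nil
		},
	}
}

// isKubernetes assumes the usual in-pod signals: the service host variable
// or the default service account token mount.
func isKubernetes(p probe) bool {
	return p.getenv("KUBERNETES_SERVICE_HOST") != "" ||
		p.exists("/var/run/secrets/kubernetes.io/serviceaccount/token")
}

// isDocker assumes the /.dockerenv marker file that Docker creates in containers.
func isDocker(p probe) bool {
	return p.exists("/.dockerenv")
}

// detectDeploymentType checks Kubernetes first, because a pod may also
// carry Docker markers, and falls back to bare metal.
func detectDeploymentType(p probe) DeploymentType {
	switch {
	case isKubernetes(p):
		return DeploymentKubernetes
	case isDocker(p):
		return DeploymentDocker
	default:
		return DeploymentBareMetal
	}
}

func main() {
	fmt.Println(detectDeploymentType(osProbe()))
}
```

Ordering matters in this sketch: the Kubernetes check runs before the Docker check so that a pod is not misclassified as a plain container.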
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1916#issuecomment-3478615809
Original created: 2025-11-03T01:00:48Z
PR Compliance Guide 🔍
(Compliance updated until commit github.com/carverauto/serviceradar@5760e8406d)
Below is a summary of compliance checks for this PR:
Authorization/Access control
Description: The DELETE device endpoint allows tombstoning any device by ID without explicit RBAC
checks shown in the handler, potentially enabling unauthorized deletion if upstream auth
middleware is misconfigured.
device_registry.go [156-287]
Referred Code
Template injection risk
Description: Template variable substitution for checker configs performs simple string replacement
which could allow injecting unintended values into JSON configuration if templates include
sensitive fields.
edge_onboarding.go [1542-1611]
Referred Code
Destructive delete behavior
Description: Hard DELETE operations directly remove rows from versioned_kv tables which may bypass
audit/history expectations; if exposed, could aid tampering or data loss.
service_registry.go [659-706]
Referred Code
🎫 #1915
pollers/agents/checkers.
isKnownPoller() to check KV/database.
device deletion.
dynamic config in KV.
statuses Issued/Delivered/Activated for allowlist.
document differences.
consider service account tokens.
pollers, agents, and checkers using a token-based flow.
rotation.
env var; do not modify k8s or main docker-compose logic.
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Missing audit context: Registration/heartbeat/deletion events are emitted and logged but lack explicit inclusion
of user identifiers and full action context in all paths (e.g., auto-registration uses
actor "system"), which may be compliant but needs verification of upstream
logging to ensure audit trail completeness.
Referred Code
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
Input validation gaps: Template variable substitution for checker configs validates only a whitelist of metadata
keys and simple patterns, but additional external inputs (e.g., kv template contents) are
trusted and substituted recursively without schema validation which may be acceptable but
warrants further review.
Referred Code
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status:
Detailed errors: Errors returned include specific file paths and configuration details (e.g., bundle/join
token paths) which are useful for debugging but may expose internal details to callers if
surfaced to end users.
Referred Code
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status:
Sensitive log data: Logging includes SPIRE configuration details such as trust bundle sizes and potentially
socket/paths which might disclose infrastructure details if logs are accessible beyond
internal systems.
Referred Code
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Template trust risk: Checker template JSON fetched from KV is substituted and written without explicit schema
enforcement or strict sanitization of all fields, which could allow unsafe configuration
injection depending on template contents.
Referred Code
Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Previous compliance checks
Compliance check up to commit 1cb76cd
SQL injection
Description: Raw SQL string is built using fmt.Sprintf and manual quoting (quoteLiteral) for
identifiers like poller_id, risking SQL injection if inputs are not sanitized;
parameterized queries should be used instead.
service_registry_queries.go [16-25]
Referred Code
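To illustrate the difference the check is pointing at, here is a rough sketch contrasting a fmt.Sprintf-built query with a parameterized one. It is not the code from service_registry_queries.go: the table name, column names, and the use of database/sql are assumptions for illustration, and the project's actual driver and schema may differ.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// unsafePollerQuery shows the flagged pattern: the ID is spliced into the
// SQL text, so quote characters in the input change the statement itself.
// Table and column names are hypothetical.
func unsafePollerQuery(pollerID string) string {
	return fmt.Sprintf("SELECT poller_id, status FROM pollers WHERE poller_id = '%s'", pollerID)
}

// pollerQuery keeps the SQL text constant; the ID travels separately as a
// bound parameter.
const pollerQuery = "SELECT poller_id, status FROM pollers WHERE poller_id = ? LIMIT 1"

// getPollerStatus is a sketch of a parameterized lookup using database/sql.
// It is not called in main because no driver is registered in this example.
func getPollerStatus(ctx context.Context, db *sql.DB, pollerID string) (string, error) {
	var id, status string
	if err := db.QueryRowContext(ctx, pollerQuery, pollerID).Scan(&id, &status); err != nil {
		return "", fmt.Errorf("lookup poller %q: %w", pollerID, err)
	}
	return status, nil
}

func main() {
	malicious := "x' OR '1'='1"
	// The injected quote closes the literal and appends a new condition.
	fmt.Println(unsafePollerQuery(malicious))
	// The parameterized text is identical no matter what the input is.
	fmt.Println(pollerQuery)
}
```

With the parameterized form, the malicious string is only ever compared as a value, so it cannot alter the WHERE clause.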
Config injection
Description: Template variable substitution writes checker configuration from KV into another KV key
without strict schema or whitelisting, allowing potential injection of unexpected settings
if template or metadata are attacker-controlled.
edge_onboarding.go [1542-1611]
Referred Code
🎫 #1915
edge services (poller, agent, checker).
auto-registers with Core via KV/database, generates service config, handles rotation, and
starts service.
isKnownPoller() to use KV/database instead of ConfigMaps.
unchanged.
endpoints.
guides.
changes; works for Docker and bare metal; automatic registration and rotation.
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status: Passed
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Unsafe query build: Direct string interpolation in SQL queries (using fmt.Sprintf) with untrusted IDs risks
SQL injection; parameterized queries should be used consistently.
Referred Code
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Missing audit context: Registration and deletion events are emitted but lack guaranteed inclusion of user
identity and full action context in all paths (e.g., auto-registration, purge), making
completeness of audit trails uncertain.
Referred Code
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
SQL injection risk: Queries for GetPoller/GetAgent/GetChecker use string formatting to interpolate IDs, which
can be unsafe and bypass parameter binding protections.
Referred Code
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status:
Potential sensitive logs: Logs include dynamic identifiers and IP addresses (e.g., source IP and device IDs) which
may be sensitive depending on policy; ensure redaction or policy approval for such fields.
Referred Code
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1916#issuecomment-3478617100
Original created: 2025-11-03T01:02:06Z
PR Code Suggestions ✨
Explore these optional code suggestions:
✅
Prevent SQL injection with parameters
Suggestion Impact: The commit removed quoteLiteral and replaced fmt.Sprintf-built queries with parameterized queries using placeholders (?) and passing the IDs as parameters in GetPoller, GetAgent, and GetChecker.
Replace fmt.Sprintf with parameterized queries in GetPoller, GetAgent, and GetChecker to prevent potential SQL injection vulnerabilities.
pkg/registry/service_registry_queries.go [11-26]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance [1-10]: 10
Why: The suggestion correctly identifies a critical SQL injection vulnerability in GetPoller, GetAgent, and GetChecker and proposes the standard, secure fix of using parameterized queries.
✅
Handle metadata parsing error properly
Suggestion Impact: The commit changed the code to return an error on metadata JSON parsing failure, replacing the previous warning log. It also added sanitization logic, but the key part of the suggestion was implemented.
In substituteTemplateVariables, return an error if parsing pkg.MetadataJSON fails, instead of just logging a warning, to prevent silent failures and invalid configurations.
pkg/core/edge_onboarding.go [1754-1791]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance [1-10]: 8
Why: The suggestion correctly identifies that ignoring a JSON parsing error can lead to silent failures and invalid configurations, and proposes the correct fix of returning the error.
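The behavior change described in this suggestion can be sketched roughly as below. This is a simplified stand-in, not the PR's substituteTemplateVariables: the {{key}} placeholder syntax, the flat string-to-string metadata shape, and the errInvalidMetadata sentinel are all assumptions made for the example, and the sanitization the commit added is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// errInvalidMetadata is a hypothetical sentinel so callers can detect
// unparseable metadata.
var errInvalidMetadata = errors.New("invalid package metadata JSON")

// substituteTemplateVariables replaces {{key}} placeholders in tmpl with
// values from metadataJSON. A parse failure is returned to the caller
// instead of being logged and ignored, so a broken config is never written.
// The placeholder syntax is an assumption for this sketch.
func substituteTemplateVariables(tmpl, metadataJSON string) (string, error) {
	vars := map[string]string{}
	if strings.TrimSpace(metadataJSON) != "" {
		if err := json.Unmarshal([]byte(metadataJSON), &vars); err != nil {
			return "", fmt.Errorf("%w: %v", errInvalidMetadata, err)
		}
	}
	out := tmpl
	for key, value := range vars {
		out = strings.ReplaceAll(out, "{{"+key+"}}", value)
	}
	return out, nil
}

func main() {
	out, err := substituteTemplateVariables(`{"agent":"{{agent_id}}"}`, `{"agent_id":"agent-1"}`)
	fmt.Println(out, err)

	_, err = substituteTemplateVariables(`{"agent":"{{agent_id}}"}`, `{not json`)
	fmt.Println(errors.Is(err, errInvalidMetadata))
}
```

Returning the error lets the caller abort the KV write, which is the point of the suggestion: a template with unresolved placeholders should not silently reach a checker.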
✅
Allow re-onboarding of deleted devices
Suggestion Impact: The commit updated the tombstone filter to only block self-reported updates when the update timestamp is not after the deletion time, and allowed newer updates with logging, enabling re-onboarding. It also added extra logging fields/messages.
Modify the tombstone filter to allow self-reported updates for a deleted device if the update's timestamp is after the deletion time, enabling device re-onboarding.
pkg/registry/registry.go [863-871]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance [1-10]: 8
Why: This suggestion fixes a significant logic flaw where a deleted device could never be re-onboarded, correctly proposing to allow fresh updates that occur after the deletion timestamp.
✅
Improve SPIFFE ID parsing logic
Suggestion Impact: The commit updated extractTrustDomain to use strings.TrimPrefix and strings.Index, returning the substring before the first slash and handling empty cases, implementing the suggested robust parsing approach.
Refactor the extractTrustDomain function to use strings.HasPrefix and strings.Index for more robustly parsing the trust domain from a SPIFFE ID.
pkg/edgeonboarding/spire.go [231-247]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance [1-10]: 5
Why: The suggestion correctly identifies a fragile implementation and proposes using idiomatic Go functions (strings.HasPrefix, strings.Index) to make the SPIFFE ID parsing more robust and readable.
Improve performance with batch database operations
Refactor RecordBatchHeartbeats to use true database batching for SELECT and INSERT operations to fix the N+1 query problem and improve performance.
pkg/registry/service_registry.go [294-305]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance [1-10]: 7
Why: The suggestion correctly identifies a significant performance issue (N+1 problem) where a batch function iterates and makes individual DB calls, but the improved code only adds a comment explaining the problem instead of implementing the fix.
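For the SPIFFE ID parsing suggestion above, the prefix-and-index approach might look roughly like this. The function name matches the one mentioned in the review, but the body is a sketch written for this note, and returning an empty string for malformed input is an assumption about the desired behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// extractTrustDomain returns the trust domain of a SPIFFE ID, i.e. the part
// between "spiffe://" and the first "/" of the path. It returns "" when the
// scheme is missing or the trust domain is empty.
func extractTrustDomain(spiffeID string) string {
	const scheme = "spiffe://"
	if !strings.HasPrefix(spiffeID, scheme) {
		return ""
	}
	rest := strings.TrimPrefix(spiffeID, scheme)
	if i := strings.Index(rest, "/"); i >= 0 {
		rest = rest[:i]
	}
	return rest
}

func main() {
	fmt.Println(extractTrustDomain("spiffe://example.org/ns/demo/sa/agent"))
	fmt.Println(extractTrustDomain("spiffe://example.org"))
	fmt.Println(extractTrustDomain("https://example.org/agent") == "")
}
```

A stricter version could parse the ID with net/url or a SPIFFE library and reject invalid characters; the sketch only covers the string handling named in the suggestion.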
✅
Fix discarded partition value assignment
Suggestion Impact: The commit updated the code to assign partitionFromDeviceID(existing.DeviceID) to partition and also set update.Partition, preventing the partition value from being discarded.
In deleteDevice, assign the result of partitionFromDeviceID to the partition variable and update DeviceUpdate to ensure the partition is not lost.
pkg/core/api/device_registry.go [197-206]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance [1-10]: 7
Why: The suggestion correctly identifies that a calculated partition value is discarded, and fixing this ensures the device's partition is correctly preserved when it is tombstoned.
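A small sketch of the fix described here follows. It assumes device IDs have the shape "partition:rest", which is not stated in this PR, and uses a hypothetical trimmed-down DeviceUpdate and a hypothetical buildTombstone helper rather than the real deleteDevice handler.

```go
package main

import (
	"fmt"
	"strings"
)

// DeviceUpdate is a hypothetical stand-in with only the fields the sketch needs.
type DeviceUpdate struct {
	DeviceID    string
	Partition   string
	IsAvailable bool
}

// partitionFromDeviceID returns the text before the first ':' in the ID,
// assuming IDs look like "partition:rest". It returns "" if no such prefix exists.
func partitionFromDeviceID(deviceID string) string {
	if i := strings.Index(deviceID, ":"); i > 0 {
		return deviceID[:i]
	}
	return ""
}

// buildTombstone creates the update that marks a device as unavailable.
// When no partition is known, the derived value is assigned and carried
// into the update rather than being computed and thrown away.
func buildTombstone(deviceID, partition string) DeviceUpdate {
	if partition == "" {
		partition = partitionFromDeviceID(deviceID)
	}
	return DeviceUpdate{DeviceID: deviceID, Partition: partition, IsAvailable: false}
}

func main() {
	fmt.Printf("%+v\n", buildTombstone("default:10.0.0.5", ""))
}
```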
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1916#issuecomment-3478847447
Original created: 2025-11-03T04:13:48Z
CI Feedback 🧐
A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
Action: cpufreq-clang-tidy
Failed stage: Run clang-tidy via Bazel [❌]
Failure summary:
Bazel failed during analysis because the OPAM extension could not install the OCaml package dream non-interactively:
- In external/tools_opam+/extensions/opam/opam_ops.bzl:142:13, fail(...) was triggered with rc=10 from the command opam install dream --switch 5.2.0 --root /Users/runner/.local/share/obazl/opam/2.4.1/root --yes.
- OPAM detected missing external system dependencies and prompted for action, but the CI run is non-interactive. It suggested rerunning with --assume-depexts or setting opam option depext=false, and noted that running the system package manager non-interactively requires --confirm-level=unsafe-yes.
- As a result, not all targets were analyzed and Bazel reported "Build did NOT complete successfully."
Relevant error logs:
Imported GitHub PR review comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/1916#discussion_r2485300587
Original created: 2025-11-03T04:47:02Z
Original path: pkg/edgeonboarding/bootstrap.go
Original line: 244
should probably be storing this in the NATS JetStream object store instead