Cleanup/poller removal #2633
No reviewers
Labels
No labels
1week
2weeks
Failed compliance check
IP cameras
NATS
Possible security concern
Review effort 1/5
Review effort 2/5
Review effort 3/5
Review effort 4/5
Review effort 5/5
UI
aardvark
accessibility
amd64
api
arm64
auth
back-end
bgp
blog
bug
build
checkers
ci-cd
cleanup
cnpg
codex
core
dependencies
device-management
documentation
duplicate
dusk
ebpf
enhancement
eta 1d
eta 1hr
eta 3d
eta 3hr
feature
fieldsurvey
github_actions
go
good first issue
help wanted
invalid
javascript
k8s
log-collector
mapper
mtr
needs-triage
netflow
network-sweep
observability
oracle
otel
plug-in
proton
python
question
reddit
redhat
research
rperf
rperf-checker
rust
sdk
security
serviceradar-agent
serviceradar-agent-gateway
serviceradar-web
serviceradar-web-ng
siem
snmp
sysmon
topology
ubiquiti
wasm
wontfix
zen-engine
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
carverauto/serviceradar!2633
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "refs/pull/2633/head"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Imported from GitHub pull request.
Original GitHub pull request: #2226
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2226
Original created: 2026-01-06T17:32:45Z
Original updated: 2026-01-06T21:52:57Z
Original head: carverauto/serviceradar:cleanup/poller_removal
Original base: testing
Original merged: 2026-01-06T21:52:25Z by @mfreeman451
User description
IMPORTANT: Please sign the Developer Certificate of Origin
Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:
Describe your changes
Issue ticket number and link
Code checklist before requesting a review
PR Type
Enhancement, Bug fix, Tests
Description
Major architectural refactor: Transition from poller-based pull architecture to gateway-based push architecture with multi-tenant support
Protobuf schema updates: Renamed
PollerIdtoGatewayId, replacedPollerServicewithAgentGatewayService, added new agent-gateway communication types (AgentHelloRequest,AgentHelloResponse,AgentConfigRequest,AgentConfigResponse)Sync service refactoring: Removed KV client and gRPC pull-based dependencies; implemented gateway enrollment, config polling, and heartbeat loops for push-first communication with chunked streaming support
NATS multi-tenancy: Added NATS account management service with JWT/NKeys cryptographic operations, account signer implementation, and credentials file support
CLI enhancements: New NATS bootstrap and admin CLI functionality for operator/account credential generation and management
Agent push loop: New
PushLoopimplementation for managing periodic agent status pushes to gateway with config polling and service status collectionTest updates: Removed deprecated poller/KV-related tests, updated integration tests (Armis, NetBox) to remove KV dependencies, added new chunking logic tests
Bug fix: Added
Synccomponent type to SPIRE configuration validationCleanup: Removed sync service command, related Docker/Kubernetes configurations, and deprecated pull-based gRPC methods
Diagram Walkthrough
File Walkthrough
7 files
monitoring.pb.go
Refactor protobuf definitions from poller to gateway architectureproto/monitoring.pb.go
PollerIdfield toGatewayIdacross multiple message types(
StatusRequest,ResultsRequest,StatusResponse,ResultsResponse)PollerStatusRequest,PollerStatusResponse, andServiceStatusmessage typesPollerStatusChunkwith newGatewayStatusRequest,GatewayStatusResponse, andGatewayStatusChunktypesAgentHelloRequest,AgentHelloResponse,AgentConfigRequest,AgentConfigResponse, andAgentCheckConfigPollerServicetoAgentGatewayServicewith new RPC methods
TenantId,TenantSlug) togateway-related messages
nats_bootstrap.go
Add NATS bootstrap and admin CLI functionalitypkg/cli/nats_bootstrap.go
CLI
nats-bootstrapandadmin natssubcommands withflag parsing
verification modes
credentials and configuration files
status, and listing tenant accounts
service.go
Gateway push architecture with multi-tenant supportpkg/sync/service.go
gateway-based push architecture
integrations, and results tracking
for push-first communication
with configurable chunk sizes
GetResultsandStreamResultsgRPC methods infavor of gateway push
account_manager.go
NATS account signer implementationpkg/nats/accounts/account_manager.go
AccountSignerfor creating tenant accounts and signing JWTsSNMP, netflow, etc.)
push_loop.go
New push loop implementation for agent status managementpkg/agent/push_loop.go
PushLoopstruct that manages periodic pushing ofagent status to gateway
services and checkers
version, enrollment status)
to checker configs
nats_account_service.go
New NATS account service for JWT-based multi-tenancypkg/datasvc/nats_account_service.go
NATSAccountServerfor NATS JWT/NKeyscryptographic operations
creation, and user credential generation
signing
file writing
events.go
Add NATS credentials file support to event publisherpkg/core/events.go
publisher initialization
nats.UserCredentialsoption whenconfig.NATS.CredsFileis set1 files
spire.go
Add Sync component type to SPIRE configuration validationpkg/edgeonboarding/spire.go
models.EdgeOnboardingComponentTypeSyncto the unsupportedcomponent type check alongside
EdgeOnboardingComponentTypeNone1 files
nats_account.pb.go
NATS account management protobuf definitionsproto/nats_account.pb.go
generation, and JWT signing
management
3 files
netbox_test.go
Remove KV-related test codepkg/sync/integrations/netbox/netbox_test.go
processDevicescall to expect only device events return valueservice_test.go
Refactor sync service tests to remove poller-related codepkg/sync/service_test.go
polling (
TestSimpleSyncService_GetResults,TestSimpleSyncService_writeToKV*)NewSimpleSyncServiceconstructor calls by removing mock KVand gRPC client parameters
TestBuildResultsChunks,TestBuildGatewayStatusChunks,TestBuildResultsChunksSizeBudget)TestSimpleSyncService_StreamResultsto verify unimplementedstatus and added
TestGroupSourcesByTenantScopearmis_test.go
Remove poller/KV writer dependencies from Armis integration testspkg/sync/integrations/armis/armis_test.go
KVWritermock and related sweep config expectations from testsetup
Fetchmethod signature to return only events instead of KVdata and events
processDevicestest to verify only device events without KVpayloads or device targets
1 files
prod.exs
Add Elixir production configurationelixir/serviceradar_agent_gateway/config/prod.exs
101 files
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2226#issuecomment-3715636152
Original created: 2026-01-06T17:34:01Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
No security concerns identified
No security vulnerabilities detected by AI analysis. Human verification advised for critical code.🎫 No ticket provided
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Compliance status legend
🟢 - Fully Compliant🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2226#issuecomment-3715640843
Original created: 2026-01-06T17:35:33Z
PR Code Suggestions ✨
Latest suggestions up to
9464350Preserve protobuf field numbers
Avoid reusing the protobuf field number from the removed
poller_idforgateway_idto prevent breaking backward compatibility and causing potential datacorruption. Instead, reserve the old field number or use a new, unused one for
gateway_id.proto/monitoring.pb.go [139-148]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 10
__
Why: The suggestion correctly identifies a critical backward-compatibility issue in the protobuf definition, where reusing a field number for a different field can lead to data corruption with older clients.
Prevent overwriting stored host IP
Preserve the
HostIPfield when updating gateway status to prevent it from beingoverwritten with an empty value.
pkg/core/gateways.go [581-634]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies that the
HostIPfield could be unintentionally cleared during a status update, which could lead to loss of important data.Fail fast on missing input
In
processICMPMetrics, return an error instead ofnilwhen thesvcparameter isnil. This "fail-fast" approach improves error handling and helps identifyupstream bugs.
pkg/core/metrics.go [579-585]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly points out that returning
nilon anilinputsvccan hide upstream issues. Failing fast by returning an error provides better error handling and makes debugging easier by preventing silent failures.Normalize identifiers written to KV
Use the normalized checker kind when creating a
newCheckto prevent duplicateentries in the gateway configuration.
pkg/core/edge_onboarding.go [2818-2823]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 6
__
Why: The suggestion correctly points out an inconsistency where duplicate check detection uses a normalized name, but the new check is created with the raw name, potentially leading to duplicate entries.
Prevent nil map buffer panics
Prevent a potential panic in
bufferRperfMetricsby ensurings.metricBuffersisinitialized before use. This can be done by adding a nil check under the lock.
pkg/core/metrics.go [467-471]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 6
__
Why: The suggestion correctly identifies a potential nil pointer dereference that would cause a panic if
s.metricBuffersis not initialized. This improves the robustness of the code, especially in test environments whereServermight be partially constructed.Avoid caching with empty keys
Add a check to ensure
pkg.GatewayIDis not empty before using it as a key foractivationCacheStore. This prevents potential cache collisions for packages withan empty gateway ID.
pkg/core/edge_onboarding.go [340-343]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: This is a good defensive check to prevent caching packages with an empty
GatewayID, which could lead to cache collisions and incorrect data being served.Guard deletion on empty IDs
Add a check to ensure
pkg.GatewayIDis not empty before deleting it from thes.allowedmap. This prevents unintended deletion of an entry with an emptystring key.
pkg/core/edge_onboarding.go [1363-1368]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: This is a valid defensive programming suggestion that prevents accidentally deleting an entry with an empty key from the
s.allowedmap, which could have unintended side effects.Prevent null array serialization
Add the
omitemptytag to theChecksfield in thegatewayAgentConfigstruct. Thisprevents a
nilslice from being marshaled asnullin the JSON output.pkg/core/edge_onboarding.go [2738-2743]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies a potential JSON marshaling issue where a
nilslice becomesnull, which might be undesirable for consumers. Addingomitemptyis a standard and effective way to handle this.Add descriptor alias for compatibility
Add a backward-compatibility alias for the renamed
File_proto_monitoring_protovariable to
File_monitoring_prototo avoid breaking changes for consumers ofthis generated code.
proto/monitoring.pb.go [1740]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly points out a breaking change for an exported variable and proposes a valid solution to maintain backward compatibility, which is important for generated code that might be used externally.
Previous suggestions
✅ Suggestions up to commit
870ef5c✅
Prevent deadlock from blocking channel writeSuggestion Impact:
UpdateConfig was changed to use a non-blocking select when sending to s.reloadChan, preventing a blocking send while the config mutex is held. The commit also adjusts the log message when the channel is full (Debug instead of Warn) and includes mutex locking around UpdateConfig.code diff:
In
UpdateConfig, use a non-blockingselectstatement to send tos.reloadChantoprevent a potential deadlock if the channel is full while a mutex is held.
pkg/sync/service.go [367-391]
Suggestion importance[1-10]: 9
__
Why: The suggestion correctly identifies a potential deadlock caused by a blocking send on
s.reloadChanwhile holdings.configMu. The proposed fix using a non-blockingselectis the standard and correct way to prevent this type of deadlock.Remove extra newline literal
Remove the extra newline literal
"\n"from thefile_monitoring_proto_rawDescconstant to prevent potential protobuf descriptor parsing errors.
proto/monitoring.pb.go [1751-1753]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies a stray newline literal
"\n"in thefile_monitoring_proto_rawDescconstant, which could cause protobuf descriptor parsing to fail at runtime. This is a critical fix for the generated code's integrity.✅
Fix race condition on configSuggestion Impact:
The commit adds s.configMu.RLock()/RUnlock() around the read of s.config.Sources[sourceName] in applySourceBlacklist, addressing the identified race condition. It also adds a write lock in the config update path, further reducing concurrent access issues.code diff:
Add a read lock in
applySourceBlacklistwhen accessings.config.Sourcestoprevent a potential race condition with concurrent configuration updates.
pkg/sync/service.go [1490-1516]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies a race condition where
s.config.Sourcesis read without a lock, while it can be modified concurrently byUpdateConfig. The proposed fix of usings.configMu.RLock()is accurate and prevents this potential data race.✅
Prevent race condition during configuration updateSuggestion Impact:
The commit updated applyConfigPayload to take a read lock on s.config, clone it, then release the lock before mutating the cloned config—matching the suggested approach to avoid data races during concurrent config access.code diff:
In
applyConfigPayload, create a deep copy ofs.configunder a lock beforemodifying it to prevent race conditions with other goroutines.
pkg/sync/service.go [828-858]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies a race condition where
s.configis read and modified without a lock. The proposed solution to create a copy of the config under a read lock before modification is a correct and safe way to handle concurrent access.Safely handle API response types
Add checks for type assertions when parsing the API response for tenants to
handle unexpected data types gracefully by displaying 'N/A' for invalid fields.
pkg/cli/nats_bootstrap.go [951-957]
Suggestion importance[1-10]: 6
__
Why: The suggestion correctly identifies that unchecked type assertions on API response data can lead to confusing output if the data types are unexpected, and the proposed change makes the output more robust and user-friendly.
Use the official NATS
nsctoolInstead of building a custom service and CLI for NATS account management, use
the official NATS
nsctool. This would simplify the architecture and reducemaintenance.
Examples:
proto/nats_account.pb.go [1-1266]
pkg/cli/nats_bootstrap.go [1-961]
Solution Walkthrough:
Before:
After:
Suggestion importance[1-10]: 7
__
Why: The suggestion raises a valid, high-impact architectural point about reinventing NATS account management instead of using the official
nsctool, which could simplify the codebase and reduce maintenance.✅
Fix context cancellation handlingSuggestion Impact:
The commit removed the `case <-s.ctx.Done(): return s.ctx.Err()` branch from `configPollLoop`, so the loop now only exits on the passed-in `ctx.Done()`. The same redundant `s.ctx.Done()` cancellation branch was also removed from another similar poll loop (heartbeat), extending the same idea beyond the original suggestion.code diff:
In
configPollLoop, remove the check fors.ctx.Done()and only listen forcancellation on the
ctxargument passed to the function.pkg/sync/service.go [951-972]
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly points out that listening to both the local
ctxand the service-wides.ctxis redundant and potentially confusing. The function should only respect the context passed to it, making the cancellation behavior clearer and more conventional.Validate output format flag
Add validation to ensure the
--outputflag is either 'text' or 'json', returningan error for invalid values.
pkg/cli/nats_bootstrap.go [106]
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly points out that the
--outputflag value is not validated, which could lead to silent failures. Adding explicit validation improves the command's robustness and provides clear feedback to the user for invalid input.Prevent conflicting mode flags
Add a check to prevent the mutually exclusive flags
--verifyand--localfrombeing used together.
pkg/cli/nats_bootstrap.go [79-81]
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies that
--verifyand--localrepresent mutually exclusive modes of operation. Adding a check to prevent their simultaneous use improves the CLI's robustness and provides clearer error messaging to the user.✅
Remove stale checks on updateSuggestion Impact:
The previously commented-out stale-check removal loop in applyChecks was enabled and updated with new comments asserting the gateway config as source of truth, so stale checks are now deleted when not seen in the gateway config.code diff:
Enable the removal of stale checker configurations by uncommenting the relevant
code block in
applyChecks.pkg/agent/push_loop.go [765-772]
Suggestion importance[1-10]: 5
__
Why: The suggestion proposes enabling a commented-out feature to remove stale checks. While this is a reasonable feature enhancement for configuration management, the original code intentionally left it disabled to preserve local checks, so this change alters intended behavior.
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2226#issuecomment-3716494125
Original created: 2026-01-06T21:50:29Z
CI Feedback 🧐
A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
Action: test-rust (rust/srql)
Failed stage: Run Tests [❌]
Failed test name: srql_api_queries
Failure summary:
srql_api_queriesfailed (exit code 101from
cargo test --test api).rust/srql/tests/support/harness.rs:74:10while trying to acquire a remotefixture lock, due to a database connection error.
FATAL: pg_hba.conf rejects connection for host"10.42.221.65", user "srql_hydra", database "postgres", no encryption, indicating the CI runnerhost/IP or TLS/SSL mode is not allowed by the server’s
pg_hba.confrules (connection attempted withno encryption).Relevant error logs: