Cleanup/nats internal events #2642
No reviewers
Labels
No labels
1week
2weeks
Failed compliance check
IP cameras
NATS
Possible security concern
Review effort 1/5
Review effort 2/5
Review effort 3/5
Review effort 4/5
Review effort 5/5
UI
aardvark
accessibility
amd64
api
arm64
auth
back-end
bgp
blog
bug
build
checkers
ci-cd
cleanup
cnpg
codex
core
dependencies
device-management
documentation
duplicate
dusk
ebpf
enhancement
eta 1d
eta 1hr
eta 3d
eta 3hr
feature
fieldsurvey
github_actions
go
good first issue
help wanted
invalid
javascript
k8s
log-collector
mapper
mtr
needs-triage
netflow
network-sweep
observability
oracle
otel
plug-in
proton
python
question
reddit
redhat
research
rperf
rperf-checker
rust
sdk
security
serviceradar-agent
serviceradar-agent-gateway
serviceradar-web
serviceradar-web-ng
siem
snmp
sysmon
topology
ubiquiti
wasm
wontfix
zen-engine
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
carverauto/serviceradar!2642
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "refs/pull/2642/head"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Imported from GitHub pull request.
Original GitHub pull request: #2238
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2238
Original created: 2026-01-10T01:52:08Z
Original updated: 2026-01-10T02:36:17Z
Original head: carverauto/serviceradar:cleanup/nats-internal-events
Original base: testing
Original merged: 2026-01-10T02:36:15Z by @mfreeman451
User description
IMPORTANT: Please sign the Developer Certificate of Origin
Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:
Describe your changes
Issue ticket number and link
Code checklist before requesting a review
PR Type
Enhancement, Documentation
Description
Major architectural refactoring: Migrated from poller-based to gateway-based architecture across the entire codebase
Protobuf definitions updated: Renamed
PollerIdtoGatewayId, replacedPollerStatusRequest/ResponsewithGatewayStatusRequest/Response, and added new agent-gateway communication messagesPush-first sync service: Refactored
pkg/sync/service.goto implement push-based gateway architecture with multi-tenant support, replacing pull-based KV and gRPC clientsGateway monitoring implementation: Added comprehensive gateway health checks, offline/recovery detection, and NATS event publishing in
pkg/core/gateways.goNATS bootstrap CLI: Implemented new CLI command for NATS operator, system account, and platform account generation with multi-tenant routing support
Multi-tenant support: Added
TenantIdandTenantSlugfields throughout gateway-related messages and servicesExtensive cleanup: Removed deprecated poller infrastructure including
pkg/poller/package, poller-specific Docker/Helm configurations, and related build scriptsConfiguration updates: Added production and development Elixir configurations for agent gateway
API documentation: Updated terminology from "service pollers" to "service gateways"
Diagram Walkthrough
File Walkthrough
5 files
monitoring.pb.go
Refactor protobuf definitions from poller to gateway architectureproto/monitoring.pb.go
PollerIdfield toGatewayIdacross multiple message types(
StatusRequest,ResultsRequest,StatusResponse,ResultsResponse)PollerStatusRequest,PollerStatusResponse, andServiceStatusmessage typesPollerStatusChunkwith newGatewayStatusRequest,GatewayStatusResponse,GatewayStatusChunk, andGatewayServiceStatustypes
AgentHelloRequest,AgentHelloResponse,AgentConfigRequest,AgentConfigResponse,AgentCheckConfigPollerServicetoAgentGatewayServicewith new RPC methods
TenantId,TenantSlug) togateway-related messages
nats_bootstrap.go
Add NATS bootstrap CLI command implementationpkg/cli/nats_bootstrap.go
nats-bootstrapandadmin natssubcommandsgeneration
management
service.go
Migrate sync service to push-first gateway architecture withmulti-tenant supportpkg/sync/service.go
client for push-first architecture
integrations, and results tracking
for dynamic configuration
interval-based filtering
sizes
GetResultsandStreamResultsmethods in favor ofpush-based gateway integration
gateways.go
Gateway monitoring and status management implementationpkg/core/gateways.go
implementation with 1549 lines of core functionality
event publishing to NATS
message processing
auto-registration capabilities
edge_onboarding.go
Poller to gateway terminology refactoring and sync supportpkg/core/edge_onboarding.go
pollerreferences togatewaythroughout the file
reflect gateway terminology
EdgeOnboardingComponentTypeSynccomponent type inpackage creation and validation
config/pollers/toconfig/gateways/andrelated configuration handling
1 files
main.go
Update API documentation terminologycmd/core/main.go
2 files
prod.exs
Add production configuration for Elixir agent gatewayelixir/serviceradar_agent_gateway/config/prod.exs
infofor production environmentdev.exs
Add Elixir development configurationelixir/serviceradar_agent_gateway/config/dev.exs
2 files
nats_account.pb.go
Add NATS account management protobuf definitionsproto/nats_account.pb.go
credentials
account creation, and user credential generation
nats_account_grpc.pb.go
NATS account service gRPC definitionsproto/nats_account_grpc.pb.go
NATSAccountServicewith 6RPC methods
account creation, and credential generation
and method descriptors
1 files
ubnt_poller_test.go
Test parameter terminology updatepkg/mapper/ubnt_poller_test.go
PollerID: "poller1"toGatewayID:"gateway1"for consistency101 files
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2238#issuecomment-3731448946
Original created: 2026-01-10T01:53:39Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
No security concerns identified
No security vulnerabilities detected by AI analysis. Human verification advised for critical code.🎫 No ticket provided
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Audit logs unverified: The PR description indicates new gateway health/offline detection and NATS event
publishing, but those code paths are in files not included in the provided diff so it
cannot be verified that critical actions are logged with actor identity, timestamp,
action, and outcome.
Referred Code
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
Error handling unverified: The PR description claims a major refactor to a push-first gateway sync service, but the
relevant implementation files are not included in the provided diff so potential failure
points and edge-case handling cannot be verified.
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status:
CLI errors unverified: The PR introduces NATS bootstrap/account generation functionality, but the new CLI code is
not present in the provided diff so it cannot be verified that user-facing errors avoid
leaking sensitive internal details or secrets.
Referred Code
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status:
Sensitive logs unverified: The PR adds multi-tenant NATS account and key management, but the implementation is not
included in the provided diff so it cannot be verified that logs are structured and do not
emit secrets (keys, tokens, credentials) or tenant-identifying data inappropriately.
Referred Code
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Input validation unverified: New external-facing fields such as
tenant_id,tenant_slug,gateway_id, andconfig_jsonwere added to protobuf messages, but the server-side handlers/validators consuming these
inputs are not included in the provided diff so validation, authorization, and secure
handling cannot be confirmed.
Referred Code
Compliance status legend
🟢 - Fully Compliant🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2238#issuecomment-3731457594
Original created: 2026-01-10T01:55:25Z
PR Code Suggestions ✨
Explore these optional code suggestions:
Leverage NATS features for service monitoring
The custom gateway monitoring system in
pkg/core/gateways.goshould besimplified. Instead of bespoke health checks and status management, leverage
built-in NATS features like the Key-Value Store and service heartbeats for a
more robust and maintainable solution.
Examples:
pkg/core/gateways.go [1-1549]
Solution Walkthrough:
Before:
After:
Suggestion importance[1-10]: 9
__
Why: The suggestion correctly identifies that the custom-built monitoring in
pkg/core/gateways.gocould be greatly simplified by using built-in NATS features like KV Store and service heartbeats, reducing significant code complexity.Prevent race condition in goroutine
To prevent a potential race condition, pass the
reqobject as an argument to thegoroutine instead of capturing it in the closure.
pkg/core/gateways.go [624-629]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies and fixes a potential race condition by passing the
reqpointer as an argument to the goroutine, preventing the closure from capturing a variable that could be modified concurrently.Set actual gateway ID in status chunk
In
buildGatewayStatusChunks, populate theGatewayIdfield ofproto.GatewayStatusChunkwith thegatewayIDvariable instead of an empty string.pkg/sync/service.go [1257-1269]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies that
GatewayIdis hardcoded to an empty string, while the correctgatewayIDvariable is available and should be used for proper message correlation.Populate GatewayId in heartbeat request
In
pushHeartbeat, retrieves.config.GatewayIDand use it to populate theGatewayIdfield in theGatewayStatusRequestinstead of using an empty string.pkg/sync/service.go [1342-1370]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly points out that
GatewayIdin the heartbeat request is hardcoded to an empty string, and it should be populated with the value from the service configuration.Add missing tenant fields to response
Add the
TenantIdandTenantSlugfields to theAgentConfigResponsestruct toensure it provides comprehensive tenant context.
proto/monitoring.pb.go [1518-1532]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies that
AgentConfigResponseis missing tenant fields, which is inconsistent with other new messages, and proposes adding them to make the configuration response more complete.