Fix/analytics page #2639
No reviewers
Labels
No labels
1week
2weeks
Failed compliance check
IP cameras
NATS
Possible security concern
Review effort 1/5
Review effort 2/5
Review effort 3/5
Review effort 4/5
Review effort 5/5
UI
aardvark
accessibility
amd64
api
arm64
auth
back-end
bgp
blog
bug
build
checkers
ci-cd
cleanup
cnpg
codex
core
dependencies
device-management
documentation
duplicate
dusk
ebpf
enhancement
eta 1d
eta 1hr
eta 3d
eta 3hr
feature
fieldsurvey
github_actions
go
good first issue
help wanted
invalid
javascript
k8s
log-collector
mapper
mtr
needs-triage
netflow
network-sweep
observability
oracle
otel
plug-in
proton
python
question
reddit
redhat
research
rperf
rperf-checker
rust
sdk
security
serviceradar-agent
serviceradar-agent-gateway
serviceradar-web
serviceradar-web-ng
siem
snmp
sysmon
topology
ubiquiti
wasm
wontfix
zen-engine
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
carverauto/serviceradar!2639
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "refs/pull/2639/head"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Imported from GitHub pull request.
Original GitHub pull request: #2233
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2233
Original created: 2026-01-09T05:10:19Z
Original updated: 2026-01-09T06:54:49Z
Original head: carverauto/serviceradar:fix/analytics-page
Original base: testing
Original merged: 2026-01-09T06:53:51Z by @mfreeman451
User description
IMPORTANT: Please sign the Developer Certificate of Origin
Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:
Describe your changes
Issue ticket number and link
Code checklist before requesting a review
PR Type
Enhancement, Refactoring
Description
Major architectural refactoring from poller-based to gateway-based system: Comprehensive terminology and implementation changes throughout the codebase replacing
pollerwithgatewayMulti-tenant gateway push architecture: Refactored sync service to support push-first communication model with gateway clients replacing pull-based gRPC methods
Gateway monitoring and status management: New comprehensive implementation for gateway health checks, offline/recovery detection, alert handling, and streaming status report support
NATS bootstrap and account management: Added CLI implementation for NATS bootstrap functionality with operator, system account, and platform account generation for multi-tenant routing
Protobuf definitions update: Refactored protobuf messages from poller to gateway architecture with new agent-gateway communication types and service definitions
Edge onboarding enhancements: Updated edge onboarding to support gateway terminology and added
EdgeOnboardingComponentTypeSynccomponent type supportRemoved legacy poller infrastructure: Deleted poller-specific packages, CLI commands, and related implementations in favor of gateway-based approach
Diagram Walkthrough
File Walkthrough
4 files
monitoring.pb.go
Refactor protobuf definitions from poller to gateway architectureproto/monitoring.pb.go
PollerIdfield toGatewayIdacross multiple message types(
StatusRequest,ResultsRequest,StatusResponse,ResultsResponse)PollerStatusRequest,PollerStatusResponse, andServiceStatusmessage typesPollerStatusChunkwith newGatewayStatusRequest,GatewayStatusResponse, andGatewayStatusChunktypesGatewayServiceStatusmessage type with tenant-related fields(
TenantId,TenantSlug)AgentHelloRequest,AgentHelloResponse,AgentConfigRequest,AgentConfigResponse,AgentCheckConfigPollerServicetoAgentGatewayServicewith new RPC methods
nats_bootstrap.go
Add comprehensive NATS bootstrap CLI implementationpkg/cli/nats_bootstrap.go
CLI
nats-bootstrapandadmin natssubcommands withsupport for token-based and local bootstrap modes
generation and configuration
configurations
service.go
Multi-tenant gateway push architecture with dynamic configpkg/sync/service.go
gateway push-first communication model
for push-based results and status reporting
per-tenant configuration management
dynamic configuration updates
GetResultsandStreamResultsgRPC methods infavor of push-based gateway communication
gateways.go
Gateway monitoring and status management implementationpkg/core/gateways.go
implementation with 1541 lines of core functionality
alert handling with event publishing to NATS
reassembly and service message processing
auto-registration of gateways/agents/checkers
1 files
main.go
Update API documentation terminologycmd/core/main.go
to reflect architectural terminology change
3 files
prod.exs
Add Elixir production configurationelixir/serviceradar_agent_gateway/config/prod.exs
infofor production environmentprod.exs
Elixir production configuration setupelixir/serviceradar_core/config/prod.exs
infofor production environment.gitkeep
Docker Compose credentials directory placeholderdocker/compose/creds/.gitkeep
Compose setup
2 files
nats_account.pb.go
NATS account management protobuf definitionsproto/nats_account.pb.go
message types and 1 enum
operator bootstrap operations
for tenant isolation
nats_account_grpc.pb.go
NATS account service gRPC definitionsproto/nats_account_grpc.pb.go
NATSAccountServicewithsix RPC methods
bootstrap, tenant account creation, and user credentials
implementations and service registration
1 files
mock_armis.go
Remove KVWriter from Armis mock interfacespkg/sync/integrations/armis/mock_armis.go
KVWriterinterface frommocked interfaces
2 files
edge_onboarding.go
Poller to gateway terminology refactoring and sync supportpkg/core/edge_onboarding.go
pollerterminology togatewaythroughout the file (100+ occurrences)
reflect gateway-centric architecture
EdgeOnboardingComponentTypeSynccomponent type inpackage creation and validation
config/pollers/toconfig/gateways/and related metadata keysmain.go
SNMP checker gateway terminology updatecmd/checkers/snmp/main.go
NewSNMPPollerServiceto
NewSNMPGatewayServicesnmp.Pollertosnmp.Gatewayfor consistencywith gateway terminology
101 files
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2233#issuecomment-3727220550
Original created: 2026-01-09T05:11:49Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
Tenant spoofing
Description: The newly added gateway push messages include client-populated identity/routing fields
(e.g.,
tenant_id,tenant_slug, andgateway_idinGatewayStatusRequest/GatewayStatusChunk/GatewayServiceStatus), which could enablecross-tenant data injection or routing spoofing if the receiving gateway/control-plane
code trusts these fields instead of deriving tenant and gateway identity from
authenticated context (e.g., mTLS cert claims) and enforcing authorization.
monitoring.pb.go [826-1127]
Referred Code
🎫 No ticket provided
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status: Passed
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status: Passed
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Audit logging unclear: The PR introduces new gateway/agent status and config RPC flows but the diff does not show
whether these critical actions are audit-logged with actor identity, timestamp, action,
and outcome.
Referred Code
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
Error handling unknown: New request/response message types and gateway RPC endpoints are added, but the diff does
not show server/client-side handling for invalid payloads, missing required fields,
partial stream chunks, or other edge cases.
Referred Code
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Input validation unseen: The PR adds externally-supplied identifiers and routing fields (e.g.,
tenant_id,tenant_slug,source_ip,gateway_id) but the diff does not show validation/sanitization orauthorization checks for these inputs.
Referred Code
Compliance status legend
🟢 - Fully Compliant🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2233#issuecomment-3727223101
Original created: 2026-01-09T05:13:16Z
PR Code Suggestions ✨
Latest suggestions up to
2e44e59Snapshot maps to avoid races
To prevent a race condition, create a deep copy of
s.tenantSourcesunder theread lock before passing it to
detectAndResetChangedIntervals.pkg/sync/service.go [1433-1444]
Suggestion importance[1-10]: 9
__
Why: The suggestion correctly identifies a critical race condition that could cause the application to panic and provides a correct solution by creating a snapshot of the map while under lock.
Normalize tenant before schema lookup
Normalize the
tenantto an ID usingtenant_id_from/1before callingTenantSchemas.schema_for_tenant/1to prevent potential crashes from incorrecttypes.
web-ng/lib/serviceradar_web_ng/srql/ash_adapter.ex [646-652]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies a potential crash by passing an incorrect type to
TenantSchemas.schema_for_tenant/1and proposes a robust fix by normalizing the tenant representation first.Safely convert decimals to integers
Safely convert
Decimalvalues to integers by first rounding them down to avoidcrashes when the decimal has a fractional part.
web-ng/lib/serviceradar_web_ng_web/live/analytics_live/index.ex [409-432]
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies a potential
ArithmeticErrorcrash when converting aDecimalwith a fractional part to an integer and provides a safe conversion method.Update modification timestamp on ingest
Explicitly update the
modified_timeattribute on the device changeset, inaddition to
last_seen_time, to ensure it is updated on every ingest.elixir/serviceradar_core/lib/serviceradar/inventory/sync_ingestor.ex [36-40]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies that the PR stopped updating
modified_timeon device ingest, which could break logic relying on this timestamp, and provides a correct fix.Previous suggestions
✅ Suggestions up to commit
98352f0✅
Fix race condition on sourcesSuggestion Impact:
The commit introduced snapshotLegacySources() to copy s.sources under a read lock, and updated runDiscovery (and related metrics) to use the snapshot instead of s.sources directly. It also applied the same snapshotting approach to the Armis update path.code diff:
To prevent a race condition in
runDiscovery, create a snapshot ofs.sourcesunder a read lock before using it.
pkg/sync/service.go [407-419]
Suggestion importance[1-10]: 9
__
Why: The suggestion correctly identifies a critical race condition on the
s.sourcesmap, which could cause the application to panic, and proposes a valid solution.✅
Fix race condition on tenant sourcesSuggestion Impact:
The commit changed concurrency handling around tenant sources by taking a tenantMu read-lock in setTenantSources to snapshot the previous s.tenantSources ("oldSources") and passing that snapshot into detectAndResetChangedIntervals, removing the internal lock from detectAndResetChangedIntervals. This relates to the suggested race-condition concern, but it does not implement the suggested fix of holding a tenantMu write lock for the full duration of setTenantSources modifications.code diff:
Fix a race condition in
setTenantSourcesby acquiring a write lock ons.tenantMuat the start of the function and holding it until all modifications are
complete.
pkg/sync/service.go [1426-1441]
Suggestion importance[1-10]: 9
__
Why: The suggestion correctly identifies a critical race condition where
s.tenantSourcesis read and then written without a continuous lock, which could lead to data corruption or panics.✅
Schedule daily cleanup tickSuggestion Impact:
Implemented the missing cleanupTicker case in the select loop, invoking cleanupUnknownGateways on each tick and logging failures.code diff:
Add a case for
cleanupTickerin theselectloop to ensure the daily cleanup ofunknown gateways is executed as intended.
pkg/core/gateways.go [44-73]
Suggestion importance[1-10]: 9
__
Why: The suggestion correctly identifies a significant bug where the
cleanupTickeris initialized but never used, causing a critical daily cleanup task to be skipped. This fix is essential for the intended functionality.Prevent potential goroutine leak on cancellation
Replace
context.WithoutCancel(ctx)with the originalctxin the backgroundgoroutine to prevent potential goroutine leaks if the request is cancelled.
pkg/core/gateways.go [619-620]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies a potential goroutine leak due to using a detached context. While the original code's intent was to ensure registration completes, respecting context cancellation is a more robust and safer pattern to prevent resource exhaustion.
Persist gateway recovery status to DB
Persist the gateway's recovered status to the database by queueing a status
update, ensuring the state is not lost on restart.
pkg/core/gateways.go [148-149]
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies that the gateway recovery status is only updated in-memory and not persisted, which is a bug in the backup recovery mechanism. Applying this fix ensures state consistency with the database.
Refactor to prevent potential deadlocks
Refactor
runDiscoveryForIntegrationsto avoid potential deadlocks by acquiringall necessary locks at once, retrieving data, and then releasing them.
pkg/sync/service.go [483-519]
Suggestion importance[1-10]: 6
__
Why: The suggestion correctly identifies a potential deadlock scenario due to inconsistent lock ordering, and the proposed refactoring improves code safety and maintainability.
Distinguish not-found from query errors
Refine error handling to specifically check for a "not found" error
(
db.ErrNotFound) instead of a generic query failure when determining if agateway is new.
pkg/core/gateways.go [445-454]
Suggestion importance[1-10]: 6
__
Why: The suggestion correctly points out that the error handling is too broad, treating all query failures as a "not found" case. Using a specific
db.ErrNotFoundcheck improves robustness by distinguishing between a missing record and an actual database error.