Chore/arc fixes #2641
No reviewers
Labels
No labels
1week
2weeks
Failed compliance check
IP cameras
NATS
Possible security concern
Review effort 1/5
Review effort 2/5
Review effort 3/5
Review effort 4/5
Review effort 5/5
UI
aardvark
accessibility
amd64
api
arm64
auth
back-end
bgp
blog
bug
build
checkers
ci-cd
cleanup
cnpg
codex
core
dependencies
device-management
documentation
duplicate
dusk
ebpf
enhancement
eta 1d
eta 1hr
eta 3d
eta 3hr
feature
fieldsurvey
github_actions
go
good first issue
help wanted
invalid
javascript
k8s
log-collector
mapper
mtr
needs-triage
netflow
network-sweep
observability
oracle
otel
plug-in
proton
python
question
reddit
redhat
research
rperf
rperf-checker
rust
sdk
security
serviceradar-agent
serviceradar-agent-gateway
serviceradar-web
serviceradar-web-ng
siem
snmp
sysmon
topology
ubiquiti
wasm
wontfix
zen-engine
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
carverauto/serviceradar!2641
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "refs/pull/2641/head"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Imported from GitHub pull request.
Original GitHub pull request: #2236
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2236
Original created: 2026-01-10T00:02:30Z
Original updated: 2026-01-10T00:05:45Z
Original head: carverauto/serviceradar:chore/arc-fixes
Original base: testing
Original merged: 2026-01-10T00:03:02Z by @mfreeman451
User description
IMPORTANT: Please sign the Developer Certificate of Origin
Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:
Describe your changes
Issue ticket number and link
Code checklist before requesting a review
PR Type
Enhancement, Bug fix
Description
Refactored agent architecture from KV-based configuration to push-mode with file-based config loading, including graceful shutdown handling
Renamed "poller" to "gateway" terminology across all services (Go agents, SNMP checker, sysmon, trapd, zen consumer, rperf checker) for consistency
Added NATS credentials file support across multiple components (trapd, zen consumer, OTEL, flowgger, sysmon) with proper configuration and validation
Implemented comprehensive Elixir backend with multi-tenant architecture including:
Added NATS account service initialization in data-services with operator configuration and JWT resolver support
Enhanced CLI with new
nats-bootstrapandadminsubcommands for NATS operationsFixed missing
gateway_idfield in rperf checker responsesUpdated API documentation from "service pollers" to "service gateways"
Removed legacy components including old poller and sync services with associated Docker/Helm configurations
Diagram Walkthrough
File Walkthrough
29 files
main.go
Refactor agent to push mode with file-based configcmd/agent/main.go
onboarding to direct file-based config loading
loadConfig()function that supports embedded defaultsrunPushMode()function implementing push-based architecturewith gateway client and push loop
timeout (10 seconds)
onboarding dependencies
main.go
Initialize NATS account service with resolver supportcmd/data-services/main.go
and client identity validation
with fallback to config file
operations
NATSAccountServiceServerin gRPC service registrationmain.go
Add NATS bootstrap and admin commands, rename poller to gatewaycmd/cli/main.go
update-pollersubcommand toupdate-gatewayto reflectterminology change
nats-bootstrapsubcommand for NATS operationsadminsubcommand dispatcher withnatsadmin resource routingdispatchAdminCommand()helper function for admin subcommandrouting
main.go
Rename SNMP poller service to gateway servicecmd/checkers/snmp/main.go
SNMPPollerServicetoSNMPGatewayServicefor consistency withgateway terminology
Pollerstruct reference toGatewaystructapp.go
Rename gRPC poller service to agent gateway servicecmd/core/app/app.go
RegisterPollerServiceServertoRegisterAgentGatewayServiceServerconfig.rs
Add NATS credentials file configuration supportcmd/consumers/zen/src/config.rs
nats_creds_fileconfiguration field with validationnats_creds_path()method to resolve credentials file pathwith support for absolute/relative paths and cert directory resolution
nats_creds_fileis not empty when providedserver.rs
Rename poller_id to gateway_id in sysmon checkercmd/checkers/sysmon/src/server.rs
poller_idfield togateway_idin status and results responselogging and response building
config.rs
Add NATS credentials file configuration to OTELcmd/otel/src/config.rs
creds_filefield toNATSConfigTOMLstruct forcredentials file configuration
creds_filewith trimming and empty stringhandling
NATSConfigstruct to includecreds_filefieldcreds_fileusagemain.rs
Add NATS creds support and rename poller_id to gateway_idcmd/trapd/src/main.rs
secure and non-secure modes
credentials_file()whencreds_pathisavailable
poller_idtogateway_idin status and results responsebuilding
nats_output.rs
Add NATS credentials file support to flowggercmd/flowgger/src/flowgger/output/nats_output.rs
creds_filefield toNATSConfigstruct for credentials filesupport
creds_fileconfiguration with empty stringhandling
config.rs
Add NATS credentials file configuration to trapdcmd/trapd/src/config.rs
nats_creds_fileconfiguration field with validationnats_creds_path()method to resolve credentials file pathusing security config path resolution
nats_creds_fileis not empty when providednats_output.rs
Add NATS credentials file support to OTEL outputcmd/otel/src/nats_output.rs
creds_filefield toNATSConfigstruct with defaultNonevaluedebug logging
grpc_server.rs
Rename poller_id to gateway_id in zen consumercmd/consumers/zen/src/grpc_server.rs
poller_idfield togateway_idin status and results responsebuilding
nats.rs
Add NATS credentials file support to zen consumercmd/consumers/zen/src/nats.rs
nats_creds_path()methodsetup.rs
Add NATS creds file debug loggingcmd/otel/src/setup.rs
onboarding_packages.ex
Implement edge onboarding packages context moduleelixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex
package management
revocation, and soft-delete
SPIFFE URI support
create_with_tenant_cert()for multi-tenant deployments withautomatic certificate bundling
attributes
cluster_status.ex
Add unified cluster status API for distributed querieselixir/serviceradar_core/lib/serviceradar/cluster/cluster_status.ex
cluster
requiring coordinator role
queried from non-coordinator nodes
architecture and cross-node communication patterns
coordinator
agent_gateway_server.ex
gRPC Agent Gateway Server with Multi-Tenant mTLS Securityelixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex
multi-tenant security via mTLS certificates
hello(agent enrollment),get_config(configuration retrieval),push_status(status updates),and
stream_status(chunked streaming)component identity validation to prevent cross-tenant impersonation
services per request), and error handling with proper gRPC status
codes
agent.ex
OCSF-Compliant Agent Resource with State Machineelixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex
Agentresource using Ash framework with OCSF v1.4.0 compliancefor Go agent monitoring
connected, degraded, disconnected, unavailable)
SNMP) with UI metadata and OCSF type mappings
management, and heartbeat operations with multi-tenant isolation
tenant_registry.ex
Multi-Tenant Process Registry with Horde Integrationelixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex
multi-tenant process isolation and discovery
while using hash-based registry names for security
tenant-scoped processes (gateways, agents, checkers)
metadata and heartbeat tracking
alias_events.ex
Device Alias Lifecycle Event Tracking Systemelixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex
collectors) with change detection and audit logging
AliasRecordstruct to parse and compare alias metadata fromdevice records
persistence to
DeviceAliasStateresourcewith configurable thresholds
generator.ex
X.509 Certificate Generation for Tenant CAselixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex
using Erlang's
:public_keymoduleedge component certificate generation (1-year validity)
Usage, Extended Key Usage, SAN, Subject Key ID)
certificate validation and tenant identification
account_client.ex
NATS Account gRPC Client for Tenant Isolationelixir/serviceradar_core/lib/serviceradar/nats/account_client.ex
isolation
and operator bootstrap
DataService.Client unavailable
permissions, subject mappings)
spiffe.ex
SPIFFE/SPIRE Integration for Cluster Node Authorizationelixir/serviceradar_core/lib/serviceradar/spiffe.ex
authorization
certificate loading
ERTS distribution
capabilities
stats_aggregator.ex
Device Statistics Aggregator with Periodic Snapshotselixir/serviceradar_core/lib/serviceradar/core/stats_aggregator.ex
capability breakdowns
statistics
detection
tenant.ex
Multi-Tenant Organization Resource with NATS Integrationelixir/serviceradar_core/lib/serviceradar/identity/tenant.ex
role-based access control
AshCloak
generation for edge isolation
user access levels
device.ex
Device Actor for Distributed State Managementelixir/serviceradar_core/lib/serviceradar/actors/device.ex
tracking
hibernation after inactivity
broadcasting for state changes
client.ex
DataService KV Store gRPC Client with Reconnectionelixir/serviceradar_core/lib/serviceradar/data_service/client.ex
backoff
options
timeout handling
reserved_tenant_slug.ex
Tenant Slug Reservation Validationelixir/serviceradar_core/lib/serviceradar/identity/validations/reserved_tenant_slug.ex
cannot use it
case-insensitive string comparison
2 files
main.go
Update API documentation terminologycmd/core/main.go
in Swagger documentation
main.go
Update config-sync role documentationcmd/tools/config-sync/main.go
text
1 files
server.rs
Add gateway_id field to rperf checker responsescmd/checkers/rperf-client/src/server.rs
gateway_idfield to status response in error casegateway_idfield to results response1 files
message_processor.rs
Add nats_creds_file to zen test configcmd/consumers/zen/src/message_processor.rs
nats_creds_file: Nonefield to test configuration struct3 files
20260107043446_initial_schema.exs
Add comprehensive initial database schema migrationelixir/serviceradar_core/priv/repo/tenant_migrations/20260107043446_initial_schema.exs
multi-tenant ServiceRadar system
discovery, monitoring, alerts, and edge onboarding
proper foreign key relationships
performance
20260107043447_add_oban_jobs.exs
Oban Job Queue Tenant Migrationelixir/serviceradar_core/priv/repo/tenant_migrations/20260107043447_add_oban_jobs.exs
processing
.gitkeep
Docker Compose Credentials Directorydocker/compose/creds/.gitkeep
101 files
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2236#issuecomment-3731073168
Original created: 2026-01-10T00:03:58Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
Sensitive log exposure
Description: Potential sensitive information exposure via logging full request fields (e.g.,
req.details, plus IDs) atinfo!level; similar request-logging of potentiallyuser-controlled/sensitive fields appears in other updated services (e.g.,
cmd/trapd/src/*and
cmd/consumers/zen/src/grpc_server.rs), which could leak secrets or PII into logs ifdetailscontains credentials, tokens, or device/user data.server.rs [214-219]
Referred Code
🎫 No ticket provided
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
Possible runtime crash: Converting role strings via
String.to_existing_atom/1can raise for unexpected rolevalues, creating an unhandled crash path instead of graceful authorization failure.
Referred Code
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status:
Sensitive data logged: The new
info!log lines includereq.details(and identifiers), which may contain sensitivepayload data and is emitted at info level.
Referred Code
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Unsafe config parsing: New config parsing uses
v.as_str().unwrap()which can panic on malformed/unexpectedexternal configuration values instead of validating and returning a controlled error.
Referred Code
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Missing audit context: The newly added NATS account service initialization logs startup states but the diff does
not show audit logging for privileged gRPC actions (e.g., operator bootstrap/signing) with
actor identity and outcome.
Referred Code
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status:
Error exposure risk: The new config-loading errors include file paths and detailed parsing causes, and the diff
does not indicate whether these errors are strictly internal logs versus user-facing
outputs.
Referred Code
Compliance status legend
🟢 - Fully Compliant🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2236#issuecomment-3731083921
Original created: 2026-01-10T00:05:45Z
PR Code Suggestions ✨
Explore these optional code suggestions:
Fix tenant-hopping security vulnerability
Prioritize the tenant ID from resource attributes (
attrs) over caller-providedoptions (
opts) inresolve_tenant/2to prevent a tenant-hopping securityvulnerability.
elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex [448-461]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 10
__
Why: The suggestion correctly identifies a critical tenant-hopping security vulnerability and provides the correct fix, which is to prioritize the tenant ID from the resource attributes over caller-provided options.
Prevent tenant isolation security vulnerability
Use the full SHA256 hash in
base_name/1instead of a truncated version toprevent potential hash collisions that could break tenant isolation.
elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex [199-207]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 9
__
Why: The suggestion correctly identifies a potential hash collision vulnerability from truncating the SHA256 hash, which could break tenant isolation, and proposes using the full hash to mitigate this security risk.
Strengthen multi-tenant authorization security policies
Strengthen multi-tenant authorization policies for
createandupdateactions byusing
forbid_ifto explicitly deny access if thetenant_iddoes not match theactor's tenant or is being improperly changed.
elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex [408-427]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 9
__
Why: The suggestion correctly identifies a security weakness in the multi-tenant authorization policies and proposes a more robust implementation using
forbid_ifandchangeset_attributeto prevent potential bypasses.Fix duplicate shutdown logic bug
Refactor the
pushLooperror handling to remove duplicate shutdown logic andprevent
server.Stop()from being called multiple times.cmd/agent/main.go [231-245]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies a bug in the shutdown logic where
server.Stop()could be called twice, leading to unpredictable behavior. This is a significant correctness fix for error and signal handling.Handle potential ArgumentError in fallback
Add a
catchforArgumentErrorin theget_channel/0function to prevent crasheswhen
ServiceRadar.DataService.Clientis not available.elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [458-489]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies an unhandled
ArgumentErrorthat could crash the calling process, improving the robustness of the error handling and fallback logic.Handle lookup errors explicitly
In
build_lifecycle_events/2, handle potential errors fromlookup_existing_alias_records/2by propagating an error tuple instead ofsilently continuing.
elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex [247-278]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 8
__
Why: The suggestion correctly points out that a database lookup failure is silently ignored, which could lead to incorrect data and spurious events. Explicitly handling the error improves the function's robustness.
Fix incomplete alias change detection
Modify
alias_change_detected?/2to also detect when service or IP aliases areremoved, not just when they are added.
elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex [307-315]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies a bug where the removal of an alias is not detected, which would lead to an incomplete audit trail.
Use default limits struct
In
build_limits/1, return a default empty%Proto.AccountLimits{}struct insteadof
nilto prevent potential gRPC encoding errors.elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [524]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies that returning
nilfor a protobuf message field can cause encoding errors, and providing a default empty struct is a robust fix.Check creds file exists
Before setting the resolver client, verify that the
systemCredsFileexists andis a valid file, logging a warning if it is not.
cmd/data-services/main.go [132-141]
Suggestion importance[1-10]: 7
__
Why: This suggestion improves robustness by adding a check to ensure the credentials file exists and is not a directory before attempting to use it, providing an early and more specific warning for misconfigurations.
Guard writing operator config path
Add a check to ensure
operatorConfigPathis not empty before callingnatsAccountServer.WriteOperatorConfig()to prevent writing to an invalid path.cmd/data-services/main.go [117-130]
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies a potential issue where
WriteOperatorConfigcould be called with an empty path. Adding a guard prevents this, making the startup logic more robust against partial configurations.Replace update! with safe update
Replace
Ash.update!withAsh.updateincreate_with_tenant_cert/2and handle thepotential error tuple to prevent the process from crashing on failure.
elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex [552-557]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly advises replacing the bang version
Ash.update!with the saferAsh.updateand handling the error case, which improves the function's robustness and prevents potential crashes.Log fallback to default config
Add a log message to indicate when the agent falls back to using the embedded
default configuration because the specified config file was not found.
cmd/agent/main.go [112-126]
Suggestion importance[1-10]: 6
__
Why: This is a valuable observability improvement. Logging the fallback to the default configuration makes the agent's behavior more transparent and aids in debugging configuration issues, which could otherwise be silent.
Log when truncating status messages
Add a
Logger.warninginnormalize_message/2to log when a status message isbeing truncated, improving observability of silent data loss.
elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex [591-609]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 6
__
Why: The suggestion improves observability by adding a log message when a status message is truncated, which is valuable for debugging and monitoring data integrity without being a critical issue.
Use consistent gateway_id for metrics
In
record_push_metrics/2, use the localgateway_id()helper instead ofConfig.gateway_id()for consistent metric tagging.elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex [612-622]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 5
__
Why: The suggestion correctly points out an inconsistency in how
gateway_idis retrieved, which could lead to mismatched metric tags. Using the localgateway_id()helper ensures data consistency.