Updates/sync service ng #2632

Merged
mfreeman451 merged 26 commits from refs/pull/2632/head into testing 2026-01-06 09:05:54 +00:00
mfreeman451 commented 2026-01-05 21:17:13 +00:00 (Migrated from github.com)
Owner

Imported from GitHub pull request.

Original GitHub pull request: #2225
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2225
Original created: 2026-01-05T21:17:13Z
Original updated: 2026-01-06T09:05:56Z
Original head: carverauto/serviceradar:updates/sync-service-ng
Original base: testing
Original merged: 2026-01-06T09:05:54Z by @mfreeman451

User description

IMPORTANT: Please sign the Developer Certificate of Origin

Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:

Signed-off-by: J. Doe <j.doe@domain.com>

Describe your changes

Code checklist before requesting a review

  • I have signed the DCO?
  • The build completes without errors?
  • All tests are passing when running make test?

PR Type

Enhancement


Description

Major architectural refactoring from gRPC server-based sync service to push-mode agent gateway communication model with comprehensive multi-tenant support:

Core Architecture Changes:

  • Replaced centralized sync service with distributed push-mode agents communicating directly to agent gateway

  • Refactored Go agent (cmd/agent/main.go) from gRPC server to push-mode client with signal handling and graceful shutdown

  • Implemented agent gateway gRPC server in Elixir with mTLS certificate validation and multi-tenant isolation

NATS Integration Enhancements:

  • Added NATS credentials file support across all services (trapd, zen consumer, otel, flowgger, sysmon checker, rperf-client)

  • Implemented NATS account service initialization with operator key management and bootstrap support

  • Added NATS Account Client for tenant-isolated account management and JWT signing

Multi-Tenant Infrastructure:

  • Implemented distributed process registry using Horde for per-tenant process isolation and discovery

  • Created comprehensive database schema with tenant-aware tables for devices, agents, pollers, and monitoring

  • Added tenant CA certificate generation for secure component onboarding with X.509 support

  • Integrated SPIFFE/SPIRE for distributed cluster authentication

Agent and Status Tracking:

  • Implemented agent tracking with ETS-based in-memory state management and TTL-based stale detection

  • Added gateway_id field to response messages across checkers and consumers for proper status attribution

  • Created agent resource with state machine (connecting, connected, degraded, disconnected, unavailable) and capability management

Monitoring and Operations:

  • Implemented device actor for runtime state management with event buffering and health tracking

  • Added device statistics aggregator for periodic snapshots and capability breakdowns

  • Created gateway resource with state machine for job orchestrator management

  • Implemented DataService KV client with automatic reconnection and mTLS support

CLI and Configuration:

  • Added nats-bootstrap and admin commands for NATS operations

  • Implemented file-based agent configuration loading with environment variable fallback

  • Added comprehensive configuration support for NATS credentials across all components

Testing:

  • Added agent registry tests with multi-tenant isolation verification

  • Implemented event publisher tests for infrastructure event functionality

  • Updated test configurations for NATS credentials support


Diagram Walkthrough

```mermaid
flowchart LR
  A["Go Agents<br/>Push-Mode"] -->|"mTLS<br/>Status Push"| B["Agent Gateway<br/>gRPC Server"]
  B -->|"Config<br/>Delivery"| A
  B -->|"Tenant<br/>Isolation"| C["Tenant<br/>Registry"]
  C -->|"Process<br/>Management"| D["Horde<br/>Infrastructure"]
  E["NATS<br/>Services"] -->|"Credentials<br/>File"| F["All<br/>Components"]
  G["Agent<br/>Tracker"] -->|"ETS<br/>State"| H["Device<br/>Actor"]
  H -->|"Events"| I["Event<br/>Publisher"]
  J["Certificate<br/>Generator"] -->|"X.509<br/>Certs"| K["Tenant<br/>CA"]
```

File Walkthrough

Relevant files
Enhancement
27 files
main.go
Agent refactored to push-mode gateway communication model

cmd/agent/main.go

  • Refactored agent startup from gRPC server model to push-mode
    architecture with gateway communication
  • Removed edge onboarding, KV watch, and config bootstrap dependencies;
    simplified to file-based config loading
  • Added loadConfig() function for JSON config parsing with embedded
    default fallback via environment variable
  • Implemented runPushMode() with signal handling, push loop management,
    and graceful shutdown with timeout
  • Added version injection via ldflags and proper error definitions for
    configuration validation
+171/-74
main.go
NATS account service initialization and configuration       

cmd/data-services/main.go

  • Added NATS account service initialization with operator key management
    and bootstrap support
  • Configured resolver paths and system account credentials from
    environment variables with fallback to config
  • Registered NATSAccountServiceServer in gRPC service registration
  • Added validation and logging for allowed client identities and
    resolver configuration
+68/-0   
main.go
CLI commands for NATS bootstrap and admin operations         

cmd/cli/main.go

  • Added nats-bootstrap command dispatch for NATS bootstrap operations
  • Added admin command dispatch with nats subcommand routing via
    dispatchAdminCommand()
  • Implemented error handling for unknown admin resources
+14/-0   
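The `admin` routing with an error for unknown resources might look like the sketch below; the function name matches the summary, but the command surface and messages are assumed:

```go
package main

import "fmt"

// dispatchAdminCommand routes "admin <resource> ..." invocations. Only the
// "nats" resource is wired up, mirroring the subcommand routing described
// above; unknown resources produce an error rather than silently no-op.
func dispatchAdminCommand(args []string) error {
	if len(args) == 0 {
		return fmt.Errorf("admin: missing resource")
	}
	switch args[0] {
	case "nats":
		return runAdminNATS(args[1:])
	default:
		return fmt.Errorf("admin: unknown resource %q", args[0])
	}
}

// runAdminNATS stands in for the real NATS admin handler.
func runAdminNATS(args []string) error {
	fmt.Println("admin nats", args)
	return nil
}

func main() {
	if err := dispatchAdminCommand([]string{"nats", "accounts", "list"}); err != nil {
		panic(err)
	}
	if err := dispatchAdminCommand([]string{"bogus"}); err != nil {
		fmt.Println(err)
	}
}
```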
main.rs
NATS credentials support and gateway ID tracking in trapd

cmd/trapd/src/main.rs

  • Added support for NATS credentials file authentication via
    nats_creds_path() method
  • Integrated credentials file loading into all NATS connection modes
    (None, Spiffe, Mtls)
  • Added gateway_id field to trap and result response messages for proper
    attribution
+23/-1   
config.rs
NATS credentials file configuration for zen consumer         

cmd/consumers/zen/src/config.rs

  • Added nats_creds_file optional configuration field with validation
  • Implemented nats_creds_path() method to resolve credentials file paths
    with cert_dir support
  • Added validation to ensure non-empty credentials file paths
+26/-0   
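The `nats_creds_path()` resolution logic shared by these Rust configs (reject empty values, honor absolute paths, resolve bare filenames against the cert directory) is sketched here in Go for brevity; the function name and exact precedence rules are assumptions:

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// resolveCredsPath mirrors the idea behind nats_creds_path(): an empty or
// whitespace-only value is rejected, an absolute path is used as-is, and a
// bare filename is resolved against the certificate directory.
func resolveCredsPath(credsFile, certDir string) (string, error) {
	credsFile = strings.TrimSpace(credsFile)
	if credsFile == "" {
		return "", errors.New("nats_creds_file must not be empty")
	}
	if filepath.IsAbs(credsFile) {
		return credsFile, nil
	}
	return filepath.Join(certDir, credsFile), nil
}

func main() {
	p, err := resolveCredsPath("nats.creds", "/etc/serviceradar/certs")
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
}
```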
config.rs
NATS credentials file configuration for trapd                       

cmd/trapd/src/config.rs

  • Added nats_creds_file optional configuration field with validation
  • Implemented nats_creds_path() method to resolve credentials file paths
    using security config
  • Added validation to reject empty credentials file values
+21/-0   
config.rs
NATS credentials file support in otel configuration           

cmd/otel/src/config.rs

  • Added creds_file optional field to NATSConfigTOML struct
  • Implemented credentials file path resolution in config parsing with
    empty string handling
  • Updated default config example to include credentials file path
+13/-0   
nats_output.rs
NATS credentials file support in flowgger output                 

cmd/flowgger/src/flowgger/output/nats_output.rs

  • Added creds_file optional field to NATSConfig struct
  • Implemented credentials file loading from configuration with empty
    string trimming
  • Integrated credentials file into NATS connection options
+14/-0   
nats_output.rs
NATS credentials file support in otel NATS output               

cmd/otel/src/nats_output.rs

  • Added creds_file optional field to NATSConfig struct with default None
    value
  • Integrated credentials file loading into NATS connection options with
    debug logging
+7/-0     
server.rs
Gateway ID tracking in sysmon checker responses                   

cmd/checkers/sysmon/src/server.rs

  • Added gateway_id field to both PingResponse and ResultResponse
    messages for proper status attribution
+2/-0     
grpc_server.rs
Gateway ID tracking in zen consumer responses                       

cmd/consumers/zen/src/grpc_server.rs

  • Added gateway_id field to both PingResponse and ResultResponse
    messages for status attribution
+2/-0     
server.rs
Poller and gateway ID tracking in rperf responses               

cmd/checkers/rperf-client/src/server.rs

  • Added poller_id and gateway_id fields to both error and success
    response messages
+4/-0     
nats.rs
NATS credentials file integration in zen NATS connection 

cmd/consumers/zen/src/nats.rs

  • Added credentials file loading into NATS connection options via
    nats_creds_path()
+4/-0     
setup.rs
NATS credentials file debug logging                                           

cmd/otel/src/setup.rs

  • Added debug logging for NATS credentials file configuration
+1/-0     
agent_gateway_server.ex
Agent gateway gRPC server with mTLS and multi-tenant support

elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex

  • Implemented complete gRPC server for agent gateway receiving status
    pushes from Go agents
  • Implemented hello() for agent enrollment with mTLS certificate
    validation and tenant extraction
  • Implemented get_config() for agent configuration delivery with
    versioning support
  • Implemented push_status() and stream_status() for status reception
    with validation and resource limits
  • Integrated tenant security via mTLS certificate parsing and
    multi-tenant isolation
+1013/-0
tenant_registry.ex
Multi-tenant process registry with Horde infrastructure   

elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex

  • Implemented multi-tenant process registry using Horde for distributed
    process discovery and supervision
  • Created per-tenant Horde Registry and DynamicSupervisor for process
    isolation
  • Implemented slug-to-UUID mapping via ETS table for admin/debug lookups
  • Added convenience functions for registering and managing pollers,
    agents, and gateways
  • Implemented tenant infrastructure lifecycle management with lazy
    initialization
+682/-0 
agent_tracker.ex
Agent tracking with ETS and stale detection                           

elixir/serviceradar_core/lib/serviceradar/agent_tracker.ex

  • Implemented in-memory agent tracking using ETS with TTL-based stale
    detection
  • Added agent status tracking with metadata (service count, partition,
    source IP)
  • Implemented periodic cleanup of stale agents (24-hour threshold)
  • Added PubSub broadcast for UI updates on agent status changes
+177/-0 
agent.ex
Agent resource with state machine and capability management

elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex

  • New Ash resource for managing Go agents with OCSF v1.4.0 compliance
  • Implements agent lifecycle state machine with transitions (connecting,
    connected, degraded, disconnected, unavailable)
  • Defines agent capabilities (ICMP, TCP, HTTP, gRPC, DNS, Process, SNMP)
    with metadata and icons
  • Provides JSON API routes for agent registration, connection
    management, and status updates
  • Includes calculated fields for display name, online status, status
    color, and gRPC endpoint
+649/-0 
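The lifecycle state machine can be pictured as a transition table over the five states named above. The state names come from the summary; the allowed edges below are one plausible wiring, not the actual Ash resource definition:

```go
package main

import "fmt"

// AgentState enumerates the lifecycle states listed in the summary.
type AgentState string

const (
	Connecting   AgentState = "connecting"
	Connected    AgentState = "connected"
	Degraded     AgentState = "degraded"
	Disconnected AgentState = "disconnected"
	Unavailable  AgentState = "unavailable"
)

// transitions is an assumed adjacency list; the real resource defines its
// own allowed transitions.
var transitions = map[AgentState][]AgentState{
	Connecting:   {Connected, Unavailable},
	Connected:    {Degraded, Disconnected},
	Degraded:     {Connected, Disconnected},
	Disconnected: {Connecting, Unavailable},
	Unavailable:  {Connecting},
}

// canTransition reports whether the edge from -> to exists in the table.
func canTransition(from, to AgentState) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Connecting, Connected)) // true
	fmt.Println(canTransition(Unavailable, Degraded)) // false under this table
}
```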
onboarding_packages.ex
Onboarding packages context with token and certificate management

elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex

  • Context module for edge onboarding package operations with CRUD
    functionality
  • Implements token generation, encryption, and verification for join and
    download tokens
  • Provides package delivery workflow with state transitions (issued,
    delivered, activated, revoked)
  • Integrates tenant CA certificate generation for secure component
    onboarding
  • Includes soft-delete and revocation capabilities with audit event
    recording
+606/-0 
alias_events.ex
Device alias event tracking and lifecycle management         

elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex

  • Device alias lifecycle event tracking module ported from Go core
  • Detects and processes alias changes (service IDs, IPs, collector IPs)
    from device metadata
  • Builds lifecycle events for audit and alerting with change detection
    logic
  • Manages DeviceAliasState resource with sighting counts and state
    transitions
  • Provides metadata parsing and comparison utilities for alias records
+650/-0 
generator.ex
X.509 certificate generation for tenant CAs and components

elixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex

  • X.509 certificate generation for per-tenant CAs and edge components
  • Generates tenant intermediate CAs (10-year validity) signed by
    platform root CA
  • Creates edge component certificates (1-year validity) with SPIFFE IDs
    and SANs
  • Provides certificate parsing, CN extraction, and SPKI SHA-256
    computation
  • Uses Erlang :public_key module for all cryptographic operations
+541/-0 
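For a feel of the certificate side, here is a self-signed stand-in for a tenant intermediate CA in Go's `crypto/x509`. Only the 10-year validity and the CA basic constraints are taken from the summary; the real generator uses Erlang's `:public_key` and signs with the platform root CA, and the CN format here is invented:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newTenantCA creates a self-signed stand-in for a tenant intermediate CA
// (the real one is signed by the platform root CA).
func newTenantCA(tenant string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: fmt.Sprintf("tenant-%s intermediate CA", tenant)},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().AddDate(10, 0, 0), // 10-year validity, per the summary
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newTenantCA("acme")
	if err != nil {
		panic(err)
	}
	fmt.Println(cert.Subject.CommonName, cert.IsCA)
}
```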
spiffe.ex
SPIFFE/SPIRE integration for cluster authentication           

elixir/serviceradar_core/lib/serviceradar/spiffe.ex

  • SPIFFE/SPIRE integration module for distributed cluster authentication
  • Provides SSL/TLS options for ERTS distribution with SPIFFE certificate
    verification
  • Parses and validates SPIFFE IDs with trust domain verification
  • Monitors certificate files for rotation and provides expiry
    information
  • Supports both filesystem and SPIRE Workload API modes for certificate
    loading
+564/-0 
account_client.ex
NATS Account Client for Tenant Isolation                                 

elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex

  • New gRPC client module for NATS account management with tenant
    isolation support
  • Implements account creation, user credential generation, JWT signing,
    and operator bootstrap
  • Provides channel management with fallback to fresh connections and
    SSL/TLS support
  • Includes helper functions for building protobuf request structures
    (limits, permissions, mappings)
+563/-0 
stats_aggregator.ex
Device Statistics Aggregator with Periodic Snapshots         

elixir/serviceradar_core/lib/serviceradar/core/stats_aggregator.ex

  • New GenServer module for periodic device statistics aggregation and
    caching
  • Implements canonical record selection, device filtering, and
    per-partition statistics
  • Tracks device availability, activity windows, and capability
    breakdowns (ICMP, SNMP, Sysmon)
  • Includes telemetry recording, alert handler integration, and snapshot
    management
+626/-0 
device.ex
Device Actor for Runtime State Management                               

elixir/serviceradar_core/lib/serviceradar/actors/device.ex

  • New GenServer-based device actor for in-memory state management and
    caching
  • Implements identity resolution, event buffering, health status
    tracking with state machine
  • Provides lazy initialization, Horde registry integration, and
    automatic hibernation
  • Includes event flushing, health check recording, and PubSub
    broadcasting
+608/-0 
gateway.ex
Gateway Resource with State Machine and Job Orchestration

elixir/serviceradar_core/lib/serviceradar/infrastructure/gateway.ex

  • New Ash resource for gateway (job orchestrator) management with state
    machine
  • Implements status transitions (healthy, degraded, offline, recovering,
    maintenance, draining)
  • Provides registration, heartbeat, and health tracking with tenant
    isolation
  • Includes JSON API routes and policy-based authorization
+544/-0 
client.ex
DataService KV Client with Reconnection and mTLS                 

elixir/serviceradar_core/lib/serviceradar/data_service/client.ex

  • New GenServer-based gRPC client for datasvc KV store operations
  • Implements put, get, delete, list_keys, and put_many operations with
    automatic reconnection
  • Provides exponential backoff reconnection strategy and mTLS
    certificate support
  • Handles gun connection lifecycle events and ensures connection before
    operations
+507/-0 
Tests
3 files
message_processor.rs
Test configuration update for NATS credentials                     

cmd/consumers/zen/src/message_processor.rs

  • Added nats_creds_file: None field to test configuration struct
+1/-0     
agent_registry_test.exs
Agent registry tests with multi-tenant isolation                 

elixir/serviceradar_core/test/serviceradar/registry/agent_registry_test.exs

  • Comprehensive test suite for AgentRegistry with 362 lines of test
    cases
  • Tests agent registration with gRPC connection details and metadata
    storage
  • Verifies multi-tenant isolation for registry operations and lookups
  • Tests capability-based and partition-based agent discovery
  • Includes heartbeat, unregister, and count operations
+362/-0 
event_publisher_test.exs
Event Publisher Tests for Infrastructure Events                   

elixir/serviceradar_core/test/serviceradar/infrastructure/event_publisher_test.exs

  • New test module for infrastructure event publishing functionality
  • Tests event structure building for state changes, registration,
    deregistration, and health changes
  • Verifies mandatory field requirements and supported entity/event types
  • Mocks NATS connection to avoid requiring running NATS server in unit
    tests
+143/-0 
Configuration
1 file
20251224184523_initial_schema.exs
Initial database schema migration for ServiceRadar core   

elixir/serviceradar_core/priv/repo/migrations/20251224184523_initial_schema.exs

  • Created comprehensive database schema with tables for tenants, users,
    devices, agents, service checks, and monitoring
  • Defined relationships between core entities (devices, agents, pollers,
    partitions)
  • Added support for edge onboarding packages, monitoring events, and
    alerts
  • Implemented multi-tenant isolation with tenant_id foreign keys across
    all tables
+733/-0 
Dependencies
1 file
monitoring.pb.ex
Protobuf Definitions for Monitoring Services                         

elixir/serviceradar_core/lib/serviceradar/proto/monitoring.pb.ex

  • Auto-generated protobuf definitions for monitoring service messages
  • Defines request/response structures for agent, poller, and gateway
    status reporting
  • Includes sweep completion tracking, chunked streaming, and agent
    configuration
  • Supports agent hello handshake with capabilities and heartbeat
    intervals
+398/-0 
Additional files
101 files
.bazelignore +4/-0     
.bazelrc +2/-0     
.env.example +38/-0   
AGENTS.md +171/-2 
MODULE.bazel +5/-0     
Makefile +4/-0     
README-Docker.md +16/-1   
BUILD.bazel +11/-0   
BUILD.bazel +12/-0   
mix_release.bzl +120/-49
BUILD.bazel +1/-0     
config.json +5/-6     
monitoring.proto +4/-1     
zen-consumer-with-otel.json +1/-0     
zen-consumer.json +1/-0     
BUILD.bazel +1/-0     
README.md +9/-12   
flowgger.toml +1/-0     
otel.toml +1/-0     
BUILD.bazel +0/-25   
config.json +0/-77   
main.go +0/-123 
docker-compose.elx.yml +108/-0 
docker-compose.spiffe.yml +8/-55   
docker-compose.yml +298/-280
Dockerfile.agent-gateway +94/-0   
Dockerfile.core-elx +108/-0 
Dockerfile.poller-elx +95/-0   
Dockerfile.sync +0/-95   
Dockerfile.web-ng +6/-0     
agent-minimal.docker.json +6/-6     
agent.docker.json +5/-20   
agent.mtls.json +7/-10   
.gitkeep +1/-0     
datasvc.docker.json +3/-2     
datasvc.mtls.json +14/-1   
db-event-writer.mtls.json +1/-0     
entrypoint-certs.sh +12/-8   
entrypoint-sync.sh +0/-96   
flowgger.docker.toml +2/-1     
generate-certs.sh +217/-12
nats.docker.conf +16/-160
netflow-consumer.mtls.json +1/-0     
otel.docker.toml +2/-0     
pg_hba.conf +9/-0     
pg_ident.conf +17/-0   
bootstrap-compose-spire.sh +0/-1     
ssl_dist.core.conf +17/-0   
ssl_dist.gateway.conf +17/-0   
ssl_dist.poller.conf +17/-0   
ssl_dist.web.conf +17/-0   
sync.docker.json +0/-71   
sync.mtls.json +0/-75   
trapd.docker.json +2/-1     
zen.docker.json +2/-1     
BUILD.bazel +80/-42 
push_targets.bzl +2/-1     
SBOM.md +0/-1     
agents.md +8/-9     
architecture.md +100/-26
armis.md +7/-21   
ash-api.md +305/-0 
ash-authentication.md +244/-0 
ash-authorization.md +283/-0 
ash-domains.md +223/-0 
ash-migration-guide.md +339/-0 
auth-configuration.md +1/-1     
cluster.md +2/-2     
discovery.md +19/-20 
docker-setup.md +57/-14 
edge-agents.md +253/-0 
edge-onboarding.md +316/-273
identity_drift_monitoring.md +1/-1     
installation.md +13/-14 
intro.md +5/-5     
kv-configuration.md +3/-3     
netbox.md +4/-23   
netflow.md +1/-1     
onboarding-review-2025.md +0/-1     
rperf-monitoring.md +1/-1     
compose-mtls-sysmonosx.md +1/-1     
docker-compose-login-500.md +3/-3     
security-architecture.md +238/-0 
self-signed.md +32/-107
service-port-map.md +3/-5     
snmp.md +2/-2     
spiffe-identity.md +52/-0   
srql-service.md +2/-2     
sync.md +75/-482
syslog.md +1/-1     
troubleshooting-guide.md +53/-2   
web-ui.md +31/-106
datasvc.ex +32/-2   
BUILD.bazel +30/-0   
config.exs +13/-0   
dev.exs +4/-0     
prod.exs +4/-0     
runtime.exs +209/-0 
test.exs +20/-0   
agent_client.ex +472/-0 
Additional files not shown

connection options via <br><code>nats_creds_path()</code></ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-97f7335def0ad5d644b594a1076ae2d7080b11259cbb8de22c7946cc8e4b39f8">+4/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>setup.rs</strong><dd><code>NATS credentials file debug logging</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/otel/src/setup.rs - Added debug logging for NATS credentials file configuration </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-3891f667deb20fd26e296d3e2742c57378d3764fe1743118e612465ae360391f">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>agent_gateway_server.ex</strong><dd><code>Agent gateway gRPC server with mTLS and multi-tenant support</code></dd></summary> <hr> elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex <ul><li>Implemented complete gRPC server for agent gateway receiving status <br>pushes from Go agents<br> <li> Implemented <code>hello()</code> for agent enrollment with mTLS certificate <br>validation and tenant extraction<br> <li> Implemented <code>get_config()</code> for agent configuration delivery with <br>versioning support<br> <li> Implemented <code>push_status()</code> and <code>stream_status()</code> for status reception <br>with validation and resource limits<br> <li> Integrated tenant security via mTLS certificate parsing and <br>multi-tenant isolation</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-369a368073dc8ec1140bcea699005a1ce97a90cd59629df0bd18c71c7ffaae9f">+1013/-0</a></td> </tr> <tr> <td> <details> <summary><strong>tenant_registry.ex</strong><dd><code>Multi-tenant process registry with Horde infrastructure</code>&nbsp; &nbsp; 
</dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex <ul><li>Implemented multi-tenant process registry using Horde for distributed <br>process discovery and supervision<br> <li> Created per-tenant Horde Registry and DynamicSupervisor for process <br>isolation<br> <li> Implemented slug-to-UUID mapping via ETS table for admin/debug lookups<br> <li> Added convenience functions for registering and managing pollers, <br>agents, and gateways<br> <li> Implemented tenant infrastructure lifecycle management with lazy <br>initialization</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-91248b3b128a2e3d9bea6ffdb5e0f295e4a1745e82f87687c640ad01416fb85d">+682/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>agent_tracker.ex</strong><dd><code>Agent tracking with ETS and stale detection</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/agent_tracker.ex <ul><li>Implemented in-memory agent tracking using ETS with TTL-based stale <br>detection<br> <li> Added agent status tracking with metadata (service count, partition, <br>source IP)<br> <li> Implemented periodic cleanup of stale agents (24-hour threshold)<br> <li> Added PubSub broadcast for UI updates on agent status changes</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-792a06febe706658f0bd382113e31d8070d15c477202d51f1699e77c8317001a">+177/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>agent.ex</strong><dd><code>Agent resource with state machine and capability management</code></dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex <ul><li>New Ash resource for managing Go agents with OCSF v1.4.0 compliance<br> <li> Implements agent lifecycle state machine with transitions (connecting, <br>connected, degraded, disconnected, 
unavailable)<br> <li> Defines agent capabilities (ICMP, TCP, HTTP, gRPC, DNS, Process, SNMP) <br>with metadata and icons<br> <li> Provides JSON API routes for agent registration, connection <br>management, and status updates<br> <li> Includes calculated fields for display name, online status, status <br>color, and gRPC endpoint</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-c56f92b6ce744cab3f2dc00dde92e2017cffdd12ad4618f7fa720252f2a6843a">+649/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>onboarding_packages.ex</strong><dd><code>Onboarding packages context with token and certificate management</code></dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex <ul><li>Context module for edge onboarding package operations with CRUD <br>functionality<br> <li> Implements token generation, encryption, and verification for join and <br>download tokens<br> <li> Provides package delivery workflow with state transitions (issued, <br>delivered, activated, revoked)<br> <li> Integrates tenant CA certificate generation for secure component <br>onboarding<br> <li> Includes soft-delete and revocation capabilities with audit event <br>recording</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-e4fe8e19bc324416302bb4c962f57133b3f62eb82053766844d881c522a473e5">+606/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>alias_events.ex</strong><dd><code>Device alias event tracking and lifecycle management</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex <ul><li>Device alias lifecycle event tracking module ported from Go core<br> <li> Detects and processes alias changes (service IDs, IPs, collector IPs) <br>from device metadata<br> <li> Builds lifecycle events for audit and alerting with change detection <br>logic<br> <li> Manages 
<code>DeviceAliasState</code> resource with sighting counts and state <br>transitions<br> <li> Provides metadata parsing and comparison utilities for alias records</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-bc3743067ea774f59bc5665770f7110a2d6e90f6e1156a7717a1c287f8979d28">+650/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>generator.ex</strong><dd><code>X.509 certificate generation for tenant CAs and components</code></dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex <ul><li>X.509 certificate generation for per-tenant CAs and edge components<br> <li> Generates tenant intermediate CAs (10-year validity) signed by <br>platform root CA<br> <li> Creates edge component certificates (1-year validity) with SPIFFE IDs <br>and SANs<br> <li> Provides certificate parsing, CN extraction, and SPKI SHA-256 <br>computation<br> <li> Uses Erlang <code>:public_key</code> module for all cryptographic operations</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-b48e4a9e1189da61e2a60e16f56fce81298d76b7cdab745107140fed3f6e48b4">+541/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>spiffe.ex</strong><dd><code>SPIFFE/SPIRE integration for cluster authentication</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/spiffe.ex <ul><li>SPIFFE/SPIRE integration module for distributed cluster authentication<br> <li> Provides SSL/TLS options for ERTS distribution with SPIFFE certificate <br>verification<br> <li> Parses and validates SPIFFE IDs with trust domain verification<br> <li> Monitors certificate files for rotation and provides expiry <br>information<br> <li> Supports both filesystem and SPIRE Workload API modes for certificate <br>loading</ul> </details> </td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-0cb8d921c19f671b66f91c0978e351e71d927c5f4694924984c9f1ed34d7ee78">+564/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>account_client.ex</strong><dd><code>NATS Account Client for Tenant Isolation</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex <ul><li>New gRPC client module for NATS account management with tenant <br>isolation support<br> <li> Implements account creation, user credential generation, JWT signing, <br>and operator bootstrap<br> <li> Provides channel management with fallback to fresh connections and <br>SSL/TLS support<br> <li> Includes helper functions for building protobuf request structures <br>(limits, permissions, mappings)</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-2e18ac777ac600b12982ba9e9d5327e23ebd84c139a2add7976f8bf61283e554">+563/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>stats_aggregator.ex</strong><dd><code>Device Statistics Aggregator with Periodic Snapshots</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/core/stats_aggregator.ex <ul><li>New GenServer module for periodic device statistics aggregation and <br>caching<br> <li> Implements canonical record selection, device filtering, and <br>per-partition statistics<br> <li> Tracks device availability, activity windows, and capability <br>breakdowns (ICMP, SNMP, Sysmon)<br> <li> Includes telemetry recording, alert handler integration, and snapshot <br>management</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-1f4ac8290be7d27cac0ed660e51a9b3b23a219a6bb43b3735f3c5a9768321031">+626/-0</a>&nbsp; </td> </tr> <tr> <td> <details> 
<summary><strong>device.ex</strong><dd><code>Device Actor for Runtime State Management</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/actors/device.ex <ul><li>New GenServer-based device actor for in-memory state management and <br>caching<br> <li> Implements identity resolution, event buffering, health status <br>tracking with state machine<br> <li> Provides lazy initialization, Horde registry integration, and <br>automatic hibernation<br> <li> Includes event flushing, health check recording, and PubSub <br>broadcasting</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-eba1d95a852e4a736813c7b486da651704f20718e24f931c966ff3f37c421eea">+608/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>gateway.ex</strong><dd><code>Gateway Resource with State Machine and Job Orchestration</code></dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/infrastructure/gateway.ex <ul><li>New Ash resource for gateway (job orchestrator) management with state <br>machine<br> <li> Implements status transitions (healthy, degraded, offline, recovering, <br>maintenance, draining)<br> <li> Provides registration, heartbeat, and health tracking with tenant <br>isolation<br> <li> Includes JSON API routes and policy-based authorization</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-1655f7cc80c8949b3d2c9e50d20110cb85883413fb6181825bfbcf0bda578c6b">+544/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>client.ex</strong><dd><code>DataService KV Client with Reconnection and mTLS</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/data_service/client.ex <ul><li>New GenServer-based gRPC client for datasvc KV store operations<br> <li> Implements put, get, delete, 
list_keys, and put_many operations with <br>automatic reconnection<br> <li> Provides exponential backoff reconnection strategy and mTLS <br>certificate support<br> <li> Handles gun connection lifecycle events and ensures connection before <br>operations</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-503e195ad79e05e12d7ad03a675f6e35ffdfc201b8571b0d30a220fe036e03a1">+507/-0</a>&nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Tests</strong></td><td><details><summary>3 files</summary><table> <tr> <td> <details> <summary><strong>message_processor.rs</strong><dd><code>Test configuration update for NATS credentials</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/consumers/zen/src/message_processor.rs - Added <code>nats_creds_file: None</code> field to test configuration struct </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-9fcbc5358a9009e60a8cd22d21e5a9ea652787c727732d0b869e0865495114c3">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>agent_registry_test.exs</strong><dd><code>Agent registry tests with multi-tenant isolation</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/test/serviceradar/registry/agent_registry_test.exs <ul><li>Comprehensive test suite for <code>AgentRegistry</code> with 362 lines of test <br>cases<br> <li> Tests agent registration with gRPC connection details and metadata <br>storage<br> <li> Verifies multi-tenant isolation for registry operations and lookups<br> <li> Tests capability-based and partition-based agent discovery<br> <li> Includes heartbeat, unregister, and count operations</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-e46695d8be4c0156bea5a577432628ea11a106208eaae749e30ca71d6ec42b47">+362/-0</a>&nbsp; </td> </tr> <tr> <td> <details> 
<summary><strong>event_publisher_test.exs</strong><dd><code>Event Publisher Tests for Infrastructure Events</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/test/serviceradar/infrastructure/event_publisher_test.exs <ul><li>New test module for infrastructure event publishing functionality<br> <li> Tests event structure building for state changes, registration, <br>deregistration, and health changes<br> <li> Verifies mandatory field requirements and supported entity/event types<br> <li> Mocks NATS connection to avoid requiring a running NATS server in unit <br>tests</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-aedb73cf51479f3bd752152338a3a98c595d4ef48d36139eee372149c134975b">+143/-0</a>&nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Configuration</strong></td><td><details><summary>1 file</summary><table> <tr> <td> <details> <summary><strong>20251224184523_initial_schema.exs</strong><dd><code>Initial database schema migration for ServiceRadar core</code>&nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/priv/repo/migrations/20251224184523_initial_schema.exs <ul><li>Created comprehensive database schema with tables for tenants, users, <br>devices, agents, service checks, and monitoring<br> <li> Defined relationships between core entities (devices, agents, pollers, <br>partitions)<br> <li> Added support for edge onboarding packages, monitoring events, and <br>alerts<br> <li> Implemented multi-tenant isolation with tenant_id foreign keys across <br>all tables</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-a95104ae5125c32fa35bb828aa9a716f5129b8052d83d73901d3bf8a5dc3502a">+733/-0</a>&nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Dependencies</strong></td><td><details><summary>1 file</summary><table> <tr> <td> <details> 
<summary><strong>monitoring.pb.ex</strong><dd><code>Protobuf Definitions for Monitoring Services</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/proto/monitoring.pb.ex <ul><li>Auto-generated protobuf definitions for monitoring service messages<br> <li> Defines request/response structures for agent, poller, and gateway <br>status reporting<br> <li> Includes sweep completion tracking, chunked streaming, and agent <br>configuration<br> <li> Supports agent hello handshake with capabilities and heartbeat <br>intervals</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-c4072f56f73e61992b2401a13598f9edcf11b255ba343c79cc629782b0b0dca0">+398/-0</a>&nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Additional files</strong></td><td><details><summary>101 files</summary><table> <tr> <td><strong>.bazelignore</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-a5641cd37d6ad98b32cdfce1980836cc68312277bc6a7052f55da02ada5bc6cf">+4/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>.bazelrc</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-544556920c45b42cbfe40159b082ce8af6bd929e492d076769226265f215832f">+2/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>.env.example</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-a3046da0d15a27e89f2afe639b25748a7ad4d9290af3e7b1b6c1a5533c8f0a8c">+38/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>AGENTS.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-a54ff182c7e8acf56acfd6e4b9c3ff41e2c41a31c9b211b2deb9df75d9a478f9">+171/-2</a>&nbsp; </td> </tr> <tr> <td><strong>MODULE.bazel</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-6136fc12446089c3db7360e923203dd114b6a1466252e71667c6791c20fe6bdc">+5/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>Makefile</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52">+4/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>README-Docker.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-9fd61d24482efe68c22d8d41e2a1dcc440f39195aa56e7a050f2abe598179efd">+16/-1</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-884fa9353a5226345e44fbabea3300efc7a87dfbcde0b6a42521ca51823f1b68">+11/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-0e80ea46aeb61a873324685edb96eae864c7a2004fbb7ee404b4ec951190ba10">+12/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>mix_release.bzl</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-86ec281f99363b6b6eb1f49e21d83b7eeca93a35b552b9f305fffc6855e38ccd">+120/-49</a></td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-143f8d1549d52f28906f19ce28e5568a5be474470ff103c2c1e63c3e6b08d670">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-5b1bc8fe77422534739bdd3a38dc20d2634a86c171265c34e1b5d0c5a61b6bab">+5/-6</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>monitoring.proto</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-b56f709f4a0a3db694f2124353908318631f23e20b7846bc4b8ee869e2e0632a">+4/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> 
<td><strong>zen-consumer-with-otel.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-68375f1f7847e1fbdf75664f6be65b1ad94ae6ce86ed73fc5964d65054668acb">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>zen-consumer.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-4d308af9802a93a0f656e8c02a3b5fcd8991407bb18360f087470db74e1f9524">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-c62c0139ebdb337369f4067567cd2c52b8e7decb3ddfabc77f9f67b2f6e5789c">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-f425b4378f84e0ba0c6f532facff17ff5d55b4dc6033d8bf35130a159cd2ba32">+9/-12</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>flowgger.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-af9f49f931e282dca53d1f0521b036d222fe671f77e61a876a84cf4c6d7cca4d">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>otel.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-c64b9ace832b8ea57a2be62f84166e03bb1904882635d444ec76a880cdf14cc0">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-4f5d2ea4260d490a0d6f28adde0b35eca8af77d22f3ee366a783946c53687619">+0/-25</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-bcac20d6b3cb81f0059e766839ba1ee59a885009249501b0ba1182ebb1daea25">+0/-77</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>main.go</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-78dc6bc53f1c760c66f43ff5f486bfe78a65bee8b2e0d4862293ec0892da2b29">+0/-123</a>&nbsp; </td> </tr> <tr> <td><strong>docker-compose.elx.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-9562070d7ad4a3e9b2d06567008cf35de1d96448d914b3b45bf6c36d97cdd914">+108/-0</a>&nbsp; </td> </tr> <tr> <td><strong>docker-compose.spiffe.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-603fd9e7d40841d174f26b95d0cb0c9537430bf3f7a5da3ccbba4ea3d8ac66c9">+8/-55</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>docker-compose.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-e45e45baeda1c1e73482975a664062aa56f20c03dd9d64a827aba57775bed0d3">+298/-280</a></td> </tr> <tr> <td><strong>Dockerfile.agent-gateway</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-332bc81a932ae08efa711a71b60fe0954d99bf17ebdab00a3baaa177a44de8b0">+94/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>Dockerfile.core-elx</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-5ec7a971285669999af442a0c7f141c34f7fd9180257307f5c4ed12f789a2182">+108/-0</a>&nbsp; </td> </tr> <tr> <td><strong>Dockerfile.poller-elx</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-966cdcf55b7defa4f9fd83a15c55a0e1106430e6e20926c1181187e95ed8d87e">+95/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>Dockerfile.sync</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-0227933b9961fd553af1d229e89d71a0271fdc475081bbcef49b587941af1eda">+0/-95</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>Dockerfile.web-ng</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-92d43af1965575d56c3380ecc8a81024aac2ff36f039ec2d3839e9fc7852bc10">+6/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> 
<tr> <td><strong>agent-minimal.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-1f09fad94636c90373af8e270f6ba0332ae4f4d1df50a4909729280a3a9691e6">+6/-6</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>agent.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-5d33fe703515d03076d31261ecf946e9c6fc668cf5bf65099d49b670739e455e">+5/-20</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>agent.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-008f2216f159a9bd5db9cc90baaf6f1e64487df7af05b56ab3b9d6c4946aa95f">+7/-10</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>.gitkeep</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-d72c41aab2d6f2c230a4340dfefe7917cdd12bed942c825aa0d4c9875a637bac">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>datasvc.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-3f2719d3dbfe042e8383739e3c78e74e5f851a44e5e46bea8e79c4b79fdcc34f">+3/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>datasvc.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-3a45619e57f1e6e9a31486ec7fffb33ef246e271f82bac272ee0a946b88da70a">+14/-1</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>db-event-writer.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-7a33f95f7545499abf0ed9fc91b58499ab209639e4885019579c959583fc7496">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>entrypoint-certs.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-83d6800b184a5233c66c69766286b0a60fece1bc64addb112d9f8dc019437f05">+12/-8</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>entrypoint-sync.sh</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-9d5620b8e6833309dbafb8ee6b6b75c3b942d163c3fe7f1a9827958b2d640265">+0/-96</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>flowgger.docker.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-824f8797b418d4b9f5ea41e4a3741a0ed64b881f343072464489a76b7ea01008">+2/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>generate-certs.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-8298241543b4744a6ac7780c760ac5b5a0a87ba62de19c8612ebe1aba0996ebd">+217/-12</a></td> </tr> <tr> <td><strong>nats.docker.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-06f2012494f428fe1bfb304972061c2094e0d99da88ba9af6914f7776872e6eb">+16/-160</a></td> </tr> <tr> <td><strong>netflow-consumer.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-f15920e8498a24f71ce3eec4f48fe8fefbb1765a90362998af779a660fcef9e1">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>otel.docker.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-d4af38790e3657b7589cd37a7539d5308b032f11caba7aa740ddc86bf99f4415">+2/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>pg_hba.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-7bd5f7292054916c7e5997f4c84ac9ec07d4c945621a48936c2aed0575fb96eb">+9/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>pg_ident.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-e7b8ce062e32c61fdc3bcc9e525c1f1df1c8008fbc02b11409e58c67baa17cc5">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>bootstrap-compose-spire.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-ca219a124d4c95ee7995764d7e0c322b4bfe59e357b7bcb42bc5d7c8b9b0af0d">+0/-1</a>&nbsp; &nbsp; &nbsp; 
</td> </tr> <tr> <td><strong>ssl_dist.core.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-08d49d8621b581d1a9aa5c456f61e8c5774e021083c982cbb514019f915a1701">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>ssl_dist.gateway.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-4a43a8290d45ac68592000e7ef51afe78b4213090155bd42aafb46e66130f7ae">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>ssl_dist.poller.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-3373e5d02bd9a19c75819948cf829086dc9c02f41f30c78759beb7038c428d3e">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>ssl_dist.web.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-cef5be462ddb059fdfdeb9fd7c5cd70e656c4cd8b6ae1fe3fe312557b3da80ac">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>sync.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-4237fcee4f33a230abf28e12e8d4823499d163759cd1ff124fec1c62faa8b8b4">+0/-71</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>sync.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-c652c07f7127be5b2932d92e6ef4c7448c544d1f3095cb96a03294fa58fd3c4c">+0/-75</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>trapd.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-1ab1a0e03e63bc02e0ef31992a7187a377927272ed2060150b40d44cc0ea3357">+2/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>zen.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-e060a3164cdc2746e0d9ad000fcf43c4bcdb05f4a41c586d7220e2ff2a7df01d">+2/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-0e4db31c224a8f72ae8e870a849e38a59d74a2c7f7b04347b0b3eb07e20c5a80">+80/-42</a>&nbsp; </td> </tr> <tr> <td><strong>push_targets.bzl</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-4af33fe62caba04b6d479589c16cfb85babc39bae5c92595d4d4e31660738513">+2/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>SBOM.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-1747f3c15088e6b4a3ff7fb1fdec9c2ef15ac865ee991b26ace892deba984eae">+0/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>agents.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-af8d04277f2353629065b0cc5fad3e44bd3e7c20339bd125e0812104bdbeff28">+8/-9</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>architecture.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-90abd06467420fd89391fd1a4d75ceb1f6a9381de4d13a95fffe606abff38d37">+100/-26</a></td> </tr> <tr> <td><strong>armis.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-3e5c728550851ae1cfb8754eb8a1566d8084ebc58646c73f7c29ac6f7d60adbf">+7/-21</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>ash-api.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-1cb48a12148688dc640f42675d0b3c2458ecc40175f4ae7eef4b07bdc2ede3a3">+305/-0</a>&nbsp; </td> </tr> <tr> <td><strong>ash-authentication.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-e3d22dc8d6917661b980dd9c7e08bdc07fad4d225b10c60885b091a9cdd20425">+244/-0</a>&nbsp; </td> </tr> <tr> <td><strong>ash-authorization.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-72abeea2439e2281b6f998db2d11a73fe3ab393898125c466f0b16e7a49d3088">+283/-0</a>&nbsp; </td> </tr> <tr> <td><strong>ash-domains.md</strong></td> 
<td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-bc78325c5ea4c6839212536a43336a231175450738af14f4173be663acf2fe49">+223/-0</a>&nbsp; </td> </tr> <tr> <td><strong>ash-migration-guide.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-e730373f07f6656d00327de02b4aa64e13809aadb3bf5478a4c60041ee431ad8">+339/-0</a>&nbsp; </td> </tr> <tr> <td><strong>auth-configuration.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-92b481e02c4edb9a178f51f2a8cb94108c7bcb1514a4629d8c67d8f20a70e852">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>cluster.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-0d2e32a530d31d08b311213cd036c14e611d2f9a31e9cbcaa97f02b53edeeb44">+2/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>discovery.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-a7f02a003a0e9aaf09d6aa072689ca8ad8b7b298ee014c6ffb878d2a5ecbab28">+19/-20</a>&nbsp; </td> </tr> <tr> <td><strong>docker-setup.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-8604269dffb3ce4133e48cab374ca8e97745d0efbdef67cad792aeb5945fe5ec">+57/-14</a>&nbsp; </td> </tr> <tr> <td><strong>edge-agents.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-df8f4b6b6cd8fb926f9204b2bab85ec4f393286d1b25d8a62d2debd357dd651d">+253/-0</a>&nbsp; </td> </tr> <tr> <td><strong>edge-onboarding.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-f32fbefcc4dd23b0f2146015630a411099012e8bf56ea7b9d52ae8b2a83ac3e3">+316/-273</a></td> </tr> <tr> <td><strong>identity_drift_monitoring.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-963ddfdc868c8777f9004744f5d2c5ec3ce585b81d1c0de2269b9d7a73942a9f">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> 
<td><strong>installation.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-19fd876fb56da88a1becb80817b52db1a40e8491641d547913b93bb7ecda3d22">+13/-14</a>&nbsp; </td> </tr> <tr> <td><strong>intro.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-5ddf6bcf52fe8ef8dfc865f5f2295616d5c073d899766372e5929cc0fd479ffd">+5/-5</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>kv-configuration.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-24c06e4806fa0ec5508ae5aeab9fbe82be7122af9933e6b7970032576cf5f6f8">+3/-3</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>netbox.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-ee0f0a76b85416a7c6e086eca04b7991c176075816b51f442ddb94ea6506dc68">+4/-23</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>netflow.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-a64260e87bce20ed3bd93c7b64dac6ff175cdc7c67071276601aa9f6d370d634">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>onboarding-review-2025.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-1728c73ae8e684d13b7b166b90cbfc5363c0794dcd63f7fa3733e1c624940e98">+0/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>rperf-monitoring.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-70d068dec062c5ed771e7294401c52d55e85e279e9eee4f1084ca06c23b2fde8">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>compose-mtls-sysmonosx.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-6faf517c376b3509d0b14235f1b674a825f0399cd8a20791d8caf3b6ef16525b">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>docker-compose-login-500.md</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-d3ec1ba25f119cff4efcb5767a1afe923b92fd5727453c6276030bc8e9923f05">+3/-3</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>security-architecture.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-49e33c2fc609030af8a64af34016a22e9e86c0f3781acec5a9e6d0f7ccf5dc09">+238/-0</a>&nbsp; </td> </tr> <tr> <td><strong>self-signed.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-92747f4c31b991032f6920205d2f9751d89568323d4c65f5d1bd2406beb7a28a">+32/-107</a></td> </tr> <tr> <td><strong>service-port-map.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-81eabc52b9d300cbb6414b5f1f55cecb85adfe891bdb58cf138ce4dc115c48ee">+3/-5</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>snmp.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-5d2a72888517a4c731287280253a29cedbcbe595f1ce57c2fa14aef37c62595a">+2/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>spiffe-identity.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-6a3f0bdce99eb15d1075805cb5101594398f81ad92aa4eea9b27bb7545098db1">+52/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>srql-service.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-9b2d1be11daf488059f9debfbdd2b5ebe51286a7e2c8b0d134112a6c51132ba6">+2/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>sync.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-a498de557ae883d480609bf07629a7eee78ec86b5cf5ee966508e75713973a9e">+75/-482</a></td> </tr> <tr> <td><strong>syslog.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-8efe898141f64a74b80133318b56ed95d03703ac95b47cebc95978b7ef761a4a">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> 
<td><strong>troubleshooting-guide.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-343b87dd0b430b3f99b16d32200c353bb6e3d7bbb185da3c1b3effc3a03e7f2a">+53/-2</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>web-ui.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-7e4dc17ccc5b2fd5773b7e6c9583dbe0f597ab9ae21849a2ec0a31554a6337a3">+31/-106</a></td> </tr> <tr> <td><strong>datasvc.ex</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-00c928e77e8bf2f741a35e42e304acd2f2d4b8546532a5125a717924f2bde5dc">+32/-2</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-aa8f6a4e89041a3423f2299efab4bb1b5d13358d445b3c6e35cc46f527dbaa7b">+30/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.exs</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-abaac0167f05b505cf4bc5c5d375ae769001d8cc20120919941748c79681bbae">+13/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>dev.exs</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-378e53cb66373073730a5d7ceba4973777eb84ae419c8a38472e3f69c3056fe9">+4/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>prod.exs</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-5fdc859dfda96a02326392f61218325913f22f9f0c75a1276ee940d5db9fc62b">+4/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>runtime.exs</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-842568fafa717a8064203543d674517fd28ad7dd2a4d3f0f157d274cfda4f18b">+209/-0</a>&nbsp; </td> </tr> <tr> <td><strong>test.exs</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-00c8b58b49024f5d6f37ba40ac310a571e5a1b47fc16819423dc42f3035bc175">+20/-0</a>&nbsp; &nbsp; </td> </tr> <tr> 
<td><strong>agent_client.ex</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-82bf77f47ac3b0001b036f0f22944d07bf894f1ee07b912e0ecab959eb8d5c53">+472/-0</a>&nbsp; </td> </tr> <tr> <td><strong>Additional files not shown</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2225/files#diff-2f328e4cd8dbe3ad193e49d92bcf045f47a6b72b1e9487d366f6b8288589b4ca"></a></td> </tr> </table></details></td></tr></tbody></table> </details> ___
qodo-code-review[bot] commented 2026-01-05 21:18:37 +00:00 (Migrated from github.com)

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2225#issuecomment-3712121318
Original created: 2026-01-05T21:18:37Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Sensitive info exposure

Description: The new logging emits potentially sensitive operational details (e.g.,
AllowedClientIdentities, resolver/system creds file paths, and operator name) which could
aid attackers or leak internal identity/credential placement if logs are exposed; similar
creds-path logging was also added in cmd/otel/src/nats_output.rs and
cmd/otel/src/setup.rs.
main.go [82-141]

Referred Code
// Initialize NATS account service
// This service is stateless - it only holds the operator key for signing operations.
// Account state (seeds, JWTs) is stored by Elixir in CNPG with AshCloak encryption.
// The service can start without an operator and bootstrap later via gRPC.
var natsAccountServer *datasvc.NATSAccountServer
if cfg.NATSOperator != nil {
	operator, opErr := accounts.NewOperator(cfg.NATSOperator)
	if opErr != nil {
		// Operator not available yet - that's okay, bootstrap will be called later
		log.Printf("NATS account service starting without operator (will bootstrap later): %v", opErr)
		natsAccountServer = datasvc.NewNATSAccountServer(nil)
	} else {
		natsAccountServer = datasvc.NewNATSAccountServer(operator)
		log.Printf("NATS account service initialized with operator %s", operator.Name())
	}

	natsAccountServer.SetAllowedClientIdentities(cfg.NATSOperator.AllowedClientIdentities)
	if len(cfg.NATSOperator.AllowedClientIdentities) == 0 {
		log.Printf("Warning: no allowed client identities configured for NATS account service; requests will be rejected")
	} else {
		log.Printf("NATS account service allowed identities: %v", cfg.NATSOperator.AllowedClientIdentities)


 ... (clipped 39 lines)
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Potential panic unwrap: New config parsing uses as_str().unwrap() which can panic on unexpected config types
rather than returning a handled error with context.

Referred Code
let creds_file = cfg.lookup("output.nats_creds_file").and_then(|v| {
    let value = v.as_str().unwrap().trim();
    if value.is_empty() {
        None
    } else {
        Some(PathBuf::from(value))
    }
});

Learn more about managing compliance generic rules or creating your own custom rules
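The flagged pattern can panic because `as_str()` returns `None` for a non-string config value and the `unwrap()` aborts the process. A minimal panic-free sketch of the same lookup, using an illustrative `Value` enum as a stand-in for flowgger's real config type (the type and `lookup` shape here are assumptions, not the actual API):

```rust
use std::path::PathBuf;

// Illustrative stand-in for the config value type; the real flowgger
// config API differs, this only demonstrates the Option-chaining pattern.
#[derive(Debug)]
enum Value {
    Str(String),
    Int(i64),
}

impl Value {
    fn as_str(&self) -> Option<&str> {
        match self {
            Value::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }
}

// Instead of `v.as_str().unwrap()`, keep chaining the Option so a
// non-string value yields None rather than a panic.
fn creds_file(lookup: Option<&Value>) -> Option<PathBuf> {
    lookup
        .and_then(|v| v.as_str()) // None if the value is not a string
        .map(str::trim)
        .filter(|s| !s.is_empty()) // treat an empty string as unset
        .map(PathBuf::from)
}

fn main() {
    let good = Value::Str("  /etc/nats/user.creds ".into());
    let bad = Value::Int(42);
    assert_eq!(
        creds_file(Some(&good)),
        Some(PathBuf::from("/etc/nats/user.creds"))
    );
    assert_eq!(creds_file(Some(&bad)), None); // non-string no longer panics
    assert_eq!(creds_file(None), None);
}
```

A variant that returns `Result` with a descriptive message ("output.nats_creds_file must be a string") would additionally satisfy the "error with context" part of the finding.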

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit context: The newly added NATS account service initialization implies privileged
signing/bootstrapping operations but the diff does not show corresponding audit logging
that captures who performed such actions and their outcomes.

Referred Code
// Initialize NATS account service
// This service is stateless - it only holds the operator key for signing operations.
// Account state (seeds, JWTs) is stored by Elixir in CNPG with AshCloak encryption.
// The service can start without an operator and bootstrap later via gRPC.
var natsAccountServer *datasvc.NATSAccountServer
if cfg.NATSOperator != nil {
	operator, opErr := accounts.NewOperator(cfg.NATSOperator)
	if opErr != nil {
		// Operator not available yet - that's okay, bootstrap will be called later
		log.Printf("NATS account service starting without operator (will bootstrap later): %v", opErr)
		natsAccountServer = datasvc.NewNATSAccountServer(nil)
	} else {
		natsAccountServer = datasvc.NewNATSAccountServer(operator)
		log.Printf("NATS account service initialized with operator %s", operator.Name())
	}

	natsAccountServer.SetAllowedClientIdentities(cfg.NATSOperator.AllowedClientIdentities)
	if len(cfg.NATSOperator.AllowedClientIdentities) == 0 {
		log.Printf("Warning: no allowed client identities configured for NATS account service; requests will be rejected")
	} else {
		log.Printf("NATS account service allowed identities: %v", cfg.NATSOperator.AllowedClientIdentities)


 ... (clipped 57 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Sensitive config in logs: The new logging statements print potentially sensitive security configuration details
(e.g., allowed client identities and credential/config file paths) which may be considered
sensitive operational information in some environments.

Referred Code
// Initialize NATS account service
// This service is stateless - it only holds the operator key for signing operations.
// Account state (seeds, JWTs) is stored by Elixir in CNPG with AshCloak encryption.
// The service can start without an operator and bootstrap later via gRPC.
var natsAccountServer *datasvc.NATSAccountServer
if cfg.NATSOperator != nil {
	operator, opErr := accounts.NewOperator(cfg.NATSOperator)
	if opErr != nil {
		// Operator not available yet - that's okay, bootstrap will be called later
		log.Printf("NATS account service starting without operator (will bootstrap later): %v", opErr)
		natsAccountServer = datasvc.NewNATSAccountServer(nil)
	} else {
		natsAccountServer = datasvc.NewNATSAccountServer(operator)
		log.Printf("NATS account service initialized with operator %s", operator.Name())
	}

	natsAccountServer.SetAllowedClientIdentities(cfg.NATSOperator.AllowedClientIdentities)
	if len(cfg.NATSOperator.AllowedClientIdentities) == 0 {
		log.Printf("Warning: no allowed client identities configured for NATS account service; requests will be rejected")
	} else {
		log.Printf("NATS account service allowed identities: %v", cfg.NATSOperator.AllowedClientIdentities)


 ... (clipped 39 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
qodo-code-review[bot] commented 2026-01-05 21:20:36 +00:00 (Migrated from github.com)

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2225#issuecomment-3712126644
Original created: 2026-01-05T21:20:36Z

PR Code Suggestions

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Use correct attribute for timestamp

Correct the attribute name from :last_seen_time to :last_seen_at in the
persist_identity function to prevent device update failures.

elixir/serviceradar_core/lib/serviceradar/actors/device.ex [503-523]

 defp persist_identity(tenant_id, device_id, identity) do
   case DeviceResource.get_by_uid(device_id, tenant: tenant_id, authorize?: false) do
     {:ok, device} ->
       updates =
         identity
         |> Map.take([:name, :hostname, :ip, :mac, :vendor_name, :model, :os])
-        |> Map.put(:last_seen_time, DateTime.utc_now())
+        |> Map.put(:last_seen_at, DateTime.utc_now())
 
       case Ash.update(device, updates, authorize?: false) do
         {:ok, _} -> :ok
         {:error, error} -> {:error, error}
       end
 
     {:error, %Ash.Error.Query.NotFound{}} ->
       # Device doesn't exist yet, nothing to persist
       :ok
 
     {:error, error} ->
       {:error, error}
   end
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9


Why: This suggestion identifies a critical bug where an incorrect attribute name (:last_seen_time instead of :last_seen_at) would cause all device identity updates to fail. Fixing this is essential for the core functionality of the device actor.

High
Fix race condition in shutdown logic
Suggestion Impact: The commit introduces a shared shutdownCtx with a timeout and uses it in the shutdown goroutine and the final select (replacing the separate timer), matching the suggested approach to avoid premature timeout races. It also extends the idea by passing shutdownCtx into pushLoop.Stop and guarding the errChan wait with shutdownCtx.

code diff:

 		// Bound shutdown so the process can't hang forever (includes pushLoop.Stop()).
 		const shutdownTimeout = 10 * time.Second
+		shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), shutdownTimeout)
+		defer shutdownCancel()
 		shutdownDone := make(chan struct{})
 
 		go func() {
 			defer close(shutdownDone)
 
 			cancel()
-			pushLoop.Stop()
-			<-errChan
-
-			stopCtx, stopCancel := context.WithTimeout(context.Background(), shutdownTimeout)
-			defer stopCancel()
-			if err := server.Stop(stopCtx); err != nil {
+			if err := pushLoop.Stop(shutdownCtx); err != nil {
+				log.Warn().Err(err).Msg("Push loop stop did not complete before timeout")
+			}
+			select {
+			case <-errChan:
+			case <-shutdownCtx.Done():
+				return
+			}
+
+			if err := server.Stop(shutdownCtx); err != nil {
 				log.Error().Err(err).Msg("Error stopping agent services")
 			}
 		}()
-
-		timer := time.NewTimer(shutdownTimeout)
-		defer timer.Stop()
 
 		select {
 		case <-shutdownDone:
-		case <-timer.C:
+		case <-shutdownCtx.Done():
 			return fmt.Errorf("%w after %s", errShutdownTimeout, shutdownTimeout)

Refactor the shutdown logic to use a single parent context with a timeout for
all shutdown operations, preventing a race condition where the main goroutine
could time out prematurely.

cmd/agent/main.go [201-226]

 		// Bound shutdown so the process can't hang forever (includes pushLoop.Stop()).
 		const shutdownTimeout = 10 * time.Second
+		shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), shutdownTimeout)
+		defer shutdownCancel()
 		shutdownDone := make(chan struct{})
 
 		go func() {
 			defer close(shutdownDone)
 
 			cancel()
 			pushLoop.Stop()
 			<-errChan
 
-			stopCtx, stopCancel := context.WithTimeout(context.Background(), shutdownTimeout)
-			defer stopCancel()
-			if err := server.Stop(stopCtx); err != nil {
+			// Use the shutdownCtx for stopping the server to respect the overall timeout.
+			if err := server.Stop(shutdownCtx); err != nil {
 				log.Error().Err(err).Msg("Error stopping agent services")
 			}
 		}()
 
-		timer := time.NewTimer(shutdownTimeout)
-		defer timer.Stop()
-
 		select {
 		case <-shutdownDone:
-		case <-timer.C:
+			// Shutdown completed gracefully.
+		case <-shutdownCtx.Done():
 			return fmt.Errorf("%w after %s", errShutdownTimeout, shutdownTimeout)
 		}

[Suggestion processed]

Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies and fixes a race condition in the shutdown logic, ensuring all shutdown operations are bound by a single, shared timeout, which improves the agent's robustness.

Medium
Ensure state transitions are atomic
Suggestion Impact: The commit removed `require_atomic? false` from multiple state-transition update actions, including `establish_connection` (and others such as connection_failed, degrade, and restore_health), aligning those transitions with atomic execution.

code diff:

@@ -282,7 +307,6 @@
     update :establish_connection do
       description "Mark agent as connected (from connecting state)"
       accept [:poller_id]
-      require_atomic? false
 
       change transition_state(:connected)
       change set_attribute(:is_healthy, true)
@@ -293,7 +317,6 @@
 
     update :connection_failed do
       description "Mark connection attempt as failed"
-      require_atomic? false
 
       change transition_state(:disconnected)
       change set_attribute(:modified_time, &DateTime.utc_now/0)
@@ -302,7 +325,6 @@
 
     update :degrade do
       description "Mark agent as degraded (connected but unhealthy)"
-      require_atomic? false
 
       change transition_state(:degraded)
       change set_attribute(:is_healthy, false)
@@ -312,7 +334,6 @@
 
     update :restore_health do
       description "Restore agent health (from degraded to connected)"
-      require_atomic? false
 
       change transition_state(:connected)
       change set_attribute(:is_healthy, true)
@@ -322,7 +343,6 @@
 
     update :lose_connection do
       description "Mark agent as disconnected (connection lost)"
-      require_atomic? false
 
       change transition_state(:disconnected)
       change set_attribute(:poller_id, nil)
@@ -332,7 +352,6 @@
 
     update :reconnect do
       description "Start reconnection process (from disconnected to connecting)"
-      require_atomic? false
 
       change transition_state(:connecting)
       change set_attribute(:modified_time, &DateTime.utc_now/0)
@@ -342,7 +361,6 @@
     update :mark_unavailable do
       description "Mark agent as unavailable (admin action)"
       argument :reason, :string
-      require_atomic? false
 
       change transition_state(:unavailable)
       change set_attribute(:is_healthy, false)
@@ -352,7 +370,6 @@
 
     update :recover do
       description "Start recovery process (from unavailable to connecting)"
-      require_atomic? false
 
       change transition_state(:connecting)
       change set_attribute(:modified_time, &DateTime.utc_now/0)
@@ -363,7 +380,6 @@
     update :connect do
       description "Mark agent as connected to a poller (legacy - use establish_connection)"
       accept [:poller_id]
-      require_atomic? false
 
       change transition_state(:connected)
       change set_attribute(:is_healthy, true)
@@ -374,7 +390,6 @@
 
     update :disconnect do
       description "Mark agent as disconnected (legacy - use lose_connection)"
-      require_atomic? false
 
       change transition_state(:disconnected)
       change set_attribute(:poller_id, nil)
@@ -384,7 +399,6 @@
 
     update :mark_unhealthy do
       description "Mark agent as unhealthy (legacy - use degrade)"
-      require_atomic? false
 
       change transition_state(:degraded)
       change set_attribute(:is_healthy, false)

Remove require_atomic? false from all state-transition actions like
establish_connection to ensure atomicity and prevent data inconsistency. State
transitions are critical and should be transactional.

elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex [270-292]

 update :heartbeat do
   description "Update last_seen_time and health status (for connected agents)"
   accept [:is_healthy, :capabilities]
   require_atomic? false
 
   change set_attribute(:last_seen_time, &DateTime.utc_now/0)
   change set_attribute(:modified_time, &DateTime.utc_now/0)
 end
 
 # State machine transition actions
 # Each action includes PublishStateChange to emit NATS events
 
 update :establish_connection do
   description "Mark agent as connected (from connecting state)"
   accept [:poller_id]
-  require_atomic? false
 
   change transition_state(:connected)
   change set_attribute(:is_healthy, true)
   change set_attribute(:last_seen_time, &DateTime.utc_now/0)
   change set_attribute(:modified_time, &DateTime.utc_now/0)
   change {ServiceRadar.Infrastructure.Changes.PublishStateChange, entity_type: :agent, new_state: :connected}
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies that using require_atomic? false for state transition actions can lead to data inconsistency, which is a significant correctness issue. Ensuring atomicity for these critical operations is crucial for maintaining a valid state.

Medium
Combine create and update into one
Suggestion Impact: The commit removed the separate update call that set token fields after creation, and instead applies the token attributes directly on the create changeset (via force_change_attribute) before a single Ash.create, making the operation effectively atomic and avoiding inconsistent intermediate state.

code diff:

+    changeset =
+      OnboardingPackage
+      |> Ash.Changeset.for_create(:create, create_attrs,
+        actor: actor,
+        authorize?: authorize?,
+        tenant: tenant
+      )
+      |> Ash.Changeset.force_change_attribute(:join_token_ciphertext, join_token_ciphertext)
+      |> Ash.Changeset.force_change_attribute(:join_token_expires_at, join_expires)
+      |> Ash.Changeset.force_change_attribute(:download_token_hash, download_token_hash)
+      |> Ash.Changeset.force_change_attribute(:download_token_expires_at, download_expires)
+
+    case Ash.create(changeset) do
       {:ok, package} ->
-        # Update with token fields
-        token_attrs = %{
-          join_token_ciphertext: join_token_ciphertext,
-          join_token_expires_at: join_expires,
-          download_token_hash: download_token_hash,
-          download_token_expires_at: download_expires
-        }
-
-        case package
-             |> Ash.Changeset.for_update(:update_tokens, token_attrs,
-               actor: actor,
-               authorize?: authorize?,
-               tenant: tenant
-             )
-             |> Ash.update() do
-          {:ok, updated_package} ->
-            # Record creation event
-            OnboardingEvents.record(package.id, :created,
-              actor: get_actor_name(actor),
-              source_ip: source_ip
-            )
-
-            {:ok,
-             %{
-               package: updated_package,
-               join_token: join_token,
-               download_token: download_token
-             }}
-
-          {:error, error} ->
-            {:error, error}
-        end
+        # Record creation event
+        OnboardingEvents.record(package.id, :created,
+          actor: get_actor_name(actor),
+          source_ip: source_ip
+        )
+
+        {:ok,
+         %{
+           package: package,
+           join_token: join_token,
+           download_token: download_token
+         }}

Refactor the create function to be atomic by combining the "create-then-update"
logic into a single operation. Pass all attributes, including tokens, to the
:create action to prevent data inconsistency.

elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex [156-199]

-case OnboardingPackage
-     |> Ash.Changeset.for_create(:create, create_attrs,
-       actor: actor,
-       authorize?: authorize?,
-       tenant: tenant
-     )
-     |> Ash.create() do
+# Encrypt join token and hash download token
+join_token_ciphertext = Crypto.encrypt(join_token)
+download_token_hash = Crypto.hash_token(download_token)
+
+# Prepare attributes for Ash create
+create_attrs =
+  attrs
+  |> Map.put(:created_by, get_actor_name(actor))
+  |> Map.put(:join_token_ciphertext, join_token_ciphertext)
+  |> Map.put(:join_token_expires_at, join_expires)
+  |> Map.put(:download_token_hash, download_token_hash)
+  |> Map.put(:download_token_expires_at, download_expires)
+
+case OnboardingPackage.create(create_attrs, actor: actor, authorize?: authorize?, tenant: tenant) do
   {:ok, package} ->
-    # Update with token fields
-    token_attrs = %{
-      join_token_ciphertext: join_token_ciphertext,
-      join_token_expires_at: join_expires,
-      download_token_hash: download_token_hash,
-      download_token_expires_at: download_expires
-    }
+    # Record creation event
+    OnboardingEvents.record(package.id, :created,
+      actor: get_actor_name(actor),
+      source_ip: source_ip
+    )
 
-    case package
-         |> Ash.Changeset.for_update(:update_tokens, token_attrs,
-           actor: actor,
-           authorize?: authorize?,
-           tenant: tenant
-         )
-         |> Ash.update() do
-      {:ok, updated_package} ->
-        # Record creation event
-        OnboardingEvents.record(package.id, :created,
-          actor: get_actor_name(actor),
-          source_ip: source_ip
-        )
-
-        {:ok,
-         %{
-           package: updated_package,
-           join_token: join_token,
-           download_token: download_token
-         }}
-
-      {:error, error} ->
-        {:error, error}
-    end
+    {:ok,
+     %{
+       package: package,
+       join_token: join_token,
+       download_token: download_token
+     }}
 
   {:error, error} ->
     {:error, error}
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: The suggestion correctly points out a non-atomic "create-then-update" pattern that could lead to inconsistent data. Consolidating this into a single atomic create operation is a critical improvement for data integrity.

Medium
Prevent race condition during updates
Suggestion Impact: The commit removed the separate conditional DeviceAliasState.confirm/2 call and instead passed confirm_threshold into DeviceAliasState.record_sighting/3, implying the confirmation decision is now handled within the sighting recording operation (moving toward the suggested atomic consolidation), though it did not introduce the specifically suggested atomic_sighting_and_confirm function or the suggested return-shape handling.

code diff:

     # Record the sighting
-    case DeviceAliasState.record_sighting(existing, %{}, actor: actor) do
+    case DeviceAliasState.record_sighting(
+           existing,
+           %{confirm_threshold: confirm_threshold},
+           actor: actor
+         ) do
       {:ok, updated} ->
-        # Check if we should confirm
-        if updated.state == :detected and updated.sighting_count >= confirm_threshold do
-          DeviceAliasState.confirm(updated, actor: actor)
-        end

Prevent a race condition in handle_existing_alias/4 by consolidating the logic
for recording a sighting and confirming the state into a single, atomic action
on the DeviceAliasState resource.

elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex [438-474]

 defp handle_existing_alias(existing, _update, confirm_threshold, actor) do
   alias ServiceRadar.Identity.DeviceAliasState
 
-  # Record the sighting
-  case DeviceAliasState.record_sighting(existing, %{}, actor: actor) do
-    {:ok, updated} ->
-      # Check if we should confirm
-      if updated.state == :detected and updated.sighting_count >= confirm_threshold do
-        DeviceAliasState.confirm(updated, actor: actor)
-      end
+  # Atomically record sighting and potentially confirm
+  case DeviceAliasState.atomic_sighting_and_confirm(existing, %{confirm_threshold: confirm_threshold}, actor: actor) do
+    {:ok, %{data: %{state_changed: true, new_state: new_state, sighting_count: count}}} ->
+      # Generate event for state change
+      %{
+        device_id: existing.device_id,
+        partition: existing.partition || "",
+        action: "alias_state_changed",
+        reason: "state_transition",
+        timestamp: DateTime.utc_now(),
+        severity: "Info",
+        level: 6,
+        metadata: %{
+          "alias_type" => to_string(existing.alias_type),
+          "alias_value" => existing.alias_value,
+          "previous_state" => to_string(existing.state),
+          "new_state" => to_string(new_state),
+          "sighting_count" => to_string(count)
+        }
+      }
 
-      # Generate event if state changed
-      if updated.state != existing.state do
-        %{
-          device_id: existing.device_id,
-          partition: existing.partition || "",
-          action: "alias_state_changed",
-          reason: "state_transition",
-          timestamp: DateTime.utc_now(),
-          severity: "Info",
-          level: 6,
-          metadata: %{
-            "alias_type" => to_string(existing.alias_type),
-            "alias_value" => existing.alias_value,
-            "previous_state" => to_string(existing.state),
-            "new_state" => to_string(updated.state),
-            "sighting_count" => to_string(updated.sighting_count)
-          }
-        }
-      else
-        nil
-      end
+    {:ok, _} ->
+      # Sighting recorded, but no state change
+      nil
 
     {:error, _reason} ->
       nil
   end
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a race condition in handle_existing_alias/4 due to non-atomic read-modify-write operations. Consolidating the sighting and confirmation logic into a single atomic action is crucial to prevent incorrect state transitions.

Medium
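The check-then-act hazard described above is not specific to Ash: it exists wherever a counter is read in one step and a threshold is acted on in another. A minimal Go sketch (the tracker type and confirmThreshold are illustrative, not ServiceRadar code) shows the fix in miniature: make the increment and the confirmation decision one critical section, so concurrent sighting recorders can never double-confirm:

```go
package main

import (
	"fmt"
	"sync"
)

const confirmThreshold = 3

// tracker guards the sighting count and the confirmed flag under one lock,
// so "record a sighting" and "should we confirm?" happen atomically.
type tracker struct {
	mu        sync.Mutex
	sightings int
	confirmed bool
}

// recordSighting increments the count and, in the same critical section,
// performs the one-time confirmation at the threshold. No interleaving
// between the increment and the check is possible.
func (t *tracker) recordSighting() (confirmedNow bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.sightings++
	if !t.confirmed && t.sightings >= confirmThreshold {
		t.confirmed = true
		return true
	}
	return false
}

func main() {
	t := &tracker{}
	var wg sync.WaitGroup
	confirms := make(chan bool, 10)
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			confirms <- t.recordSighting()
		}()
	}
	wg.Wait()
	close(confirms)

	total := 0
	for c := range confirms {
		if c {
			total++
		}
	}
	fmt.Println("confirmations:", total) // always exactly 1
}
```

Splitting the sequence into record_sighting followed by a separate confirm reproduces the bug: two callers can both observe the threshold before either confirms.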
Conditionally enable SSL for gRPC

Conditionally enable SSL for the gRPC connection based on an environment
variable, allowing for both secure and insecure deployments.

elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [487-518]

 defp create_fresh_channel do
   host = System.get_env("DATASVC_HOST", "datasvc")
   port = String.to_integer(System.get_env("DATASVC_PORT", "50057"))
+  ssl_enabled = System.get_env("DATASVC_SSL", "false") in ["true", "1", "yes"]
   cert_dir = System.get_env("DATASVC_CERT_DIR", "/etc/serviceradar/certs")
   cert_name = System.get_env("DATASVC_CERT_NAME", "core")
   server_name = System.get_env("DATASVC_SERVER_NAME", "datasvc.serviceradar")
 
   endpoint = "#{host}:#{port}"
 
-  ssl_opts = [
-    cacertfile: String.to_charlist(Path.join(cert_dir, "root.pem")),
-    certfile: String.to_charlist(Path.join(cert_dir, "#{cert_name}.pem")),
-    keyfile: String.to_charlist(Path.join(cert_dir, "#{cert_name}-key.pem")),
-    verify: :verify_peer,
-    server_name_indication: String.to_charlist(server_name)
-  ]
+  cred_opts =
+    if ssl_enabled do
+      ssl_opts = [
+        cacertfile: String.to_charlist(Path.join(cert_dir, "root.pem")),
+        certfile: String.to_charlist(Path.join(cert_dir, "#{cert_name}.pem")),
+        keyfile: String.to_charlist(Path.join(cert_dir, "#{cert_name}-key.pem")),
+        verify: :verify_peer,
+        server_name_indication: String.to_charlist(server_name)
+      ]
 
-  connect_opts = [
-    cred: GRPC.Credential.new(ssl: ssl_opts),
-    adapter_opts: [connect_timeout: 5_000]
-  ]
+      [cred: GRPC.Credential.new(ssl: ssl_opts)]
+    else
+      []
+    end
+
+  connect_opts =
+    cred_opts
+    |> Keyword.put(:adapter_opts, [connect_timeout: 5_000])
 
   case GRPC.Stub.connect(endpoint, connect_opts) do
     {:ok, channel} ->
       Logger.debug("Created fresh NATS account gRPC channel to #{endpoint}")
       {:ok, channel}
 
     {:error, reason} ->
       Logger.error("Failed to create NATS account gRPC channel: #{inspect(reason)}")
       {:error, {:connection_failed, reason}}
   end
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6


Why: The suggestion correctly points out that the function hardcodes an mTLS connection, which contradicts the module's documentation stating it uses the same configuration as DataService.Client. Applying this change improves flexibility and aligns the component's behavior with other parts of the system, making it usable in non-SSL environments.

Low
Prevent misleading operator config log

Add a check to ensure operatorConfigPath is not empty before attempting to write
the NATS operator configuration to prevent misleading log messages.

cmd/data-services/main.go [117-130]

 		if operatorConfigPath != "" || resolverPath != "" {
 			natsAccountServer.SetResolverPaths(operatorConfigPath, resolverPath)
 			log.Printf("NATS resolver paths configured: operator=%s resolver=%s", operatorConfigPath, resolverPath)
 
 			// If operator is already initialized, write the config now
 			// This ensures config files exist even when datasvc restarts with an existing operator
-			if operator != nil {
+			if operator != nil && operatorConfigPath != "" {
 				if err := natsAccountServer.WriteOperatorConfig(); err != nil {
 					log.Printf("Warning: failed to write initial operator config: %v", err)
 				} else {
 					log.Printf("Wrote initial operator config to %s", operatorConfigPath)
 				}
 			}
 		}
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies a condition that leads to a misleading log message and proposes a simple fix, improving the accuracy of the application's logging.

Low
Ensure deterministic record selection order
Suggestion Impact: The commit adds `sorted_fallback = Enum.sort(fallback)` and updates the subsequent Enum.reduce to iterate over `sorted_fallback` instead of the unsorted map, making processing deterministic.

code diff:

     # Merge fallback records that don't have canonical equivalents
+    sorted_fallback = Enum.sort(fallback)
+
     {canonical, meta} =
-      Enum.reduce(fallback, {canonical, meta}, fn {key, device}, {can, m} ->
+      Enum.reduce(sorted_fallback, {canonical, meta}, fn {key, device}, {can, m} ->

Sort the fallback map by key before reducing it in select_canonical_records to
ensure deterministic device selection and prevent metric instability.

elixir/serviceradar_core/lib/serviceradar/core/stats_aggregator.ex [263-286]

 defp select_canonical_records(devices, meta) do
   # Group by canonical ID to deduplicate
   canonical = %{}
   fallback = %{}
 
   {canonical, fallback, meta} =
     Enum.reduce(devices, {canonical, fallback, meta}, fn device, {can, fall, m} ->
       process_device_record(device, can, fall, m)
     end)
 
   # Merge fallback records that don't have canonical equivalents
+  # Sort the fallback map to ensure deterministic processing
+  sorted_fallback = Enum.sort(fallback)
+
   {canonical, meta} =
-    Enum.reduce(fallback, {canonical, meta}, fn {key, device}, {can, m} ->
+    Enum.reduce(sorted_fallback, {canonical, meta}, fn {key, device}, {can, m} ->
       if Map.has_key?(can, key) do
         {can, %{m | skipped_non_canonical: m.skipped_non_canonical + 1}}
       else
         {Map.put(can, key, device),
          %{m | inferred_canonical_fallback: m.inferred_canonical_fallback + 1}}
       end
     end)
 
   selected = Map.values(canonical)
   {selected, meta}
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that iterating over a map is not order-guaranteed, which could lead to non-deterministic behavior and metric instability. Sorting the fallback records before processing is a valid improvement for ensuring consistent results.

Low
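The underlying hazard is not Elixir-specific; Go likewise randomizes map iteration order, so any "first key wins" merge over a raw map is nondeterministic run to run. The standard fix is the same, sketched here in Go (the device/record names are illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the map's keys in ascending order, making any
// subsequent "first key wins" merge deterministic across runs.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	fallback := map[string]string{
		"device-b": "rec-b",
		"device-a": "rec-a",
		"device-c": "rec-c",
	}
	// Ranging over the map directly yields a randomized key order on
	// every run; iterate over sorted keys instead.
	for _, k := range sortedKeys(fallback) {
		fmt.Println(k, "=>", fallback[k])
	}
}
```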
General
Safely parse integer env vars
Suggestion Impact: Updated get_env_int to use Integer.parse/1 and return nil for non-numeric or partially numeric environment variable values instead of raising.

code diff:

@@ -302,7 +302,11 @@
     case System.get_env(key) do
       nil -> nil
       "" -> nil
-      value -> String.to_integer(value)
+      value ->
+        case Integer.parse(value) do
+          {int, ""} -> int
+          _ -> nil
+        end

Replace String.to_integer/1 with the safer Integer.parse/1 in get_env_int to
prevent crashes from non-numeric environment variable values.

elixir/serviceradar_core/lib/serviceradar/data_service/client.ex [301-307]

 defp get_env_int(key) do
   case System.get_env(key) do
     nil -> nil
     "" -> nil
-    value -> String.to_integer(value)
+    value ->
+      case Integer.parse(value) do
+        {int, ""} -> int
+        _ -> nil
+      end
   end
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7


Why: The suggestion correctly points out that String.to_integer/1 will crash if the environment variable contains a non-numeric string. Using Integer.parse/1 makes the configuration loading more robust against invalid user input.

Medium
Use default limits struct

In build_limits, return a default Proto.AccountLimits{} struct instead of nil
when no limits are provided to ensure a valid protobuf message.

elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [520]

-defp build_limits(nil), do: nil
+defp build_limits(nil), do: %Proto.AccountLimits{}

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies that sending nil for a protobuf message field can cause issues. Returning a default, empty AccountLimits struct is safer and ensures the gRPC request is always well-formed, improving the client's robustness.

Medium
Handle invalid timestamps safely
Suggestion Impact: The commit replaced the bang version with DateTime.from_unix/1 and added pattern matching to return nil on error, preventing potential crashes from invalid UNIX timestamps.

code diff:

           expires_at =
             case response.expires_at_unix do
               0 -> nil
-              unix -> DateTime.from_unix!(unix)
+              unix ->
+                case DateTime.from_unix(unix) do
+                  {:ok, dt} -> dt
+                  {:error, _} -> nil
+                end

Replace DateTime.from_unix!/1 with the safer DateTime.from_unix/1 and handle
potential errors to prevent crashes from invalid timestamps.

elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [174-178]

 expires_at =
   case response.expires_at_unix do
     0 -> nil
-    unix -> DateTime.from_unix!(unix)
+    unix ->
+      case DateTime.from_unix(unix) do
+        {:ok, dt} -> dt
+        {:error, _} -> nil
+      end
   end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7


Why: This suggestion correctly identifies a potential crash when parsing a UNIX timestamp from a gRPC response. Using the non-bang version DateTime.from_unix/1 and handling the error case makes the code more resilient to invalid data from the remote service.

Medium
Improve detection of trailing config data

Replace the flawed check for trailing JSON data with the more robust and
idiomatic dec.More() method to correctly handle whitespace and prevent invalid
configurations from being accepted.

cmd/agent/main.go [133-137]

-	if err := dec.Decode(&struct{}{}); err == nil {
+	if dec.More() {
 		return nil, fmt.Errorf("failed to parse config: %w", errConfigTrailingData)
-	} else if !errors.Is(err, io.EOF) {
-		return nil, fmt.Errorf("failed to parse config: %w", err)
 	}
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies a subtle flaw in the JSON parsing logic and proposes using the idiomatic dec.More() function, which improves code correctness and readability.

Low
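dec.More() behaves as the suggestion claims: after the first Decode it skips whitespace and reports whether another JSON value follows in the stream. A self-contained sketch (parseConfig is illustrative, not the agent's actual loader):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseConfig decodes exactly one JSON object and rejects trailing data.
func parseConfig(raw string) (map[string]any, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	var cfg map[string]any
	if err := dec.Decode(&cfg); err != nil {
		return nil, fmt.Errorf("failed to parse config: %w", err)
	}
	// More skips whitespace and reports whether another value follows,
	// so a config ending in blank lines still passes.
	if dec.More() {
		return nil, fmt.Errorf("failed to parse config: trailing data")
	}
	return cfg, nil
}

func main() {
	_, err := parseConfig(`{"listen": ":50051"}` + "\n  \n")
	fmt.Println("clean config err:", err)

	_, err = parseConfig(`{"listen": ":50051"} {"extra": true}`)
	fmt.Println("trailing data err:", err)
}
```

Compared to probing with a second Decode into an empty struct, More never consumes input and distinguishes whitespace from a genuine second value.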
Include TLS1.2 cipher suites

Update the ciphers list to include suites for both TLS 1.3 and TLS 1.2. This
ensures that TLS 1.2 connections are supported, as currently only TLS 1.3 cipher
suites are configured.

elixir/serviceradar_core/lib/serviceradar/spiffe.ex [323]

-ciphers: :ssl.cipher_suites(:default, :"tlsv1.3")
+ciphers: :ssl.cipher_suites(:default, :"tlsv1.3") ++
+         :ssl.cipher_suites(:default, :"tlsv1.2")

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6


Why: The suggestion correctly points out that while versions allows both TLS 1.3 and 1.2, the ciphers list only includes suites for TLS 1.3. Adding TLS 1.2 cipher suites is necessary to ensure TLS 1.2 connections can be established, improving compatibility.

Low
Use SQL fragment for endpoint
Suggestion Impact:The :endpoint expr was changed from `host <> ":" <> fragment("?::text", port)` to a single SQL fragment `fragment("? || ':' || ?::text", host, port)`, ensuring concatenation happens correctly in the database.

code diff:

@@ -636,7 +650,7 @@
               :string,
               expr(
                 if not is_nil(host) and not is_nil(port) do
-                  host <> ":" <> fragment("?::text", port)
+                  fragment("? || ':' || ?::text", host, port)
                 else
                   nil
                 end

In the :endpoint calculation, use a single SQL fragment for string concatenation
instead of mixing Elixir's <> operator with a fragment. This ensures the
concatenation is performed correctly in the database.

elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex [635-643]

 calculate :endpoint,
-              :string,
-              expr(
-                if not is_nil(host) and not is_nil(port) do
-                  host <> ":" <> fragment("?::text", port)
-                else
-                  nil
-                end
-              )
+          :string,
+          expr(
+            case do
+              not is_nil(host) and not is_nil(port) ->
+                fragment("? || ':' || ?::text", host, port)
+              true ->
+                nil
+            end
+          )

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that the current implementation mixes Elixir string concatenation with an SQL fragment, which is not valid in an Ash expr. The proposed change to use a single fragment for concatenation is a valid and necessary correction for the calculation to work at the database level.

Low
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2225#issuecomment-3712126644 Original created: 2026-01-05T21:20:36Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Use correct attribute for timestamp

Correct the attribute name from :last_seen_time to :last_seen_at in the persist_identity function to prevent device update failures.

elixir/serviceradar_core/lib/serviceradar/actors/device.ex [503-523]

 defp persist_identity(tenant_id, device_id, identity) do
   case DeviceResource.get_by_uid(device_id, tenant: tenant_id, authorize?: false) do
     {:ok, device} ->
       updates =
         identity
         |> Map.take([:name, :hostname, :ip, :mac, :vendor_name, :model, :os])
-        |> Map.put(:last_seen_time, DateTime.utc_now())
+        |> Map.put(:last_seen_at, DateTime.utc_now())

       case Ash.update(device, updates, authorize?: false) do
         {:ok, _} -> :ok
         {:error, error} -> {:error, error}
       end

     {:error, %Ash.Error.Query.NotFound{}} ->
       # Device doesn't exist yet, nothing to persist
       :ok

     {:error, error} ->
       {:error, error}
   end
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9

Why: This suggestion identifies a critical bug where an incorrect attribute name (:last_seen_time instead of :last_seen_at) would cause all device identity updates to fail. Fixing this is essential for the core functionality of the device actor.

High
code diff: ```diff + changeset = + OnboardingPackage + |> Ash.Changeset.for_create(:create, create_attrs, + actor: actor, + authorize?: authorize?, + tenant: tenant + ) + |> Ash.Changeset.force_change_attribute(:join_token_ciphertext, join_token_ciphertext) + |> Ash.Changeset.force_change_attribute(:join_token_expires_at, join_expires) + |> Ash.Changeset.force_change_attribute(:download_token_hash, download_token_hash) + |> Ash.Changeset.force_change_attribute(:download_token_expires_at, download_expires) + + case Ash.create(changeset) do {:ok, package} -> - # Update with token fields - token_attrs = %{ - join_token_ciphertext: join_token_ciphertext, - join_token_expires_at: join_expires, - download_token_hash: download_token_hash, - download_token_expires_at: download_expires - } - - case package - |> Ash.Changeset.for_update(:update_tokens, token_attrs, - actor: actor, - authorize?: authorize?, - tenant: tenant - ) - |> Ash.update() do - {:ok, updated_package} -> - # Record creation event - OnboardingEvents.record(package.id, :created, - actor: get_actor_name(actor), - source_ip: source_ip - ) - - {:ok, - %{ - package: updated_package, - join_token: join_token, - download_token: download_token - }} - - {:error, error} -> - {:error, error} - end + # Record creation event + OnboardingEvents.record(package.id, :created, + actor: get_actor_name(actor), + source_ip: source_ip + ) + + {:ok, + %{ + package: package, + join_token: join_token, + download_token: download_token + }} ``` </details> ___ **Refactor the <code>create</code> function to be atomic by combining the "create-then-update" <br>logic into a single operation. 
Pass all attributes, including tokens, to the <br><code>:create</code> action to prevent data inconsistency.** [elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex [156-199]](https://github.com/carverauto/serviceradar/pull/2225/files#diff-e4fe8e19bc324416302bb4c962f57133b3f62eb82053766844d881c522a473e5R156-R199) ```diff -case OnboardingPackage - |> Ash.Changeset.for_create(:create, create_attrs, - actor: actor, - authorize?: authorize?, - tenant: tenant - ) - |> Ash.create() do +# Encrypt join token and hash download token +join_token_ciphertext = Crypto.encrypt(join_token) +download_token_hash = Crypto.hash_token(download_token) + +# Prepare attributes for Ash create +create_attrs = + attrs + |> Map.put(:created_by, get_actor_name(actor)) + |> Map.put(:join_token_ciphertext, join_token_ciphertext) + |> Map.put(:join_token_expires_at, join_expires) + |> Map.put(:download_token_hash, download_token_hash) + |> Map.put(:download_token_expires_at, download_expires) + +case OnboardingPackage.create(create_attrs, actor: actor, authorize?: authorize?, tenant: tenant) do {:ok, package} -> - # Update with token fields - token_attrs = %{ - join_token_ciphertext: join_token_ciphertext, - join_token_expires_at: join_expires, - download_token_hash: download_token_hash, - download_token_expires_at: download_expires - } + # Record creation event + OnboardingEvents.record(package.id, :created, + actor: get_actor_name(actor), + source_ip: source_ip + ) - case package - |> Ash.Changeset.for_update(:update_tokens, token_attrs, - actor: actor, - authorize?: authorize?, - tenant: tenant - ) - |> Ash.update() do - {:ok, updated_package} -> - # Record creation event - OnboardingEvents.record(package.id, :created, - actor: get_actor_name(actor), - source_ip: source_ip - ) - - {:ok, - %{ - package: updated_package, - join_token: join_token, - download_token: download_token - }} - - {:error, error} -> - {:error, error} - end + {:ok, + %{ + package: package, + join_token: 
join_token, + download_token: download_token + }} {:error, error} -> {:error, error} end ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly points out a non-atomic "create-then-update" pattern that could lead to inconsistent data. Consolidating this into a single atomic `create` operation is a critical improvement for data integrity. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>✅ <s>Prevent race condition during updates</s></summary> ___ <details><summary><b>Suggestion Impact:</b></summary>The commit removed the separate conditional DeviceAliasState.confirm/2 call and instead passed confirm_threshold into DeviceAliasState.record_sighting/3, implying the confirmation decision is now handled within the sighting recording operation (moving toward the suggested atomic consolidation), though it did not introduce the specifically suggested atomic_sighting_and_confirm function or the suggested return-shape handling. 
code diff: ```diff # Record the sighting - case DeviceAliasState.record_sighting(existing, %{}, actor: actor) do + case DeviceAliasState.record_sighting( + existing, + %{confirm_threshold: confirm_threshold}, + actor: actor + ) do {:ok, updated} -> - # Check if we should confirm - if updated.state == :detected and updated.sighting_count >= confirm_threshold do - DeviceAliasState.confirm(updated, actor: actor) - end ``` </details> ___ **Prevent a race condition in <code>handle_existing_alias/4</code> by consolidating the logic <br>for recording a sighting and confirming the state into a single, atomic action <br>on the <code>DeviceAliasState</code> resource.** [elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex [438-474]](https://github.com/carverauto/serviceradar/pull/2225/files#diff-bc3743067ea774f59bc5665770f7110a2d6e90f6e1156a7717a1c287f8979d28R438-R474) ```diff defp handle_existing_alias(existing, _update, confirm_threshold, actor) do alias ServiceRadar.Identity.DeviceAliasState - # Record the sighting - case DeviceAliasState.record_sighting(existing, %{}, actor: actor) do - {:ok, updated} -> - # Check if we should confirm - if updated.state == :detected and updated.sighting_count >= confirm_threshold do - DeviceAliasState.confirm(updated, actor: actor) - end + # Atomically record sighting and potentially confirm + case DeviceAliasState.atomic_sighting_and_confirm(existing, %{confirm_threshold: confirm_threshold}, actor: actor) do + {:ok, %{data: %{state_changed: true, new_state: new_state, sighting_count: count}}} -> + # Generate event for state change + %{ + device_id: existing.device_id, + partition: existing.partition || "", + action: "alias_state_changed", + reason: "state_transition", + timestamp: DateTime.utc_now(), + severity: "Info", + level: 6, + metadata: %{ + "alias_type" => to_string(existing.alias_type), + "alias_value" => existing.alias_value, + "previous_state" => to_string(existing.state), + "new_state" => to_string(new_state), 
+ "sighting_count" => to_string(count) + } + } - # Generate event if state changed - if updated.state != existing.state do - %{ - device_id: existing.device_id, - partition: existing.partition || "", - action: "alias_state_changed", - reason: "state_transition", - timestamp: DateTime.utc_now(), - severity: "Info", - level: 6, - metadata: %{ - "alias_type" => to_string(existing.alias_type), - "alias_value" => existing.alias_value, - "previous_state" => to_string(existing.state), - "new_state" => to_string(updated.state), - "sighting_count" => to_string(updated.sighting_count) - } - } - else - nil - end + {:ok, _} -> + # Sighting recorded, but no state change + nil {:error, _reason} -> nil end end ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly identifies a race condition in `handle_existing_alias/4` due to non-atomic read-modify-write operations. Consolidating the sighting and confirmation logic into a single atomic action is crucial to prevent incorrect state transitions. 
</details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Conditionally enable SSL for gRPC</summary> ___ **Conditionally enable SSL for the gRPC connection based on an environment <br>variable, allowing for both secure and insecure deployments.** [elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [487-518]](https://github.com/carverauto/serviceradar/pull/2225/files#diff-2e18ac777ac600b12982ba9e9d5327e23ebd84c139a2add7976f8bf61283e554R487-R518) ```diff defp create_fresh_channel do host = System.get_env("DATASVC_HOST", "datasvc") port = String.to_integer(System.get_env("DATASVC_PORT", "50057")) + ssl_enabled = System.get_env("DATASVC_SSL", "false") in ["true", "1", "yes"] cert_dir = System.get_env("DATASVC_CERT_DIR", "/etc/serviceradar/certs") cert_name = System.get_env("DATASVC_CERT_NAME", "core") server_name = System.get_env("DATASVC_SERVER_NAME", "datasvc.serviceradar") endpoint = "#{host}:#{port}" - ssl_opts = [ - cacertfile: String.to_charlist(Path.join(cert_dir, "root.pem")), - certfile: String.to_charlist(Path.join(cert_dir, "#{cert_name}.pem")), - keyfile: String.to_charlist(Path.join(cert_dir, "#{cert_name}-key.pem")), - verify: :verify_peer, - server_name_indication: String.to_charlist(server_name) - ] + cred_opts = + if ssl_enabled do + ssl_opts = [ + cacertfile: String.to_charlist(Path.join(cert_dir, "root.pem")), + certfile: String.to_charlist(Path.join(cert_dir, "#{cert_name}.pem")), + keyfile: String.to_charlist(Path.join(cert_dir, "#{cert_name}-key.pem")), + verify: :verify_peer, + server_name_indication: String.to_charlist(server_name) + ] - connect_opts = [ - cred: GRPC.Credential.new(ssl: ssl_opts), - adapter_opts: [connect_timeout: 5_000] - ] + [cred: GRPC.Credential.new(ssl: ssl_opts)] + else + [] + end + + connect_opts = + cred_opts + |> Keyword.put(:adapter_opts, [connect_timeout: 5_000]) case GRPC.Stub.connect(endpoint, connect_opts) do {:ok, channel} -> Logger.debug("Created fresh NATS account gRPC 
channel to #{endpoint}") {:ok, channel} {:error, reason} -> Logger.error("Failed to create NATS account gRPC channel: #{inspect(reason)}") {:error, {:connection_failed, reason}} end end ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 6</summary> __ Why: The suggestion correctly points out that the function hardcodes an mTLS connection, which contradicts the module's documentation stating it uses the same configuration as `DataService.Client`. Applying this change improves flexibility and aligns the component's behavior with other parts of the system, making it usable in non-SSL environments. </details></details></td><td align=center>Low </td></tr><tr><td> <details><summary>Prevent misleading operator config log</summary> ___ **Add a check to ensure <code>operatorConfigPath</code> is not empty before attempting to write <br>the NATS operator configuration to prevent misleading log messages.** [cmd/data-services/main.go [117-130]](https://github.com/carverauto/serviceradar/pull/2225/files#diff-5e7731adfb877918cd65d9d5531621312496450fd550fea2682efca4ca8fe816R117-R130) ```diff if operatorConfigPath != "" || resolverPath != "" { natsAccountServer.SetResolverPaths(operatorConfigPath, resolverPath) log.Printf("NATS resolver paths configured: operator=%s resolver=%s", operatorConfigPath, resolverPath) // If operator is already initialized, write the config now // This ensures config files exist even when datasvc restarts with an existing operator - if operator != nil { + if operator != nil && operatorConfigPath != "" { if err := natsAccountServer.WriteOperatorConfig(); err != nil { log.Printf("Warning: failed to write initial operator config: %v", err) } else { log.Printf("Wrote initial operator config to %s", operatorConfigPath) } } } ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=6 --> <details><summary>Suggestion importance[1-10]: 5</summary> __ Why: The suggestion correctly identifies a 
condition that leads to a misleading log message and proposes a simple fix, improving the accuracy of the application's logging. </details></details></td><td align=center>Low </td></tr><tr><td> <details><summary>✅ <s>Ensure deterministic record selection order</s></summary> ___ <details><summary><b>Suggestion Impact:</b></summary>The commit adds `sorted_fallback = Enum.sort(fallback)` and updates the subsequent Enum.reduce to iterate over `sorted_fallback` instead of the unsorted map, making processing deterministic. code diff: ```diff # Merge fallback records that don't have canonical equivalents + sorted_fallback = Enum.sort(fallback) + {canonical, meta} = - Enum.reduce(fallback, {canonical, meta}, fn {key, device}, {can, m} -> + Enum.reduce(sorted_fallback, {canonical, meta}, fn {key, device}, {can, m} -> ``` </details> ___ **Sort the <code>fallback</code> map by key before reducing it in <code>select_canonical_records</code> to <br>ensure deterministic device selection and prevent metric instability.** [elixir/serviceradar_core/lib/serviceradar/core/stats_aggregator.ex [263-286]](https://github.com/carverauto/serviceradar/pull/2225/files#diff-1f4ac8290be7d27cac0ed660e51a9b3b23a219a6bb43b3735f3c5a9768321031R263-R286) ```diff defp select_canonical_records(devices, meta) do # Group by canonical ID to deduplicate canonical = %{} fallback = %{} {canonical, fallback, meta} = Enum.reduce(devices, {canonical, fallback, meta}, fn device, {can, fall, m} -> process_device_record(device, can, fall, m) end) # Merge fallback records that don't have canonical equivalents + # Sort the fallback map to ensure deterministic processing + sorted_fallback = Enum.sort(fallback) + {canonical, meta} = - Enum.reduce(fallback, {canonical, meta}, fn {key, device}, {can, m} -> + Enum.reduce(sorted_fallback, {canonical, meta}, fn {key, device}, {can, m} -> if Map.has_key?(can, key) do {can, %{m | skipped_non_canonical: m.skipped_non_canonical + 1}} else {Map.put(can, key, device), %{m | 
inferred_canonical_fallback: m.inferred_canonical_fallback + 1}} end end) selected = Map.values(canonical) {selected, meta} end ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 5</summary> __ Why: The suggestion correctly identifies that iterating over a map is not order-guaranteed, which could lead to non-deterministic behavior and metric instability. Sorting the fallback records before processing is a valid improvement for ensuring consistent results. </details></details></td><td align=center>Low </td></tr><tr><td rowspan=6>General</td> <td> <details><summary>✅ <s>Safely parse integer env vars</s></summary> ___ <details><summary><b>Suggestion Impact:</b></summary>Updated get_env_int to use Integer.parse/1 and return nil for non-numeric or partially numeric environment variable values instead of raising. code diff: ```diff @@ -302,7 +302,11 @@ case System.get_env(key) do nil -> nil "" -> nil - value -> String.to_integer(value) + value -> + case Integer.parse(value) do + {int, ""} -> int + _ -> nil + end ``` </details> ___ **Replace <code>String.to_integer/1</code> with the safer <code>Integer.parse/1</code> in <code>get_env_int</code> to <br>prevent crashes from non-numeric environment variable values.** [elixir/serviceradar_core/lib/serviceradar/data_service/client.ex [301-307]](https://github.com/carverauto/serviceradar/pull/2225/files#diff-503e195ad79e05e12d7ad03a675f6e35ffdfc201b8571b0d30a220fe036e03a1R301-R307) ```diff defp get_env_int(key) do case System.get_env(key) do nil -> nil "" -> nil - value -> String.to_integer(value) + value -> + case Integer.parse(value) do + {int, ""} -> int + _ -> nil + end end end ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion correctly points out that `String.to_integer/1` will crash if the environment variable contains a non-numeric string. 
Using `Integer.parse/1` makes the configuration loading more robust against invalid user input. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Use default limits struct</summary> ___ **In <code>build_limits</code>, return a default <code>Proto.AccountLimits{}</code> struct instead of <code>nil</code> <br>when no limits are provided to ensure a valid protobuf message.** [elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [520]](https://github.com/carverauto/serviceradar/pull/2225/files#diff-2e18ac777ac600b12982ba9e9d5327e23ebd84c139a2add7976f8bf61283e554R520-R520) ```diff -defp build_limits(nil), do: nil +defp build_limits(nil), do: %Proto.AccountLimits{} ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion correctly identifies that sending `nil` for a protobuf message field can cause issues. Returning a default, empty `AccountLimits` struct is safer and ensures the gRPC request is always well-formed, improving the client's robustness. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>✅ <s>Handle invalid timestamps safely</s></summary> ___ <details><summary><b>Suggestion Impact:</b></summary>The commit replaced the bang version with DateTime.from_unix/1 and added pattern matching to return nil on error, preventing potential crashes from invalid UNIX timestamps. 
code diff: ```diff expires_at = case response.expires_at_unix do 0 -> nil - unix -> DateTime.from_unix!(unix) + unix -> + case DateTime.from_unix(unix) do + {:ok, dt} -> dt + {:error, _} -> nil + end ``` </details> ___ **Replace <code>DateTime.from_unix!/1</code> with the safer <code>DateTime.from_unix/1</code> and handle <br>potential errors to prevent crashes from invalid timestamps.** [elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [174-178]](https://github.com/carverauto/serviceradar/pull/2225/files#diff-2e18ac777ac600b12982ba9e9d5327e23ebd84c139a2add7976f8bf61283e554R174-R178) ```diff expires_at = case response.expires_at_unix do 0 -> nil - unix -> DateTime.from_unix!(unix) + unix -> + case DateTime.from_unix(unix) do + {:ok, dt} -> dt + {:error, _} -> nil + end end ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: This suggestion correctly identifies a potential crash when parsing a UNIX timestamp from a gRPC response. Using the non-bang version `DateTime.from_unix/1` and handling the error case makes the code more resilient to invalid data from the remote service. 
</details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Improve detection of trailing config data</summary> ___ **Replace the flawed check for trailing JSON data with the more robust and <br>idiomatic <code>dec.More()</code> method to correctly handle whitespace and prevent invalid <br>configurations from being accepted.** [cmd/agent/main.go [133-137]](https://github.com/carverauto/serviceradar/pull/2225/files#diff-61358711e980ccf505246fd3915f97cbd3a380e9b66f6fa5aad46749968c5ca3R133-R137) ```diff - if err := dec.Decode(&struct{}{}); err == nil { + if dec.More() { return nil, fmt.Errorf("failed to parse config: %w", errConfigTrailingData) - } else if !errors.Is(err, io.EOF) { - return nil, fmt.Errorf("failed to parse config: %w", err) } ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=11 --> <details><summary>Suggestion importance[1-10]: 6</summary> __ Why: The suggestion correctly identifies a subtle flaw in the JSON parsing logic and proposes using the idiomatic `dec.More()` function, which improves code correctness and readability. </details></details></td><td align=center>Low </td></tr><tr><td> <details><summary>Include TLS1.2 cipher suites</summary> ___ **Update the <code>ciphers</code> list to include suites for both TLS 1.3 and TLS 1.2. 
This <br>ensures that TLS 1.2 connections are supported, as currently only TLS 1.3 cipher <br>suites are configured.** [elixir/serviceradar_core/lib/serviceradar/spiffe.ex [323]](https://github.com/carverauto/serviceradar/pull/2225/files#diff-0cb8d921c19f671b66f91c0978e351e71d927c5f4694924984c9f1ed34d7ee78R323-R323) ```diff -ciphers: :ssl.cipher_suites(:default, :"tlsv1.3") +ciphers: :ssl.cipher_suites(:default, :"tlsv1.3") ++ + :ssl.cipher_suites(:default, :"tlsv1.2") ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 6</summary> __ Why: The suggestion correctly points out that while `versions` allows both TLS 1.3 and 1.2, the `ciphers` list only includes suites for TLS 1.3. Adding TLS 1.2 cipher suites is necessary to ensure TLS 1.2 connections can be established, improving compatibility. </details></details></td><td align=center>Low </td></tr><tr><td> <details><summary>✅ <s>Use SQL fragment for endpoint</s></summary> ___ <details><summary><b>Suggestion Impact:</b></summary>The :endpoint expr was changed from `host <> ":" <> fragment("?::text", port)` to a single SQL fragment `fragment("? || ':' || ?::text", host, port)`, ensuring concatenation happens correctly in the database. code diff: ```diff @@ -636,7 +650,7 @@ :string, expr( if not is_nil(host) and not is_nil(port) do - host <> ":" <> fragment("?::text", port) + fragment("? || ':' || ?::text", host, port) else nil end ``` </details> ___ **In the <code>:endpoint</code> calculation, use a single SQL <code>fragment</code> for string concatenation <br>instead of mixing Elixir's <code><></code> operator with a fragment. 
This ensures the <br>concatenation is performed correctly in the database.** [elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex [635-643]](https://github.com/carverauto/serviceradar/pull/2225/files#diff-c56f92b6ce744cab3f2dc00dde92e2017cffdd12ad4618f7fa720252f2a6843aR635-R643) ```diff calculate :endpoint, - :string, - expr( - if not is_nil(host) and not is_nil(port) do - host <> ":" <> fragment("?::text", port) - else - nil - end - ) + :string, + expr( + case do + not is_nil(host) and not is_nil(port) -> + fragment("? || ':' || ?::text", host, port) + true -> + nil + end + ) ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 5</summary> __ Why: The suggestion correctly identifies that the current implementation mixes Elixir string concatenation with an SQL fragment, which is not valid in an Ash `expr`. The proposed change to use a single `fragment` for concatenation is a valid and necessary correction for the calculation to work at the database level. </details></details></td><td align=center>Low </td></tr> <tr><td align="center" colspan="2"> - [ ] Update <!-- /improve_multi --more_suggestions=true --> </td><td></td></tr></tbody></table>
Reference
carverauto/serviceradar!2632