Chore/arc fixes #2641

Merged
mfreeman451 merged 1 commit from refs/pull/2641/head into testing 2026-01-10 00:03:02 +00:00
mfreeman451 commented 2026-01-10 00:02:30 +00:00 (Migrated from github.com)
Owner

Imported from GitHub pull request.

Original GitHub pull request: #2236
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2236
Original created: 2026-01-10T00:02:30Z
Original updated: 2026-01-10T00:05:45Z
Original head: carverauto/serviceradar:chore/arc-fixes
Original base: testing
Original merged: 2026-01-10T00:03:02Z by @mfreeman451

User description

IMPORTANT: Please sign the Developer Certificate of Origin

Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:

Signed-off-by: J. Doe <j.doe@domain.com>

Describe your changes

Code checklist before requesting a review

  • I have signed the DCO?
  • The build completes without errors?
  • All tests are passing when running make test?

PR Type

Enhancement, Bug fix


Description

  • Refactored agent architecture from KV-based configuration to push-mode with file-based config loading, including graceful shutdown handling

  • Renamed "poller" to "gateway" terminology across all services (Go agents, SNMP checker, sysmon, trapd, zen consumer, rperf checker) for consistency

  • Added NATS credentials file support across multiple components (trapd, zen consumer, OTEL, flowgger, sysmon) with proper configuration and validation

  • Implemented comprehensive Elixir backend with multi-tenant architecture including:

    • Agent Gateway gRPC server with mTLS security for agent status pushes
    • Edge onboarding packages context module with certificate generation
    • Device actor for distributed state management and caching
    • Cluster status API for unified distributed queries
    • NATS account client for tenant isolation
    • SPIFFE/SPIRE integration for cluster node authorization
    • Device statistics aggregator with periodic snapshots
    • Multi-tenant organization resource with NATS integration
    • Comprehensive database schema migration with 40+ tables
  • Added NATS account service initialization in data-services with operator configuration and JWT resolver support

  • Enhanced CLI with new nats-bootstrap and admin subcommands for NATS operations

  • Fixed missing gateway_id field in rperf checker responses

  • Updated API documentation from "service pollers" to "service gateways"

  • Removed legacy components including old poller and sync services with associated Docker/Helm configurations


Diagram Walkthrough

flowchart LR
  A["Go Agents<br/>Push Mode"] -->|"mTLS<br/>Status Push"| B["Agent Gateway<br/>gRPC Server"]
  B -->|"Multi-Tenant<br/>Isolation"| C["Elixir Core<br/>Backend"]
  C -->|"Distributed<br/>State"| D["Device Actor<br/>Registry"]
  C -->|"Cluster<br/>Coordination"| E["Cluster Status<br/>API"]
  F["NATS<br/>Infrastructure"] -->|"Account<br/>Management"| C
  G["Edge<br/>Components"] -->|"Certificate<br/>Generation"| C
  H["Checkers<br/>SNMP/Sysmon/Trapd"] -->|"Gateway<br/>Terminology"| B

File Walkthrough

Relevant files
Enhancement
29 files
main.go
Refactor agent to push mode with file-based config             

cmd/agent/main.go

  • Refactored agent startup from KV-based configuration with edge
    onboarding to direct file-based config loading
  • Replaced complex bootstrap and watch mechanisms with simpler
    loadConfig() function that supports embedded defaults
  • Introduced runPushMode() function implementing push-based architecture
    with gateway client and push loop
  • Added graceful shutdown handling with signal management and bounded
    timeout (10 seconds)
  • Simplified imports by removing KV, config bootstrap, and edge
    onboarding dependencies
+174/-74
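
The push loop and bounded shutdown described for cmd/agent/main.go can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `pushLoop`, `shutdownWithTimeout`, and `simulateStop` are hypothetical names, and only the 10-second bound comes from the description above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// errShutdownTimeout is returned when cleanup exceeds the allowed window.
var errShutdownTimeout = errors.New("shutdown timed out")

// pushLoop calls push on every tick until ctx is cancelled and
// returns the number of pushes that succeeded. (Hypothetical helper.)
func pushLoop(ctx context.Context, interval time.Duration, push func() error) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	pushed := 0
	for {
		select {
		case <-ctx.Done():
			return pushed
		case <-ticker.C:
			if err := push(); err == nil {
				pushed++
			}
		}
	}
}

// shutdownWithTimeout runs stop and waits at most limit for it to return.
func shutdownWithTimeout(stop func(context.Context) error, limit time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	done := make(chan error, 1)
	go func() { done <- stop(ctx) }()
	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return errShutdownTimeout
	}
}

// simulateStop is a demo helper: cleanup takes `work`, the budget is `limit`.
func simulateStop(work, limit time.Duration) error {
	return shutdownWithTimeout(func(ctx context.Context) error {
		select {
		case <-time.After(work):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}, limit)
}

func main() {
	// Cancel the loop on SIGINT or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	// Demo only: also stop after 100ms so the example terminates on its own.
	ctx, cancel := context.WithTimeout(ctx, 100*time.Millisecond)
	defer cancel()

	n := pushLoop(ctx, 10*time.Millisecond, func() error { return nil })
	fmt.Println("pushes sent:", n)

	// The description states a 10-second bound on shutdown.
	if err := shutdownWithTimeout(func(context.Context) error { return nil }, 10*time.Second); err != nil {
		fmt.Println("shutdown:", err)
	}
}
```

Running cleanup in its own goroutine and selecting on the deadline means a hung dependency cannot keep the process alive past the bound.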
main.go
Initialize NATS account service with resolver support       

cmd/data-services/main.go

  • Added NATS account service initialization with operator configuration
    and client identity validation
  • Implemented resolver path configuration from environment variables
    with fallback to config file
  • Added system account credentials file setup for JWT resolver
    operations
  • Registered NATSAccountServiceServer in gRPC service registration
+68/-0   
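
The "environment variable with fallback to config file" lookup for the resolver path might look like the sketch below. The variable name `EXAMPLE_RESOLVER_PATH`, the default path, and the `resolve` helper are assumptions made for illustration; the PR's actual names are not shown in this walkthrough.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolve returns the environment override when it is set and non-blank,
// otherwise the value that came from the config file.
func resolve(lookup func(string) (string, bool), envKey, fromConfig string) string {
	if v, ok := lookup(envKey); ok {
		if v = strings.TrimSpace(v); v != "" {
			return v
		}
	}
	return fromConfig
}

func main() {
	// EXAMPLE_RESOLVER_PATH is a made-up variable name for this sketch.
	path := resolve(os.LookupEnv, "EXAMPLE_RESOLVER_PATH", "/etc/example/resolver")
	fmt.Println("resolver path:", path)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the precedence rule testable without touching the process environment.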
main.go
Add NATS bootstrap and admin commands, rename poller to gateway

cmd/cli/main.go

  • Renamed update-poller subcommand to update-gateway to reflect
    terminology change
  • Added nats-bootstrap subcommand for NATS operations
  • Added admin subcommand dispatcher with nats admin resource routing
  • Introduced dispatchAdminCommand() helper function for admin subcommand
    routing
+16/-2   
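
A subcommand dispatcher of the kind described for the `admin` command could be structured as below. This is a sketch under assumptions: the signature of the real `dispatchAdminCommand()` is not shown in this walkthrough, so `dispatchAdmin` and its handler map are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

var errUsage = errors.New("usage: admin <resource> [args...]")

// dispatchAdmin routes "admin <resource> ..." to the handler registered
// for that resource and passes the remaining arguments through.
func dispatchAdmin(args []string, handlers map[string]func([]string) error) error {
	if len(args) == 0 {
		return errUsage
	}
	handler, ok := handlers[args[0]]
	if !ok {
		return fmt.Errorf("unknown admin resource %q", args[0])
	}
	return handler(args[1:])
}

func main() {
	handlers := map[string]func([]string) error{
		// "nats" is the only admin resource named in the description.
		"nats": func(rest []string) error {
			fmt.Println("nats admin:", rest)
			return nil
		},
	}
	if err := dispatchAdmin(os.Args[1:], handlers); err != nil {
		fmt.Println("error:", err)
	}
}
```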
main.go
Rename SNMP poller service to gateway service                       

cmd/checkers/snmp/main.go

  • Renamed SNMPPollerService to SNMPGatewayService for consistency with
    gateway terminology
  • Updated Poller struct reference to Gateway struct
+1/-1     
app.go
Rename gRPC poller service to agent gateway service           

cmd/core/app/app.go

  • Renamed gRPC service registration from RegisterPollerServiceServer to
    RegisterAgentGatewayServiceServer
+1/-1     
config.rs
Add NATS credentials file configuration support                   

cmd/consumers/zen/src/config.rs

  • Added optional nats_creds_file configuration field with validation
  • Implemented nats_creds_path() method to resolve credentials file path
    with support for absolute/relative paths and cert directory resolution
  • Added validation to ensure nats_creds_file is not empty when provided
+26/-0   
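
The path resolution described for `nats_creds_path()` (optional field, rejected when empty, relative paths resolved against a certificate directory) can be illustrated as follows. The sketch is written in Go rather than the consumer's Rust, and `credsPath`, its parameters, and the example directory are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// credsPath resolves an optional credentials file setting.
// A nil pointer means "not configured"; a blank value is a config error.
// Relative names are joined onto certDir when one is given.
func credsPath(credsFile *string, certDir string) (string, bool, error) {
	if credsFile == nil {
		return "", false, nil
	}
	name := strings.TrimSpace(*credsFile)
	if name == "" {
		return "", false, errors.New("nats_creds_file must not be empty when provided")
	}
	if filepath.IsAbs(name) || certDir == "" {
		return name, true, nil
	}
	return filepath.Join(certDir, name), true, nil
}

func main() {
	name := "nats.creds"
	path, ok, err := credsPath(&name, "/etc/example/certs") // hypothetical directory
	fmt.Println(path, ok, err)
}
```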
server.rs
Rename poller_id to gateway_id in sysmon checker                 

cmd/checkers/sysmon/src/server.rs

  • Renamed poller_id field to gateway_id in status and results response
    logging and response building
  • Updated log messages to reflect gateway terminology instead of poller
+6/-6     
config.rs
Add NATS credentials file configuration to OTEL                   

cmd/otel/src/config.rs

  • Added optional creds_file field to NATSConfigTOML struct for
    credentials file configuration
  • Implemented parsing of creds_file with trimming and empty string
    handling
  • Updated NATSConfig struct to include creds_file field
  • Added example configuration showing creds_file usage
+13/-0   
main.rs
Add NATS creds support and rename poller_id to gateway_id

cmd/trapd/src/main.rs

  • Added support for NATS credentials file in connection options for both
    secure and non-secure modes
  • Updated connection logic to use credentials_file() when creds_path is
    available
  • Renamed poller_id to gateway_id in status and results response
    building
+23/-3   
nats_output.rs
Add NATS credentials file support to flowgger                       

cmd/flowgger/src/flowgger/output/nats_output.rs

  • Added creds_file field to NATSConfig struct for credentials file
    support
  • Implemented parsing of creds_file configuration with empty string
    handling
  • Added credentials file application to NATS connection options
+14/-0   
config.rs
Add NATS credentials file configuration to trapd                 

cmd/trapd/src/config.rs

  • Added optional nats_creds_file configuration field with validation
  • Implemented nats_creds_path() method to resolve credentials file path
    using security config path resolution
  • Added validation to ensure nats_creds_file is not empty when provided
+21/-0   
nats_output.rs
Add NATS credentials file support to OTEL output                 

cmd/otel/src/nats_output.rs

  • Added creds_file field to NATSConfig struct with default None value
  • Implemented credentials file application in NATS connection setup with
    debug logging
+7/-0     
grpc_server.rs
Rename poller_id to gateway_id in zen consumer                     

cmd/consumers/zen/src/grpc_server.rs

  • Renamed poller_id field to gateway_id in status and results response
    building
+2/-2     
nats.rs
Add NATS credentials file support to zen consumer               

cmd/consumers/zen/src/nats.rs

  • Added credentials file support to NATS connection options using
    nats_creds_path() method
+4/-0     
setup.rs
Add NATS creds file debug logging                                               

cmd/otel/src/setup.rs

  • Added debug logging for NATS credentials file configuration
+1/-0     
onboarding_packages.ex
Implement edge onboarding packages context module               

elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex

  • Implemented comprehensive Ash-based context module for edge onboarding
    package management
  • Provides CRUD operations with token generation, delivery verification,
    revocation, and soft-delete
  • Includes component certificate generation signed by tenant CA with
    SPIFFE URI support
  • Implements create_with_tenant_cert() for multi-tenant deployments with
    automatic certificate bundling
  • Supports filtering by status, component type, gateway ID, and other
    attributes
+622/-0 
cluster_status.ex
Add unified cluster status API for distributed queries     

elixir/serviceradar_core/lib/serviceradar/cluster/cluster_status.ex

  • Created unified cluster status API for querying from any node in ERTS
    cluster
  • Provides node information, registry counts, and health status without
    requiring coordinator role
  • Implements RPC fallback to core-elx coordinator for health data when
    queried from non-coordinator nodes
  • Includes comprehensive documentation of web-ng to core-elx
    architecture and cross-node communication patterns
  • Supports finding coordinator node and checking if current node is
    coordinator
+272/-0 
agent_gateway_server.ex
gRPC Agent Gateway Server with Multi-Tenant mTLS Security

elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex

  • Implements gRPC server for receiving status pushes from Go agents with
    multi-tenant security via mTLS certificates
  • Provides four main RPC endpoints: hello (agent enrollment),
    get_config (configuration retrieval), push_status (status updates),
    and stream_status (chunked streaming)
  • Extracts tenant identity from mTLS client certificates and enforces
    component identity validation to prevent cross-tenant impersonation
  • Includes comprehensive input validation, resource limits (max 5000
    services per request), and error handling with proper gRPC status
    codes
+1020/-0
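
The request checks described for the gateway (identity taken from the client certificate, a cap of 5000 services per request) could be expressed along these lines. The sketch is in Go rather than the server's Elixir, returns plain errors instead of gRPC status codes, and every type and field name in it is hypothetical; only the 5000 limit comes from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// maxServicesPerRequest is the limit stated in the walkthrough.
const maxServicesPerRequest = 5000

// serviceStatus and pushRequest are stand-ins for the real protobuf messages.
type serviceStatus struct {
	Name      string
	Available bool
}

type pushRequest struct {
	TenantID string
	AgentID  string
	Services []serviceStatus
}

var (
	errMissingAgent    = errors.New("agent id is required")
	errTenantMismatch  = errors.New("request tenant does not match certificate tenant")
	errTooManyServices = errors.New("too many services in one request")
)

// validatePush rejects a request whose claimed tenant differs from the
// tenant extracted from the mTLS certificate, or that exceeds the cap.
func validatePush(certTenant string, req pushRequest) error {
	switch {
	case req.AgentID == "":
		return errMissingAgent
	case certTenant == "" || req.TenantID != certTenant:
		return errTenantMismatch
	case len(req.Services) > maxServicesPerRequest:
		return fmt.Errorf("%w: %d > %d", errTooManyServices, len(req.Services), maxServicesPerRequest)
	}
	return nil
}

func main() {
	req := pushRequest{TenantID: "tenant-a", AgentID: "agent-1", Services: make([]serviceStatus, 3)}
	fmt.Println("same tenant:", validatePush("tenant-a", req))
	fmt.Println("other tenant:", validatePush("tenant-b", req))
}
```

Trusting only the certificate-derived tenant, and treating the tenant in the payload as a claim to verify, is what prevents one tenant's agent from reporting on behalf of another.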
agent.ex
OCSF-Compliant Agent Resource with State Machine                 

elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex

  • Defines Agent resource using Ash framework with OCSF v1.4.0 compliance
    for Go agent monitoring
  • Implements state machine with lifecycle transitions (connecting,
    connected, degraded, disconnected, unavailable)
  • Provides capability definitions (ICMP, TCP, HTTP, gRPC, DNS, Process,
    SNMP) with UI metadata and OCSF type mappings
  • Includes JSON API routes for agent registration, connection
    management, and heartbeat operations with multi-tenant isolation
+665/-0 
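
A lifecycle state machine over the five states listed above can be modelled as a transition table. The state names come from the description; the set of allowed transitions is an assumption made for this sketch (the real rules live in the Ash resource and are not shown here), and the code is Go rather than Elixir.

```go
package main

import "fmt"

type state string

const (
	connecting   state = "connecting"
	connected    state = "connected"
	degraded     state = "degraded"
	disconnected state = "disconnected"
	unavailable  state = "unavailable"
)

// allowed lists, for each state, the states it may move to.
// This table is illustrative only.
var allowed = map[state][]state{
	connecting:   {connected, disconnected},
	connected:    {degraded, disconnected, unavailable},
	degraded:     {connected, disconnected, unavailable},
	disconnected: {connecting},
	unavailable:  {connecting},
}

// transition returns the new state, or the unchanged state and an error
// when the move is not in the table.
func transition(from, to state) (state, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("transition %s -> %s not allowed", from, to)
}

func main() {
	s := connecting
	for _, next := range []state{connected, degraded, connecting} {
		var err error
		if s, err = transition(s, next); err != nil {
			fmt.Println("rejected:", err)
		}
	}
	fmt.Println("final state:", s)
}
```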
tenant_registry.ex
Multi-Tenant Process Registry with Horde Integration         

elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex

  • Manages per-tenant Horde registries and DynamicSupervisors for
    multi-tenant process isolation and discovery
  • Provides slug-to-UUID mapping via ETS table for admin/debug lookups
    while using hash-based registry names for security
  • Implements registration, lookup, and lifecycle management APIs for
    tenant-scoped processes (gateways, agents, checkers)
  • Includes convenience functions for agent and gateway registration with
    metadata and heartbeat tracking
+634/-0 
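
The idea of naming registries by a hash while keeping a separate slug lookup for operators can be sketched as below. The choice of SHA-256, the name prefix, and the truncation length are assumptions, and a mutex-guarded Go map stands in for the ETS table; none of this is taken from the module itself.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// registryName derives a stable name from the tenant ID without
// embedding the ID or the human-readable slug in the name.
func registryName(tenantID string) string {
	sum := sha256.Sum256([]byte(tenantID))
	return "tenant_registry_" + hex.EncodeToString(sum[:8])
}

// slugIndex maps slugs to tenant IDs for admin and debug lookups.
type slugIndex struct {
	mu  sync.RWMutex
	ids map[string]string
}

func newSlugIndex() *slugIndex {
	return &slugIndex{ids: make(map[string]string)}
}

func (s *slugIndex) put(slug, tenantID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.ids[slug] = tenantID
}

func (s *slugIndex) lookup(slug string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	id, ok := s.ids[slug]
	return id, ok
}

func main() {
	idx := newSlugIndex()
	idx.put("acme", "7d9c1c4e-0000-4000-8000-000000000001") // made-up tenant ID
	if id, ok := idx.lookup("acme"); ok {
		fmt.Println("registry for acme:", registryName(id))
	}
}
```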
alias_events.ex
Device Alias Lifecycle Event Tracking System                         

elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex

  • Tracks device alias lifecycle events (service IDs, IP addresses,
    collectors) with change detection and audit logging
  • Provides AliasRecord struct to parse and compare alias metadata from
    device records
  • Implements alias change detection, lifecycle event generation, and
    persistence to DeviceAliasState resource
  • Supports sighting counting and state transitions (confirm, mark_stale)
    with configurable thresholds
+654/-0 
generator.ex
X.509 Certificate Generation for Tenant CAs                           

elixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex

  • Generates X.509 certificates for per-tenant CAs and edge components
    using Erlang's :public_key module
  • Implements tenant intermediate CA generation (10-year validity) and
    edge component certificate generation (1-year validity)
  • Builds certificates with proper extensions (Basic Constraints, Key
    Usage, Extended Key Usage, SAN, Subject Key ID)
  • Provides SPIFFE ID generation and CN extraction utilities for
    certificate validation and tenant identification
+541/-0 
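
The certificate layout described above (a tenant CA valid for 10 years issuing component certificates valid for 1 year, each carrying a SPIFFE URI SAN) can be reproduced with Go's standard `crypto/x509` package. This is a sketch, not a port of the Elixir module: the CA here is self-signed for brevity rather than an intermediate under a root, and the trust domain `example.org`, the SPIFFE path layout, the key type, and the fixed serial numbers are all assumptions.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net/url"
	"time"
)

// issue signs tmpl with parentKey and returns the parsed certificate.
func issue(tmpl, parent *x509.Certificate, pub *ecdsa.PublicKey, parentKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, parentKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// newTenantChain builds a tenant CA and one component certificate.
func newTenantChain(tenant, component string) (*x509.Certificate, *x509.Certificate, error) {
	now := time.Now()
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1), // real code should use random serials
		Subject:               pkix.Name{CommonName: tenant + " CA"},
		NotBefore:             now,
		NotAfter:              now.AddDate(10, 0, 0), // 10-year validity
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	ca, err := issue(caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}

	leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	// Hypothetical SPIFFE ID layout: spiffe://example.org/<tenant>/<component>.
	spiffeID := &url.URL{Scheme: "spiffe", Host: "example.org", Path: "/" + tenant + "/" + component}
	leafTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: component},
		NotBefore:    now,
		NotAfter:     now.AddDate(1, 0, 0), // 1-year validity
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageServerAuth},
		URIs:         []*url.URL{spiffeID},
	}
	leaf, err := issue(leafTmpl, ca, &leafKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	return ca, leaf, nil
}

func main() {
	ca, leaf, err := newTenantChain("acme", "agent")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("CA:", ca.Subject.CommonName, "leaf SAN:", leaf.URIs[0])
}
```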
account_client.ex
NATS Account gRPC Client for Tenant Isolation                       

elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex

  • New gRPC client module for NATS account management and tenant
    isolation
  • Implements account creation, user credential generation, JWT signing,
    and operator bootstrap
  • Provides channel management with fallback to fresh connections when
    DataService.Client unavailable
  • Includes helper functions for building protobuf requests (limits,
    permissions, subject mappings)
+567/-0 
spiffe.ex
SPIFFE/SPIRE Integration for Cluster Node Authorization   

elixir/serviceradar_core/lib/serviceradar/spiffe.ex

  • New SPIFFE/SPIRE integration module for distributed cluster node
    authorization
  • Supports both filesystem and workload API modes for X.509 SVID
    certificate loading
  • Provides SPIFFE ID parsing, verification, and TLS configuration for
    ERTS distribution
  • Includes certificate expiry monitoring and rotation watching
    capabilities
+564/-0 
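
SPIFFE ID parsing and trust-domain verification, as listed for this module, reduce to a few URI checks. The sketch below is in Go rather than Elixir and uses hypothetical function names; the rules it applies (scheme `spiffe`, non-empty trust domain, no user info, port, query, or fragment) follow the SPIFFE ID format.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseSPIFFEID splits a SPIFFE ID into trust domain and workload path.
func parseSPIFFEID(raw string) (string, string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "spiffe" {
		return "", "", errors.New("scheme must be spiffe")
	}
	if u.Hostname() == "" {
		return "", "", errors.New("trust domain is required")
	}
	if u.User != nil || u.Port() != "" || u.RawQuery != "" || u.Fragment != "" {
		return "", "", errors.New("user info, port, query, and fragment are not allowed")
	}
	return u.Hostname(), u.Path, nil
}

// inTrustDomain reports whether raw is a valid SPIFFE ID in the wanted domain.
func inTrustDomain(raw, want string) bool {
	domain, _, err := parseSPIFFEID(raw)
	return err == nil && domain == want
}

func main() {
	// example.org and the path are placeholders, not values from the PR.
	domain, path, err := parseSPIFFEID("spiffe://example.org/ns/demo/core")
	fmt.Println(domain, path, err)
	fmt.Println(inTrustDomain("spiffe://other.org/ns/demo/core", "example.org"))
}
```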
stats_aggregator.ex
Device Statistics Aggregator with Periodic Snapshots         

elixir/serviceradar_core/lib/serviceradar/core/stats_aggregator.ex

  • New GenServer for periodic device statistics aggregation and caching
  • Computes snapshots with device counts, availability, activity, and
    capability breakdowns
  • Implements canonical record deduplication and partition-based
    statistics
  • Includes telemetry recording and alert handler integration for anomaly
    detection
+628/-0 
tenant.ex
Multi-Tenant Organization Resource with NATS Integration 

elixir/serviceradar_core/lib/serviceradar/identity/tenant.ex

  • New Ash resource representing multi-tenant organizations with
    role-based access control
  • Implements NATS account provisioning with encrypted seed storage via
    AshCloak
  • Provides tenant registration, plan management, and per-tenant CA
    generation for edge isolation
  • Includes comprehensive policies for super_admin, admin, and regular
    user access levels
+604/-0 
device.ex
Device Actor for Distributed State Management                       

elixir/serviceradar_core/lib/serviceradar/actors/device.ex

  • New GenServer actor for device runtime state management and caching
  • Implements identity resolution, event buffering, and health status
    tracking
  • Provides Horde-based distributed registration and automatic
    hibernation after inactivity
  • Includes event flushing, health check scheduling, and PubSub
    broadcasting for state changes
+613/-0 
client.ex
DataService KV Store gRPC Client with Reconnection             

elixir/serviceradar_core/lib/serviceradar/data_service/client.ex

  • New gRPC client for datasvc KV service with automatic reconnection and
    backoff
  • Supports mTLS with client certificates and configurable SSL/TLS
    options
  • Implements put, get, delete, list_keys, and put_many operations with
    timeout handling
  • Includes connection pooling via GRPC.Stub and graceful error recovery
+511/-0 
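
Reconnection with backoff, as mentioned for the DataService client, is commonly implemented as capped exponential delay between attempts. The sketch below shows that pattern in Go; it is an assumption that the Elixir client uses exponential growth, and the function names, attempt count, and durations are illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a placeholder failure used by the demo dialer.
var errTransient = errors.New("connection refused")

// backoff returns base doubled once per previous attempt, capped at ceiling.
func backoff(attempt int, base, ceiling time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < ceiling; i++ {
		d *= 2
	}
	if d > ceiling {
		d = ceiling
	}
	return d
}

// connectWithRetry calls dial until it succeeds or attempts run out,
// sleeping with capped exponential backoff between tries.
func connectWithRetry(dial func() error, attempts int, base, ceiling time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = dial(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(backoff(i, base, ceiling))
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	dial := func() error {
		calls++
		if calls < 3 {
			return errTransient // fail the first two attempts
		}
		return nil
	}
	err := connectWithRetry(dial, 5, 10*time.Millisecond, 80*time.Millisecond)
	fmt.Println("calls:", calls, "err:", err)
}
```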
reserved_tenant_slug.ex
Tenant Slug Reservation Validation                                             

elixir/serviceradar_core/lib/serviceradar/identity/validations/reserved_tenant_slug.ex

  • New Ash validation for enforcing reserved tenant slug constraints
  • Ensures platform tenant uses the reserved slug and regular tenants
    cannot use it
  • Handles both changeset attributes and data fields with
    case-insensitive string comparison
+87/-0   
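
The reserved-slug rule described above is symmetric: the platform tenant must use the reserved slug, and no other tenant may. A minimal Go sketch of that check follows; the reserved value `platform` and the function name are assumptions, since the actual slug is not shown in this walkthrough.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// reservedSlug is a placeholder; the real reserved value is not shown here.
const reservedSlug = "platform"

// validateSlug enforces the rule in both directions, ignoring case
// and surrounding whitespace.
func validateSlug(slug string, isPlatformTenant bool) error {
	reserved := strings.EqualFold(strings.TrimSpace(slug), reservedSlug)
	switch {
	case isPlatformTenant && !reserved:
		return errors.New("platform tenant must use the reserved slug")
	case !isPlatformTenant && reserved:
		return errors.New("slug is reserved for the platform tenant")
	}
	return nil
}

func main() {
	fmt.Println(validateSlug("Platform", false))
	fmt.Println(validateSlug("acme", false))
}
```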
Documentation
2 files
main.go
Update API documentation terminology                                         

cmd/core/main.go

  • Updated API description from "service pollers" to "service gateways"
    in Swagger documentation
+1/-1     
main.go
Update config-sync role documentation                                       

cmd/tools/config-sync/main.go

  • Updated role flag documentation from "poller" to "gateway" in help
    text
+1/-1     
Bug fix
1 file
server.rs
Add gateway_id field to rperf checker responses                   

cmd/checkers/rperf-client/src/server.rs

  • Added missing gateway_id field to status response in error case
  • Added missing gateway_id field to results response
+2/-0     
Tests
1 file
message_processor.rs
Add nats_creds_file to zen test config                                     

cmd/consumers/zen/src/message_processor.rs

  • Added nats_creds_file: None field to test configuration struct
+1/-0     
Configuration changes
3 files
20260107043446_initial_schema.exs
Add comprehensive initial database schema migration           

elixir/serviceradar_core/priv/repo/tenant_migrations/20260107043446_initial_schema.exs

  • Created comprehensive initial schema migration with 40+ tables for
    multi-tenant ServiceRadar system
  • Includes tables for user management, NATS infrastructure, device
    discovery, monitoring, alerts, and edge onboarding
  • Implements tenant isolation, encryption for sensitive fields, and
    proper foreign key relationships
  • Defines indexes and unique constraints for data integrity and query
    performance
+1416/-0
20260107043447_add_oban_jobs.exs
Oban Job Queue Tenant Migration                                                   

elixir/serviceradar_core/priv/repo/tenant_migrations/20260107043447_add_oban_jobs.exs

  • Adds Oban job queue migration for tenant-scoped background job
    processing
  • Supports configurable prefix for multi-tenant schema isolation
+13/-0   
.gitkeep
Docker Compose Credentials Directory                                         

docker/compose/creds/.gitkeep

  • New directory placeholder for Docker Compose credentials
+1/-0     
Additional files
101 files
.bazelignore +4/-0     
.bazelrc +5/-0     
.env-sample +33/-0   
.env.example +38/-0   
main.yml +18/-0   
AGENTS.md +177/-11
INSTALL.md +11/-11 
MODULE.bazel +5/-0     
Makefile +55/-14 
README-Docker.md +17/-2   
README.md +3/-3     
ROADMAP.md +1/-1     
BUILD.bazel +11/-6   
BUILD.bazel +12/-0   
mix_release.bzl +124/-49
BUILD.bazel +1/-0     
README.md +4/-4     
config.json +5/-6     
build.rs +0/-1     
monitoring.proto +3/-26   
README.md +2/-2     
monitoring.proto +2/-26   
zen-consumer-with-otel.json +1/-0     
zen-consumer.json +1/-0     
config.json +4/-4     
config.json +4/-4     
BUILD.bazel +1/-0     
README.md +3/-3     
README.md +9/-12   
flowgger.toml +1/-0     
otel.toml +1/-0     
BUILD.bazel +0/-25   
config.json +0/-111 
main.go +0/-138 
BUILD.bazel +0/-25   
config.json +0/-77   
main.go +0/-123 
docker-compose.elx.yml +109/-0 
docker-compose.spiffe.yml +8/-158 
docker-compose.yml +318/-269
Dockerfile.agent-gateway +94/-0   
Dockerfile.core-elx +108/-0 
Dockerfile.poller +0/-70   
Dockerfile.sync +0/-95   
Dockerfile.tools +1/-2     
Dockerfile.web-ng +6/-0     
agent-minimal.docker.json +6/-6     
agent.docker.json +5/-20   
agent.mtls.json +7/-10   
bootstrap-nested-spire.sh +0/-80   
datasvc.docker.json +3/-2     
datasvc.mtls.json +14/-1   
db-event-writer.docker.json +2/-2     
db-event-writer.mtls.json +3/-2     
FRICTION_POINTS.md +0/-355 
README.md +0/-207 
SETUP_GUIDE.md +0/-307 
docker-compose.edge-e2e.yml +0/-27   
manage-packages.sh +0/-211 
setup-edge-e2e.sh +0/-198 
edge-poller-restart.sh +0/-178 
downstream-agent.conf +0/-32   
env +0/-4     
server.conf +0/-51   
upstream-agent.conf +0/-32   
entrypoint-certs.sh +13/-9   
entrypoint-poller.sh +0/-274 
entrypoint-sync.sh +0/-96   
fix-cert-permissions.sh +2/-2     
flowgger.docker.toml +2/-1     
generate-certs.sh +214/-12
nats.docker.conf +16/-160
netflow-consumer.mtls.json +1/-0     
otel.docker.toml +2/-0     
pg_hba.conf +9/-0     
pg_ident.conf +17/-0   
poller-stack.compose.yml +0/-121 
poller.docker.json +0/-128 
poller.mtls.json +0/-135 
poller.spiffe.json +0/-55   
refresh-upstream-credentials.sh +0/-248 
seed-poller-kv.sh +0/-83   
setup-edge-poller.sh +0/-204 
README.md +5/-5     
bootstrap-compose-spire.sh +0/-2     
ssl_dist.core.conf +17/-0   
ssl_dist.gateway.conf +17/-0   
ssl_dist.web.conf +17/-0   
sync.docker.json +0/-71   
sync.mtls.json +0/-75   
sysmon-osx.checker.json +1/-1     
tools-profile.sh +1/-2     
trapd.docker.json +2/-1     
update-config.sh +1/-190 
zen.docker.json +2/-1     
BUILD.bazel +80/-84 
push_targets.bzl +2/-2     
AGENTS.md +8/-8     
CNCF_DAY0.md +31/-31 
CNCF_security_self_assessment.md +7/-7     
Additional files not shown

Imported from GitHub pull request. Original GitHub pull request: #2236 Original author: @mfreeman451 Original URL: https://github.com/carverauto/serviceradar/pull/2236 Original created: 2026-01-10T00:02:30Z Original updated: 2026-01-10T00:05:45Z Original head: carverauto/serviceradar:chore/arc-fixes Original base: testing Original merged: 2026-01-10T00:03:02Z by @mfreeman451 --- ### **User description** ## IMPORTANT: Please sign the Developer Certificate of Origin Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include a [DCO sign-off statement]( https://developercertificate.org/) indicating the DCO acceptance in one commit message. Here is an example DCO Signed-off-by line in a commit message: ``` Signed-off-by: J. Doe <j.doe@domain.com> ``` ## Describe your changes ## Issue ticket number and link ## Code checklist before requesting a review - [ ] I have signed the DCO? - [ ] The build completes without errors? - [ ] All tests are passing when running make test? 
___ ### **PR Type** Enhancement, Bug fix ___ ### **Description** - **Refactored agent architecture** from KV-based configuration to push-mode with file-based config loading, including graceful shutdown handling - **Renamed "poller" to "gateway"** terminology across all services (Go agents, SNMP checker, sysmon, trapd, zen consumer, rperf checker) for consistency - **Added NATS credentials file support** across multiple components (trapd, zen consumer, OTEL, flowgger, sysmon) with proper configuration and validation - **Implemented comprehensive Elixir backend** with multi-tenant architecture including: - Agent Gateway gRPC server with mTLS security for agent status pushes - Edge onboarding packages context module with certificate generation - Device actor for distributed state management and caching - Cluster status API for unified distributed queries - NATS account client for tenant isolation - SPIFFE/SPIRE integration for cluster node authorization - Device statistics aggregator with periodic snapshots - Multi-tenant organization resource with NATS integration - Comprehensive database schema migration with 40+ tables - **Added NATS account service initialization** in data-services with operator configuration and JWT resolver support - **Enhanced CLI** with new `nats-bootstrap` and `admin` subcommands for NATS operations - **Fixed missing `gateway_id` field** in rperf checker responses - **Updated API documentation** from "service pollers" to "service gateways" - **Removed legacy components** including old poller and sync services with associated Docker/Helm configurations ___ ### Diagram Walkthrough ```mermaid flowchart LR A["Go Agents<br/>Push Mode"] -->|"mTLS<br/>Status Push"| B["Agent Gateway<br/>gRPC Server"] B -->|"Multi-Tenant<br/>Isolation"| C["Elixir Core<br/>Backend"] C -->|"Distributed<br/>State"| D["Device Actor<br/>Registry"] C -->|"Cluster<br/>Coordination"| E["Cluster Status<br/>API"] F["NATS<br/>Infrastructure"] -->|"Account<br/>Management"| C 
G["Edge<br/>Components"] -->|"Certificate<br/>Generation"| C H["Checkers<br/>SNMP/Sysmon/Trapd"] -->|"Gateway<br/>Terminology"| B ``` <details><summary><h3>File Walkthrough</h3></summary> <table><thead><tr><th></th><th align="left">Relevant files</th></tr></thead><tbody><tr><td><strong>Enhancement</strong></td><td><details><summary>29 files</summary><table> <tr> <td> <details> <summary><strong>main.go</strong><dd><code>Refactor agent to push mode with file-based config</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/agent/main.go <ul><li>Refactored agent startup from KV-based configuration with edge <br>onboarding to direct file-based config loading<br> <li> Replaced complex bootstrap and watch mechanisms with simpler <br><code>loadConfig()</code> function that supports embedded defaults<br> <li> Introduced <code>runPushMode()</code> function implementing push-based architecture <br>with gateway client and push loop<br> <li> Added graceful shutdown handling with signal management and bounded <br>timeout (10 seconds)<br> <li> Simplified imports by removing KV, config bootstrap, and edge <br>onboarding dependencies</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-61358711e980ccf505246fd3915f97cbd3a380e9b66f6fa5aad46749968c5ca3">+174/-74</a></td> </tr> <tr> <td> <details> <summary><strong>main.go</strong><dd><code>Initialize NATS account service with resolver support</code>&nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/data-services/main.go <ul><li>Added NATS account service initialization with operator configuration <br>and client identity validation<br> <li> Implemented resolver path configuration from environment variables <br>with fallback to config file<br> <li> Added system account credentials file setup for JWT resolver <br>operations<br> <li> Registered <code>NATSAccountServiceServer</code> in gRPC service registration</ul> </details> </td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-5e7731adfb877918cd65d9d5531621312496450fd550fea2682efca4ca8fe816">+68/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>main.go</strong><dd><code>Add NATS bootstrap and admin commands, rename poller to gateway</code></dd></summary> <hr> cmd/cli/main.go <ul><li>Renamed <code>update-poller</code> subcommand to <code>update-gateway</code> to reflect <br>terminology change<br> <li> Added <code>nats-bootstrap</code> subcommand for NATS operations<br> <li> Added <code>admin</code> subcommand dispatcher with <code>nats</code> admin resource routing<br> <li> Introduced <code>dispatchAdminCommand()</code> helper function for admin subcommand <br>routing</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-ed4d81d29a7267f93fd77e17993fd3491b9ef6ded18490b4514d10ed1d803bc2">+16/-2</a>&nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>main.go</strong><dd><code>Rename SNMP poller service to gateway service</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/checkers/snmp/main.go <ul><li>Renamed <code>SNMPPollerService</code> to <code>SNMPGatewayService</code> for consistency with <br>gateway terminology<br> <li> Updated <code>Poller</code> struct reference to <code>Gateway</code> struct</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-f25402eade63525184cb5e7437accff93c7b9338eebe81add6dc5f2a9eb12550">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>app.go</strong><dd><code>Rename gRPC poller service to agent gateway service</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/core/app/app.go <ul><li>Renamed gRPC service registration from <code>RegisterPollerServiceServer</code> to <br><code>RegisterAgentGatewayServiceServer</code></ul> </details> </td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-4ad8a289575edf3b163088617b7a40ae1305c29ced0c7d59b3751c57d6938072">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>config.rs</strong><dd><code>Add NATS credentials file configuration support</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/consumers/zen/src/config.rs <ul><li>Added optional <code>nats_creds_file</code> configuration field with validation<br> <li> Implemented <code>nats_creds_path()</code> method to resolve credentials file path <br>with support for absolute/relative paths and cert directory resolution<br> <li> Added validation to ensure <code>nats_creds_file</code> is not empty when provided</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-05038f3867985e757de9027609950e682bad6d1992dac6acd7c28962a3c65dc4">+26/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>server.rs</strong><dd><code>Rename poller_id to gateway_id in sysmon checker</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/checkers/sysmon/src/server.rs <ul><li>Renamed <code>poller_id</code> field to <code>gateway_id</code> in status and results response <br>logging and response building<br> <li> Updated log messages to reflect gateway terminology instead of poller</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-2c4395fee16396339c3eea518ad9bec739174c67c9cedf62e6848c17136dd33e">+6/-6</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>config.rs</strong><dd><code>Add NATS credentials file configuration to OTEL</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/otel/src/config.rs <ul><li>Added optional <code>creds_file</code> field to <code>NATSConfigTOML</code> struct for <br>credentials file configuration<br> 
<li> Implemented parsing of <code>creds_file</code> with trimming and empty string <br>handling<br> <li> Updated <code>NATSConfig</code> struct to include <code>creds_file</code> field<br> <li> Added example configuration showing <code>creds_file</code> usage</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-abbaec651da3d6af96b482e0f77bb909b65dbe0cabd78b5803769cc9dab0a1b0">+13/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>main.rs</strong><dd><code>Add NATS creds support and rename poller_id to gateway_id</code></dd></summary> <hr> cmd/trapd/src/main.rs <ul><li>Added support for NATS credentials file in connection options for both <br>secure and non-secure modes<br> <li> Updated connection logic to use <code>credentials_file()</code> when <code>creds_path</code> is <br>available<br> <li> Renamed <code>poller_id</code> to <code>gateway_id</code> in status and results response <br>building</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-33b655d8730ae3e9c844ee280787d11f1b0d5343119188273f89558805f814ba">+23/-3</a>&nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>nats_output.rs</strong><dd><code>Add NATS credentials file support to flowgger</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/flowgger/src/flowgger/output/nats_output.rs <ul><li>Added <code>creds_file</code> field to <code>NATSConfig</code> struct for credentials file <br>support<br> <li> Implemented parsing of <code>creds_file</code> configuration with empty string <br>handling<br> <li> Added credentials file application to NATS connection options</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-a82e2e4d413539bf0b414b5629665b19648447523994cba639c4d1238aa5a0c1">+14/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td> <details> 
<summary><strong>config.rs</strong><dd><code>Add NATS credentials file configuration to trapd</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/trapd/src/config.rs <ul><li>Added optional <code>nats_creds_file</code> configuration field with validation<br> <li> Implemented <code>nats_creds_path()</code> method to resolve credentials file path <br>using security config path resolution<br> <li> Added validation to ensure <code>nats_creds_file</code> is not empty when provided</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-c89b88ba4d2bf0a054d0ba69a672a92c30140b8d19503d67b980a218ffe3106d">+21/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>nats_output.rs</strong><dd><code>Add NATS credentials file support to OTEL output</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/otel/src/nats_output.rs <ul><li>Added <code>creds_file</code> field to <code>NATSConfig</code> struct with default <code>None</code> value<br> <li> Implemented credentials file application in NATS connection setup with <br>debug logging</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-6b585ea3564a481174e04da1270e2e13edd4e2b980d02a2652d6d21e6d82a498">+7/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>grpc_server.rs</strong><dd><code>Rename poller_id to gateway_id in zen consumer</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/consumers/zen/src/grpc_server.rs <ul><li>Renamed <code>poller_id</code> field to <code>gateway_id</code> in status and results response <br>building</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-e4564a93f6cf84ff91cd3d8141fc9272ec9b4ec19defd107afa42be01fcfed5b">+2/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> 
<summary><strong>nats.rs</strong><dd><code>Add NATS credentials file support to zen consumer</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/consumers/zen/src/nats.rs <ul><li>Added credentials file support to NATS connection options using <br><code>nats_creds_path()</code> method</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-97f7335def0ad5d644b594a1076ae2d7080b11259cbb8de22c7946cc8e4b39f8">+4/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>setup.rs</strong><dd><code>Add NATS creds file debug logging</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/otel/src/setup.rs - Added debug logging for NATS credentials file configuration </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-3891f667deb20fd26e296d3e2742c57378d3764fe1743118e612465ae360391f">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>onboarding_packages.ex</strong><dd><code>Implement edge onboarding packages context module</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex <ul><li>Implemented comprehensive Ash-based context module for edge onboarding <br>package management<br> <li> Provides CRUD operations with token generation, delivery verification, <br>revocation, and soft-delete<br> <li> Includes component certificate generation signed by tenant CA with <br>SPIFFE URI support<br> <li> Implements <code>create_with_tenant_cert()</code> for multi-tenant deployments with <br>automatic certificate bundling<br> <li> Supports filtering by status, component type, gateway ID, and other <br>attributes</ul> </details> </td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-e4fe8e19bc324416302bb4c962f57133b3f62eb82053766844d881c522a473e5">+622/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>cluster_status.ex</strong><dd><code>Add unified cluster status API for distributed queries</code>&nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/cluster/cluster_status.ex <ul><li>Created unified cluster status API for querying from any node in the ERTS <br>cluster<br> <li> Provides node information, registry counts, and health status without <br>requiring coordinator role<br> <li> Implements RPC fallback to core-elx coordinator for health data when <br>queried from non-coordinator nodes<br> <li> Includes comprehensive documentation of web-ng to core-elx <br>architecture and cross-node communication patterns<br> <li> Supports finding coordinator node and checking if current node is <br>coordinator</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-4d39914bf1e3d207119f8d94afc598809746aa5843fb55e52cac9222d0fd335b">+272/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>agent_gateway_server.ex</strong><dd><code>gRPC Agent Gateway Server with Multi-Tenant mTLS Security</code></dd></summary> <hr> elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex <ul><li>Implements gRPC server for receiving status pushes from Go agents with <br>multi-tenant security via mTLS certificates<br> <li> Provides four main RPC endpoints: <code>hello</code> (agent enrollment), <br><code>get_config</code> (configuration retrieval), <code>push_status</code> (status updates), <br>and <code>stream_status</code> (chunked streaming)<br> <li> Extracts tenant identity from mTLS client certificates and enforces <br>component identity validation to prevent cross-tenant impersonation<br> <li> Includes comprehensive input validation, resource limits (max 5000 <br>services per 
request), and error handling with proper gRPC status <br>codes</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-369a368073dc8ec1140bcea699005a1ce97a90cd59629df0bd18c71c7ffaae9f">+1020/-0</a></td> </tr> <tr> <td> <details> <summary><strong>agent.ex</strong><dd><code>OCSF-Compliant Agent Resource with State Machine</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex <ul><li>Defines <code>Agent</code> resource using Ash framework with OCSF v1.4.0 compliance <br>for Go agent monitoring<br> <li> Implements state machine with lifecycle transitions (connecting, <br>connected, degraded, disconnected, unavailable)<br> <li> Provides capability definitions (ICMP, TCP, HTTP, gRPC, DNS, Process, <br>SNMP) with UI metadata and OCSF type mappings<br> <li> Includes JSON API routes for agent registration, connection <br>management, and heartbeat operations with multi-tenant isolation</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-c56f92b6ce744cab3f2dc00dde92e2017cffdd12ad4618f7fa720252f2a6843a">+665/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>tenant_registry.ex</strong><dd><code>Multi-Tenant Process Registry with Horde Integration</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex <ul><li>Manages per-tenant Horde registries and DynamicSupervisors for <br>multi-tenant process isolation and discovery<br> <li> Provides slug-to-UUID mapping via ETS table for admin/debug lookups <br>while using hash-based registry names for security<br> <li> Implements registration, lookup, and lifecycle management APIs for <br>tenant-scoped processes (gateways, agents, checkers)<br> <li> Includes convenience functions for agent and gateway registration with <br>metadata and heartbeat tracking</ul> 
</details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-91248b3b128a2e3d9bea6ffdb5e0f295e4a1745e82f87687c640ad01416fb85d">+634/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>alias_events.ex</strong><dd><code>Device Alias Lifecycle Event Tracking System</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex <ul><li>Tracks device alias lifecycle events (service IDs, IP addresses, <br>collectors) with change detection and audit logging<br> <li> Provides <code>AliasRecord</code> struct to parse and compare alias metadata from <br>device records<br> <li> Implements alias change detection, lifecycle event generation, and <br>persistence to <code>DeviceAliasState</code> resource<br> <li> Supports sighting counting and state transitions (confirm, mark_stale) <br>with configurable thresholds</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-bc3743067ea774f59bc5665770f7110a2d6e90f6e1156a7717a1c287f8979d28">+654/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>generator.ex</strong><dd><code>X.509 Certificate Generation for Tenant CAs</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex <ul><li>Generates X.509 certificates for per-tenant CAs and edge components <br>using Erlang's <code>:public_key</code> module<br> <li> Implements tenant intermediate CA generation (10-year validity) and <br>edge component certificate generation (1-year validity)<br> <li> Builds certificates with proper extensions (Basic Constraints, Key <br>Usage, Extended Key Usage, SAN, Subject Key ID)<br> <li> Provides SPIFFE ID generation and CN extraction utilities for <br>certificate validation and tenant identification</ul> 
</details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-b48e4a9e1189da61e2a60e16f56fce81298d76b7cdab745107140fed3f6e48b4">+541/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>account_client.ex</strong><dd><code>NATS Account gRPC Client for Tenant Isolation</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex <ul><li>New gRPC client module for NATS account management and tenant <br>isolation<br> <li> Implements account creation, user credential generation, JWT signing, <br>and operator bootstrap<br> <li> Provides channel management with fallback to fresh connections when <br>DataService.Client unavailable<br> <li> Includes helper functions for building protobuf requests (limits, <br>permissions, subject mappings)</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-2e18ac777ac600b12982ba9e9d5327e23ebd84c139a2add7976f8bf61283e554">+567/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>spiffe.ex</strong><dd><code>SPIFFE/SPIRE Integration for Cluster Node Authorization</code>&nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/spiffe.ex <ul><li>New SPIFFE/SPIRE integration module for distributed cluster node <br>authorization<br> <li> Supports both filesystem and workload API modes for X.509 SVID <br>certificate loading<br> <li> Provides SPIFFE ID parsing, verification, and TLS configuration for <br>ERTS distribution<br> <li> Includes certificate expiry monitoring and rotation watching <br>capabilities</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-0cb8d921c19f671b66f91c0978e351e71d927c5f4694924984c9f1ed34d7ee78">+564/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>stats_aggregator.ex</strong><dd><code>Device Statistics Aggregator with Periodic 
Snapshots</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/core/stats_aggregator.ex <ul><li>New GenServer for periodic device statistics aggregation and caching<br> <li> Computes snapshots with device counts, availability, activity, and <br>capability breakdowns<br> <li> Implements canonical record deduplication and partition-based <br>statistics<br> <li> Includes telemetry recording and alert handler integration for anomaly <br>detection</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-1f4ac8290be7d27cac0ed660e51a9b3b23a219a6bb43b3735f3c5a9768321031">+628/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>tenant.ex</strong><dd><code>Multi-Tenant Organization Resource with NATS Integration</code>&nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/identity/tenant.ex <ul><li>New Ash resource representing multi-tenant organizations with <br>role-based access control<br> <li> Implements NATS account provisioning with encrypted seed storage via <br>AshCloak<br> <li> Provides tenant registration, plan management, and per-tenant CA <br>generation for edge isolation<br> <li> Includes comprehensive policies for super_admin, admin, and regular <br>user access levels</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-9d0658a5118ece5eac7a6326788fdf59407a52f87c4b9c9ac69e6900bc04dc2a">+604/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>device.ex</strong><dd><code>Device Actor for Distributed State Management</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/actors/device.ex <ul><li>New GenServer actor for device runtime state management and caching<br> <li> Implements identity resolution, event buffering, and health status <br>tracking<br> <li> Provides Horde-based distributed 
registration and automatic <br>hibernation after inactivity<br> <li> Includes event flushing, health check scheduling, and PubSub <br>broadcasting for state changes</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-eba1d95a852e4a736813c7b486da651704f20718e24f931c966ff3f37c421eea">+613/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>client.ex</strong><dd><code>DataService KV Store gRPC Client with Reconnection</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/data_service/client.ex <ul><li>New gRPC client for datasvc KV service with automatic reconnection and <br>backoff<br> <li> Supports mTLS with client certificates and configurable SSL/TLS <br>options<br> <li> Implements put, get, delete, list_keys, and put_many operations with <br>timeout handling<br> <li> Includes connection pooling via GRPC.Stub and graceful error recovery</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-503e195ad79e05e12d7ad03a675f6e35ffdfc201b8571b0d30a220fe036e03a1">+511/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>reserved_tenant_slug.ex</strong><dd><code>Tenant Slug Reservation Validation</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/identity/validations/reserved_tenant_slug.ex <ul><li>New Ash validation for enforcing reserved tenant slug constraints<br> <li> Ensures platform tenant uses the reserved slug and regular tenants <br>cannot use it<br> <li> Handles both changeset attributes and data fields with <br>case-insensitive string comparison</ul> </details> </td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-3495f8e60db2c1472c538ed33b6d2be79730d7938ad034d273281040ee1558cb">+87/-0</a>&nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Documentation</strong></td><td><details><summary>2 files</summary><table> <tr> <td> <details> <summary><strong>main.go</strong><dd><code>Update API documentation terminology</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/core/main.go <ul><li>Updated API description from "service pollers" to "service gateways" <br>in Swagger documentation</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-4ab3fd1d4debc53dd2499d94a0f60c648fdae4235dd1e3678095a975f5bb434a">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>main.go</strong><dd><code>Update config-sync role documentation</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/tools/config-sync/main.go <ul><li>Updated role flag documentation from "poller" to "gateway" in help <br>text</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-bc6eeb1b05bcb9179525e32fac1de9926b5823ec3504be546ab10c5c9740f544">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Bug fix</strong></td><td><details><summary>1 file</summary><table> <tr> <td> <details> <summary><strong>server.rs</strong><dd><code>Add gateway_id field to rperf checker responses</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/checkers/rperf-client/src/server.rs <ul><li>Added missing <code>gateway_id</code> field to status response in error case<br> <li> Added missing <code>gateway_id</code> field to results 
response</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-bce0f4ca6548712f224b73816825d28e831acbbff7dbed3c98671ed50f65d028">+2/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Tests</strong></td><td><details><summary>1 file</summary><table> <tr> <td> <details> <summary><strong>message_processor.rs</strong><dd><code>Add nats_creds_file to zen test config</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/consumers/zen/src/message_processor.rs - Added <code>nats_creds_file: None</code> field to test configuration struct </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-9fcbc5358a9009e60a8cd22d21e5a9ea652787c727732d0b869e0865495114c3">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Configuration changes</strong></td><td><details><summary>3 files</summary><table> <tr> <td> <details> <summary><strong>20260107043446_initial_schema.exs</strong><dd><code>Add comprehensive initial database schema migration</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/priv/repo/tenant_migrations/20260107043446_initial_schema.exs <ul><li>Created comprehensive initial schema migration with 40+ tables for <br>multi-tenant ServiceRadar system<br> <li> Includes tables for user management, NATS infrastructure, device <br>discovery, monitoring, alerts, and edge onboarding<br> <li> Implements tenant isolation, encryption for sensitive fields, and <br>proper foreign key relationships<br> <li> Defines indexes and unique constraints for data integrity and query <br>performance</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-0d217dc9822fab0d3390e8ec21040f98e67106e5c9126e043a9b701efcbfb576">+1416/-0</a></td> </tr> <tr> <td> <details> 
<summary><strong>20260107043447_add_oban_jobs.exs</strong><dd><code>Oban Job Queue Tenant Migration</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/priv/repo/tenant_migrations/20260107043447_add_oban_jobs.exs <ul><li>Adds Oban job queue migration for tenant-scoped background job <br>processing<br> <li> Supports configurable prefix for multi-tenant schema isolation</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-ea6c90f66966a8b9c0ab56774c296ab7ff8a22fa4116c9180574c9bc7a8543e5">+13/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>.gitkeep</strong><dd><code>Docker Compose Credentials Directory</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> docker/compose/creds/.gitkeep - New directory placeholder for Docker Compose credentials </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-d72c41aab2d6f2c230a4340dfefe7917cdd12bed942c825aa0d4c9875a637bac">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Additional files</strong></td><td><details><summary>101 files</summary><table> <tr> <td><strong>.bazelignore</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-a5641cd37d6ad98b32cdfce1980836cc68312277bc6a7052f55da02ada5bc6cf">+4/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>.bazelrc</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-544556920c45b42cbfe40159b082ce8af6bd929e492d076769226265f215832f">+5/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>.env-sample</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-c4368a972a7fa60d9c4e333cebf68cdb9a67acb810451125c02e3b7eb2594e3d">+33/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>.env.example</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-a3046da0d15a27e89f2afe639b25748a7ad4d9290af3e7b1b6c1a5533c8f0a8c">+38/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>main.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-7829468e86c1cc5d5133195b5cb48e1ff6c75e3e9203777f6b2e379d9e4882b3">+18/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>AGENTS.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-a54ff182c7e8acf56acfd6e4b9c3ff41e2c41a31c9b211b2deb9df75d9a478f9">+177/-11</a></td> </tr> <tr> <td><strong>INSTALL.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-09b140a43ebfdd8dbec31ce72cafffd15164d2860fd390692a030bcb932b54a0">+11/-11</a>&nbsp; </td> </tr> <tr> <td><strong>MODULE.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-6136fc12446089c3db7360e923203dd114b6a1466252e71667c6791c20fe6bdc">+5/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>Makefile</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52">+55/-14</a>&nbsp; </td> </tr> <tr> <td><strong>README-Docker.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-9fd61d24482efe68c22d8d41e2a1dcc440f39195aa56e7a050f2abe598179efd">+17/-2</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5">+3/-3</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>ROADMAP.md</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-683343bdf93f55ed3cada86151abb8051282e1936e58d4e0a04beca95dff6e51">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-884fa9353a5226345e44fbabea3300efc7a87dfbcde0b6a42521ca51823f1b68">+11/-6</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-0e80ea46aeb61a873324685edb96eae864c7a2004fbb7ee404b4ec951190ba10">+12/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>mix_release.bzl</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-86ec281f99363b6b6eb1f49e21d83b7eeca93a35b552b9f305fffc6855e38ccd">+124/-49</a></td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-143f8d1549d52f28906f19ce28e5568a5be474470ff103c2c1e63c3e6b08d670">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-bfd308915d0cf522e7fc76600dee687617dc69165ab22502a1d219850c0c0860">+4/-4</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-5b1bc8fe77422534739bdd3a38dc20d2634a86c171265c34e1b5d0c5a61b6bab">+5/-6</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>build.rs</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-251e7a923f45f8f903e510d10f183366bda06d281c8ecc3669e1858256e2186d">+0/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>monitoring.proto</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-b56f709f4a0a3db694f2124353908318631f23e20b7846bc4b8ee869e2e0632a">+3/-26</a>&nbsp; &nbsp; </td> </tr> <tr> 
<td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-2e9751b437fa61442aac074c7a4a912d0ac50ac3ea156ac8aedd8478d21c6bdb">+2/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>monitoring.proto</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-9faf6025eb0d3d38383f5b7ad2b733abeb38454d5e4de3e83994e94b12d87a50">+2/-26</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>zen-consumer-with-otel.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-68375f1f7847e1fbdf75664f6be65b1ad94ae6ce86ed73fc5964d65054668acb">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>zen-consumer.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-4d308af9802a93a0f656e8c02a3b5fcd8991407bb18360f087470db74e1f9524">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-2423ef78d36e905ae993b69ff59f5df6b2e1b9492fb0fa8c6d0aad7c76d2d229">+4/-4</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-ef778d85ac6f9652c25cb0d631f0fe8dfb3edac4dde5d719a4fc2926fb5c3216">+4/-4</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-c62c0139ebdb337369f4067567cd2c52b8e7decb3ddfabc77f9f67b2f6e5789c">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-0b0725713b87dca1de57200214a4fe04633f0d856c39aa8032280227bf8e8141">+3/-3</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-f425b4378f84e0ba0c6f532facff17ff5d55b4dc6033d8bf35130a159cd2ba32">+9/-12</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>flowgger.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-af9f49f931e282dca53d1f0521b036d222fe671f77e61a876a84cf4c6d7cca4d">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>otel.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-c64b9ace832b8ea57a2be62f84166e03bb1904882635d444ec76a880cdf14cc0">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-e1f7c698e0e3a4e6afa971c1140e71cbf22593fbb19c81cb26b02c15c5dc46ec">+0/-25</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-9edc2486fff55fc399e0ac96dba5137948a7ea7285f5ef7846835355684b7ab5">+0/-111</a>&nbsp; </td> </tr> <tr> <td><strong>main.go</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-4b8ec845da50cd58d011e69f9d1c30530ee1968df26616b8768bb1fc03433bbe">+0/-138</a>&nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-4f5d2ea4260d490a0d6f28adde0b35eca8af77d22f3ee366a783946c53687619">+0/-25</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-bcac20d6b3cb81f0059e766839ba1ee59a885009249501b0ba1182ebb1daea25">+0/-77</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>main.go</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-78dc6bc53f1c760c66f43ff5f486bfe78a65bee8b2e0d4862293ec0892da2b29">+0/-123</a>&nbsp; </td> </tr> <tr> <td><strong>docker-compose.elx.yml</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-9562070d7ad4a3e9b2d06567008cf35de1d96448d914b3b45bf6c36d97cdd914">+109/-0</a>&nbsp; </td> </tr> <tr> <td><strong>docker-compose.spiffe.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-603fd9e7d40841d174f26b95d0cb0c9537430bf3f7a5da3ccbba4ea3d8ac66c9">+8/-158</a>&nbsp; </td> </tr> <tr> <td><strong>docker-compose.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-e45e45baeda1c1e73482975a664062aa56f20c03dd9d64a827aba57775bed0d3">+318/-269</a></td> </tr> <tr> <td><strong>Dockerfile.agent-gateway</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-332bc81a932ae08efa711a71b60fe0954d99bf17ebdab00a3baaa177a44de8b0">+94/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>Dockerfile.core-elx</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-5ec7a971285669999af442a0c7f141c34f7fd9180257307f5c4ed12f789a2182">+108/-0</a>&nbsp; </td> </tr> <tr> <td><strong>Dockerfile.poller</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-d3ba129830fb366bfe23b00db4ef6218b10fc981d3c04842b1b3b3b367a8982f">+0/-70</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>Dockerfile.sync</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-0227933b9961fd553af1d229e89d71a0271fdc475081bbcef49b587941af1eda">+0/-95</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>Dockerfile.tools</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-0258db71e4070e342198965f1d046f3097640850b037df8a2287a7e239630add">+1/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>Dockerfile.web-ng</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-92d43af1965575d56c3380ecc8a81024aac2ff36f039ec2d3839e9fc7852bc10">+6/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> 
<td><strong>agent-minimal.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-1f09fad94636c90373af8e270f6ba0332ae4f4d1df50a4909729280a3a9691e6">+6/-6</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>agent.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-5d33fe703515d03076d31261ecf946e9c6fc668cf5bf65099d49b670739e455e">+5/-20</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>agent.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-008f2216f159a9bd5db9cc90baaf6f1e64487df7af05b56ab3b9d6c4946aa95f">+7/-10</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>bootstrap-nested-spire.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-ab4746a08fb1e0b307a1e47660cd22182e283a087cba87dcbff0fdfe750f44f1">+0/-80</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>datasvc.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-3f2719d3dbfe042e8383739e3c78e74e5f851a44e5e46bea8e79c4b79fdcc34f">+3/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>datasvc.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-3a45619e57f1e6e9a31486ec7fffb33ef246e271f82bac272ee0a946b88da70a">+14/-1</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>db-event-writer.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-9fc51271f7ef5bb460160013e24e44e829b730656891d26fc49d5fe72fbb3147">+2/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>db-event-writer.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-7a33f95f7545499abf0ed9fc91b58499ab209639e4885019579c959583fc7496">+3/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>FRICTION_POINTS.md</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-b0653c58880f810ba832c0500733d63de309db98b43009fe73a1862494cf41bd">+0/-355</a>&nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-31849f033cfc932acee35f549c069abb1f36101c352e553dd6bff8713b29f98c">+0/-207</a>&nbsp; </td> </tr> <tr> <td><strong>SETUP_GUIDE.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-b4914f8640a78038e45f51235a624535672680dc902de5f107fc051f4f281913">+0/-307</a>&nbsp; </td> </tr> <tr> <td><strong>docker-compose.edge-e2e.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-575d19ea771bdf8102cb9729db43a1bfd6afc2527160e54105beeac2e314f362">+0/-27</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>manage-packages.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-3c2ff6febbddb956c71557894adaf7d0a39a1f20dda120fe126364946bc47280">+0/-211</a>&nbsp; </td> </tr> <tr> <td><strong>setup-edge-e2e.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-2714e2c7e111f69ea9e9f5ddd7f6a70fa5ea96e3a53b851cb13b8b8b7cd12917">+0/-198</a>&nbsp; </td> </tr> <tr> <td><strong>edge-poller-restart.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-96a8fe52c38fd0d7c14895127df34a27be311cac89c53d28ee178661b629bd22">+0/-178</a>&nbsp; </td> </tr> <tr> <td><strong>downstream-agent.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-747de0375ced42af978ca7dac239862bdabb7f6bd0bd634f134b485517a7b4ee">+0/-32</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>env</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-686f1a954c542f2ec9bf14c3170648b65190ad242c7f3a95a0f872ae41b8b1c6">+0/-4</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> 
<td><strong>server.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-025f5b5ab79526cf549ca1fdb90dd659ba76b438f05a7f77d916d18728c4b572">+0/-51</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>upstream-agent.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-e8a869ddf4affa31536a8d4e4e6f09c40072a7026da2c609d93c6ecf04138902">+0/-32</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>entrypoint-certs.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-83d6800b184a5233c66c69766286b0a60fece1bc64addb112d9f8dc019437f05">+13/-9</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>entrypoint-poller.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-e202d27e3331088745eb55cdd2b3e40ac3f5df109d9ff5c76c0faed60772807a">+0/-274</a>&nbsp; </td> </tr> <tr> <td><strong>entrypoint-sync.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-9d5620b8e6833309dbafb8ee6b6b75c3b942d163c3fe7f1a9827958b2d640265">+0/-96</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>fix-cert-permissions.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-17ea40a11edcaa7c85bb4215fda46b5a32505246fef0ab5f3ed47b28470c5ec8">+2/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>flowgger.docker.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-824f8797b418d4b9f5ea41e4a3741a0ed64b881f343072464489a76b7ea01008">+2/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>generate-certs.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-8298241543b4744a6ac7780c760ac5b5a0a87ba62de19c8612ebe1aba0996ebd">+214/-12</a></td> </tr> <tr> <td><strong>nats.docker.conf</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-06f2012494f428fe1bfb304972061c2094e0d99da88ba9af6914f7776872e6eb">+16/-160</a></td> </tr> <tr> <td><strong>netflow-consumer.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-f15920e8498a24f71ce3eec4f48fe8fefbb1765a90362998af779a660fcef9e1">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>otel.docker.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-d4af38790e3657b7589cd37a7539d5308b032f11caba7aa740ddc86bf99f4415">+2/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>pg_hba.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-7bd5f7292054916c7e5997f4c84ac9ec07d4c945621a48936c2aed0575fb96eb">+9/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>pg_ident.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-e7b8ce062e32c61fdc3bcc9e525c1f1df1c8008fbc02b11409e58c67baa17cc5">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>poller-stack.compose.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-f3b5c991c2c1f7646db0ca4ed9bcb5df0f313ce6a05d8f3c890f80c873f776f5">+0/-121</a>&nbsp; </td> </tr> <tr> <td><strong>poller.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-d64ebb69ec31e831efd187c47a5bfff2573960306b177f6464e91cb44a3c709d">+0/-128</a>&nbsp; </td> </tr> <tr> <td><strong>poller.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-ef5d74bb3607431245c2bf06169d7fee89cae817e114035075b59a671229ab46">+0/-135</a>&nbsp; </td> </tr> <tr> <td><strong>poller.spiffe.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-4e04bd23a0216287d5c0bb3831e0f95e7922ed03e8386a10ae7f4873e4fdb538">+0/-55</a>&nbsp; &nbsp; </td> </tr> <tr> 
<td><strong>refresh-upstream-credentials.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-d3b3a8fcdea1b49c9e1c0ecc12d61fb6d416313520e8ad52edbee9094dbdc271">+0/-248</a>&nbsp; </td> </tr> <tr> <td><strong>seed-poller-kv.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-c12070f475dbe7dc83e747fa6ec9d2ebdbdd97921a54f372abc89a102b783ad7">+0/-83</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>setup-edge-poller.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-d7aec89d87f4cc98f4d6935e49a8f6ce571bc6dda254d894e93b60922f3a775f">+0/-204</a>&nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-0cb49b4e37a7692f026133d5de971d449f42a1068226e848da5adf9af0ff4a2e">+5/-5</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>bootstrap-compose-spire.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-ca219a124d4c95ee7995764d7e0c322b4bfe59e357b7bcb42bc5d7c8b9b0af0d">+0/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>ssl_dist.core.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-08d49d8621b581d1a9aa5c456f61e8c5774e021083c982cbb514019f915a1701">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>ssl_dist.gateway.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-4a43a8290d45ac68592000e7ef51afe78b4213090155bd42aafb46e66130f7ae">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>ssl_dist.web.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-cef5be462ddb059fdfdeb9fd7c5cd70e656c4cd8b6ae1fe3fe312557b3da80ac">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>sync.docker.json</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-4237fcee4f33a230abf28e12e8d4823499d163759cd1ff124fec1c62faa8b8b4">+0/-71</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>sync.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-c652c07f7127be5b2932d92e6ef4c7448c544d1f3095cb96a03294fa58fd3c4c">+0/-75</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>sysmon-osx.checker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-044334b566d907c77656b7f951092709da2a111dc968da9a76315b1c71200cf4">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>tools-profile.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-f47597e2f5d4d085d8bf109109608f8ec0b7db8e90545e869b9ae409b607a4ac">+1/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>trapd.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-1ab1a0e03e63bc02e0ef31992a7187a377927272ed2060150b40d44cc0ea3357">+2/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>update-config.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-9ae50be83a13010a038389c74407ba1bde8cabcea0944e238c4b3374133f78bf">+1/-190</a>&nbsp; </td> </tr> <tr> <td><strong>zen.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-e060a3164cdc2746e0d9ad000fcf43c4bcdb05f4a41c586d7220e2ff2a7df01d">+2/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-0e4db31c224a8f72ae8e870a849e38a59d74a2c7f7b04347b0b3eb07e20c5a80">+80/-84</a>&nbsp; </td> </tr> <tr> <td><strong>push_targets.bzl</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-4af33fe62caba04b6d479589c16cfb85babc39bae5c92595d4d4e31660738513">+2/-2</a>&nbsp; &nbsp; &nbsp; </td> 
</tr> <tr> <td><strong>AGENTS.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-d00743dc4e878f258330deadcd4008e361276d760b58b672150c54bb1ac9758a">+8/-8</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>CNCF_DAY0.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-49a3c958df12738f89e0f701e36d8aee08dc3e8bf275a0421159cff0c2e5a9ce">+31/-31</a>&nbsp; </td> </tr> <tr> <td><strong>CNCF_security_self_assessment.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-d64ffa4a0d2f25cc11d0731c714caa8b406a11b20bf63396fdb3144471e58105">+7/-7</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>Additional files not shown</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2236/files#diff-2f328e4cd8dbe3ad193e49d92bcf045f47a6b72b1e9487d366f6b8288589b4ca"></a></td> </tr> </table></details></td></tr></tbody></table> </details> ___
qodo-code-review[bot] commented 2026-01-10 00:03:58 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2236#issuecomment-3731073168
Original created: 2026-01-10T00:03:58Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
⚪
Sensitive log exposure

Description: Potential sensitive information exposure via logging full request fields (e.g.,
req.details, plus IDs) at info! level; similar request-logging of potentially
user-controlled/sensitive fields appears in other updated services (e.g., cmd/trapd/src/*
and cmd/consumers/zen/src/grpc_server.rs), which could leak secrets or PII into logs if
details contains credentials, tokens, or device/user data.
server.rs [214-219]

Referred Code
) -> Result<Response<monitoring::StatusResponse>, Status> {
    let req = request.into_inner();
    info!(
        "Received GetStatus: service_name={}, service_type={}, agent_id={}, gateway_id={}, details={}",
        req.service_name, req.service_type, req.agent_id, req.gateway_id, req.details
    );
Ticket Compliance
⚪
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
⚪
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Possible runtime crash: Converting role strings via String.to_existing_atom/1 can raise for unexpected role
values, creating an unhandled crash path instead of graceful authorization failure.

Referred Code
defp get_role(%{role: role}) when is_atom(role), do: role
defp get_role(%{role: role}) when is_binary(role), do: String.to_existing_atom(role)
defp get_role(_), do: nil
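
The crash path described above comes from converting an unvalidated string directly. One way to avoid it is to look the value up in a fixed allow-list and treat anything else as "no role". The sketch below shows that pattern in Go rather than Elixir, purely as an illustration: the `Role` type, `parseRole`, and every role name except `super_admin` are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// Role is a hypothetical enumeration of known roles.
type Role string

const (
	RoleSuperAdmin Role = "super_admin" // the only role name that appears in the PR
	RoleAdmin      Role = "admin"       // assumed for illustration
	RoleViewer     Role = "viewer"      // assumed for illustration
)

// knownRoles is the allow-list; anything outside it is rejected.
var knownRoles = map[string]Role{
	"super_admin": RoleSuperAdmin,
	"admin":       RoleAdmin,
	"viewer":      RoleViewer,
}

// parseRole returns the matching Role and true, or "" and false for
// unexpected input. It never panics, so the caller can deny access
// instead of crashing.
func parseRole(raw string) (Role, bool) {
	role, ok := knownRoles[raw]
	return role, ok
}

func main() {
	for _, raw := range []string{"super_admin", "root"} {
		if role, ok := parseRole(raw); ok {
			fmt.Printf("%q -> role %s\n", raw, role)
		} else {
			fmt.Printf("%q -> unknown role, deny\n", raw)
		}
	}
}
```

The point of the design is that an unknown value becomes an ordinary "not authorized" result that the caller must handle, rather than an exception raised deep inside the authorization check.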


Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Sensitive data logged: The new info! log lines include req.details (and identifiers), which may contain sensitive
payload data and is emitted at info level.

Referred Code
info!(
    "Received GetStatus: service_name={}, service_type={}, agent_id={}, gateway_id={}, details={}",
    req.service_name, req.service_type, req.agent_id, req.gateway_id, req.details
);
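
One common mitigation for the finding above is to keep the identifiers in the info-level line but reduce the free-form payload to something non-sensitive, such as its length. The following is a minimal Go sketch of that idea, not the Rust code from the PR; `statusRequest` and `logLine` are stand-in names invented for the example.

```go
package main

import "fmt"

// statusRequest mirrors the fields named in the log line above.
// It is a stand-in struct, not the generated gRPC type.
type statusRequest struct {
	ServiceName string
	ServiceType string
	AgentID     string
	GatewayID   string
	Details     string
}

// logLine formats the request for info-level logging. The free-form
// Details payload is reduced to its byte length so that its contents
// never reach the log.
func logLine(req statusRequest) string {
	return fmt.Sprintf(
		"Received GetStatus: service_name=%s, service_type=%s, agent_id=%s, gateway_id=%s, details_len=%d",
		req.ServiceName, req.ServiceType, req.AgentID, req.GatewayID, len(req.Details),
	)
}

func main() {
	req := statusRequest{
		ServiceName: "sysmon",
		ServiceType: "grpc",
		AgentID:     "agent-1",
		GatewayID:   "gateway-1",
		Details:     `{"token":"s3cret"}`, // made-up payload
	}
	fmt.Println(logLine(req))
}
```

If the full payload is still needed for debugging, it could be emitted at a lower log level that is disabled in production; whether that is acceptable depends on what `details` can contain.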


Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Unsafe config parsing: New config parsing uses v.as_str().unwrap() which can panic on malformed/unexpected
external configuration values instead of validating and returning a controlled error.

Referred Code
let creds_file = cfg.lookup("output.nats_creds_file").and_then(|v| {
    let value = v.as_str().unwrap().trim();
    if value.is_empty() {
        None
    } else {
        Some(PathBuf::from(value))
    }
});
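
The underlying pattern, checking the type of an externally supplied value and returning an error instead of panicking, can be sketched as follows. This is a Go illustration of the idea only: the flat `map[string]any` is a stand-in for the config library used by the Rust code, and `credsFile` and `errNotString` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNotString is returned when the setting exists but has the wrong type.
var errNotString = errors.New("nats_creds_file must be a string")

// credsFile reads an optional path from a decoded config map.
// It returns ("", nil) when the key is absent or blank, and an error
// (rather than a panic) when the value is not a string.
func credsFile(cfg map[string]any) (string, error) {
	raw, present := cfg["nats_creds_file"]
	if !present {
		return "", nil
	}
	value, ok := raw.(string)
	if !ok {
		return "", fmt.Errorf("%w (got %T)", errNotString, raw)
	}
	return strings.TrimSpace(value), nil
}

func main() {
	for _, cfg := range []map[string]any{
		{"nats_creds_file": " /etc/nats.creds "}, // made-up path
		{"nats_creds_file": 42},
		{},
	} {
		path, err := credsFile(cfg)
		fmt.Printf("path=%q err=%v\n", path, err)
	}
}
```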


⚪
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit context: The newly added NATS account service initialization logs startup states but the diff does
not show audit logging for privileged gRPC actions (e.g., operator bootstrap/signing) with
actor identity and outcome.

Referred Code
// Initialize NATS account service
// This service is stateless - it only holds the operator key for signing operations.
// Account state (seeds, JWTs) is stored by Elixir in CNPG with AshCloak encryption.
// The service can start without an operator and bootstrap later via gRPC.
var natsAccountServer *datasvc.NATSAccountServer
if cfg.NATSOperator != nil {
	operator, opErr := accounts.NewOperator(cfg.NATSOperator)
	if opErr != nil {
		// Operator not available yet - that's okay, bootstrap will be called later
		log.Printf("NATS account service starting without operator (will bootstrap later): %v", opErr)
		natsAccountServer = datasvc.NewNATSAccountServer(nil)
	} else {
		natsAccountServer = datasvc.NewNATSAccountServer(operator)
		log.Printf("NATS account service initialized with operator %s", operator.Name())
	}

	natsAccountServer.SetAllowedClientIdentities(cfg.NATSOperator.AllowedClientIdentities)
	if len(cfg.NATSOperator.AllowedClientIdentities) == 0 {
		log.Printf("Warning: no allowed client identities configured for NATS account service; requests will be rejected")
	} else {
		log.Printf("NATS account service allowed identities: %v", cfg.NATSOperator.AllowedClientIdentities)


 ... (clipped 40 lines)


Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Error exposure risk: The new config-loading errors include file paths and detailed parsing causes, and the diff
does not indicate whether these errors are strictly internal logs versus user-facing
outputs.

Referred Code
// loadConfig loads agent configuration from file, falling back to embedded defaults.
func loadConfig(configPath string) (*agent.ServerConfig, error) {
	var cfg agent.ServerConfig

	// Try to read config file
	data, err := os.ReadFile(configPath)
	if err != nil {
		if os.IsNotExist(err) {
			if os.Getenv("SR_ALLOW_EMBEDDED_DEFAULT_CONFIG") != "true" {
				return nil, fmt.Errorf(
					"%w at %s (set SR_ALLOW_EMBEDDED_DEFAULT_CONFIG=true to use embedded defaults)",
					errConfigFileMissing,
					configPath,
				)
			}
			// Fall back to embedded default config (explicitly allowed)
			data = defaultConfig
		} else {
			return nil, fmt.Errorf("failed to read config file: %w", err)
		}
	}


 ... (clipped 11 lines)
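
Whether the detailed error is a problem depends on where it ends up. A minimal sketch of one way to keep the path-bearing error internal while exposing only a generic message is shown below. It assumes a sentinel named `errConfigFileMissing` as in the excerpt; its message text, the `wrapMissing` and `publicMessage` helpers, and the file path are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errConfigFileMissing uses the sentinel name from the excerpt;
// the message text here is assumed.
var errConfigFileMissing = errors.New("config file not found")

// wrapMissing wraps the sentinel with the path, in the same style as
// the fmt.Errorf call in loadConfig.
func wrapMissing(path string) error {
	return fmt.Errorf("%w at %s", errConfigFileMissing, path)
}

// publicMessage is a hypothetical helper: it maps internal errors to a
// message that is safe to show outside the process, leaving the detailed
// error (including the path) for internal logs only.
func publicMessage(err error) string {
	if errors.Is(err, errConfigFileMissing) {
		return "configuration is missing"
	}
	return "configuration could not be loaded"
}

func main() {
	err := wrapMissing("/etc/serviceradar/agent.json") // made-up path
	fmt.Println("internal log:", err)
	fmt.Println("external:", publicMessage(err))
}
```

Because the wrapping uses `%w`, callers can still branch on the sentinel with `errors.Is` without parsing the message text.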


Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
qodo-code-review[bot] commented 2026-01-10 00:05:45 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2236#issuecomment-3731083921
Original created: 2026-01-10T00:05:45Z

PR Code Suggestions

Explore these optional code suggestions:

Category    Suggestion    Impact
Security
Fix tenant-hopping security vulnerability

Prioritize the tenant ID from resource attributes (attrs) over caller-provided
options (opts) in resolve_tenant/2 to prevent a tenant-hopping security
vulnerability.

elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex [448-461]

 defp resolve_tenant(opts, attrs \\ %{}) do
   tenant_value =
-    Keyword.get(opts, :tenant) ||
-      Keyword.get(opts, :tenant_id) ||
-      Map.get(attrs, :tenant_id) ||
-      Map.get(attrs, "tenant_id")
+    Map.get(attrs, :tenant_id) ||
+      Map.get(attrs, "tenant_id") ||
+      Keyword.get(opts, :tenant) ||
+      Keyword.get(opts, :tenant_id)
 
   if is_nil(tenant_value) do
     raise ArgumentError,
           "Tenant could not be resolved - missing tenant identifier in options or attributes"
   end
 
   TenantSchemas.schema_for_tenant(tenant_value)
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 10

__

Why: The suggestion correctly identifies a critical tenant-hopping security vulnerability and provides the correct fix, which is to prioritize the tenant ID from the resource attributes over caller-provided options.

High
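
The precedence rule in this suggestion can be illustrated with a small Go sketch (Go rather than Elixir, for illustration only). `resolveTenant` and both error values are hypothetical. Rejecting a mismatch outright is an extra guard that goes beyond the reordering proposed above.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errTenantMissing  = errors.New("tenant could not be resolved")
	errTenantMismatch = errors.New("caller tenant does not match resource tenant")
)

// resolveTenant prefers the tenant recorded on the resource (attrTenant)
// over the caller-supplied option (optTenant). Empty strings mean "not set".
func resolveTenant(attrTenant, optTenant string) (string, error) {
	switch {
	case attrTenant == "" && optTenant == "":
		return "", errTenantMissing
	case attrTenant == "":
		// Nothing recorded on the resource: fall back to the caller's value.
		return optTenant, nil
	case optTenant != "" && optTenant != attrTenant:
		// Both are set and disagree: refuse instead of silently picking one.
		return "", fmt.Errorf("%w: %q vs %q", errTenantMismatch, optTenant, attrTenant)
	default:
		return attrTenant, nil
	}
}

func main() {
	fmt.Println(resolveTenant("tenant-a", ""))
	fmt.Println(resolveTenant("tenant-a", "tenant-b"))
}
```

With plain reordering, a conflicting caller option is silently ignored; with the mismatch check it surfaces as an error, which may be easier to audit.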
Prevent tenant isolation security vulnerability

Use the full SHA256 hash in base_name/1 instead of a truncated version to
prevent potential hash collisions that could break tenant isolation.

elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex [199-207]

 defp base_name(tenant_id) do
-  # Use first 12 chars of SHA256 hash for shorter but unique names
+  # Use the full SHA256 hash for the name to prevent collisions
   hash =
     :crypto.hash(:sha256, tenant_id)
     |> Base.encode16(case: :lower)
-    |> String.slice(0, 12)
 
   "#{@registry_prefix}#{hash}"
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a potential hash collision vulnerability from truncating the SHA256 hash, which could break tenant isolation, and proposes using the full hash to mitigate this security risk.

High
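
To put the collision concern in numbers: 12 hexadecimal characters carry 48 bits, while the full SHA-256 digest carries 256. The Go sketch below re-creates the naming scheme and estimates the chance of at least one collision with the birthday approximation, assuming hash outputs behave like uniform random values. `baseName`, `collisionProbability`, and the prefix string are illustrative stand-ins; the real `@registry_prefix` value is not shown in the excerpt.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math"
)

// baseName mirrors the Elixir helper: prefix + lowercase hex SHA-256 of the
// tenant ID, truncated to hexChars characters (1..64; 64 keeps the full digest).
func baseName(prefix, tenantID string, hexChars int) string {
	sum := sha256.Sum256([]byte(tenantID))
	return prefix + hex.EncodeToString(sum[:])[:hexChars]
}

// collisionProbability uses the birthday approximation
// p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)) for n random values of the given width.
func collisionProbability(n float64, bits int) float64 {
	space := math.Pow(2, float64(bits))
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	// "tenant_registry_" is a made-up prefix for the demo.
	fmt.Println(baseName("tenant_registry_", "tenant-a", 12))
	for _, n := range []float64{1e3, 1e6, 1e7} {
		fmt.Printf("n=%.0e  48-bit: %.3g  256-bit: %.3g\n",
			n, collisionProbability(n, 48), collisionProbability(n, 256))
	}
}
```

Under that assumption, the 48-bit variant only reaches a collision probability on the order of 10^-3 at around a million tenants, so the practical risk depends heavily on the expected tenant count. Using the full digest removes the question at the cost of longer names.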
Strengthen multi-tenant authorization security policies

Strengthen multi-tenant authorization policies for create and update actions by
using forbid_if to explicitly deny access if the tenant_id does not match the
actor's tenant or is being improperly changed.

elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex [408-427]

 policies do
   # Super admins can see all agents across tenants
   bypass always() do
     authorize_if actor_attribute_equals(:role, :super_admin)
   end
 
   # Tenant isolation: users can only see agents in their tenant
   policy action_type(:read) do
     authorize_if expr(tenant_id == ^actor(:tenant_id))
   end
 
   # Allow create/update for agents in user's tenant
   policy action_type(:create) do
-    authorize_if expr(tenant_id == ^actor(:tenant_id))
+    # Forbid if the tenant_id in the changeset does not match the actor's tenant_id
+    forbid_if expr(not is_nil(changeset_attribute(:tenant_id)) and changeset_attribute(:tenant_id) != ^actor(:tenant_id))
+    authorize_if expr(changeset_attribute(:tenant_id) == ^actor(:tenant_id))
   end
 
   policy action_type(:update) do
+    # Forbid if an attempt is made to change the tenant_id
+    forbid_if expr(changing?(tenant_id))
+    # Authorize if the record's tenant_id matches the actor's tenant_id
     authorize_if expr(tenant_id == ^actor(:tenant_id))
   end
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a security weakness in the multi-tenant authorization policies and proposes a more robust implementation using forbid_if and changeset_attribute to prevent potential bypasses.

High
Possible issue
Fix duplicate shutdown logic bug

Refactor the pushLoop error handling to remove duplicate shutdown logic and
prevent server.Stop() from being called multiple times.

cmd/agent/main.go [231-245]

 	case err := <-errChan:
+		// If the push loop returns a non-cancellation error, it's fatal.
+		// The server needs to be stopped before exiting.
 		if err != nil && !errors.Is(err, context.Canceled) {
 			stopCtx, stopCancel := context.WithTimeout(context.Background(), 10*time.Second)
 			defer stopCancel()
 			if stopErr := server.Stop(stopCtx); stopErr != nil {
-				log.Error().Err(stopErr).Msg("Error stopping agent services")
+				log.Error().Err(stopErr).Msg("Error stopping agent services on push loop failure")
 			}
 			return fmt.Errorf("push loop error: %w", err)
 		}
+		// If err is nil or context.Canceled, the loop exited cleanly.
+		// The signal handler path will manage the final server stop.
 
-		stopCtx, stopCancel := context.WithTimeout(context.Background(), 10*time.Second)
-		defer stopCancel()
-		if stopErr := server.Stop(stopCtx); stopErr != nil {
-			log.Error().Err(stopErr).Msg("Error stopping agent services")
-		}
-
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a bug in the shutdown logic where server.Stop() could be called twice, leading to unpredictable behavior. This is a significant correctness fix for error and signal handling.

Medium
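An alternative way to rule out a double `server.Stop()` is to make the stop call idempotent rather than rely on control flow alone. The sketch below assumes only that the server exposes `Stop(ctx) error`, as the diff shows; `stopper`, `onceStopper`, and `countingServer` are hypothetical names introduced for the example and are not part of the agent code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// stopper is a hypothetical stand-in for the agent server; only Stop matters here.
type stopper interface {
	Stop(ctx context.Context) error
}

// onceStopper wraps a stopper so the underlying Stop runs at most once,
// no matter how many shutdown paths call it.
type onceStopper struct {
	inner stopper
	once  sync.Once
	err   error
}

// Stop invokes the wrapped Stop on the first call and returns the same
// result on every later call.
func (o *onceStopper) Stop(ctx context.Context) error {
	o.once.Do(func() { o.err = o.inner.Stop(ctx) })
	return o.err
}

// countingServer records how many times Stop was actually invoked.
type countingServer struct{ calls int }

func (c *countingServer) Stop(ctx context.Context) error {
	c.calls++
	return nil
}

// stopTwice simulates both shutdown paths firing and returns the number of
// calls that reached the underlying server.
func stopTwice() int {
	srv := &countingServer{}
	guarded := &onceStopper{inner: srv}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	_ = guarded.Stop(ctx) // push-loop error path
	_ = guarded.Stop(ctx) // signal-handler path
	return srv.calls
}

func main() {
	fmt.Println("underlying Stop calls:", stopTwice())
}
```

With a guard like this, either shutdown path can call `Stop` without knowing whether the other already has, at the cost of one extra wrapper type.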
Handle potential ArgumentError in fallback

Add a catch for ArgumentError in the get_channel/0 function to prevent crashes
when ServiceRadar.DataService.Client is not available.

elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [458-489]

 defp get_channel do
   # Try to get channel from DataService.Client first
   result =
     try do
       case GenServer.call(ServiceRadar.DataService.Client, :get_channel, 5_000) do
         {:ok, channel} ->
           # Verify the connection is still alive
           conn_pid = channel.adapter_payload.conn_pid
 
           if Process.alive?(conn_pid) do
             {:ok, channel}
           else
             Logger.warning("DataService.Client connection is dead, creating fresh connection")
             create_fresh_channel()
           end
 
         {:error, reason} ->
           Logger.warning("DataService.Client not connected: #{inspect(reason)}, creating fresh connection")
           create_fresh_channel()
       end
     catch
       :exit, {:noproc, _} ->
         Logger.warning("DataService.Client not started, creating fresh connection")
         create_fresh_channel()
 
       :exit, {:timeout, _} ->
         Logger.warning("DataService.Client timeout, creating fresh connection")
         create_fresh_channel()
+
+      :error, %ArgumentError{} = e ->
+        Logger.warning("DataService.Client not available (#{inspect(e)}), creating fresh connection")
+        create_fresh_channel()
     end
 
   result
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies an unhandled ArgumentError that could crash the calling process, improving the robustness of the error handling and fallback logic.

Medium
Handle lookup errors explicitly

In build_lifecycle_events/2, handle potential errors from
lookup_existing_alias_records/2 by propagating an error tuple instead of
silently continuing.

elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex [247-278]

 def build_lifecycle_events(updates, opts) when is_list(updates) do
   ...
-  # Get device IDs for lookup
   device_ids = Enum.map(alias_updates, & &1.device_id) |> Enum.sort()
 
-  # Lookup existing devices (for comparing previous alias state)
-  existing_records = lookup_existing_alias_records(device_ids, opts)
+  # Lookup existing devices and handle errors
+  case lookup_existing_alias_records(device_ids, opts) do
+    {:ok, existing_records} ->
+      events =
+        Enum.flat_map(alias_updates, fn update ->
+          current = AliasRecord.from_metadata(update.metadata)
+          previous = Map.get(existing_records, update.device_id)
 
-  # Build events for changes
-  events =
-    Enum.flat_map(alias_updates, fn update ->
-      current = AliasRecord.from_metadata(update.metadata)
-      previous = Map.get(existing_records, update.device_id)
+          if alias_change_detected?(previous, current) do
+            [build_alias_event(update, current, previous)]
+          else
+            []
+          end
+        end)
 
-      if alias_change_detected?(previous, current) do
-        [build_alias_event(update, current, previous)]
-      else
-        []
-      end
-    end)
+      {:ok, events}
 
-  {:ok, events}
+    {:error, reason} ->
+      {:error, {:lookup_failed, reason}}
+  end
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly points out that a database lookup failure is silently ignored, which could lead to incorrect data and spurious events. Explicitly handling the error improves the function's robustness.

Medium
Fix incomplete alias change detection

Modify alias_change_detected?/2 to also detect when service or IP aliases are
removed, not just when they are added.

elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex [307-315]

 def alias_change_detected?(previous, current) do
   # Check if any core field changed
-  # Check if new keys were introduced
+  # Check if new keys were introduced or old keys removed
   trim(previous.current_service_id) != trim(current.current_service_id) or
     trim(previous.current_ip) != trim(current.current_ip) or
     trim(previous.collector_ip) != trim(current.collector_ip) or
     new_keys_introduced?(previous.services, current.services) or
-    new_keys_introduced?(previous.ips, current.ips)
+    new_keys_introduced?(current.services, previous.services) or
+    new_keys_introduced?(previous.ips, current.ips) or
+    new_keys_introduced?(current.ips, previous.ips)
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a bug where the removal of an alias is not detected, which would lead to an incomplete audit trail.

Medium
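The fix above amounts to comparing the two key sets in both directions. A minimal Go sketch of that idea follows, assuming the alias collections can be treated as maps keyed by service ID or IP; `keysDiffer` is a hypothetical helper and does not mirror the Elixir `new_keys_introduced?/2` signature.

```go
package main

import "fmt"

// keysDiffer reports whether the two maps have different key sets.
// It catches both additions (a key only in current) and removals
// (a key only in previous).
func keysDiffer(previous, current map[string]string) bool {
	// Different sizes mean at least one key was added or removed.
	if len(previous) != len(current) {
		return true
	}
	// Same size: the sets are equal exactly when every current key
	// also exists in previous.
	for k := range current {
		if _, ok := previous[k]; !ok {
			return true
		}
	}
	return false
}

func main() {
	before := map[string]string{"svc-a": "t1", "svc-b": "t1"}
	after := map[string]string{"svc-a": "t2"}
	fmt.Println("alias set changed:", keysDiffer(before, after))
}
```

Only keys are compared here; changes to the values stored under an unchanged key are deliberately ignored, matching the key-based check in the suggestion.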
Use default limits struct

In build_limits/1, return a default empty %Proto.AccountLimits{} struct instead
of nil to prevent potential gRPC encoding errors.

elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [524]

-defp build_limits(nil), do: nil
+defp build_limits(nil), do: %Proto.AccountLimits{}

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that returning nil for a protobuf message field can cause encoding errors, and providing a default empty struct is a robust fix.

Medium
General
Check creds file exists

Before setting the resolver client, verify that the systemCredsFile exists and
is a valid file, logging a warning if it is not.

cmd/data-services/main.go [132-141]

 systemCredsFile := cfg.NATSOperator.SystemAccountCredsFile
 if envPath := os.Getenv("NATS_SYSTEM_ACCOUNT_CREDS_FILE"); envPath != "" {
     systemCredsFile = envPath
 }
 if systemCredsFile == "" {
     log.Printf("Warning: no system account creds configured; PushAccountJWT will fail")
 } else {
-    natsAccountServer.SetResolverClient(cfg.NATSURL, cfg.NATSSecurity, systemCredsFile)
-    log.Printf("NATS resolver client configured with system creds at %s", systemCredsFile)
+    if info, statErr := os.Stat(systemCredsFile); statErr != nil || info.IsDir() {
+        log.Printf("Warning: system account creds file not found or invalid: %v; PushAccountJWT will fail", statErr)
+    } else {
+        natsAccountServer.SetResolverClient(cfg.NATSURL, cfg.NATSSecurity, systemCredsFile)
+        log.Printf("NATS resolver client configured with system creds at %s", systemCredsFile)
+    }
 }
Suggestion importance[1-10]: 7

__

Why: This suggestion improves robustness by adding a check to ensure the credentials file exists and is not a directory before attempting to use it, providing an early and more specific warning for misconfigurations.

Medium
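One detail in the diff above: when the path exists but is a directory, `statErr` is nil, so the warning prints `<nil>` in place of a reason. A small helper that returns a descriptive error for each case avoids that. The sketch below is one possible shape, not the project's code; `validateCredsFile` and `demo` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// validateCredsFile returns nil only when path exists and is a regular file.
// Each failure mode gets its own message so the caller can log a useful reason.
func validateCredsFile(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return fmt.Errorf("stat creds file %q: %w", path, err)
	}
	if !info.Mode().IsRegular() {
		return fmt.Errorf("creds path %q is not a regular file", path)
	}
	return nil
}

// demo exercises the three cases against a throwaway temp directory:
// a missing path, a directory, and a real file.
func demo() (missingErr, dirErr, fileErr error) {
	dir, err := os.MkdirTemp("", "creds-demo")
	if err != nil {
		return err, err, err
	}
	defer os.RemoveAll(dir)

	file := filepath.Join(dir, "system.creds")
	if err := os.WriteFile(file, []byte("placeholder"), 0o600); err != nil {
		return err, err, err
	}
	return validateCredsFile(filepath.Join(dir, "missing.creds")),
		validateCredsFile(dir),
		validateCredsFile(file)
}

func main() {
	missingErr, dirErr, fileErr := demo()
	fmt.Println("missing:", missingErr)
	fmt.Println("directory:", dirErr)
	fmt.Println("regular file:", fileErr)
}
```

A caller could then log the returned error directly and skip `SetResolverClient` whenever it is non-nil.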
Guard writing operator config path

Add a check to ensure operatorConfigPath is not empty before calling
natsAccountServer.WriteOperatorConfig() to prevent writing to an invalid path.

cmd/data-services/main.go [117-130]

 if operatorConfigPath != "" || resolverPath != "" {
     natsAccountServer.SetResolverPaths(operatorConfigPath, resolverPath)
     log.Printf("NATS resolver paths configured: operator=%s resolver=%s", operatorConfigPath, resolverPath)
-    // If operator is already initialized, write the config now
-    // This ensures config files exist even when datasvc restarts with an existing operator
-    if operator != nil {
+    // Only write operator config if we have a valid path and operator
+    if operator != nil && operatorConfigPath != "" {
         if err := natsAccountServer.WriteOperatorConfig(); err != nil {
             log.Printf("Warning: failed to write initial operator config: %v", err)
         } else {
             log.Printf("Wrote initial operator config to %s", operatorConfigPath)
         }
     }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential issue where WriteOperatorConfig could be called with an empty path. Adding a guard prevents this, making the startup logic more robust against partial configurations.

Medium
Replace update! with safe update

Replace Ash.update! with Ash.update in create_with_tenant_cert/2 and handle the
potential error tuple to prevent the process from crashing on failure.

elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex [552-557]

-updated = result.package
-  |> Ash.Changeset.for_update(:update_tokens, %{
-    bundle_ciphertext: bundle_ciphertext,
-    downstream_spiffe_id: cert_data.spiffe_id
-  }, authorize?: false, tenant: tenant_schema)
-  |> Ash.update!()
+case result.package
+     |> Ash.Changeset.for_update(:update_tokens, %{
+       bundle_ciphertext: bundle_ciphertext,
+       downstream_spiffe_id: cert_data.spiffe_id
+     }, authorize?: false, tenant: tenant_schema)
+     |> Ash.update() do
+  {:ok, updated} ->
+    updated
 
+  {:error, error} ->
+    # Elixir has no `return`; the case evaluates to this tuple and the caller must branch on it
+    {:error, error}
+end
+

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly advises replacing the bang version Ash.update! with the safer Ash.update and handling the error case, which improves the function's robustness and prevents potential crashes.

Medium
Log fallback to default config

Add a log message to indicate when the agent falls back to using the embedded
default configuration because the specified config file was not found.

cmd/agent/main.go [112-126]

 if err != nil {
     if os.IsNotExist(err) {
         if os.Getenv("SR_ALLOW_EMBEDDED_DEFAULT_CONFIG") != "true" {
             return nil, fmt.Errorf(
                 "%w at %s (set SR_ALLOW_EMBEDDED_DEFAULT_CONFIG=true to use embedded defaults)",
                 errConfigFileMissing,
                 configPath,
             )
         }
         // Fall back to embedded default config (explicitly allowed)
+        log.Printf("info: config file %s not found, using embedded defaults", configPath)
         data = defaultConfig
     } else {
         return nil, fmt.Errorf("failed to read config file: %w", err)
     }
 }
Suggestion importance[1-10]: 6

__

Why: This is a valuable observability improvement. Logging the fallback to the default configuration makes the agent's behavior more transparent and aids in debugging configuration issues, which could otherwise be silent.

Low
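The fallback path can be isolated into a small function that also reports whether the embedded defaults were used, which makes the behaviour easy to log and to test. This is a sketch under assumptions: `readConfig` is a hypothetical helper, the opt-in is passed as a boolean instead of reading `SR_ALLOW_EMBEDDED_DEFAULT_CONFIG` directly, and the `defaultConfig` contents are a placeholder, not the agent's real embedded config.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"log"
	"os"
)

// errConfigFileMissing mirrors the sentinel named in the diff.
var errConfigFileMissing = errors.New("config file not found")

// defaultConfig is placeholder content standing in for the embedded default.
var defaultConfig = []byte(`{}`)

// readConfig loads the file at path. If the file does not exist and
// allowDefault is true, it logs the fallback and returns the embedded
// default with usedDefault set to true.
func readConfig(path string, allowDefault bool) (data []byte, usedDefault bool, err error) {
	data, err = os.ReadFile(path)
	if err == nil {
		return data, false, nil
	}
	if !errors.Is(err, fs.ErrNotExist) {
		return nil, false, fmt.Errorf("failed to read config file: %w", err)
	}
	if !allowDefault {
		return nil, false, fmt.Errorf("%w at %s", errConfigFileMissing, path)
	}
	log.Printf("config file %s not found, using embedded defaults", path)
	return defaultConfig, true, nil
}

func main() {
	data, usedDefault, err := readConfig("agent.json", true)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("loaded %d bytes (embedded default: %t)\n", len(data), usedDefault)
}
```

Returning the `usedDefault` flag lets the caller decide how loudly to report the fallback, for example as a warning in production and as info in development.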
Log when truncating status messages

Add a Logger.warning in normalize_message/2 to log when a status message is
being truncated, improving observability of silent data loss.

elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex [591-609]

 defp normalize_message(msg, source) do
   max_bytes =
     case source do
       "results" -> @max_results_message_bytes
       _ -> @max_status_message_bytes
     end
 
   if byte_size(msg) > max_bytes do
     if source == "results" do
       raise GRPC.RPCError,
         status: :resource_exhausted,
         message: "results payload exceeds max size"
     else
+      Logger.warning("Truncating status message to #{@max_status_message_bytes} bytes for source=#{source}")
       binary_part(msg, 0, @max_status_message_bytes)
     end
   else
     msg
   end
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why: The suggestion improves observability by adding a log message when a status message is truncated, which is valuable for debugging and monitoring data integrity without being a critical issue.

Low
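For comparison, the same truncate-and-log behaviour can be sketched in Go. One point that is not raised in the suggestion itself: a cut at a fixed byte offset can split a multi-byte UTF-8 character, so this sketch backs up to the nearest rune boundary. `truncateStatus` is a hypothetical helper, and the limit used in `main` is illustrative, not the gateway's real `@max_status_message_bytes` value.

```go
package main

import (
	"fmt"
	"log"
	"unicode/utf8"
)

// truncateStatus returns msg cut to at most maxBytes bytes and reports
// whether anything was removed. The cut point is moved back to a rune
// boundary so valid UTF-8 input stays valid after truncation.
func truncateStatus(msg string, maxBytes int) (string, bool) {
	if maxBytes < 0 {
		maxBytes = 0
	}
	if len(msg) <= maxBytes {
		return msg, false
	}
	cut := maxBytes
	// msg[cut] is the first byte that would be dropped; if it is a
	// continuation byte, the previous rune would be split, so step back.
	for cut > 0 && !utf8.RuneStart(msg[cut]) {
		cut--
	}
	log.Printf("truncating status message from %d to %d bytes", len(msg), cut)
	return msg[:cut], true
}

func main() {
	out, truncated := truncateStatus("status: all checks passed", 10)
	fmt.Printf("%q truncated=%t\n", out, truncated)
}
```

Logging the original and final sizes, as done here, makes it possible to tell from the logs how much data was dropped.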
Use consistent gateway_id for metrics

In record_push_metrics/2, use the local gateway_id() helper instead of
Config.gateway_id() for consistent metric tagging.

elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex [612-622]

 defp record_push_metrics(agent_id, service_count) do
   :telemetry.execute(
     [:serviceradar, :agent_gateway, :push, :complete],
     %{service_count: service_count},
     %{
       agent_id: agent_id,
-      gateway_id: Config.gateway_id(),
+      gateway_id: gateway_id(),
       domain: Config.domain()
     }
   )
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 5

__

Why: The suggestion correctly points out an inconsistency in how gateway_id is retrieved, which could lead to mismatched metric tags. Using the local gateway_id() helper ensures data consistency.

Low
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2236#issuecomment-3731083921 Original created: 2026-01-10T00:05:45Z --- ## PR Code Suggestions ✨ <!-- e84e553 --> Explore these optional code suggestions: <table><thead><tr><td><strong>Category</strong></td><td align=left><strong>Suggestion&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </strong></td><td align=center><strong>Impact</strong></td></tr><tbody><tr><td rowspan=3>Security</td> <td> <details><summary>Fix tenant-hopping security vulnerability</summary> ___ **Prioritize the tenant ID from resource attributes (<code>attrs</code>) over caller-provided <br>options (<code>opts</code>) in <code>resolve_tenant/2</code> to prevent a tenant-hopping security <br>vulnerability.** [elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex [448-461]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-e4fe8e19bc324416302bb4c962f57133b3f62eb82053766844d881c522a473e5R448-R461) ```diff defp resolve_tenant(opts, attrs \\ %{}) do tenant_value = - Keyword.get(opts, :tenant) || - Keyword.get(opts, :tenant_id) || - Map.get(attrs, :tenant_id) || - Map.get(attrs, "tenant_id") + Map.get(attrs, :tenant_id) || + Map.get(attrs, "tenant_id") || + Keyword.get(opts, :tenant) || + Keyword.get(opts, :tenant_id) if is_nil(tenant_value) do raise ArgumentError, "Tenant could not be resolved - missing tenant identifier in options or attributes" end TenantSchemas.schema_for_tenant(tenant_value) end ``` `[To ensure code accuracy, apply 
this suggestion manually]` <details><summary>Suggestion importance[1-10]: 10</summary> __ Why: The suggestion correctly identifies a critical tenant-hopping security vulnerability and provides the correct fix, which is to prioritize the tenant ID from the resource attributes over caller-provided options. </details></details></td><td align=center>High </td></tr><tr><td> <details><summary>Prevent tenant isolation security vulnerability</summary> ___ **Use the full SHA256 hash in <code>base_name/1</code> instead of a truncated version to <br>prevent potential hash collisions that could break tenant isolation.** [elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex [199-207]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-91248b3b128a2e3d9bea6ffdb5e0f295e4a1745e82f87687c640ad01416fb85dR199-R207) ```diff defp base_name(tenant_id) do - # Use first 12 chars of SHA256 hash for shorter but unique names + # Use the full SHA256 hash for the name to prevent collisions hash = :crypto.hash(:sha256, tenant_id) |> Base.encode16(case: :lower) - |> String.slice(0, 12) "#{@registry_prefix}#{hash}" end ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 9</summary> __ Why: The suggestion correctly identifies a potential hash collision vulnerability from truncating the SHA256 hash, which could break tenant isolation, and proposes using the full hash to mitigate this security risk. 
</details></details></td><td align=center>High </td></tr><tr><td> <details><summary>Strengthen multi-tenant authorization security policies</summary> ___ **Strengthen multi-tenant authorization policies for <code>create</code> and <code>update</code> actions by <br>using <code>forbid_if</code> to explicitly deny access if the <code>tenant_id</code> does not match the <br>actor's tenant or is being improperly changed.** [elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex [408-427]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-c56f92b6ce744cab3f2dc00dde92e2017cffdd12ad4618f7fa720252f2a6843aR408-R427) ```diff policies do # Super admins can see all agents across tenants bypass always() do authorize_if actor_attribute_equals(:role, :super_admin) end # Tenant isolation: users can only see agents in their tenant policy action_type(:read) do authorize_if expr(tenant_id == ^actor(:tenant_id)) end # Allow create/update for agents in user's tenant policy action_type(:create) do - authorize_if expr(tenant_id == ^actor(:tenant_id)) + # Forbid if the tenant_id in the changeset does not match the actor's tenant_id + forbid_if expr(not is_nil(changeset_attribute(:tenant_id)) and changeset_attribute(:tenant_id) != ^actor(:tenant_id)) + authorize_if expr(changeset_attribute(:tenant_id) == ^actor(:tenant_id)) end policy action_type(:update) do + # Forbid if an attempt is made to change the tenant_id + forbid_if expr(changing?(tenant_id)) + # Authorize if the record's tenant_id matches the actor's tenant_id authorize_if expr(tenant_id == ^actor(:tenant_id)) end end ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 9</summary> __ Why: The suggestion correctly identifies a security weakness in the multi-tenant authorization policies and proposes a more robust implementation using `forbid_if` and `changeset_attribute` to prevent potential bypasses. 
</details></details></td><td align=center>High </td></tr><tr><td rowspan=5>Possible issue</td> <td> <details><summary>Fix duplicate shutdown logic bug</summary> ___ **Refactor the <code>pushLoop</code> error handling to remove duplicate shutdown logic and <br>prevent <code>server.Stop()</code> from being called multiple times.** [cmd/agent/main.go [231-245]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-61358711e980ccf505246fd3915f97cbd3a380e9b66f6fa5aad46749968c5ca3R231-R245) ```diff case err := <-errChan: + // If the push loop returns a non-cancellation error, it's fatal. + // The server needs to be stopped before exiting. if err != nil && !errors.Is(err, context.Canceled) { stopCtx, stopCancel := context.WithTimeout(context.Background(), 10*time.Second) defer stopCancel() if stopErr := server.Stop(stopCtx); stopErr != nil { - log.Error().Err(stopErr).Msg("Error stopping agent services") + log.Error().Err(stopErr).Msg("Error stopping agent services on push loop failure") } return fmt.Errorf("push loop error: %w", err) } + // If err is nil or context.Canceled, the loop exited cleanly. + // The signal handler path will manage the final server stop. - stopCtx, stopCancel := context.WithTimeout(context.Background(), 10*time.Second) - defer stopCancel() - if stopErr := server.Stop(stopCtx); stopErr != nil { - log.Error().Err(stopErr).Msg("Error stopping agent services") - } - ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=3 --> <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly identifies a bug in the shutdown logic where `server.Stop()` could be called twice, leading to unpredictable behavior. This is a significant correctness fix for error and signal handling. 
</details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Handle potential ArgumentError in fallback</summary> ___ **Add a <code>catch</code> for <code>ArgumentError</code> in the <code>get_channel/0</code> function to prevent crashes <br>when <code>ServiceRadar.DataService.Client</code> is not available.** [elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [458-489]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-2e18ac777ac600b12982ba9e9d5327e23ebd84c139a2add7976f8bf61283e554R458-R489) ```diff defp get_channel do # Try to get channel from DataService.Client first result = try do case GenServer.call(ServiceRadar.DataService.Client, :get_channel, 5_000) do {:ok, channel} -> # Verify the connection is still alive conn_pid = channel.adapter_payload.conn_pid if Process.alive?(conn_pid) do {:ok, channel} else Logger.warning("DataService.Client connection is dead, creating fresh connection") create_fresh_channel() end {:error, reason} -> Logger.warning("DataService.Client not connected: #{inspect(reason)}, creating fresh connection") create_fresh_channel() end catch :exit, {:noproc, _} -> Logger.warning("DataService.Client not started, creating fresh connection") create_fresh_channel() :exit, {:timeout, _} -> Logger.warning("DataService.Client timeout, creating fresh connection") create_fresh_channel() + + e in [ArgumentError] -> + Logger.warning("DataService.Client not available (#{inspect(e)}), creating fresh connection") + create_fresh_channel() end result end ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly identifies an unhandled `ArgumentError` that could crash the calling process, improving the robustness of the error handling and fallback logic. 
</details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Handle lookup errors explicitly</summary> ___ **In <code>build_lifecycle_events/2</code>, handle potential errors from <br><code>lookup_existing_alias_records/2</code> by propagating an error tuple instead of <br>silently continuing.** [elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex [247-278]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-bc3743067ea774f59bc5665770f7110a2d6e90f6e1156a7717a1c287f8979d28R247-R278) ```diff def build_lifecycle_events(updates, opts) when is_list(updates) do ... - # Get device IDs for lookup device_ids = Enum.map(alias_updates, & &1.device_id) |> Enum.sort() - # Lookup existing devices (for comparing previous alias state) - existing_records = lookup_existing_alias_records(device_ids, opts) + # Lookup existing devices and handle errors + case lookup_existing_alias_records(device_ids, opts) do + {:ok, existing_records} -> + events = + Enum.flat_map(alias_updates, fn update -> + current = AliasRecord.from_metadata(update.metadata) + previous = Map.get(existing_records, update.device_id) - # Build events for changes - events = - Enum.flat_map(alias_updates, fn update -> - current = AliasRecord.from_metadata(update.metadata) - previous = Map.get(existing_records, update.device_id) + if alias_change_detected?(previous, current) do + [build_alias_event(update, current, previous)] + else + [] + end + end) - if alias_change_detected?(previous, current) do - [build_alias_event(update, current, previous)] - else - [] - end - end) + {:ok, events} - {:ok, events} + {:error, reason} -> + {:error, {:lookup_failed, reason}} + end end ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly points out that a database lookup failure is silently ignored, which could lead to incorrect data and spurious events. 
Explicitly handling the error improves the function's robustness. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Fix incomplete alias change detection</summary> ___ **Modify <code>alias_change_detected?/2</code> to also detect when service or IP aliases are <br>removed, not just when they are added.** [elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex [307-315]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-bc3743067ea774f59bc5665770f7110a2d6e90f6e1156a7717a1c287f8979d28R307-R315) ```diff def alias_change_detected?(previous, current) do # Check if any core field changed - # Check if new keys were introduced + # Check if new keys were introduced or old keys removed trim(previous.current_service_id) != trim(current.current_service_id) or trim(previous.current_ip) != trim(current.current_ip) or trim(previous.collector_ip) != trim(current.collector_ip) or new_keys_introduced?(previous.services, current.services) or - new_keys_introduced?(previous.ips, current.ips) + new_keys_introduced?(current.services, previous.services) or + new_keys_introduced?(previous.ips, current.ips) or + new_keys_introduced?(current.ips, previous.ips) end ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion correctly identifies a bug where the removal of an alias is not detected, which would lead to an incomplete audit trail. 
</details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Use default limits struct</summary> ___ **In <code>build_limits/1</code>, return a default empty <code>%Proto.AccountLimits{}</code> struct instead <br>of <code>nil</code> to prevent potential gRPC encoding errors.** [elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex [524]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-2e18ac777ac600b12982ba9e9d5327e23ebd84c139a2add7976f8bf61283e554R524-R524) ```diff -defp build_limits(nil), do: nil +defp build_limits(nil), do: %Proto.AccountLimits{} ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion correctly identifies that returning `nil` for a protobuf message field can cause encoding errors, and providing a default empty struct is a robust fix. </details></details></td><td align=center>Medium </td></tr><tr><td rowspan=6>General</td> <td> <details><summary>Check creds file exists</summary> ___ **Before setting the resolver client, verify that the <code>systemCredsFile</code> exists and <br>is a valid file, logging a warning if it is not.** [cmd/data-services/main.go [132-141]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-5e7731adfb877918cd65d9d5531621312496450fd550fea2682efca4ca8fe816R132-R141) ```diff systemCredsFile := cfg.NATSOperator.SystemAccountCredsFile if envPath := os.Getenv("NATS_SYSTEM_ACCOUNT_CREDS_FILE"); envPath != "" { systemCredsFile = envPath } if systemCredsFile == "" { log.Printf("Warning: no system account creds configured; PushAccountJWT will fail") } else { - natsAccountServer.SetResolverClient(cfg.NATSURL, cfg.NATSSecurity, systemCredsFile) - log.Printf("NATS resolver client configured with system creds at %s", systemCredsFile) + if info, statErr := os.Stat(systemCredsFile); statErr != nil || info.IsDir() { + log.Printf("Warning: system account creds file not found or invalid: 
%v; PushAccountJWT will fail", statErr) + } else { + natsAccountServer.SetResolverClient(cfg.NATSURL, cfg.NATSSecurity, systemCredsFile) + log.Printf("NATS resolver client configured with system creds at %s", systemCredsFile) + } } ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=8 --> <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: This suggestion improves robustness by adding a check to ensure the credentials file exists and is not a directory before attempting to use it, providing an early and more specific warning for misconfigurations. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Guard writing operator config path</summary> ___ **Add a check to ensure <code>operatorConfigPath</code> is not empty before calling <br><code>natsAccountServer.WriteOperatorConfig()</code> to prevent writing to an invalid path.** [cmd/data-services/main.go [117-130]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-5e7731adfb877918cd65d9d5531621312496450fd550fea2682efca4ca8fe816R117-R130) ```diff if operatorConfigPath != "" || resolverPath != "" { natsAccountServer.SetResolverPaths(operatorConfigPath, resolverPath) log.Printf("NATS resolver paths configured: operator=%s resolver=%s", operatorConfigPath, resolverPath) - // If operator is already initialized, write the config now - // This ensures config files exist even when datasvc restarts with an existing operator - if operator != nil { + // Only write operator config if we have a valid path and operator + if operator != nil && operatorConfigPath != "" { if err := natsAccountServer.WriteOperatorConfig(); err != nil { log.Printf("Warning: failed to write initial operator config: %v", err) } else { log.Printf("Wrote initial operator config to %s", operatorConfigPath) } } } ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=9 --> <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion correctly identifies a 
potential issue where `WriteOperatorConfig` could be called with an empty path. Adding a guard prevents this, making the startup logic more robust against partial configurations. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Replace update! with safe update</summary> ___ **Replace <code>Ash.update!</code> with <code>Ash.update</code> in <code>create_with_tenant_cert/2</code> and handle the <br>potential error tuple to prevent the process from crashing on failure.** [elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex [552-557]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-e4fe8e19bc324416302bb4c962f57133b3f62eb82053766844d881c522a473e5R552-R557) ```diff -updated = result.package - |> Ash.Changeset.for_update(:update_tokens, %{ - bundle_ciphertext: bundle_ciphertext, - downstream_spiffe_id: cert_data.spiffe_id - }, authorize?: false, tenant: tenant_schema) - |> Ash.update!() +case result.package + |> Ash.Changeset.for_update(:update_tokens, %{ + bundle_ciphertext: bundle_ciphertext, + downstream_spiffe_id: cert_data.spiffe_id + }, authorize?: false, tenant: tenant_schema) + |> Ash.update() do + {:ok, updated} -> + updated + {:error, error} -> + return {:error, error} +end + ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion correctly advises replacing the bang version `Ash.update!` with the safer `Ash.update` and handling the error case, which improves the function's robustness and prevents potential crashes. 
</details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Log fallback to default config</summary> ___ **Add a log message to indicate when the agent falls back to using the embedded <br>default configuration because the specified config file was not found.** [cmd/agent/main.go [112-126]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-61358711e980ccf505246fd3915f97cbd3a380e9b66f6fa5aad46749968c5ca3R112-R126) ```diff if err != nil { if os.IsNotExist(err) { if os.Getenv("SR_ALLOW_EMBEDDED_DEFAULT_CONFIG") != "true" { return nil, fmt.Errorf( "%w at %s (set SR_ALLOW_EMBEDDED_DEFAULT_CONFIG=true to use embedded defaults)", errConfigFileMissing, configPath, ) } // Fall back to embedded default config (explicitly allowed) + log.Printf("info: config file %s not found, using embedded defaults", configPath) data = defaultConfig } else { return nil, fmt.Errorf("failed to read config file: %w", err) } } ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=11 --> <details><summary>Suggestion importance[1-10]: 6</summary> __ Why: This is a valuable observability improvement. Logging the fallback to the default configuration makes the agent's behavior more transparent and aids in debugging configuration issues, which could otherwise be silent. 
</details></details></td><td align=center>Low

</td></tr><tr><td>

<details><summary>Log when truncating status messages</summary>

___

**Add a <code>Logger.warning</code> in <code>normalize_message/2</code> to log when a status message is <br>being truncated, improving observability of silent data loss.**

[elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex [591-609]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-369a368073dc8ec1140bcea699005a1ce97a90cd59629df0bd18c71c7ffaae9fR591-R609)

```diff
 defp normalize_message(msg, source) do
   max_bytes =
     case source do
       "results" -> @max_results_message_bytes
       _ -> @max_status_message_bytes
     end
   if byte_size(msg) > max_bytes do
     if source == "results" do
       raise GRPC.RPCError, status: :resource_exhausted, message: "results payload exceeds max size"
     else
+      Logger.warning("Truncating status message to #{@max_status_message_bytes} bytes for source=#{source}")
       binary_part(msg, 0, @max_status_message_bytes)
     end
   else
     msg
   end
 end
```

`[To ensure code accuracy, apply this suggestion manually]`

<details><summary>Suggestion importance[1-10]: 6</summary>

__

Why: The suggestion improves observability by adding a log message when a status message is truncated, which is valuable for debugging and monitoring data integrity without being a critical issue.

</details></details></td><td align=center>Low

</td></tr><tr><td>

<details><summary>Use consistent gateway_id for metrics</summary>

___

**In <code>record_push_metrics/2</code>, use the local <code>gateway_id()</code> helper instead of <br><code>Config.gateway_id()</code> for consistent metric tagging.**

[elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex [612-622]](https://github.com/carverauto/serviceradar/pull/2236/files#diff-369a368073dc8ec1140bcea699005a1ce97a90cd59629df0bd18c71c7ffaae9fR612-R622)

```diff
 defp record_push_metrics(agent_id, service_count) do
   :telemetry.execute(
     [:serviceradar, :agent_gateway, :push, :complete],
     %{service_count: service_count},
     %{
       agent_id: agent_id,
-      gateway_id: Config.gateway_id(),
+      gateway_id: gateway_id(),
       domain: Config.domain()
     }
   )
 end
```

`[To ensure code accuracy, apply this suggestion manually]`

<details><summary>Suggestion importance[1-10]: 5</summary>

__

Why: The suggestion correctly points out an inconsistency in how `gateway_id` is retrieved, which could lead to mismatched metric tags. Using the local `gateway_id()` helper ensures data consistency.

</details></details></td><td align=center>Low

</td></tr></tbody></table>
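
For reference, the "Log fallback to default config" suggestion above can be tried in isolation. The following is a minimal sketch, not the agent's actual loader: `loadConfig`, the `allowEmbedded` parameter, the contents of `defaultConfig`, and the path in `main` are hypothetical stand-ins, and it uses `errors.Is(err, fs.ErrNotExist)` in place of the `os.IsNotExist` call shown in the diff. Only the environment variable name, the error text, and the log message are taken from the suggestion.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"log"
	"os"
)

// defaultConfig stands in for the embedded default configuration (hypothetical content).
var defaultConfig = []byte(`{"agent_id":"default-agent"}`)

// errConfigFileMissing mirrors the sentinel error named in the suggestion.
var errConfigFileMissing = errors.New("config file not found")

// loadConfig reads configPath. If the file does not exist and allowEmbedded is
// true, it logs the fallback and returns the embedded defaults; if the fallback
// is not allowed, it returns an error wrapping errConfigFileMissing.
func loadConfig(configPath string, allowEmbedded bool) ([]byte, error) {
	data, err := os.ReadFile(configPath)
	if err == nil {
		return data, nil
	}
	if !errors.Is(err, fs.ErrNotExist) {
		return nil, fmt.Errorf("failed to read config file: %w", err)
	}
	if !allowEmbedded {
		return nil, fmt.Errorf(
			"%w at %s (set SR_ALLOW_EMBEDDED_DEFAULT_CONFIG=true to use embedded defaults)",
			errConfigFileMissing,
			configPath,
		)
	}
	// Make the otherwise silent fallback visible in the logs.
	log.Printf("info: config file %s not found, using embedded defaults", configPath)
	return defaultConfig, nil
}

func main() {
	allow := os.Getenv("SR_ALLOW_EMBEDDED_DEFAULT_CONFIG") == "true"
	// Hypothetical path, used only for illustration.
	data, err := loadConfig("/etc/serviceradar/agent.json", allow)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("loaded %d bytes of config\n", len(data))
}
```

Passing the opt-in flag as a parameter, rather than reading the environment inside the function, keeps both branches easy to exercise in a unit test.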