Feat/rule builder UI #2646

Merged
mfreeman451 merged 7 commits from refs/pull/2646/head into testing 2026-01-11 08:22:47 +00:00
mfreeman451 commented 2026-01-11 05:41:42 +00:00 (Migrated from github.com)
Owner

Imported from GitHub pull request.

Original GitHub pull request: #2243
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2243
Original created: 2026-01-11T05:41:42Z
Original updated: 2026-01-11T08:22:48Z
Original head: carverauto/serviceradar:feat/rule-builder-ui
Original base: testing
Original merged: 2026-01-11T08:22:47Z by @mfreeman451

User description

IMPORTANT: Please sign the Developer Certificate of Origin

Thank you for your contribution to ServiceRadar. When contributing, the developer must include
a DCO sign-off statement (https://developercertificate.org/) indicating DCO acceptance
in at least one commit message. Here is an example Signed-off-by line in a commit message:

Signed-off-by: J. Doe <j.doe@domain.com>

Describe your changes

Code checklist before requesting a review

  • I have signed the DCO
  • The build completes without errors
  • All tests pass when running make test

PR Type

Enhancement


Description

Major refactoring that transitions the platform from a poller-based to a gateway-based architecture, alongside broad platform modernization:

Core Architecture Changes:

  • Refactored agent from KV-based config management to push mode with file-based configuration and direct gateway connection

  • Implemented gRPC agent gateway server with mTLS multi-tenant security for receiving agent status pushes

  • Removed legacy poller, sync, and edge onboarding components; replaced with modern push-based architecture

NATS Infrastructure:

  • Added NATS account service initialization with operator key management and gRPC server registration

  • Implemented comprehensive NATS credentials file support across all consumers (zen, trapd, flowgger, otel)

  • Added wildcard subject pattern matching (* and >) for flexible event routing

  • Updated default NATS subjects and added configurable logs subject support
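The wildcard matching above follows standard NATS subject semantics: "*" matches exactly one dot-separated token, while ">" matches one or more trailing tokens and may only appear last. The actual subject_matches() function lives in the zen consumer's Rust config; the following is a minimal Go sketch of those semantics, not the implementation itself:

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether subject matches pattern under NATS
// wildcard semantics: "*" matches exactly one token, ">" matches one
// or more remaining tokens and may only appear in the final position.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" must be last and must cover at least one token.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

func main() {
	fmt.Println(subjectMatches("events.>", "events.otel.logs"))   // true
	fmt.Println(subjectMatches("events.*.logs", "events.a.logs")) // true
	fmt.Println(subjectMatches("events.*", "events.a.b"))         // false
}
```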

Multi-Tenant Platform Foundation:

  • Created comprehensive database schema migration with 40+ tables for user management, NATS infrastructure, device discovery, monitoring, and alerts

  • Implemented multi-tenant process registry and supervisor management via DynamicSupervisor with Horde registries

  • Added SPIFFE/SPIRE integration for distributed cluster security with X.509 certificate generation for tenant CAs

  • Implemented per-tenant CA and edge component certificate generation with SPIFFE URIs

Monitoring & Alerting:

  • Implemented stateful alert engine with bucketed time-window aggregation for log/event rules

  • Added alert generation service for monitoring events with severity levels and webhook notifications

  • Implemented poll orchestrator for distributed service check execution across cluster gateways

  • Added device alias lifecycle event tracking system for audit and alerting

Infrastructure Management:

  • Defined Agent resource with OCSF v1.4.0 state machine and capability definitions

  • Implemented edge onboarding packages with encrypted token generation and component certificate signing

  • Added NATS account management gRPC client for tenant account and credential lifecycle

  • Configured agent gateway runtime with cluster topology strategies and mTLS security

Terminology Updates:

  • Renamed all references from "poller" to "gateway" across checker services (SNMP, sysmon, rperf-client)

  • Updated CLI commands: renamed update-poller to update-gateway, added nats-bootstrap and admin subcommands

  • Updated API documentation and help text to reflect new gateway terminology

Message Processing Simplification:

  • Removed CloudEvents wrapping from zen consumer message processing

  • Simplified to direct JSON publishing of context data instead of wrapped events


Diagram Walkthrough

```mermaid
flowchart LR
  Agent["Agent<br/>Push Mode"] -->|mTLS| Gateway["Agent Gateway<br/>gRPC Server"]
  Gateway -->|Status Updates| Core["Core Platform<br/>Multi-Tenant"]
  Core -->|Config| Agent

  NATS["NATS Infrastructure<br/>Operator + Accounts"] -->|Credentials| Consumers["Consumers<br/>zen, trapd, flowgger, otel"]
  Consumers -->|Events| Core

  Core -->|Orchestrate| Checks["Service Checks<br/>Poll Orchestrator"]
  Checks -->|Execute| Gateways["Distributed Gateways<br/>SNMP, sysmon, rperf"]

  Core -->|Rules| AlertEngine["Stateful Alert Engine<br/>Bucketed Aggregation"]
  AlertEngine -->|Webhooks| Notifications["Alert Notifications"]

  TenantCA["Tenant CA<br/>Certificate Generation"] -->|Certs| Onboarding["Edge Onboarding<br/>Packages"]
  Onboarding -->|Enroll| Agent
```

File Walkthrough

Relevant files
Enhancement
27 files
main.go
Refactor agent to push mode with file-based config             

cmd/agent/main.go

  • Refactored agent startup from KV-based config management to direct
    file-based configuration loading
  • Removed edge onboarding, KV watch, and lifecycle server dependencies;
    replaced with push mode architecture
  • Added loadConfig() function for JSON config parsing with embedded
    defaults fallback
  • Implemented runPushMode() function handling gateway connection, push
    loop, and graceful shutdown with signal handling
  • Added version injection via ldflags and proper error handling for
    missing gateway address
+174/-74
main.go
Initialize NATS account service with operator support       

cmd/data-services/main.go

  • Added NATS account service initialization with operator key management
  • Configured resolver paths and system account credentials from
    environment variables with config file fallback
  • Registered NATSAccountService gRPC server when configured
  • Added logging for operator initialization, allowed client identities,
    and resolver configuration
+68/-0   
main.go
Add NATS bootstrap and admin command support                         

cmd/cli/main.go

  • Renamed update-poller subcommand to update-gateway
  • Added nats-bootstrap subcommand for NATS operator initialization
  • Added admin subcommand dispatcher with nats admin resource routing
  • Created dispatchAdminCommand() helper function for admin subcommand
    routing
+16/-2   
main.go
Rename SNMP poller to gateway terminology                               

cmd/checkers/snmp/main.go

  • Renamed SNMPPollerService to SNMPGatewayService
  • Renamed Poller struct to Gateway struct
+1/-1     
app.go
Rename poller service to agent gateway service                     

cmd/core/app/app.go

  • Renamed gRPC service registration from RegisterPollerServiceServer to
    RegisterAgentGatewayServiceServer
+1/-1     
config.rs
Add NATS credentials and wildcard subject matching             

cmd/consumers/zen/src/config.rs

  • Added nats_creds_file optional configuration field for NATS
    credentials
  • Added nats_creds_path() method resolving credentials file path with
    support for absolute/relative paths and cert directory
  • Updated subject pattern matching to support wildcard patterns (* and
    >) via new subject_matches() function
  • Updated test fixtures to use wildcard subject patterns and added
    credentials file validation
+85/-28 
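The credentials-path resolution described above can be sketched as follows. This is a Go illustration of the behavior; the real nats_creds_path() is Rust, and the rules shown (empty means no credentials file, absolute paths used verbatim, relative paths joined to the cert directory) are inferred from this summary:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// natsCredsPath resolves a configured NATS credentials file.
// An empty value means "no credentials file"; an absolute path is
// used as-is; a relative path is resolved against the cert directory.
func natsCredsPath(credsFile, certDir string) (string, bool) {
	if credsFile == "" {
		return "", false
	}
	if filepath.IsAbs(credsFile) {
		return credsFile, true
	}
	return filepath.Join(certDir, credsFile), true
}

func main() {
	p, ok := natsCredsPath("nats.creds", "/etc/serviceradar/certs")
	fmt.Println(p, ok)
}
```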
config.rs
Add NATS credentials and logs subject configuration           

cmd/otel/src/config.rs

  • Added logs_subject optional field for dedicated logs subject
    configuration
  • Added creds_file optional field for NATS credentials file path
  • Updated default NATS subject from events.otel to otel
  • Added credentials file parsing with empty string handling
+21/-3   
nats_output.rs
Implement NATS credentials and configurable logs subject 

cmd/otel/src/nats_output.rs

  • Added logs_subject and creds_file fields to NATSConfig struct
  • Updated stream subject configuration to use configurable logs_subject
    with fallback
  • Added credentials file authentication to NATS connection setup
  • Updated default subject from events.otel to otel
+22/-5   
server.rs
Rename poller_id to gateway_id in sysmon service                 

cmd/checkers/sysmon/src/server.rs

  • Renamed poller_id field to gateway_id in GetStatus and GetResults
    response logging and response building
+6/-6     
message_processor.rs
Simplify message processing to direct JSON publishing       

cmd/consumers/zen/src/message_processor.rs

  • Removed CloudEvents event building and UUID/URL dependencies
  • Simplified message processing to directly publish context JSON instead
    of wrapped CloudEvents
  • Removed event_type derivation from rules
+2/-16   
main.rs
Add NATS credentials support and rename poller_id               

cmd/trapd/src/main.rs

  • Added NATS credentials file support via nats_creds_path() method
  • Updated NATS connection setup to use credentials file when available
    for both secure and non-secure modes
  • Renamed poller_id to gateway_id in GetStatus and GetResults responses
+23/-3   
nats_output.rs
Add NATS credentials file support to flowgger                       

cmd/flowgger/src/flowgger/output/nats_output.rs

  • Added creds_file optional field to NATSConfig struct
  • Added credentials file parsing with empty string handling
  • Updated NATS connection setup to apply credentials file when available
+14/-0   
config.rs
Add NATS credentials configuration to trapd                           

cmd/trapd/src/config.rs

  • Added nats_creds_file optional configuration field
  • Added nats_creds_path() method resolving credentials file path with
    security config support
  • Added validation for non-empty credentials file configuration
+22/-1   
grpc_server.rs
Rename poller_id to gateway_id in zen gRPC service             

cmd/consumers/zen/src/grpc_server.rs

  • Renamed poller_id field to gateway_id in GetStatus and GetResults
    response building
+2/-2     
nats.rs
Add NATS credentials file support to zen consumer               

cmd/consumers/zen/src/nats.rs

  • Added credentials file support to NATS connection setup via
    nats_creds_path()
+4/-0     
server.rs
Add gateway_id field to rperf service responses                   

cmd/checkers/rperf-client/src/server.rs

  • Added gateway_id field to GetStatus and GetResults response building
+2/-0     
account_client.ex
Implement NATS account management gRPC client                       

elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex

  • Implemented gRPC client for datasvc NATSAccountService with account
    and credential management
  • Provides functions for creating tenant accounts, generating user
    credentials, signing account JWTs, and bootstrapping operator
  • Includes channel management with fallback to fresh connection creation
    and comprehensive error handling
  • Supports account limits, subject mappings, stream exports/imports, and
    credential expiration
+621/-0 
stateful_alert_engine.ex
Stateful alert engine with bucketed rule evaluation           

elixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex

  • Implements a GenServer-based stateful alert evaluation engine for
    log/event rules with bucketed time-window aggregation
  • Manages alert state snapshots in ETS, persists to database, and
    handles threshold-based firing/recovery logic
  • Provides rule matching against logs and events with support for
    filtering by subject, service name, severity, and body content
  • Integrates with alert generation, webhook notifications, and event
    recording for comprehensive alert lifecycle management
+960/-0 
agent_gateway_server.ex
gRPC agent gateway server with mTLS multi-tenant security

elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex

  • Implements gRPC server for receiving agent status pushes and streaming
    updates with multi-tenant mTLS security
  • Extracts tenant identity from client certificates and validates
    component identity to prevent spoofing
  • Handles agent enrollment, configuration delivery, and status
    processing with comprehensive validation and error handling
  • Manages agent registry updates, heartbeats, and forwards status data
    to core cluster for processing
+1020/-0
onboarding_packages.ex
Edge onboarding packages with certificate generation         

elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex

  • Provides Ash-based context for managing edge onboarding packages with
    token generation and delivery workflows
  • Supports package creation with encrypted join/download tokens and
    optional component certificates signed by tenant CA
  • Implements package lifecycle operations including delivery
    verification, revocation, and soft-delete with event recording
  • Generates component certificates with SPIFFE URIs and manages
    certificate bundles for secure agent onboarding
+622/-0 
agent.ex
Agent resource with OCSF state machine and capabilities   

elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex

  • Defines Agent resource as Ash-based OCSF v1.4.0 Agent object for
    managing Go agents running on monitored hosts
  • Implements state machine with transitions for agent lifecycle
    (connecting, connected, degraded, disconnected, unavailable)
  • Provides capability definitions and type mappings for agent monitoring
    capabilities (ICMP, TCP, HTTP, gRPC, DNS, Process, SNMP)
  • Includes JSON API routes for agent registration, connection
    management, and heartbeat operations with tenant isolation policies
+665/-0 
alert_generator.ex
Alert generation service for monitoring events                     

elixir/serviceradar_core/lib/serviceradar/monitoring/alert_generator.ex

  • New module for generating alerts from monitoring events (service state
    changes, device availability, metric violations, etc.)
  • Implements alert creation with severity levels and webhook
    notifications
  • Handles stats anomaly detection with cooldown mechanism using
    persistent_term
  • Provides startup/shutdown notification functions for core service
    lifecycle
+609/-0 
tenant_registry.ex
Multi-tenant process registry and supervisor management   

elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex

  • New DynamicSupervisor managing per-tenant Horde registries and
    supervisors for multi-tenant process isolation
  • Implements slug-to-UUID mapping via ETS table for admin/debug lookups
  • Provides registry lifecycle management (creation, stopping) and child
    process management
  • Includes convenience functions for gateway and agent registration and
    discovery
+634/-0 
alias_events.ex
Device alias lifecycle event tracking system                         

elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex

  • New module for tracking device alias lifecycle events (service IDs,
    IPs, collectors)
  • Implements AliasRecord struct for parsing and comparing alias metadata
  • Detects alias changes and generates lifecycle events for
    audit/alerting
  • Provides functions to process and persist alias updates to
    DeviceAliasState resource
+654/-0 
generator.ex
X.509 certificate generation for tenant CAs                           

elixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex

  • New module for X.509 certificate generation for per-tenant CAs and
    edge components
  • Generates tenant intermediate CAs (10-year validity) and edge
    component certificates (1-year validity)
  • Implements certificate signing, encoding, and SPIFFE ID extraction
    from certificate CNs
  • Provides SPKI SHA-256 hash computation for certificate pinning
+541/-0 
spiffe.ex
SPIFFE/SPIRE integration for distributed cluster                 

elixir/serviceradar_core/lib/serviceradar/spiffe.ex

  • New module for SPIFFE/SPIRE integration providing X.509 SVID loading
    and verification
  • Implements SSL/TLS options configuration for ERTS distribution and
    client/server connections
  • Parses and validates SPIFFE IDs with trust domain verification
  • Provides certificate expiry monitoring and file rotation watching
    capabilities
+564/-0 
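SPIFFE ID parsing with trust domain verification can be sketched as follows. A valid ID has the form spiffe://&lt;trust-domain&gt;/&lt;path&gt;; this is a hypothetical Go helper illustrating the checks, not the Elixir module's API:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseSPIFFEID validates a SPIFFE ID against an expected trust domain
// and returns its workload path.
func parseSPIFFEID(id, trustDomain string) (string, error) {
	u, err := url.Parse(id)
	if err != nil {
		return "", err
	}
	if u.Scheme != "spiffe" {
		return "", fmt.Errorf("not a SPIFFE ID: scheme %q", u.Scheme)
	}
	if u.Host != trustDomain {
		return "", fmt.Errorf("trust domain %q does not match expected %q", u.Host, trustDomain)
	}
	if u.Path == "" || !strings.HasPrefix(u.Path, "/") {
		return "", fmt.Errorf("missing workload path")
	}
	return u.Path, nil
}

func main() {
	p, err := parseSPIFFEID("spiffe://example.org/tenant/agent-1", "example.org")
	fmt.Println(p, err)
}
```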
poll_orchestrator.ex
Poll execution orchestrator for service checks                     

elixir/serviceradar_core/lib/serviceradar/monitoring/poll_orchestrator.ex

  • New module orchestrating poll execution for scheduled service checks
    across the cluster
  • Manages PollJob lifecycle (creation, state transitions, completion)
    via AshStateMachine
  • Discovers available gateways via Horde registry and dispatches checks
    using location-transparent PIDs
  • Supports multiple gateway assignment modes (any, partition, domain,
    specific) and async execution
+450/-0 
Documentation
3 files
main.go
Update API documentation terminology                                         

cmd/core/main.go

  • Updated API description from "service pollers" to "service gateways"
+1/-1     
main.go
Update config-sync role documentation                                       

cmd/tools/config-sync/main.go

  • Updated role flag description from "poller" to "gateway" in help text
+1/-1     
setup.rs
Add NATS credentials debug logging                                             

cmd/otel/src/setup.rs

  • Added debug logging for NATS credentials file configuration
+1/-0     
Configuration
1 file
20260107043446_initial_schema.exs
Add initial tenant database schema migration                         

elixir/serviceradar_core/priv/repo/tenant_migrations/20260107043446_initial_schema.exs

  • Created comprehensive database schema migration with 40+ tables for
    multi-tenant platform
  • Includes tables for user management, NATS infrastructure, device
    discovery, monitoring, alerts, and onboarding
  • Defines relationships, indexes, and constraints for tenant isolation
    and data integrity
  • Implements encrypted fields for sensitive data (keys, credentials,
    certificates)
+1416/-0
Configuration changes
1 file
runtime.exs
Agent gateway runtime configuration with cluster strategies

elixir/serviceradar_agent_gateway/config/runtime.exs

  • Configures cluster topology strategies (Kubernetes DNS, DNSPoll, EPMD,
    Gossip) for agent gateway node discovery
  • Sets up SPIFFE/mTLS configuration for secure inter-node communication
    with trust domain and certificate paths
  • Disables core database and Oban job queue in agent gateway while
    enabling cluster coordination with core nodes
  • Configures PubSub and telemetry for distributed logging and event
    aggregation across the cluster
+209/-0 
Additional files
101 files
.bazelignore +4/-0     
.bazelrc +5/-0     
.env-sample +33/-0   
.env.example +38/-0   
main.yml +18/-0   
AGENTS.md +177/-11
INSTALL.md +11/-11 
MODULE.bazel +22/-2   
Makefile +55/-14 
README-Docker.md +17/-2   
README.md +3/-3     
ROADMAP.md +1/-1     
BUILD.bazel +11/-6   
BUILD.bazel +12/-0   
mix_release.bzl +141/-49
BUILD.bazel +1/-0     
README.md +4/-4     
config.json +5/-6     
build.rs +0/-1     
monitoring.proto +3/-26   
BUILD.bazel +1/-1     
README.md +2/-2     
monitoring.proto +2/-26   
Cargo.toml +0/-3     
README.md +8/-8     
zen-consumer-with-otel.json +14/-11 
zen-consumer.json +14/-11 
config.json +4/-4     
config.json +4/-4     
BUILD.bazel +1/-0     
README.md +3/-3     
README.md +9/-12   
flowgger.toml +2/-1     
otel.toml +3/-1     
otel.toml.example +5/-2     
BUILD.bazel +0/-25   
config.json +0/-111 
main.go +0/-138 
BUILD.bazel +0/-25   
config.json +0/-77   
main.go +0/-123 
README.md +3/-3     
docker-compose.elx.yml +117/-0 
docker-compose.spiffe.yml +8/-158 
docker-compose.yml +316/-269
Dockerfile.agent-gateway +94/-0   
Dockerfile.core-elx +108/-0 
Dockerfile.poller +0/-70   
Dockerfile.sync +0/-95   
Dockerfile.tools +1/-2     
Dockerfile.web-ng +6/-0     
agent-minimal.docker.json +6/-6     
agent.docker.json +5/-20   
agent.mtls.json +7/-10   
bootstrap-nested-spire.sh +0/-80   
.gitkeep +1/-0     
datasvc.docker.json +3/-2     
datasvc.mtls.json +14/-1   
db-event-writer.docker.json +15/-11 
db-event-writer.mtls.json +10/-8   
FRICTION_POINTS.md +0/-355 
README.md +0/-207 
SETUP_GUIDE.md +0/-307 
docker-compose.edge-e2e.yml +0/-27   
manage-packages.sh +0/-211 
setup-edge-e2e.sh +0/-198 
edge-poller-restart.sh +0/-178 
downstream-agent.conf +0/-32   
env +0/-4     
server.conf +0/-51   
upstream-agent.conf +0/-32   
entrypoint-certs.sh +13/-9   
entrypoint-poller.sh +0/-274 
entrypoint-sync.sh +0/-96   
fix-cert-permissions.sh +2/-2     
flowgger.docker.toml +3/-2     
generate-certs.sh +214/-12
nats.docker.conf +16/-160
netflow-consumer.mtls.json +1/-0     
otel.docker.toml +7/-2     
pg_hba.conf +9/-0     
pg_ident.conf +17/-0   
poller-stack.compose.yml +0/-121 
poller.docker.json +0/-128 
poller.mtls.json +0/-135 
poller.spiffe.json +0/-55   
refresh-upstream-credentials.sh +0/-248 
seed-poller-kv.sh +0/-83   
setup-edge-poller.sh +0/-204 
README.md +5/-5     
bootstrap-compose-spire.sh +0/-2     
ssl_dist.core.conf +17/-0   
ssl_dist.gateway.conf +17/-0   
ssl_dist.web.conf +17/-0   
sync.docker.json +0/-71   
sync.mtls.json +0/-75   
sysmon-osx.checker.json +1/-1     
tools-profile.sh +1/-2     
trapd.docker.json +3/-2     
update-config.sh +1/-190 
Additional files not shown

href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e4564a93f6cf84ff91cd3d8141fc9272ec9b4ec19defd107afa42be01fcfed5b">+2/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>nats.rs</strong><dd><code>Add NATS credentials file support to zen consumer</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/consumers/zen/src/nats.rs <ul><li>Added credentials file support to NATS connection setup via <br><code>nats_creds_path()</code></ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-97f7335def0ad5d644b594a1076ae2d7080b11259cbb8de22c7946cc8e4b39f8">+4/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>server.rs</strong><dd><code>Add gateway_id field to rperf service responses</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/checkers/rperf-client/src/server.rs <ul><li>Added <code>gateway_id</code> field to GetStatus and GetResults response building</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bce0f4ca6548712f224b73816825d28e831acbbff7dbed3c98671ed50f65d028">+2/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>account_client.ex</strong><dd><code>Implement NATS account management gRPC client</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex <ul><li>Implemented gRPC client for datasvc <code>NATSAccountService</code> with account <br>and credential management<br> <li> Provides functions for creating tenant accounts, generating user <br>credentials, signing account JWTs, and bootstrapping operator<br> <li> Includes channel management with fallback to fresh connection creation <br>and comprehensive error handling<br> <li> Supports account limits, subject mappings, stream 
exports/imports, and <br>credential expiration</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-2e18ac777ac600b12982ba9e9d5327e23ebd84c139a2add7976f8bf61283e554">+621/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>stateful_alert_engine.ex</strong><dd><code>Stateful alert engine with bucketed rule evaluation</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex <ul><li>Implements a GenServer-based stateful alert evaluation engine for <br>log/event rules with bucketed time-window aggregation<br> <li> Manages alert state snapshots in ETS, persists to database, and <br>handles threshold-based firing/recovery logic<br> <li> Provides rule matching against logs and events with support for <br>filtering by subject, service name, severity, and body content<br> <li> Integrates with alert generation, webhook notifications, and event <br>recording for comprehensive alert lifecycle management</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bae3a52db882de8c947e62f219a95dff8db4e155e37d9a361dbe14ec25fcd3bd">+960/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>agent_gateway_server.ex</strong><dd><code>gRPC agent gateway server with mTLS multi-tenant security</code></dd></summary> <hr> elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex <ul><li>Implements gRPC server for receiving agent status pushes and streaming <br>updates with multi-tenant mTLS security<br> <li> Extracts tenant identity from client certificates and validates <br>component identity to prevent spoofing<br> <li> Handles agent enrollment, configuration delivery, and status <br>processing with comprehensive validation and error handling<br> <li> Manages agent registry updates, heartbeats, and forwards status data <br>to core cluster for processing</ul> 
</details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-369a368073dc8ec1140bcea699005a1ce97a90cd59629df0bd18c71c7ffaae9f">+1020/-0</a></td> </tr> <tr> <td> <details> <summary><strong>onboarding_packages.ex</strong><dd><code>Edge onboarding packages with certificate generation</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex <ul><li>Provides Ash-based context for managing edge onboarding packages with <br>token generation and delivery workflows<br> <li> Supports package creation with encrypted join/download tokens and <br>optional component certificates signed by tenant CA<br> <li> Implements package lifecycle operations including delivery <br>verification, revocation, and soft-delete with event recording<br> <li> Generates component certificates with SPIFFE URIs and manages <br>certificate bundles for secure agent onboarding</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e4fe8e19bc324416302bb4c962f57133b3f62eb82053766844d881c522a473e5">+622/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>agent.ex</strong><dd><code>Agent resource with OCSF state machine and capabilities</code>&nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex <ul><li>Defines Agent resource as Ash-based OCSF v1.4.0 Agent object for <br>managing Go agents running on monitored hosts<br> <li> Implements state machine with transitions for agent lifecycle <br>(connecting, connected, degraded, disconnected, unavailable)<br> <li> Provides capability definitions and type mappings for agent monitoring <br>capabilities (ICMP, TCP, HTTP, gRPC, DNS, Process, SNMP)<br> <li> Includes JSON API routes for agent registration, connection <br>management, and heartbeat operations with tenant isolation policies</ul> </details> </td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c56f92b6ce744cab3f2dc00dde92e2017cffdd12ad4618f7fa720252f2a6843a">+665/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>alert_generator.ex</strong><dd><code>Alert generation service for monitoring events</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/monitoring/alert_generator.ex <ul><li>New module for generating alerts from monitoring events (service state <br>changes, device availability, metric violations, etc.)<br> <li> Implements alert creation with severity levels and webhook <br>notifications<br> <li> Handles stats anomaly detection with cooldown mechanism using <br>persistent_term<br> <li> Provides startup/shutdown notification functions for core service <br>lifecycle</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-62074160ac91002a439bab337a032329681bc55c84a59ab9934bc76d05a5de04">+609/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>tenant_registry.ex</strong><dd><code>Multi-tenant process registry and supervisor management</code>&nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex <ul><li>New DynamicSupervisor managing per-tenant Horde registries and <br>supervisors for multi-tenant process isolation<br> <li> Implements slug-to-UUID mapping via ETS table for admin/debug lookups<br> <li> Provides registry lifecycle management (creation, stopping) and child <br>process management<br> <li> Includes convenience functions for gateway and agent registration and <br>discovery</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-91248b3b128a2e3d9bea6ffdb5e0f295e4a1745e82f87687c640ad01416fb85d">+634/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>alias_events.ex</strong><dd><code>Device alias lifecycle event 
tracking system</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex <ul><li>New module for tracking device alias lifecycle events (service IDs, <br>IPs, collectors)<br> <li> Implements <code>AliasRecord</code> struct for parsing and comparing alias metadata<br> <li> Detects alias changes and generates lifecycle events for <br>audit/alerting<br> <li> Provides functions to process and persist alias updates to <br>DeviceAliasState resource</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bc3743067ea774f59bc5665770f7110a2d6e90f6e1156a7717a1c287f8979d28">+654/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>generator.ex</strong><dd><code>X.509 certificate generation for tenant CAs</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex <ul><li>New module for X.509 certificate generation for per-tenant CAs and <br>edge components<br> <li> Generates tenant intermediate CAs (10-year validity) and edge <br>component certificates (1-year validity)<br> <li> Implements certificate signing, encoding, and SPIFFE ID extraction <br>from certificate CNs<br> <li> Provides SPKI SHA-256 hash computation for certificate pinning</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-b48e4a9e1189da61e2a60e16f56fce81298d76b7cdab745107140fed3f6e48b4">+541/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>spiffe.ex</strong><dd><code>SPIFFE/SPIRE integration for distributed cluster</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/spiffe.ex <ul><li>New module for SPIFFE/SPIRE integration providing X.509 SVID loading <br>and 
verification<br> <li> Implements SSL/TLS options configuration for ERTS distribution and <br>client/server connections<br> <li> Parses and validates SPIFFE IDs with trust domain verification<br> <li> Provides certificate expiry monitoring and file rotation watching <br>capabilities</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0cb8d921c19f671b66f91c0978e351e71d927c5f4694924984c9f1ed34d7ee78">+564/-0</a>&nbsp; </td> </tr> <tr> <td> <details> <summary><strong>poll_orchestrator.ex</strong><dd><code>Poll execution orchestrator for service checks</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/monitoring/poll_orchestrator.ex <ul><li>New module orchestrating poll execution for scheduled service checks <br>across the cluster<br> <li> Manages PollJob lifecycle (creation, state transitions, completion) <br>via AshStateMachine<br> <li> Discovers available gateways via Horde registry and dispatches checks <br>using location-transparent PIDs<br> <li> Supports multiple gateway assignment modes (any, partition, domain, <br>specific) and async execution</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-68a63639fc9d92d29501700c6604921098c9bbbf21e54f9148c1109c17c9c6d4">+450/-0</a>&nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Documentation</strong></td><td><details><summary>3 files</summary><table> <tr> <td> <details> <summary><strong>main.go</strong><dd><code>Update API documentation terminology</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/core/main.go <ul><li>Updated API description from "service pollers" to "service gateways"</ul> </details> </td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4ab3fd1d4debc53dd2499d94a0f60c648fdae4235dd1e3678095a975f5bb434a">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>main.go</strong><dd><code>Update config-sync role documentation</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/tools/config-sync/main.go <ul><li>Updated role flag description from "poller" to "gateway" in help text</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bc6eeb1b05bcb9179525e32fac1de9926b5823ec3504be546ab10c5c9740f544">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>setup.rs</strong><dd><code>Add NATS credentials debug logging</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> cmd/otel/src/setup.rs <ul><li>Added debug logging for NATS credentials file configuration</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-3891f667deb20fd26e296d3e2742c57378d3764fe1743118e612465ae360391f">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Configuration</strong></td><td><details><summary>1 file</summary><table> <tr> <td> <details> <summary><strong>20260107043446_initial_schema.exs</strong><dd><code>Add initial tenant database schema migration</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> elixir/serviceradar_core/priv/repo/tenant_migrations/20260107043446_initial_schema.exs <ul><li>Created comprehensive database schema migration with 40+ tables for <br>multi-tenant platform<br> <li> Includes tables for user management, NATS infrastructure, device <br>discovery, monitoring,
alerts, and onboarding<br> <li> Defines relationships, indexes, and constraints for tenant isolation <br>and data integrity<br> <li> Implements encrypted fields for sensitive data (keys, credentials, <br>certificates)</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0d217dc9822fab0d3390e8ec21040f98e67106e5c9126e043a9b701efcbfb576">+1416/-0</a></td> </tr> </table></details></td></tr><tr><td><strong>Configuration changes</strong></td><td><details><summary>1 file</summary><table> <tr> <td> <details> <summary><strong>runtime.exs</strong><dd><code>Agent gateway runtime configuration with cluster strategies</code></dd></summary> <hr> elixir/serviceradar_agent_gateway/config/runtime.exs <ul><li>Configures cluster topology strategies (Kubernetes DNS, DNSPoll, EPMD, <br>Gossip) for agent gateway node discovery<br> <li> Sets up SPIFFE/mTLS configuration for secure inter-node communication <br>with trust domain and certificate paths<br> <li> Disables core database and Oban job queue in agent gateway while <br>enabling cluster coordination with core nodes<br> <li> Configures PubSub and telemetry for distributed logging and event <br>aggregation across the cluster</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-842568fafa717a8064203543d674517fd28ad7dd2a4d3f0f157d274cfda4f18b">+209/-0</a>&nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Additional files</strong></td><td><details><summary>101 files</summary><table> <tr> <td><strong>.bazelignore</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-a5641cd37d6ad98b32cdfce1980836cc68312277bc6a7052f55da02ada5bc6cf">+4/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>.bazelrc</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-544556920c45b42cbfe40159b082ce8af6bd929e492d076769226265f215832f">+5/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> 
<td><strong>.env-sample</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c4368a972a7fa60d9c4e333cebf68cdb9a67acb810451125c02e3b7eb2594e3d">+33/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>.env.example</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-a3046da0d15a27e89f2afe639b25748a7ad4d9290af3e7b1b6c1a5533c8f0a8c">+38/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>main.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-7829468e86c1cc5d5133195b5cb48e1ff6c75e3e9203777f6b2e379d9e4882b3">+18/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>AGENTS.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-a54ff182c7e8acf56acfd6e4b9c3ff41e2c41a31c9b211b2deb9df75d9a478f9">+177/-11</a></td> </tr> <tr> <td><strong>INSTALL.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-09b140a43ebfdd8dbec31ce72cafffd15164d2860fd390692a030bcb932b54a0">+11/-11</a>&nbsp; </td> </tr> <tr> <td><strong>MODULE.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-6136fc12446089c3db7360e923203dd114b6a1466252e71667c6791c20fe6bdc">+22/-2</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>Makefile</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52">+55/-14</a>&nbsp; </td> </tr> <tr> <td><strong>README-Docker.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9fd61d24482efe68c22d8d41e2a1dcc440f39195aa56e7a050f2abe598179efd">+17/-2</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5">+3/-3</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> 
<td><strong>ROADMAP.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-683343bdf93f55ed3cada86151abb8051282e1936e58d4e0a04beca95dff6e51">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-884fa9353a5226345e44fbabea3300efc7a87dfbcde0b6a42521ca51823f1b68">+11/-6</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0e80ea46aeb61a873324685edb96eae864c7a2004fbb7ee404b4ec951190ba10">+12/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>mix_release.bzl</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-86ec281f99363b6b6eb1f49e21d83b7eeca93a35b552b9f305fffc6855e38ccd">+141/-49</a></td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-143f8d1549d52f28906f19ce28e5568a5be474470ff103c2c1e63c3e6b08d670">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bfd308915d0cf522e7fc76600dee687617dc69165ab22502a1d219850c0c0860">+4/-4</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-5b1bc8fe77422534739bdd3a38dc20d2634a86c171265c34e1b5d0c5a61b6bab">+5/-6</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>build.rs</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-251e7a923f45f8f903e510d10f183366bda06d281c8ecc3669e1858256e2186d">+0/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>monitoring.proto</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-b56f709f4a0a3db694f2124353908318631f23e20b7846bc4b8ee869e2e0632a">+3/-26</a>&nbsp; 
&nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-7da152990199fd73c1eecb40f9c49e0d4e6453a8ec1acb111e445c55d1ca0af0">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-2e9751b437fa61442aac074c7a4a912d0ac50ac3ea156ac8aedd8478d21c6bdb">+2/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>monitoring.proto</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9faf6025eb0d3d38383f5b7ad2b733abeb38454d5e4de3e83994e94b12d87a50">+2/-26</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>Cargo.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-fcf0c672917b64a5b953a914af013f16dddd6a1d813810236364e32f1ae70382">+0/-3</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-643d2c3959322902c5bc9a22666b1e9ef71fa0bb87c9451b0e4147a4d5b51987">+8/-8</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>zen-consumer-with-otel.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-68375f1f7847e1fbdf75664f6be65b1ad94ae6ce86ed73fc5964d65054668acb">+14/-11</a>&nbsp; </td> </tr> <tr> <td><strong>zen-consumer.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4d308af9802a93a0f656e8c02a3b5fcd8991407bb18360f087470db74e1f9524">+14/-11</a>&nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-2423ef78d36e905ae993b69ff59f5df6b2e1b9492fb0fa8c6d0aad7c76d2d229">+4/-4</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-ef778d85ac6f9652c25cb0d631f0fe8dfb3edac4dde5d719a4fc2926fb5c3216">+4/-4</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c62c0139ebdb337369f4067567cd2c52b8e7decb3ddfabc77f9f67b2f6e5789c">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0b0725713b87dca1de57200214a4fe04633f0d856c39aa8032280227bf8e8141">+3/-3</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-f425b4378f84e0ba0c6f532facff17ff5d55b4dc6033d8bf35130a159cd2ba32">+9/-12</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>flowgger.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-af9f49f931e282dca53d1f0521b036d222fe671f77e61a876a84cf4c6d7cca4d">+2/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>otel.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c64b9ace832b8ea57a2be62f84166e03bb1904882635d444ec76a880cdf14cc0">+3/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>otel.toml.example</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c1889866f35f98cdba9cd229fc119273c5fa5fca501451db23813b575f6fec66">+5/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e1f7c698e0e3a4e6afa971c1140e71cbf22593fbb19c81cb26b02c15c5dc46ec">+0/-25</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9edc2486fff55fc399e0ac96dba5137948a7ea7285f5ef7846835355684b7ab5">+0/-111</a>&nbsp; </td> </tr> <tr> 
<td><strong>main.go</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4b8ec845da50cd58d011e69f9d1c30530ee1968df26616b8768bb1fc03433bbe">+0/-138</a>&nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4f5d2ea4260d490a0d6f28adde0b35eca8af77d22f3ee366a783946c53687619">+0/-25</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bcac20d6b3cb81f0059e766839ba1ee59a885009249501b0ba1182ebb1daea25">+0/-77</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>main.go</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-78dc6bc53f1c760c66f43ff5f486bfe78a65bee8b2e0d4862293ec0892da2b29">+0/-123</a>&nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9c32ee8446458b6fd2ae7fee52016f4b707a59978b67888cd5bee2804d934528">+3/-3</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>docker-compose.elx.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9562070d7ad4a3e9b2d06567008cf35de1d96448d914b3b45bf6c36d97cdd914">+117/-0</a>&nbsp; </td> </tr> <tr> <td><strong>docker-compose.spiffe.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-603fd9e7d40841d174f26b95d0cb0c9537430bf3f7a5da3ccbba4ea3d8ac66c9">+8/-158</a>&nbsp; </td> </tr> <tr> <td><strong>docker-compose.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e45e45baeda1c1e73482975a664062aa56f20c03dd9d64a827aba57775bed0d3">+316/-269</a></td> </tr> <tr> <td><strong>Dockerfile.agent-gateway</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-332bc81a932ae08efa711a71b60fe0954d99bf17ebdab00a3baaa177a44de8b0">+94/-0</a>&nbsp; &nbsp; </td> </tr> 
<tr> <td><strong>Dockerfile.core-elx</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-5ec7a971285669999af442a0c7f141c34f7fd9180257307f5c4ed12f789a2182">+108/-0</a>&nbsp; </td> </tr> <tr> <td><strong>Dockerfile.poller</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d3ba129830fb366bfe23b00db4ef6218b10fc981d3c04842b1b3b3b367a8982f">+0/-70</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>Dockerfile.sync</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0227933b9961fd553af1d229e89d71a0271fdc475081bbcef49b587941af1eda">+0/-95</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>Dockerfile.tools</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0258db71e4070e342198965f1d046f3097640850b037df8a2287a7e239630add">+1/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>Dockerfile.web-ng</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-92d43af1965575d56c3380ecc8a81024aac2ff36f039ec2d3839e9fc7852bc10">+6/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>agent-minimal.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-1f09fad94636c90373af8e270f6ba0332ae4f4d1df50a4909729280a3a9691e6">+6/-6</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>agent.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-5d33fe703515d03076d31261ecf946e9c6fc668cf5bf65099d49b670739e455e">+5/-20</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>agent.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-008f2216f159a9bd5db9cc90baaf6f1e64487df7af05b56ab3b9d6c4946aa95f">+7/-10</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>bootstrap-nested-spire.sh</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-ab4746a08fb1e0b307a1e47660cd22182e283a087cba87dcbff0fdfe750f44f1">+0/-80</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>.gitkeep</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d72c41aab2d6f2c230a4340dfefe7917cdd12bed942c825aa0d4c9875a637bac">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>datasvc.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-3f2719d3dbfe042e8383739e3c78e74e5f851a44e5e46bea8e79c4b79fdcc34f">+3/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>datasvc.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-3a45619e57f1e6e9a31486ec7fffb33ef246e271f82bac272ee0a946b88da70a">+14/-1</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>db-event-writer.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9fc51271f7ef5bb460160013e24e44e829b730656891d26fc49d5fe72fbb3147">+15/-11</a>&nbsp; </td> </tr> <tr> <td><strong>db-event-writer.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-7a33f95f7545499abf0ed9fc91b58499ab209639e4885019579c959583fc7496">+10/-8</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>FRICTION_POINTS.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-b0653c58880f810ba832c0500733d63de309db98b43009fe73a1862494cf41bd">+0/-355</a>&nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-31849f033cfc932acee35f549c069abb1f36101c352e553dd6bff8713b29f98c">+0/-207</a>&nbsp; </td> </tr> <tr> <td><strong>SETUP_GUIDE.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-b4914f8640a78038e45f51235a624535672680dc902de5f107fc051f4f281913">+0/-307</a>&nbsp; </td> </tr> <tr> 
<td><strong>docker-compose.edge-e2e.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-575d19ea771bdf8102cb9729db43a1bfd6afc2527160e54105beeac2e314f362">+0/-27</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>manage-packages.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-3c2ff6febbddb956c71557894adaf7d0a39a1f20dda120fe126364946bc47280">+0/-211</a>&nbsp; </td> </tr> <tr> <td><strong>setup-edge-e2e.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-2714e2c7e111f69ea9e9f5ddd7f6a70fa5ea96e3a53b851cb13b8b8b7cd12917">+0/-198</a>&nbsp; </td> </tr> <tr> <td><strong>edge-poller-restart.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-96a8fe52c38fd0d7c14895127df34a27be311cac89c53d28ee178661b629bd22">+0/-178</a>&nbsp; </td> </tr> <tr> <td><strong>downstream-agent.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-747de0375ced42af978ca7dac239862bdabb7f6bd0bd634f134b485517a7b4ee">+0/-32</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>env</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-686f1a954c542f2ec9bf14c3170648b65190ad242c7f3a95a0f872ae41b8b1c6">+0/-4</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>server.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-025f5b5ab79526cf549ca1fdb90dd659ba76b438f05a7f77d916d18728c4b572">+0/-51</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>upstream-agent.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e8a869ddf4affa31536a8d4e4e6f09c40072a7026da2c609d93c6ecf04138902">+0/-32</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>entrypoint-certs.sh</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-83d6800b184a5233c66c69766286b0a60fece1bc64addb112d9f8dc019437f05">+13/-9</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>entrypoint-poller.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e202d27e3331088745eb55cdd2b3e40ac3f5df109d9ff5c76c0faed60772807a">+0/-274</a>&nbsp; </td> </tr> <tr> <td><strong>entrypoint-sync.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9d5620b8e6833309dbafb8ee6b6b75c3b942d163c3fe7f1a9827958b2d640265">+0/-96</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>fix-cert-permissions.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-17ea40a11edcaa7c85bb4215fda46b5a32505246fef0ab5f3ed47b28470c5ec8">+2/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>flowgger.docker.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-824f8797b418d4b9f5ea41e4a3741a0ed64b881f343072464489a76b7ea01008">+3/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>generate-certs.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-8298241543b4744a6ac7780c760ac5b5a0a87ba62de19c8612ebe1aba0996ebd">+214/-12</a></td> </tr> <tr> <td><strong>nats.docker.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-06f2012494f428fe1bfb304972061c2094e0d99da88ba9af6914f7776872e6eb">+16/-160</a></td> </tr> <tr> <td><strong>netflow-consumer.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-f15920e8498a24f71ce3eec4f48fe8fefbb1765a90362998af779a660fcef9e1">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>otel.docker.toml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d4af38790e3657b7589cd37a7539d5308b032f11caba7aa740ddc86bf99f4415">+7/-2</a>&nbsp; &nbsp; &nbsp; 
</td> </tr> <tr> <td><strong>pg_hba.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-7bd5f7292054916c7e5997f4c84ac9ec07d4c945621a48936c2aed0575fb96eb">+9/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>pg_ident.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e7b8ce062e32c61fdc3bcc9e525c1f1df1c8008fbc02b11409e58c67baa17cc5">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>poller-stack.compose.yml</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-f3b5c991c2c1f7646db0ca4ed9bcb5df0f313ce6a05d8f3c890f80c873f776f5">+0/-121</a>&nbsp; </td> </tr> <tr> <td><strong>poller.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d64ebb69ec31e831efd187c47a5bfff2573960306b177f6464e91cb44a3c709d">+0/-128</a>&nbsp; </td> </tr> <tr> <td><strong>poller.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-ef5d74bb3607431245c2bf06169d7fee89cae817e114035075b59a671229ab46">+0/-135</a>&nbsp; </td> </tr> <tr> <td><strong>poller.spiffe.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4e04bd23a0216287d5c0bb3831e0f95e7922ed03e8386a10ae7f4873e4fdb538">+0/-55</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>refresh-upstream-credentials.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d3b3a8fcdea1b49c9e1c0ecc12d61fb6d416313520e8ad52edbee9094dbdc271">+0/-248</a>&nbsp; </td> </tr> <tr> <td><strong>seed-poller-kv.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c12070f475dbe7dc83e747fa6ec9d2ebdbdd97921a54f372abc89a102b783ad7">+0/-83</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>setup-edge-poller.sh</strong></td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d7aec89d87f4cc98f4d6935e49a8f6ce571bc6dda254d894e93b60922f3a775f">+0/-204</a>&nbsp; </td> </tr> <tr> <td><strong>README.md</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0cb49b4e37a7692f026133d5de971d449f42a1068226e848da5adf9af0ff4a2e">+5/-5</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>bootstrap-compose-spire.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-ca219a124d4c95ee7995764d7e0c322b4bfe59e357b7bcb42bc5d7c8b9b0af0d">+0/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>ssl_dist.core.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-08d49d8621b581d1a9aa5c456f61e8c5774e021083c982cbb514019f915a1701">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>ssl_dist.gateway.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4a43a8290d45ac68592000e7ef51afe78b4213090155bd42aafb46e66130f7ae">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>ssl_dist.web.conf</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-cef5be462ddb059fdfdeb9fd7c5cd70e656c4cd8b6ae1fe3fe312557b3da80ac">+17/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>sync.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4237fcee4f33a230abf28e12e8d4823499d163759cd1ff124fec1c62faa8b8b4">+0/-71</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>sync.mtls.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c652c07f7127be5b2932d92e6ef4c7448c544d1f3095cb96a03294fa58fd3c4c">+0/-75</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>sysmon-osx.checker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-044334b566d907c77656b7f951092709da2a111dc968da9a76315b1c71200cf4">+1/-1</a>&nbsp; &nbsp; 
&nbsp; </td> </tr> <tr> <td><strong>tools-profile.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-f47597e2f5d4d085d8bf109109608f8ec0b7db8e90545e869b9ae409b607a4ac">+1/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>trapd.docker.json</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-1ab1a0e03e63bc02e0ef31992a7187a377927272ed2060150b40d44cc0ea3357">+3/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>update-config.sh</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9ae50be83a13010a038389c74407ba1bde8cabcea0944e238c4b3374133f78bf">+1/-190</a>&nbsp; </td> </tr> <tr> <td><strong>Additional files not shown</strong></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-2f328e4cd8dbe3ad193e49d92bcf045f47a6b72b1e9487d366f6b8288589b4ca"></a></td> </tr> </table></details></td></tr></tbody></table> </details> ___
qodo-code-review[bot] commented 2026-01-11 05:43:17 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2243#issuecomment-3734057887
Original created: 2026-01-11T05:43:17Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Arbitrary file write

Description: Environment-variable overrides (NATS_OPERATOR_CONFIG_PATH, NATS_RESOLVER_PATH) can direct
the service to write operator/resolver config files to arbitrary filesystem locations via
WriteOperatorConfig(), which is a potential arbitrary file-write risk if an attacker can
influence the process environment or config.
main.go [107-129]

Referred Code
operatorConfigPath := cfg.NATSOperator.OperatorConfigPath
if envPath := os.Getenv("NATS_OPERATOR_CONFIG_PATH"); envPath != "" {
	operatorConfigPath = envPath
}

resolverPath := cfg.NATSOperator.ResolverPath
if envPath := os.Getenv("NATS_RESOLVER_PATH"); envPath != "" {
	resolverPath = envPath
}

if operatorConfigPath != "" || resolverPath != "" {
	natsAccountServer.SetResolverPaths(operatorConfigPath, resolverPath)
	log.Printf("NATS resolver paths configured: operator=%s resolver=%s", operatorConfigPath, resolverPath)

	// If operator is already initialized, write the config now
	// This ensures config files exist even when datasvc restarts with an existing operator
	if operator != nil {
		if err := natsAccountServer.WriteOperatorConfig(); err != nil {
			log.Printf("Warning: failed to write initial operator config: %v", err)
		} else {
			log.Printf("Wrote initial operator config to %s", operatorConfigPath)


 ... (clipped 2 lines)
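The flagged risk above is that environment-variable overrides can point these writes anywhere on the filesystem. A common mitigation is to resolve any override against an allowed base directory and reject paths that escape it. Below is a minimal sketch of that check; `resolveUnder` and the base directory are illustrative names, not part of the ServiceRadar codebase:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveUnder returns the cleaned absolute form of candidate if it resides
// under baseDir, or an error if it escapes (e.g. via ".." segments).
// Hypothetical helper for illustration only.
func resolveUnder(baseDir, candidate string) (string, error) {
	abs, err := filepath.Abs(filepath.Clean(candidate))
	if err != nil {
		return "", err
	}
	base, err := filepath.Abs(filepath.Clean(baseDir))
	if err != nil {
		return "", err
	}
	rel, err := filepath.Rel(base, abs)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes %q", candidate, baseDir)
	}
	return abs, nil
}

func main() {
	// A traversal attempt is rejected; a path under the base is accepted.
	if _, err := resolveUnder("/etc/serviceradar", "/etc/serviceradar/../passwd"); err != nil {
		fmt.Println("rejected:", err)
	}
	if p, err := resolveUnder("/etc/serviceradar", "/etc/serviceradar/nats/operator.conf"); err == nil {
		fmt.Println("accepted:", p)
	}
}
```

Applying such a check to `NATS_OPERATOR_CONFIG_PATH` and `NATS_RESOLVER_PATH` before calling `SetResolverPaths` would bound the blast radius of a compromised environment.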
Sensitive path logging

Description: Debug logging prints the configured NATS credentials file path (nats.creds_file), which
can expose sensitive filesystem layout/secret locations to anyone with access to logs.
setup.rs [48-52]

Referred Code
    nats.url, nats.subject, nats.stream
);
debug!("NATS timeout: {:?}", nats.timeout);
debug!("NATS creds file: {:?}", nats.creds_file);
debug!("NATS TLS cert: {:?}", nats.tls_cert);

Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Potential sensitive logs: New info! logs print req.details verbatim, which may contain sensitive information and is
not redacted or structured for safe auditing.

Referred Code
info!(
    "Received GetStatus: service_name={}, service_type={}, agent_id={}, gateway_id={}, details={}",
    req.service_name, req.service_type, req.agent_id, req.gateway_id, req.details
);
debug!("Processing GetStatus request");

let start_time = std::time::Instant::now();
let collector = self.collector.read().await;
let metrics = collector
    .get_latest_metrics()
    .await
    .ok_or_else(|| Status::unavailable("No metrics available yet"))?; // metrics is MetricSample

debug!("Returning metrics with timestamp {}", metrics.timestamp);

// Create the outer JSON object, embedding the metrics struct directly
let outer_data = serde_json::json!({
    "status": metrics, // Embed the MetricSample struct directly. It's already Serialize.
    "response_time": start_time.elapsed().as_nanos() as i64,
    "available": true // This 'available' is part of the JSON payload in StatusResponse.message
});


 ... (clipped 34 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Audit coverage unclear: The diff shows some operational logging (agent enrollment/config/status), but it is not
verifiable from the provided hunks whether all critical security/data actions across the
refactor have complete audit logging with consistent actor context.

Referred Code
@doc """
Handle an agent hello/enrollment request.

Called by the agent on startup to announce itself and register with the gateway.
Validates the mTLS certificate, extracts tenant identity, and registers the agent.
"""
@spec hello(Monitoring.AgentHelloRequest.t(), GRPC.Server.Stream.t()) ::
        Monitoring.AgentHelloResponse.t()
def hello(request, stream) do
  agent_id =
    case request.agent_id do
      nil ->
        ""

      value ->
        value
        |> to_string()
        |> String.trim()
    end

  if agent_id == "" do



 ... (clipped 247 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
AuthZ/validation scope: The gateway server includes strong validation and mTLS-based tenant resolution, but full
verification of end-to-end authorization and secure handling across newly added/modified
files is not possible from the partial diff provided.

Referred Code

Tenant identity is extracted from the mTLS client certificate using
`ServiceRadar.Edge.TenantResolver`. The certificate contains:
- CN: `<component_id>.<partition_id>.<tenant_slug>.serviceradar`
- SPIFFE URI SAN: `spiffe://serviceradar.local/<component_type>/<tenant_slug>/<partition_id>/<component_id>`

The issuer CA SPKI hash is also verified against stored tenant CA records.
This ensures tenants cannot impersonate each other.

## Protocol

The server implements the AgentGatewayService:
- `PushStatus`: Receives a batch of service statuses from an agent
- `StreamStatus`: Receives streaming chunks of service statuses

## Usage

The server is started automatically by the application supervisor.
Incoming status updates are forwarded to the core cluster for processing.
"""




 ... (clipped 892 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
qodo-code-review[bot] commented 2026-01-11 05:44:34 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2243#issuecomment-3734058682
Original created: 2026-01-11T05:44:34Z

PR Code Suggestions

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Fix race condition in shutdown

Refactor the shutdown goroutine to prevent a race condition by ensuring the push
loop is fully stopped before stopping the server.

cmd/agent/main.go [207-223]

 go func() {
 	defer close(shutdownDone)
 
-	cancel()
+	cancel() // Signal the push loop to stop.
+
+	// Wait for the push loop to finish. It will send an error (or nil) on errChan when it exits.
+	// This also handles any final work pushLoop.Stop might do.
 	if err := pushLoop.Stop(shutdownCtx); err != nil {
 		log.Warn().Err(err).Msg("Push loop stop did not complete before timeout")
 	}
-	select {
-	case <-errChan:
-	case <-shutdownCtx.Done():
-		return
-	}
 
+	// Wait for the push loop goroutine to exit.
+	<-errChan
+
+	// Now that the push loop is stopped, stop the server.
 	if err := server.Stop(shutdownCtx); err != nil {
 		log.Error().Err(err).Msg("Error stopping agent services")
 	}
 }()
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a race condition in the shutdown logic that could prevent services from being stopped correctly, and the proposed fix resolves this critical bug.

High
Use correct max_by for deduplication

Correct the implementation of deduplicate_by_device_id/1 by replacing the
incorrect usage of Enum.max_by/3 with Enum.max_by/2. Convert timestamps to a
comparable integer value to correctly find the most recent update.

elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex [513-525]

 defp deduplicate_by_device_id(updates) do
   updates
   |> Enum.group_by(& &1.device_id)
   |> Enum.map(fn {_device_id, grouped} ->
-    # Keep the update with the latest timestamp, handling nil timestamps
-    Enum.max_by(grouped, & &1.timestamp, fn
-      nil, nil -> :eq
-      nil, _ -> :lt
-      _, nil -> :gt
-      a, b -> DateTime.compare(a, b)
+    Enum.max_by(grouped, fn update ->
+      case update.timestamp do
+        %DateTime{} = ts -> DateTime.to_unix(ts, :millisecond)
+        _ -> 0
+      end
     end)
   end)
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a bug in the usage of Enum.max_by/3, which would not sort as intended. The proposed fix using Enum.max_by/2 with a transformation function is correct and resolves the bug, ensuring the latest update is always selected.

High
Ensure notifications are sent transactionally

Refactor the alert notification logic to use an Ash.Notifier. This ensures that
webhook notifications are sent only after the database transaction to create the
alert has successfully committed, preventing notifications for rolled-back
transactions.

elixir/serviceradar_core/lib/serviceradar/monitoring/alert_generator.ex [413-436]

 defp create_alert_and_notify(attrs, opts) do
   tenant_id = Map.get(attrs, :tenant_id) || Keyword.get(opts, :tenant_id)
   tenant_schema = Keyword.get(opts, :tenant) || tenant_schema_for(tenant_id)
   actor = Keyword.get(opts, :actor) || system_actor(tenant_id)
 
   # Create the alert in the database
   if is_nil(tenant_schema) do
     Logger.warning("Skipping alert creation; tenant schema missing", tenant_id: tenant_id)
     {:error, :missing_tenant_schema}
   else
+    # The webhook notification should be handled by an Ash.Notifier
+    # attached to the Alert resource to ensure it only fires on successful
+    # transaction commit. This avoids sending notifications for rolled-back transactions.
     case Alert
          |> Ash.Changeset.for_create(:trigger, attrs, actor: actor, tenant: tenant_schema)
          |> Ash.create() do
       {:ok, alert} ->
-        # Also send webhook notification
-        send_webhook_notification(alert, opts)
+        # The webhook notification should be triggered via an Ash Notifier
+        # to ensure it's only sent after the transaction commits.
+        # For example, if a `WebhookNotifier` is configured in your domain:
+        # ServiceRadar.Monitoring.WebhookNotifier.notify()
         {:ok, alert}
 
       {:error, error} ->
         Logger.error("Failed to create alert: #{inspect(error)}")
         {:error, error}
     end
   end
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: This suggestion correctly identifies a data consistency issue where a notification could be sent for a database record that was never committed. The proposed solution to use an Ash.Notifier is the correct, framework-idiomatic way to ensure transactional integrity for side effects.

Medium
Include tenant identifiers in CA data

Update the return map of generate_tenant_ca/2 to include tenant_id and
tenant_slug. This is required by the downstream generate_component_cert/5
function and will prevent a runtime error.

elixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex [106-116]

 {:ok,
  %{
+   tenant_id: tenant_id,
+   tenant_slug: tenant.slug,
    certificate_pem: cert_pem,
    private_key_pem: key_pem,
    spki_sha256: spki_sha256,
    serial_number: serial_to_hex(serial),
    not_before: not_before,
    not_after: not_after,
    subject_cn: subject_cn
  }}

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies that generate_component_cert/5 requires tenant_id from the tenant_ca map, which is missing from the return value of generate_tenant_ca/2. Adding it prevents a runtime error and fixes a clear bug in the data flow between functions.

Medium
Improve fault tolerance in RPC calls

Modify core_call to continue iterating through available core nodes on
application-level errors, rather than halting immediately, to improve fault
tolerance.

elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex [856-872]

 defp core_call(module, function, args, timeout \\ 5_000) do
   nodes = core_nodes()
 
   if nodes == [] do
     {:error, :core_unavailable}
   else
     Enum.reduce_while(nodes, {:error, :core_unavailable}, fn node, _acc ->
       case :rpc.call(node, module, function, args, timeout) do
         {:badrpc, _} ->
           {:cont, {:error, :core_unavailable}}
 
+        {:ok, {:ok, _}} = result ->
+          {:halt, result}
+
+        {:ok, {:error, _}} = error_result ->
+          {:cont, error_result}
+
         result ->
           {:halt, {:ok, result}}
       end
     end)
   end
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7


Why: The suggestion correctly points out a potential issue where core_call might fail prematurely if the first node returns an application-level error, without trying other nodes. The proposed change improves fault tolerance by continuing to the next node unless a definitive success is returned.

Medium
Handle both string and atom keys

Update get_nested_value to handle both string and atom keys for nested map
lookups to improve robustness against varying key types.

elixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex [739-745]

 defp get_nested_value(map, key) when is_map(map) and is_binary(key) do
   key
   |> String.split(".")
   |> Enum.reduce(map, fn segment, acc ->
-    if is_map(acc), do: Map.get(acc, segment), else: nil
+    if is_map(acc) do
+      # String.to_existing_atom/1 avoids minting new atoms from external input
+      Map.get(acc, segment) ||
+        try do
+          Map.get(acc, String.to_existing_atom(segment))
+        rescue
+          ArgumentError -> nil
+        end
+    else
+      nil
+    end
   end)
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that get_nested_value only handles string keys, while other parts of the code handle both string and atom keys. The proposed change makes the matching logic more robust and consistent with the rest of the file.

Low
Security
Prevent potential atom exhaustion vulnerability

To prevent a potential atom exhaustion Denial of Service (DoS) vulnerability,
replace the use of String.to_existing_atom/1 with a lookup in a string-keyed map
for capability definitions.

elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex [70-76]

+@string_capability_definitions Map.new(@capability_definitions, fn {k, v} -> {to_string(k), v} end)
+
 def capability_info(capability) when is_binary(capability) do
-  try do
-    capability_info(String.to_existing_atom(capability))
-  rescue
-    ArgumentError -> %{icon: "hero-cube", color: "ghost", description: capability}
-  end
+  Map.get(@string_capability_definitions, capability, %{icon: "hero-cube", color: "ghost", description: capability})
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a significant security vulnerability (atom exhaustion DoS) and provides a robust, idiomatic fix by using a string-keyed map instead of converting external input to atoms.

High
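
The fix above relies on module attributes being evaluated once at compile time. A standalone sketch of that pattern follows; the capability entries here are invented for illustration, not taken from the PR:

```elixir
defmodule CapabilityInfo do
  @capability_definitions %{
    snmp: %{icon: "hero-signal", color: "primary", description: "SNMP polling"},
    sweep: %{icon: "hero-map", color: "accent", description: "Network sweep"}
  }

  # Built once at compile time; runtime lookups never touch the atom table,
  # so arbitrary external strings cannot grow it.
  @string_capability_definitions Map.new(@capability_definitions, fn {k, v} ->
                                   {Atom.to_string(k), v}
                                 end)

  def capability_info(capability) when is_binary(capability) do
    Map.get(@string_capability_definitions, capability, %{
      icon: "hero-cube",
      color: "ghost",
      description: capability
    })
  end
end
```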
General
Throttle rule reload on error

In load_rules, update rules_loaded_at even on failure to prevent the engine from
retrying the load on every evaluation and flooding logs.

elixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex [154-167]

 defp load_rules(state) do
   rules =
     StatefulAlertRule
     |> Ash.Query.for_read(:active, %{}, tenant: state.schema)
     |> Ash.read(authorize?: false)
     |> unwrap_page()
 
   updated = %{state | rules: rules, rules_loaded_at: System.monotonic_time(:millisecond)}
   {updated, rules}
 rescue
   error ->
     Logger.warning("Failed to load stateful alert rules: #{inspect(error)}")
-    {state, []}
+    updated = %{state | rules_loaded_at: System.monotonic_time(:millisecond)}
+    {updated, []}
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies that a failure to load rules will cause a retry on every subsequent evaluation, potentially flooding logs. Updating rules_loaded_at on failure introduces a cooldown period, which is a sensible improvement for error handling and system stability.

Medium
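
The cooldown above only takes effect if the caller gates reloads on `rules_loaded_at`. A sketch of such a gate, with the interval name and check invented for illustration:

```elixir
@rules_reload_interval_ms 30_000

# Reload rules only if they have never been loaded, or the last attempt
# (successful or not) is older than the reload interval.
defp maybe_reload_rules(state) do
  now = System.monotonic_time(:millisecond)

  if is_nil(state.rules_loaded_at) or
       now - state.rules_loaded_at >= @rules_reload_interval_ms do
    load_rules(state)
  else
    {state, state.rules}
  end
end
```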
Avoid using internal process dictionary

Replace the use of the internal :"$ancestors" process dictionary key with the
public Supervisor.which_children/1 API to find a process's parent supervisor.
This makes the implementation more robust and less likely to break with future
OTP updates.

elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex [274-302]

 defp get_parent_supervisor(pid) do
-  # Get all PIDs that are children of our DynamicSupervisor
-  child_pids =
-    DynamicSupervisor.which_children(__MODULE__)
-    |> Enum.map(fn {_, child_pid, _, _} -> child_pid end)
-    |> MapSet.new()
+  # Find the TenantSupervisor that has the given pid as a child.
+  # This is more robust than relying on the internal :"$ancestors" process dictionary key.
+  DynamicSupervisor.which_children(__MODULE__)
+  |> Enum.find_value(fn {_, tenant_sup_pid, _, _} ->
+    child_pids =
+      Supervisor.which_children(tenant_sup_pid)
+      |> Enum.map(fn {_, child_pid, _, _} -> child_pid end)
 
-  # Get ancestors of the registry process and find which one is our child
-  case Process.info(pid, :dictionary) do
-    {:dictionary, dict} ->
-      ancestors = Keyword.get(dict, :"$ancestors", [])
-
-      # Find the ancestor that is a direct child of our DynamicSupervisor
-      Enum.find_value(ancestors, fn
-        ancestor when is_pid(ancestor) ->
-          if MapSet.member?(child_pids, ancestor), do: ancestor, else: nil
-
-        ancestor when is_atom(ancestor) ->
-          ancestor_pid = Process.whereis(ancestor)
-          if ancestor_pid && MapSet.member?(child_pids, ancestor_pid), do: ancestor_pid, else: nil
-
-        _ ->
-          nil
-      end)
-
-    _ ->
-      nil
-  end
+    if pid in child_pids, do: tenant_sup_pid, else: nil
+  end)
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7


Why: The suggestion correctly points out the fragility of relying on the undocumented :"$ancestors" process dictionary key and proposes a more robust solution using public APIs, which improves code maintainability and resilience to OTP updates.

Medium
Imported GitHub PR comment. Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2243#issuecomment-3734058682
Original created: 2026-01-11T05:44:34Z