carverauto/serviceradar

Feat/rule builder UI #2646

Merged

mfreeman451 merged 7 commits from refs/pull/2646/head into testing

2026-01-11 08:22:47 +00:00

mfreeman451 commented

2026-01-11 05:41:42 +00:00

(Migrated from github.com)

Imported from GitHub pull request.

Original GitHub pull request: #2243
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2243
Original created: 2026-01-11T05:41:42Z
Original updated: 2026-01-11T08:22:48Z
Original head: carverauto/serviceradar:feat/rule-builder-ui
Original base: testing
Original merged: 2026-01-11T08:22:47Z by @mfreeman451

User description

IMPORTANT: Please sign the Developer Certificate of Origin

Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:

Signed-off-by: J. Doe <j.doe@domain.com>

Describe your changes

Issue ticket number and link

Code checklist before requesting a review

I have signed the DCO?
The build completes without errors?
All tests are passing when running make test?

PR Type

Enhancement

Description

Major architectural refactoring transitioning from poller-based to gateway-based architecture with comprehensive platform modernization:

Core Architecture Changes:

Refactored agent from KV-based config management to push mode with file-based configuration and direct gateway connection
Implemented gRPC agent gateway server with mTLS multi-tenant security for receiving agent status pushes
Removed legacy poller, sync, and edge onboarding components; replaced with modern push-based architecture
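
The file-based loading described above can be sketched roughly as follows. This is a minimal illustration, not the code in `cmd/agent/main.go`: the `agentConfig` fields, the JSON keys, the default values, and the `parseConfig` helper are all assumptions made for the example. It shows defaults being overlaid by a JSON file and a missing gateway address being reported as an error.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// agentConfig is a hypothetical config shape; the real field names may differ.
type agentConfig struct {
	AgentID     string `json:"agent_id"`
	GatewayAddr string `json:"gateway_addr"`
}

// defaultConfig stands in for the embedded defaults (assumed values).
var defaultConfig = []byte(`{"agent_id":"default-agent","gateway_addr":""}`)

var errMissingGateway = errors.New("gateway_addr is required in push mode")

// parseConfig overlays data (which may be empty) on top of the defaults.
func parseConfig(data []byte) (agentConfig, error) {
	var cfg agentConfig
	if err := json.Unmarshal(defaultConfig, &cfg); err != nil {
		return cfg, fmt.Errorf("parse defaults: %w", err)
	}
	if len(data) > 0 {
		// Fields absent from the file keep their default values.
		if err := json.Unmarshal(data, &cfg); err != nil {
			return cfg, fmt.Errorf("parse config: %w", err)
		}
	}
	if cfg.GatewayAddr == "" {
		return cfg, errMissingGateway
	}
	return cfg, nil
}

// loadConfig reads path; a missing file falls back to the defaults alone.
func loadConfig(path string) (agentConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil && !errors.Is(err, os.ErrNotExist) {
		return agentConfig{}, err
	}
	return parseConfig(data)
}

func main() {
	cfg, err := parseConfig([]byte(`{"gateway_addr":"gateway.example:50052"}`))
	fmt.Println(cfg.AgentID, cfg.GatewayAddr, err)
}
```

Splitting parsing from file access keeps the fallback logic testable without touching the filesystem.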

NATS Infrastructure:

Added NATS account service initialization with operator key management and gRPC server registration
Implemented comprehensive NATS credentials file support across all consumers (zen, trapd, flowgger, otel)
Added wildcard subject pattern matching (* and >) for flexible event routing
Updated default NATS subjects and added configurable logs subject support
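
For readers unfamiliar with the two wildcards, here is a small sketch of standard NATS subject matching, where `*` matches exactly one token and `>` matches one or more trailing tokens. The PR's own version is the Rust `subject_matches()` function in the zen consumer; the Go function below is an independent illustration and may differ from it in edge cases (it does not validate empty tokens, for example).

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether subject matches pattern under NATS wildcard
// rules: "*" matches exactly one token, ">" matches one or more trailing tokens.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" is only valid as the last pattern token and needs
			// at least one subject token left to consume.
			return i == len(p)-1 && i < len(s)
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	// Without ">", pattern and subject must have the same token count.
	return len(p) == len(s)
}

func main() {
	fmt.Println(subjectMatches("events.*.processed", "events.syslog.processed")) // true
	fmt.Println(subjectMatches("events.>", "events.otel.logs"))                  // true
	fmt.Println(subjectMatches("events.>", "events"))                            // false
}
```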

Multi-Tenant Platform Foundation:

Created comprehensive database schema migration with 40+ tables for user management, NATS infrastructure, device discovery, monitoring, and alerts
Implemented multi-tenant process registry and supervisor management via DynamicSupervisor with Horde registries
Added SPIFFE/SPIRE integration for distributed cluster security with X.509 certificate generation for tenant CAs
Implemented per-tenant CA and edge component certificate generation with SPIFFE URIs
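
The file walkthrough notes that the SPIFFE module parses and validates SPIFFE IDs with trust domain verification. A rough Go sketch of that kind of check is shown below. The PR's module is written in Elixir, so this is only an illustration of the idea; the function name, the example trust domain, and the path layout are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseSPIFFEID checks that raw is a spiffe:// URI in the expected trust
// domain and returns its workload path. Hypothetical helper for illustration.
func parseSPIFFEID(raw, trustDomain string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("invalid SPIFFE ID: %w", err)
	}
	if u.Scheme != "spiffe" || u.Host == "" {
		return "", errors.New("not a spiffe:// URI")
	}
	// SPIFFE IDs carry only a trust domain and a path.
	if u.User != nil || u.Port() != "" || u.RawQuery != "" || u.Fragment != "" {
		return "", errors.New("SPIFFE ID must not carry userinfo, port, query, or fragment")
	}
	if u.Host != trustDomain {
		return "", fmt.Errorf("trust domain %q does not match expected %q", u.Host, trustDomain)
	}
	return u.Path, nil
}

func main() {
	// The trust domain and path here are made-up example values.
	path, err := parseSPIFFEID("spiffe://example.org/ns/tenant-a/agent/agent-1", "example.org")
	fmt.Println(path, err)
}
```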

Monitoring & Alerting:

Implemented stateful alert engine with bucketed time-window aggregation for log/event rules
Added alert generation service for monitoring events with severity levels and webhook notifications
Implemented poll orchestrator for distributed service check execution across cluster gateways
Added device alias lifecycle event tracking system for audit and alerting
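
To make "bucketed time-window aggregation" concrete, the sketch below counts matching events in fixed-size buckets and sums the buckets that fall inside a sliding window. It is a generic illustration in Go, not the Elixir `StatefulAlertEngine`: the type and method names are invented, timestamps are assumed to arrive in non-decreasing order, and threshold comparison is left to the caller.

```go
package main

import "fmt"

// window counts events in fixed-size buckets and keeps only the most
// recent `buckets` of them. Hypothetical type for illustration.
type window struct {
	bucketSecs int64
	buckets    int64
	counts     map[int64]int
}

func newWindow(bucketSecs, buckets int64) *window {
	return &window{bucketSecs: bucketSecs, buckets: buckets, counts: map[int64]int{}}
}

// add records one event at unix time ts and returns the total number of
// events inside the window ending at ts. Assumes ts never goes backwards.
func (w *window) add(ts int64) int {
	idx := ts / w.bucketSecs
	w.counts[idx]++
	oldest := idx - w.buckets + 1
	total := 0
	for b, c := range w.counts {
		if b < oldest {
			delete(w.counts, b) // bucket has aged out of the window
			continue
		}
		total += c
	}
	return total
}

func main() {
	const threshold = 3
	w := newWindow(60, 5) // five one-minute buckets
	for _, ts := range []int64{0, 30, 299, 300} {
		total := w.add(ts)
		fmt.Printf("ts=%d total=%d firing=%v\n", ts, total, total >= threshold)
	}
}
```

Bucketing keeps memory bounded by the number of buckets rather than the number of events, at the cost of window edges being accurate only to one bucket width.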

Infrastructure Management:

Defined Agent resource with OCSF v1.4.0 state machine and capability definitions
Implemented edge onboarding packages with encrypted token generation and component certificate signing
Added NATS account management gRPC client for tenant account and credential lifecycle
Configured agent gateway runtime with cluster topology strategies and mTLS security

Terminology Updates:

Renamed "poller" references to "gateway" across the Go and Rust checker services (SNMP, sysmon, rperf-client)
Updated CLI commands: renamed update-poller to update-gateway, added nats-bootstrap and admin subcommands
Updated API documentation and help text to reflect new gateway terminology

Message Processing Simplification:

Removed CloudEvents wrapping from zen consumer message processing
Simplified to direct JSON publishing of context data instead of wrapped events
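
For downstream consumers, the practical effect is a change in payload shape. The sketch below contrasts a CloudEvents-style envelope with the direct form. It is written in Go for illustration only (the zen consumer is Rust), and the `id`, `source`, and `type` values as well as the context fields are made-up examples rather than values taken from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cloudEvent holds the required CloudEvents 1.0 attributes plus data.
type cloudEvent struct {
	SpecVersion string          `json:"specversion"`
	ID          string          `json:"id"`
	Source      string          `json:"source"`
	Type        string          `json:"type"`
	Data        json.RawMessage `json:"data"`
}

// wrappedPayload sketches the old shape: context nested under "data".
func wrappedPayload(id, source, eventType string, context map[string]any) ([]byte, error) {
	data, err := json.Marshal(context)
	if err != nil {
		return nil, err
	}
	return json.Marshal(cloudEvent{
		SpecVersion: "1.0", ID: id, Source: source, Type: eventType, Data: data,
	})
}

// directPayload sketches the new shape: the context JSON is the message.
func directPayload(context map[string]any) ([]byte, error) {
	return json.Marshal(context)
}

func main() {
	ctx := map[string]any{"severity": "high"}
	wrapped, _ := wrappedPayload("evt-1", "zen", "example.processed", ctx)
	direct, _ := directPayload(ctx)
	fmt.Println(string(wrapped))
	fmt.Println(string(direct))
}
```

Subscribers that previously read fields from under `data` would, under this assumption, read them from the top level instead.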

Diagram Walkthrough

flowchart LR
  Agent["Agent<br/>Push Mode"] -->|mTLS| Gateway["Agent Gateway<br/>gRPC Server"]
  Gateway -->|Status Updates| Core["Core Platform<br/>Multi-Tenant"]
  Core -->|Config| Agent
  
  NATS["NATS Infrastructure<br/>Operator + Accounts"] -->|Credentials| Consumers["Consumers<br/>zen, trapd, flowgger, otel"]
  Consumers -->|Events| Core
  
  Core -->|Orchestrate| Checks["Service Checks<br/>Poll Orchestrator"]
  Checks -->|Execute| Gateways["Distributed Gateways<br/>SNMP, sysmon, rperf"]
  
  Core -->|Rules| AlertEngine["Stateful Alert Engine<br/>Bucketed Aggregation"]
  AlertEngine -->|Webhooks| Notifications["Alert Notifications"]
  
  TenantCA["Tenant CA<br/>Certificate Generation"] -->|Certs| Onboarding["Edge Onboarding<br/>Packages"]
  Onboarding -->|Enroll| Agent

File Walkthrough

Relevant files

Enhancement

27 files

main.go

Refactor agent to push mode with file-based config

cmd/agent/main.go

Refactored agent startup from KV-based config management to direct file-based configuration loading
Removed edge onboarding, KV watch, and lifecycle server dependencies; replaced with push mode architecture
Added `loadConfig()` function for JSON config parsing with embedded defaults fallback
Implemented `runPushMode()` function handling gateway connection, push loop, and graceful shutdown with signal handling
Added version injection via ldflags and proper error handling for missing gateway address

+174/-74

main.go

Initialize NATS account service with operator support

cmd/data-services/main.go

Added NATS account service initialization with operator key management
Configured resolver paths and system account credentials from environment variables with config file fallback
Registered `NATSAccountService` gRPC server when configured
Added logging for operator initialization, allowed client identities, and resolver configuration

+68/-0

main.go

Add NATS bootstrap and admin command support

cmd/cli/main.go

Renamed `update-poller` subcommand to `update-gateway`
Added `nats-bootstrap` subcommand for NATS operator initialization
Added `admin` subcommand dispatcher with `nats` admin resource routing
Created `dispatchAdminCommand()` helper function for admin subcommand routing

+16/-2

main.go

Rename SNMP poller to gateway terminology

cmd/checkers/snmp/main.go

Renamed `SNMPPollerService` to `SNMPGatewayService`
Renamed `Poller` struct to `Gateway` struct

+1/-1

app.go

Rename poller service to agent gateway service

cmd/core/app/app.go

Renamed gRPC service registration from `RegisterPollerServiceServer` to `RegisterAgentGatewayServiceServer`

+1/-1

config.rs

Add NATS credentials and wildcard subject matching

cmd/consumers/zen/src/config.rs

Added `nats_creds_file` optional configuration field for NATS credentials
Added `nats_creds_path()` method resolving credentials file path with support for absolute/relative paths and cert directory
Updated subject pattern matching to support wildcard patterns (`*` and `>`) via new `subject_matches()` function
Updated test fixtures to use wildcard subject patterns and added credentials file validation

+85/-28

config.rs

Add NATS credentials and logs subject configuration

cmd/otel/src/config.rs

Added `logs_subject` optional field for dedicated logs subject configuration
Added `creds_file` optional field for NATS credentials file path
Updated default NATS subject from `events.otel` to `otel`
Added credentials file parsing with empty string handling

+21/-3

nats_output.rs

Implement NATS credentials and configurable logs subject

cmd/otel/src/nats_output.rs

Added `logs_subject` and `creds_file` fields to `NATSConfig` struct
Updated stream subject configuration to use configurable `logs_subject` with fallback
Added credentials file authentication to NATS connection setup
Updated default subject from `events.otel` to `otel`

+22/-5

server.rs

Rename poller_id to gateway_id in sysmon service

cmd/checkers/sysmon/src/server.rs

Renamed `poller_id` field to `gateway_id` in GetStatus and GetResults response logging and response building

+6/-6

message_processor.rs

Simplify message processing to direct JSON publishing

cmd/consumers/zen/src/message_processor.rs

Removed CloudEvents event building and UUID/URL dependencies
Simplified message processing to directly publish context JSON instead of wrapped CloudEvents
Removed event_type derivation from rules

+2/-16

main.rs

Add NATS credentials support and rename poller_id

cmd/trapd/src/main.rs

Added NATS credentials file support via `nats_creds_path()` method
Updated NATS connection setup to use credentials file when available for both secure and non-secure modes
Renamed `poller_id` to `gateway_id` in GetStatus and GetResults responses

+23/-3

nats_output.rs

Add NATS credentials file support to flowgger

cmd/flowgger/src/flowgger/output/nats_output.rs

Added `creds_file` optional field to `NATSConfig` struct
Added credentials file parsing with empty string handling
Updated NATS connection setup to apply credentials file when available

+14/-0

config.rs

Add NATS credentials configuration to trapd

cmd/trapd/src/config.rs

Added `nats_creds_file` optional configuration field
Added `nats_creds_path()` method resolving credentials file path with security config support
Added validation for non-empty credentials file configuration

+22/-1

grpc_server.rs

Rename poller_id to gateway_id in zen gRPC service

cmd/consumers/zen/src/grpc_server.rs

Renamed `poller_id` field to `gateway_id` in GetStatus and GetResults response building

+2/-2

nats.rs

Add NATS credentials file support to zen consumer

cmd/consumers/zen/src/nats.rs

Added credentials file support to NATS connection setup via `nats_creds_path()`

+4/-0

server.rs

Add gateway_id field to rperf service responses

cmd/checkers/rperf-client/src/server.rs

Added `gateway_id` field to GetStatus and GetResults response building

+2/-0

account_client.ex

Implement NATS account management gRPC client

elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex

Implemented gRPC client for datasvc `NATSAccountService` with account and credential management
Provides functions for creating tenant accounts, generating user credentials, signing account JWTs, and bootstrapping operator
Includes channel management with fallback to fresh connection creation and comprehensive error handling
Supports account limits, subject mappings, stream exports/imports, and credential expiration

+621/-0

stateful_alert_engine.ex

Stateful alert engine with bucketed rule evaluation

elixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex

Implements a GenServer-based stateful alert evaluation engine for log/event rules with bucketed time-window aggregation
Manages alert state snapshots in ETS, persists to database, and handles threshold-based firing/recovery logic
Provides rule matching against logs and events with support for filtering by subject, service name, severity, and body content
Integrates with alert generation, webhook notifications, and event recording for comprehensive alert lifecycle management

+960/-0

agent_gateway_server.ex

gRPC agent gateway server with mTLS multi-tenant security

elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex

Implements gRPC server for receiving agent status pushes and streaming updates with multi-tenant mTLS security
Extracts tenant identity from client certificates and validates component identity to prevent spoofing
Handles agent enrollment, configuration delivery, and status processing with comprehensive validation and error handling
Manages agent registry updates, heartbeats, and forwards status data to core cluster for processing

+1020/-0

onboarding_packages.ex

Edge onboarding packages with certificate generation

elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex

Provides Ash-based context for managing edge onboarding packages with token generation and delivery workflows
Supports package creation with encrypted join/download tokens and optional component certificates signed by tenant CA
Implements package lifecycle operations including delivery verification, revocation, and soft-delete with event recording
Generates component certificates with SPIFFE URIs and manages certificate bundles for secure agent onboarding

+622/-0

agent.ex

Agent resource with OCSF state machine and capabilities

elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex

Defines Agent resource as Ash-based OCSF v1.4.0 Agent object for managing Go agents running on monitored hosts
Implements state machine with transitions for agent lifecycle (connecting, connected, degraded, disconnected, unavailable)
Provides capability definitions and type mappings for agent monitoring capabilities (ICMP, TCP, HTTP, gRPC, DNS, Process, SNMP)
Includes JSON API routes for agent registration, connection management, and heartbeat operations with tenant isolation policies

+665/-0

alert_generator.ex

Alert generation service for monitoring events

elixir/serviceradar_core/lib/serviceradar/monitoring/alert_generator.ex

New module for generating alerts from monitoring events (service state changes, device availability, metric violations, etc.)
Implements alert creation with severity levels and webhook notifications
Handles stats anomaly detection with cooldown mechanism using persistent_term
Provides startup/shutdown notification functions for core service lifecycle

+609/-0

tenant_registry.ex

Multi-tenant process registry and supervisor management

elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex

New DynamicSupervisor managing per-tenant Horde registries and supervisors for multi-tenant process isolation
Implements slug-to-UUID mapping via ETS table for admin/debug lookups
Provides registry lifecycle management (creation, stopping) and child process management
Includes convenience functions for gateway and agent registration and discovery

+634/-0

alias_events.ex

Device alias lifecycle event tracking system

elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex

New module for tracking device alias lifecycle events (service IDs, IPs, collectors)
Implements `AliasRecord` struct for parsing and comparing alias metadata
Detects alias changes and generates lifecycle events for audit/alerting
Provides functions to process and persist alias updates to DeviceAliasState resource

+654/-0

generator.ex

X.509 certificate generation for tenant CAs

elixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex

New module for X.509 certificate generation for per-tenant CAs and edge components
Generates tenant intermediate CAs (10-year validity) and edge component certificates (1-year validity)
Implements certificate signing, encoding, and SPIFFE ID extraction from certificate CNs
Provides SPKI SHA-256 hash computation for certificate pinning

+541/-0

spiffe.ex

SPIFFE/SPIRE integration for distributed cluster

elixir/serviceradar_core/lib/serviceradar/spiffe.ex

New module for SPIFFE/SPIRE integration providing X.509 SVID loading and verification
Implements SSL/TLS options configuration for ERTS distribution and client/server connections
Parses and validates SPIFFE IDs with trust domain verification
Provides certificate expiry monitoring and file rotation watching capabilities

+564/-0

poll_orchestrator.ex

Poll execution orchestrator for service checks

elixir/serviceradar_core/lib/serviceradar/monitoring/poll_orchestrator.ex

New module orchestrating poll execution for scheduled service checks across the cluster
Manages PollJob lifecycle (creation, state transitions, completion) via AshStateMachine
Discovers available gateways via Horde registry and dispatches checks using location-transparent PIDs
Supports multiple gateway assignment modes (any, partition, domain, specific) and async execution

+450/-0
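
Several entries above mention resolving a NATS credentials file path with support for absolute and relative paths, a cert directory, and empty-string handling. One plausible reading of that behavior is sketched below in Go. The real `nats_creds_path()` methods are Rust and may resolve paths differently; the function name, the rule that relative paths are joined to the cert directory, and the example paths are all assumptions.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveCredsPath returns the credentials path to use and whether one is
// configured. Hypothetical helper: blank values mean "no credentials",
// absolute paths are used as-is, and relative paths are joined to certDir.
func resolveCredsPath(credsFile, certDir string) (string, bool) {
	trimmed := strings.TrimSpace(credsFile)
	if trimmed == "" {
		return "", false
	}
	if filepath.IsAbs(trimmed) || certDir == "" {
		return trimmed, true
	}
	return filepath.Join(certDir, trimmed), true
}

func main() {
	// Example paths are illustrative only.
	fmt.Println(resolveCredsPath("nats.creds", "/etc/serviceradar/certs"))
	fmt.Println(resolveCredsPath("/var/run/nats.creds", "/etc/serviceradar/certs"))
	fmt.Println(resolveCredsPath("   ", "/etc/serviceradar/certs"))
}
```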

Documentation

3 files

main.go

Update API documentation terminology

cmd/core/main.go

Updated API description from "service pollers" to "service gateways"

+1/-1

main.go

Update config-sync role documentation

cmd/tools/config-sync/main.go

Updated role flag description from "poller" to "gateway" in help text

+1/-1

setup.rs

Add NATS credentials debug logging

cmd/otel/src/setup.rs

Added debug logging for NATS credentials file configuration

+1/-0

Configuration

1 file

20260107043446_initial_schema.exs

Add initial tenant database schema migration

elixir/serviceradar_core/priv/repo/tenant_migrations/20260107043446_initial_schema.exs

Created comprehensive database schema migration with 40+ tables for multi-tenant platform
Includes tables for user management, NATS infrastructure, device discovery, monitoring, alerts, and onboarding
Defines relationships, indexes, and constraints for tenant isolation and data integrity
Implements encrypted fields for sensitive data (keys, credentials, certificates)

+1416/-0

Configuration changes

1 file

runtime.exs

Agent gateway runtime configuration with cluster strategies

elixir/serviceradar_agent_gateway/config/runtime.exs

Configures cluster topology strategies (Kubernetes DNS, DNSPoll, EPMD, Gossip) for agent gateway node discovery
Sets up SPIFFE/mTLS configuration for secure inter-node communication with trust domain and certificate paths
Disables core database and Oban job queue in agent gateway while enabling cluster coordination with core nodes
Configures PubSub and telemetry for distributed logging and event aggregation across the cluster

+209/-0

Additional files

101 files

.bazelignore	+4/-0
.bazelrc	+5/-0
.env-sample	+33/-0
.env.example	+38/-0
main.yml	+18/-0
AGENTS.md	+177/-11
INSTALL.md	+11/-11
MODULE.bazel	+22/-2
Makefile	+55/-14
README-Docker.md	+17/-2
README.md	+3/-3
ROADMAP.md	+1/-1
BUILD.bazel	+11/-6
BUILD.bazel	+12/-0
mix_release.bzl	+141/-49
BUILD.bazel	+1/-0
README.md	+4/-4
config.json	+5/-6
build.rs	+0/-1
monitoring.proto	+3/-26
BUILD.bazel	+1/-1
README.md	+2/-2
monitoring.proto	+2/-26
Cargo.toml	+0/-3
README.md	+8/-8
zen-consumer-with-otel.json	+14/-11
zen-consumer.json	+14/-11
config.json	+4/-4
config.json	+4/-4
BUILD.bazel	+1/-0
README.md	+3/-3
README.md	+9/-12
flowgger.toml	+2/-1
otel.toml	+3/-1
otel.toml.example	+5/-2
BUILD.bazel	+0/-25
config.json	+0/-111
main.go	+0/-138
BUILD.bazel	+0/-25
config.json	+0/-77
main.go	+0/-123
README.md	+3/-3
docker-compose.elx.yml	+117/-0
docker-compose.spiffe.yml	+8/-158
docker-compose.yml	+316/-269
Dockerfile.agent-gateway	+94/-0
Dockerfile.core-elx	+108/-0
Dockerfile.poller	+0/-70
Dockerfile.sync	+0/-95
Dockerfile.tools	+1/-2
Dockerfile.web-ng	+6/-0
agent-minimal.docker.json	+6/-6
agent.docker.json	+5/-20
agent.mtls.json	+7/-10
bootstrap-nested-spire.sh	+0/-80
.gitkeep	+1/-0
datasvc.docker.json	+3/-2
datasvc.mtls.json	+14/-1
db-event-writer.docker.json	+15/-11
db-event-writer.mtls.json	+10/-8
FRICTION_POINTS.md	+0/-355
README.md	+0/-207
SETUP_GUIDE.md	+0/-307
docker-compose.edge-e2e.yml	+0/-27
manage-packages.sh	+0/-211
setup-edge-e2e.sh	+0/-198
edge-poller-restart.sh	+0/-178
downstream-agent.conf	+0/-32
env	+0/-4
server.conf	+0/-51
upstream-agent.conf	+0/-32
entrypoint-certs.sh	+13/-9
entrypoint-poller.sh	+0/-274
entrypoint-sync.sh	+0/-96
fix-cert-permissions.sh	+2/-2
flowgger.docker.toml	+3/-2
generate-certs.sh	+214/-12
nats.docker.conf	+16/-160
netflow-consumer.mtls.json	+1/-0
otel.docker.toml	+7/-2
pg_hba.conf	+9/-0
pg_ident.conf	+17/-0
poller-stack.compose.yml	+0/-121
poller.docker.json	+0/-128
poller.mtls.json	+0/-135
poller.spiffe.json	+0/-55
refresh-upstream-credentials.sh	+0/-248
seed-poller-kv.sh	+0/-83
setup-edge-poller.sh	+0/-204
README.md	+5/-5
bootstrap-compose-spire.sh	+0/-2
ssl_dist.core.conf	+17/-0
ssl_dist.gateway.conf	+17/-0
ssl_dist.web.conf	+17/-0
sync.docker.json	+0/-71
sync.mtls.json	+0/-75
sysmon-osx.checker.json	+1/-1
tools-profile.sh	+1/-2
trapd.docker.json	+3/-2
update-config.sh	+1/-190
Additional files not shown

Imported from GitHub pull request. Original GitHub pull request: #2243 Original author: @mfreeman451 Original URL: https://github.com/carverauto/serviceradar/pull/2243 Original created: 2026-01-11T05:41:42Z Original updated: 2026-01-11T08:22:48Z Original head: carverauto/serviceradar:feat/rule-builder-ui Original base: testing Original merged: 2026-01-11T08:22:47Z by @mfreeman451 --- ### **User description** ## IMPORTANT: Please sign the Developer Certificate of Origin Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include a [DCO sign-off statement]( https://developercertificate.org/) indicating the DCO acceptance in one commit message. Here is an example DCO Signed-off-by line in a commit message: ``` Signed-off-by: J. Doe <j.doe@domain.com> ``` ## Describe your changes ## Issue ticket number and link ## Code checklist before requesting a review - [ ] I have signed the DCO? - [ ] The build completes without errors? - [ ] All tests are passing when running make test? 
___ ### **PR Type** Enhancement ___ ### **Description** Major architectural refactoring transitioning from poller-based to gateway-based architecture with comprehensive platform modernization: **Core Architecture Changes:** - Refactored agent from KV-based config management to push mode with file-based configuration and direct gateway connection - Implemented gRPC agent gateway server with mTLS multi-tenant security for receiving agent status pushes - Removed legacy poller, sync, and edge onboarding components; replaced with modern push-based architecture **NATS Infrastructure:** - Added NATS account service initialization with operator key management and gRPC server registration - Implemented comprehensive NATS credentials file support across all consumers (zen, trapd, flowgger, otel) - Added wildcard subject pattern matching (`*` and `>`) for flexible event routing - Updated default NATS subjects and added configurable logs subject support **Multi-Tenant Platform Foundation:** - Created comprehensive database schema migration with 40+ tables for user management, NATS infrastructure, device discovery, monitoring, and alerts - Implemented multi-tenant process registry and supervisor management via DynamicSupervisor with Horde registries - Added SPIFFE/SPIRE integration for distributed cluster security with X.509 certificate generation for tenant CAs - Implemented per-tenant CA and edge component certificate generation with SPIFFE URIs **Monitoring & Alerting:** - Implemented stateful alert engine with bucketed time-window aggregation for log/event rules - Added alert generation service for monitoring events with severity levels and webhook notifications - Implemented poll orchestrator for distributed service check execution across cluster gateways - Added device alias lifecycle event tracking system for audit and alerting **Infrastructure Management:** - Defined Agent resource with OCSF v1.4.0 state machine and capability definitions - Implemented edge onboarding 
packages with encrypted token generation and component certificate signing - Added NATS account management gRPC client for tenant account and credential lifecycle - Configured agent gateway runtime with cluster topology strategies and mTLS security **Terminology Updates:** - Renamed all references from "poller" to "gateway" across Go services (SNMP, sysmon, rperf-client) - Updated CLI commands: renamed `update-poller` to `update-gateway`, added `nats-bootstrap` and `admin` subcommands - Updated API documentation and help text to reflect new gateway terminology **Message Processing Simplification:** - Removed CloudEvents wrapping from zen consumer message processing - Simplified to direct JSON publishing of context data instead of wrapped events ___ ### Diagram Walkthrough ```mermaid flowchart LR Agent["Agent Push Mode"] -->|mTLS| Gateway["Agent Gateway gRPC Server"] Gateway -->|Status Updates| Core["Core Platform Multi-Tenant"] Core -->|Config| Agent NATS["NATS Infrastructure Operator + Accounts"] -->|Credentials| Consumers["Consumers zen, trapd, flowgger, otel"] Consumers -->|Events| Core Core -->|Orchestrate| Checks["Service Checks Poll Orchestrator"] Checks -->|Execute| Gateways["Distributed Gateways SNMP, sysmon, rperf"] Core -->|Rules| AlertEngine["Stateful Alert Engine Bucketed Aggregation"] AlertEngine -->|Webhooks| Notifications["Alert Notifications"] TenantCA["Tenant CA Certificate Generation"] -->|Certs| Onboarding["Edge Onboarding Packages"] Onboarding -->|Enroll| Agent ``` <details><summary><h3>File Walkthrough</h3></summary> <table><thead><tr><th></th><th align="left">Relevant files</th></tr></thead><tbody><tr><td>Enhancement</td><td><details><summary>27 files</summary><table> <tr> <td> <details> <summary>main.go<dd><code>Refactor agent to push mode with file-based config</code>              </dd></summary> <hr> cmd/agent/main.go <ul><li>Refactored agent startup from KV-based config management to direct file-based configuration loading <li> Removed 
edge onboarding, KV watch, and lifecycle server dependencies; replaced with push mode architecture <li> Added <code>loadConfig()</code> function for JSON config parsing with embedded defaults fallback <li> Implemented <code>runPushMode()</code> function handling gateway connection, push loop, and graceful shutdown with signal handling <li> Added version injection via ldflags and proper error handling for missing gateway address</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-61358711e980ccf505246fd3915f97cbd3a380e9b66f6fa5aad46749968c5ca3">+174/-74</a></td> </tr> <tr> <td> <details> <summary>main.go<dd><code>Initialize NATS account service with operator support</code>        </dd></summary> <hr> cmd/data-services/main.go <ul><li>Added NATS account service initialization with operator key management <li> Configured resolver paths and system account credentials from environment variables with config file fallback <li> Registered <code>NATSAccountService</code> gRPC server when configured <li> Added logging for operator initialization, allowed client identities, and resolver configuration</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-5e7731adfb877918cd65d9d5531621312496450fd550fea2682efca4ca8fe816">+68/-0</a>    </td> </tr> <tr> <td> <details> <summary>main.go<dd><code>Add NATS bootstrap and admin command support</code>                          </dd></summary> <hr> cmd/cli/main.go <ul><li>Renamed <code>update-poller</code> subcommand to <code>update-gateway</code> <li> Added <code>nats-bootstrap</code> subcommand for NATS operator initialization <li> Added <code>admin</code> subcommand dispatcher with <code>nats</code> admin resource routing <li> Created <code>dispatchAdminCommand()</code> helper function for admin subcommand routing</ul> </details> </td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-ed4d81d29a7267f93fd77e17993fd3491b9ef6ded18490b4514d10ed1d803bc2">+16/-2</a>    </td> </tr> <tr> <td> <details> <summary>main.go<dd><code>Rename SNMP poller to gateway terminology</code>                                </dd></summary> <hr> cmd/checkers/snmp/main.go <ul><li>Renamed <code>SNMPPollerService</code> to <code>SNMPGatewayService</code> <li> Renamed <code>Poller</code> struct to <code>Gateway</code> struct</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-f25402eade63525184cb5e7437accff93c7b9338eebe81add6dc5f2a9eb12550">+1/-1</a>      </td> </tr> <tr> <td> <details> <summary>app.go<dd><code>Rename poller service to agent gateway service</code>                      </dd></summary> <hr> cmd/core/app/app.go <ul><li>Renamed gRPC service registration from <code>RegisterPollerServiceServer</code> to <code>RegisterAgentGatewayServiceServer</code></ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4ad8a289575edf3b163088617b7a40ae1305c29ced0c7d59b3751c57d6938072">+1/-1</a>      </td> </tr> <tr> <td> <details> <summary>config.rs<dd><code>Add NATS credentials and wildcard subject matching</code>              </dd></summary> <hr> cmd/consumers/zen/src/config.rs <ul><li>Added <code>nats_creds_file</code> optional configuration field for NATS credentials <li> Added <code>nats_creds_path()</code> method resolving credentials file path with support for absolute/relative paths and cert directory <li> Updated subject pattern matching to support wildcard patterns (<code>*</code> and <code>></code>) via new <code>subject_matches()</code> function <li> Updated test fixtures to use wildcard subject patterns and added credentials file validation</ul> </details> </td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-05038f3867985e757de9027609950e682bad6d1992dac6acd7c28962a3c65dc4">+85/-28</a>  </td> </tr> <tr> <td> <details> <summary>config.rs<dd><code>Add NATS credentials and logs subject configuration</code>            </dd></summary> <hr> cmd/otel/src/config.rs <ul><li>Added <code>logs_subject</code> optional field for dedicated logs subject configuration <li> Added <code>creds_file</code> optional field for NATS credentials file path <li> Updated default NATS subject from <code>events.otel</code> to <code>otel</code> <li> Added credentials file parsing with empty string handling</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-abbaec651da3d6af96b482e0f77bb909b65dbe0cabd78b5803769cc9dab0a1b0">+21/-3</a>    </td> </tr> <tr> <td> <details> <summary>nats_output.rs<dd><code>Implement NATS credentials and configurable logs subject</code>  </dd></summary> <hr> cmd/otel/src/nats_output.rs <ul><li>Added <code>logs_subject</code> and <code>creds_file</code> fields to <code>NATSConfig</code> struct <li> Updated stream subject configuration to use configurable <code>logs_subject</code> with fallback <li> Added credentials file authentication to NATS connection setup <li> Updated default subject from <code>events.otel</code> to <code>otel</code></ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-6b585ea3564a481174e04da1270e2e13edd4e2b980d02a2652d6d21e6d82a498">+22/-5</a>    </td> </tr> <tr> <td> <details> <summary>server.rs<dd><code>Rename poller_id to gateway_id in sysmon service</code>                  </dd></summary> <hr> cmd/checkers/sysmon/src/server.rs <ul><li>Renamed <code>poller_id</code> field to <code>gateway_id</code> in GetStatus and GetResults response logging and response building</ul> </details> </td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-2c4395fee16396339c3eea518ad9bec739174c67c9cedf62e6848c17136dd33e">+6/-6</a>      </td> </tr> <tr> <td> <details> <summary>message_processor.rs<dd><code>Simplify message processing to direct JSON publishing</code>        </dd></summary> <hr> cmd/consumers/zen/src/message_processor.rs <ul><li>Removed CloudEvents event building and UUID/URL dependencies <li> Simplified message processing to directly publish context JSON instead of wrapped CloudEvents <li> Removed event_type derivation from rules</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9fcbc5358a9009e60a8cd22d21e5a9ea652787c727732d0b869e0865495114c3">+2/-16</a>    </td> </tr> <tr> <td> <details> <summary>main.rs<dd><code>Add NATS credentials support and rename poller_id</code>                </dd></summary> <hr> cmd/trapd/src/main.rs <ul><li>Added NATS credentials file support via <code>nats_creds_path()</code> method <li> Updated NATS connection setup to use credentials file when available for both secure and non-secure modes <li> Renamed <code>poller_id</code> to <code>gateway_id</code> in GetStatus and GetResults responses</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-33b655d8730ae3e9c844ee280787d11f1b0d5343119188273f89558805f814ba">+23/-3</a>    </td> </tr> <tr> <td> <details> <summary>nats_output.rs<dd><code>Add NATS credentials file support to flowgger</code>                        </dd></summary> <hr> cmd/flowgger/src/flowgger/output/nats_output.rs <ul><li>Added <code>creds_file</code> optional field to <code>NATSConfig</code> struct <li> Added credentials file parsing with empty string handling <li> Updated NATS connection setup to apply credentials file when available</ul> </details> </td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-a82e2e4d413539bf0b414b5629665b19648447523994cba639c4d1238aa5a0c1">+14/-0</a>    </td> </tr> <tr> <td> <details> <summary>config.rs<dd><code>Add NATS credentials configuration to trapd</code>                            </dd></summary> <hr> cmd/trapd/src/config.rs <ul><li>Added <code>nats_creds_file</code> optional configuration field <li> Added <code>nats_creds_path()</code> method resolving credentials file path with security config support <li> Added validation for non-empty credentials file configuration</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c89b88ba4d2bf0a054d0ba69a672a92c30140b8d19503d67b980a218ffe3106d">+22/-1</a>    </td> </tr> <tr> <td> <details> <summary>grpc_server.rs<dd><code>Rename poller_id to gateway_id in zen gRPC service</code>              </dd></summary> <hr> cmd/consumers/zen/src/grpc_server.rs <ul><li>Renamed <code>poller_id</code> field to <code>gateway_id</code> in GetStatus and GetResults response building</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e4564a93f6cf84ff91cd3d8141fc9272ec9b4ec19defd107afa42be01fcfed5b">+2/-2</a>      </td> </tr> <tr> <td> <details> <summary>nats.rs<dd><code>Add NATS credentials file support to zen consumer</code>                </dd></summary> <hr> cmd/consumers/zen/src/nats.rs <ul><li>Added credentials file support to NATS connection setup via <code>nats_creds_path()</code></ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-97f7335def0ad5d644b594a1076ae2d7080b11259cbb8de22c7946cc8e4b39f8">+4/-0</a>      </td> </tr> <tr> <td> <details> <summary>server.rs<dd><code>Add gateway_id field to rperf service responses</code>                    </dd></summary> <hr> cmd/checkers/rperf-client/src/server.rs <ul><li>Added <code>gateway_id</code> field to GetStatus and GetResults response 
building</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bce0f4ca6548712f224b73816825d28e831acbbff7dbed3c98671ed50f65d028">+2/-0</a>      </td> </tr> <tr> <td> <details> <summary>account_client.ex<dd><code>Implement NATS account management gRPC client</code>                        </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/nats/account_client.ex <ul><li>Implemented gRPC client for datasvc <code>NATSAccountService</code> with account and credential management <li> Provides functions for creating tenant accounts, generating user credentials, signing account JWTs, and bootstrapping operator <li> Includes channel management with fallback to fresh connection creation and comprehensive error handling <li> Supports account limits, subject mappings, stream exports/imports, and credential expiration</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-2e18ac777ac600b12982ba9e9d5327e23ebd84c139a2add7976f8bf61283e554">+621/-0</a>  </td> </tr> <tr> <td> <details> <summary>stateful_alert_engine.ex<dd><code>Stateful alert engine with bucketed rule evaluation</code>            </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex <ul><li>Implements a GenServer-based stateful alert evaluation engine for log/event rules with bucketed time-window aggregation <li> Manages alert state snapshots in ETS, persists to database, and handles threshold-based firing/recovery logic <li> Provides rule matching against logs and events with support for filtering by subject, service name, severity, and body content <li> Integrates with alert generation, webhook notifications, and event recording for comprehensive alert lifecycle management</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bae3a52db882de8c947e62f219a95dff8db4e155e37d9a361dbe14ec25fcd3bd">+960/-0</a>  </td> </tr> <tr> <td> 
<details> <summary>agent_gateway_server.ex<dd><code>gRPC agent gateway server with mTLS multi-tenant security</code></dd></summary> <hr> elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex <ul><li>Implements gRPC server for receiving agent status pushes and streaming updates with multi-tenant mTLS security <li> Extracts tenant identity from client certificates and validates component identity to prevent spoofing <li> Handles agent enrollment, configuration delivery, and status processing with comprehensive validation and error handling <li> Manages agent registry updates, heartbeats, and forwards status data to core cluster for processing</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-369a368073dc8ec1140bcea699005a1ce97a90cd59629df0bd18c71c7ffaae9f">+1020/-0</a></td> </tr> <tr> <td> <details> <summary>onboarding_packages.ex<dd><code>Edge onboarding packages with certificate generation</code>          </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/edge/onboarding_packages.ex <ul><li>Provides Ash-based context for managing edge onboarding packages with token generation and delivery workflows <li> Supports package creation with encrypted join/download tokens and optional component certificates signed by tenant CA <li> Implements package lifecycle operations including delivery verification, revocation, and soft-delete with event recording <li> Generates component certificates with SPIFFE URIs and manages certificate bundles for secure agent onboarding</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e4fe8e19bc324416302bb4c962f57133b3f62eb82053766844d881c522a473e5">+622/-0</a>  </td> </tr> <tr> <td> <details> <summary>agent.ex<dd><code>Agent resource with OCSF state machine and capabilities</code>    </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex <ul><li>Defines Agent resource 
as Ash-based OCSF v1.4.0 Agent object for managing Go agents running on monitored hosts <li> Implements state machine with transitions for agent lifecycle (connecting, connected, degraded, disconnected, unavailable) <li> Provides capability definitions and type mappings for agent monitoring capabilities (ICMP, TCP, HTTP, gRPC, DNS, Process, SNMP) <li> Includes JSON API routes for agent registration, connection management, and heartbeat operations with tenant isolation policies</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c56f92b6ce744cab3f2dc00dde92e2017cffdd12ad4618f7fa720252f2a6843a">+665/-0</a>  </td> </tr> <tr> <td> <details> <summary>alert_generator.ex<dd><code>Alert generation service for monitoring events</code>                      </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/monitoring/alert_generator.ex <ul><li>New module for generating alerts from monitoring events (service state changes, device availability, metric violations, etc.) 
<li> Implements alert creation with severity levels and webhook notifications <li> Handles stats anomaly detection with cooldown mechanism using persistent_term <li> Provides startup/shutdown notification functions for core service lifecycle</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-62074160ac91002a439bab337a032329681bc55c84a59ab9934bc76d05a5de04">+609/-0</a>  </td> </tr> <tr> <td> <details> <summary>tenant_registry.ex<dd><code>Multi-tenant process registry and supervisor management</code>    </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex <ul><li>New DynamicSupervisor managing per-tenant Horde registries and supervisors for multi-tenant process isolation <li> Implements slug-to-UUID mapping via ETS table for admin/debug lookups <li> Provides registry lifecycle management (creation, stopping) and child process management <li> Includes convenience functions for gateway and agent registration and discovery</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-91248b3b128a2e3d9bea6ffdb5e0f295e4a1745e82f87687c640ad01416fb85d">+634/-0</a>  </td> </tr> <tr> <td> <details> <summary>alias_events.ex<dd><code>Device alias lifecycle event tracking system</code>                          </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex <ul><li>New module for tracking device alias lifecycle events (service IDs, IPs, collectors) <li> Implements <code>AliasRecord</code> struct for parsing and comparing alias metadata <li> Detects alias changes and generates lifecycle events for audit/alerting <li> Provides functions to process and persist alias updates to DeviceAliasState resource</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bc3743067ea774f59bc5665770f7110a2d6e90f6e1156a7717a1c287f8979d28">+654/-0</a>  </td> </tr> <tr> <td> <details> 
<summary>generator.ex<dd><code>X.509 certificate generation for tenant CAs</code>                            </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex <ul><li>New module for X.509 certificate generation for per-tenant CAs and edge components <li> Generates tenant intermediate CAs (10-year validity) and edge component certificates (1-year validity) <li> Implements certificate signing, encoding, and SPIFFE ID extraction from certificate CNs <li> Provides SPKI SHA-256 hash computation for certificate pinning</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-b48e4a9e1189da61e2a60e16f56fce81298d76b7cdab745107140fed3f6e48b4">+541/-0</a>  </td> </tr> <tr> <td> <details> <summary>spiffe.ex<dd><code>SPIFFE/SPIRE integration for distributed cluster</code>                  </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/spiffe.ex <ul><li>New module for SPIFFE/SPIRE integration providing X.509 SVID loading and verification <li> Implements SSL/TLS options configuration for ERTS distribution and client/server connections <li> Parses and validates SPIFFE IDs with trust domain verification <li> Provides certificate expiry monitoring and file rotation watching capabilities</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0cb8d921c19f671b66f91c0978e351e71d927c5f4694924984c9f1ed34d7ee78">+564/-0</a>  </td> </tr> <tr> <td> <details> <summary>poll_orchestrator.ex<dd><code>Poll execution orchestrator for service checks</code>                      </dd></summary> <hr> elixir/serviceradar_core/lib/serviceradar/monitoring/poll_orchestrator.ex <ul><li>New module orchestrating poll execution for scheduled service checks across the cluster <li> Manages PollJob lifecycle (creation, state transitions, completion) via AshStateMachine <li> Discovers available gateways via Horde registry and dispatches checks using location-transparent 
PIDs <li> Supports multiple gateway assignment modes (any, partition, domain, specific) and async execution</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-68a63639fc9d92d29501700c6604921098c9bbbf21e54f9148c1109c17c9c6d4">+450/-0</a>  </td> </tr> </table></details></td></tr><tr><td>Documentation</td><td><details><summary>3 files</summary><table> <tr> <td> <details> <summary>main.go<dd><code>Update API documentation terminology</code> </dd></summary> <hr> cmd/core/main.go <ul><li>Updated API description from "service pollers" to "service gateways"</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4ab3fd1d4debc53dd2499d94a0f60c648fdae4235dd1e3678095a975f5bb434a">+1/-1</a>      </td> </tr> <tr> <td> <details> <summary>main.go<dd><code>Update config-sync role documentation</code> </dd></summary> <hr> cmd/tools/config-sync/main.go <ul><li>Updated role flag description from "poller" to "gateway" in help text</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bc6eeb1b05bcb9179525e32fac1de9926b5823ec3504be546ab10c5c9740f544">+1/-1</a>      </td> </tr> <tr> <td> <details> <summary>setup.rs<dd><code>Add NATS credentials debug logging</code> </dd></summary> <hr> cmd/otel/src/setup.rs <ul><li>Added debug logging for NATS credentials file configuration</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-3891f667deb20fd26e296d3e2742c57378d3764fe1743118e612465ae360391f">+1/-0</a>      </td> </tr> </table></details></td></tr><tr><td>Configuration</td><td><details><summary>1 file</summary><table> <tr> <td> <details> <summary>20260107043446_initial_schema.exs<dd><code>Add initial tenant database schema migration</code> </dd></summary> <hr> 
elixir/serviceradar_core/priv/repo/tenant_migrations/20260107043446_initial_schema.exs <ul><li>Created comprehensive database schema migration with 40+ tables for multi-tenant platform <li> Includes tables for user management, NATS infrastructure, device discovery, monitoring, alerts, and onboarding <li> Defines relationships, indexes, and constraints for tenant isolation and data integrity <li> Implements encrypted fields for sensitive data (keys, credentials, certificates)</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0d217dc9822fab0d3390e8ec21040f98e67106e5c9126e043a9b701efcbfb576">+1416/-0</a></td> </tr> </table></details></td></tr><tr><td>Configuration changes</td><td><details><summary>1 file</summary><table> <tr> <td> <details> <summary>runtime.exs<dd><code>Agent gateway runtime configuration with cluster strategies</code></dd></summary> <hr> elixir/serviceradar_agent_gateway/config/runtime.exs <ul><li>Configures cluster topology strategies (Kubernetes DNS, DNSPoll, EPMD, Gossip) for agent gateway node discovery <li> Sets up SPIFFE/mTLS configuration for secure inter-node communication with trust domain and certificate paths <li> Disables core database and Oban job queue in agent gateway while enabling cluster coordination with core nodes <li> Configures PubSub and telemetry for distributed logging and event aggregation across the cluster</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-842568fafa717a8064203543d674517fd28ad7dd2a4d3f0f157d274cfda4f18b">+209/-0</a>  </td> </tr> </table></details></td></tr><tr><td>Additional files</td><td><details><summary>101 files</summary><table> <tr> <td>.bazelignore</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-a5641cd37d6ad98b32cdfce1980836cc68312277bc6a7052f55da02ada5bc6cf">+4/-0</a>      </td> </tr> <tr> <td>.bazelrc</td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-544556920c45b42cbfe40159b082ce8af6bd929e492d076769226265f215832f">+5/-0</a>      </td> </tr> <tr> <td>.env-sample</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c4368a972a7fa60d9c4e333cebf68cdb9a67acb810451125c02e3b7eb2594e3d">+33/-0</a>    </td> </tr> <tr> <td>.env.example</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-a3046da0d15a27e89f2afe639b25748a7ad4d9290af3e7b1b6c1a5533c8f0a8c">+38/-0</a>    </td> </tr> <tr> <td>main.yml</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-7829468e86c1cc5d5133195b5cb48e1ff6c75e3e9203777f6b2e379d9e4882b3">+18/-0</a>    </td> </tr> <tr> <td>AGENTS.md</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-a54ff182c7e8acf56acfd6e4b9c3ff41e2c41a31c9b211b2deb9df75d9a478f9">+177/-11</a></td> </tr> <tr> <td>INSTALL.md</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-09b140a43ebfdd8dbec31ce72cafffd15164d2860fd390692a030bcb932b54a0">+11/-11</a>  </td> </tr> <tr> <td>MODULE.bazel</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-6136fc12446089c3db7360e923203dd114b6a1466252e71667c6791c20fe6bdc">+22/-2</a>    </td> </tr> <tr> <td>Makefile</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52">+55/-14</a>  </td> </tr> <tr> <td>README-Docker.md</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9fd61d24482efe68c22d8d41e2a1dcc440f39195aa56e7a050f2abe598179efd">+17/-2</a>    </td> </tr> <tr> <td>README.md</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5">+3/-3</a>      </td> </tr> <tr> <td>ROADMAP.md</td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-683343bdf93f55ed3cada86151abb8051282e1936e58d4e0a04beca95dff6e51">+1/-1</a>      </td> </tr> <tr> <td>BUILD.bazel</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-884fa9353a5226345e44fbabea3300efc7a87dfbcde0b6a42521ca51823f1b68">+11/-6</a>    </td> </tr> <tr> <td>BUILD.bazel</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0e80ea46aeb61a873324685edb96eae864c7a2004fbb7ee404b4ec951190ba10">+12/-0</a>    </td> </tr> <tr> <td>mix_release.bzl</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-86ec281f99363b6b6eb1f49e21d83b7eeca93a35b552b9f305fffc6855e38ccd">+141/-49</a></td> </tr> <tr> <td>BUILD.bazel</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-143f8d1549d52f28906f19ce28e5568a5be474470ff103c2c1e63c3e6b08d670">+1/-0</a>      </td> </tr> <tr> <td>README.md</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bfd308915d0cf522e7fc76600dee687617dc69165ab22502a1d219850c0c0860">+4/-4</a>      </td> </tr> <tr> <td>config.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-5b1bc8fe77422534739bdd3a38dc20d2634a86c171265c34e1b5d0c5a61b6bab">+5/-6</a>      </td> </tr> <tr> <td>build.rs</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-251e7a923f45f8f903e510d10f183366bda06d281c8ecc3669e1858256e2186d">+0/-1</a>      </td> </tr> <tr> <td>monitoring.proto</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-b56f709f4a0a3db694f2124353908318631f23e20b7846bc4b8ee869e2e0632a">+3/-26</a>    </td> </tr> <tr> <td>BUILD.bazel</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-7da152990199fd73c1eecb40f9c49e0d4e6453a8ec1acb111e445c55d1ca0af0">+1/-1</a>      </td> </tr> <tr> <td>README.md</td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-2e9751b437fa61442aac074c7a4a912d0ac50ac3ea156ac8aedd8478d21c6bdb">+2/-2</a>      </td> </tr> <tr> <td>monitoring.proto</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9faf6025eb0d3d38383f5b7ad2b733abeb38454d5e4de3e83994e94b12d87a50">+2/-26</a>    </td> </tr> <tr> <td>Cargo.toml</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-fcf0c672917b64a5b953a914af013f16dddd6a1d813810236364e32f1ae70382">+0/-3</a>      </td> </tr> <tr> <td>README.md</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-643d2c3959322902c5bc9a22666b1e9ef71fa0bb87c9451b0e4147a4d5b51987">+8/-8</a>      </td> </tr> <tr> <td>zen-consumer-with-otel.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-68375f1f7847e1fbdf75664f6be65b1ad94ae6ce86ed73fc5964d65054668acb">+14/-11</a>  </td> </tr> <tr> <td>zen-consumer.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4d308af9802a93a0f656e8c02a3b5fcd8991407bb18360f087470db74e1f9524">+14/-11</a>  </td> </tr> <tr> <td>config.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-2423ef78d36e905ae993b69ff59f5df6b2e1b9492fb0fa8c6d0aad7c76d2d229">+4/-4</a>      </td> </tr> <tr> <td>config.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-ef778d85ac6f9652c25cb0d631f0fe8dfb3edac4dde5d719a4fc2926fb5c3216">+4/-4</a>      </td> </tr> <tr> <td>BUILD.bazel</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c62c0139ebdb337369f4067567cd2c52b8e7decb3ddfabc77f9f67b2f6e5789c">+1/-0</a>      </td> </tr> <tr> <td>README.md</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0b0725713b87dca1de57200214a4fe04633f0d856c39aa8032280227bf8e8141">+3/-3</a>      </td> </tr> <tr> <td>README.md</td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-f425b4378f84e0ba0c6f532facff17ff5d55b4dc6033d8bf35130a159cd2ba32">+9/-12</a>    </td> </tr> <tr> <td>flowgger.toml</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-af9f49f931e282dca53d1f0521b036d222fe671f77e61a876a84cf4c6d7cca4d">+2/-1</a>      </td> </tr> <tr> <td>otel.toml</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c64b9ace832b8ea57a2be62f84166e03bb1904882635d444ec76a880cdf14cc0">+3/-1</a>      </td> </tr> <tr> <td>otel.toml.example</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c1889866f35f98cdba9cd229fc119273c5fa5fca501451db23813b575f6fec66">+5/-2</a>      </td> </tr> <tr> <td>BUILD.bazel</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e1f7c698e0e3a4e6afa971c1140e71cbf22593fbb19c81cb26b02c15c5dc46ec">+0/-25</a>    </td> </tr> <tr> <td>config.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9edc2486fff55fc399e0ac96dba5137948a7ea7285f5ef7846835355684b7ab5">+0/-111</a>  </td> </tr> <tr> <td>main.go</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4b8ec845da50cd58d011e69f9d1c30530ee1968df26616b8768bb1fc03433bbe">+0/-138</a>  </td> </tr> <tr> <td>BUILD.bazel</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4f5d2ea4260d490a0d6f28adde0b35eca8af77d22f3ee366a783946c53687619">+0/-25</a>    </td> </tr> <tr> <td>config.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-bcac20d6b3cb81f0059e766839ba1ee59a885009249501b0ba1182ebb1daea25">+0/-77</a>    </td> </tr> <tr> <td>main.go</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-78dc6bc53f1c760c66f43ff5f486bfe78a65bee8b2e0d4862293ec0892da2b29">+0/-123</a>  </td> </tr> <tr> <td>README.md</td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9c32ee8446458b6fd2ae7fee52016f4b707a59978b67888cd5bee2804d934528">+3/-3</a>      </td> </tr> <tr> <td>docker-compose.elx.yml</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9562070d7ad4a3e9b2d06567008cf35de1d96448d914b3b45bf6c36d97cdd914">+117/-0</a>  </td> </tr> <tr> <td>docker-compose.spiffe.yml</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-603fd9e7d40841d174f26b95d0cb0c9537430bf3f7a5da3ccbba4ea3d8ac66c9">+8/-158</a>  </td> </tr> <tr> <td>docker-compose.yml</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e45e45baeda1c1e73482975a664062aa56f20c03dd9d64a827aba57775bed0d3">+316/-269</a></td> </tr> <tr> <td>Dockerfile.agent-gateway</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-332bc81a932ae08efa711a71b60fe0954d99bf17ebdab00a3baaa177a44de8b0">+94/-0</a>    </td> </tr> <tr> <td>Dockerfile.core-elx</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-5ec7a971285669999af442a0c7f141c34f7fd9180257307f5c4ed12f789a2182">+108/-0</a>  </td> </tr> <tr> <td>Dockerfile.poller</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d3ba129830fb366bfe23b00db4ef6218b10fc981d3c04842b1b3b3b367a8982f">+0/-70</a>    </td> </tr> <tr> <td>Dockerfile.sync</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0227933b9961fd553af1d229e89d71a0271fdc475081bbcef49b587941af1eda">+0/-95</a>    </td> </tr> <tr> <td>Dockerfile.tools</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0258db71e4070e342198965f1d046f3097640850b037df8a2287a7e239630add">+1/-2</a>      </td> </tr> <tr> <td>Dockerfile.web-ng</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-92d43af1965575d56c3380ecc8a81024aac2ff36f039ec2d3839e9fc7852bc10">+6/-0</a>      </td> </tr> <tr> 
<td>agent-minimal.docker.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-1f09fad94636c90373af8e270f6ba0332ae4f4d1df50a4909729280a3a9691e6">+6/-6</a>      </td> </tr> <tr> <td>agent.docker.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-5d33fe703515d03076d31261ecf946e9c6fc668cf5bf65099d49b670739e455e">+5/-20</a>    </td> </tr> <tr> <td>agent.mtls.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-008f2216f159a9bd5db9cc90baaf6f1e64487df7af05b56ab3b9d6c4946aa95f">+7/-10</a>    </td> </tr> <tr> <td>bootstrap-nested-spire.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-ab4746a08fb1e0b307a1e47660cd22182e283a087cba87dcbff0fdfe750f44f1">+0/-80</a>    </td> </tr> <tr> <td>.gitkeep</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d72c41aab2d6f2c230a4340dfefe7917cdd12bed942c825aa0d4c9875a637bac">+1/-0</a>      </td> </tr> <tr> <td>datasvc.docker.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-3f2719d3dbfe042e8383739e3c78e74e5f851a44e5e46bea8e79c4b79fdcc34f">+3/-2</a>      </td> </tr> <tr> <td>datasvc.mtls.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-3a45619e57f1e6e9a31486ec7fffb33ef246e271f82bac272ee0a946b88da70a">+14/-1</a>    </td> </tr> <tr> <td>db-event-writer.docker.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9fc51271f7ef5bb460160013e24e44e829b730656891d26fc49d5fe72fbb3147">+15/-11</a>  </td> </tr> <tr> <td>db-event-writer.mtls.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-7a33f95f7545499abf0ed9fc91b58499ab209639e4885019579c959583fc7496">+10/-8</a>    </td> </tr> <tr> <td>FRICTION_POINTS.md</td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-b0653c58880f810ba832c0500733d63de309db98b43009fe73a1862494cf41bd">+0/-355</a>  </td> </tr> <tr> <td>README.md</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-31849f033cfc932acee35f549c069abb1f36101c352e553dd6bff8713b29f98c">+0/-207</a>  </td> </tr> <tr> <td>SETUP_GUIDE.md</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-b4914f8640a78038e45f51235a624535672680dc902de5f107fc051f4f281913">+0/-307</a>  </td> </tr> <tr> <td>docker-compose.edge-e2e.yml</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-575d19ea771bdf8102cb9729db43a1bfd6afc2527160e54105beeac2e314f362">+0/-27</a>    </td> </tr> <tr> <td>manage-packages.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-3c2ff6febbddb956c71557894adaf7d0a39a1f20dda120fe126364946bc47280">+0/-211</a>  </td> </tr> <tr> <td>setup-edge-e2e.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-2714e2c7e111f69ea9e9f5ddd7f6a70fa5ea96e3a53b851cb13b8b8b7cd12917">+0/-198</a>  </td> </tr> <tr> <td>edge-poller-restart.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-96a8fe52c38fd0d7c14895127df34a27be311cac89c53d28ee178661b629bd22">+0/-178</a>  </td> </tr> <tr> <td>downstream-agent.conf</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-747de0375ced42af978ca7dac239862bdabb7f6bd0bd634f134b485517a7b4ee">+0/-32</a>    </td> </tr> <tr> <td>env</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-686f1a954c542f2ec9bf14c3170648b65190ad242c7f3a95a0f872ae41b8b1c6">+0/-4</a>      </td> </tr> <tr> <td>server.conf</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-025f5b5ab79526cf549ca1fdb90dd659ba76b438f05a7f77d916d18728c4b572">+0/-51</a>    </td> </tr> <tr> <td>upstream-agent.conf</td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e8a869ddf4affa31536a8d4e4e6f09c40072a7026da2c609d93c6ecf04138902">+0/-32</a>    </td> </tr> <tr> <td>entrypoint-certs.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-83d6800b184a5233c66c69766286b0a60fece1bc64addb112d9f8dc019437f05">+13/-9</a>    </td> </tr> <tr> <td>entrypoint-poller.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e202d27e3331088745eb55cdd2b3e40ac3f5df109d9ff5c76c0faed60772807a">+0/-274</a>  </td> </tr> <tr> <td>entrypoint-sync.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9d5620b8e6833309dbafb8ee6b6b75c3b942d163c3fe7f1a9827958b2d640265">+0/-96</a>    </td> </tr> <tr> <td>fix-cert-permissions.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-17ea40a11edcaa7c85bb4215fda46b5a32505246fef0ab5f3ed47b28470c5ec8">+2/-2</a>      </td> </tr> <tr> <td>flowgger.docker.toml</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-824f8797b418d4b9f5ea41e4a3741a0ed64b881f343072464489a76b7ea01008">+3/-2</a>      </td> </tr> <tr> <td>generate-certs.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-8298241543b4744a6ac7780c760ac5b5a0a87ba62de19c8612ebe1aba0996ebd">+214/-12</a></td> </tr> <tr> <td>nats.docker.conf</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-06f2012494f428fe1bfb304972061c2094e0d99da88ba9af6914f7776872e6eb">+16/-160</a></td> </tr> <tr> <td>netflow-consumer.mtls.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-f15920e8498a24f71ce3eec4f48fe8fefbb1765a90362998af779a660fcef9e1">+1/-0</a>      </td> </tr> <tr> <td>otel.docker.toml</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d4af38790e3657b7589cd37a7539d5308b032f11caba7aa740ddc86bf99f4415">+7/-2</a>      </td> </tr> <tr> 
<td>pg_hba.conf</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-7bd5f7292054916c7e5997f4c84ac9ec07d4c945621a48936c2aed0575fb96eb">+9/-0</a>      </td> </tr> <tr> <td>pg_ident.conf</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-e7b8ce062e32c61fdc3bcc9e525c1f1df1c8008fbc02b11409e58c67baa17cc5">+17/-0</a>    </td> </tr> <tr> <td>poller-stack.compose.yml</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-f3b5c991c2c1f7646db0ca4ed9bcb5df0f313ce6a05d8f3c890f80c873f776f5">+0/-121</a>  </td> </tr> <tr> <td>poller.docker.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d64ebb69ec31e831efd187c47a5bfff2573960306b177f6464e91cb44a3c709d">+0/-128</a>  </td> </tr> <tr> <td>poller.mtls.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-ef5d74bb3607431245c2bf06169d7fee89cae817e114035075b59a671229ab46">+0/-135</a>  </td> </tr> <tr> <td>poller.spiffe.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4e04bd23a0216287d5c0bb3831e0f95e7922ed03e8386a10ae7f4873e4fdb538">+0/-55</a>    </td> </tr> <tr> <td>refresh-upstream-credentials.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d3b3a8fcdea1b49c9e1c0ecc12d61fb6d416313520e8ad52edbee9094dbdc271">+0/-248</a>  </td> </tr> <tr> <td>seed-poller-kv.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c12070f475dbe7dc83e747fa6ec9d2ebdbdd97921a54f372abc89a102b783ad7">+0/-83</a>    </td> </tr> <tr> <td>setup-edge-poller.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-d7aec89d87f4cc98f4d6935e49a8f6ce571bc6dda254d894e93b60922f3a775f">+0/-204</a>  </td> </tr> <tr> <td>README.md</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-0cb49b4e37a7692f026133d5de971d449f42a1068226e848da5adf9af0ff4a2e">+5/-5</a>      
</td> </tr> <tr> <td>bootstrap-compose-spire.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-ca219a124d4c95ee7995764d7e0c322b4bfe59e357b7bcb42bc5d7c8b9b0af0d">+0/-2</a>      </td> </tr> <tr> <td>ssl_dist.core.conf</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-08d49d8621b581d1a9aa5c456f61e8c5774e021083c982cbb514019f915a1701">+17/-0</a>    </td> </tr> <tr> <td>ssl_dist.gateway.conf</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4a43a8290d45ac68592000e7ef51afe78b4213090155bd42aafb46e66130f7ae">+17/-0</a>    </td> </tr> <tr> <td>ssl_dist.web.conf</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-cef5be462ddb059fdfdeb9fd7c5cd70e656c4cd8b6ae1fe3fe312557b3da80ac">+17/-0</a>    </td> </tr> <tr> <td>sync.docker.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-4237fcee4f33a230abf28e12e8d4823499d163759cd1ff124fec1c62faa8b8b4">+0/-71</a>    </td> </tr> <tr> <td>sync.mtls.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-c652c07f7127be5b2932d92e6ef4c7448c544d1f3095cb96a03294fa58fd3c4c">+0/-75</a>    </td> </tr> <tr> <td>sysmon-osx.checker.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-044334b566d907c77656b7f951092709da2a111dc968da9a76315b1c71200cf4">+1/-1</a>      </td> </tr> <tr> <td>tools-profile.sh</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-f47597e2f5d4d085d8bf109109608f8ec0b7db8e90545e869b9ae409b607a4ac">+1/-2</a>      </td> </tr> <tr> <td>trapd.docker.json</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-1ab1a0e03e63bc02e0ef31992a7187a377927272ed2060150b40d44cc0ea3357">+3/-2</a>      </td> </tr> <tr> <td>update-config.sh</td> <td><a 
href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-9ae50be83a13010a038389c74407ba1bde8cabcea0944e238c4b3374133f78bf">+1/-190</a>  </td> </tr> <tr> <td>Additional files not shown</td> <td><a href="https://github.com/carverauto/serviceradar/pull/2243/files#diff-2f328e4cd8dbe3ad193e49d92bcf045f47a6b72b1e9487d366f6b8288589b4ca"></a></td> </tr> </table></details></td></tr></tbody></table> </details> ___

qodo-code-review[bot] commented

2026-01-11 05:43:17 +00:00

(Migrated from github.com)

Author

Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2243#issuecomment-3734057887
Original created: 2026-01-11T05:43:17Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
⚪	Arbitrary file write Description: Environment-variable overrides (`NATS_OPERATOR_CONFIG_PATH`, `NATS_RESOLVER_PATH`) can direct the service to write operator/resolver config files to arbitrary filesystem locations via `WriteOperatorConfig()`, which is a potential arbitrary file-write risk if an attacker can influence the process environment or config. main.go [107-129] Referred Code operatorConfigPath := cfg.NATSOperator.OperatorConfigPath if envPath := os.Getenv("NATS_OPERATOR_CONFIG_PATH"); envPath != "" { operatorConfigPath = envPath } resolverPath := cfg.NATSOperator.ResolverPath if envPath := os.Getenv("NATS_RESOLVER_PATH"); envPath != "" { resolverPath = envPath } if operatorConfigPath != "" || resolverPath != "" { natsAccountServer.SetResolverPaths(operatorConfigPath, resolverPath) log.Printf("NATS resolver paths configured: operator=%s resolver=%s", operatorConfigPath, resolverPath) // If operator is already initialized, write the config now // This ensures config files exist even when datasvc restarts with an existing operator if operator != nil { if err := natsAccountServer.WriteOperatorConfig(); err != nil { log.Printf("Warning: failed to write initial operator config: %v", err) } else { log.Printf("Wrote initial operator config to %s", operatorConfigPath) ... (clipped 2 lines)
	Sensitive path logging Description: Debug logging prints the configured NATS credentials file path (`nats.creds_file`), which can expose sensitive filesystem layout/secret locations to anyone with access to logs. setup.rs [48-52] Referred Code `nats.url, nats.subject, nats.stream ); debug!("NATS timeout: {:?}", nats.timeout); debug!("NATS creds file: {:?}", nats.creds_file); debug!("NATS TLS cert: {:?}", nats.tls_cert);`
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
🔴	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Potential sensitive logs: New `info!` logs print `req.details` verbatim, which may contain sensitive information and is not redacted or structured for safe auditing. Referred Code info!( "Received GetStatus: service_name={}, service_type={}, agent_id={}, gateway_id={}, details={}", req.service_name, req.service_type, req.agent_id, req.gateway_id, req.details ); debug!("Processing GetStatus request"); let start_time = std::time::Instant::now(); let collector = self.collector.read().await; let metrics = collector .get_latest_metrics() .await .ok_or_else(|| Status::unavailable("No metrics available yet"))?; // metrics is MetricSample debug!("Returning metrics with timestamp {}", metrics.timestamp); // Create the outer JSON object, embedding the metrics struct directly let outer_data = serde_json::json!({ "status": metrics, // Embed the MetricSample struct directly. It's already Serialize. "response_time": start_time.elapsed().as_nanos() as i64, "available": true // This 'available' is part of the JSON payload in StatusResponse.message }); ... (clipped 34 lines) Learn more about managing compliance generic rules or creating your own custom rules
⚪	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: Audit coverage unclear: The diff shows some operational logging (agent enrollment/config/status), but it is not verifiable from the provided hunks whether all critical security/data actions across the refactor have complete audit logging with consistent actor context. Referred Code `@doc """ Handle an agent hello/enrollment request. Called by the agent on startup to announce itself and register with the gateway. Validates the mTLS certificate, extracts tenant identity, and registers the agent. """ @spec hello(Monitoring.AgentHelloRequest.t(), GRPC.Server.Stream.t()) :: Monitoring.AgentHelloResponse.t() def hello(request, stream) do agent_id = case request.agent_id do nil -> "" value -> value |> to_string() |> String.trim() end if agent_id == "" do ... (clipped 247 lines)` Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: AuthZ/validation scope: The gateway server includes strong validation and mTLS-based tenant resolution, but full verification of end-to-end authorization and secure handling across newly added/modified files is not possible from the partial diff provided. Referred Code Tenant identity is extracted from the mTLS client certificate using `ServiceRadar.Edge.TenantResolver`. The certificate contains: - CN: `<component_id>.<partition_id>.<tenant_slug>.serviceradar` - SPIFFE URI SAN: `spiffe://serviceradar.local/<component_type>/<tenant_slug>/<partition_id>/<component_id>` The issuer CA SPKI hash is also verified against stored tenant CA records. This ensures tenants cannot impersonate each other. ## Protocol The server implements the AgentGatewayService: - `PushStatus`: Receives a batch of service statuses from an agent - `StreamStatus`: Receives streaming chunks of service statuses ## Usage The server is started automatically by the application supervisor. Incoming status updates are forwarded to the core cluster for processing. """ ... (clipped 892 lines) Learn more about managing compliance generic rules or creating your own custom rules
Update

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
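
One way to narrow the arbitrary file write finding above is to confine any environment override to an expected directory before anything is written. The following is a minimal Go sketch under stated assumptions, not the project's code: `confinePath`, `resolveOverride`, `ErrOutsideBase`, and the `/etc/nats` base directory are hypothetical names chosen for illustration. The check is purely lexical and does not resolve symlinks.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// ErrOutsideBase is a hypothetical sentinel returned when a path escapes baseDir.
var ErrOutsideBase = errors.New("path escapes allowed base directory")

// confinePath makes candidate absolute and clean, then rejects it if the
// result is not located under baseDir. Symlinks are NOT resolved here.
func confinePath(baseDir, candidate string) (string, error) {
	base, err := filepath.Abs(baseDir)
	if err != nil {
		return "", err
	}
	abs, err := filepath.Abs(candidate)
	if err != nil {
		return "", err
	}
	rel, err := filepath.Rel(base, abs)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("%w: %s", ErrOutsideBase, candidate)
	}
	return abs, nil
}

// resolveOverride mirrors the env-over-config precedence shown in the
// referred code, but validates the winning value before returning it.
func resolveOverride(envName, configured, baseDir string) (string, error) {
	path := configured
	if v := os.Getenv(envName); v != "" {
		path = v
	}
	if path == "" {
		return "", nil // nothing configured, nothing to write
	}
	return confinePath(baseDir, path)
}

func main() {
	const base = "/etc/nats" // assumed base directory, for illustration only
	for _, p := range []string{"/etc/nats/operator.conf", "/etc/nats/../passwd"} {
		resolved, err := confinePath(base, p)
		fmt.Printf("%q -> %q (err: %v)\n", p, resolved, err)
	}
	if p, err := resolveOverride("NATS_OPERATOR_CONFIG_PATH", "/etc/nats/operator.conf", base); err == nil {
		fmt.Println("operator config path:", p)
	}
}
```

Whether a fixed base directory suits every deployment is a judgment call for the maintainers; the sketch only shows where such a check could sit relative to the override logic.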


qodo-code-review[bot] commented

2026-01-11 05:44:34 +00:00

(Migrated from github.com)

Author

Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2243#issuecomment-3734058682
Original created: 2026-01-11T05:44:34Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix race condition in shutdown Refactor the shutdown goroutine to prevent a race condition by ensuring the push loop is fully stopped before stopping the server. cmd/agent/main.go [207-223] go func() { defer close(shutdownDone) - cancel() + cancel() // Signal the push loop to stop. + + // Wait for the push loop to finish. It will send an error (or nil) on errChan when it exits. + // This also handles any final work pushLoop.Stop might do. if err := pushLoop.Stop(shutdownCtx); err != nil { log.Warn().Err(err).Msg("Push loop stop did not complete before timeout") } - select { - case <-errChan: - case <-shutdownCtx.Done(): - return - } + // Wait for the push loop goroutine to exit. + <-errChan + + // Now that the push loop is stopped, stop the server. if err := server.Stop(shutdownCtx); err != nil { log.Error().Err(err).Msg("Error stopping agent services") } }() Apply / Chat Suggestion importance[1-10]: 9 __ Why: The suggestion correctly identifies a race condition in the shutdown logic that could prevent services from being stopped correctly, and the proposed fix resolves this critical bug.	High
	Use correct max_by for deduplication Correct the implementation of `deduplicate_by_device_id/1` by replacing the incorrect usage of `Enum.max_by/3` with `Enum.max_by/2`. Convert timestamps to a comparable integer value to correctly find the most recent update. elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex [513-525] `defp deduplicate_by_device_id(updates) do updates |> Enum.group_by(& &1.device_id) |> Enum.map(fn {_device_id, grouped} -> - # Keep the update with the latest timestamp, handling nil timestamps - Enum.max_by(grouped, & &1.timestamp, fn - nil, nil -> :eq - nil, _ -> :lt - _, nil -> :gt - a, b -> DateTime.compare(a, b) + Enum.max_by(grouped, fn update -> + case update.timestamp do + %DateTime{} = ts -> DateTime.to_unix(ts, :millisecond) + _ -> 0 + end end) end) end` `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 9 __ Why: The suggestion correctly identifies a bug in the usage of `Enum.max_by/3`, which would not sort as intended. The proposed fix using `Enum.max_by/2` with a transformation function is correct and resolves the bug, ensuring the latest update is always selected.	High
	Ensure notifications are sent transactionally Refactor the alert notification logic to use an `Ash.Notifier`. This ensures that webhook notifications are sent only after the database transaction to create the alert has successfully committed, preventing notifications for rolled-back transactions. elixir/serviceradar_core/lib/serviceradar/monitoring/alert_generator.ex [413-436] defp create_alert_and_notify(attrs, opts) do tenant_id = Map.get(attrs, :tenant_id) || Keyword.get(opts, :tenant_id) tenant_schema = Keyword.get(opts, :tenant) || tenant_schema_for(tenant_id) actor = Keyword.get(opts, :actor) || system_actor(tenant_id) # Create the alert in the database if is_nil(tenant_schema) do Logger.warning("Skipping alert creation; tenant schema missing", tenant_id: tenant_id) {:error, :missing_tenant_schema} else + # The webhook notification should be handled by an Ash.Notifier + # attached to the Alert resource to ensure it only fires on successful + # transaction commit. This avoids sending notifications for rolled-back transactions. case Alert |> Ash.Changeset.for_create(:trigger, attrs, actor: actor, tenant: tenant_schema) |> Ash.create() do {:ok, alert} -> - # Also send webhook notification - send_webhook_notification(alert, opts) + # The webhook notification should be triggered via an Ash Notifier + # to ensure it's only sent after the transaction commits. + # For example, if a `WebhookNotifier` is configured in your domain: + # ServiceRadar.Monitoring.WebhookNotifier.notify() {:ok, alert} {:error, error} -> Logger.error("Failed to create alert: #{inspect(error)}") {:error, error} end end end `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 8 __ Why: This suggestion correctly identifies a data consistency issue where a notification could be sent for a database record that was never committed. The proposed solution to use an `Ash.Notifier` is the correct, framework-idiomatic way to ensure transactional integrity for side effects.	Medium
	Include tenant identifiers in CA data Update the return map of `generate_tenant_ca/2` to include `tenant_id` and `tenant_slug`. This is required by the downstream `generate_component_cert/5` function and will prevent a runtime error. elixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex [106-116] `{:ok, %{ + tenant_id: tenant_id, + tenant_slug: tenant.slug, certificate_pem: cert_pem, private_key_pem: key_pem, spki_sha256: spki_sha256, serial_number: serial_to_hex(serial), not_before: not_before, not_after: not_after, subject_cn: subject_cn }}` `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 8 __ Why: The suggestion correctly identifies that `generate_component_cert/5` requires `tenant_id` from the `tenant_ca` map, which is missing from the return value of `generate_tenant_ca/2`. Adding it prevents a runtime error and fixes a clear bug in the data flow between functions.	Medium
	Improve fault tolerance in RPC calls Modify `core_call` to continue iterating through available core nodes on application-level errors, rather than halting immediately, to improve fault tolerance. elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex [856-872] `defp core_call(module, function, args, timeout \\ 5_000) do nodes = core_nodes() if nodes == [] do {:error, :core_unavailable} else Enum.reduce_while(nodes, {:error, :core_unavailable}, fn node, _acc -> case :rpc.call(node, module, function, args, timeout) do {:badrpc, _} -> {:cont, {:error, :core_unavailable}} + {:ok, {:ok, _}} = result -> + {:halt, result} + + {:ok, {:error, _}} = error_result -> + {:cont, error_result} + result -> {:halt, {:ok, result}} end end) end end` `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 7 __ Why: The suggestion correctly points out a potential issue where `core_call` might fail prematurely if the first node returns an application-level error, without trying other nodes. The proposed change improves fault tolerance by continuing to the next node unless a definitive success is returned.	Medium
	Handle both string and atom keys Update `get_nested_value` to handle both string and atom keys for nested map lookups to improve robustness against varying key types. elixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex [739-745] `defp get_nested_value(map, key) when is_map(map) and is_binary(key) do key |> String.split(".") |> Enum.reduce(map, fn segment, acc -> - if is_map(acc), do: Map.get(acc, segment), else: nil + if is_map(acc) do + Map.get(acc, segment) || Map.get(acc, String.to_atom(segment)) + else + nil + end end) end` `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that `get_nested_value` only handles string keys, while other parts of the code handle both string and atom keys. The proposed change makes the matching logic more robust and consistent with the rest of the file.	Low
Security	Prevent potential atom exhaustion vulnerability To prevent a potential atom exhaustion Denial of Service (DoS) vulnerability, replace the use of `String.to_existing_atom/1` with a lookup in a string-keyed map for capability definitions. elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex [70-76] `+@string_capability_definitions Map.new(@capability_definitions, fn {k, v} -> {to_string(k), v} end) + def capability_info(capability) when is_binary(capability) do - try do - capability_info(String.to_existing_atom(capability)) - rescue - ArgumentError -> %{icon: "hero-cube", color: "ghost", description: capability} - end + Map.get(@string_capability_definitions, capability, %{icon: "hero-cube", color: "ghost", description: capability}) end` `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 9 __ Why: The suggestion correctly identifies a significant security vulnerability (atom exhaustion DoS) and provides a robust, idiomatic fix by using a string-keyed map instead of converting external input to atoms.	High
General	Throttle rule reload on error In `load_rules`, update `rules_loaded_at` even on failure to prevent the engine from retrying the load on every evaluation and flooding logs. elixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex [154-167] `defp load_rules(state) do rules = StatefulAlertRule |> Ash.Query.for_read(:active, %{}, tenant: state.schema) |> Ash.read(authorize?: false) |> unwrap_page() updated = %{state | rules: rules, rules_loaded_at: System.monotonic_time(:millisecond)} {updated, rules} rescue error -> Logger.warning("Failed to load stateful alert rules: #{inspect(error)}") - {state, []} + updated = %{state | rules_loaded_at: System.monotonic_time(:millisecond)} + {updated, []} end` `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies that a failure to load rules will cause a retry on every subsequent evaluation, potentially flooding logs. Updating `rules_loaded_at` on failure introduces a cooldown period, which is a sensible improvement for error handling and system stability.	Medium
General	Avoid using internal process dictionary Replace the use of the internal `:"$ancestors"` process dictionary key with the public `Supervisor.which_children/1` API to find a process's parent supervisor. This makes the implementation more robust and less likely to break with future OTP updates. elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex [274-302] defp get_parent_supervisor(pid) do - # Get all PIDs that are children of our DynamicSupervisor - child_pids = - DynamicSupervisor.which_children(__MODULE__) - |> Enum.map(fn {_, child_pid, _, _} -> child_pid end) - |> MapSet.new() + # Find the TenantSupervisor that has the given pid as a child. + # This is more robust than relying on the internal :"$ancestors" process dictionary key. + DynamicSupervisor.which_children(__MODULE__) + |> Enum.find_value(fn {_, tenant_sup_pid, _, _} -> + child_pids = + Supervisor.which_children(tenant_sup_pid) + |> Enum.map(fn {_, child_pid, _, _} -> child_pid end) - # Get ancestors of the registry process and find which one is our child - case Process.info(pid, :dictionary) do - {:dictionary, dict} -> - ancestors = Keyword.get(dict, :"$ancestors", []) - - # Find the ancestor that is a direct child of our DynamicSupervisor - Enum.find_value(ancestors, fn - ancestor when is_pid(ancestor) -> - if MapSet.member?(child_pids, ancestor), do: ancestor, else: nil - - ancestor when is_atom(ancestor) -> - ancestor_pid = Process.whereis(ancestor) - if ancestor_pid && MapSet.member?(child_pids, ancestor_pid), do: ancestor_pid, else: nil - - _ -> - nil - end) - - _ -> - nil - end + if pid in child_pids, do: tenant_sup_pid, else: nil + end) end `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 7 __ Why: The suggestion correctly points out the fragility of relying on the undocumented `:"$ancestors"` process dictionary key and proposes a more robust solution using public APIs, which improves code maintainability and resilience to OTP updates.	Medium
Update

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2243#issuecomment-3734058682
Original created: 2026-01-11T05:44:34Z

---

## PR Code Suggestions ✨

Explore these optional code suggestions:

<table><thead><tr><td>Category</td><td align=left>Suggestion</td><td align=center>Impact</td></tr></thead><tbody><tr><td rowspan=6>Possible issue</td>
<td>

<details><summary>Fix race condition in shutdown</summary>

___

**Refactor the shutdown goroutine to prevent a race condition by ensuring the push loop is fully stopped before stopping the server.**

[cmd/agent/main.go [207-223]](https://github.com/carverauto/serviceradar/pull/2243/files#diff-61358711e980ccf505246fd3915f97cbd3a380e9b66f6fa5aad46749968c5ca3R207-R223)

```diff
 go func() {
     defer close(shutdownDone)
-    cancel()
+    cancel() // Signal the push loop to stop.
+
+    // Wait for the push loop to finish. It will send an error (or nil) on errChan when it exits.
+    // This also handles any final work pushLoop.Stop might do.
     if err := pushLoop.Stop(shutdownCtx); err != nil {
         log.Warn().Err(err).Msg("Push loop stop did not complete before timeout")
     }
-    select {
-    case <-errChan:
-    case <-shutdownCtx.Done():
-        return
-    }
+    // Wait for the push loop goroutine to exit.
+    <-errChan
+
+    // Now that the push loop is stopped, stop the server.
     if err := server.Stop(shutdownCtx); err != nil {
         log.Error().Err(err).Msg("Error stopping agent services")
     }
 }()
```

<details><summary>Suggestion importance[1-10]: 9</summary>

__

Why: The suggestion correctly identifies a race condition in the shutdown logic: when `shutdownCtx` expires first, the original goroutine returns without calling `server.Stop`, so services may never be stopped. The proposed fix makes `server.Stop` run on every shutdown. Note that the unconditional `<-errChan` receive no longer honors the shutdown timeout, so it blocks if the push loop never exits.

</details></details></td><td align=center>High

</td></tr><tr><td>

<details><summary>Use correct max_by for deduplication</summary>

___

**Correct the implementation of <code>deduplicate_by_device_id/1</code> by replacing the incorrect usage of <code>Enum.max_by/3</code> with <code>Enum.max_by/2</code>. Convert timestamps to a comparable integer value to correctly find the most recent update.**

[elixir/serviceradar_core/lib/serviceradar/identity/alias_events.ex [513-525]](https://github.com/carverauto/serviceradar/pull/2243/files#diff-bc3743067ea774f59bc5665770f7110a2d6e90f6e1156a7717a1c287f8979d28R513-R525)

```diff
 defp deduplicate_by_device_id(updates) do
   updates
   |> Enum.group_by(& &1.device_id)
   |> Enum.map(fn {_device_id, grouped} ->
-    # Keep the update with the latest timestamp, handling nil timestamps
-    Enum.max_by(grouped, & &1.timestamp, fn
-      nil, nil -> :eq
-      nil, _ -> :lt
-      _, nil -> :gt
-      a, b -> DateTime.compare(a, b)
+    Enum.max_by(grouped, fn update ->
+      case update.timestamp do
+        %DateTime{} = ts -> DateTime.to_unix(ts, :millisecond)
+        _ -> 0
+      end
     end)
   end)
 end
```

`[To ensure code accuracy, apply this suggestion manually]`

<details><summary>Suggestion importance[1-10]: 9</summary>

__

Why: The suggestion correctly identifies a bug in the usage of `Enum.max_by/3`. The sorter argument must return a boolean (or be a module implementing `compare/2`), but the original function returns `:lt`, `:eq`, or `:gt`, which are all truthy, so the latest update is not reliably selected. The proposed fix uses `Enum.max_by/2` with a key function that maps each timestamp to Unix milliseconds and treats a missing timestamp as 0, which selects the most recent update.

</details></details></td><td align=center>High

</td></tr><tr><td>

<details><summary>Ensure notifications are sent transactionally</summary>

___

**Refactor the alert notification logic to use an <code>Ash.Notifier</code>. This ensures that webhook notifications are sent only after the database transaction to create the alert has successfully committed, preventing notifications for rolled-back transactions.**

[elixir/serviceradar_core/lib/serviceradar/monitoring/alert_generator.ex [413-436]](https://github.com/carverauto/serviceradar/pull/2243/files#diff-62074160ac91002a439bab337a032329681bc55c84a59ab9934bc76d05a5de04R413-R436)

```diff
 defp create_alert_and_notify(attrs, opts) do
   tenant_id = Map.get(attrs, :tenant_id) || Keyword.get(opts, :tenant_id)
   tenant_schema = Keyword.get(opts, :tenant) || tenant_schema_for(tenant_id)
   actor = Keyword.get(opts, :actor) || system_actor(tenant_id)
   # Create the alert in the database
   if is_nil(tenant_schema) do
     Logger.warning("Skipping alert creation; tenant schema missing", tenant_id: tenant_id)
     {:error, :missing_tenant_schema}
   else
+    # The webhook notification should be handled by an Ash.Notifier
+    # attached to the Alert resource to ensure it only fires on successful
+    # transaction commit. This avoids sending notifications for rolled-back transactions.
     case Alert
          |> Ash.Changeset.for_create(:trigger, attrs, actor: actor, tenant: tenant_schema)
          |> Ash.create() do
       {:ok, alert} ->
-        # Also send webhook notification
-        send_webhook_notification(alert, opts)
+        # The webhook notification should be triggered via an Ash Notifier
+        # to ensure it's only sent after the transaction commits.
+        # For example, if a `WebhookNotifier` is configured in your domain:
+        # ServiceRadar.Monitoring.WebhookNotifier.notify()
         {:ok, alert}
       {:error, error} ->
         Logger.error("Failed to create alert: #{inspect(error)}")
         {:error, error}
     end
   end
 end
```

`[To ensure code accuracy, apply this suggestion manually]`

<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: This suggestion correctly identifies a data consistency issue where a notification could be sent for a database record that was never committed. The proposed solution to use an `Ash.Notifier` is the correct, framework-idiomatic way to ensure transactional integrity for side effects.

</details></details></td><td align=center>Medium

</td></tr><tr><td>

<details><summary>Include tenant identifiers in CA data</summary>

___

**Update the return map of <code>generate_tenant_ca/2</code> to include <code>tenant_id</code> and <code>tenant_slug</code>. This is required by the downstream <code>generate_component_cert/5</code> function and will prevent a runtime error.**

[elixir/serviceradar_core/lib/serviceradar/edge/tenant_ca/generator.ex [106-116]](https://github.com/carverauto/serviceradar/pull/2243/files#diff-b48e4a9e1189da61e2a60e16f56fce81298d76b7cdab745107140fed3f6e48b4R106-R116)

```diff
 {:ok,
  %{
+   tenant_id: tenant_id,
+   tenant_slug: tenant.slug,
    certificate_pem: cert_pem,
    private_key_pem: key_pem,
    spki_sha256: spki_sha256,
    serial_number: serial_to_hex(serial),
    not_before: not_before,
    not_after: not_after,
    subject_cn: subject_cn
  }}
```

`[To ensure code accuracy, apply this suggestion manually]`

<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: The suggestion correctly identifies that `generate_component_cert/5` requires `tenant_id` from the `tenant_ca` map, which is missing from the return value of `generate_tenant_ca/2`. Adding it prevents a runtime error and fixes a clear bug in the data flow between functions.

</details></details></td><td align=center>Medium

</td></tr><tr><td>

<details><summary>Improve fault tolerance in RPC calls</summary>

___

**Modify <code>core_call</code> to continue iterating through available core nodes on application-level errors, rather than halting immediately, to improve fault tolerance.**

[elixir/serviceradar_agent_gateway/lib/serviceradar_agent_gateway/agent_gateway_server.ex [856-872]](https://github.com/carverauto/serviceradar/pull/2243/files#diff-369a368073dc8ec1140bcea699005a1ce97a90cd59629df0bd18c71c7ffaae9fR856-R872)

```diff
 defp core_call(module, function, args, timeout \\ 5_000) do
   nodes = core_nodes()
   if nodes == [] do
     {:error, :core_unavailable}
   else
     Enum.reduce_while(nodes, {:error, :core_unavailable}, fn node, _acc ->
       case :rpc.call(node, module, function, args, timeout) do
         {:badrpc, _} ->
           {:cont, {:error, :core_unavailable}}
+        {:ok, {:ok, _}} = result ->
+          {:halt, result}
+
+        {:ok, {:error, _}} = error_result ->
+          {:cont, error_result}
+
         result ->
           {:halt, {:ok, result}}
       end
     end)
   end
 end
```

`[To ensure code accuracy, apply this suggestion manually]`

<details><summary>Suggestion importance[1-10]: 7</summary>

__

Why: The suggestion correctly points out a potential issue where `core_call` might fail prematurely if the first node returns an application-level error, without trying other nodes. The proposed change improves fault tolerance by continuing to the next node unless a definitive success is returned.

</details></details></td><td align=center>Medium

</td></tr><tr><td>

<details><summary>Handle both string and atom keys</summary>

___

**Update <code>get_nested_value</code> to handle both string and atom keys for nested map lookups to improve robustness against varying key types.**

[elixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex [739-745]](https://github.com/carverauto/serviceradar/pull/2243/files#diff-bae3a52db882de8c947e62f219a95dff8db4e155e37d9a361dbe14ec25fcd3bdR739-R745)

```diff
 defp get_nested_value(map, key) when is_map(map) and is_binary(key) do
   key
   |> String.split(".")
   |> Enum.reduce(map, fn segment, acc ->
-    if is_map(acc), do: Map.get(acc, segment), else: nil
+    if is_map(acc) do
+      Map.get(acc, segment) || Map.get(acc, String.to_atom(segment))
+    else
+      nil
+    end
   end)
 end
```

`[To ensure code accuracy, apply this suggestion manually]`

<details><summary>Suggestion importance[1-10]: 6</summary>

__

Why: The suggestion correctly identifies that `get_nested_value` only handles string keys, while other parts of the code handle both string and atom keys. The proposed change makes the matching logic more robust and consistent with the rest of the file. Note that `String.to_atom/1` creates a new atom for every previously unseen segment, so this form is only safe when the key path does not come from untrusted input.

</details></details></td><td align=center>Low

</td></tr><tr><td rowspan=1>Security</td>
<td>

<details><summary>Look up capabilities without atom conversion</summary>

___

**Replace the <code>try</code>/<code>rescue</code> around <code>String.to_existing_atom/1</code> with a lookup in a string-keyed map of capability definitions, so that externally supplied capability names are never converted to atoms.**

[elixir/serviceradar_core/lib/serviceradar/infrastructure/agent.ex [70-76]](https://github.com/carverauto/serviceradar/pull/2243/files#diff-c56f92b6ce744cab3f2dc00dde92e2017cffdd12ad4618f7fa720252f2a6843aR70-R76)

```diff
+@string_capability_definitions Map.new(@capability_definitions, fn {k, v} -> {to_string(k), v} end)
+
 def capability_info(capability) when is_binary(capability) do
-  try do
-    capability_info(String.to_existing_atom(capability))
-  rescue
-    ArgumentError -> %{icon: "hero-cube", color: "ghost", description: capability}
-  end
+  Map.get(@string_capability_definitions, capability, %{icon: "hero-cube", color: "ghost", description: capability})
 end
```

`[To ensure code accuracy, apply this suggestion manually]`

<details><summary>Suggestion importance[1-10]: 9</summary>

__

Why: The suggestion was raised as an atom exhaustion Denial of Service (DoS) concern. `String.to_existing_atom/1` never creates new atoms, so the original code does not by itself allow atom exhaustion. The string-keyed map is still an idiomatic improvement: it removes exception-based control flow and resolves only capabilities that are actually defined.

</details></details></td><td align=center>High

</td></tr><tr><td rowspan=2>General</td>
<td>

<details><summary>Throttle rule reload on error</summary>

___

**In <code>load_rules</code>, update <code>rules_loaded_at</code> even on failure to prevent the engine from retrying the load on every evaluation and flooding logs.**

[elixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex [154-167]](https://github.com/carverauto/serviceradar/pull/2243/files#diff-bae3a52db882de8c947e62f219a95dff8db4e155e37d9a361dbe14ec25fcd3bdR154-R167)

```diff
 defp load_rules(state) do
   rules =
     StatefulAlertRule
     |> Ash.Query.for_read(:active, %{}, tenant: state.schema)
     |> Ash.read(authorize?: false)
     |> unwrap_page()
   updated = %{state | rules: rules, rules_loaded_at: System.monotonic_time(:millisecond)}
   {updated, rules}
 rescue
   error ->
     Logger.warning("Failed to load stateful alert rules: #{inspect(error)}")
-    {state, []}
+    updated = %{state | rules_loaded_at: System.monotonic_time(:millisecond)}
+    {updated, []}
 end
```

`[To ensure code accuracy, apply this suggestion manually]`

<details><summary>Suggestion importance[1-10]: 7</summary>

__

Why: The suggestion correctly identifies that a failure to load rules will cause a retry on every subsequent evaluation, potentially flooding logs. Updating `rules_loaded_at` on failure introduces a cooldown period, which is a sensible improvement for error handling and system stability.

</details></details></td><td align=center>Medium

</td></tr><tr><td>

<details><summary>Avoid using internal process dictionary</summary>

___

**Replace the use of the internal <code>:"$ancestors"</code> process dictionary key with the public <code>Supervisor.which_children/1</code> API to find a process's parent supervisor. This makes the implementation more robust and less likely to break with future OTP updates.**

[elixir/serviceradar_core/lib/serviceradar/cluster/tenant_registry.ex [274-302]](https://github.com/carverauto/serviceradar/pull/2243/files#diff-91248b3b128a2e3d9bea6ffdb5e0f295e4a1745e82f87687c640ad01416fb85dR274-R302)

```diff
 defp get_parent_supervisor(pid) do
-  # Get all PIDs that are children of our DynamicSupervisor
-  child_pids =
-    DynamicSupervisor.which_children(__MODULE__)
-    |> Enum.map(fn {_, child_pid, _, _} -> child_pid end)
-    |> MapSet.new()
+  # Find the TenantSupervisor that has the given pid as a child.
+  # This is more robust than relying on the internal :"$ancestors" process dictionary key.
+  DynamicSupervisor.which_children(__MODULE__)
+  |> Enum.find_value(fn {_, tenant_sup_pid, _, _} ->
+    child_pids =
+      Supervisor.which_children(tenant_sup_pid)
+      |> Enum.map(fn {_, child_pid, _, _} -> child_pid end)
-  # Get ancestors of the registry process and find which one is our child
-  case Process.info(pid, :dictionary) do
-    {:dictionary, dict} ->
-      ancestors = Keyword.get(dict, :"$ancestors", [])
-
-      # Find the ancestor that is a direct child of our DynamicSupervisor
-      Enum.find_value(ancestors, fn
-        ancestor when is_pid(ancestor) ->
-          if MapSet.member?(child_pids, ancestor), do: ancestor, else: nil
-
-        ancestor when is_atom(ancestor) ->
-          ancestor_pid = Process.whereis(ancestor)
-          if ancestor_pid && MapSet.member?(child_pids, ancestor_pid), do: ancestor_pid, else: nil
-
-        _ ->
-          nil
-      end)
-
-    _ ->
-      nil
-  end
+    if pid in child_pids, do: tenant_sup_pid, else: nil
+  end)
 end
```

`[To ensure code accuracy, apply this suggestion manually]`

<details><summary>Suggestion importance[1-10]: 7</summary>

__

Why: The suggestion correctly points out the fragility of relying on the undocumented `:"$ancestors"` process dictionary key and proposes a more robust solution using public APIs, which improves code maintainability and resilience to OTP updates.

</details></details></td><td align=center>Medium

</td></tr></tbody></table>