Chore/fix edge onboarding #2489

Merged
mfreeman451 merged 25 commits from refs/pull/2489/head into main 2025-11-30 18:53:54 +00:00
mfreeman451 commented 2025-11-29 03:33:20 +00:00 (Migrated from github.com)
Owner

Imported from GitHub pull request.

Original GitHub pull request: #2033
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2033
Original created: 2025-11-29T03:33:20Z
Original updated: 2025-11-30T18:54:10Z
Original head: carverauto/serviceradar:chore/fix_edge_onboarding
Original base: main
Original merged: 2025-11-30T18:53:54Z by @mfreeman451

User description

IMPORTANT: Please sign the Developer Certificate of Origin

Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:

Signed-off-by: J. Doe <j.doe@domain.com>

Describe your changes

Code checklist before requesting a review

  • I have signed the DCO?
  • The build completes without errors?
  • All tests are passing when running make test?

PR Type

Enhancement, Bug fix


Description

  • Embed SPIRE workload agent in sysmon-vm for standalone laptop deployments without shared sockets

  • Implement checker onboarding with join token/bundle delivery and SPIRE address resolution

  • Add service registry method to lookup agent poller IDs for edge onboarding fallback

  • Extend Helm SPIRE server trust to include Core and Datasvc service accounts

  • Enhance edge package UI with checker kind, config JSON, and parent agent poller auto-derivation


Diagram Walkthrough

flowchart LR
  A["sysmon-vm<br/>Checker"] -->|"onboarding token"| B["Edge Onboarding<br/>Service"]
  B -->|"fetch package"| C["Core API"]
  C -->|"join token +<br/>trust bundle"| B
  B -->|"start embedded<br/>SPIRE agent"| D["Embedded Agent<br/>Process"]
  D -->|"join via TCP"| E["Demo SPIRE<br/>Server"]
  E -->|"issue SVID"| D
  D -->|"expose workload<br/>socket"| A
  A -->|"bind with SVID"| F["Demo Poller"]

File Walkthrough

Relevant files
Enhancement
28 files
edge_onboarding.go
Add GetAgentPollerID method to ServiceManager interface   
+10/-0   
service_registry_adapter.go
Implement GetAgentPollerID adapter for agent poller lookup
+8/-0     
bootstrap.go
Add embeddedAgent field to Bootstrapper struct                     
+1/-0     
config.go
Generate sysmon-vm specific checker config with SPIRE settings
+53/-2   
deployment.go
Implement SPIRE address resolution and socket wait utilities
+53/-12 
spire.go
Configure checker SPIRE with embedded agent startup logic
+46/-8   
spire_agent.go
New file implementing embedded SPIRE agent process management
+170/-0 
entrypoint-sysmon-vm.sh
New entrypoint script for sysmon-vm Docker container         
+11/-0   
host-install-macos.sh
Install embedded SPIRE agent binary for macOS sysmon-vm   
+8/-0     
host-setup.sh
Download and install SPIRE agent for embedded onboarding flow
+50/-0   
_helpers.tpl
Add SPIRE socket host path helper template                             
+10/-0   
BUILD.bazel
Add sysmon-vm Docker image definitions for AMD64 and ARM64
+94/-0   
agent.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
core.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
datasvc.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
db-event-writer.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
faker.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
flowgger.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
mapper.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
otel.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
poller.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
rperf-checker.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
snmp-checker.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
spire-agent.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
sync.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
trapd.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
zen.yaml
Use configurable SPIRE socket host path instead of hardcoded path
+1/-1     
page.tsx
Add checker kind and config JSON fields to edge package UI
+189/-7 
Dependencies
2 files
MODULE.bazel
Add SPIRE Linux ARM64 binary dependency for multi-arch builds
+16/-0   
BUILD.bazel
Add SPIRE agent binary filegroups for AMD64 and ARM64       
+12/-0   
Documentation
7 files
README.md
Document embedded SPIRE agent installation for macOS         
+5/-1     
design.md
New design document for embedded SPIRE agent architecture
+31/-0   
proposal.md
New proposal for sysmon-vm embedded SPIRE agent feature   
+15/-0   
spec.md
New specification for sysmon-vm SPIFFE identity requirements
+20/-0   
tasks.md
New task list for embedded SPIRE agent implementation       
+14/-0   
proposal.md
New proposal for SPIRE admin trust stabilization                 
+14/-0   
tasks.md
New task list for SPIRE admin trust fixes                               
+8/-0     
Configuration changes
3 files
serviceradar-config.yaml
Add checker metadata with SPIRE upstream address configuration
+5/-0     
values.yaml
Update app tag and add socketHostPath configuration option
+3/-2     
configmap.yaml
Add checker metadata with SPIRE upstream address configuration
+5/-0     
Bug fix
1 files
spire-server.yaml
Add Core and Datasvc service accounts to SPIRE trust allow-list
+2/-0     

Imported from GitHub pull request. Original GitHub pull request: #2033 Original author: @mfreeman451 Original URL: https://github.com/carverauto/serviceradar/pull/2033 Original created: 2025-11-29T03:33:20Z Original updated: 2025-11-30T18:54:10Z Original head: carverauto/serviceradar:chore/fix_edge_onboarding Original base: main Original merged: 2025-11-30T18:53:54Z by @mfreeman451 --- ### **User description** ## IMPORTANT: Please sign the Developer Certificate of Origin Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include a [DCO sign-off statement]( https://developercertificate.org/) indicating the DCO acceptance in one commit message. Here is an example DCO Signed-off-by line in a commit message: ``` Signed-off-by: J. Doe <j.doe@domain.com> ``` ## Describe your changes ## Issue ticket number and link ## Code checklist before requesting a review - [ ] I have signed the DCO? - [ ] The build completes without errors? - [ ] All tests are passing when running make test? ___ ### **PR Type** Enhancement, Bug fix ___ ### **Description** - Embed SPIRE workload agent in sysmon-vm for standalone laptop deployments without shared sockets - Implement checker onboarding with join token/bundle delivery and SPIRE address resolution - Add service registry method to lookup agent poller IDs for edge onboarding fallback - Extend Helm SPIRE server trust to include Core and Datasvc service accounts - Enhance edge package UI with checker kind, config JSON, and parent agent poller auto-derivation ___ ### Diagram Walkthrough ```mermaid flowchart LR A["sysmon-vm<br/>Checker"] -->|"onboarding token"| B["Edge Onboarding<br/>Service"] B -->|"fetch package"| C["Core API"] C -->|"join token +<br/>trust bundle"| B B -->|"start embedded<br/>SPIRE agent"| D["Embedded Agent<br/>Process"] D -->|"join via TCP"| E["Demo SPIRE<br/>Server"] E -->|"issue SVID"| D D -->|"expose workload<br/>socket"| A A -->|"bind with SVID"| F["Demo Poller"] ``` <details> <summary><h3> File Walkthrough</h3></summary> <table><thead><tr><th></th><th align="left">Relevant files</th></tr></thead><tbody><tr><td><strong>Enhancement</strong></td><td><details><summary>28 files</summary><table> <tr> <td><strong>edge_onboarding.go</strong><dd><code>Add GetAgentPollerID method to ServiceManager interface</code>&nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-85874e3c4bdcc9110db09909f10648d44cdee554b26c987f910502321eb20b5c">+10/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>service_registry_adapter.go</strong><dd><code>Implement GetAgentPollerID adapter for agent poller lookup</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-cfce7f6079001e11903e4e7992f09385c1c72d33bd47194971ee7a207586f1ce">+8/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>bootstrap.go</strong><dd><code>Add embeddedAgent field to Bootstrapper struct</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-b3b312d7d184427a6fd1bd68408e0af79fd1b1124e18d59878b9b0399f7937e0">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>config.go</strong><dd><code>Generate sysmon-vm specific checker config with SPIRE settings</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-e8ebde92155beb028a380b450be87da815cccfb50fede2faf7f10b1785d70f65">+53/-2</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>deployment.go</strong><dd><code>Implement SPIRE address resolution and socket wait utilities</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-ec21f17eea21a6ea80f89a2f1d31cabaa524020c0a542eaa3ebf579c5197cd26">+53/-12</a>&nbsp; </td> </tr> <tr> <td><strong>spire.go</strong><dd><code>Configure checker SPIRE with embedded agent startup logic</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-5d784c136f3db0128ea90d628cbefdff9e2ba18eba9a1775f54fc630bde8ee1d">+46/-8</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>spire_agent.go</strong><dd><code>New file implementing embedded SPIRE agent process management</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-33b0f01d8a67d768586269f4e6d6fcfbe1fb28ea9cc51c98440e20868e183e92">+170/-0</a>&nbsp; </td> </tr> <tr> <td><strong>entrypoint-sysmon-vm.sh</strong><dd><code>New entrypoint script for sysmon-vm Docker container</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-89549c327c7446025d039bd5c116c8fe6b24da5f4b2569a93c9a88202bb009fe">+11/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>host-install-macos.sh</strong><dd><code>Install embedded SPIRE agent binary for macOS sysmon-vm</code>&nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-b8b0940137a4ea6f6800b649649c83c52dea5d1c59c3a7e5baa5ca20405eeb54">+8/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>host-setup.sh</strong><dd><code>Download and install SPIRE agent for embedded onboarding flow</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-4c51ac711b8e7d2f318653a23e84576e448ec0f4341668a209b3467c3c1315ea">+50/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>_helpers.tpl</strong><dd><code>Add SPIRE socket host path helper template</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-3d59d815f528d134e097ce2c3e830c6eaa738e27b6645df1e9b18136cd5d3c0d">+10/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong><dd><code>Add sysmon-vm Docker image definitions for AMD64 and ARM64</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-0e4db31c224a8f72ae8e870a849e38a59d74a2c7f7b04347b0b3eb07e20c5a80">+94/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>agent.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-7442a08dbf73cd750874e97ac18538aa54da15ce7e47c27d0d65fe2cc8f356ad">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>core.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-06ab387d2c169d82a1de28b5e66c86f0417bd81b82a96246d0a2da8bfaa8d224">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>datasvc.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-5926357bf2f08469fae4974912c952846b55e45b3edee456455b0c403c3909af">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>db-event-writer.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-e4f899d11e5720f7049aa6fd632bd6993739410051bf65bc6fc8469739e5d2e4">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>faker.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-35afee08124f5b279319c616889348f5da33f6d76923237ae4019c5500fbb3e5">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>flowgger.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-511cfbfe42e0c41cd2fddf67a04911b48724ca9ea9f6d1ddc1f4bb7bf07086ab">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>mapper.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-6d1ed04a91f2f42cb29bf2381ceb7cbb12787ead37bc97dfb9524e88f5e2c4d0">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>otel.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-3422f8a7811e511a91937331e4a2795bbcae8846c19ae8f7f66d52a241ad300c">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>poller.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-609bbfe0a4f55ec766b1f55fa45382531724d3bfb781463782a2c3c8b24fd443">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>rperf-checker.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-933a439bea3a38d5dbb353ac67f9e79b84ba985fa41f8251bab22bba7721a2c2">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>snmp-checker.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-6f3d1202e9be1444128bdd8e4bc78121935ec7f0820f954211c39210166e6a3d">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>spire-agent.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-aa04a5403011130bfa140d517882c144cd2d4c5639d0860aceff70417e13661e">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>sync.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-b90e7258027182fe8d7b227ce4c20bef976721efe046aede069db6082dd1ea53">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>trapd.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-007ac4e264749139ee8ba84630cf34244ad17ed524b50ad7b28981c0ec3cf3f6">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>zen.yaml</strong><dd><code>Use configurable SPIRE socket host path instead of hardcoded path</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-6ad57a6af05e6ac723002f0979b35265589bd65639d2ea1605465c695f51364b">+1/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>page.tsx</strong><dd><code>Add checker kind and config JSON fields to edge package UI</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-d627c4ba6a102a46a28024061371276f01053b63f20c75986aef2631b967bafc">+189/-7</a>&nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Dependencies</strong></td><td><details><summary>2 files</summary><table> <tr> <td><strong>MODULE.bazel</strong><dd><code>Add SPIRE Linux ARM64 binary dependency for multi-arch builds</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-6136fc12446089c3db7360e923203dd114b6a1466252e71667c6791c20fe6bdc">+16/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong><dd><code>Add SPIRE agent binary filegroups for AMD64 and ARM64</code>&nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-58acc9348755d3a8358b11e94ebf4d153df1f598a6910aa16c40162c53796332">+12/-0</a>&nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Documentation</strong></td><td><details><summary>7 files</summary><table> <tr> <td><strong>README.md</strong><dd><code>Document embedded SPIRE agent installation for macOS</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-aa494b7da7f57782ef755e899ddfd8f8c3ec6fe980604a57bf98db5d3c8f611c">+5/-1</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>design.md</strong><dd><code>New design document for embedded SPIRE agent architecture</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-9d0059ef8e54387197f8822c7983b0ba9d1302807b3d53206204fa4899a535b8">+31/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>proposal.md</strong><dd><code>New proposal for sysmon-vm embedded SPIRE agent feature</code>&nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-dc11876295a4bfdb3f8b024d333905f458855101590948253f8827deac1db7a1">+15/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>spec.md</strong><dd><code>New specification for sysmon-vm SPIFFE identity requirements</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-c4aba172c9deaaa109e4097357cd7d626bb9a174667609c13f4caae1d454b055">+20/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>tasks.md</strong><dd><code>New task list for embedded SPIRE agent implementation</code>&nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-d7eec953af5c2f7d204faf50abfaa39b20c25d38a2bb9faab354cbc139cd48c3">+14/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>proposal.md</strong><dd><code>New proposal for SPIRE admin trust stabilization</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-2748e7e0e84e9dab5eeb28f8b552f6c1db80de1da7c9d790e9fd7eead139604d">+14/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>tasks.md</strong><dd><code>New task list for SPIRE admin trust fixes</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-869ac834fcae12654adda790bc46d0807b3ac431c67c9de14e394da4b7721818">+8/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Configuration changes</strong></td><td><details><summary>3 files</summary><table> <tr> <td><strong>serviceradar-config.yaml</strong><dd><code>Add checker metadata with SPIRE upstream address configuration</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-b8c8d2484103b11c396bc60d290c81df63c30a0f81103eceb5852a17e1d2b5e3">+5/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>values.yaml</strong><dd><code>Update app tag and add socketHostPath configuration option</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-d4449c7cb70362554b274f81eae5a4b81a8e81df494282e383d1b7ea3871c452">+3/-2</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>configmap.yaml</strong><dd><code>Add checker metadata with SPIRE upstream address configuration</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-f4548beaa0a3a01a46971c82c5647a0f3f49eb38d66dd939d06d19018173fcd6">+5/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Bug fix</strong></td><td><details><summary>1 files</summary><table> <tr> <td><strong>spire-server.yaml</strong><dd><code>Add Core and Datasvc service accounts to SPIRE trust allow-list</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-7959f7b987adcd56306fe5ddf31fed367dd9bb995f9b82a8fa53553dac8e7077">+2/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> </table></details></td></tr></tr></tbody></table> </details> ___
qodo-code-review[bot] commented 2025-11-29 03:34:10 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2033#issuecomment-3590952456
Original created: 2025-11-29T03:34:10Z

You are nearing your monthly Qodo Merge usage quota. For more information, please visit here.

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🔴
Unsigned binary download

Description: The installer downloads and extracts a spire-agent binary over HTTPS without signature or
checksum verification, risking supply-chain compromise if the download is tampered or a
MITM occurs.
host-setup.sh [191-231]

Referred Code
version="${SPIRE_AGENT_VERSION:-1.11.2}"
archive="spire-${version}-darwin-arm64.tar.gz"
url="${SPIRE_AGENT_DOWNLOAD_URL:-https://github.com/spiffe/spire/releases/download/v${version}/${archive}}"
agent_path="${workspace}/bin/spire-agent"

if [[ -x "${agent_path}" ]]; then
  log "info" "SPIRE agent already present at ${agent_path}"
  return
fi

tmp_dir="$(mktemp -d)"
log "info" "attempting to download SPIRE agent ${version} to ${agent_path}"
if command -v curl >/dev/null 2>&1; then
  if ! curl -fsSL "${url}" -o "${tmp_dir}/${archive}"; then
    log "warn" "could not download SPIRE agent from ${url} (likely not published for darwin/arm64); set SPIRE_AGENT_PATH manually if needed"
    rm -rf "${tmp_dir}"
    return
  fi
else
  log "warn" "curl is required to download SPIRE agent; skipping download"
  rm -rf "${tmp_dir}"


 ... (clipped 20 lines)
Sensitive log exposure

Description: The embedded SPIRE agent process writes logs to a world-readable file (0644) and redirects
both stdout/stderr to it, potentially exposing sensitive join/attestation details or
tokens to other local users on multi-tenant hosts.
spire_agent.go [70-88]

Referred Code
logPath := filepath.Join(spireDir, "agent.log")
logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
if err != nil {
	return "", fmt.Errorf("open agent log: %w", err)
}

cmd := exec.CommandContext(ctx, agentPath, "run", "-config", configPath)
cmd.Stdout = logFile
cmd.Stderr = logFile

if err := cmd.Start(); err != nil {
	return "", fmt.Errorf("start embedded SPIRE agent: %w", err)
}

go func() {
	_ = cmd.Wait()
}()

if err := waitForSocket(ctx, workloadSocket, 60, time.Second); err != nil {
Insecure file probing

Description: The waitForSocket function polls for the presence of arbitrary socket paths; when combined
with configurable paths from metadata/env in other flows, this can be abused to probe
filesystem state (information disclosure) or race against creation of sockets outside
intended directories.
deployment.go [167-183]

Referred Code
// waitForSocket polls for a Unix socket path to appear.
func waitForSocket(ctx context.Context, socketPath string, attempts int, delay time.Duration) error {
	for i := 0; i < attempts; i++ {
		select {
		case <-ctx.Done():
			return ctx.Err()
		default:
		}

		if info, err := os.Stat(socketPath); err == nil && info.Mode()&os.ModeSocket != 0 {
			return nil
		}
		time.Sleep(delay)
	}

	return fmt.Errorf("socket %s not available after %d attempts", socketPath, attempts)
}
Trust configuration spoofing

Description: Checker configuration generation accepts server_spiffe_id and workload socket from package
metadata without strict validation, enabling a malicious or tampered package to redirect
trust to an attacker-controlled SPIFFE ID or socket path (trust confusion / SSRF-like
local endpoint abuse).
config.go [281-324]

Referred Code
if b.pkg.CheckerKind == "sysmon-vm" || strings.EqualFold(b.pkg.ComponentID, "sysmon-vm") {
	// Emit sysmon-vm compatible config (matches pkg/checker/sysmonvm.Config)
	listenAddr := getStringFromMetadata(metadata, "listen_addr")
	if listenAddr == "" {
		listenAddr = "0.0.0.0:50110"
	}

	sampleInterval := getStringFromMetadata(metadata, "sample_interval")
	if sampleInterval == "" {
		sampleInterval = "250ms"
	}

	security := map[string]interface{}{
		"mode":            "spiffe",
		"role":            "checker",
		"trust_domain":    extractTrustDomain(b.pkg.DownstreamSPIFFEID),
		"workload_socket": workloadSocket,
	}
	if serverSPIFFE := getStringFromMetadata(metadata, "server_spiffe_id"); serverSPIFFE != "" {
		security["server_spiffe_id"] = serverSPIFFE
	}


 ... (clipped 23 lines)
Unverified socket trust

Description: In Docker deployments, the code auto-accepts any detected workload socket from fixed paths
without verifying ownership/permissions, which could allow a compromised container/host
path to supply a malicious socket endpoint.
spire.go [149-183]

Referred Code
override := strings.TrimSpace(os.Getenv("SPIRE_WORKLOAD_API"))
if override != "" {
	b.logger.Info().
		Str("workload_api_override", override).
		Msg("Using provided SPIRE workload API endpoint for checker")
	b.generatedConfigs["spire-workload-api-socket"] = []byte(override)
	return nil
}

metadata, _ := b.parseMetadata()
if socket := strings.TrimSpace(getStringFromMetadata(metadata, "spire_workload_api_socket")); socket != "" {
	b.logger.Info().
		Str("workload_api_socket", socket).
		Msg("Using metadata-provided SPIRE workload API endpoint for checker")
	b.generatedConfigs["spire-workload-api-socket"] = []byte(socket)
	return nil
}

// Docker: prefer existing workload sockets if mounted (poller/poller-nested)
if b.deploymentType == DeploymentTypeDocker {
	candidates := []string{


 ... (clipped 14 lines)
Unverified binary install

Description: The script installs a spire-agent binary if present without checksum/signature
verification or provenance checks, allowing a malicious local binary to be installed with
execute permissions.
host-install-macos.sh [43-48]

Referred Code
if [[ -x "${AGENT_SRC}" ]]; then
  install -m 0755 "${AGENT_SRC}" "${INSTALL_PREFIX}/spire-agent"
  echo "[info] installed SPIRE agent to ${INSTALL_PREFIX}/spire-agent for embedded sysmon-vm onboarding"
else
  echo "[warn] SPIRE agent not found at ${AGENT_SRC}; embedded onboarding will require SPIRE_AGENT_PATH or manual agent install" >&2
fi
Config injection risk

Description: The entrypoint executes the checker with a config path that may point to a writable path
in the container (/var/lib/serviceradar/config/checker.json), enabling config injection if
that path is writable by untrusted users or mounted from an untrusted source.
entrypoint-sysmon-vm.sh [1-11]

Referred Code
#!/usr/bin/env sh
set -eu

CONFIG="${CONFIG_PATH:-/etc/serviceradar/checkers/sysmon-vm.json}"

# Prefer generated config if onboarding wrote one
if [ -f /var/lib/serviceradar/config/checker.json ]; then
  CONFIG="/var/lib/serviceradar/config/checker.json"
fi

exec /usr/local/bin/serviceradar-sysmon-vm --config "${CONFIG}"
Insufficient server-side validation

Description: The client-side validator accepts arbitrary metadata_json and checker_config_json for
package creation, but if the backend does not robustly validate/limit these fields, this
increases risk of injecting untrusted configuration leading to trust manipulation or path
injection; ensure server-side validation and strict schemas.
page.tsx [640-736]

Referred Code
}

const componentType: EdgeComponentType =
  formState.componentType === 'agent' || formState.componentType === 'checker'
    ? formState.componentType
    : 'poller';
const componentId = formState.componentId.trim();
const parentId = formState.parentId.trim();
const checkerKind = formState.checkerKind.trim();
const checkerConfigJSON = formState.checkerConfigJSON.trim();
const metadataJSON = formState.metadataJSON.trim();

if (componentType === 'agent' && !parentId) {
  setFormError('Parent poller is required for agent packages');
  return;
}
if (componentType === 'checker' && !parentId) {
  setFormError('Parent agent is required for checker packages');
  return;
}
if (componentType === 'checker' && !checkerKind) {


 ... (clipped 76 lines)
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Secrets in logs: The embedded agent process redirects stdout/stderr of spire-agent to a log file without
safeguards, risking exposure of sensitive join tokens or credentials in logs.

Referred Code
logPath := filepath.Join(spireDir, "agent.log")
logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
if err != nil {
	return "", fmt.Errorf("open agent log: %w", err)
}

cmd := exec.CommandContext(ctx, agentPath, "run", "-config", configPath)
cmd.Stdout = logFile
cmd.Stderr = logFile

if err := cmd.Start(); err != nil {
	return "", fmt.Errorf("start embedded SPIRE agent: %w", err)
}

go func() {
	_ = cmd.Wait()
}()

if err := waitForSocket(ctx, workloadSocket, 60, time.Second); err != nil {

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: New critical actions like starting an embedded SPIRE agent and generating configs are
logged at info/debug but lack explicit, structured audit logging with actor context and
outcomes.

Referred Code
func (b *Bootstrapper) startEmbeddedSPIREAgent(ctx context.Context, spireDir, workloadSocketPath string) (string, error) {
	if b.downloadResult == nil {
		return "", ErrDownloadResultNotAvailable
	}
	if strings.TrimSpace(b.downloadResult.JoinToken) == "" {
		return "", ErrJoinTokenEmpty
	}

	upstreamAddr, upstreamPort, err := b.getSPIREAddressesForDeployment()
	if err != nil {
		return "", err
	}

	trustDomain := extractTrustDomain(b.pkg.DownstreamSPIFFEID)
	dataDir := filepath.Join(spireDir, "agent-data")
	socketDir := filepath.Dir(workloadSocketPath)
	if err := os.MkdirAll(socketDir, 0755); err != nil {
		return "", fmt.Errorf("create workload socket dir: %w", err)
	}
	if err := os.MkdirAll(dataDir, 0755); err != nil {
		return "", fmt.Errorf("create agent data dir: %w", err)


 ... (clipped 56 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Edge cases partial: While many errors are wrapped with context, functions like waitForSocket use fixed
attempts without configurable backoff and some paths rely on environment/metadata without
validating value formats (e.g., ports), which may require additional handling.

Referred Code
// waitForSocket polls for a Unix socket path to appear.
func waitForSocket(ctx context.Context, socketPath string, attempts int, delay time.Duration) error {
	for i := 0; i < attempts; i++ {
		select {
		case <-ctx.Done():
			return ctx.Err()
		default:
		}

		if info, err := os.Stat(socketPath); err == nil && info.Mode()&os.ModeSocket != 0 {
			return nil
		}
		time.Sleep(delay)
	}

	return fmt.Errorf("socket %s not available after %d attempts", socketPath, attempts)
}

func getStringFromMetadata(metadata map[string]interface{}, key string) string {
	if metadata == nil {
		return ""


 ... (clipped 10 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Raw error surfacing: The UI parses and displays backend error messages (parseErrorMessage) which may expose
internal details to end users if the API returns sensitive content.

Referred Code
  };
  const api = coreApiUrl.trim();
  if (api) {
    payload.api = api;
  }
  return `edgepkg-v1:${base64UrlEncode(JSON.stringify(payload))}`;
}

function parseErrorMessage(raw: string | null | undefined): string {
  if (!raw) {
    return '';
  }
  const trimmed = raw.trim();
  if (!trimmed) {
    return '';
  }
  try {
    const parsed = JSON.parse(trimmed);
    if (parsed && typeof parsed === 'object') {
      if ('message' in parsed && typeof parsed.message === 'string') {
        return parsed.message;


 ... (clipped 11 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Address validation: SPIRE upstream address and port are taken from env/metadata without validating format or
range, which could permit invalid or unsafe inputs.

Referred Code
// getSPIREAddressesForDeployment returns SPIRE server addresses based on deployment type.
func (b *Bootstrapper) getSPIREAddressesForDeployment() (address string, port string, err error) {
	metadata, _ := b.parseMetadata()

	addr := firstNonEmpty(
		os.Getenv("SPIRE_UPSTREAM_ADDRESS"),
		getStringFromMetadata(metadata, "spire_upstream_address"),
	)

	prt := firstNonEmpty(
		os.Getenv("SPIRE_UPSTREAM_PORT"),
		getStringFromMetadata(metadata, "spire_upstream_port"),
	)

	if addr == "" {
		return "", "", ErrSPIREUpstreamAddressNotFound
	}
	if prt == "" {
		return "", "", ErrSPIREUpstreamPortNotFound
	}



 ... (clipped 2 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
- Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2033#issuecomment-3590952456 Original created: 2025-11-29T03:34:10Z --- _You are nearing your monthly Qodo Merge usage quota. For more information, please visit [here](https://qodo-merge-docs.qodo.ai/installation/qodo_merge/#cloud-users)._ ## PR Compliance Guide 🔍 <!-- https://github.com/carverauto/serviceradar/commit/3e659bdda801701a52150a18608ead6e0236f09a --> Below is a summary of compliance checks for this PR:<br> <table><tbody><tr><td colspan='2'><strong>Security Compliance</strong></td></tr> <tr><td rowspan=1>🔴</td> <td><details><summary><strong>Unsigned binary download </strong></summary><br> <b>Description:</b> The installer downloads and extracts a <code>spire-agent</code> binary over HTTPS without signature or <br>checksum verification, risking supply-chain compromise if the download is tampered or a <br>MITM occurs.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-4c51ac711b8e7d2f318653a23e84576e448ec0f4341668a209b3467c3c1315eaR191-R231'>host-setup.sh [191-231]</a></strong><br> <details open><summary>Referred Code</summary> ```shell version="${SPIRE_AGENT_VERSION:-1.11.2}" archive="spire-${version}-darwin-arm64.tar.gz" url="${SPIRE_AGENT_DOWNLOAD_URL:-https://github.com/spiffe/spire/releases/download/v${version}/${archive}}" agent_path="${workspace}/bin/spire-agent" if [[ -x "${agent_path}" ]]; then log "info" "SPIRE agent already present at ${agent_path}" return fi tmp_dir="$(mktemp -d)" log "info" "attempting to download SPIRE agent ${version} to ${agent_path}" if command -v curl >/dev/null 2>&1; then if ! curl -fsSL "${url}" -o "${tmp_dir}/${archive}"; then log "warn" "could not download SPIRE agent from ${url} (likely not published for darwin/arm64); set SPIRE_AGENT_PATH manually if needed" rm -rf "${tmp_dir}" return fi else log "warn" "curl is required to download SPIRE agent; skipping download" rm -rf "${tmp_dir}" ... (clipped 20 lines) ``` </details></details></td></tr> <tr><td rowspan=7>⚪</td> <td><details><summary><strong>Sensitive log exposure </strong></summary><br> <b>Description:</b> The embedded SPIRE agent process writes logs to a world-readable file (<code>0644</code>) and redirects <br>both stdout/stderr to it, potentially exposing sensitive join/attestation details or <br>tokens to other local users on multi-tenant hosts.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-33b0f01d8a67d768586269f4e6d6fcfbe1fb28ea9cc51c98440e20868e183e92R70-R88'>spire_agent.go [70-88]</a></strong><br> <details open><summary>Referred Code</summary> ```go logPath := filepath.Join(spireDir, "agent.log") logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644) if err != nil { return "", fmt.Errorf("open agent log: %w", err) } cmd := exec.CommandContext(ctx, agentPath, "run", "-config", configPath) cmd.Stdout = logFile cmd.Stderr = logFile if err := cmd.Start(); err != nil { return "", fmt.Errorf("start embedded SPIRE agent: %w", err) } go func() { _ = cmd.Wait() }() if err := waitForSocket(ctx, workloadSocket, 60, time.Second); err != nil { ``` </details></details></td></tr> <tr><td><details><summary><strong>Insecure file probing </strong></summary><br> <b>Description:</b> The <code>waitForSocket</code> function polls for the presence of arbitrary socket paths; when combined <br>with configurable paths from metadata/env in other flows, this can be abused to probe <br>filesystem state (information disclosure) or race against creation of sockets outside <br>intended directories.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-ec21f17eea21a6ea80f89a2f1d31cabaa524020c0a542eaa3ebf579c5197cd26R167-R183'>deployment.go [167-183]</a></strong><br> <details open><summary>Referred Code</summary> ```go // waitForSocket polls for a Unix socket path to appear. func waitForSocket(ctx context.Context, socketPath string, attempts int, delay time.Duration) error { for i := 0; i < attempts; i++ { select { case <-ctx.Done(): return ctx.Err() default: } if info, err := os.Stat(socketPath); err == nil && info.Mode()&os.ModeSocket != 0 { return nil } time.Sleep(delay) } return fmt.Errorf("socket %s not available after %d attempts", socketPath, attempts) } ``` </details></details></td></tr> <tr><td><details><summary><strong>Trust configuration spoofing </strong></summary><br> <b>Description:</b> Checker configuration generation accepts <code>server_spiffe_id</code> and workload socket from package <br>metadata without strict validation, enabling a malicious or tampered package to redirect <br>trust to an attacker-controlled SPIFFE ID or socket path (trust confusion / SSRF-like <br>local endpoint abuse).<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-e8ebde92155beb028a380b450be87da815cccfb50fede2faf7f10b1785d70f65R281-R324'>config.go [281-324]</a></strong><br> <details open><summary>Referred Code</summary> ```go if b.pkg.CheckerKind == "sysmon-vm" || strings.EqualFold(b.pkg.ComponentID, "sysmon-vm") { // Emit sysmon-vm compatible config (matches pkg/checker/sysmonvm.Config) listenAddr := getStringFromMetadata(metadata, "listen_addr") if listenAddr == "" { listenAddr = "0.0.0.0:50110" } sampleInterval := getStringFromMetadata(metadata, "sample_interval") if sampleInterval == "" { sampleInterval = "250ms" } security := map[string]interface{}{ "mode": "spiffe", "role": "checker", "trust_domain": extractTrustDomain(b.pkg.DownstreamSPIFFEID), "workload_socket": workloadSocket, } if serverSPIFFE := getStringFromMetadata(metadata, "server_spiffe_id"); serverSPIFFE != "" { security["server_spiffe_id"] = serverSPIFFE } ... (clipped 23 lines) ``` </details></details></td></tr> <tr><td><details><summary><strong>Unverified socket trust </strong></summary><br> <b>Description:</b> In Docker deployments, the code auto-accepts any detected workload socket from fixed paths <br>without verifying ownership/permissions, which could allow a compromised container/host <br>path to supply a malicious socket endpoint.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-5d784c136f3db0128ea90d628cbefdff9e2ba18eba9a1775f54fc630bde8ee1dR149-R183'>spire.go [149-183]</a></strong><br> <details open><summary>Referred Code</summary> ```go override := strings.TrimSpace(os.Getenv("SPIRE_WORKLOAD_API")) if override != "" { b.logger.Info(). Str("workload_api_override", override). Msg("Using provided SPIRE workload API endpoint for checker") b.generatedConfigs["spire-workload-api-socket"] = []byte(override) return nil } metadata, _ := b.parseMetadata() if socket := strings.TrimSpace(getStringFromMetadata(metadata, "spire_workload_api_socket")); socket != "" { b.logger.Info(). Str("workload_api_socket", socket). Msg("Using metadata-provided SPIRE workload API endpoint for checker") b.generatedConfigs["spire-workload-api-socket"] = []byte(socket) return nil } // Docker: prefer existing workload sockets if mounted (poller/poller-nested) if b.deploymentType == DeploymentTypeDocker { candidates := []string{ ... (clipped 14 lines) ``` </details></details></td></tr> <tr><td><details><summary><strong>Unverified binary install </strong></summary><br> <b>Description:</b> The script installs a <code>spire-agent</code> binary if present without checksum/signature <br>verification or provenance checks, allowing a malicious local binary to be installed with <br>execute permissions.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-b8b0940137a4ea6f6800b649649c83c52dea5d1c59c3a7e5baa5ca20405eeb54R43-R48'>host-install-macos.sh [43-48]</a></strong><br> <details open><summary>Referred Code</summary> ```shell if [[ -x "${AGENT_SRC}" ]]; then install -m 0755 "${AGENT_SRC}" "${INSTALL_PREFIX}/spire-agent" echo "[info] installed SPIRE agent to ${INSTALL_PREFIX}/spire-agent for embedded sysmon-vm onboarding" else echo "[warn] SPIRE agent not found at ${AGENT_SRC}; embedded onboarding will require SPIRE_AGENT_PATH or manual agent install" >&2 fi ``` </details></details></td></tr> <tr><td><details><summary><strong>Config injection risk </strong></summary><br> <b>Description:</b> The entrypoint executes the checker with a config path that may point to a writable path <br>in the container (<code>/var/lib/serviceradar/config/checker.json</code>), enabling config injection if <br>that path is writable by untrusted users or mounted from an untrusted source.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-89549c327c7446025d039bd5c116c8fe6b24da5f4b2569a93c9a88202bb009feR1-R11'>entrypoint-sysmon-vm.sh [1-11]</a></strong><br> <details open><summary>Referred Code</summary> ```shell #!/usr/bin/env sh set -eu CONFIG="${CONFIG_PATH:-/etc/serviceradar/checkers/sysmon-vm.json}" # Prefer generated config if onboarding wrote one if [ -f /var/lib/serviceradar/config/checker.json ]; then CONFIG="/var/lib/serviceradar/config/checker.json" fi exec /usr/local/bin/serviceradar-sysmon-vm --config "${CONFIG}" ``` </details></details></td></tr> <tr><td><details><summary><strong>Insufficient server-side validation</strong></summary><br> <b>Description:</b> The client-side validator accepts arbitrary <code>metadata_json</code> and <code>checker_config_json</code> for <br>package creation, but if the backend does not robustly validate/limit these fields, this <br>increases risk of injecting untrusted configuration leading to trust manipulation or path <br>injection; ensure server-side validation and strict schemas.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-d627c4ba6a102a46a28024061371276f01053b63f20c75986aef2631b967bafcR640-R736'>page.tsx [640-736]</a></strong><br> <details open><summary>Referred Code</summary> ```tsx } const componentType: EdgeComponentType = formState.componentType === 'agent' || formState.componentType === 'checker' ? formState.componentType : 'poller'; const componentId = formState.componentId.trim(); const parentId = formState.parentId.trim(); const checkerKind = formState.checkerKind.trim(); const checkerConfigJSON = formState.checkerConfigJSON.trim(); const metadataJSON = formState.metadataJSON.trim(); if (componentType === 'agent' && !parentId) { setFormError('Parent poller is required for agent packages'); return; } if (componentType === 'checker' && !parentId) { setFormError('Parent agent is required for checker packages'); return; } if (componentType === 'checker' && !checkerKind) { ... (clipped 76 lines) ``` </details></details></td></tr> <tr><td colspan='2'><strong>Ticket Compliance</strong></td></tr> <tr><td>⚪</td><td><details><summary>🎫 <strong>No ticket provided </strong></summary> - [ ] Create ticket/issue <!-- /create_ticket --create_ticket=true --> </details></td></tr> <tr><td colspan='2'><strong>Codebase Duplication Compliance</strong></td></tr> <tr><td>⚪</td><td><details><summary><strong>Codebase context is not defined </strong></summary> Follow the <a href='https://qodo-merge-docs.qodo.ai/core-abilities/rag_context_enrichment/'>guide</a> to enable codebase context checks. </details></td></tr> <tr><td colspan='2'><strong>Custom Compliance</strong></td></tr> <tr><td rowspan=1>🟢</td><td> <details><summary><strong>Generic: Meaningful Naming and Self-Documenting Code</strong></summary><br> **Objective:** Ensure all identifiers clearly express their purpose and intent, making code <br>self-documenting<br> **Status:** Passed<br> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td rowspan=1>🔴</td> <td><details> <summary><strong>Generic: Secure Logging Practices</strong></summary><br> **Objective:** To ensure logs are useful for debugging and auditing without exposing sensitive <br>information like PII, PHI, or cardholder data.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-33b0f01d8a67d768586269f4e6d6fcfbe1fb28ea9cc51c98440e20868e183e92R70-R88'><strong>Secrets in logs</strong></a>: The embedded agent process redirects stdout/stderr of spire-agent to a log file without <br>safeguards, risking exposure of sensitive join tokens or credentials in logs.<br> <details open><summary>Referred Code</summary> ```go logPath := filepath.Join(spireDir, "agent.log") logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644) if err != nil { return "", fmt.Errorf("open agent log: %w", err) } cmd := exec.CommandContext(ctx, agentPath, "run", "-config", configPath) cmd.Stdout = logFile cmd.Stderr = logFile if err := cmd.Start(); err != nil { return "", fmt.Errorf("start embedded SPIRE agent: %w", err) } go func() { _ = cmd.Wait() }() if err := waitForSocket(ctx, workloadSocket, 60, time.Second); err != nil { ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td rowspan=4>⚪</td> <td><details> <summary><strong>Generic: Comprehensive Audit Trails</strong></summary><br> **Objective:** To create a detailed and reliable record of critical system actions for security analysis <br>and compliance.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-33b0f01d8a67d768586269f4e6d6fcfbe1fb28ea9cc51c98440e20868e183e92R23-R99'><strong>Missing audit logs</strong></a>: New critical actions like starting an embedded SPIRE agent and generating configs are <br>logged at info/debug but lack explicit, structured audit logging with actor context and <br>outcomes.<br> <details open><summary>Referred Code</summary> ```go func (b *Bootstrapper) startEmbeddedSPIREAgent(ctx context.Context, spireDir, workloadSocketPath string) (string, error) { if b.downloadResult == nil { return "", ErrDownloadResultNotAvailable } if strings.TrimSpace(b.downloadResult.JoinToken) == "" { return "", ErrJoinTokenEmpty } upstreamAddr, upstreamPort, err := b.getSPIREAddressesForDeployment() if err != nil { return "", err } trustDomain := extractTrustDomain(b.pkg.DownstreamSPIFFEID) dataDir := filepath.Join(spireDir, "agent-data") socketDir := filepath.Dir(workloadSocketPath) if err := os.MkdirAll(socketDir, 0755); err != nil { return "", fmt.Errorf("create workload socket dir: %w", err) } if err := os.MkdirAll(dataDir, 0755); err != nil { return "", fmt.Errorf("create agent data dir: %w", err) ... (clipped 56 lines) ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Robust Error Handling and Edge Case Management</strong></summary><br> **Objective:** Ensure comprehensive error handling that provides meaningful context and graceful <br>degradation<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-ec21f17eea21a6ea80f89a2f1d31cabaa524020c0a542eaa3ebf579c5197cd26R167-R197'><strong>Edge cases partial</strong></a>: While many errors are wrapped with context, functions like waitForSocket use fixed <br>attempts without configurable backoff and some paths rely on environment/metadata without <br>validating value formats (e.g., ports), which may require additional handling.<br> <details open><summary>Referred Code</summary> ```go // waitForSocket polls for a Unix socket path to appear. func waitForSocket(ctx context.Context, socketPath string, attempts int, delay time.Duration) error { for i := 0; i < attempts; i++ { select { case <-ctx.Done(): return ctx.Err() default: } if info, err := os.Stat(socketPath); err == nil && info.Mode()&os.ModeSocket != 0 { return nil } time.Sleep(delay) } return fmt.Errorf("socket %s not available after %d attempts", socketPath, attempts) } func getStringFromMetadata(metadata map[string]interface{}, key string) string { if metadata == nil { return "" ... (clipped 10 lines) ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Secure Error Handling</strong></summary><br> **Objective:** To prevent the leakage of sensitive system information through error messages while <br>providing sufficient detail for internal debugging.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-d627c4ba6a102a46a28024061371276f01053b63f20c75986aef2631b967bafcR265-R296'><strong>Raw error surfacing</strong></a>: The UI parses and displays backend error messages (parseErrorMessage) which may expose <br>internal details to end users if the API returns sensitive content.<br> <details open><summary>Referred Code</summary> ```tsx }; const api = coreApiUrl.trim(); if (api) { payload.api = api; } return `edgepkg-v1:${base64UrlEncode(JSON.stringify(payload))}`; } function parseErrorMessage(raw: string | null | undefined): string { if (!raw) { return ''; } const trimmed = raw.trim(); if (!trimmed) { return ''; } try { const parsed = JSON.parse(trimmed); if (parsed && typeof parsed === 'object') { if ('message' in parsed && typeof parsed.message === 'string') { return parsed.message; ... (clipped 11 lines) ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Security-First Input Validation and Data Handling</strong></summary><br> **Objective:** Ensure all data inputs are validated, sanitized, and handled securely to prevent <br>vulnerabilities<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2033/files#diff-ec21f17eea21a6ea80f89a2f1d31cabaa524020c0a542eaa3ebf579c5197cd26R143-R165'><strong>Address validation</strong></a>: SPIRE upstream address and port are taken from env/metadata without validating format or <br>range, which could permit invalid or unsafe inputs.<br> <details open><summary>Referred Code</summary> ```go // getSPIREAddressesForDeployment returns SPIRE server addresses based on deployment type. func (b *Bootstrapper) getSPIREAddressesForDeployment() (address string, port string, err error) { metadata, _ := b.parseMetadata() addr := firstNonEmpty( os.Getenv("SPIRE_UPSTREAM_ADDRESS"), getStringFromMetadata(metadata, "spire_upstream_address"), ) prt := firstNonEmpty( os.Getenv("SPIRE_UPSTREAM_PORT"), getStringFromMetadata(metadata, "spire_upstream_port"), ) if addr == "" { return "", "", ErrSPIREUpstreamAddressNotFound } if prt == "" { return "", "", ErrSPIREUpstreamPortNotFound } ... (clipped 2 lines) ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td align="center" colspan="2"> - [ ] Update <!-- /compliance --update_compliance=true --> </td></tr></tbody></table> <details><summary>Compliance status legend</summary> 🟢 - Fully Compliant<br> 🟡 - Partial Compliant<br> 🔴 - Not Compliant<br> ⚪ - Requires Further Human Verification<br> 🏷️ - Compliance label<br> </details>
qodo-code-review[bot] commented 2025-11-29 03:35:30 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2033#issuecomment-3590953110
Original created: 2025-11-29T03:35:30Z

PR Code Suggestions

Latest suggestions up to 850d878

CategorySuggestion                                                                                                                                    Impact
Incremental [*]
Sanitize CA file paths
Suggestion Impact:The commit added validation to resolve absolute paths and ensure ca_cert_path and ca_key_path reside within certDir, and then used the sanitized absolute paths for file reads, mitigating path traversal risks.

code diff:

+	// Validate paths to prevent path traversal attacks
+	absCertDir, err := filepath.Abs(certDir)
+	if err != nil {
+		return nil, fmt.Errorf("invalid cert_dir path: %w", err)
+	}
+	absCaPath, err := filepath.Abs(caPath)
+	if err != nil {
+		return nil, fmt.Errorf("invalid ca_cert_path: %w", err)
+	}
+	absCaKeyPath, err := filepath.Abs(caKeyPath)
+	if err != nil {
+		return nil, fmt.Errorf("invalid ca_key_path: %w", err)
+	}
+	// Ensure CA cert and key paths are within the allowed certDir
+	if !strings.HasPrefix(absCaPath, absCertDir+string(filepath.Separator)) {
+		return nil, fmt.Errorf("ca_cert_path %q is outside allowed directory %q", caPath, certDir)
+	}
+	if !strings.HasPrefix(absCaKeyPath, absCertDir+string(filepath.Separator)) {
+		return nil, fmt.Errorf("ca_key_path %q is outside allowed directory %q", caKeyPath, certDir)
+	}
+
+	caPEM, err := os.ReadFile(absCaPath)
+	if err != nil {
+		return nil, fmt.Errorf("read ca cert from %s: %w", absCaPath, err)
 	}
 	caCert, err := parseCACertificate(caPEM)
 	if err != nil {
 		return nil, fmt.Errorf("parse ca certificate: %w", err)
 	}
 
-	caKeyBytes, err := os.ReadFile(caKeyPath)
-	if err != nil {
-		return nil, fmt.Errorf("read ca key from %s: %w", caKeyPath, err)
+	caKeyBytes, err := os.ReadFile(absCaKeyPath)
+	if err != nil {
+		return nil, fmt.Errorf("read ca key from %s: %w", absCaKeyPath, err)
 	}

Sanitize the ca_cert_path, ca_key_path, and cert_dir metadata values to prevent
path traversal attacks. Ensure all file access is constrained to a trusted base
directory.

pkg/core/edge_onboarding.go [1949-1965]

+certDir := strings.TrimSpace(metadata["cert_dir"])
+if certDir == "" {
+	certDir = defaultCertDir
+}
+// sanitize certDir
+certDir = filepath.Clean(certDir)
+if filepath.IsAbs(certDir) {
+	// only allow absolute certDir if it starts with defaultCertDir
+	base := filepath.Clean(defaultCertDir)
+	if !strings.HasPrefix(certDir+string(os.PathSeparator), base+string(os.PathSeparator)) && certDir != base {
+		return nil, fmt.Errorf("invalid cert_dir outside of base: %s", certDir)
+	}
+} else {
+	// make relative to base
+	certDir = filepath.Join(defaultCertDir, certDir)
+}
+
+caPath := firstNonEmpty(strings.TrimSpace(metadata["ca_cert_path"]), filepath.Join(certDir, "root.pem"))
+caKeyPath := firstNonEmpty(strings.TrimSpace(metadata["ca_key_path"]), filepath.Join(certDir, "root-key.pem"))
+
+// sanitize file paths and ensure they resolve under certDir
+sanitize := func(base, p string) (string, error) {
+	p = filepath.Clean(p)
+	if !filepath.IsAbs(p) {
+		p = filepath.Join(base, p)
+	}
+	baseClean := filepath.Clean(base)
+	if !strings.HasPrefix(p+string(os.PathSeparator), baseClean+string(os.PathSeparator)) && p != baseClean {
+		return "", fmt.Errorf("invalid path outside of base: %s", p)
+	}
+	return p, nil
+}
+
+var err error
+if caPath, err = sanitize(certDir, caPath); err != nil {
+	return nil, err
+}
+if caKeyPath, err = sanitize(certDir, caKeyPath); err != nil {
+	return nil, err
+}
+
 caPEM, err := os.ReadFile(caPath)
 if err != nil {
 	return nil, fmt.Errorf("read ca cert from %s: %w", caPath, err)
 }
 caCert, err := parseCACertificate(caPEM)
 if err != nil {
 	return nil, fmt.Errorf("parse ca certificate: %w", err)
 }
 
 caKeyBytes, err := os.ReadFile(caKeyPath)
 if err != nil {
 	return nil, fmt.Errorf("read ca key from %s: %w", caKeyPath, err)
 }
 caKey, err := parsePrivateKey(caKeyBytes)
 if err != nil {
 	return nil, fmt.Errorf("parse ca key: %w", err)
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9

__

Why: This suggestion addresses a critical path traversal vulnerability, where a malicious user could provide crafted metadata to read arbitrary files from the server's filesystem.

High
Enforce HTTPS and timeouts

Enforce HTTPS for Core API connections and set a default timeout on the HTTP
client. This prevents insecure communication and potential hangs during API
requests.

pkg/edgeonboarding/mtls/bootstrap.go [145-159]

 req, err := http.NewRequestWithContext(ctx, http.MethodPost, deliverURL, strings.NewReader(body))
 if err != nil {
 	return nil, fmt.Errorf("create deliver request: %w", err)
 }
 req.Header.Set("Content-Type", "application/json")
 req.Header.Set("Accept", "application/json")
 
+// Enforce HTTPS by default
+if !strings.HasPrefix(deliverURL, "https://") {
+	// Allow http only if explicitly configured by caller via cfg.Host starting with http://
+	u := strings.TrimSpace(cfg.Host)
+	if !strings.HasPrefix(u, "http://") {
+		return nil, fmt.Errorf("insecure core api url; https required: %s", deliverURL)
+	}
+}
+
 client := cfg.HTTPClient
 if client == nil {
-	client = http.DefaultClient
+	client = &http.Client{Timeout: 15 * time.Second}
 }
 
 resp, err := client.Do(req)
+if err != nil {
+	return nil, fmt.Errorf("request deliver endpoint failed: %w", err)
+}

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: This suggestion improves security by enforcing HTTPS for API communication and adds robustness by setting a default HTTP client timeout, preventing the application from hanging.

Medium
Avoid forcing TLS on non-TLS modes

Conditionally apply the .cnpg.tls configuration in jq only when CNPG_SSL_MODE is
not disable, and add server_name to the TLS settings for proper hostname
verification.

docker/compose/update-config.sh [139-161]

 jq --arg host "$CNPG_HOST_VALUE" \
    --argjson port "${CNPG_PORT_VALUE:-5432}" \
    --arg db "$CNPG_DATABASE_VALUE" \
    --arg user "$CNPG_USERNAME_VALUE" \
    --arg ssl "$CNPG_SSL_MODE_VALUE" \
    --arg ca "$CNPG_CA_FILE_VALUE" \
    --arg cert "$CNPG_CERT_FILE_VALUE" \
    --arg key "$CNPG_KEY_FILE_VALUE" \
+   --arg sni "${CNPG_HOST_VALUE}.serviceradar" \
    '
    .cnpg = (.cnpg // {})
    | .cnpg.host = $host
    | .cnpg.port = ($port | tonumber)
    | .cnpg.database = $db
    | .cnpg.username = $user
    | .cnpg.ssl_mode = $ssl
-   | .cnpg.tls = {
-       ca_file: $ca,
-       cert_file: $cert,
-       key_file: $key
-     }
+   | .cnpg.tls = (if ($ssl|ascii_downcase) == "disable"
+                  then null
+                  else {
+                    ca_file: $ca,
+                    cert_file: $cert,
+                    key_file: $key,
+                    server_name: $sni
+                  }
+                  end)
    ' "$CONFIG_PATH" > /tmp/config-updated.json
 mv /tmp/config-updated.json "$CONFIG_PATH"
 echo "✅ Ensured CNPG TLS config (host=$CNPG_HOST_VALUE port=$CNPG_PORT_VALUE ssl_mode=$CNPG_SSL_MODE_VALUE)"

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that the script unconditionally sets TLS configuration, which would cause connection failures if CNPG_SSL_MODE is disable, and also adds server_name which is crucial for verify-full mode.

Medium
Regenerate when key is missing

In generate-certs.sh, check for the existence of both the certificate and its
key file. If the key is missing, remove the existing certificate and regenerate
the pair to prevent startup failures.

docker/compose/generate-certs.sh [67-74]

 if [ -f "$CERT_DIR/$component.pem" ]; then
-    echo "Certificate for $component already exists, ensuring permissions."
-    chmod 600 "$CERT_DIR/$component-key.pem" 2>/dev/null || true
-    if [ "$component" = "cnpg" ]; then
-        chown 26:999 "$CERT_DIR/$component-key.pem" "$CERT_DIR/$component.pem" 2>/dev/null || true
+    if [ -f "$CERT_DIR/$component-key.pem" ]; then
+        echo "Certificate for $component already exists, ensuring permissions."
+        chmod 600 "$CERT_DIR/$component-key.pem" 2>/dev/null || true
+        if [ "$component" = "cnpg" ]; then
+            chown 26:999 "$CERT_DIR/$component-key.pem" "$CERT_DIR/$component.pem" 2>/dev/null || true
+        fi
+        return
+    else
+        echo "Certificate for $component exists but key is missing; regenerating pair."
+        rm -f "$CERT_DIR/$component.pem" 2>/dev/null || true
+        # fall through to generation below
     fi
-    return
 fi
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a state where a certificate exists but its key is missing, which would prevent services from starting; the proposed fix to regenerate the pair is a robust solution.

Medium
Fix NATS mTLS verification

In db-event-writer.mtls.json, update the nats_security configuration to align
the server_name with other services and add the missing client_ca_file to ensure
successful mTLS connections.

docker/compose/db-event-writer.mtls.json [1-19]

 {
   "listen_addr": "0.0.0.0:50041",
   "nats_url": "tls://nats:4222",
   "partition": "docker",
   "stream_name": "events",
   "consumer_name": "db-event-writer",
   "agent_id": "docker-db-event-writer",
   "poller_id": "docker-poller",
   "nats_security": {
     "mode": "mtls",
     "cert_dir": "/etc/serviceradar/certs",
-    "server_name": "nats",
+    "server_name": "nats.serviceradar",
     "role": "client",
     "tls": {
       "cert_file": "/etc/serviceradar/certs/db-event-writer.pem",
       "key_file": "/etc/serviceradar/certs/db-event-writer-key.pem",
-      "ca_file": "/etc/serviceradar/certs/root.pem"
+      "ca_file": "/etc/serviceradar/certs/root.pem",
+      "client_ca_file": "/etc/serviceradar/certs/root.pem"
     }
   },
   ...
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies inconsistent mTLS configuration that would likely cause connection failures, making it a critical fix for the service's functionality.

Medium
Suppress join token for mTLS

Explicitly clear the JoinToken and BundlePEM from the API response when an
MTLSBundle is present. This ensures that SPIRE-related artifacts are not sent
for mTLS packages.

pkg/core/api/edge_onboarding.go [543-546]

+if format == "json" {
+	payload := edgePackageDeliverResponse{
+		Package:   toEdgePackageView(result.Package),
+		JoinToken: result.JoinToken,
+		BundlePEM: string(result.BundlePEM),
+	}
+	if result.MTLSBundle != nil {
+		payload.MTLSBundle = json.RawMessage(result.MTLSBundle)
+		// Never return join token for mTLS packages
+		payload.JoinToken = ""
+		payload.BundlePEM = ""
+	}
+	s.writeJSON(w, http.StatusOK, payload)
+	return
+}
+
 if result.MTLSBundle != nil {
 	writeError(w, "mTLS packages must be downloaded with format=json", http.StatusBadRequest)
 	return
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: This is a good defense-in-depth measure to ensure that a JoinToken is never sent for mTLS packages, even if populated by mistake, thus preventing potential confusion or misuse.

Medium
Enforce required poller for children

In the handleCreate function, add a validation check to ensure a poller_id is
present before submitting the form for 'agent' or 'checker' component types to
prevent backend errors.

web/src/app/admin/edge-packages/page.tsx [713-831]

 const handleCreate = async (e: React.FormEvent<HTMLFormElement>) => {
   e.preventDefault();
   setActionError(null);
   const componentType: EdgeComponentType =
     formState.componentType === 'agent' || formState.componentType === 'checker'
       ? formState.componentType
       : 'poller';
   const securityMode: SecurityMode = formState.securityMode === 'mtls' ? 'mtls' : 'spire';
   const componentId = formState.componentId.trim();
   const parentId = formState.parentId.trim();
   const checkerKind = formState.checkerKind.trim();
   const checkerConfigJSON = formState.checkerConfigJSON.trim();
   const metadataJSON = formState.metadataJSON.trim();
   ...
   if (componentType === 'checker' && !checkerKind) {
     setFormError('Checker kind is required for checker packages');
     setActionError('Checker kind is required for checker packages');
     return;
   }
   ...
+  // derive poller override from parent/poller selection (existing logic)
+  const pollerOverride =
+    componentType === 'agent' || componentType === 'checker'
+      ? formState.pollerId.trim() || agentPollerMap.get(parentId) || ''
+      : '';
+  const pollerIdForPayload =
+    componentType === 'agent'
+      ? parentId
+      : componentType === 'checker' && pollerOverride
+        ? pollerOverride
+        : undefined;
+  // enforce poller_id presence for agent/checker
+  if ((componentType === 'agent' || componentType === 'checker') && !pollerIdForPayload) {
+    const message = 'Poller is required for agent and checker packages';
+    setFormError(message);
+    setActionError(message);
+    return;
+  }
+  ...
   let metadataObject: Record<string, unknown> = {};
   if (metadataJSON) {
     try {
       metadataObject = JSON.parse(metadataJSON) as Record<string, unknown>;
     } catch {
       const message = 'Metadata JSON must be valid JSON';
       setFormError(message);
       setActionError(message);
       return;
     }
   }
   ...
   if (securityMode === 'mtls') {
     metadataObject['security_mode'] = 'mtls';
   } else if (metadataObject['security_mode'] === 'mtls') {
     delete metadataObject['security_mode'];
   }
-  ...
-  const payload: {
-    label: string;
-    component_type: EdgeComponentType;
-    component_id?: string;
-    parent_id?: string;
-    poller_id?: string;
-    site?: string;
-    selectors?: string[];
-    metadata_json?: string;
-    checker_kind?: string;
-    checker_config_json?: string;
-    notes?: string;
-    downstream_spiffe_id?: string;
-    join_token_ttl_seconds?: number;
-    download_token_ttl_seconds?: number;
-  } = {
+  const metadataValue = Object.keys(metadataObject).length > 0 ? JSON.stringify(metadataObject) : undefined;
+  const payload = {
     label: formState.label.trim(),
     component_type: componentType,
     component_id: componentId || undefined,
     parent_id: parentId || undefined,
     poller_id: pollerIdForPayload,
     site: formState.site.trim() || undefined,
     selectors: selectors.length > 0 ? selectors : undefined,
     metadata_json: metadataValue,
     checker_kind: componentType === 'checker' ? checkerKind : undefined,
     checker_config_json: componentType === 'checker' && checkerConfigJSON ? checkerConfigJSON : undefined,
     notes: formState.notes.trim() || undefined,
     downstream_spiffe_id: securityMode === 'mtls' ? undefined : formState.downstreamSPIFFEID.trim() || undefined,
     join_token_ttl_seconds: joinMinutes ? Math.max(0, Math.round(Number(joinMinutes) * 60)) : undefined,
     download_token_ttl_seconds: downloadMinutes ? Math.max(0, Math.round(Number(downloadMinutes) * 60)) : undefined,
   };
+  ...
+};

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a missing validation for poller_id which could lead to backend errors, improving the form's robustness by adding a client-side guard.

Medium
Avoid duplicate column migration

Remove the redundant security_mode column definition from either the initial
schema or the subsequent migration file to prevent potential migration failures
and ensure schema consistency.

pkg/db/cnpg/migrations/00000000000001_timescale_schema.up.sql [373-381]

 CREATE TABLE IF NOT EXISTS edge_onboarding_packages (
     poller_id              TEXT,
     site                   TEXT,
     status                 TEXT             DEFAULT 'pending',
-    security_mode          TEXT             DEFAULT 'spire',
     downstream_entry_id    TEXT             DEFAULT '',
     downstream_spiffe_id   TEXT             DEFAULT '',
     selectors              TEXT[]           DEFAULT '{}',
     ...
 );

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a redundant database migration, which is a significant maintainability and correctness issue that could lead to future migration failures.

Medium
Security
Validate CA certificate capabilities

In parseCACertificate, validate that the parsed certificate has the IsCA flag
set and includes the KeyUsageCertSign key usage before returning it.

pkg/core/edge_onboarding.go [2085-2097]

 func parseCACertificate(pemBytes []byte) (*x509.Certificate, error) {
 	block, _ := pem.Decode(pemBytes)
 	if block == nil {
 		return nil, ErrCACertNoPEMBlock
 	}
 
 	cert, err := x509.ParseCertificate(block.Bytes)
 	if err != nil {
 		return nil, err
 	}
 
+	// Validate that the certificate can act as a CA
+	if !cert.IsCA {
+		return nil, fmt.Errorf("ca certificate: not a CA")
+	}
+	// Must be allowed to sign certificates
+	required := x509.KeyUsageCertSign
+	if cert.KeyUsage&required == 0 {
+		return nil, fmt.Errorf("ca certificate: missing KeyUsageCertSign")
+	}
+
 	return cert, nil
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: This is a valid security enhancement that ensures only proper CA certificates can be used for signing, preventing misconfiguration and potential security weaknesses.

Medium
Reject encrypted CA keys explicitly

In parsePrivateKey, add a check using x509.IsEncryptedPEMBlock to explicitly
reject encrypted private keys, as they are not supported.

pkg/core/edge_onboarding.go [2099-2124]

 func parsePrivateKey(pemBytes []byte) (crypto.Signer, error) {
 	block, _ := pem.Decode(pemBytes)
 	if block == nil {
 		return nil, ErrCAKeyNoPEMBlock
+	}
+	// Explicitly reject encrypted keys (no passphrase support here)
+	if x509.IsEncryptedPEMBlock(block) || strings.Contains(strings.ToUpper(block.Headers["Proc-Type"]), "ENCRYPTED") {
+		return nil, fmt.Errorf("ca key: encrypted PEM not supported")
 	}
 
 	if key, err := x509.ParsePKCS1PrivateKey(block.Bytes); err == nil {
 		return key, nil
 	}
 	if key, err := x509.ParseECPrivateKey(block.Bytes); err == nil {
 		return key, nil
 	}
 	parsed, err := x509.ParsePKCS8PrivateKey(block.Bytes)
 	if err != nil {
 		return nil, err
 	}
 
 	switch key := parsed.(type) {
 	case *rsa.PrivateKey:
 		return key, nil
 	case *ecdsa.PrivateKey:
 		return key, nil
 	default:
 		return nil, fmt.Errorf("%w: %T", ErrCAKeyUnsupportedType, key)
 	}
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that encrypted private keys are not supported and adds an explicit check, which improves error handling and provides clearer feedback to the user.

Medium
Fix permissions for existing certs

Add a chmod 644 command for the existing certificate file
($CERT_DIR/$component.pem) to ensure its permissions are correctly set, matching
the behavior for newly generated certificates.

docker/compose/generate-certs.sh [67-74]

 if [ -f "$CERT_DIR/$component.pem" ]; then
     echo "Certificate for $component already exists, ensuring permissions."
+    chmod 644 "$CERT_DIR/$component.pem" 2>/dev/null || true
     chmod 600 "$CERT_DIR/$component-key.pem" 2>/dev/null || true
     if [ "$component" = "cnpg" ]; then
         chown 26:999 "$CERT_DIR/$component-key.pem" "$CERT_DIR/$component.pem" 2>/dev/null || true
     fi
     return
 fi
  • Apply / Chat
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly points out an inconsistency in file permissions and improves security by ensuring existing certificate files have the same strict permissions as newly generated ones.

Low
Possible issue
Cap client TTL to CA expiry

Before calling mintClientCertificate, ensure the client certificate's TTL does
not extend beyond the CA certificate's expiration date by capping it if
necessary.

pkg/core/edge_onboarding.go [2012-2015]

+// Cap client cert validity to CA's NotAfter
+caExpiry := caCert.NotAfter.UTC()
+desiredNotAfter := now.UTC().Add(clientCertTTL)
+if desiredNotAfter.After(caExpiry) {
+	clientCertTTL = caExpiry.Sub(now.UTC())
+	if clientCertTTL <= 0 {
+		return nil, fmt.Errorf("client cert ttl exceeds CA validity")
+	}
+}
+
 clientCert, clientKey, err := mintClientCertificate(caCert, caKey, clientName, serverName, endpoints, now, clientCertTTL)
 if err != nil {
 	return nil, err
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: This is a valid correctness fix that prevents issuing client certificates that outlive their signing CA, which would lead to validation failures.

Medium
Fix mTLS server name verification

Update the server_name values for kv_security and security to match the Docker
Compose service names (datasvc and agent) to prevent TLS hostname verification
failures.

docker/compose/agent.mtls.json [1-53]

 {
   "kv_address": "datasvc:50057",
   "kv_security": {
     "mode": "mtls",
     "cert_dir": "/etc/serviceradar/certs",
-    "server_name": "datasvc.serviceradar",
+    "server_name": "datasvc", 
     "role": "agent",
     "tls": {
       "cert_file": "/etc/serviceradar/certs/agent.pem",
       "key_file": "/etc/serviceradar/certs/agent-key.pem",
       "ca_file": "/etc/serviceradar/certs/root.pem",
       "client_ca_file": "/etc/serviceradar/certs/root.pem"
     }
   },
   "security": {
     "mode": "mtls",
     "cert_dir": "/etc/serviceradar/certs",
-    "server_name": "agent.serviceradar",
+    "server_name": "agent", 
     "role": "agent",
     "tls": {
       "cert_file": "/etc/serviceradar/certs/agent.pem",
       "key_file": "/etc/serviceradar/certs/agent-key.pem",
       "ca_file": "/etc/serviceradar/certs/root.pem",
       "client_ca_file": "/etc/serviceradar/certs/root.pem"
     }
   },
   "logging": {
     "level": "info",
     "debug": false,
     "output": "stdout",
     "time_format": "",
     "otel": {
       "enabled": false,
       "endpoint": "",
       "service_name": "serviceradar-agent",
       "batch_timeout": "5s",
       "insecure": false,
       "tls": {
         "cert_file": "/etc/serviceradar/certs/agent.pem",
         "key_file": "/etc/serviceradar/certs/agent-key.pem",
         "ca_file": "/etc/serviceradar/certs/root.pem"
       }
     }
   }
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a likely mTLS handshake failure due to a server_name mismatch in the Docker Compose environment, which is a critical bug for the new feature's functionality.

Medium
Add NATS client TLS parameters

Add a nats_tls configuration block with CA, client certificate, and key paths to
the netflow-consumer.mtls.json file to enable a successful mTLS connection to
the NATS server.

docker/compose/netflow-consumer.mtls.json [1-17]

 {
   "listen_addr": "0.0.0.0:50120",
   "nats_url": "tls://nats:4222",
   "stream_name": "events",
   "consumer_name": "netflow-consumer",
   "enabled_fields": [],
   "disabled_fields": [],
   "dictionaries": [],
+  "nats_tls": {
+    "ca_file": "/etc/serviceradar/certs/root.pem",
+    "cert_file": "/etc/serviceradar/certs/netflow-consumer.pem",
+    "key_file": "/etc/serviceradar/certs/netflow-consumer-key.pem",
+    "server_name": "nats" 
+  },
   "cnpg": {
     "host": "cnpg",
     "port": 5432,
     "database": "serviceradar",
     "username": "serviceradar",
     "password": "serviceradar",
     "ssl_mode": "disable"
   }
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that the NATS connection will fail because the nats_url specifies TLS but no TLS configuration is provided, which is a critical bug in the new configuration.

Medium
Conditionally apply CNPG TLS config

Modify the jq command to conditionally create the .cnpg.tls object only when the
ssl_mode is not disable.

docker/compose/entrypoint-db-event-writer.sh [139-160]

 jq --arg host "$CNPG_HOST_VALUE" \
    --argjson port "${CNPG_PORT_VALUE:-5432}" \
    --arg db "$CNPG_DATABASE_VALUE" \
    --arg user "$CNPG_USERNAME_VALUE" \
    --arg ssl "$CNPG_SSL_MODE_VALUE" \
    --arg ca "$CNPG_CA_FILE_VALUE" \
    --arg cert "$CNPG_CERT_FILE_VALUE" \
    --arg key "$CNPG_KEY_FILE_VALUE" \
    '
    .cnpg = (.cnpg // {})
    | .cnpg.host = $host
    | .cnpg.port = ($port | tonumber)
    | .cnpg.database = $db
    | .cnpg.username = $user
    | .cnpg.ssl_mode = $ssl
-   | .cnpg.tls = {
+   | .cnpg.tls = (if ($ssl|ascii_downcase) != "disable" then {
        ca_file: $ca,
        cert_file: $cert,
        key_file: $key
-     }
+     } else empty end)
    ' "$CONFIG_PATH" > /tmp/config-updated.json
 mv /tmp/config-updated.json "$CONFIG_PATH"
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion fixes a bug where the script generates an incorrect configuration by always including a tls block, even when ssl_mode is disable.

Medium
Validate metadata JSON before jq

Add a check to validate the $EDGE_DEFAULT_META variable as JSON before using it
in the jq command, falling back to an empty object if it's invalid to prevent
script failure.

docker/compose/update-config.sh [275-287]

+# Validate EDGE_DEFAULT_META JSON before applying
+if echo "$EDGE_DEFAULT_META" | jq -e . >/dev/null 2>&1; then
+    EDGE_META_SAFE="$EDGE_DEFAULT_META"
+else
+    echo "⚠️  EDGE_DEFAULT_META is not valid JSON; using {}"
+    EDGE_META_SAFE="{}"
+fi
+
 jq --arg key "$EDGE_ONBOARDING_KEY" \
-   --arg meta "$EDGE_DEFAULT_META" \
+   --arg meta "$EDGE_META_SAFE" \
    '
    .edge_onboarding = (.edge_onboarding // {})
    | .edge_onboarding.enabled = true
    | .edge_onboarding.encryption_key = $key
    | .edge_onboarding.default_selectors = (.edge_onboarding.default_selectors // ["unix:uid:0","unix:gid:0","unix:user:root","unix:group:root"])
    | .edge_onboarding.default_metadata = (.edge_onboarding.default_metadata // {})
    | .edge_onboarding.default_metadata.checker = (if $meta != "{}" then ($meta|fromjson) else (.edge_onboarding.default_metadata.checker // {}) end)
    | .edge_onboarding.join_token_ttl = (.edge_onboarding.join_token_ttl // "15m")
    | .edge_onboarding.download_token_ttl = (.edge_onboarding.download_token_ttl // "10m")
    ' "$CORE_CONFIG" > "$CORE_CONFIG.tmp"
 mv "$CORE_CONFIG.tmp" "$CORE_CONFIG"
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that invalid JSON in the $EDGE_DEFAULT_META variable would crash the script and proposes a robust validation check to prevent this failure.

Low
  • Update

Previous suggestions

Suggestions up to commit 3e659bd
CategorySuggestion                                                                                                                                    Impact
High-level
Consider a more robust process supervision model

The embedded SPIRE agent is launched without supervision. Implement a strategy
where the main application exits if the agent process dies, enabling external
supervisors to manage restarts correctly.

Examples:

pkg/edgeonboarding/spire_agent.go [80-86]
	if err := cmd.Start(); err != nil {
		return "", fmt.Errorf("start embedded SPIRE agent: %w", err)
	}

	go func() {
		_ = cmd.Wait()
	}()

Solution Walkthrough:

Before:

// pkg/edgeonboarding/spire_agent.go
func (b *Bootstrapper) startEmbeddedSPIREAgent(...) (string, error) {
    // ...
    cmd := exec.CommandContext(ctx, agentPath, "run", "-config", configPath)
    if err := cmd.Start(); err != nil {
        return "", fmt.Errorf("start embedded SPIRE agent: %w", err)
    }

    // The agent process is not supervised.
    // If it dies, the parent process continues running.
    go func() {
        _ = cmd.Wait()
    }()

    if err := waitForSocket(ctx, workloadSocket, ...); err != nil {
        return "", fmt.Errorf("wait for workload socket: %w", err)
    }
    // ...
    return "unix:...", nil
}

After:

// pkg/edgeonboarding/spire_agent.go
func (b *Bootstrapper) startEmbeddedSPIREAgent(...) (string, error) {
    // ...
    cmd := exec.CommandContext(ctx, agentPath, "run", "-config", configPath)
    if err := cmd.Start(); err != nil {
        return "", fmt.Errorf("start embedded SPIRE agent: %w", err)
    }

    // Supervise the agent process.
    go func() {
        err := cmd.Wait()
        // If the agent exits, log and terminate the parent process.
        b.logger.Error().Err(err).Msg("Embedded SPIRE agent exited unexpectedly, shutting down.")
        // This could be a call to a shutdown function, or os.Exit(1)
        // to ensure the entire service is restarted by its supervisor.
        os.Exit(1)
    }()

    if err := waitForSocket(ctx, workloadSocket, ...); err != nil {
        return "", fmt.Errorf("wait for workload socket: %w", err)
    }
    // ...
    return "unix:...", nil
}

Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a critical reliability flaw in the new embedded agent feature, where the parent process does not handle the unexpected termination of the child spire-agent, leading to silent failures.

High
Possible issue
Handle embedded agent process exit

Handle the exit status of the embedded SPIRE agent process to avoid silent
failures. Use channels and a select statement to concurrently wait for the
agent's socket to become available or for the process to exit, reporting any
premature exit as an error.

pkg/edgeonboarding/spire_agent.go [80-90]

 	if err := cmd.Start(); err != nil {
 		return "", fmt.Errorf("start embedded SPIRE agent: %w", err)
 	}
 
+	agentErrCh := make(chan error, 1)
 	go func() {
-		_ = cmd.Wait()
+		agentErrCh <- cmd.Wait()
 	}()
 
-	if err := waitForSocket(ctx, workloadSocket, 60, time.Second); err != nil {
-		return "", fmt.Errorf("wait for workload socket: %w", err)
+	socketReadyCh := make(chan struct{})
+	go func() {
+		if err := waitForSocket(ctx, workloadSocket, 60, time.Second); err == nil {
+			close(socketReadyCh)
+		}
+	}()
+
+	select {
+	case <-socketReadyCh:
+		// Socket is ready, continue.
+	case err := <-agentErrCh:
+		// Agent process exited before socket was ready.
+		return "", fmt.Errorf("embedded SPIRE agent exited unexpectedly: %w", err)
+	case <-ctx.Done():
+		return "", fmt.Errorf("context cancelled while waiting for SPIRE agent: %w", ctx.Err())
 	}
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that the agent process could exit prematurely with an error that would be silently ignored, leading to a confusing timeout. The proposed fix using channels is a robust way to handle this concurrency and improve error reporting.

Medium
General
Prevent file descriptor leak
Suggestion Impact:A goroutine was added to wait for the agent process to exit, but it only calls cmd.Wait() without closing the log file. While not a full implementation, it reflects a change likely prompted by the suggestion area; however, the actual file descriptor leak fix (logFile.Close()) was not applied.

code diff:

-	go func() {
-		_ = cmd.Wait()
-	}()

Close the log file for the embedded SPIRE agent to prevent a file descriptor
leak. The file should be closed after the agent process, which writes to it, has
finished executing.

pkg/edgeonboarding/spire_agent.go [70-78]

 	logPath := filepath.Join(spireDir, "agent.log")
 	logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
 	if err != nil {
 		return "", fmt.Errorf("open agent log: %w", err)
 	}
 
 	cmd := exec.CommandContext(ctx, agentPath, "run", "-config", configPath)
 	cmd.Stdout = logFile
 	cmd.Stderr = logFile
 
+	go func() {
+		_ = cmd.Wait()
+		logFile.Close()
+	}()
+
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a file descriptor leak by not closing the opened log file. The proposed fix is appropriate, ensuring the resource is released after the agent process terminates.

Medium
Improve parsing of nested errors

Improve the parseErrorMessage function to handle nested JSON error responses.
Implement a recursive search to find message or error keys within the parsed
object, ensuring that detailed error messages are extracted correctly.

web/src/app/admin/edge-packages/page.tsx [273-295]

 function parseErrorMessage(raw: string | null | undefined): string {
   if (!raw) {
     return '';
   }
   const trimmed = raw.trim();
   if (!trimmed) {
     return '';
   }
   try {
     const parsed = JSON.parse(trimmed);
-    if (parsed && typeof parsed === 'object') {
-      if ('message' in parsed && typeof parsed.message === 'string') {
-        return parsed.message;
+    const findMessage = (obj: any): string | null => {
+      if (!obj || typeof obj !== 'object') {
+        return null;
       }
-      if ('error' in parsed && typeof parsed.error === 'string') {
-        return parsed.error;
+      if ('message' in obj && typeof obj.message === 'string' && obj.message) {
+        return obj.message;
       }
+      if ('error' in obj && typeof obj.error === 'string' && obj.error) {
+        return obj.error;
+      }
+      for (const key in obj) {
+        if (Object.prototype.hasOwnProperty.call(obj, key)) {
+          const result = findMessage(obj[key]);
+          if (result) {
+            return result;
+          }
+        }
+      }
+      return null;
+    };
+    const message = findMessage(parsed);
+    if (message) {
+      return message;
     }
   } catch {
     // fall through to return the raw string
   }
   return trimmed;
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly points out that the current error parsing logic is shallow and will fail to extract messages from nested JSON objects, which is a common pattern in API error responses. The proposed recursive approach makes the error handling more robust.

Low
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2033#issuecomment-3590953110 Original created: 2025-11-29T03:35:30Z --- ## PR Code Suggestions ✨ <!-- 850d878 --> Latest suggestions up to 850d878 <table><thead><tr><td><strong>Category</strong></td><td align=left><strong>Suggestion&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </strong></td><td align=center><strong>Impact</strong></td></tr><tbody><tr><td rowspan=8>Incremental <sup><a href='https://qodo-merge-docs.qodo.ai/core-abilities/incremental_update/'>[*]</a></sup></td> <td> <details><summary>✅ <s>Sanitize CA file paths</s></summary> ___ <details><summary><b>Suggestion Impact:</b></summary>The commit added validation to resolve absolute paths and ensure ca_cert_path and ca_key_path reside within certDir, and then used the sanitized absolute paths for file reads, mitigating path traversal risks. code diff: ```diff + // Validate paths to prevent path traversal attacks + absCertDir, err := filepath.Abs(certDir) + if err != nil { + return nil, fmt.Errorf("invalid cert_dir path: %w", err) + } + absCaPath, err := filepath.Abs(caPath) + if err != nil { + return nil, fmt.Errorf("invalid ca_cert_path: %w", err) + } + absCaKeyPath, err := filepath.Abs(caKeyPath) + if err != nil { + return nil, fmt.Errorf("invalid ca_key_path: %w", err) + } + // Ensure CA cert and key paths are within the allowed certDir + if !strings.HasPrefix(absCaPath, absCertDir+string(filepath.Separator)) { + return nil, fmt.Errorf("ca_cert_path %q is outside allowed directory %q", caPath, certDir) + } + if !strings.HasPrefix(absCaKeyPath, absCertDir+string(filepath.Separator)) { + return nil, fmt.Errorf("ca_key_path %q is outside allowed directory %q", caKeyPath, certDir) + } + + caPEM, err := os.ReadFile(absCaPath) + if err != nil { + return nil, fmt.Errorf("read ca cert from %s: %w", absCaPath, err) } caCert, err := parseCACertificate(caPEM) if err != nil { return nil, fmt.Errorf("parse ca certificate: %w", err) } - caKeyBytes, err := os.ReadFile(caKeyPath) - if err != nil { - return nil, fmt.Errorf("read ca key from %s: %w", caKeyPath, err) + caKeyBytes, err := os.ReadFile(absCaKeyPath) + if err != nil { + return nil, fmt.Errorf("read ca key from %s: %w", absCaKeyPath, err) } ``` </details> ___ **Sanitize the <code>ca_cert_path</code>, <code>ca_key_path</code>, and <code>cert_dir</code> metadata values to prevent <br>path traversal attacks. Ensure all file access is constrained to a trusted base <br>directory.** [pkg/core/edge_onboarding.go [1949-1965]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-85874e3c4bdcc9110db09909f10648d44cdee554b26c987f910502321eb20b5cR1949-R1965) ```diff +certDir := strings.TrimSpace(metadata["cert_dir"]) +if certDir == "" { + certDir = defaultCertDir +} +// sanitize certDir +certDir = filepath.Clean(certDir) +if filepath.IsAbs(certDir) { + // only allow absolute certDir if it starts with defaultCertDir + base := filepath.Clean(defaultCertDir) + if !strings.HasPrefix(certDir+string(os.PathSeparator), base+string(os.PathSeparator)) && certDir != base { + return nil, fmt.Errorf("invalid cert_dir outside of base: %s", certDir) + } +} else { + // make relative to base + certDir = filepath.Join(defaultCertDir, certDir) +} + +caPath := firstNonEmpty(strings.TrimSpace(metadata["ca_cert_path"]), filepath.Join(certDir, "root.pem")) +caKeyPath := firstNonEmpty(strings.TrimSpace(metadata["ca_key_path"]), filepath.Join(certDir, "root-key.pem")) + +// sanitize file paths and ensure they resolve under certDir +sanitize := func(base, p string) (string, error) { + p = filepath.Clean(p) + if !filepath.IsAbs(p) { + p = filepath.Join(base, p) + } + baseClean := filepath.Clean(base) + if !strings.HasPrefix(p+string(os.PathSeparator), baseClean+string(os.PathSeparator)) && p != baseClean { + return "", fmt.Errorf("invalid path outside of base: %s", p) + } + return p, nil +} + +var err error +if caPath, err = sanitize(certDir, caPath); err != nil { + return nil, err +} +if caKeyPath, err = sanitize(certDir, caKeyPath); err != nil { + return nil, err +} + caPEM, err := os.ReadFile(caPath) if err != nil { return nil, fmt.Errorf("read ca cert from %s: %w", caPath, err) } caCert, err := parseCACertificate(caPEM) if err != nil { return nil, fmt.Errorf("parse ca certificate: %w", err) } caKeyBytes, err := os.ReadFile(caKeyPath) if err != nil { return nil, fmt.Errorf("read ca key from %s: %w", caKeyPath, err) } caKey, err := parsePrivateKey(caKeyBytes) if err != nil { return nil, fmt.Errorf("parse ca key: %w", err) } ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 9</summary> __ Why: This suggestion addresses a critical path traversal vulnerability, where a malicious user could provide crafted metadata to read arbitrary files from the server's filesystem. </details></details></td><td align=center>High </td></tr><tr><td> <details><summary>Enforce HTTPS and timeouts</summary> ___ **Enforce HTTPS for Core API connections and set a default timeout on the HTTP <br>client. This prevents insecure communication and potential hangs during API <br>requests.** [pkg/edgeonboarding/mtls/bootstrap.go [145-159]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-283b2e83f2964f3318b59df9e6fac381ee502a7238161c49f76fd4168b6d2800R145-R159) ```diff req, err := http.NewRequestWithContext(ctx, http.MethodPost, deliverURL, strings.NewReader(body)) if err != nil { return nil, fmt.Errorf("create deliver request: %w", err) } req.Header.Set("Content-Type", "application/json") req.Header.Set("Accept", "application/json") +// Enforce HTTPS by default +if !strings.HasPrefix(deliverURL, "https://") { + // Allow http only if explicitly configured by caller via cfg.Host starting with http:// + u := strings.TrimSpace(cfg.Host) + if !strings.HasPrefix(u, "http://") { + return nil, fmt.Errorf("insecure core api url; https required: %s", deliverURL) + } +} + client := cfg.HTTPClient if client == nil { - client = http.DefaultClient + client = &http.Client{Timeout: 15 * time.Second} } resp, err := client.Do(req) +if err != nil { + return nil, fmt.Errorf("request deliver endpoint failed: %w", err) +} ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: This suggestion improves security by enforcing HTTPS for API communication and adds robustness by setting a default HTTP client timeout, preventing the application from hanging. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Avoid forcing TLS on non-TLS modes</summary> ___ **Conditionally apply the <code>.cnpg.tls</code> configuration in <code>jq</code> only when <code>CNPG_SSL_MODE</code> is <br>not <code>disable</code>, and add <code>server_name</code> to the TLS settings for proper hostname <br>verification.** [docker/compose/update-config.sh [139-161]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-9ae50be83a13010a038389c74407ba1bde8cabcea0944e238c4b3374133f78bfR139-R161) ```diff jq --arg host "$CNPG_HOST_VALUE" \ --argjson port "${CNPG_PORT_VALUE:-5432}" \ --arg db "$CNPG_DATABASE_VALUE" \ --arg user "$CNPG_USERNAME_VALUE" \ --arg ssl "$CNPG_SSL_MODE_VALUE" \ --arg ca "$CNPG_CA_FILE_VALUE" \ --arg cert "$CNPG_CERT_FILE_VALUE" \ --arg key "$CNPG_KEY_FILE_VALUE" \ + --arg sni "${CNPG_HOST_VALUE}.serviceradar" \ ' .cnpg = (.cnpg // {}) | .cnpg.host = $host | .cnpg.port = ($port | tonumber) | .cnpg.database = $db | .cnpg.username = $user | .cnpg.ssl_mode = $ssl - | .cnpg.tls = { - ca_file: $ca, - cert_file: $cert, - key_file: $key - } + | .cnpg.tls = (if ($ssl|ascii_downcase) == "disable" + then null + else { + ca_file: $ca, + cert_file: $cert, + key_file: $key, + server_name: $sni + } + end) ' "$CONFIG_PATH" > /tmp/config-updated.json mv /tmp/config-updated.json "$CONFIG_PATH" echo "✅ Ensured CNPG TLS config (host=$CNPG_HOST_VALUE port=$CNPG_PORT_VALUE ssl_mode=$CNPG_SSL_MODE_VALUE)" ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly identifies that the script unconditionally sets TLS configuration, which would cause connection failures if `CNPG_SSL_MODE` is `disable`, and also adds `server_name` which is crucial for `verify-full` mode. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Regenerate when key is missing</summary> ___ **In <code>generate-certs.sh</code>, check for the existence of both the certificate and its <br>key file. If the key is missing, remove the existing certificate and regenerate <br>the pair to prevent startup failures.** [docker/compose/generate-certs.sh [67-74]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-8298241543b4744a6ac7780c760ac5b5a0a87ba62de19c8612ebe1aba0996ebdR67-R74) ```diff if [ -f "$CERT_DIR/$component.pem" ]; then - echo "Certificate for $component already exists, ensuring permissions." - chmod 600 "$CERT_DIR/$component-key.pem" 2>/dev/null || true - if [ "$component" = "cnpg" ]; then - chown 26:999 "$CERT_DIR/$component-key.pem" "$CERT_DIR/$component.pem" 2>/dev/null || true + if [ -f "$CERT_DIR/$component-key.pem" ]; then + echo "Certificate for $component already exists, ensuring permissions." + chmod 600 "$CERT_DIR/$component-key.pem" 2>/dev/null || true + if [ "$component" = "cnpg" ]; then + chown 26:999 "$CERT_DIR/$component-key.pem" "$CERT_DIR/$component.pem" 2>/dev/null || true + fi + return + else + echo "Certificate for $component exists but key is missing; regenerating pair." + rm -f "$CERT_DIR/$component.pem" 2>/dev/null || true + # fall through to generation below fi - return fi ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=3 --> <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly identifies a state where a certificate exists but its key is missing, which would prevent services from starting; the proposed fix to regenerate the pair is a robust solution. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Fix NATS mTLS verification</summary> ___ **In <code>db-event-writer.mtls.json</code>, update the <code>nats_security</code> configuration to align <br>the <code>server_name</code> with other services and add the missing <code>client_ca_file</code> to ensure <br>successful mTLS connections.** [docker/compose/db-event-writer.mtls.json [1-19]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-7a33f95f7545499abf0ed9fc91b58499ab209639e4885019579c959583fc7496R1-R19) ```diff { "listen_addr": "0.0.0.0:50041", "nats_url": "tls://nats:4222", "partition": "docker", "stream_name": "events", "consumer_name": "db-event-writer", "agent_id": "docker-db-event-writer", "poller_id": "docker-poller", "nats_security": { "mode": "mtls", "cert_dir": "/etc/serviceradar/certs", - "server_name": "nats", + "server_name": "nats.serviceradar", "role": "client", "tls": { "cert_file": "/etc/serviceradar/certs/db-event-writer.pem", "key_file": "/etc/serviceradar/certs/db-event-writer-key.pem", - "ca_file": "/etc/serviceradar/certs/root.pem" + "ca_file": "/etc/serviceradar/certs/root.pem", + "client_ca_file": "/etc/serviceradar/certs/root.pem" } }, ... } ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly identifies inconsistent mTLS configuration that would likely cause connection failures, making it a critical fix for the service's functionality. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Suppress join token for mTLS</summary> ___ **Explicitly clear the <code>JoinToken</code> and <code>BundlePEM</code> from the API response when an <br><code>MTLSBundle</code> is present. This ensures that SPIRE-related artifacts are not sent <br>for mTLS packages.** [pkg/core/api/edge_onboarding.go [543-546]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-c494568149a0b526ca5349c1d78d5e5ee7ec4144ecfd324c44f25fef584292bcR543-R546) ```diff +if format == "json" { + payload := edgePackageDeliverResponse{ + Package: toEdgePackageView(result.Package), + JoinToken: result.JoinToken, + BundlePEM: string(result.BundlePEM), + } + if result.MTLSBundle != nil { + payload.MTLSBundle = json.RawMessage(result.MTLSBundle) + // Never return join token for mTLS packages + payload.JoinToken = "" + payload.BundlePEM = "" + } + s.writeJSON(w, http.StatusOK, payload) + return +} + if result.MTLSBundle != nil { writeError(w, "mTLS packages must be downloaded with format=json", http.StatusBadRequest) return } ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: This is a good defense-in-depth measure to ensure that a `JoinToken` is never sent for mTLS packages, even if populated by mistake, thus preventing potential confusion or misuse. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Enforce required poller for children</summary> ___ **In the <code>handleCreate</code> function, add a validation check to ensure a <code>poller_id</code> is <br>present before submitting the form for 'agent' or 'checker' component types to <br>prevent backend errors.** [web/src/app/admin/edge-packages/page.tsx [713-831]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-d627c4ba6a102a46a28024061371276f01053b63f20c75986aef2631b967bafcR713-R831) ```diff const handleCreate = async (e: React.FormEvent<HTMLFormElement>) => { e.preventDefault(); setActionError(null); const componentType: EdgeComponentType = formState.componentType === 'agent' || formState.componentType === 'checker' ? formState.componentType : 'poller'; const securityMode: SecurityMode = formState.securityMode === 'mtls' ? 'mtls' : 'spire'; const componentId = formState.componentId.trim(); const parentId = formState.parentId.trim(); const checkerKind = formState.checkerKind.trim(); const checkerConfigJSON = formState.checkerConfigJSON.trim(); const metadataJSON = formState.metadataJSON.trim(); ... if (componentType === 'checker' && !checkerKind) { setFormError('Checker kind is required for checker packages'); setActionError('Checker kind is required for checker packages'); return; } ... + // derive poller override from parent/poller selection (existing logic) + const pollerOverride = + componentType === 'agent' || componentType === 'checker' + ? formState.pollerId.trim() || agentPollerMap.get(parentId) || '' + : ''; + const pollerIdForPayload = + componentType === 'agent' + ? parentId + : componentType === 'checker' && pollerOverride + ? pollerOverride + : undefined; + // enforce poller_id presence for agent/checker + if ((componentType === 'agent' || componentType === 'checker') && !pollerIdForPayload) { + const message = 'Poller is required for agent and checker packages'; + setFormError(message); + setActionError(message); + return; + } + ... let metadataObject: Record<string, unknown> = {}; if (metadataJSON) { try { metadataObject = JSON.parse(metadataJSON) as Record<string, unknown>; } catch { const message = 'Metadata JSON must be valid JSON'; setFormError(message); setActionError(message); return; } } ... if (securityMode === 'mtls') { metadataObject['security_mode'] = 'mtls'; } else if (metadataObject['security_mode'] === 'mtls') { delete metadataObject['security_mode']; } - ... - const payload: { - label: string; - component_type: EdgeComponentType; - component_id?: string; - parent_id?: string; - poller_id?: string; - site?: string; - selectors?: string[]; - metadata_json?: string; - checker_kind?: string; - checker_config_json?: string; - notes?: string; - downstream_spiffe_id?: string; - join_token_ttl_seconds?: number; - download_token_ttl_seconds?: number; - } = { + const metadataValue = Object.keys(metadataObject).length > 0 ? JSON.stringify(metadataObject) : undefined; + const payload = { label: formState.label.trim(), component_type: componentType, component_id: componentId || undefined, parent_id: parentId || undefined, poller_id: pollerIdForPayload, site: formState.site.trim() || undefined, selectors: selectors.length > 0 ? selectors : undefined, metadata_json: metadataValue, checker_kind: componentType === 'checker' ? checkerKind : undefined, checker_config_json: componentType === 'checker' && checkerConfigJSON ? checkerConfigJSON : undefined, notes: formState.notes.trim() || undefined, downstream_spiffe_id: securityMode === 'mtls' ? undefined : formState.downstreamSPIFFEID.trim() || undefined, join_token_ttl_seconds: joinMinutes ? Math.max(0, Math.round(Number(joinMinutes) * 60)) : undefined, download_token_ttl_seconds: downloadMinutes ? Math.max(0, Math.round(Number(downloadMinutes) * 60)) : undefined, }; + ... +}; ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion correctly identifies a missing validation for `poller_id` which could lead to backend errors, improving the form's robustness by adding a client-side guard. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Avoid duplicate column migration</summary> ___ **Remove the redundant <code>security_mode</code> column definition from either the initial <br>schema or the subsequent migration file to prevent potential migration failures <br>and ensure schema consistency.** [pkg/db/cnpg/migrations/00000000000001_timescale_schema.up.sql [373-381]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-a0d031f19688dc66a2b1885ce222afb3866e771204814d1134adf837052291caR373-R381) ```diff CREATE TABLE IF NOT EXISTS edge_onboarding_packages ( poller_id TEXT, site TEXT, status TEXT DEFAULT 'pending', - security_mode TEXT DEFAULT 'spire', downstream_entry_id TEXT DEFAULT '', downstream_spiffe_id TEXT DEFAULT '', selectors TEXT[] DEFAULT '{}', ... ); ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion correctly identifies a redundant database migration, which is a significant maintainability and correctness issue that could lead to future migration failures. </details></details></td><td align=center>Medium </td></tr><tr><td rowspan=3>Security</td> <td> <details><summary>Validate CA certificate capabilities<!-- not_implemented --></summary> ___ **In <code>parseCACertificate</code>, validate that the parsed certificate has the <code>IsCA</code> flag <br>set and includes the <code>KeyUsageCertSign</code> key usage before returning it.** [pkg/core/edge_onboarding.go [2085-2097]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-85874e3c4bdcc9110db09909f10648d44cdee554b26c987f910502321eb20b5cR2085-R2097) ```diff func parseCACertificate(pemBytes []byte) (*x509.Certificate, error) { block, _ := pem.Decode(pemBytes) if block == nil { return nil, ErrCACertNoPEMBlock } cert, err := x509.ParseCertificate(block.Bytes) if err != nil { return nil, err } + // Validate that the certificate can act as a CA + if !cert.IsCA { + return nil, fmt.Errorf("ca certificate: not a CA") + } + // Must be allowed to sign certificates + required := x509.KeyUsageCertSign + if cert.KeyUsage&required == 0 { + return nil, fmt.Errorf("ca certificate: missing KeyUsageCertSign") + } + return cert, nil } ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: This is a valid security enhancement that ensures only proper CA certificates can be used for signing, preventing misconfiguration and potential security weaknesses. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Reject encrypted CA keys explicitly<!-- not_implemented --></summary> ___ **In <code>parsePrivateKey</code>, add a check using <code>x509.IsEncryptedPEMBlock</code> to explicitly <br>reject encrypted private keys, as they are not supported.** [pkg/core/edge_onboarding.go [2099-2124]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-85874e3c4bdcc9110db09909f10648d44cdee554b26c987f910502321eb20b5cR2099-R2124) ```diff func parsePrivateKey(pemBytes []byte) (crypto.Signer, error) { block, _ := pem.Decode(pemBytes) if block == nil { return nil, ErrCAKeyNoPEMBlock + } + // Explicitly reject encrypted keys (no passphrase support here) + if x509.IsEncryptedPEMBlock(block) || strings.Contains(strings.ToUpper(block.Headers["Proc-Type"]), "ENCRYPTED") { + return nil, fmt.Errorf("ca key: encrypted PEM not supported") } if key, err := x509.ParsePKCS1PrivateKey(block.Bytes); err == nil { return key, nil } if key, err := x509.ParseECPrivateKey(block.Bytes); err == nil { return key, nil } parsed, err := x509.ParsePKCS8PrivateKey(block.Bytes) if err != nil { return nil, err } switch key := parsed.(type) { case *rsa.PrivateKey: return key, nil case *ecdsa.PrivateKey: return key, nil default: return nil, fmt.Errorf("%w: %T", ErrCAKeyUnsupportedType, key) } } ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion correctly identifies that encrypted private keys are not supported and adds an explicit check, which improves error handling and provides clearer feedback to the user. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Fix permissions for existing certs</summary> ___ **Add a <code>chmod 644</code> command for the existing certificate file <br>(<code>$CERT_DIR/$component.pem</code>) to ensure its permissions are correctly set, matching <br>the behavior for newly generated certificates.** [docker/compose/generate-certs.sh [67-74]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-8298241543b4744a6ac7780c760ac5b5a0a87ba62de19c8612ebe1aba0996ebdR67-R74) ```diff if [ -f "$CERT_DIR/$component.pem" ]; then echo "Certificate for $component already exists, ensuring permissions." + chmod 644 "$CERT_DIR/$component.pem" 2>/dev/null || true chmod 600 "$CERT_DIR/$component-key.pem" 2>/dev/null || true if [ "$component" = "cnpg" ]; then chown 26:999 "$CERT_DIR/$component-key.pem" "$CERT_DIR/$component.pem" 2>/dev/null || true fi return fi ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=10 --> <details><summary>Suggestion importance[1-10]: 5</summary> __ Why: The suggestion correctly points out an inconsistency in file permissions and improves security by ensuring existing certificate files have the same strict permissions as newly generated ones. </details></details></td><td align=center>Low </td></tr><tr><td rowspan=5>Possible issue</td> <td> <details><summary>Cap client TTL to CA expiry<!-- not_implemented --></summary> ___ **Before calling <code>mintClientCertificate</code>, ensure the client certificate's TTL does <br>not extend beyond the CA certificate's expiration date by capping it if <br>necessary.** [pkg/core/edge_onboarding.go [2012-2015]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-85874e3c4bdcc9110db09909f10648d44cdee554b26c987f910502321eb20b5cR2012-R2015) ```diff +// Cap client cert validity to CA's NotAfter +caExpiry := caCert.NotAfter.UTC() +desiredNotAfter := now.UTC().Add(clientCertTTL) +if desiredNotAfter.After(caExpiry) { + clientCertTTL = caExpiry.Sub(now.UTC()) + if clientCertTTL <= 0 { + return nil, fmt.Errorf("client cert ttl exceeds CA validity") + } +} + clientCert, clientKey, err := mintClientCertificate(caCert, caKey, clientName, serverName, endpoints, now, clientCertTTL) if err != nil { return nil, err } ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: This is a valid correctness fix that prevents issuing client certificates that outlive their signing CA, which would lead to validation failures. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Fix mTLS server name verification</summary> ___ **Update the <code>server_name</code> values for <code>kv_security</code> and <code>security</code> to match the Docker <br>Compose service names (<code>datasvc</code> and <code>agent</code>) to prevent TLS hostname verification <br>failures.** [docker/compose/agent.mtls.json [1-53]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-008f2216f159a9bd5db9cc90baaf6f1e64487df7af05b56ab3b9d6c4946aa95fR1-R53) ```diff { "kv_address": "datasvc:50057", "kv_security": { "mode": "mtls", "cert_dir": "/etc/serviceradar/certs", - "server_name": "datasvc.serviceradar", + "server_name": "datasvc", "role": "agent", "tls": { "cert_file": "/etc/serviceradar/certs/agent.pem", "key_file": "/etc/serviceradar/certs/agent-key.pem", "ca_file": "/etc/serviceradar/certs/root.pem", "client_ca_file": "/etc/serviceradar/certs/root.pem" } }, "security": { "mode": "mtls", "cert_dir": "/etc/serviceradar/certs", - "server_name": "agent.serviceradar", + "server_name": "agent", "role": "agent", "tls": { "cert_file": "/etc/serviceradar/certs/agent.pem", "key_file": "/etc/serviceradar/certs/agent-key.pem", "ca_file": "/etc/serviceradar/certs/root.pem", "client_ca_file": "/etc/serviceradar/certs/root.pem" } }, "logging": { "level": "info", "debug": false, "output": "stdout", "time_format": "", "otel": { "enabled": false, "endpoint": "", "service_name": "serviceradar-agent", "batch_timeout": "5s", "insecure": false, "tls": { "cert_file": "/etc/serviceradar/certs/agent.pem", "key_file": "/etc/serviceradar/certs/agent-key.pem", "ca_file": "/etc/serviceradar/certs/root.pem" } } } } ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=12 --> <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly identifies a likely mTLS handshake failure due to a `server_name` mismatch in the Docker Compose environment, which is a critical bug for the new feature's functionality. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Add NATS client TLS parameters</summary> ___ **Add a <code>nats_tls</code> configuration block with CA, client certificate, and key paths to <br>the <code>netflow-consumer.mtls.json</code> file to enable a successful mTLS connection to <br>the NATS server.** [docker/compose/netflow-consumer.mtls.json [1-17]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-f15920e8498a24f71ce3eec4f48fe8fefbb1765a90362998af779a660fcef9e1R1-R17) ```diff { "listen_addr": "0.0.0.0:50120", "nats_url": "tls://nats:4222", "stream_name": "events", "consumer_name": "netflow-consumer", "enabled_fields": [], "disabled_fields": [], "dictionaries": [], + "nats_tls": { + "ca_file": "/etc/serviceradar/certs/root.pem", + "cert_file": "/etc/serviceradar/certs/netflow-consumer.pem", + "key_file": "/etc/serviceradar/certs/netflow-consumer-key.pem", + "server_name": "nats" + }, "cnpg": { "host": "cnpg", "port": 5432, "database": "serviceradar", "username": "serviceradar", "password": "serviceradar", "ssl_mode": "disable" } } ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=13 --> <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly identifies that the NATS connection will fail because the `nats_url` specifies TLS but no TLS configuration is provided, which is a critical bug in the new configuration. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Conditionally apply CNPG TLS config</summary> ___ **Modify the <code>jq</code> command to conditionally create the <code>.cnpg.tls</code> object only when the <br><code>ssl_mode</code> is not <code>disable</code>.** [docker/compose/entrypoint-db-event-writer.sh [139-160]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-a76a07ca0b18c5d7d9cf0ba3f1a3f9330307be7acd0ca3d7a6be7b67c84f81afR139-R160) ```diff jq --arg host "$CNPG_HOST_VALUE" \ --argjson port "${CNPG_PORT_VALUE:-5432}" \ --arg db "$CNPG_DATABASE_VALUE" \ --arg user "$CNPG_USERNAME_VALUE" \ --arg ssl "$CNPG_SSL_MODE_VALUE" \ --arg ca "$CNPG_CA_FILE_VALUE" \ --arg cert "$CNPG_CERT_FILE_VALUE" \ --arg key "$CNPG_KEY_FILE_VALUE" \ ' .cnpg = (.cnpg // {}) | .cnpg.host = $host | .cnpg.port = ($port | tonumber) | .cnpg.database = $db | .cnpg.username = $user | .cnpg.ssl_mode = $ssl - | .cnpg.tls = { + | .cnpg.tls = (if ($ssl|ascii_downcase) != "disable" then { ca_file: $ca, cert_file: $cert, key_file: $key - } + } else empty end) ' "$CONFIG_PATH" > /tmp/config-updated.json mv /tmp/config-updated.json "$CONFIG_PATH" ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=14 --> <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion fixes a bug where the script generates an incorrect configuration by always including a `tls` block, even when `ssl_mode` is `disable`. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Validate metadata JSON before jq</summary> ___ **Add a check to validate the <code>$EDGE_DEFAULT_META</code> variable as JSON before using it <br>in the <code>jq</code> command, falling back to an empty object if it's invalid to prevent <br>script failure.** [docker/compose/update-config.sh [275-287]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-9ae50be83a13010a038389c74407ba1bde8cabcea0944e238c4b3374133f78bfR275-R287) ```diff +# Validate EDGE_DEFAULT_META JSON before applying +if echo "$EDGE_DEFAULT_META" | jq -e . >/dev/null 2>&1; then + EDGE_META_SAFE="$EDGE_DEFAULT_META" +else + echo "⚠️ EDGE_DEFAULT_META is not valid JSON; using {}" + EDGE_META_SAFE="{}" +fi + jq --arg key "$EDGE_ONBOARDING_KEY" \ - --arg meta "$EDGE_DEFAULT_META" \ + --arg meta "$EDGE_META_SAFE" \ ' .edge_onboarding = (.edge_onboarding // {}) | .edge_onboarding.enabled = true | .edge_onboarding.encryption_key = $key | .edge_onboarding.default_selectors = (.edge_onboarding.default_selectors // ["unix:uid:0","unix:gid:0","unix:user:root","unix:group:root"]) | .edge_onboarding.default_metadata = (.edge_onboarding.default_metadata // {}) | .edge_onboarding.default_metadata.checker = (if $meta != "{}" then ($meta|fromjson) else (.edge_onboarding.default_metadata.checker // {}) end) | .edge_onboarding.join_token_ttl = (.edge_onboarding.join_token_ttl // "15m") | .edge_onboarding.download_token_ttl = (.edge_onboarding.download_token_ttl // "10m") ' "$CORE_CONFIG" > "$CORE_CONFIG.tmp" mv "$CORE_CONFIG.tmp" "$CORE_CONFIG" ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=15 --> <details><summary>Suggestion importance[1-10]: 6</summary> __ Why: The suggestion correctly identifies that invalid JSON in the `$EDGE_DEFAULT_META` variable would crash the script and proposes a robust validation check to prevent this failure. </details></details></td><td align=center>Low </td></tr> <tr><td align="center" colspan="2"> - [ ] Update <!-- /improve_multi --more_suggestions=true --> </td><td></td></tr></tbody></table> ___ #### Previous suggestions <details><summary>✅ Suggestions up to commit 3e659bd</summary> <br><table><thead><tr><td><strong>Category</strong></td><td align=left><strong>Suggestion&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </strong></td><td align=center><strong>Impact</strong></td></tr><tbody><tr><td rowspan=1>High-level</td> <td> <details><summary>Consider a more robust process supervision model</summary> ___ **The embedded SPIRE agent is launched without supervision. Implement a strategy <br>where the main application exits if the agent process dies, enabling external <br>supervisors to manage restarts correctly.** ### Examples: <details> <summary> <a href="https://github.com/carverauto/serviceradar/pull/2033/files#diff-33b0f01d8a67d768586269f4e6d6fcfbe1fb28ea9cc51c98440e20868e183e92R80-R86">pkg/edgeonboarding/spire_agent.go [80-86]</a> </summary> ```go if err := cmd.Start(); err != nil { return "", fmt.Errorf("start embedded SPIRE agent: %w", err) } go func() { _ = cmd.Wait() }() ``` </details> ### Solution Walkthrough: #### Before: ```go // pkg/edgeonboarding/spire_agent.go func (b *Bootstrapper) startEmbeddedSPIREAgent(...) (string, error) { // ... cmd := exec.CommandContext(ctx, agentPath, "run", "-config", configPath) if err := cmd.Start(); err != nil { return "", fmt.Errorf("start embedded SPIRE agent: %w", err) } // The agent process is not supervised. // If it dies, the parent process continues running. go func() { _ = cmd.Wait() }() if err := waitForSocket(ctx, workloadSocket, ...); err != nil { return "", fmt.Errorf("wait for workload socket: %w", err) } // ... return "unix:...", nil } ``` #### After: ```go // pkg/edgeonboarding/spire_agent.go func (b *Bootstrapper) startEmbeddedSPIREAgent(...) (string, error) { // ... cmd := exec.CommandContext(ctx, agentPath, "run", "-config", configPath) if err := cmd.Start(); err != nil { return "", fmt.Errorf("start embedded SPIRE agent: %w", err) } // Supervise the agent process. go func() { err := cmd.Wait() // If the agent exits, log and terminate the parent process. b.logger.Error().Err(err).Msg("Embedded SPIRE agent exited unexpectedly, shutting down.") // This could be a call to a shutdown function, or os.Exit(1) // to ensure the entire service is restarted by its supervisor. os.Exit(1) }() if err := waitForSocket(ctx, workloadSocket, ...); err != nil { return "", fmt.Errorf("wait for workload socket: %w", err) } // ... return "unix:...", nil } ``` <details><summary>Suggestion importance[1-10]: 9</summary> __ Why: This suggestion correctly identifies a critical reliability flaw in the new embedded agent feature, where the parent process does not handle the unexpected termination of the child `spire-agent`, leading to silent failures. </details></details></td><td align=center>High </td></tr><tr><td rowspan=1>Possible issue</td> <td> <details><summary>Handle embedded agent process exit<!-- not_implemented --></summary> ___ **Handle the exit status of the embedded SPIRE agent process to avoid silent <br>failures. Use channels and a <code>select</code> statement to concurrently wait for the <br>agent's socket to become available or for the process to exit, reporting any <br>premature exit as an error.** [pkg/edgeonboarding/spire_agent.go [80-90]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-33b0f01d8a67d768586269f4e6d6fcfbe1fb28ea9cc51c98440e20868e183e92R80-R90) ```diff if err := cmd.Start(); err != nil { return "", fmt.Errorf("start embedded SPIRE agent: %w", err) } + agentErrCh := make(chan error, 1) go func() { - _ = cmd.Wait() + agentErrCh <- cmd.Wait() }() - if err := waitForSocket(ctx, workloadSocket, 60, time.Second); err != nil { - return "", fmt.Errorf("wait for workload socket: %w", err) + socketReadyCh := make(chan struct{}) + go func() { + if err := waitForSocket(ctx, workloadSocket, 60, time.Second); err == nil { + close(socketReadyCh) + } + }() + + select { + case <-socketReadyCh: + // Socket is ready, continue. + case err := <-agentErrCh: + // Agent process exited before socket was ready. + return "", fmt.Errorf("embedded SPIRE agent exited unexpectedly: %w", err) + case <-ctx.Done(): + return "", fmt.Errorf("context cancelled while waiting for SPIRE agent: %w", ctx.Err()) } ``` <!-- /improve --apply_suggestion=1 --> <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly identifies that the agent process could exit prematurely with an error that would be silently ignored, leading to a confusing timeout. The proposed fix using channels is a robust way to handle this concurrency and improve error reporting. </details></details></td><td align=center>Medium </td></tr><tr><td rowspan=2>General</td> <td> <details><summary>✅ <s>Prevent file descriptor leak<!-- not_implemented --></s></summary> ___ <details><summary><b>Suggestion Impact:</b></summary>A goroutine was added to wait for the agent process to exit, but it only calls cmd.Wait() without closing the log file. While not a full implementation, it reflects a change likely prompted by the suggestion area; however, the actual file descriptor leak fix (logFile.Close()) was not applied. code diff: ```diff - go func() { - _ = cmd.Wait() - }() ``` </details> ___ **Close the log file for the embedded SPIRE agent to prevent a file descriptor <br>leak. The file should be closed after the agent process, which writes to it, has <br>finished executing.** [pkg/edgeonboarding/spire_agent.go [70-78]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-33b0f01d8a67d768586269f4e6d6fcfbe1fb28ea9cc51c98440e20868e183e92R70-R78) ```diff logPath := filepath.Join(spireDir, "agent.log") logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644) if err != nil { return "", fmt.Errorf("open agent log: %w", err) } cmd := exec.CommandContext(ctx, agentPath, "run", "-config", configPath) cmd.Stdout = logFile cmd.Stderr = logFile + go func() { + _ = cmd.Wait() + logFile.Close() + }() + ``` <!-- /improve --apply_suggestion=2 --> <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion correctly identifies a file descriptor leak by not closing the opened log file. The proposed fix is appropriate, ensuring the resource is released after the agent process terminates. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Improve parsing of nested errors</summary> ___ **Improve the <code>parseErrorMessage</code> function to handle nested JSON error responses. <br>Implement a recursive search to find <code>message</code> or <code>error</code> keys within the parsed <br>object, ensuring that detailed error messages are extracted correctly.** [web/src/app/admin/edge-packages/page.tsx [273-295]](https://github.com/carverauto/serviceradar/pull/2033/files#diff-d627c4ba6a102a46a28024061371276f01053b63f20c75986aef2631b967bafcR273-R295) ```diff function parseErrorMessage(raw: string | null | undefined): string { if (!raw) { return ''; } const trimmed = raw.trim(); if (!trimmed) { return ''; } try { const parsed = JSON.parse(trimmed); - if (parsed && typeof parsed === 'object') { - if ('message' in parsed && typeof parsed.message === 'string') { - return parsed.message; + const findMessage = (obj: any): string | null => { + if (!obj || typeof obj !== 'object') { + return null; } - if ('error' in parsed && typeof parsed.error === 'string') { - return parsed.error; + if ('message' in obj && typeof obj.message === 'string' && obj.message) { + return obj.message; } + if ('error' in obj && typeof obj.error === 'string' && obj.error) { + return obj.error; + } + for (const key in obj) { + if (Object.prototype.hasOwnProperty.call(obj, key)) { + const result = findMessage(obj[key]); + if (result) { + return result; + } + } + } + return null; + }; + const message = findMessage(parsed); + if (message) { + return message; } } catch { // fall through to return the raw string } return trimmed; } ``` <!-- /improve --apply_suggestion=3 --> <details><summary>Suggestion importance[1-10]: 6</summary> __ Why: The suggestion correctly points out that the current error parsing logic is shallow and will fail to extract messages from nested JSON objects, which is a common pattern in API error responses. The proposed recursive approach makes the error handling more robust. </details></details></td><td align=center>Low </td></tr> <tr><td align="center" colspan="2"> <!-- /improve_multi --more_suggestions=true --> </td><td></td></tr></tbody></table> </details>
github-advanced-security[bot] commented 2025-11-30 01:42:01 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR review comment.

Original author: @github-advanced-security[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2033#discussion_r2573290620
Original created: 2025-11-30T01:42:01Z
Original path: pkg/core/edge_onboarding.go
Original line: 1932

Uncontrolled data used in path expression

This path depends on a user-provided value.

Show more details

Imported GitHub PR review comment. Original author: @github-advanced-security[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2033#discussion_r2573290620 Original created: 2025-11-30T01:42:01Z Original path: pkg/core/edge_onboarding.go Original line: 1932 --- ## Uncontrolled data used in path expression This path depends on a [user-provided value](1). [Show more details](https://github.com/carverauto/serviceradar/security/code-scanning/82)
github-advanced-security[bot] commented 2025-11-30 02:11:57 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR review comment.

Original author: @github-advanced-security[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2033#discussion_r2573299154
Original created: 2025-11-30T02:11:57Z
Original path: pkg/core/edge_onboarding.go
Original line: 1930

Uncontrolled data used in path expression

This path depends on a user-provided value.

Show more details

Imported GitHub PR review comment. Original author: @github-advanced-security[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2033#discussion_r2573299154 Original created: 2025-11-30T02:11:57Z Original path: pkg/core/edge_onboarding.go Original line: 1930 --- ## Uncontrolled data used in path expression This path depends on a [user-provided value](1). [Show more details](https://github.com/carverauto/serviceradar/security/code-scanning/83)
github-advanced-security[bot] commented 2025-11-30 02:11:57 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR review comment.

Original author: @github-advanced-security[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2033#discussion_r2573299157
Original created: 2025-11-30T02:11:57Z
Original path: pkg/core/edge_onboarding.go
Original line: 1939

Uncontrolled data used in path expression

This path depends on a user-provided value.

Show more details

Imported GitHub PR review comment. Original author: @github-advanced-security[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2033#discussion_r2573299157 Original created: 2025-11-30T02:11:57Z Original path: pkg/core/edge_onboarding.go Original line: 1939 --- ## Uncontrolled data used in path expression This path depends on a [user-provided value](1). [Show more details](https://github.com/carverauto/serviceradar/security/code-scanning/84)
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
carverauto/serviceradar!2489
No description provided.