2042 bugsysmon metrics not available from sysmon vm collection #2495

Merged
mfreeman451 merged 5 commits from refs/pull/2495/head into main 2025-12-03 02:25:24 +00:00
mfreeman451 commented 2025-12-03 01:30:40 +00:00 (Migrated from github.com)
Owner

Imported from GitHub pull request.

Original GitHub pull request: #2043
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2043
Original created: 2025-12-03T01:30:40Z
Original updated: 2025-12-03T02:25:27Z
Original head: carverauto/serviceradar:2042-bugsysmon-metrics-not-available-from-sysmon-vm-collection
Original base: main
Original merged: 2025-12-03T02:25:24Z by @mfreeman451

User description

IMPORTANT: Please sign the Developer Certificate of Origin

Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:

Signed-off-by: J. Doe <j.doe@domain.com>

Describe your changes

Code checklist before requesting a review

  • I have signed the DCO?
  • The build completes without errors?
  • All tests are passing when running make test?

PR Type

Bug fix, Enhancement


Description

  • Add memory metrics collection to sysmon-vm checker and fix metrics pipeline availability

  • Implement sysmon stall detection with logging when metrics stop arriving despite collector connectivity

  • Fix CNPG batch result reading to properly surface INSERT errors instead of silently discarding them

  • Add platform compatibility constraints (macOS-only) to sysmon-vm build targets to prevent Linux build failures

  • Improve sysmon metrics API routes to handle null/empty responses gracefully and preserve service-reported host identity


Diagram Walkthrough

flowchart LR
  A["sysmon-vm collector"] -->|"CPU + Memory metrics"| B["poller agent"]
  B -->|"enriched payload"| C["core metrics processor"]
  C -->|"stall detection"| D["sysmon stall tracker"]
  D -->|"warn event"| E["capability events"]
  C -->|"validated metrics"| F["CNPG batch insert"]
  F -->|"proper error reading"| G["metrics persisted"]
  G -->|"query results"| H["API endpoints"]
  H -->|"handle null safely"| I["UI/frontend"]

File Walkthrough

Relevant files
Enhancement
6 files
service.go
Add memory metrics collection to sysmon-vm                             
+16/-1   
metrics.go
Implement sysmon stall detection and logging                         
+153/-6 
flush.go
Add sysmon flush logging and CNPG status checks                   
+24/-0   
server.go
Initialize sysmon stall tracking state                                     
+1/-0     
types.go
Add sysmonStreamState type for stall tracking                       
+8/-0     
metrics.go
Add sysmon metrics storage logging and validation               
+15/-0   
Tests
3 files
service_test.go
Add memory metrics validation to tests                                     
+8/-0     
metrics_test.go
Add test for sysmon stall event emission                                 
+53/-0   
agent_poller_test.go
Add test for host identity preservation                                   
+26/-0   
Bug fix
5 files
sysmon.go
Fix metrics API to handle empty results gracefully             
+5/-7     
cnpg_metrics.go
Fix batch result reading and add detailed logging               
+38/-3   
agent_poller.go
Preserve service-reported host identity in payloads           
+32/-2   
route.ts
Handle null responses and use optional chaining                   
+13/-10 
route.ts
Handle null responses and use optional chaining                   
+13/-10 
Configuration changes
3 files
BUILD.bazel
Add macOS platform constraint to sysmonvm alias                   
+1/-0     
BUILD.bazel
Add macOS platform constraint to sysmon-vm binary               
+2/-0     
BUILD.bazel
Add macOS constraint and memory dependency                             
+3/-0     
Documentation
3 files
proposal.md
Document sysmon-vm metrics availability fix proposal         
+15/-0   
spec.md
Add sysmon metrics availability requirements                         
+22/-0   
tasks.md
Document investigation and fix tasks                                         
+11/-0   

Imported from GitHub pull request. Original GitHub pull request: #2043 Original author: @mfreeman451 Original URL: https://github.com/carverauto/serviceradar/pull/2043 Original created: 2025-12-03T01:30:40Z Original updated: 2025-12-03T02:25:27Z Original head: carverauto/serviceradar:2042-bugsysmon-metrics-not-available-from-sysmon-vm-collection Original base: main Original merged: 2025-12-03T02:25:24Z by @mfreeman451 --- ### **User description** ## IMPORTANT: Please sign the Developer Certificate of Origin Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include a [DCO sign-off statement]( https://developercertificate.org/) indicating the DCO acceptance in one commit message. Here is an example DCO Signed-off-by line in a commit message: ``` Signed-off-by: J. Doe <j.doe@domain.com> ``` ## Describe your changes ## Issue ticket number and link ## Code checklist before requesting a review - [ ] I have signed the DCO? - [ ] The build completes without errors? - [ ] All tests are passing when running make test? ___ ### **PR Type** Bug fix, Enhancement ___ ### **Description** - Add memory metrics collection to sysmon-vm checker and fix metrics pipeline availability - Implement sysmon stall detection with logging when metrics stop arriving despite collector connectivity - Fix CNPG batch result reading to properly surface INSERT errors instead of silently discarding them - Add platform compatibility constraints (macOS-only) to sysmon-vm build targets to prevent Linux build failures - Improve sysmon metrics API routes to handle null/empty responses gracefully and preserve service-reported host identity ___ ### Diagram Walkthrough ```mermaid flowchart LR A["sysmon-vm collector"] -->|"CPU + Memory metrics"| B["poller agent"] B -->|"enriched payload"| C["core metrics processor"] C -->|"stall detection"| D["sysmon stall tracker"] D -->|"warn event"| E["capability events"] C -->|"validated metrics"| F["CNPG batch insert"] F -->|"proper error reading"| G["metrics persisted"] G -->|"query results"| H["API endpoints"] H -->|"handle null safely"| I["UI/frontend"] ``` <details> <summary><h3> File Walkthrough</h3></summary> <table><thead><tr><th></th><th align="left">Relevant files</th></tr></thead><tbody><tr><td><strong>Enhancement</strong></td><td><details><summary>6 files</summary><table> <tr> <td><strong>service.go</strong><dd><code>Add memory metrics collection to sysmon-vm</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-02211c03421188a7a92d44ce12a4ee8d9f09b224d7176b8713fcefac759ee989">+16/-1</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>metrics.go</strong><dd><code>Implement sysmon stall detection and logging</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-6d98e853ce17576c088e77956ae4ecfa8078019e0bff107a79d8d1d6ed2443ad">+153/-6</a>&nbsp; </td> </tr> <tr> <td><strong>flush.go</strong><dd><code>Add sysmon flush logging and CNPG status checks</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-4ad1cf4b0397bd277cd9c0553e4992ad7c9e9484d49d49d48af015958dc4bf4b">+24/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>server.go</strong><dd><code>Initialize sysmon stall tracking state</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-a12d34beb8640efbbedad28f8610fd58afeea457572d82e4fd295ab2a6bb8ee6">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>types.go</strong><dd><code>Add sysmonStreamState type for stall tracking</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-717f128517472d3dc4091e67bfc0f4fe4c36e32096c5ef87d78f34cbc64d2399">+8/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>metrics.go</strong><dd><code>Add sysmon metrics storage logging and validation</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-2cf937853ac86eb9a879bd370cbe22b835ee0c48be6771b48b0276d93cc2ae93">+15/-0</a>&nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Tests</strong></td><td><details><summary>3 files</summary><table> <tr> <td><strong>service_test.go</strong><dd><code>Add memory metrics validation to tests</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-651c156758ec9bf449b0ec538346fc33096dfd6b4de52efed825c8b161564435">+8/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>metrics_test.go</strong><dd><code>Add test for sysmon stall event emission</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-2b0ce06b068be7b4418c3fe4d23e3b8bf536d0cc2b9201a6949976e298a9e95e">+53/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>agent_poller_test.go</strong><dd><code>Add test for host identity preservation</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-3b72f1c80958942904f082df21f74bb67bde710581174e9e0aa1dba82bd7971a">+26/-0</a>&nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Bug fix</strong></td><td><details><summary>5 files</summary><table> <tr> <td><strong>sysmon.go</strong><dd><code>Fix metrics API to handle empty results gracefully</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-4879e8b367ee55413c6cb006f23bb912261df5310c640d3af352033cf97ffc84">+5/-7</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>cnpg_metrics.go</strong><dd><code>Fix batch result reading and add detailed logging</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-8ca23b1b2b298b6eec9af9e23be2c7cb9caa65301f514c013ebbc5b52e933a23">+38/-3</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>agent_poller.go</strong><dd><code>Preserve service-reported host identity in payloads</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-7cdaae416a1e7b5d39ad75f138c49fa0aca12fc38e8ab0b26bb59c64446da15c">+32/-2</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>route.ts</strong><dd><code>Handle null responses and use optional chaining</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-565dceead2e723f416ae3cee24e62a1e93364ce67237233e01b4a26c09bf8ba5">+13/-10</a>&nbsp; </td> </tr> <tr> <td><strong>route.ts</strong><dd><code>Handle null responses and use optional chaining</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-8b59229cfa9e8e8f66e10be9569364853176d09da7cf0b4c987c85225da7fa55">+13/-10</a>&nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Configuration changes</strong></td><td><details><summary>3 files</summary><table> <tr> <td><strong>BUILD.bazel</strong><dd><code>Add macOS platform constraint to sysmonvm alias</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-884fa9353a5226345e44fbabea3300efc7a87dfbcde0b6a42521ca51823f1b68">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong><dd><code>Add macOS platform constraint to sysmon-vm binary</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-6f29400e46ef3dfa4e1d5e3ec41ebfeceb9628f6900f815f5c9bc8166ccc63a9">+2/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>BUILD.bazel</strong><dd><code>Add macOS constraint and memory dependency</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-1ddcd985760c56dd63f684cae82510232439e902c9fbbb8ecef91f8eb19fd1a1">+3/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Documentation</strong></td><td><details><summary>3 files</summary><table> <tr> <td><strong>proposal.md</strong><dd><code>Document sysmon-vm metrics availability fix proposal</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-ad3bf3baec592ea2000de51abcd43628efd97c1524956d0580562e54b56515ad">+15/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>spec.md</strong><dd><code>Add sysmon metrics availability requirements</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-239ceb7ed8d7185674ebc80a40aab82ea6955a159999df3ec7736870a416fac3">+22/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>tasks.md</strong><dd><code>Document investigation and fix tasks</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/2043/files#diff-355450ac691dae6d2c3167233558588ea0e0ab969b5266044d4226b00e61ddab">+11/-0</a>&nbsp; &nbsp; </td> </tr> </table></details></td></tr></tr></tbody></table> </details> ___
qodo-code-review[bot] commented 2025-12-03 01:31:17 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2043#issuecomment-3604650132
Original created: 2025-12-03T01:31:17Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Excessive logging

Description: Extensive info logs added in sysmon flush path may leak sensitive operational details
(device_id, counts) and could enable log flooding under high load; consider reducing
verbosity or rate limiting.
flush.go [445-503]

Referred Code
for pollerID, sysmonMetrics := range s.sysmonBuffers {
	if len(sysmonMetrics) == 0 {
		continue
	}

	s.logger.Info().
		Str("poller_id", pollerID).
		Int("buffer_count", len(sysmonMetrics)).
		Bool("cnpg_writes_enabled", cnpgWritesEnabled).
		Msg("Flushing sysmon buffers to database")

	for _, metricBuffer := range sysmonMetrics {
		metric := metricBuffer.Metrics
		partition := metricBuffer.Partition
		deviceID := strings.TrimSpace(metricBuffer.DeviceID)

		var ts time.Time

		var agentID, hostID, hostIP string

		// Extract information from the first available metric type


 ... (clipped 38 lines)
Ticket Compliance
🟡
🎫 #2042
🟢 Restore visibility of sysmon-vm metrics that stopped appearing despite collector health.
Ensure device sysmon metrics endpoints return data or safe empty results rather than
errors.
Add memory metrics collection from sysmon-vm so memory panels can display data.
Detect and surface stalls when a connected sysmon-vm stops sending metrics.
Preserve and correctly attribute metrics to the intended device identity (not the
collector host).
Prevent silent CNPG batch insert failures; surface write/read errors for troubleshooting.
Add platform constraints so sysmon-vm builds only on macOS to avoid Linux build failures.
End-to-end validation on an actual macOS/darwin edge device with sysmon-vm to confirm
metrics appear within one polling interval and UI charts render as expected.
Verify CNPG writes and queries in a real environment (schema compatibility, permissions,
performance) beyond unit-level checks.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Audit coverage unclear: New stall-detection and capability updates log events, but it is unclear if these actions
are recorded in a durable audit trail with user/context to reconstruct events.

Referred Code
func (s *Server) noteSysmonSuccess(deviceID string, when time.Time) {
	if s == nil || deviceID == "" {
		return
	}

	s.sysmonStallMu.Lock()
	defer s.sysmonStallMu.Unlock()

	if s.sysmonStall == nil {
		s.sysmonStall = make(map[string]*sysmonStreamState)
	}

	state := s.sysmonStall[deviceID]
	if state == nil {
		state = &sysmonStreamState{}
		s.sysmonStall[deviceID] = state
	}

	state.consecutiveEmpty = 0
	state.lastSuccess = when
}


 ... (clipped 85 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Verbose logging: Newly added info logs include device identifiers and counts which seem acceptable, but
verification is needed to ensure no sensitive user/PII is logged across added log
statements.

Referred Code
func (db *DB) StoreSysmonMetrics(
	ctx context.Context,
	pollerID, agentID, hostID, partition, hostIP, deviceID string,
	sysmon *models.SysmonMetrics,
	timestamp time.Time,
) error {
	if sysmon == nil {
		return nil
	}

	if !db.useCNPGWrites() {
		db.logger.Warn().Str("poller_id", pollerID).Msg("CNPG writes disabled; skipping sysmon metrics")
		return nil
	}

	if deviceID == "" {
		deviceID = fmt.Sprintf("%s:%s", partition, hostIP)
	}

	db.logger.Info().
		Str("poller_id", pollerID).


 ... (clipped 8 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
- Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2043#issuecomment-3604650132 Original created: 2025-12-03T01:31:17Z --- ## PR Compliance Guide 🔍 <!-- https://github.com/carverauto/serviceradar/commit/25183db5ab7438b98afc7169c376d9367780ab57 --> Below is a summary of compliance checks for this PR:<br> <table><tbody><tr><td colspan='2'><strong>Security Compliance</strong></td></tr> <tr><td rowspan=1>⚪</td> <td><details><summary><strong>Excessive logging</strong></summary><br> <b>Description:</b> Extensive info logs added in sysmon flush path may leak sensitive operational details <br>(device_id, counts) and could enable log flooding under high load; consider reducing <br>verbosity or rate limiting.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/2043/files#diff-4ad1cf4b0397bd277cd9c0553e4992ad7c9e9484d49d49d48af015958dc4bf4bR445-R503'>flush.go [445-503]</a></strong><br> <details open><summary>Referred Code</summary> ```go for pollerID, sysmonMetrics := range s.sysmonBuffers { if len(sysmonMetrics) == 0 { continue } s.logger.Info(). Str("poller_id", pollerID). Int("buffer_count", len(sysmonMetrics)). Bool("cnpg_writes_enabled", cnpgWritesEnabled). Msg("Flushing sysmon buffers to database") for _, metricBuffer := range sysmonMetrics { metric := metricBuffer.Metrics partition := metricBuffer.Partition deviceID := strings.TrimSpace(metricBuffer.DeviceID) var ts time.Time var agentID, hostID, hostIP string // Extract information from the first available metric type ... (clipped 38 lines) ``` </details></details></td></tr> <tr><td colspan='2'><strong>Ticket Compliance</strong></td></tr> <tr><td>🟡</td> <td> <details> <summary>🎫 <a href=https://github.com/carverauto/serviceradar/issues/2042>#2042</a></summary> <table width='100%'><tbody> <tr><td rowspan=7>🟢</td> <td>Restore visibility of sysmon-vm metrics that stopped appearing despite collector health.</td></tr> <tr><td>Ensure device sysmon metrics endpoints return data or safe empty results rather than <br>errors.</td></tr> <tr><td>Add memory metrics collection from sysmon-vm so memory panels can display data.</td></tr> <tr><td>Detect and surface stalls when a connected sysmon-vm stops sending metrics.</td></tr> <tr><td>Preserve and correctly attribute metrics to the intended device identity (not the <br>collector host).</td></tr> <tr><td>Prevent silent CNPG batch insert failures; surface write/read errors for troubleshooting.</td></tr> <tr><td>Add platform constraints so sysmon-vm builds only on macOS to avoid Linux build failures.</td></tr> <tr><td rowspan=2>⚪</td> <td>End-to-end validation on an actual macOS/darwin edge device with sysmon-vm to confirm <br>metrics appear within one polling interval and UI charts render as expected.</td></tr> <tr><td>Verify CNPG writes and queries in a real environment (schema compatibility, permissions, <br>performance) beyond unit-level checks.</td></tr> </tbody></table> </details> </td></tr> <tr><td colspan='2'><strong>Codebase Duplication Compliance</strong></td></tr> <tr><td>⚪</td><td><details><summary><strong>Codebase context is not defined </strong></summary> Follow the <a href='https://qodo-merge-docs.qodo.ai/core-abilities/rag_context_enrichment/'>guide</a> to enable codebase context checks. </details></td></tr> <tr><td colspan='2'><strong>Custom Compliance</strong></td></tr> <tr><td rowspan=4>🟢</td><td> <details><summary><strong>Generic: Meaningful Naming and Self-Documenting Code</strong></summary><br> **Objective:** Ensure all identifiers clearly express their purpose and intent, making code <br>self-documenting<br> **Status:** Passed<br> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td> <details><summary><strong>Generic: Robust Error Handling and Edge Case Management</strong></summary><br> **Objective:** Ensure comprehensive error handling that provides meaningful context and graceful <br>degradation<br> **Status:** Passed<br> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td> <details><summary><strong>Generic: Secure Error Handling</strong></summary><br> **Objective:** To prevent the leakage of sensitive system information through error messages while <br>providing sufficient detail for internal debugging.<br> **Status:** Passed<br> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td> <details><summary><strong>Generic: Security-First Input Validation and Data Handling</strong></summary><br> **Objective:** Ensure all data inputs are validated, sanitized, and handled securely to prevent <br>vulnerabilities<br> **Status:** Passed<br> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td rowspan=2>⚪</td> <td><details> <summary><strong>Generic: Comprehensive Audit Trails</strong></summary><br> **Objective:** To create a detailed and reliable record of critical system actions for security analysis <br>and compliance.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2043/files#diff-6d98e853ce17576c088e77956ae4ecfa8078019e0bff107a79d8d1d6ed2443adR1280-R1385'><strong>Audit coverage unclear</strong></a>: New stall-detection and capability updates log events, but it is unclear if these actions <br>are recorded in a durable audit trail with user/context to reconstruct events.<br> <details open><summary>Referred Code</summary> ```go func (s *Server) noteSysmonSuccess(deviceID string, when time.Time) { if s == nil || deviceID == "" { return } s.sysmonStallMu.Lock() defer s.sysmonStallMu.Unlock() if s.sysmonStall == nil { s.sysmonStall = make(map[string]*sysmonStreamState) } state := s.sysmonStall[deviceID] if state == nil { state = &sysmonStreamState{} s.sysmonStall[deviceID] = state } state.consecutiveEmpty = 0 state.lastSuccess = when } ... (clipped 85 lines) ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Secure Logging Practices</strong></summary><br> **Objective:** To ensure logs are useful for debugging and auditing without exposing sensitive <br>information like PII, PHI, or cardholder data.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2043/files#diff-2cf937853ac86eb9a879bd370cbe22b835ee0c48be6771b48b0276d93cc2ae93R198-R226'><strong>Verbose logging</strong></a>: Newly added info logs include device identifiers and counts which seem acceptable, but <br>verification is needed to ensure no sensitive user/PII is logged across added log <br>statements.<br> <details open><summary>Referred Code</summary> ```go func (db *DB) StoreSysmonMetrics( ctx context.Context, pollerID, agentID, hostID, partition, hostIP, deviceID string, sysmon *models.SysmonMetrics, timestamp time.Time, ) error { if sysmon == nil { return nil } if !db.useCNPGWrites() { db.logger.Warn().Str("poller_id", pollerID).Msg("CNPG writes disabled; skipping sysmon metrics") return nil } if deviceID == "" { deviceID = fmt.Sprintf("%s:%s", partition, hostIP) } db.logger.Info(). Str("poller_id", pollerID). ... (clipped 8 lines) ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td align="center" colspan="2"> - [ ] Update <!-- /compliance --update_compliance=true --> </td></tr></tbody></table> <details><summary>Compliance status legend</summary> 🟢 - Fully Compliant<br> 🟡 - Partial Compliant<br> 🔴 - Not Compliant<br> ⚪ - Requires Further Human Verification<br> 🏷️ - Compliance label<br> </details>
qodo-code-review[bot] commented 2025-12-03 01:32:27 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2043#issuecomment-3604652437
Original created: 2025-12-03T01:32:27Z

PR Code Suggestions

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix race condition in stall detection

Fix a race condition in noteSysmonStall by ensuring all reads of the shared
state object are performed within the critical section protected by the mutex to
prevent using stale data for stall detection.

pkg/core/metrics.go [1302-1330]

 func (s *Server) noteSysmonStall(
 	ctx context.Context,
 	deviceID, hostIP, hostID, agentID, pollerID, reason string,
 	when time.Time,
 ) {
 	if s == nil || deviceID == "" {
 		return
 	}
 
 	s.sysmonStallMu.Lock()
 	if s.sysmonStall == nil {
 		s.sysmonStall = make(map[string]*sysmonStreamState)
 	}
 
 	state := s.sysmonStall[deviceID]
 	if state == nil {
 		state = &sysmonStreamState{}
 		s.sysmonStall[deviceID] = state
 	}
 
 	state.consecutiveEmpty++
 	consecutive := state.consecutiveEmpty
 	lastSuccess := state.lastSuccess
 	lastAlert := state.lastAlert
-	s.sysmonStallMu.Unlock()
 
 	if consecutive < sysmonStallPollThreshold {
+		s.sysmonStallMu.Unlock()
 		return
 	}
+	s.sysmonStallMu.Unlock()
 ...

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a race condition where shared state is read under a lock, but the lock is released before the values are used, potentially leading to incorrect stall detection logic. This is a critical concurrency bug.

Medium
General
Simplify redundant payload enrichment logic

Simplify the payload enrichment logic for host_ip and ip. Instead of checking
both statusNode and payload, check only statusNode, add the IP if absent, and
then ensure the top-level payload fields are consistent.

pkg/poller/agent_poller.go [502-522]

 	// Preserve any host_ip/ip already reported by the service; only backfill when absent.
 	hasIP := func(m map[string]any) bool {
 		if m == nil {
 			return false
 		}
 		for _, key := range []string{"host_ip", "ip"} {
 			if val, ok := m[key]; ok {
 				if str, ok := val.(string); ok && strings.TrimSpace(str) != "" {
 					return true
 				}
 			}
 		}
 		return false
 	}
 
-	if ip != "" && !hasIP(statusNode) && !hasIP(payload) {
+	if ip != "" && !hasIP(statusNode) {
 		statusNode["host_ip"] = ip
-		statusNode["ip"] = ip
-		payload["host_ip"] = ip
-		payload["ip"] = ip
 	}
 
+	// For compatibility, ensure top-level ip/host_ip are present if they exist in status.
+	if v, ok := statusNode["host_ip"]; ok {
+		payload["host_ip"] = v
+		payload["ip"] = v
+	}
+
  • Apply / Chat
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly points out redundant logic in the payload enrichment. While the current code is functional, the proposed change simplifies the logic by removing an unnecessary check, improving code clarity and maintainability.

Low
  • Update
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2043#issuecomment-3604652437 Original created: 2025-12-03T01:32:27Z --- ## PR Code Suggestions ✨ <!-- 25183db --> Explore these optional code suggestions: <table><thead><tr><td><strong>Category</strong></td><td align=left><strong>Suggestion&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </strong></td><td align=center><strong>Impact</strong></td></tr><tbody><tr><td rowspan=1>Possible issue</td> <td> <details><summary>Fix race condition in stall detection</summary> ___ **Fix a race condition in <code>noteSysmonStall</code> by ensuring all reads of the shared <br><code>state</code> object are performed within the critical section protected by the mutex to <br>prevent using stale data for stall detection.** [pkg/core/metrics.go [1302-1330]](https://github.com/carverauto/serviceradar/pull/2043/files#diff-6d98e853ce17576c088e77956ae4ecfa8078019e0bff107a79d8d1d6ed2443adR1302-R1330) ```diff func (s *Server) noteSysmonStall( ctx context.Context, deviceID, hostIP, hostID, agentID, pollerID, reason string, when time.Time, ) { if s == nil || deviceID == "" { return } s.sysmonStallMu.Lock() if s.sysmonStall == nil { s.sysmonStall = make(map[string]*sysmonStreamState) } state := s.sysmonStall[deviceID] if state == nil { state = &sysmonStreamState{} s.sysmonStall[deviceID] = state } state.consecutiveEmpty++ consecutive := state.consecutiveEmpty lastSuccess := state.lastSuccess lastAlert := state.lastAlert - s.sysmonStallMu.Unlock() if consecutive < sysmonStallPollThreshold { + s.sysmonStallMu.Unlock() return } + s.sysmonStallMu.Unlock() ... ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly identifies a race condition where shared state is read under a lock, but the lock is released before the values are used, potentially leading to incorrect stall detection logic. This is a critical concurrency bug. </details></details></td><td align=center>Medium </td></tr><tr><td rowspan=1>General</td> <td> <details><summary>Simplify redundant payload enrichment logic</summary> ___ **Simplify the payload enrichment logic for <code>host_ip</code> and <code>ip</code>. Instead of checking <br>both <code>statusNode</code> and <code>payload</code>, check only <code>statusNode</code>, add the IP if absent, and <br>then ensure the top-level payload fields are consistent.** [pkg/poller/agent_poller.go [502-522]](https://github.com/carverauto/serviceradar/pull/2043/files#diff-7cdaae416a1e7b5d39ad75f138c49fa0aca12fc38e8ab0b26bb59c64446da15cR502-R522) ```diff // Preserve any host_ip/ip already reported by the service; only backfill when absent. hasIP := func(m map[string]any) bool { if m == nil { return false } for _, key := range []string{"host_ip", "ip"} { if val, ok := m[key]; ok { if str, ok := val.(string); ok && strings.TrimSpace(str) != "" { return true } } } return false } - if ip != "" && !hasIP(statusNode) && !hasIP(payload) { + if ip != "" && !hasIP(statusNode) { statusNode["host_ip"] = ip - statusNode["ip"] = ip - payload["host_ip"] = ip - payload["ip"] = ip } + // For compatibility, ensure top-level ip/host_ip are present if they exist in status. + if v, ok := statusNode["host_ip"]; ok { + payload["host_ip"] = v + payload["ip"] = v + } + ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=1 --> <details><summary>Suggestion importance[1-10]: 4</summary> __ Why: The suggestion correctly points out redundant logic in the payload enrichment. While the current code is functional, the proposed change simplifies the logic by removing an unnecessary check, improving code clarity and maintainability. </details></details></td><td align=center>Low </td></tr> <tr><td align="center" colspan="2"> - [ ] Update <!-- /improve_multi --more_suggestions=true --> </td><td></td></tr></tbody></table>
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
carverauto/serviceradar!2495
No description provided.