2042 bugsysmon metrics not available from sysmon vm collection #2497
Imported from GitHub pull request.
Original GitHub pull request: #2045
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2045
Original created: 2025-12-03T04:08:32Z
Original updated: 2025-12-03T04:10:26Z
Original head: carverauto/serviceradar:2042-bugsysmon-metrics-not-available-from-sysmon-vm-collection
Original base: main
Original merged: 2025-12-03T04:10:03Z by @mfreeman451
User description
IMPORTANT: Please sign the Developer Certificate of Origin
Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:
Describe your changes
Issue ticket number and link
Code checklist before requesting a review
PR Type
Bug fix, Enhancement
Description
Fix sysmon-vm metrics pipeline to persist CPU/memory data from collectors to CNPG and API endpoints
Add memory metrics collection support and stall detection for sysmon collectors
Explicitly qualify all metrics table names with public schema to prevent Apache AGE catalog conflicts
Improve error handling in CNPG batch operations and add comprehensive logging for debugging
Preserve service-reported host identity in payload enrichment to avoid overwriting collector metadata
Add null-safety checks in web API routes to return empty arrays instead of 500 errors
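The schema qualification described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the helper name qualifyTable, the table name cpu_metrics, and the column names are all assumptions. The idea is that a fully qualified name resolves the same way regardless of the session's search_path, which matters when other schemas (such as Apache AGE's catalog schema) may be searched first.

```go
package main

import (
	"fmt"
	"strings"
)

// metricsSchema is the schema the PR description says metrics tables
// should be qualified with.
const metricsSchema = "public"

// qualifyTable is a hypothetical helper: it prefixes an unqualified table
// name with the metrics schema and leaves already-qualified names alone.
func qualifyTable(table string) string {
	if strings.Contains(table, ".") {
		return table
	}
	return metricsSchema + "." + table
}

func main() {
	// cpu_metrics, device_id and usage_percent are assumed names used
	// only for illustration.
	query := fmt.Sprintf(
		"INSERT INTO %s (device_id, usage_percent) VALUES ($1, $2)",
		qualifyTable("cpu_metrics"),
	)
	fmt.Println(query)
}
```

Centralizing the prefix in one helper keeps every statement consistent, so a table cannot be reached through an unexpected schema because one query was missed.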
Diagram Walkthrough
File Walkthrough
6 files
Add memory metrics collection from gopsutil
Add logging for sysmon buffer flush operations
Implement sysmon stall detection and logging
Initialize sysmon stall tracking state map
Define sysmonStreamState for stall tracking
Add logging and schema qualification for sysmon storage
3 files
Validate memory metrics in response payload
Test sysmon stall event emission after empty payloads
Test host identity preservation during payload enrichment
5 files
Qualify metrics tables with public schema
Fix batch error handling and add schema qualification
Preserve service-reported host identity in enrichment
Add null-safety checks and optional chaining operators
Add null-safety checks and optional chaining operators
2 files
Restrict sysmonvm alias to macOS platform
Restrict sysmon-vm binary to macOS platform
1 file
Add mem dependency and macOS platform constraint
3 files
Document sysmon metrics availability fix proposal
Define sysmon metrics availability requirements
Track investigation and fix implementation tasks
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2045#issuecomment-3605008656
Original created: 2025-12-03T04:09:09Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
Information disclosure via response semantics
Description: The endpoint now always returns 200 with metrics (including when empty), potentially
exposing device existence and metric type availability to unauthorized callers if upstream
auth/authorization is lax.
sysmon.go [365-384]
Referred Code
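The referred code is not reproduced here. As a hedged illustration of the behavior the finding describes (a successful response whose body is an empty list when there are no metrics), the sketch below shows why a Go handler typically normalizes a nil slice before encoding: encoding/json renders a nil slice as null and an empty slice as []. The cpuMetric type, its fields, and encodeMetrics are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cpuMetric is a hypothetical response item; the field names are assumed.
type cpuMetric struct {
	CoreID       int     `json:"core_id"`
	UsagePercent float64 `json:"usage_percent"`
}

// encodeMetrics normalizes a nil slice to an empty one so that clients
// receive "[]" instead of "null" when no metrics exist.
func encodeMetrics(metrics []cpuMetric) (string, error) {
	if metrics == nil {
		metrics = []cpuMetric{}
	}
	b, err := json.Marshal(metrics)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encodeMetrics(nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // prints []
}
```

Whether an empty 200 response reveals anything about device existence depends on the authorization applied before the handler runs, which is the concern the finding raises.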
🎫 #2042
handling.
arrive.
identity).
darwin/arm64 deployment.
for all metric types.
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status: Passed
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status: Passed
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Limited auditing: New CNPG write-path and sysmon flush instrumentation add logs but do not clearly ensure
auditable records of critical actions (who/what/when/outcome) across all critical
operations.
Referred Code
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status:
Error detail echo: API responses include raw error details in JSON which may expose internal information to
end users depending on routing exposure.
Referred Code
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status:
Verbose logging: Added info/error logs include device identifiers and potentially large counts; while
useful, review is needed to ensure no sensitive data or excessive verbosity at info level.
Referred Code
Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2045#issuecomment-3605010947
Original created: 2025-12-03T04:10:26Z
PR Code Suggestions ✨
Explore these optional code suggestions:
Fix race condition in stall detection
Fix a race condition in noteSysmonStall by holding the lock for the entire duration of the function, ensuring atomic access and modification of the shared sysmonStreamState.
pkg/core/metrics.go [1302-1345]
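The suggested diff is not preserved in this import. A minimal sketch of the locking pattern it describes, under stated assumptions, might look like the following. The tracker type, its fields, the threshold, and the method names noteStall and noteData are hypothetical; only the name sysmonStreamState comes from the PR, and its fields here are invented for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sysmonStreamState is a simplified stand-in; these fields are assumptions.
type sysmonStreamState struct {
	emptyCount  int
	stallLogged bool
}

// tracker is a hypothetical owner of the per-device state map.
type tracker struct {
	mu        sync.Mutex
	states    map[string]*sysmonStreamState
	threshold int
}

// noteStall records one empty payload and reports whether a stall should be
// announced now. The mutex is held for the whole function, so the lookup,
// the counter update and the "already logged" check form one atomic step.
func (t *tracker) noteStall(deviceID string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()

	st, ok := t.states[deviceID]
	if !ok {
		st = &sysmonStreamState{}
		t.states[deviceID] = st
	}
	st.emptyCount++
	if st.emptyCount >= t.threshold && !st.stallLogged {
		st.stallLogged = true
		return true
	}
	return false
}

// noteData clears the state once real metrics arrive again.
func (t *tracker) noteData(deviceID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.states, deviceID)
}

func main() {
	tr := &tracker{states: make(map[string]*sysmonStreamState), threshold: 3}
	var alerts int32
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if tr.noteStall("device-1") {
				atomic.AddInt32(&alerts, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println("stall alerts:", atomic.LoadInt32(&alerts)) // prints stall alerts: 1
}
```

If the lock were released between reading the state and updating it, two goroutines could both observe stallLogged as false and both raise an alert, which is the kind of false positive the suggestion warns about.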
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 9
__
Why: The suggestion correctly identifies a race condition in the newly added noteSysmonStall function, which could lead to false positive stall alerts. Fixing this is critical for the correctness of the new feature.
Restrict sysmon-vm checker to macOS
The sysmon-vm checker has been restricted to build only on macOS. This change should be reviewed to confirm it is intentional, as it prevents the checker from being used on other operating systems like Linux or Windows.
Examples:
alias/BUILD.bazel [28]
cmd/checkers/sysmon-vm/BUILD.bazel [9]
Solution Walkthrough:
Before:
After:
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies that the sysmon-vm checker is now restricted to macOS, which is a significant architectural change not mentioned in the PR description and could be an unintended side effect of fixing a macOS-specific bug.
Remove redundant IP fields from payload
Remove redundant host_ip and ip fields from the top-level payload in enrichPayloadWithHost, adding them only to the nested status object to avoid data duplication.
pkg/poller/agent_poller.go [517-522]
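The suggested change itself is not preserved here. One way to read it, combined with the PR's goal of preserving service-reported host identity, is sketched below. The function enrichStatusWithHost is hypothetical and is not the project's enrichPayloadWithHost; the payload shape (a nested status object carrying host_ip) is assumed from the suggestion text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// enrichStatusWithHost is a hypothetical helper. It writes host_ip only into
// the nested "status" object, adds no top-level copies, and keeps any value
// the service already reported.
func enrichStatusWithHost(payload []byte, hostIP string) ([]byte, error) {
	var doc map[string]any
	if err := json.Unmarshal(payload, &doc); err != nil {
		return nil, err
	}
	if doc == nil {
		// A JSON "null" payload decodes to a nil map; start from an empty one.
		doc = map[string]any{}
	}

	status, ok := doc["status"].(map[string]any)
	if !ok {
		status = map[string]any{}
		doc["status"] = status
	}

	// Fill in host_ip only when the collector did not supply one.
	if v, ok := status["host_ip"].(string); !ok || v == "" {
		status["host_ip"] = hostIP
	}
	return json.Marshal(doc)
}

func main() {
	reported := []byte(`{"status":{"host_ip":"10.0.0.5"}}`)
	out, err := enrichStatusWithHost(reported, "192.168.1.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints {"status":{"host_ip":"10.0.0.5"}}
}
```

Keeping the address in a single place means consumers have one field to read, and the fill-only-if-missing check avoids overwriting collector metadata with the poller's view of the host.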
Suggestion importance[1-10]: 4
__
Why: The suggestion correctly points out data redundancy in the JSON payload, and removing it would be a good practice for maintainability and consistency, but it is a minor improvement.