sysmon/results router work #2670
Imported from GitHub pull request.
Original GitHub pull request: #2299
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2299
Original created: 2026-01-14T17:40:41Z
Original updated: 2026-01-14T21:31:55Z
Original head: carverauto/serviceradar:updates/sysmon-metrics-ingestion
Original base: staging
Original merged: 2026-01-14T21:31:53Z by @mfreeman451
User description
PR Type
Enhancement, Documentation
Description
Add OpenSpec proposal for sysmon metrics ingestion via gRPC pipeline
Define requirements for persisting CPU, memory, disk, and process metrics to tenant hypertables
Specify device identifier resolution with safe fallbacks for missing linkages
Document payload size handling and implementation tasks for metrics ingestor
File Walkthrough
proposal.md: Sysmon metrics ingestion proposal and rationale (openspec/changes/fix-sysmon-metrics-ingestion/proposal.md)
spec.md: Edge architecture requirements for sysmon ingestion (openspec/changes/fix-sysmon-metrics-ingestion/specs/edge-architecture/spec.md)
tasks.md: Implementation tasks for sysmon metrics ingestion (openspec/changes/fix-sysmon-metrics-ingestion/tasks.md)
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2299#issuecomment-3750797655
Original created: 2026-01-14T17:41:04Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
No security concerns identified
No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
🎫 No ticket provided
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status: Passed
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status: Passed
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Logging unspecified: The proposal/spec describes ingestion that previously "silently drops data" but
does not specify audit/logging requirements for ingestion success/failure outcomes.
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
Error handling unspecified: The spec requires ingestion to not fail on missing device linkage but does not define
expected handling/reporting for other ingestion failure points (parse errors, DB insert
failures, malformed payloads).
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Validation unspecified: The spec defines acceptance of larger sysmon-metrics payloads but does not specify
validation/sanitization limits (schema validation, size ceilings, rate limiting) for
externally sourced gRPC payload data.
Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2299#issuecomment-3750804381
Original created: 2026-01-14T17:42:15Z
PR Code Suggestions ✨
Latest suggestions up to 2d46e4c
Prevent atom table exhaustion
Replace the unsafe String.to_atom/1 with String.to_existing_atom/1 inside a try/rescue block. This prevents a potential denial-of-service attack caused by atom table exhaustion from untrusted input.
elixir/serviceradar_core/lib/serviceradar/observability/sysmon_metrics_ingestor.ex [0]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 10
Why: The suggestion correctly identifies a critical security vulnerability (atom table exhaustion) that could lead to a denial-of-service attack, and proposes a robust fix using String.to_existing_atom/1.
Avoid skipping sweep results
To prevent data loss on transient failures, move the setSweepResultsSequence call to after the StreamStatus call succeeds. Additionally, return false on a StreamStatus error to correctly reflect the failed state.
pkg/agent/push_loop.go [0]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 9
Why: The suggestion fixes a critical bug where transient network errors could cause permanent data loss by updating a sequence number before ensuring the data was successfully sent.
Make chunking failure-tolerant
Instead of erroring on malformed large payloads that cannot be chunked, fall
back to sending the payload as a single chunk. This improves robustness by
ensuring data is still delivered.
pkg/agent/push_loop.go [0]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 8
Why: The suggestion improves the system's robustness by preventing data delivery failures for malformed large payloads, ensuring data is sent even if it cannot be chunked.
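A minimal Go sketch of the single-chunk fallback, assuming (this is not confirmed by the PR) that the payload is a JSON array of results and that chunking means splitting that array. `chunkPayload` and `perChunk` are hypothetical names chosen for illustration, not the functions in pkg/agent/push_loop.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chunkPayload splits a JSON array payload into chunks of at most perChunk
// elements. If the payload cannot be parsed as a non-empty JSON array, or a
// chunk cannot be re-encoded, it falls back to returning the original bytes
// as a single chunk instead of returning an error.
func chunkPayload(payload []byte, perChunk int) [][]byte {
	var items []json.RawMessage
	if perChunk <= 0 || json.Unmarshal(payload, &items) != nil || len(items) == 0 {
		return [][]byte{payload} // fallback: deliver as-is
	}

	var chunks [][]byte
	for start := 0; start < len(items); start += perChunk {
		end := start + perChunk
		if end > len(items) {
			end = len(items)
		}
		encoded, err := json.Marshal(items[start:end])
		if err != nil {
			return [][]byte{payload} // fallback: deliver as-is
		}
		chunks = append(chunks, encoded)
	}
	return chunks
}

func main() {
	for _, c := range chunkPayload([]byte(`[1,2,3]`), 2) {
		fmt.Println(string(c))
	}
	// A malformed payload is still delivered, as one chunk.
	fmt.Println(len(chunkPayload([]byte(`{not json`), 2)))
}
```

One trade-off worth noting: the fallback keeps data flowing, but the receiver must then be prepared to handle an oversized or malformed single chunk.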
Previous suggestions
Suggestions up to commit 3a66f3d
Consider a dedicated metrics ingestion endpoint
Instead of sending sysmon metrics via the existing gRPC status update channel,
consider creating a dedicated endpoint for them. This would improve scalability
and prevent high-volume metrics data from impacting other status messages.
Examples:
openspec/changes/fix-sysmon-metrics-ingestion/specs/edge-architecture/spec.md [3]
openspec/changes/fix-sysmon-metrics-ingestion/proposal.md [7]
Suggestion importance[1-10]: 8
Why: The suggestion raises a valid and significant architectural concern about scalability and reliability by questioning the use of a shared status channel for high-volume metrics, which is a core part of the proposed design.
Clarify handling of unlinked device metrics
To prevent data integrity issues, change the specification to drop metrics from
unlinked devices and log a high-severity alert, instead of ingesting them with a
null or fallback identifier.
openspec/changes/fix-sysmon-metrics-ingestion/specs/edge-architecture/spec.md [14-15]
Suggestion importance[1-10]: 7
Why: The suggestion raises a valid design concern about data integrity when handling metrics from unlinked devices, proposing a stricter and more explicit failure mode which improves the robustness of the specification.
Specify payload size limit configurability
Clarify the "configured sysmon limit" in the specification by requiring it to be
configurable (e.g., per-tenant) and have a documented default value to improve
security and reliability.
openspec/changes/fix-sysmon-metrics-ingestion/specs/edge-architecture/spec.md [23]
Suggestion importance[1-10]: 6
Why: The suggestion correctly identifies ambiguity in the specification regarding the "configured sysmon limit" and proposes adding important details about its scope and default value, which enhances clarity and security.
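One way a configurable limit with a documented default could look is sketched below in Go. Everything here is an assumption for illustration: the spec does not state a default value, so `defaultSysmonMaxBytes` (4 MiB) is a placeholder, and `sysmonLimits`, `maxBytes`, and `check` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultSysmonMaxBytes is a placeholder default; the spec does not give one.
const defaultSysmonMaxBytes = 4 << 20 // 4 MiB

var errPayloadTooLarge = errors.New("sysmon payload exceeds configured limit")

// sysmonLimits holds an optional global default and per-tenant overrides.
type sysmonLimits struct {
	defaultMax int            // 0 means "use defaultSysmonMaxBytes"
	perTenant  map[string]int // tenant ID -> limit in bytes
}

// maxBytes resolves the effective limit: tenant override, then the
// configured default, then the built-in default.
func (l sysmonLimits) maxBytes(tenant string) int {
	if v, ok := l.perTenant[tenant]; ok && v > 0 {
		return v
	}
	if l.defaultMax > 0 {
		return l.defaultMax
	}
	return defaultSysmonMaxBytes
}

// check rejects payloads larger than the tenant's effective limit.
func (l sysmonLimits) check(tenant string, payload []byte) error {
	if limit := l.maxBytes(tenant); len(payload) > limit {
		return fmt.Errorf("%w: %d > %d bytes", errPayloadTooLarge, len(payload), limit)
	}
	return nil
}

func main() {
	limits := sysmonLimits{perTenant: map[string]int{"tenant-a": 8}}
	fmt.Println(limits.maxBytes("tenant-b"))                  // built-in default applies
	fmt.Println(limits.check("tenant-a", make([]byte, 9)))    // rejected by the override
	fmt.Println(limits.check("tenant-a", make([]byte, 8)))    // accepted
}
```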
Imported GitHub PR review comment.
Original author: @github-advanced-security[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2299#discussion_r2691551561
Original created: 2026-01-14T18:21:34Z
Original path: pkg/agent/push_loop.go
Original line: 560
Size computation for allocation may overflow
This operation, which is used in an allocation, involves a potentially large value and might overflow.
Imported GitHub PR review comment.
Original author: @github-advanced-security[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2299#discussion_r2691551567
Original created: 2026-01-14T18:21:34Z
Original path: pkg/agent/push_loop.go
Original line: 620
Size computation for allocation may overflow
This operation, which is used in an allocation, involves a potentially large value and might overflow.