fixing age graph bugs #2506
Imported from GitHub pull request.
Original GitHub pull request: #2056
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2056
Original created: 2025-12-04T16:03:43Z
Original updated: 2025-12-04T16:21:30Z
Original head: carverauto/serviceradar:bug/age_merge_failed
Original base: main
Original merged: 2025-12-04T16:21:18Z by @mfreeman451
User description
PR Type
Bug fix, Enhancement
Description
Implement bounded worker queue for AGE graph writes with retry logic for transient errors
Add queue depth/capacity metrics and structured logging for contention diagnostics
Chunk large batches and serialize writes to prevent concurrent MERGE conflicts
Support environment-based configuration for queue size, chunk size, workers, and timeout
Update TimescaleDB to 2.24.0 and CNPG image hash for stability
Document AGE contention troubleshooting in runbook
File Walkthrough
2 files
Add queue depth and capacity metrics
Implement bounded queue with retry logic
2 files
Upgrade TimescaleDB to 2.24.0
Add pgconn dependency for error classification
5 files
Set TimescaleDB version config to 2.24.0-dev
Add imagePullPolicy Always for core container
Update app tag and CNPG image hash
Update CNPG image hash for consistency
Update CNPG image hash for consistency
4 files
Document queue contention and backpressure checks
Add change proposal for AGE stabilization
Define AGE contention tolerance requirements
Outline implementation tasks for AGE stabilization
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2056#issuecomment-3612976345
Original created: 2025-12-04T16:04:21Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
No security concerns identified
No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
🎫 No ticket provided
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status: Passed
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status: Passed
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Action Logging: New critical actions (queued AGE writes, retries, failures) add structured logs but do not
clearly include a user ID or actor context, so auditability of who initiated changes is
uncertain.
Referred Code
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status:
Error Detail Exposure: Logs include database SQLSTATE codes and raw error messages which are useful internally
but may expose internal details if any are surfaced to users; confirm these logs are not
user-facing.
Referred Code
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Input Validation: The writer forwards marshaled JSON payloads and classifies errors but does not show
explicit validation/sanitization of external inputs within this diff; ensure upstream
models and executor parameterization prevent injection.
Referred Code
Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2056#issuecomment-3612981418
Original created: 2025-12-04T16:05:38Z
PR Code Suggestions ✨
Explore these optional code suggestions:
Refactor backfill tool to use queue
The age-backfill tool should be updated to use the new queuing system for AGE
graph writes. This change is necessary to fulfill the PR's design requirement of
coordinating backfill operations with live data ingestion to prevent database
contention.
Examples:
openspec/changes/stabilize-age-graph-ingestion/tasks.md [11]
pkg/registry/age_graph_writer.go [152-196]
Solution Walkthrough:
Before:
After:
Suggestion importance[1-10]: 9
Why: This suggestion correctly identifies a critical omission where the age-backfill tool is not integrated with the new queuing system, directly contradicting a key requirement in the PR's design documents and undermining the primary goal of preventing write contention.
Make retry backoff context-aware
Replace time.Sleep(delay) with a context-aware timer in processRequest to allow
the retry backoff to be interrupted by context cancellation.
pkg/registry/age_graph_writer.go [804-867]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 7
Why: The suggestion correctly identifies that time.Sleep is not context-aware and proposes a valid improvement to make the retry backoff responsive to context cancellation, improving shutdown and timeout handling.