fixing sweep job / oban scheduler issue #2690
Imported from GitHub pull request.
Original GitHub pull request: #2340
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2340
Original created: 2026-01-18T08:31:02Z
Original updated: 2026-01-18T08:36:44Z
Original head: carverauto/serviceradar:2337-bugweb-ng-unable-to-create-new-sweep-group
Original base: staging
Original merged: 2026-01-18T08:36:43Z by @mfreeman451
User description
PR Type
Bug fix, Enhancement
Description
Add resilient Oban job scheduling to prevent sweep group creation failures
`ObanSupport` module safely detects Oban availability and handles insert errors
Return an `:oban_unavailable` error instead of crashing when Oban is missing
Implement `SweepScheduleReconciler` GenServer to reconcile deferred schedules
Update web-ng UI to provide clear feedback when scheduling is deferred
Add comprehensive tests and design documentation for job scheduling resilience
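As a rough illustration of the guarded-insert idea described above, here is a minimal sketch. It is not the PR's `ObanSupport` module: the module name `SafeScheduling`, the function `safe_insert/3`, and the injected `available_fun` and `insert_fun` callbacks are all hypothetical, chosen so the example runs without Oban installed.

```elixir
defmodule SafeScheduling do
  @moduledoc """
  Hypothetical sketch of a guarded job insert. The availability check and the
  insert itself are passed in as functions, so nothing here depends on Oban.
  """

  # Returns {:ok, job} on success, {:error, reason} when the insert reports an
  # error, and {:error, :oban_unavailable} when the scheduler is missing or
  # the insert raises.
  def safe_insert(job, available_fun, insert_fun)
      when is_function(available_fun, 0) and is_function(insert_fun, 1) do
    if available_fun.() do
      try do
        case insert_fun.(job) do
          {:ok, inserted} -> {:ok, inserted}
          {:error, reason} -> {:error, reason}
        end
      rescue
        # The scheduler may disappear between the check and the insert;
        # map the raised error to a tagged tuple instead of crashing the caller.
        _error -> {:error, :oban_unavailable}
      end
    else
      {:error, :oban_unavailable}
    end
  end
end
```

A caller such as a LiveView could then match on `{:error, :oban_unavailable}` and show a "scheduling deferred" message instead of terminating.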
File Walkthrough
4 files
Add SweepScheduleReconciler child to supervision tree
New module for safe Oban availability checks and inserts
New GenServer to reconcile deferred sweep schedules
Add contextual messaging for deferred sweep scheduling
3 files
Guard Oban inserts with availability check and error handling
Guard Oban inserts with availability check and error handling
Add error handling for Oban unavailability conditions
1 file
New tests for Oban support and reconciliation behavior
5 files
Design proposal for Oban availability resilience
Detailed design decisions and migration plan
New job scheduling resilience requirements
Updated sweep group management requirements
Implementation task checklist and completion status
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2340#issuecomment-3765055831
Original created: 2026-01-18T08:31:44Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
No security concerns identified
No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
🎫 #2337
current flow raises `RuntimeError` about missing Oban instance).
LiveView/GenServer terminating due to Oban scheduling.
action succeeds in an environment where Oban is not running/configured, and that the
original LiveView crash no longer occurs.
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status: Passed
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status: Passed
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Missing user context: New scheduling audit logs include `sweep_group_id` and outcome but do not include any initiating user identifier, so it may be insufficient as an audit trail if sweep group changes are considered critical actions.
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
Interval not validated: The reconciler reads `:sweep_schedule_reconcile_interval_seconds` from config without validating it is a non-negative integer, which could cause crashes or unexpected behavior when used in `Process.send_after/3`.
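One way to address the interval concern is to normalize the configured value before it reaches `Process.send_after/3`. The sketch below is only an assumption about how that could look: the module `ReconcileInterval`, the 30-second fallback, and the `:my_app` application name are hypothetical and are not taken from the PR.

```elixir
defmodule ReconcileInterval do
  # Hypothetical fallback used when the configured value is unusable.
  @default_seconds 30

  # Reads the configured interval (the :my_app app name is an assumption)
  # and converts it to milliseconds.
  def from_config do
    :my_app
    |> Application.get_env(:sweep_schedule_reconcile_interval_seconds, @default_seconds)
    |> to_ms()
  end

  # Only non-negative integers are accepted; anything else falls back to the
  # default, so Process.send_after/3 always receives a valid timeout.
  def to_ms(seconds) when is_integer(seconds) and seconds >= 0, do: seconds * 1000
  def to_ms(_invalid), do: @default_seconds * 1000
end
```

Falling back silently is one design choice; logging a warning or failing at startup would be equally reasonable, depending on how strict the configuration should be.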
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status:
Exception message logged: The new warning logs can include `reason: message` derived from `Exception.message/1`, which may leak internal scheduler/config details depending on the raised error content.
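A possible mitigation, sketched here under assumptions, is to log only the exception's module name or a known error atom rather than the free-form message. The module `SafeLogReason` and its functions are hypothetical and do not appear in the PR.

```elixir
defmodule SafeLogReason do
  require Logger

  # For exceptions, keep only the module name (e.g. "RuntimeError") and drop
  # the message, which may contain connection strings or config details.
  def describe(%{__exception__: true, __struct__: module}), do: inspect(module)

  # Tagged error atoms such as :oban_unavailable are safe to log as-is.
  def describe(reason) when is_atom(reason), do: inspect(reason)

  # Anything else (strings, tuples, maps) is replaced by a fixed placeholder.
  def describe(_other), do: "unknown"

  # Emits the warning with structured metadata instead of interpolated text.
  def warn_deferred(sweep_group_id, error) do
    Logger.warning("sweep scheduling deferred",
      sweep_group_id: sweep_group_id,
      reason: describe(error)
    )
  end
end
```

The full exception can still be captured at debug level or in an error tracker if operators need it for diagnosis.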
Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2340#issuecomment-3765056765
Original created: 2026-01-18T08:32:57Z
PR Code Suggestions ✨
Explore these optional code suggestions:
Consider a more direct reconciliation trigger
Instead of having the `SweepScheduleReconciler` poll periodically, consider an event-driven approach. The reconciliation process should be triggered directly by the component that starts Oban, making job scheduling more immediate.
Examples:
elixir/serviceradar_core/lib/serviceradar/sweep_jobs/sweep_schedule_reconciler.ex [24-43]
Suggestion importance[1-10]: 7
Why: This is a strong architectural suggestion that correctly identifies the new polling mechanism and proposes a more efficient, event-driven alternative which would reduce scheduling delays and system overhead.
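To make the event-driven idea concrete, here is a minimal sketch of a reconciler that runs when it is told the scheduler has started, rather than on a timer. This is not the reviewer's proposed code or the PR's `SweepScheduleReconciler`: the module `EventDrivenReconciler`, the `scheduler_started/1` hook, and the injected `reconcile_fun` are all hypothetical.

```elixir
defmodule EventDrivenReconciler do
  use GenServer

  # `reconcile_fun` stands in for the real reconciliation logic.
  def start_link(reconcile_fun) when is_function(reconcile_fun, 0) do
    GenServer.start_link(__MODULE__, reconcile_fun)
  end

  # Hypothetical hook: called by whichever component starts the scheduler.
  def scheduler_started(pid), do: GenServer.cast(pid, :scheduler_started)

  # Exposes how many reconciliations have run (useful in tests).
  def run_count(pid), do: GenServer.call(pid, :run_count)

  @impl true
  def init(reconcile_fun), do: {:ok, %{reconcile: reconcile_fun, runs: 0}}

  @impl true
  def handle_cast(:scheduler_started, state) do
    # Reconcile immediately on the event instead of waiting for the next poll.
    state.reconcile.()
    {:noreply, %{state | runs: state.runs + 1}}
  end

  @impl true
  def handle_call(:run_count, _from, state), do: {:reply, state.runs, state}
end
```

A periodic timer could still be kept as a low-frequency safety net in case the start event is missed.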
Ensure essential jobs run always
Modify `handle_groups/1` to always schedule the `SweepMonitorWorker` and `SweepDataCleanupWorker` during reconciliation, regardless of whether any sweep groups exist.
elixir/serviceradar_core/lib/serviceradar/sweep_jobs/sweep_schedule_reconciler.ex [58-63]
Suggestion importance[1-10]: 7
Why: This is a valid logical correction. The monitoring and cleanup workers should run even if no sweep groups are currently enabled, to handle historical data and ensure the system is ready for future groups.
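The intent of this suggestion can be sketched as a pure function that builds the list of jobs to schedule. The sketch is illustrative only: the module `ReconcilePlan`, the atoms standing in for the two workers, and the `%{id: ..., enabled: ...}` group shape are assumptions, not the PR's actual `handle_groups/1`.

```elixir
defmodule ReconcilePlan do
  # Stand-ins for SweepMonitorWorker and SweepDataCleanupWorker; the real
  # workers would be job modules, represented here as atoms.
  @essential [:sweep_monitor_worker, :sweep_data_cleanup_worker]

  # Essential jobs are always included, even when `groups` is empty.
  # Per-group jobs are added only for enabled groups; entries that do not
  # match the pattern (for example disabled groups) are skipped.
  def jobs_for(groups) when is_list(groups) do
    group_jobs = for %{id: id, enabled: true} <- groups, do: {:sweep_group, id}
    @essential ++ group_jobs
  end
end
```

With an empty group list the plan still contains both essential jobs, which is the behavior the suggestion asks for.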