Fix/ash oban tenant scheduler broken #2638
Reference
carverauto/serviceradar!2638
Imported from GitHub pull request.
Original GitHub pull request: #2231
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2231
Original created: 2026-01-09T02:58:11Z
Original updated: 2026-01-09T04:48:57Z
Original head: carverauto/serviceradar:fix/ash_oban_tenant_scheduler_broken
Original base: testing
Original merged: 2026-01-09T04:48:37Z by @mfreeman451
User description
PR Type
Enhancement, Bug fix
Description
Major architectural refactor from poller to gateway model: Comprehensive rename and restructuring of the entire codebase to replace the poller-based architecture with a new gateway-based push-first communication model
Multi-tenant support: Implemented full multi-tenant architecture with tenant ID and slug propagation throughout the system
Gateway monitoring and status management: New comprehensive gateway health checking, offline/recovery detection, and alert handling with periodic monitoring and streaming status report support
NATS bootstrap and account management: Added new NATS bootstrap functionality with JWT and credentials file management, supporting multi-tenant routing and operator/account generation
Push-based result delivery: Refactored sync service to use gateway client for push-based result delivery instead of pull-based KV storage
Dynamic configuration updates: Implemented gateway enrollment, config polling, and heartbeat loops for dynamic configuration management
Protobuf message updates: Refactored protobuf messages from poller to gateway architecture with new agent-gateway communication types
Removed legacy components: Deleted poller package, poller-related CLI commands, and associated infrastructure code
Updated terminology: Systematic updates across API documentation, SNMP checker, edge onboarding, and all related services
Diagram Walkthrough
File Walkthrough
6 files
monitoring.pb.go
Refactor protobuf messages from poller to gateway architecture
proto/monitoring.pb.go
PollerId field to GatewayId across multiple message types (StatusRequest, ResultsRequest, StatusResponse, ResultsResponse)
PollerStatusRequest, PollerStatusResponse, and ServiceStatus message types
PollerStatusChunk with new GatewayStatusRequest, GatewayStatusResponse, and GatewayStatusChunk types
GatewayServiceStatus message type with tenant-related fields (TenantId, TenantSlug)
AgentHelloRequest, AgentHelloResponse, AgentConfigRequest, AgentConfigResponse, AgentCheckConfig
nats_bootstrap.go
Add comprehensive NATS bootstrap and configuration management
pkg/cli/nats_bootstrap.go
nats-bootstrap and admin nats subcommands
and credentials file management
account listing via Core API
and JSON output formats
service.go
Multi-tenant gateway push architecture with dynamic config
pkg/sync/service.go
gateway push-first communication model
for push-based result delivery
per-tenant source configuration management
dynamic configuration updates
deprecated pull-based GetResults API
gateways.go
Complete gateway monitoring and status management system
pkg/core/gateways.go
implementation with 1540 lines of core functionality
alert handling with periodic monitoring
reassembly and service message handling
integration with service registry and event publishing
edge_onboarding.go
Rename poller to gateway terminology in edge onboarding
pkg/core/edge_onboarding.go
poller references to gateway throughout the edge onboarding service (200+ occurrences)
reflect gateway terminology
EdgeOnboardingComponentTypeSync component type in package creation and validation
config/pollers/ to config/gateways/ and related metadata field names
main.go
Update SNMP checker to use gateway terminology
cmd/checkers/snmp/main.go
NewSNMPPollerService to NewSNMPGatewayService
snmp.Poller to snmp.Gateway to align with gateway terminology
1 files
main.go
Update API documentation terminology
cmd/core/main.go
in Swagger documentation
2 files
prod.exs
Add Elixir production configuration
elixir/serviceradar_agent_gateway/config/prod.exs
info for production environment
dev.exs
Development configuration for Elixir gateway service
elixir/serviceradar_agent_gateway/config/dev.exs
2 files
nats_account.pb.go
NATS account management protobuf definitions
proto/nats_account.pb.go
message types and 1 enum
operator bootstrap functionality
credential generation, and JWT signing
nats_account_grpc.pb.go
Generated gRPC code for NATS account service
proto/nats_account_grpc.pb.go
operations
creation, and credential generation
account management
1 files
mock_armis.go
Remove KVWriter from Armis mock interfaces
pkg/sync/integrations/armis/mock_armis.go
KVWriter interface from mocked interfaces
101 files
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2231#issuecomment-3726829494
Original created: 2026-01-09T02:59:58Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
No security concerns identified
No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
🎫 No ticket provided
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status: Passed
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status: Passed
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status: Passed
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status: Passed
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Unvalidated External IDs: New externally-supplied identifiers (gateway_id, tenant_id, tenant_slug) are introduced in request messages without any visible validation/sanitization in the diff, requiring verification that upstream handlers enforce expected formats and authorization boundaries.
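As a rough illustration of the kind of upstream check this item asks reviewers to confirm, the sketch below validates the three identifiers before use. The function name validateIdentifiers, both patterns, and the 63-character slug limit are assumptions made for the example; the PR does not state the expected formats, and authorization checks are out of scope here.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical formats: the real ServiceRadar rules are not shown in the PR.
var (
	// IDs: 1-63 characters, starting with an alphanumeric.
	idPattern = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9._-]{0,62}$`)
	// Slugs: lowercase alphanumeric words joined by single hyphens.
	slugPattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)
)

// validateIdentifiers rejects identifiers that do not match the assumed
// formats before they reach storage, routing, or logging.
func validateIdentifiers(gatewayID, tenantID, tenantSlug string) error {
	if !idPattern.MatchString(gatewayID) {
		return fmt.Errorf("invalid gateway_id %q", gatewayID)
	}
	if !idPattern.MatchString(tenantID) {
		return fmt.Errorf("invalid tenant_id %q", tenantID)
	}
	if len(tenantSlug) > 63 || !slugPattern.MatchString(tenantSlug) {
		return fmt.Errorf("invalid tenant_slug %q", tenantSlug)
	}
	return nil
}

func main() {
	fmt.Println(validateIdentifiers("gw-1", "tenant-1", "acme"))
	fmt.Println(validateIdentifiers("../etc", "tenant-1", "acme"))
}
```

Format validation of this kind only narrows the input space; whether the caller is allowed to act for the given tenant still has to be enforced separately.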
Referred Code
Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2231#issuecomment-3726832504
Original created: 2026-01-09T03:01:38Z
PR Code Suggestions ✨
Explore these optional code suggestions:
Prevent buffer reuse bug in chunking
In buildResultsChunks, fix a data corruption bug by ensuring a new backing array is allocated for each payload slice, preventing it from being overwritten by subsequent writes to the shared buffer.
pkg/sync/service.go [1135-1169]
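The suggested diff is not preserved in this import. The sketch below is not the code from pkg/sync/service.go; it uses a hypothetical buildChunks helper to show the general pattern: a slice returned by bytes.Buffer.Bytes is only valid until the next buffer modification, so each payload is copied into its own backing array before the buffer is reused.

```go
package main

import (
	"bytes"
	"fmt"
)

// buildChunks is a hypothetical stand-in for buildResultsChunks. It reuses one
// bytes.Buffer across iterations and returns one payload per input record.
func buildChunks(records []string, copyPayload bool) [][]byte {
	var buf bytes.Buffer
	chunks := make([][]byte, 0, len(records))
	for _, rec := range records {
		buf.Reset() // resets the length but keeps the backing array
		buf.WriteString(rec)
		payload := buf.Bytes() // aliases the buffer's backing array
		if copyPayload {
			// Copy into a fresh backing array so later writes to buf
			// cannot change this payload.
			payload = append([]byte(nil), payload...)
		}
		chunks = append(chunks, payload)
	}
	return chunks
}

func main() {
	records := []string{"aaaa", "bbbb"}
	for _, c := range buildChunks(records, false) {
		fmt.Printf("aliased: %q\n", c)
	}
	for _, c := range buildChunks(records, true) {
		fmt.Printf("copied:  %q\n", c)
	}
}
```

Running both variants side by side makes the difference visible: only the copied payloads are guaranteed to keep the contents they had when they were appended.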
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 9
Why: The suggestion correctly identifies a critical bug where buf.Reset() does not clear the underlying array, causing subsequent writes to corrupt previously created payload slices that share the same backing array. This would lead to data corruption in the generated chunks.
✅
Use idiomatic error check for stream
Suggestion Impact:
Updated the stream receive error handling to use errors.Is(err, io.EOF) instead of comparing err.Error() to "EOF", and added the missing io import.
code diff:
Replace the string comparison err.Error() == "EOF" with the idiomatic errors.Is(err, io.EOF) to reliably detect the end of a gRPC stream.
pkg/core/gateways.go [1001-1007]
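The diff itself is missing from this import. A minimal sketch of the receive loop with the idiomatic check follows; the receiver interface and fakeStream type are hypothetical stand-ins for a gRPC client stream, not types from pkg/core/gateways.go. The wrapped io.EOF is included only to show a case that a string comparison on err.Error() would miss.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// receiver is a hypothetical stand-in for a stream's Recv method.
type receiver interface {
	Recv() (string, error)
}

// fakeStream yields its messages, then returns the configured end error.
type fakeStream struct {
	msgs []string
	end  error
}

func (f *fakeStream) Recv() (string, error) {
	if len(f.msgs) == 0 {
		return "", f.end
	}
	m := f.msgs[0]
	f.msgs = f.msgs[1:]
	return m, nil
}

// newWrappedEOFStream ends with an io.EOF wrapped in another error.
func newWrappedEOFStream(msgs ...string) *fakeStream {
	return &fakeStream{msgs: msgs, end: fmt.Errorf("recv: %w", io.EOF)}
}

// drain reads until the stream ends. errors.Is matches io.EOF directly and
// also when it is wrapped; any other error is returned to the caller.
func drain(s receiver) ([]string, error) {
	var out []string
	for {
		msg, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, msg)
	}
}

func main() {
	out, err := drain(newWrappedEOFStream("chunk-1", "chunk-2"))
	fmt.Println(len(out), err)
}
```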
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 6
Why: This suggestion is correct and improves code robustness by replacing a brittle string comparison (err.Error() == "EOF") with the idiomatic errors.Is(err, io.EOF) for checking the end of a stream.
Avoid potential deadlock in config update
In UpdateConfig, move the creation of new integrations outside the configMu lock to reduce lock contention and minimize the critical section.
pkg/sync/service.go [372-404]
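The suggested change is not preserved here. The sketch below only illustrates the general shape of the idea under simplified assumptions: the service and integration types and the map layout are hypothetical, while UpdateConfig, configMu, and createIntegration are the names used in the suggestion. The potentially slow construction happens first, and the lock is held only for the pointer swap.

```go
package main

import (
	"fmt"
	"sync"
)

// integration and service are simplified, hypothetical types.
type integration struct{ name string }

type service struct {
	configMu     sync.RWMutex
	integrations map[string]*integration
}

// createIntegration stands in for potentially slow construction work.
func createIntegration(name string) *integration {
	return &integration{name: name}
}

// UpdateConfig builds the new integrations without holding configMu, then
// takes the lock only long enough to swap the map.
func (s *service) UpdateConfig(sources []string) {
	next := make(map[string]*integration, len(sources))
	for _, src := range sources {
		next[src] = createIntegration(src)
	}

	s.configMu.Lock()
	s.integrations = next
	s.configMu.Unlock()
}

// integrationCount reads the map under the read lock.
func (s *service) integrationCount() int {
	s.configMu.RLock()
	defer s.configMu.RUnlock()
	return len(s.integrations)
}

func main() {
	s := &service{}
	s.UpdateConfig([]string{"armis", "other-source"})
	fmt.Println(s.integrationCount())
}
```

One trade-off of this shape is that two concurrent UpdateConfig calls can both build a map and the later swap wins, so callers that need strict ordering would have to serialize updates some other way.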
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 5
Why: The suggestion's reasoning about a specific deadlock path is incorrect, as createIntegration does not call GetEffectiveDiscoveryInterval. However, the proposed change correctly applies a best practice of minimizing lock duration by performing expensive operations (integration creation) outside the critical section, which reduces lock contention and improves maintainability.
Use a valid base context
Replace the call to context.WithoutCancel with context.Background() to create a valid base context for the background goroutine, preventing potential compilation errors.
pkg/core/gateways.go [617-619]
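For context when weighing this suggestion: context.WithoutCancel was added to the standard library in Go 1.21, so it only fails to compile on older toolchains. The two options also differ in behavior, which the sketch below illustrates with a hypothetical tenantKey value. Both detach from the parent's cancellation, but only WithoutCancel keeps the parent's values.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey and tenantKey are hypothetical; they only illustrate value lookup.
type ctxKey string

const tenantKey ctxKey = "tenant"

// detached returns a context that outlives the parent's cancellation.
func detached(parent context.Context, keepValues bool) context.Context {
	if keepValues {
		return context.WithoutCancel(parent) // requires Go 1.21 or newer
	}
	return context.Background() // drops the parent's values as well
}

// tenantOf returns the tenant stored in ctx, or "" if there is none.
func tenantOf(ctx context.Context) string {
	v, _ := ctx.Value(tenantKey).(string)
	return v
}

// demo cancels the parent and reports whether the detached context is still
// live, along with the tenant value visible through it.
func demo(keepValues bool) (alive bool, tenant string) {
	base := context.WithValue(context.Background(), tenantKey, "acme")
	parent, cancel := context.WithCancel(base)
	bg := detached(parent, keepValues)
	cancel()
	return bg.Err() == nil, tenantOf(bg)
}

func main() {
	fmt.Println(demo(true))
	fmt.Println(demo(false))
}
```

If the background goroutine relies on request-scoped values such as tracing or tenant metadata, switching to context.Background() would silently drop them.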
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 5
Why: The suggestion correctly identifies that context.WithoutCancel is a new function and might not be available, proposing context.Background() as a fix, which is a valid approach to detach a background task's lifecycle.
✅
Bubble enrollment-pending error
Suggestion Impact:
Removed the special-case block that returned nil when ensureGatewayEnrolled() returned errGatewayNotEnrolled, causing the error to be propagated upward (and added the explanatory comment).
code diff:
In bootstrapGatewayConfig, propagate the errGatewayNotEnrolled error instead of swallowing it, allowing the calling Start function to correctly handle the pending enrollment state.
pkg/sync/service.go [912-929]
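The diff is not preserved in this import. The sketch below is a simplified illustration of the control flow described, not the code from pkg/sync/service.go: the function and error names come from the suggestion, while the service fields and function bodies are hypothetical. bootstrapGatewayConfig passes the sentinel error up, and Start uses errors.Is to treat it as a pending state instead of a failure.

```go
package main

import (
	"errors"
	"fmt"
)

var errGatewayNotEnrolled = errors.New("gateway not enrolled")

// service is a hypothetical, stripped-down version of the sync service.
type service struct {
	enrolled      bool
	configLoaded  bool
	pendingEnroll bool
}

func (s *service) ensureGatewayEnrolled() error {
	if !s.enrolled {
		return errGatewayNotEnrolled
	}
	return nil
}

// bootstrapGatewayConfig propagates every error, including
// errGatewayNotEnrolled, so the caller can decide how to react.
func (s *service) bootstrapGatewayConfig() error {
	if err := s.ensureGatewayEnrolled(); err != nil {
		return fmt.Errorf("bootstrap gateway config: %w", err)
	}
	s.configLoaded = true
	return nil
}

// Start defers bootstrapping while enrollment is pending and fails on any
// other error.
func (s *service) Start() error {
	err := s.bootstrapGatewayConfig()
	switch {
	case errors.Is(err, errGatewayNotEnrolled):
		s.pendingEnroll = true // retry once enrollment completes
		return nil
	case err != nil:
		return err
	}
	return nil
}

func main() {
	s := &service{}
	fmt.Println(s.Start(), s.pendingEnroll, s.configLoaded)
}
```

Had bootstrapGatewayConfig returned nil in the not-enrolled case, Start could not tell a deferred bootstrap apart from a completed one.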
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 8
Why: The suggestion correctly points out that swallowing errGatewayNotEnrolled breaks the intended logic in the Start function, which relies on this error to defer configuration bootstrapping. Propagating the error is critical for the correct startup behavior.
Remove redundant tenant fields from message
Remove the redundant TenantId and TenantSlug fields from the GatewayServiceStatus message, as they are already present in parent messages.
proto/monitoring.pb.go [1113-1130]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 7
Why: The suggestion correctly identifies data redundancy in the Protobuf message design, which impacts payload size and data consistency, and proposes a valid improvement.
Regenerate Protobuf Go file
Ensure the monitoring.pb.go file is not manually edited by regenerating it from the source .proto file.
proto/monitoring.pb.go [827-840]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 7
Why: The suggestion provides a critical best practice for working with generated code, correctly advising to modify the source .proto file rather than the output, which is what the PR author has already done.
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2231#issuecomment-3727140590
Original created: 2026-01-09T04:25:41Z
CI Feedback 🧐
(Feedback updated until commit github.com/carverauto/serviceradar@9b433887d6)
A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
Action: build
Failed stage: Configure SRQL fixture database for tests [❌]
Failed test name: ""
Failure summary:
The action failed because a required secret for TLS verification was missing:
- The job exited with code 1 after printing: SRQL_TEST_DATABASE_CA_CERT secret must be configured to verify SRQL fixture TLS. (log line 671).
- The environment shows SRQL_TEST_DATABASE_CA_CERT: is empty, so the workflow cannot verify the SRQL test fixture database TLS connection and aborts.
Relevant error logs: