Fix/discovery sources empty #2650
No reviewers
Labels
No labels
1week
2weeks
Failed compliance check
IP cameras
NATS
Possible security concern
Review effort 1/5
Review effort 2/5
Review effort 3/5
Review effort 4/5
Review effort 5/5
UI
aardvark
accessibility
amd64
api
arm64
auth
back-end
bgp
blog
bug
build
checkers
ci-cd
cleanup
cnpg
codex
core
dependencies
device-management
documentation
duplicate
dusk
ebpf
enhancement
eta 1d
eta 1hr
eta 3d
eta 3hr
feature
fieldsurvey
github_actions
go
good first issue
help wanted
invalid
javascript
k8s
log-collector
mapper
mtr
needs-triage
netflow
network-sweep
observability
oracle
otel
plug-in
proton
python
question
reddit
redhat
research
rperf
rperf-checker
rust
sdk
security
serviceradar-agent
serviceradar-agent-gateway
serviceradar-web
serviceradar-web-ng
siem
snmp
sysmon
topology
ubiquiti
wasm
wontfix
zen-engine
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
carverauto/serviceradar!2650
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "refs/pull/2650/head"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Imported from GitHub pull request.
Original GitHub pull request: #2249
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2249
Original created: 2026-01-11T18:38:26Z
Original updated: 2026-01-11T19:21:07Z
Original head: carverauto/serviceradar:fix/discovery_sources_empty
Original base: staging
Original merged: 2026-01-11T19:21:05Z by @mfreeman451
User description
IMPORTANT: Please sign the Developer Certificate of Origin
Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:
Describe your changes
Issue ticket number and link
Code checklist before requesting a review
PR Type
Enhancement, Tests
Description
Comprehensive database schema migration with 30+ tables for tenant data including
ocsf_devices,ocsf_agents,service_checks,polling_schedules, andintegration_sourcesNew stateful alert rules engine with bucketed evaluation, state tracking, and fault tolerance for log and event-based alerting
Integration sources management LiveView with CRUD operations for Armis, SNMP, Syslog, Nmap, and Custom sources
Real-time cluster monitoring view with ERTS topology visibility, gateway/agent registration tracking, and Oban job queue monitoring
Infrastructure monitoring view with role-based access control and PubSub-based caching for cluster topology
Gateway process GenServer for agent coordination with health checks, job execution, and agent discovery
Enhanced agent detail view displaying live registry data, system information, and configured service checks
Redesigned job scheduler UI with sortable tables, filtering, pagination, and support for multiple job sources
Edge package creation refactored with Ash forms, automatic certificate generation, and tenant CA provisioning
Tenant registry test suite with comprehensive coverage for registry operations and process lifecycle management
Significant codebase migration from Go to Elixir with removal of legacy poller and discovery components
Diagram Walkthrough
File Walkthrough
1 files
20260107043446_initial_schema.exs
Initial tenant schema migration with comprehensive table definitionselixir/serviceradar_core/priv/repo/tenant_migrations/20260107043446_initial_schema.exs
1416 lines of table definitions
ocsf_devices,ocsf_agents,service_checks,polling_schedules,integration_sources, and relatedentities
across all tables
upanddownmigration functions for schema creationand rollback
8 files
index.ex
Integration sources LiveView with CRUD and modal managementweb-ng/lib/serviceradar_web_ng_web/live/admin/integration_live/index.ex
SNMP, Syslog, Nmap, Custom)
for integration source management
credential handling with JSON parsing
agent/partition assignment and sync statistics display
index.ex
New cluster monitoring LiveView with real-time status trackingweb-ng/lib/serviceradar_web_ng_web/live/settings/cluster_live/index.ex
real-time visibility into ERTS topology, gateways, agents, and Oban
job queues
registrations, and gateway platform updates with automatic refresh
scheduling
gateways and agents across cluster nodes using RPC calls
tenant filtering, and rendering health metrics and event logs
index.ex
Refactor edge package creation with Ash forms and auto-cert generationweb-ng/lib/serviceradar_web_ng_web/live/admin/edge_package_live/index.ex
AshPhoenix.Formfor form handlingwith automatic validation and error management
create_with_tenant_certwith tenant CA auto-provisioning
type, added support for "sync" type
certificate details, and improved UX with loading states
handling with JSON config support
index.ex
New infrastructure monitoring view with role-based access controlweb-ng/lib/serviceradar_web_ng_web/live/infrastructure_live/index.ex
nodes, gateways, and agents with platform admin visibility controls
refresh and staleness detection using wall-clock time
tenant's agents while admins see full cluster topology
timeout handling and fallback mechanisms
show.ex
Agent detail view with live registry and system infoweb-ng/lib/serviceradar_web_ng_web/live/agent_live/show.ex
records with preference for live data
(memory, processes, schedulers, uptime)
service checks card displaying configured checks
and improved capability visualization
index.ex
Job scheduler list with filtering and auto-refreshweb-ng/lib/serviceradar_web_ng_web/live/admin/job_live/index.ex
sortable table with pagination
triggers) with filtering and search capabilities
manual trigger capability for jobs
jobs, and accessing Oban Web
stateful_alert_engine.ex
Stateful alert engine with bucketed evaluationelixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex
and event rules with time-windowed thresholds
fields, severity filtering, and body content matching
and renotification with history tracking
initialization for fault tolerance
gateway_process.ex
Gateway process for agent coordinationelixir/serviceradar_core/lib/serviceradar/edge/gateway_process.ex
managing check execution and agent coordination
checks, and result retrieval with load balancing
selection, registry integration, and metrics tracking
registry cleanup
1 files
tenant_registry_test.exs
Tenant registry module comprehensive test coverageelixir/serviceradar_core/test/serviceradar/cluster/tenant_registry_test.exs
TenantRegistrymodule with 310 linescovering 13 test groups
and process lifecycle management
supervisor child spawning
pollution
1 files
.gitkeep
Docker compose credentials directory placeholderdocker/compose/creds/.gitkeep
version control
1 files
20260110054954_add_stateful_alert_rules.exs
Add stateful alert rules and state tracking tableselixir/serviceradar_core/priv/repo/tenant_migrations/20260110054954_add_stateful_alert_rules.exs
stateful_alert_rulestable with configuration for alertthresholds, time windows, and notification settings
stateful_alert_rule_statestable for tracking rule state pergroup with bucket-based counting and cooldown tracking
rule_id and group_key for data integrity
101 files
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2249#issuecomment-3735379626
Original created: 2026-01-11T18:40:21Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
Authorization bypass
Description:
load_tenant/1callsAsh.get(Tenant, tenant_id, authorize?: false), which can bypassauthorization checks and may allow an authenticated user to load tenant records they are
not permitted to access if
tenant_idcan be influenced (e.g., viacurrent_scope.user.tenant_idbeing stale/forged).index.ex [981-998]
Referred Code
Tenant isolation risk
Description: Tenant scoping depends on
get_tenant/1, but if it returnsnilthe code still callsOnboardingPackages.list(%{limit: 50}, tenant: tenant), which could result in cross-tenantdata exposure if the downstream list/get implementations treat a
niltenant as “no tenantfilter”.
index.ex [20-27]
Referred Code
🎫 No ticket provided
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
Possible runtime crash: User-controlled
component_typeis converted usingString.to_existing_atom/1which canraise and crash the LiveView instead of handling invalid values gracefully.
Referred Code
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status:
Unstructured console logs: Logger console output is configured with a plain-text format rather than structured
logging (e.g., JSON), reducing auditability and automated monitoring effectiveness.
Referred Code
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Unsafe atom conversion: Converting external form input to an atom via
String.to_existing_atom/1risks crashes andis unsafe input handling without a strict allowlist conversion.
Referred Code
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Missing audit logging: Critical admin actions (create/revoke/delete edge packages) are performed without any
explicit audit log emission in the new LiveView code, requiring verification that
underlying domain functions log these actions with user/tenant context.
Referred Code
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status:
Detailed error surfaced: A user-facing flash message includes interpolated
error_msgwhich may expose internaldetails depending on the underlying error content.
Referred Code
Compliance status legend
🟢 - Fully Compliant🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2249#issuecomment-3735386494
Original created: 2026-01-11T18:41:38Z
PR Code Suggestions ✨
Latest suggestions up to
4f60ff3Avoid packaging sensitive Hex config
To avoid packaging sensitive credentials, modify the
tarcommand to archive onlythe
.hex/cacheand.hex/packagessubdirectories instead of the entire~/.hexdirectory.
scripts/update-hex-cache.sh [20-24]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies a potential security risk of leaking sensitive credentials from the
~/.hexdirectory and provides a precise fix to only archive the necessary cache directories.✅
Normalize blank discovery sourcesSuggestion Impact:
The commit adds normalization to treat `nil` and empty-string sources as "unknown" and also filters out empty strings when aggregating discovery_sources in the upsert fragment. It does not implement the suggested whitespace trimming/to_string handling, so whitespace-only values may still persist.code diff:
Sanitize the
sourcevalue by trimming whitespace and falling back to "unknown"if it's an empty string to prevent storing blank discovery sources.
elixir/serviceradar_core/lib/serviceradar/inventory/sync_ingestor.ex [264-279]
[Suggestion processed]Suggestion importance[1-10]: 6
__
Why: The suggestion correctly identifies that an empty or whitespace-only
sourcestring would be persisted, and proposes a robust fix to normalize it to"unknown", improving data integrity.✅
Filter empty array entriesSuggestion Impact:
The commit updated the SQL fragment used for discovery_sources to add `AND src <> ''`, filtering out empty strings during array aggregation as suggested. It also added a related application-level guard to treat empty update.source values as "unknown".code diff:
Update the SQL fragment to filter out empty strings in addition to
NULLvalueswhen merging
discovery_sourcesto ensure data cleanliness.elixir/serviceradar_core/lib/serviceradar/inventory/sync_ingestor.ex [297-301]
[Suggestion processed]Suggestion importance[1-10]: 6
__
Why: This suggestion provides a good defense-in-depth measure at the database level to prevent empty strings in the
discovery_sourcesarray, complementing the application-level logic.Prevent dependency directness flapping
To prevent dependency churn, confirm if
github.com/cenkalti/backoff/v5is useddirectly. If so, keep it as a direct dependency by removing the
// indirectmarker.
go.mod [45]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 5
__
Why: The suggestion correctly identifies a potential dependency management issue where moving
github.com/cenkalti/backoff/v5to an indirect dependency could cause CI churn if it's used directly. This is a valid maintainability concern.Previous suggestions
✅ Suggestions up to commit
9ab7926✅
Prevent DoS from unsafe atom conversionSuggestion Impact:
Updated the component_type conversion guard from a generic non-empty binary check to an explicit allowlist (gateway/agent/checker/sync) before calling String.to_existing_atom/1, mitigating the DoS risk.code diff:
Prevent a potential denial-of-service (DoS) vulnerability by validating the
component_typeagainst an allowlist before converting it to an atom withString.to_existing_atom/1.web-ng/lib/serviceradar_web_ng_web/live/admin/edge_package_live/index.ex [953-961]
Suggestion importance[1-10]: 9
__
Why: This suggestion correctly identifies a denial-of-service (DoS) vulnerability from unsafe atom creation and provides a correct fix by validating against an allowlist, which is a critical security improvement.
✅
Improve JSON parsing and error handlingSuggestion Impact:
The commit implements the "remove redundant parsing for queries_json" part by deleting the parse_json_field("queries_json", "queries") step and documenting that queries_json is not parsed here. It does not implement the suggested form error handling / function signature changes to return and update the form on JSON decode errors.code diff:
Remove redundant parsing for
queries_jsonand add form error handling forinvalid JSON in
credentials_jsonto provide user feedback instead of failingsilently.
web-ng/lib/serviceradar_web_ng_web/live/admin/integration_live/index.ex [1362-1379]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly points out redundant code and, more importantly, fixes a silent failure mode where invalid JSON for credentials is ignored, improving both code quality and user experience.
Use proper array default syntax
For Postgres array columns, use a literal empty-array fragment like
fragment("'{_}'::text[]")for the default value instead ofdefault: [], andenforce
null: false.elixir/serviceradar_core/priv/repo/tenant_migrations/20260107043446_initial_schema.exs [219-259]
Suggestion importance[1-10]: 5
__
Why: The suggestion correctly recommends using a database-specific
fragmentfor array defaults to ensure correctness, which is better practice than relying on Ecto'sdefault: []for migrations.✅
Use form.valid? instead of source.valid?Suggestion Impact:
The validity check in the create_package handler was changed from form.source.valid? to the public API AshPhoenix.Form.valid?(form), matching the recommendation.code diff:
Replace
form.source.valid?with the recommendedAshPhoenix.Form.valid?(form)tocheck for form validity.
web-ng/lib/serviceradar_web_ng_web/live/admin/edge_package_live/index.ex [121-124]
Suggestion importance[1-10]: 5
__
Why: The suggestion correctly recommends using the public API
AshPhoenix.Form.valid?/1instead of accessing the internalform.source.valid?field, which improves code robustness and maintainability.Simplify error handling for GenServer calls
Refactor the
call/2function to remove the redundanttry...catchblock and usepattern matching to handle the
{:error, :noproc}return value fromGenServer.call/3.elixir/serviceradar_core/lib/serviceradar/observability/stateful_alert_engine.ex [96-101]
Suggestion importance[1-10]: 4
__
Why: The suggestion is correct and improves code style by replacing a
try...catchwith idiomatic pattern matching on the return value ofGenServer.call/3, making the code cleaner and more readable.✅
Use safe integer parsing to prevent crashesSuggestion Impact:
The event handlers for remove_query, update_query, and toggle_sweep_mode were updated to parse the id using Integer.parse/1 with a case match, returning the unchanged socket on invalid input. The update_query logic was also refactored to use a helper function, but the core safety change matches the suggestion.code diff:
Replace unsafe
String.to_integer/1withInteger.parse/1in event handlers toprevent crashes from invalid client-side input.
web-ng/lib/serviceradar_web_ng_web/live/admin/integration_live/index.ex [224-250]
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies that using
String.to_integer/1on client-provided data is unsafe and could crash the LiveView process; switching toInteger.parse/1improves the robustness of the event handlers.✅
Parallelize RPC calls to prevent blockingSuggestion Impact:
The commit refactored fetch_node_info/1 to run the RPC calls concurrently via Task.async and collect results with Task.await_many into a map, then used those results instead of sequential :rpc.call invocations (with minor differences in key naming and timeout).code diff:
Parallelize the sequential RPC calls in
fetch_node_info/1usingTask.asynctoprevent the LiveView process from blocking for an extended period if the remote
node is unresponsive.
web-ng/lib/serviceradar_web_ng_web/live/agent_live/show.ex [164-189]
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies that sequential RPC calls can block the LiveView process for a long time, and proposes an idiomatic and correct solution using
Task.asyncto improve performance and reliability.✅
Use correct tenant ID for queriesSuggestion Impact:
Added pattern match for %{active_tenant: %{id: tid}} to ensure queries use the active tenant context when present.code diff:
Update
load_checks_for_agent/2to correctly resolve thetenant_idbyprioritizing the active tenant from
current_scope, ensuring data is loaded forthe correct context.
web-ng/lib/serviceradar_web_ng_web/live/agent_live/show.ex [191-204]
Suggestion importance[1-10]: 7
__
Why: This suggestion correctly points out an inconsistency in tenant ID resolution that could lead to loading incorrect data. The proposed change aligns the function's logic with other parts of the PR, fixing a potential bug.