phase 2 work complete #2402
No reviewers
Labels
No labels
1week
2weeks
Failed compliance check
IP cameras
NATS
Possible security concern
Review effort 1/5
Review effort 2/5
Review effort 3/5
Review effort 4/5
Review effort 5/5
UI
aardvark
accessibility
amd64
api
arm64
auth
back-end
bgp
blog
bug
build
checkers
ci-cd
cleanup
cnpg
codex
core
dependencies
device-management
documentation
duplicate
dusk
ebpf
enhancement
eta 1d
eta 1hr
eta 3d
eta 3hr
feature
fieldsurvey
github_actions
go
good first issue
help wanted
invalid
javascript
k8s
log-collector
mapper
mtr
needs-triage
netflow
network-sweep
observability
oracle
otel
plug-in
proton
python
question
reddit
redhat
research
rperf
rperf-checker
rust
sdk
security
serviceradar-agent
serviceradar-agent-gateway
serviceradar-web
serviceradar-web-ng
siem
snmp
sysmon
topology
ubiquiti
wasm
wontfix
zen-engine
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
carverauto/serviceradar!2402
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "refs/pull/2402/head"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Imported from GitHub pull request.
Original GitHub pull request: #1925
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/1925
Original created: 2025-11-05T06:47:01Z
Original updated: 2025-11-07T18:37:58Z
Original head: carverauto/serviceradar:1924-architecture-refactor-move-device-state-to-registry-treat-proton-as-olap-warehouse
Original base: main
Original merged: 2025-11-07T18:37:19Z by @mfreeman451
User description
IMPORTANT: Please sign the Developer Certificate of Origin
Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:
Describe your changes
Issue ticket number and link
Code checklist before requesting a review
PR Type
Enhancement, Tests
Description
Moved device state management from database to in-memory registry: Implemented
DeviceRecordstruct and in-memory storage with multi-index lookup (by ID, IP, MAC) to enable fast device queries without warehouse accessImplemented trigram-based device search: Added full-text search index with ranking logic for efficient device discovery by ID, IP, and hostname
Added collector capability tracking: Introduced
CollectorCapabilitymodel and in-memory capability index to track which collectors (SNMP, ICMP, Sysmon, agents, pollers, checkers) are available for each deviceRefactored API layer to use registry: Updated device API endpoints to query in-memory registry instead of database, with fallback to warehouse for missing data
Implemented registry hydration: Added batch loading of devices from Proton warehouse into in-memory registry on server startup
Simplified capability response generation: Removed 150+ lines of complex metadata-based capability derivation, replaced with explicit
CollectorCapabilityrecordsAdded comprehensive test coverage: Implemented tests for device storage, transformation, search, capability indexing, and registry hydration
Updated frontend types and components: Extended TypeScript interfaces and refactored device table to use new capabilities API
Optimized build configuration: Simplified Bazel build files using glob patterns for registry and core packages
Diagram Walkthrough
File Walkthrough
21 files
registry.go
In-memory device registry with search and capability indexingpkg/registry/registry.go
mu,devices,devicesByIP,devicesByMACmaps and search/capability indexesNewDeviceRegistryconstructorapplyRegistryStorecall inProcessBatchDeviceUpdatesto persistupdates to in-memory store
SetCollectorCapabilities,GetCollectorCapabilities,HasDeviceCapability,ListDevicesWithCapabilityGetDevice,GetDevicesByIP,ListDevicesto use in-memoryrecords instead of database queries
SearchDevicesmethod with trigram index and scoring logic fordevice search
DeleteLocalmethod to remove devices from registry withoutemitting tombstones
server.go
API server integration with registry search and capabilitiespkg/core/api/server.go
filterDevicessearch logic with newSearchDevicesmethod fromdevice registry
sendDeviceRegistryResponseto acceptcontext.Contextparameterexplicit registry lookup via
GetCollectorCapabilitiesgetDeviceto prioritize registry lookup over databasefallback
getDeviceMetricsto use registry capability hints instead ofinline
collectorCapabilitiesstructdevice_transform.go
Device record transformation layer implementationpkg/registry/device_transform.go
DeviceRecordFromUnifiedconverts Proton unified devices to in-memoryDeviceRecordformatUnifiedDeviceFromRecordandLegacyDeviceFromRecordconvert recordsback to API representations
sorting
device_store.go
In-memory device storage with multi-index lookuppkg/registry/device_store.go
UpsertDeviceRecordmaintains primary device map and IP/MAC lookupindexes
DeleteDeviceRecordremoves devices and cleans up all indexesFindDevicesByIPandFindDevicesByMACquery indexed lookupssnapshotRecordsprovides thread-safe read-only view of all devicesmutations
collectors.go
Simplified collector capability response generationpkg/core/api/collectors.go
lines)
toCollectorCapabilityResponseto work with explicitCollectorCapabilityrecordsCapabilities,AgentID,PollerID,ServiceName,LastSeenfieldsSupportsICMP,SupportsSNMP,SupportsSysmon) derivedfrom capability list
capabilities.go
In-memory collector capability index implementationpkg/registry/capabilities.go
CapabilityIndexfor in-memory collectorcapability tracking
Setstores/updates capability records with deduplication andnormalization
Getreturns defensive copy of capability recordHasCapabilityandListDevicesWithCapabilityquery capability indexescapability→devices
device_store_update.go
Device update application and merging logicpkg/registry/device_store_update.go
applyRegistryStoreprocesses batch updates and tombstonesdeviceRecordFromUpdatemerges updates with existing records,preserving timestamps
and deletion detection
trigram_index.go
Trigram-based device search index implementationpkg/registry/trigram_index.go
Addindexes device text with trigram decompositionSearchreturns ranked matches using trigram scoringRemovecleans up trigram postingsdevice_registry.go
Device deletion integration with registrypkg/core/api/device_registry.go
DeleteDeviceRecordcall indeleteDeviceto remove from in-memoryregistry
capabilities.go
Collector capability management utilitiespkg/core/capabilities.go
normalizeCapabilitiesdeduplicates and sorts capability listsmergeCollectorCapabilityRecordmerges new capabilities with existingrecords
upsertCollectorCapabilitieshelper for registry updatespollers.go
Collector capability recording for service registrationpkg/core/pollers.go
upsertCollectorCapabilitiescalls in poller, agent, and checkerregistration
pollercapabilityagentcapabilitycheckercapabilityhydrate.go
Registry hydration from persistent storagepkg/registry/hydrate.go
HydrateFromStoreloads devices in batches and populates in-memoryindexes
replaceAllatomically replaces registry state with new recordsmetrics.go
Collector capability recording for metricspkg/core/metrics.go
upsertCollectorCapabilitiescalls in SNMP, ICMP, and Sysmonmetric processing
devices.go
Collector capability recording for service devicespkg/core/devices.go
upsertCollectorCapabilitiescall inensureServiceDeviceforchecker registration
interfaces.go
Manager interface capability method additionspkg/registry/interfaces.go
Managerinterface for capability operationsSetCollectorCapabilities,GetCollectorCapabilities,HasDeviceCapability,ListDevicesWithCapabilityserver.go
Initialize device registry hydration from Proton warehousepkg/core/server.go
initialization
fails
device.go
Add DeviceRecord struct for registry state managementpkg/registry/device.go
DeviceRecordstruct for in-memory device stateIDs, and capabilities
types.go
Add GetCollectorCapabilities method to registry interfacepkg/core/api/types.go
GetCollectorCapabilitiesmethod toDeviceRegistryServiceinterface
collector.go
Add CollectorCapability model for device collectorspkg/models/collector.go
CollectorCapabilitystruct for collector metadatainformation
devices.ts
Extend CollectorCapabilities interface with new fieldsweb/src/types/devices.ts
CollectorCapabilitiesinterface with new fieldscapabilitiesarray,agent_id,poller_id,service_name, andlast_seenfieldsDeviceTable.tsx
Refactor device collector info extraction to use capabilities APIweb/src/components/Devices/DeviceTable.tsx
extractCollectorInfoto usecapabilitiesarray from APIinstead of metadata parsing
type registration
capabilities,agentId,pollerId, andlastSeentoCollectorInfotype
metadata hints
10 files
collectors_test.go
Simplified collector capability response testspkg/core/api/collectors_test.go
toCollectorCapabilityResponsewith explicitCollectorCapabilityrecordsmetadata inspection
device_transform_test.go
Device transformation function testspkg/registry/device_transform_test.go
DeviceRecordFromUnifiedwith metadata, hostname, MAC, anddiscovery source handling
UnifiedDeviceFromRecordandLegacyDeviceFromRecordconversionsregistry_test.go
Registry store and search functionality testspkg/registry/registry_test.go
TestProcessBatchDeviceUpdatesUpdatesStoreverifying updatespersist to in-memory store
TestProcessBatchDeviceUpdatesRemovesDeletedRecordstestingdeletion metadata handling
TestSearchDevicestesting search ranking by device ID, IP,hostname, and recency
device_store_test.go
Device storage and indexing testspkg/registry/device_store_test.go
device_store_update_test.go
Device update application testspkg/registry/device_store_update_test.go
handling
timestamps
devices_test.go
Service device registration capability testspkg/core/devices_test.go
TestEnsureServiceDeviceRegistersOnStatusSourceto expectcapability record operations
GetCollectorCapabilitiesandSetCollectorCapabilitiescallshydrate_test.go
Registry hydration from database testspkg/registry/hydrate_test.go
HydrateFromStoreloads devices into in-memory registrymock_registry.go
Mock registry capability method implementationspkg/registry/mock_registry.go
SetCollectorCapabilities,GetCollectorCapabilities,HasDeviceCapability,ListDevicesWithCapabilitycapabilities_test.go
Capability index testspkg/registry/capabilities_test.go
trigram_index_test.go
Trigram index search testspkg/registry/trigram_index_test.go
2 files
BUILD.bazel
Simplify Bazel build configuration with glob patternspkg/registry/BUILD.bazel
srcsfrom explicit file list to glob pattern["*.go"]excluding tests
go_testsrcs to use glob pattern["*_test.go"]BUILD.bazel
Add new capability and test files to build configurationpkg/core/BUILD.bazel
capabilities.goto library sourcesdevices_test.goto test sourcesImported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1925#issuecomment-3489630672
Original created: 2025-11-05T06:48:18Z
PR Compliance Guide 🔍
(Compliance updated until commit
github.com/carverauto/serviceradar@da2ee0f076)Below is a summary of compliance checks for this PR:
Data exposure via search
Description: The device search handler returns raw SRQL rows in response (field
RawResultsand fallbackto
result.Rows), potentially exposing unvetted warehouse data and internal schema fieldsdirectly to clients.
server.go [1847-1906]
Referred Code
Sensitive information exposure
Description: Capability events and metadata include multiple IDs and network details taken from inputs
without normalization (e.g., agent_id, poller_id, target_host), which could be logged or
stored and later exposed; ensure these fields are sanitized and not echoed to clients.
metrics.go [728-776]
Referred Code
Sensitive information exposure
Description: Stats discrepancy diagnostics may log lists of missing/excess device IDs
(
missing_device_ids,excess_device_ids) which could leak inventory identifiers in logsaccessible beyond intended scope.
stats_aggregator.go [732-787]
Referred Code
🎫 #1924
capability from metadata
events
fallback only when necessary
removes SRQL-driven stats from UI
questions
certain paths
longer issue SRQL/stat queries
including linter/middleware enforcement
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status:
Internal error leak: The device search handler returns err.Error() directly to clients on planner failure,
potentially exposing internal system details.
Referred Code
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Missing audit logs: New critical endpoints and capability events (e.g., device search, stats, critical logs,
capability recordings) lack explicit audit logging of actor identity and action outcomes,
making audit trail completeness uncertain.
Referred Code
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
Input validation gaps: The device search handler minimally validates request fields (e.g., pagination limits,
mode) and passes them through, which may miss edge-case validation and error context for
invalid inputs.
Referred Code
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status:
Log metadata risk: New logging and capability events include device IDs, IPs, hostnames, and errors that may
contain sensitive data; ensure policies and redaction guardrails prevent PII/secret
exposure.
Referred Code
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Weak pagination checks: The device search API accepts pagination fields (limit/offset/cursor) without explicit
bounds or sanitization in the handler, which could allow excessive resource usage or
injection risk depending on downstream handling.
Referred Code
Compliance status legend
🟢 - Fully Compliant🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Previous compliance checks
Compliance check up to commit 057b69f
Unsafe type assertion
Description: Type assertion without proper error handling could cause runtime panic if deviceRegistry
doesn't implement SearchDevices interface, potentially crashing the service.
registry.go [1766-1777]
Referred Code
Unvalidated capability exposure
Description: Capability data retrieved from registry is directly exposed in API response without
validation or sanitization, potentially leaking sensitive collector metadata.
server.go [1820-1826]
Referred Code
🎫 #1924
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status: Passed
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status: Passed
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status: Passed
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status:
Generic variable names: Variables named 'src', 'dst', 'out' are too generic and
don't clearly express their purpose without additional context
Referred Code
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
Silent nil handling: When search returns nil results, the code silently sets filteredDevices to nil without
logging or providing context about why the search failed
Referred Code
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Missing input validation: The UpsertDeviceRecord function accepts a DeviceRecord pointer without validating critical
fields like IP address format or MAC address format before indexing
Referred Code
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1925#issuecomment-3489633158
Original created: 2025-11-05T06:49:34Z
PR Code Suggestions ✨
Explore these optional code suggestions:
Re-evaluate custom cache and search implementation
Instead of building a custom in-memory cache and trigram search engine, consider
using established libraries like Ristretto for caching and Bleve for search.
This would reduce maintenance and provide more robust features.
Examples:
pkg/registry/registry.go [62-67]
pkg/registry/trigram_index.go [11-26]
Solution Walkthrough:
Before:
After:
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies the custom implementation of a cache and search engine, proposing a valid, high-impact architectural alternative using battle-tested libraries that could reduce long-term maintenance and improve features.
✅
Propagate context for proper cancellationSuggestion Impact:
The commit modified processSysmonMetrics to accept a ctx parameter and replaced context.Background() with ctx when calling upsertCollectorCapabilities, implementing proper context propagation.code diff:
In
processSysmonMetrics, pass the incomingctxtoupsertCollectorCapabilitiesinstead of using
context.Background()to ensure proper context propagation.pkg/core/metrics.go [733]
[Suggestion processed]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies that
context.Background()is used instead of propagating the function'sctx, which is important for handling cancellations and tracing.✅
Add context to function signatureSuggestion Impact:
The function ensureServiceDevice now takes a context.Context and passes it to upsertCollectorCapabilities; previously it used context.Background(). However, new calls to recordCapabilityEvent still use context.Background().code diff:
Update the
ensureServiceDevicefunction to accept acontext.Contextand pass itto the
upsertCollectorCapabilitiescall instead of usingcontext.Background().pkg/core/devices.go [109-116]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly points out the use of
context.Background()and proposes adding context propagation toensureServiceDevice, which is a good practice for consistency and robustness.✅
Optimize capability set creation logicSuggestion Impact:
The commit replaced the chained filter/map/set construction with a single for-loop that trims, lowercases, type-checks, and adds capabilities to a Set.code diff:
Refactor the creation of
capabilitySetto use a single loop instead of multiplechained
filterandmapcalls for better efficiency.web/src/components/Devices/DeviceTable.tsx [100-108]
[Suggestion processed]Suggestion importance[1-10]: 4
__
Why: The suggestion offers a valid micro-optimization by reducing chained array methods to a single loop, which can be slightly more performant and readable.