2018 refactor kv store identity map fix #2479
No reviewers
Labels
No labels
1week
2weeks
Failed compliance check
IP cameras
NATS
Possible security concern
Review effort 1/5
Review effort 2/5
Review effort 3/5
Review effort 4/5
Review effort 5/5
UI
aardvark
accessibility
amd64
api
arm64
auth
back-end
bgp
blog
bug
build
checkers
ci-cd
cleanup
cnpg
codex
core
dependencies
device-management
documentation
duplicate
dusk
ebpf
enhancement
eta 1d
eta 1hr
eta 3d
eta 3hr
feature
fieldsurvey
github_actions
go
good first issue
help wanted
invalid
javascript
k8s
log-collector
mapper
mtr
needs-triage
netflow
network-sweep
observability
oracle
otel
plug-in
proton
python
question
reddit
redhat
research
rperf
rperf-checker
rust
sdk
security
serviceradar-agent
serviceradar-agent-gateway
serviceradar-web
serviceradar-web-ng
siem
snmp
sysmon
topology
ubiquiti
wasm
wontfix
zen-engine
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
carverauto/serviceradar!2479
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "refs/pull/2479/head"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Imported from GitHub pull request.
Original GitHub pull request: #2019
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2019
Original created: 2025-11-26T00:38:49Z
Original updated: 2025-11-26T04:33:40Z
Original head: carverauto/serviceradar:2018-refactor-kv-store-identity-map-fix
Original base: main
Original merged: 2025-11-26T04:33:30Z by @mfreeman451
User description
IMPORTANT: Please sign the Developer Certificate of Origin
Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:
Describe your changes
Issue ticket number and link
Code checklist before requesting a review
PR Type
Enhancement, Tests, Documentation
Description
Introduces
DeviceIdentityResolverfor generating stable ServiceRadar UUIDs using deterministic hashing based on partition and IP, with hierarchical identifier resolution (strong identifiers like MAC, Armis ID, NetBox ID for merging; IP as fallback)Implements
cnpgIdentityResolverto enrich device updates with canonical metadata from CNPG unified_devices table, eliminating write amplification from KV store identity publisherIntegrates new identity resolution system into device processing pipeline with batch deduplication logic and metadata merging
Adds
StaleDeviceReapercomponent to clean up orphaned IP-only devices without strong identifiers on configurable intervalsRemoves KV-based identity publisher and resolver configuration, replacing with direct CNPG queries for identity resolution
Updates sweep result processor and Armis sync to leave
DeviceIDempty, allowing registry to generate stable ServiceRadar UUIDsAdds database methods for querying stale devices and soft-deleting orphaned records
Includes comprehensive test coverage for new identity resolution system, canonicalization scenarios, and device cleanup
Provides detailed documentation of architecture findings, implementation plan, and deployment history
Includes database migration to merge duplicate devices in unified_devices table
Diagram Walkthrough
File Walkthrough
10 files
device_identity.go
New device identity resolver with stable UUID generationpkg/registry/device_identity.go
DeviceIdentityResolverto generate stable ServiceRadarUUIDs for devices using deterministic hashing based on partition and
IP
(MAC, Armis ID, NetBox ID) for device merging, weak identifiers (IP)
as fallback
(5-minute TTL, 100k entries)
ID, and NetBox ID with filtering for legacy IDs
identity_resolver_cnpg.go
CNPG-based identity resolver for canonical metadata hydrationpkg/registry/identity_resolver_cnpg.go
cnpgIdentityResolverfor enriching device updates withcanonical metadata from CNPG unified_devices table
metadata with TTL-based expiry
hydrateCanonicalmethod to apply canonical identifiers andmetadata to device updates
using direct CNPG queries
registry.go
Registry integration of new identity resolution systempkg/registry/registry.go
DeviceIdentityResolverandcnpgIdentityResolverinto deviceprocessing pipeline
IP within a batch
canonicalIDCandidateto skip legacy partition:IP format IDsand prefer ServiceRadar UUIDs
reaper.go
Stale device reaper for cleanup of orphaned devicespkg/core/reaper.go
StaleDeviceReapercomponent to clean up orphaned IP-only deviceswithout strong identifiers
devices older than TTL (default 24 hours)
server.go
Server initialization with new identity resolution systempkg/core/server.go
write amplification)
DeviceIdentityResolverandCNPGIdentityResolveras primaryidentity resolution mechanism
StaleDeviceReaperbackground loop on server startupresolution
result_processor.go
Sweep processor uses empty device IDs for resolverpkg/core/result_processor.go
DeviceIDempty instead ofgenerating legacy partition:IP format
DeviceIdentityResolverto generate stable ServiceRadar UUIDsdevices.go
Armis sync uses empty device IDs for resolverpkg/sync/integrations/armis/devices.go
DeviceIDempty instead ofpartition:IP format
merging
cnpg_unified_devices.go
Database methods for stale device identification and soft deletionpkg/db/cnpg_unified_devices.go
GetStaleIPOnlyDevicesmethod to query devices without strongidentifiers older than TTL
SoftDeleteDevicesmethod to mark devices as deleted by updatingmetadata
interfaces.go
Service interface additions for device cleanuppkg/db/interfaces.go
GetStaleIPOnlyDevicesandSoftDeleteDevicesmethod signatures toService interface
main.go
Faker simulation warmup cycles for IP churncmd/faker/main.go
WarmupCyclesconfiguration parameter to IP shuffle simulationcycles
6 files
registry_test.go
Test updates for ServiceRadar UUID generationpkg/registry/registry_test.go
prefix) instead of legacy partition:IP format
WithDeviceIdentityResolveroption to registry initialization intests
processing
canon_simulation_test.go
Canonicalization simulation tests for real-world scenariospkg/registry/canon_simulation_test.go
followed by Armis sync)
identifiers arrive late
match
solve
device_identity_test.go
Device identity resolver unit testspkg/registry/device_identity_test.go
unknown
strong ID matches exist
identity_resolver_cnpg_test.go
CNPG identity resolver unit testspkg/registry/identity_resolver_cnpg_test.go
reaper_test.go
Stale device reaper unit testspkg/core/reaper_test.go
StaleDeviceReapercovering success cases, emptyresults, and error handling
soft-deleting devices
armis_test.go
Armis integration test updates for empty device IDspkg/sync/integrations/armis/armis_test.go
DeviceIDin Armis eventsinstead of partition:IP format
1 files
mock_db.go
Mock database regeneration with new methodspkg/db/mock_db.go
instead of source file)
MockQueryExecutorinterface mock (no longer needed)GetStaleIPOnlyDevicesandSoftDeleteDevicesmethods2 files
00000000000007_merge_duplicates.up.sql
Database migration to merge duplicate devicespkg/db/cnpg/migrations/00000000000007_merge_duplicates.up.sql
table
on strong identifiers
_merged_intometadata pointing tocanonical device
values.yaml
Helm values update with new service versionshelm/serviceradar/values.yaml
ServiceRadar UUID identity system
empty DeviceID changes
3 files
device_canon_findings.md
Device canonicalization architecture findings documentationdevice_canon_findings.md
inconsistencies
discovered via sweep vs Armis
kv_fix.md
KV Store Identity Map Fix Implementation and Deployment Historykv_fix.md
three completed phases (CNPG resolver, ServiceRadar UUID system,
legacy ID migration)
DeviceIdentityResolverwith in-memory cache,UUID generation, and strong/weak identifier handling
device count growth (50k target vs 93k actual)
investigation and potential fixes
device_canon_plan.md
Device Canonicalization Fix Plan and Root Cause Analysisdevice_canon_plan.md
duplicate devices (61.1k actual vs 50k expected)
format, race conditions between sweep and Armis sync, and cache
invalidation gaps
generation, sweep result processor fixes, and merge-only strong
identifiers
SQL, and timeline for fixes
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2019#issuecomment-3578227891
Original created: 2025-11-26T00:39:48Z
You are nearing your monthly Qodo Merge usage quota. For more information, please visit here.
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
Predictable identifier generation
Description: The deterministic "sr:" device ID is derived solely from partition and IP, allowing an
attacker who can spoof or obtain the same IP in a partition to predict or collide on
device IDs and potentially cause cross-device association or confusion.
device_identity.go [846-882]
Referred Code
SQL injection risk
Description: Direct SQL strings with string pattern matching (LIKE and JSON operators) are executed via
ExecuteQuery; while parameters are bound for arrays, ensure all inputs are parameterized
to avoid SQL injection risks from any untrusted string lists or metadata-driven filters.
device_identity.go [660-741]
Referred Code
🎫 #2018
ServiceRadar UUIDs via deterministic identity resolution prioritizing strong identifiers
(MAC, Armis ID, NetBox ID) and using IP as a weak fallback.
with canonical metadata and reduce write amplification.
including batch deduplication and metadata merging.
assigns stable ServiceRadar UUIDs.
flows.
devices in unified_devices.
configurable interval.
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status: Passed
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status: Passed
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Limited Auditing: New identity resolution generates and assigns canonical IDs and performs merges/lookups
without explicit audit logging of these critical actions, making full auditability
uncertain from the diff alone.
Referred Code
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
Swallowed Errors: Several database lookup errors are downgraded to debug logs and resolution proceeds (e.g.,
generating new IDs) which may hide operational issues without clear propagation or
metrics.
Referred Code
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Input Trust: The resolver trusts metadata fields like 'armis_device_id' and
'integration_id' from updates without visible validation or normalization beyond
trimming, which may require upstream validation guarantees.
Referred Code
Compliance status legend
🟢 - Fully Compliant🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2019#issuecomment-3578231386
Original created: 2025-11-26T00:41:11Z
You are nearing your monthly Qodo Merge usage quota. For more information, please visit here.
PR Code Suggestions ✨
Explore these optional code suggestions:
UUID generation relies on ephemeral IPs
The deterministic UUID generation relies on the device's IP address, which is
ephemeral in DHCP environments, leading to device duplication when IPs change.
The suggestion is to base the UUID on a stable, strong identifier (like a MAC
address) when available, and only use the IP as a fallback.
Examples:
pkg/registry/device_identity.go [846-882]
Solution Walkthrough:
Before:
After:
Suggestion importance[1-10]: 9
__
Why: The suggestion correctly identifies a critical design flaw where basing UUIDs on ephemeral IPs can lead to device duplication during DHCP churn, which is a primary problem this PR aims to solve.
Fix race condition in cache
Fix a race condition in
getIPMappingby re-checking the entry's state under awrite lock before deleting it. This ensures the check-and-delete operation for
expired entries is atomic.
pkg/registry/identity_resolver_cnpg.go [70-90]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies a classic check-then-act race condition when deleting an expired cache entry. The proposed fix using a double-checked locking pattern correctly resolves this concurrency issue, which is critical for the stability of a concurrent cache.
Prevent loss of resolved device IDs
Improve
deduplicateBatchto prioritize keeping the canonical ServiceRadar UUID(
sr:...) when merging duplicate updates for the same IP. This prevents losing aresolved
DeviceIDif it appears later in the batch.pkg/registry/registry.go [284-312]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly points out a flaw where the
deduplicateBatchfunction might discard a more accurate, resolvedDeviceIDin favor of an unresolved one. The proposed change to prioritize canonicalsr:UUIDs prevents this potential data loss and is a valuable improvement.Filter out NULL IP addresses
Add a filter to the
duplicatestemporary table creation to exclude records withNULLor emptyipaddresses, preventing them from being incorrectly grouped andmerged.
pkg/db/cnpg/migrations/00000000000007_merge_duplicates.up.sql [6-39]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: This is a valid improvement that makes the migration script more robust by explicitly excluding records with
NULLor emptyipaddresses, preventing them from being incorrectly grouped and merged.Ensure consistent IP filtering
Add a filter to the
canonical_maptemporary table creation to exclude recordswith
NULLor emptyipaddresses, ensuring consistency with theduplicatestablefor correct joins.
pkg/db/cnpg/migrations/00000000000007_merge_duplicates.up.sql [44-70]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: This suggestion correctly points out the need for consistent filtering between the
duplicatesandcanonical_maptables to ensure theJOINoperation works as expected, improving the migration's correctness.Improve cache eviction logic
Improve the
evictOldestfunction to evict the oldest (soonest to expire) entriesinstead of random ones after clearing expired items. This makes the cache
eviction policy more effective and aligns with the function's name.
pkg/registry/device_identity.go [815-838]
Suggestion importance[1-10]: 6
__
Why: The suggestion correctly identifies that the
evictOldestfunction performs random eviction after removing expired items, which is suboptimal and contradicts the function's name. The proposed change to evict the truly oldest items is a significant improvement in the cache eviction logic.