bug(core): identitymap #595

Closed
opened 2026-03-28 04:26:10 +00:00 by mfreeman451 · 3 comments
Owner

Imported from GitHub.

Original GitHub issue: #1846
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1846
Original created: 2025-10-22T06:15:29Z


Describe the bug

Seeing these messages in our OTEL logs again:

Ignoring corrupt canonical identity entry in KV

This looks like a regression of #1842. It may also be related to our restarts, when agents or other services try to register themselves again, for example after a pod gets a new IP address. We are also noticing that almost every restart of our k8s deployment in the demo namespace adds an extra device to the inventory. We should have 50,002 devices, but after several restarts we are at roughly 50,011.

key: device_canonical_map/device-id/default
error: identitymap: corrupt canonical record: proto: cannot parse invalid wire-format data
span_id: aadaac51bf6c5139
device_id: default:10.42.111.102
trace_id: f4b1440bb138196180d408f8dd9d038a

Screenshots

Image: https://github.com/user-attachments/assets/a002c120-11ce-42c8-b560-1e047501b300

Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1846#issuecomment-3430820694
Original created: 2025-10-22T07:25:48Z


Quick update: after today’s roll we dug into the canonical identity corruption reports. Core previously logged repeated warnings on device_canonical_map/device-id/default=3A10.42.111.102, so we tailed that key via the tools pod and also wrote a small Go scanner that walked ~37k canonical entries in the serviceradar-kv bucket. Everything currently in the bucket unmarshals cleanly, and the watch output shows the poller agent rewriting the canonical payload with the expected proto. We haven’t reproduced the bad wire-format data since the restart, so the working theory is that we caught a transient malformed publish that was later overwritten. The next step is to instrument and trace the poller → registry pipeline so we can capture the exact writer if the corruption resurfaces.
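
For anyone who wants to repeat the scan, here is a minimal sketch of the idea, not the scanner we actually ran. It assumes the entries have already been fetched into a map, and every name in it (scanEntries, decodeFunc, standInDecode, the sample keys) is hypothetical. A real run would plug in a decoder that wraps proto.Unmarshal for the canonical record message, which is not shown in this issue.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// decodeFunc reports whether a raw KV value parses as a canonical record.
// Assumption: a real scanner would wrap proto.Unmarshal here.
type decodeFunc func(value []byte) error

// corruptEntry pairs a key with the error its value produced.
type corruptEntry struct {
	Key string
	Err error
}

// scanEntries decodes every value and returns the entries that failed,
// sorted by key so repeated runs are easy to compare.
func scanEntries(entries map[string][]byte, decode decodeFunc) []corruptEntry {
	var bad []corruptEntry
	for key, value := range entries {
		if err := decode(value); err != nil {
			bad = append(bad, corruptEntry{Key: key, Err: err})
		}
	}
	sort.Slice(bad, func(i, j int) bool { return bad[i].Key < bad[j].Key })
	return bad
}

// standInDecode is a placeholder for the demo only. It rejects empty
// payloads and payloads that start with a zero byte.
func standInDecode(value []byte) error {
	if len(value) == 0 {
		return errors.New("empty payload")
	}
	if value[0] == 0x00 {
		return errors.New("invalid leading byte")
	}
	return nil
}

func main() {
	// Hypothetical sample data, not values taken from the real bucket.
	entries := map[string][]byte{
		"example/good":  {0x0a, 0x01, 0x41},
		"example/empty": {},
		"example/zero":  {0x00},
	}

	bad := scanEntries(entries, standInDecode)
	fmt.Printf("scanned %d entries, %d corrupt\n", len(entries), len(bad))
	for _, e := range bad {
		fmt.Printf("%s: %v\n", e.Key, e.Err)
	}
}
```

Keeping the decoder pluggable means the same loop can be reused if the record type changes, and returning the failures sorted by key makes it easier to compare one scan with the next.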

Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1846#issuecomment-3430822006
Original created: 2025-10-22T07:26:17Z


Follow-up detail: the key we were chasing was device_canonical_map/device-id/default=3A10.42.111.102 and the bucket was serviceradar-kv; both decoded fine during the scan.
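
A note on the =3A in that key: it looks like the colon from device_id default:10.42.111.102 written as = followed by the byte value in hex (0x3A is ':'). The sketch below shows one way such escaping could work, which helps when matching a logged device ID to its KV key. The set of characters left unescaped and the function names are assumptions made for illustration, not the actual identitymap code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const hexDigits = "0123456789ABCDEF"

// isSafeKeyByte reports whether b can appear in a key segment unescaped.
// Assumption: only letters, digits, '-', '_' and '.' are treated as safe.
func isSafeKeyByte(b byte) bool {
	switch {
	case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z', b >= '0' && b <= '9':
		return true
	case b == '-', b == '_', b == '.':
		return true
	}
	return false
}

// escapeKeySegment writes every unsafe byte as '=' plus two hex digits.
func escapeKeySegment(s string) string {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		b := s[i]
		if isSafeKeyByte(b) {
			sb.WriteByte(b)
			continue
		}
		sb.WriteByte('=')
		sb.WriteByte(hexDigits[b>>4])
		sb.WriteByte(hexDigits[b&0x0f])
	}
	return sb.String()
}

// unescapeKeySegment reverses escapeKeySegment and rejects malformed input.
func unescapeKeySegment(s string) (string, error) {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '=' {
			sb.WriteByte(s[i])
			continue
		}
		if i+2 >= len(s) {
			return "", fmt.Errorf("truncated escape at offset %d", i)
		}
		v, err := strconv.ParseUint(s[i+1:i+3], 16, 8)
		if err != nil {
			return "", fmt.Errorf("bad escape at offset %d: %w", i, err)
		}
		sb.WriteByte(byte(v))
		i += 2 // skip the two hex digits just consumed
	}
	return sb.String(), nil
}

func main() {
	key := "device_canonical_map/device-id/" + escapeKeySegment("default:10.42.111.102")
	fmt.Println(key) // prints device_canonical_map/device-id/default=3A10.42.111.102

	id, err := unescapeKeySegment("default=3A10.42.111.102")
	fmt.Println(id, err) // prints default:10.42.111.102 <nil>
}
```

Because '=' is itself outside the safe set, it is escaped as =3D, so the mapping round-trips without ambiguity.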

Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1846#issuecomment-3432926966
Original created: 2025-10-22T15:12:04Z


Closing, can't repro.
