bug(core): not handling NATS disconnects gracefully #576

Closed
opened 2026-03-28 04:25:56 +00:00 by mfreeman451 · 0 comments

Imported from GitHub.

Original GitHub issue: #1790
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1790
Original created: 2025-10-16T15:11:06Z


Describe the bug
Core is still not handling NATS disconnects/reconnects correctly after our latest changes:

{"level":"info","trace_id":"7eb6d39b823b06bdde16c82a6b04cea1","span_id":"2cf7f38bb855741c","component":"core","service_name":"network_sweep","hosts":49910,"updates":49910,"time":"2025-10-16T15:07:39Z","message":"Sweep processed"}
{"level":"warn","error":"EOF","time":"2025-10-16T15:08:06Z","message":"NATS disconnected"}
2025/10/16 15:08:26 traces export: exporter export timeout: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2025/10/16 15:08:37 failed to upload metrics: exporter export timeout: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2025/10/16 15:08:52 traces export: exporter export timeout: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2025/10/16 15:09:07 failed to upload metrics: exporter export timeout: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2025/10/16 15:09:22 traces export: exporter export timeout: rpc error: code = DeadlineExceeded desc = context deadline exceeded
{"level":"warn","trace_id":"7eb6d39b823b06bdde16c82a6b04cea1","span_id":"2cf7f38bb855741c","component":"core","error":"all 3 retries failed: rpc error: code = Inte
rnal desc = failed to get key device_canonical_map/device-id/default=3A10.172.76.199: kv bucket init failed for domain \"\": context deadline exceeded\ncontext can
celed\ncontext canceled\ncontext canceled\nall 3 retries failed: rpc error: code = Canceled desc = context canceled\ncontext canceled\nall 3 retries failed: rpc er
ror: code = Canceled desc = context canceled\nall 3 retries failed: rpc error: code = Canceled desc = context canceled\ncontext canceled\ncontext canceled\ncontext
 canceled\nall 3 retries failed: rpc error: code = Canceled desc = context canceled\nall 3 retries failed: rpc error: code = Canceled desc = context canceled\ncont
ext canceled\nall 3 retries failed: rpc error: code = Canceled desc = context canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncon
text canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncont
ext canceled\ncontext canceled\ncontext canceled\ncontext canceled\nall 3 retries failed: rpc error: code = Canceled desc = context canceled\ncontext canceled\ncon
text canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncont
ext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled\ncontext canceled","time
":"2025-10-16T15:09:51Z","message":"KV canonical hydration failed"}
2025/10/16 15:09:57 processor export timeout: rpc error: code = DeadlineExceeded desc = context deadline exceeded
{"level":"info","trace_id":"7eb6d39b823b06bdde16c82a6b04cea1","span_id":"2cf7f38bb855741c","component":"core","update_count":22869,"total_batches":49,"time":"2025-10-16T15:10:03Z","message":"Successfully published device updates to device_updates stream"}
{"level":"info","trace_id":"7eb6d39b823b06bdde16c82a6b04cea1","span_id":"2cf7f38bb855741c","component":"core","incoming_updates":49910,"valid_updates":49910,"published_updates":22869,"dropped_empty_ip":0,"canonicalized_by_armis_id":2,"canonicalized_by_netbox_id":0,"canonicalized_by_mac":0,"tombstones_emitted":2,"skipped_sweep_no_identity":27043,"time":"2025-10-16T15:10:03Z","message":"Registry batch processed"}
2025/10/16 15:10:07 failed to upload metrics: exporter export timeout: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2025/10/16 15:10:09 processor export timeout: rpc error: code = DeadlineExceeded desc = context deadline exceeded
