bug: failed to insert ServiceRadar.Observability.CpuMetric #1049

Closed
opened 2026-03-28 04:31:10 +00:00 by mfreeman451 · 1 comment
Owner

Imported from GitHub.

Original GitHub issue: #2889
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2889
Original created: 2026-02-22T04:07:24Z


Describe the bug

core-elx 04:06:05.499 [info] ResultsRouter received: service_type=sysmon source=sysmon-metrics service=sysmon
core-elx 04:06:05.505 [warning] SysmonMetricsIngestor: failed to insert ServiceRadar.Observability.CpuMetric: [%Ash.Error.Unknown{bread_crumbs: ["Exception raised in bulk create: ServiceRadar.Observability.CpuMetric.create"], errors: [%Ash.Error.Unknown.UnknownError{error: "** (Postgrex.Error) ERROR XX002 (index_corrupted) right sibling's left-link doesn't match: block 1779 links to 1780 instead of expected 1 in index \"_hyper_7_87_chunk_cpu_metrics_timestamp_idx\"", field: nil, value: nil, splode: Ash.Error, bread_crumbs: ["Exception raised in bulk create: ServiceRadar.Observability.CpuMetric.create"], vars: [], path: [], stacktrace: #Splode.Stacktrace<>, class: :unknown}]}]
core-elx 04:06:05.545 [warning] SysmonMetricsIngestor: failed to insert ServiceRadar.Observability.ProcessMetric: [%Ash.Error.Unknown{bread_crumbs: ["Exception raised in bulk create: ServiceRadar.Observability.ProcessMetric.create"], errors: [%Ash.Error.Unknown.UnknownError{error: "** (Postgrex.Error) ERROR XX002 (index_corrupted) right sibling's left-link doesn't match: block 9731 links to 9732 instead of expected 1 in index \"_hyper_10_90_chunk_process_metrics_timestamp_idx\"", field: nil, value: nil, splode: Ash.Error, bread_crumbs: ["Exception raised in bulk create: ServiceRadar.Observability.ProcessMetric.create"], vars: [], path: [], stacktrace: #Splode.Stacktrace<>, class: :unknown}]}, %Ash.Error.Unknown{bread_crumbs: ["Exception raised in bulk create: ServiceRadar.Observability.ProcessMetric.create"], errors: [%Ash.Error.Unknown.UnknownError{error: "** (Postgrex.Error) ERROR XX002 (index_corrupted) right sibling's left-link doesn't match: block 9731 links to 9732 instead of expected 1 in index \"_hyper_10_90_chunk_process_metrics_timestamp_idx\"", field: nil, value: nil, splode: Ash.Error, bread_crumbs: ["Exception raised in bulk create: ServiceRadar.Observability.ProcessMetric.create"], vars: [], path: [], stacktrace: #Splode.Stacktrace<>, class: :unknown}]}]
core-elx 04:06:05.545 [warning] Results processing failed: [%Ash.Error.Unknown{bread_crumbs: ["Exception raised in bulk create: ServiceRadar.Observability.CpuMetric.create"], errors: [%Ash.Error.Unknown.UnknownError{error: "** (Postgrex.Error) ERROR XX002 (index_corrupted) right sibling's left-link doesn't match: block 1779 links to 1780 instead of expected 1 in index \"_hyper_7_87_chunk_cpu_metrics_timestamp_idx\"", field: nil, value: nil, splode: Ash.Error, bread_crumbs: ["Exception raised in bulk create: ServiceRadar.Observability.CpuMetric.create"], vars: [], path: [], stacktrace: #Splode.Stacktrace<>, class: :unknown}]}]


Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2889#issuecomment-3940106406
Original created: 2026-02-22T04:14:54Z


Implemented a targeted self-healing fix in elixir/serviceradar_core for sysmon ingest failures caused by Timescale index corruption (ERROR XX002 index_corrupted).

What changed:

  • Updated ServiceRadar.Observability.SysmonMetricsIngestor to:
    • detect the XX002 (index_corrupted) error signature from nested Ash/Postgrex errors,
    • extract the reported index name,
    • run REINDEX INDEX <qualified_index>,
    • retry the failed bulk insert once.
  • Added focused tests for corrupted-index name extraction logic.
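The detect, extract, reindex, retry-once flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from `sysmon_metrics_ingestor.ex`: the module name `ReindexRetrySketch`, its function names, and the injected `insert_fun` / `reindex_fun` callbacks are all hypothetical, and the default schema `_timescaledb_internal` is an assumption based on where TimescaleDB normally keeps chunk indexes.

```elixir
defmodule ReindexRetrySketch do
  @moduledoc """
  Hypothetical sketch of a reindex-and-retry-once flow.
  Not the actual SysmonMetricsIngestor implementation.
  """

  # Matches the index name reported in an XX002 message, for example:
  #   ERROR XX002 (index_corrupted) ... in index "_hyper_7_87_chunk_cpu_metrics_timestamp_idx"
  # The optional backslash tolerates the escaped quotes produced by inspect/2.
  # The capture is limited to [A-Za-z0-9_] so it is safe to interpolate into SQL.
  defp index_regex do
    ~r/XX002 \(index_corrupted\).*?in index \\?"([A-Za-z0-9_]+)\\?"/s
  end

  @doc "Returns {:ok, index_name} if the error carries the XX002 signature, else :error."
  def extract_corrupted_index(message) when is_binary(message) do
    case Regex.run(index_regex(), message) do
      [_full, index] -> {:ok, index}
      _ -> :error
    end
  end

  # Nested error terms (structs, lists, maps) are flattened to text first.
  def extract_corrupted_index(error) do
    error
    |> inspect(limit: :infinity, printable_limit: :infinity)
    |> extract_corrupted_index()
  end

  @doc "Builds the REINDEX statement; the default schema is an assumption."
  def reindex_sql(index, schema \\ "_timescaledb_internal") do
    ~s(REINDEX INDEX "#{schema}"."#{index}")
  end

  @doc """
  Runs insert_fun. On an XX002 failure, calls reindex_fun with the SQL
  statement and, if that returns :ok, runs insert_fun one more time.
  Any other failure is returned unchanged.
  """
  def insert_with_reindex_retry(insert_fun, reindex_fun) do
    case insert_fun.() do
      {:error, reason} = failure ->
        with {:ok, index} <- extract_corrupted_index(reason),
             :ok <- reindex_fun.(reindex_sql(index)) do
          # Exactly one retry; a second failure goes back to the caller as-is.
          insert_fun.()
        else
          _ -> failure
        end

      result ->
        result
    end
  end
end
```

In a real ingestor, `insert_fun` would presumably wrap the bulk create and `reindex_fun` would run the statement through the application's repo and map the result to `:ok`. Passing both as functions keeps the sketch free of database dependencies, so the extraction and retry logic can be tested on its own.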

Files:

  • elixir/serviceradar_core/lib/serviceradar/observability/sysmon_metrics_ingestor.ex
  • elixir/serviceradar_core/test/serviceradar/observability/sysmon_metrics_ingestor_test.exs

Validation:

  • mix test test/serviceradar/observability/sysmon_metrics_ingestor_test.exs -> 3 tests, 0 failures

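Note: REINDEX INDEX is corrective. While the rebuild runs, it blocks writes to the index's parent table and any reads that would use the index, so inserts into the affected chunk wait until it finishes.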
