bug(zen): failing to load rule #816

Closed
opened 2026-03-28 04:28:50 +00:00 by mfreeman451 · 2 comments

Imported from GitHub.

Original GitHub issue: #2426
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2426
Original created: 2026-01-21T06:21:33Z


Describe the bug

[2026-01-21T06:21:04Z WARN  zen] processing failed: failed to load rule events/logs.syslog/strip_full_message: Loader failed internally on key events/logs.syslog/strip_full_message: missing field `nodes` at line 27 column 1.
[2026-01-21T06:21:04Z WARN  zen] processing failed: failed to load rule events/logs.syslog/strip_full_message: Loader failed internally on key events/logs.syslog/strip_full_message: missing field `nodes` at line 27 column 1


mfreeman451 added this to the 1.1.0 milestone 2026-03-28 04:28:50 +00:00

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2426#issuecomment-3803199006
Original created: 2026-01-27T05:30:50Z


This does not reproduce in Docker on a fresh Docker Compose install:

[mfreeman@localhost serviceradar]$ docker logs 7efd37707ba8
First startup detected - installing initial zen rules...
Installing initial zen rules...
Installing strip_full_message rule for logs.syslog...
Inserted rule strip_full_message for subject logs.syslog
✓ strip_full_message rule installed
Installing cef_severity rule for logs.syslog...
Inserted rule cef_severity for subject logs.syslog
✓ cef_severity rule installed
Installing snmp_severity rule for logs.snmp...
Inserted rule snmp_severity for subject logs.snmp
✓ snmp_severity rule installed
Installing passthrough rule for logs.otel...
Inserted rule passthrough for subject logs.otel
✓ passthrough rule installed
✅ Initial zen rules installation completed successfully
✅ Initial rules installation completed
Starting ServiceRadar Zen with config: /etc/serviceradar/zen.json
[2026-01-27T01:37:38Z INFO  async_nats::connector] connected successfully server=4222 max_payload=1048576
[2026-01-27T01:37:38Z INFO  zen::nats] connected to nats at tls://nats:4222
[2026-01-27T01:37:38Z INFO  async_nats] event: connected
[2026-01-27T01:37:38Z INFO  zen] using stream events
[2026-01-27T01:37:38Z INFO  zen] using consumer zen-consumer
[2026-01-27T01:37:38Z INFO  zen::engine] initialized decision engine with bucket serviceradar-datasvc
[2026-01-27T01:37:38Z INFO  zen] waiting for messages on subjects: ["logs.syslog", "logs.snmp", "logs.otel", "logs.internal"]
[2026-01-27T01:37:38Z INFO  zen::rule_watcher] watching rules under agents/docker-zen-consumer/events/
[mfreeman@localhost serviceradar]$

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2426#issuecomment-3803245716
Original created: 2026-01-27T05:48:32Z


Root Cause Analysis

The zen service fails in k8s with missing field 'nodes' because:

  1. Docker Compose uses entrypoint-zen.sh → zen-install-rules.sh → zen-put-rule to install rules on first startup
  2. Kubernetes/Helm was missing this initialization - rules were mounted as a ConfigMap but never installed to NATS KV
  3. The NATS KV bucket has stale/malformed data that doesn't match the expected DecisionContent structure
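
A stored rule can be sanity-checked before it is written to (or after it is read from) the KV bucket. The following is a minimal sketch, not the zen loader itself: it only checks for the `nodes` field named in the error message, and the function name `missing_rule_fields` is hypothetical. The real DecisionContent structure may require additional fields.

```python
import json

# Only `nodes` is confirmed by the error message in this issue; the real
# DecisionContent structure may require more fields (assumption).
REQUIRED_LIST_FIELDS = ("nodes",)


def missing_rule_fields(raw: str) -> list[str]:
    """Return the required top-level list fields absent from a rule document.

    `raw` is the JSON text of a rule as it would be stored in the KV bucket.
    An empty result means the document has the minimal expected shape.
    """
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        # A non-object document cannot carry any of the required fields.
        return list(REQUIRED_LIST_FIELDS)
    return [
        field
        for field in REQUIRED_LIST_FIELDS
        if not isinstance(doc.get(field), list)
    ]
```

A stale or malformed entry such as `{"rules": []}` would be reported as missing `nodes`, which matches the failure the loader logged.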

Fix

Created a Kubernetes Job (zen-rules-bootstrap-job.yaml) as a Helm hook that:

  • Runs as a post-install/post-upgrade hook
  • Uses the zen image, which includes the zen-put-rule binary
  • Mounts the serviceradar-zen-rules ConfigMap
  • Installs rules using the same zen-put-rule binary as Docker Compose

Added helm values:

  • zenRulesBootstrap.enabled: true (default)
  • zenRulesBootstrap.forceReinstall: false (set to true to overwrite existing rules)
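
The intended effect of forceReinstall can be illustrated with a small in-memory model. This is a sketch under assumptions, not the Job's actual logic: the KV bucket is modeled as a plain dict, the function name `bootstrap_rules` is hypothetical, and "skip keys that already exist unless forced" is my reading of the value's description above.

```python
def bootstrap_rules(kv: dict, rules: dict, force_reinstall: bool = False) -> list:
    """Install rules into `kv`, returning the keys that were written.

    `kv` stands in for the NATS KV bucket and `rules` for the rule files
    mounted from the ConfigMap (both modeled as dicts for illustration).
    """
    written = []
    for key, content in rules.items():
        if key in kv and not force_reinstall:
            # Existing entry is left untouched when not forcing a reinstall.
            continue
        kv[key] = content
        written.append(key)
    return written
```

Under this reading, an entry that is already present but malformed would survive a default run, so a cluster that has already hit the `nodes` error may need one upgrade with forceReinstall set to true.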

Files Changed

  • helm/serviceradar/templates/zen-rules-bootstrap-job.yaml (new)
  • helm/serviceradar/values.yaml (added zenRulesBootstrap config)

OpenSpec proposal: fix-zen-k8s-rule-bootstrap
