PRD: Multi-KV configuration via Admin UI #643

Closed
opened 2026-03-28 04:26:58 +00:00 by mfreeman451 · 1 comment

Imported from GitHub.

Original GitHub issue: #1938
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1938
Original created: 2025-11-13T03:35:41Z


Background

Today pollers, agents, and checker services implicitly discover their KV endpoint from process environment variables (for example KV_ADDRESS, KV_SEC_MODE, etc.). The Admin UI shows the config JSON stored in KV (e.g., config/pollers/<id>.json), but those blobs rarely contain a kv_address. That works for the current single-KV setup, yet it blocks operators who need to pin specific workloads to different KV backends (hub/leaf JetStream domains, segregated edge KV clusters, staging vs production transitions, etc.).
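The implicit discovery described above can be sketched as follows. This is an illustrative reconstruction, not the actual serviceradar code: the function name and defaults are hypothetical, while the `KV_ADDRESS` / `KV_SEC_MODE` variable names come from the issue text.

```go
package main

import (
	"fmt"
	"os"
)

// kvEndpointFromEnv mirrors the current behavior: each process reads its
// KV endpoint from environment variables at startup, so changing it means
// editing the host environment and restarting. Defaults are illustrative.
func kvEndpointFromEnv() (address, secMode string) {
	address = os.Getenv("KV_ADDRESS")
	if address == "" {
		address = "localhost:50057" // hypothetical default port
	}
	secMode = os.Getenv("KV_SEC_MODE")
	if secMode == "" {
		secMode = "none"
	}
	return address, secMode
}

func main() {
	addr, mode := kvEndpointFromEnv()
	fmt.Printf("KV endpoint: %s (security: %s)\n", addr, mode)
}
```

Because nothing in the stored config blob (`config/pollers/<id>.json`) carries these values, the Admin UI has nothing to edit — which is the gap this PRD addresses.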

Problem

Operators cannot use the UI or API to change the KV endpoint per workload. Any change requires SSHing into the host (or editing Compose/k8s manifests) to update environment variables, then restarting the process. This breaks the "watcher-driven reload" story we just completed and prevents multi-KV topologies (hub + one or more leaves) where a poller should follow the nearest KV.

Goals

  1. Allow pollers, agents, checkers, and other services with managed configs to specify a KV endpoint (address + security mode/certs) via the Admin UI / Admin API.
  2. Support multiple KV definitions (e.g., hub and leaf domains) and reference them from individual workloads.
  3. Ensure updates propagate through existing KV watcher infrastructure so services can hot-reload KV credentials/address without restarts when possible.
  4. Preserve backwards compatibility with existing environment-variable-driven deployments.

Non-goals

  • Replacing the underlying NATS JetStream federation mechanics.
  • Managing KV cluster lifecycle (provisioning, scaling) from the UI.
  • Changing edge onboarding flows (but they should benefit from the new metadata once available).

User Stories

  1. As an operator running hub/leaf NATS, I can assign docker-poller to the leaf KV while keeping core services on the hub KV, all within the Admin UI, without editing manifests.
  2. As a tester bringing up an isolated lab KV, I can temporarily point only the agent + poller pair to lab-kv:50057, verify changes via watchers, then revert.
  3. As a support engineer, I can audit which services use which KV endpoint from a single UI/API surface, reducing guesswork when troubleshooting.

Requirements

  • Add a first-class "KV Profiles" concept to the Admin UI (name, address, security mode, TLS/SPIFFE metadata, optional description).
  • Extend service config schemas (pkg/poller.Config, pkg/agent.ServerConfig, checker configs, etc.) with a kv_profile or similar reference that maps to the profile, while keeping kv_address overrides for backwards compatibility.
  • Update Admin API endpoints (/api/admin/config/...) to accept/return the new fields and validate that referenced profiles exist.
  • Persist profiles either in KV (e.g., config/kv_profiles/<name>.json) or in the existing config store with watcher support, so edits replicate to all services.
  • Update the UI forms for pollers/agents/checkers to display the selected profile, allow switching, and optionally inline-edit the KV settings.
  • Runtime changes: when a service detects its KV profile changed, it should rebuild the KVManager (or reconnect) without requiring a process restart.
  • Logging/telemetry updates so watcher snapshots include the resolved KV profile + address for auditing.
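As a strawman for design sign-off, the profile and its reference from a service config could look like the sketch below. All field names are illustrative, not a final schema; the only constraints taken from the requirements are that a profile carries name/address/security metadata, that cert material is stored as path references rather than raw PEMs, and that the legacy `kv_address` override survives.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KVProfile is a hypothetical shape for a "KV Profiles" entry, persisted
// at e.g. config/kv_profiles/<name>.json so watcher updates propagate.
type KVProfile struct {
	Name         string `json:"name"`
	Address      string `json:"address"`
	SecurityMode string `json:"security_mode"`          // e.g. "none", "mtls", "spiffe"
	CertFile     string `json:"cert_file,omitempty"`    // path reference, never raw PEM
	KeyFile      string `json:"key_file,omitempty"`     // path reference, never raw PEM
	Description  string `json:"description,omitempty"`
}

// PollerConfig sketches how a service config could reference a profile by
// name while keeping the legacy kv_address field for backwards compatibility.
type PollerConfig struct {
	KVProfile string `json:"kv_profile,omitempty"` // name of a KVProfile
	KVAddress string `json:"kv_address,omitempty"` // legacy override, wins when set
}

func main() {
	p := KVProfile{Name: "leaf-east", Address: "leaf-kv:50057", SecurityMode: "spiffe"}
	b, _ := json.MarshalIndent(p, "", "  ")
	fmt.Println(string(b))
}
```

Whether `kv_profile` is a bare name or a full key path is one of the open questions below; the Admin API validation requirement only needs the reference to be resolvable.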

Technical Considerations

  • Need a migration strategy: existing services with no kv_profile continue reading env vars. When a profile is selected, the rendered configuration must include the concrete fields so legacy binaries still work.
  • Decide ownership of sensitive data (cert/key paths, trust bundles). Likely we store references/paths, not raw PEMs, to avoid secrets in KV.
  • Ensure Bazel/Go modules include any new packages (e.g., shared KV profile types) and update tests (pkg/config, cmd/*) accordingly.
  • Evaluate how edge onboarding should write kv_profile metadata into generated configs.
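The migration rule in the first consideration above implies a simple precedence when rendering a config: an explicit legacy `kv_address` wins, a selected profile is rendered into concrete fields next, and the env-var path remains the fallback so legacy binaries keep working. A minimal sketch, with all names hypothetical:

```go
package main

import "fmt"

// resolveKVAddress sketches the proposed precedence when rendering the
// effective KV endpoint for a service. It is illustrative only:
//   1. explicit per-service legacy override (kv_address)
//   2. address rendered from the referenced KV profile (kv_profile)
//   3. environment variable, preserving existing deployments
func resolveKVAddress(legacyAddress, profileAddress, envAddress string) string {
	switch {
	case legacyAddress != "":
		return legacyAddress // explicit legacy override wins
	case profileAddress != "":
		return profileAddress // concrete field rendered from the profile
	default:
		return envAddress // backwards-compatible env-var fallback
	}
}

func main() {
	// A service with a profile selected but no legacy override:
	fmt.Println(resolveKVAddress("", "leaf-kv:50057", "hub-kv:50057"))
}
```

Because the rendered config contains the concrete address rather than only the profile name, binaries that predate profile support read it unchanged.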

Open Questions

  1. Where do we persist the KV profiles? Single global list in KV vs per-component namespace?
  2. Do we allow per-service overrides (e.g., select profile + custom cert path) or enforce uniform fields per profile?
  3. Should we auto-populate profiles based on the currently exported env vars during migration?
  4. How do we surface validation errors if a profile references a non-existent SPIFFE ID or certificate file?

Milestones

  1. Design sign-off – finalize data model, storage location, API changes.
  2. Backend plumbing – add profile CRUD endpoints, config schema updates, watcher behavior, tests.
  3. UI integration – KV profile management screens + updates to existing forms.
  4. Runtime validation – ensure services reconnect/refresh when profile changes, add e2e tests.
  5. Docs & rollout – update docs/docs/kv-configuration.md, agents.md, release notes.

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/1938#issuecomment-3713824023
Original created: 2026-01-06T09:11:16Z


closing as will not implement
