adding proposal #2486

Merged
mfreeman451 merged 2 commits from refs/pull/2486/head into main 2025-11-28 16:14:16 +00:00
mfreeman451 commented 2025-11-28 04:49:19 +00:00 (Migrated from github.com)
Owner

Imported from GitHub pull request.

Original GitHub pull request: #2029
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2029
Original created: 2025-11-28T04:49:19Z
Original updated: 2025-12-08T06:54:19Z
Original head: carverauto/serviceradar:updates/monitor_bridge
Original base: main
Original merged: 2025-11-28T16:14:16Z by @mfreeman451

User description

IMPORTANT: Please sign the Developer Certificate of Origin

Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:

Signed-off-by: J. Doe <j.doe@domain.com>

Describe your changes

Code checklist before requesting a review

  • I have signed the DCO?
  • The build completes without errors?
  • All tests are passing when running make test?

PR Type

Enhancement, Documentation


Description

  • Add Prometheus monitoring bridge for ServiceRadar metrics exposure

  • Enable dual telemetry support for OTEL and Prometheus/remote-write targets

  • Ship Grafana dashboards and kube-prom-stack integration artifacts

  • Define requirements and implementation tasks for monitoring namespace integration


Diagram Walkthrough

flowchart LR
  SR["ServiceRadar<br/>Components"]
  PROM["Prometheus<br/>Endpoints"]
  OTEL["OTEL<br/>Collector"]
  EXT["External<br/>Prometheus"]
  DASH["Grafana<br/>Dashboards"]
  
  SR -- "Expose /metrics" --> PROM
  PROM -- "ServiceMonitor<br/>Discovery" --> OTEL
  OTEL -- "Dual Export" --> EXT
  PROM -- "Query" --> DASH

File Walkthrough

Relevant files
Documentation
proposal.md
Prometheus monitoring bridge proposal document                     

openspec/changes/add-prometheus-monitoring-bridge/proposal.md

  • Introduces proposal for Prometheus monitoring bridge integration
  • Outlines motivation for operators running kube-prometheus-stack
  • Specifies metrics exposure across core, poller, sync, and OTEL
    components
  • Details dual-telemetry support and Grafana dashboard delivery
+14/-0   
spec.md
Observability integration specification requirements         

openspec/changes/add-prometheus-monitoring-bridge/specs/observability-integration/spec.md

  • Defines requirements for Prometheus scrape surfaces across
    ServiceRadar components
  • Specifies dual telemetry destination support without breaking defaults
  • Documents kube-prom-stack integration with ServiceMonitor/PodMonitor
    manifests
  • Includes Grafana dashboard and PrometheusRule alerting requirements
+35/-0   
tasks.md
Implementation tasks for monitoring bridge feature             

openspec/changes/add-prometheus-monitoring-bridge/tasks.md

  • Outlines discovery phase for existing metrics and kube-prom-stack
    expectations
  • Details Prometheus surfacing tasks including endpoint exposure and
    ServiceMonitor setup
  • Specifies dual telemetry configuration extension requirements
  • Includes dashboard, alerting, and validation tasks for implementation
+20/-0   

Imported from GitHub pull request. Original GitHub pull request: #2029 Original author: @mfreeman451 Original URL: https://github.com/carverauto/serviceradar/pull/2029 Original created: 2025-11-28T04:49:19Z Original updated: 2025-12-08T06:54:19Z Original head: carverauto/serviceradar:updates/monitor_bridge Original base: main Original merged: 2025-11-28T16:14:16Z by @mfreeman451 --- ### **User description** ## IMPORTANT: Please sign the Developer Certificate of Origin Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include a [DCO sign-off statement]( https://developercertificate.org/) indicating the DCO acceptance in one commit message. Here is an example DCO Signed-off-by line in a commit message: ``` Signed-off-by: J. Doe <j.doe@domain.com> ``` ## Describe your changes ## Issue ticket number and link ## Code checklist before requesting a review - [ ] I have signed the DCO? - [ ] The build completes without errors? - [ ] All tests are passing when running make test? ___ ### **PR Type** Enhancement, Documentation ___ ### **Description** - Add Prometheus monitoring bridge for ServiceRadar metrics exposure - Enable dual telemetry support for OTEL and Prometheus/remote-write targets - Ship Grafana dashboards and kube-prom-stack integration artifacts - Define requirements and implementation tasks for monitoring namespace integration ___ ### Diagram Walkthrough ```mermaid flowchart LR SR["ServiceRadar<br/>Components"] PROM["Prometheus<br/>Endpoints"] OTEL["OTEL<br/>Collector"] EXT["External<br/>Prometheus"] DASH["Grafana<br/>Dashboards"] SR -- "Expose /metrics" --> PROM PROM -- "ServiceMonitor<br/>Discovery" --> OTEL OTEL -- "Dual Export" --> EXT PROM -- "Query" --> DASH ``` <details> <summary><h3> File Walkthrough</h3></summary> <table><thead><tr><th></th><th align="left">Relevant files</th></tr></thead><tbody><tr><td><strong>Documentation</strong></td><td><table> <tr> <td> <details> <summary><strong>proposal.md</strong><dd><code>Prometheus monitoring bridge proposal document</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> openspec/changes/add-prometheus-monitoring-bridge/proposal.md <ul><li>Introduces proposal for Prometheus monitoring bridge integration<br> <li> Outlines motivation for operators running kube-prometheus-stack<br> <li> Specifies metrics exposure across core, poller, sync, and OTEL <br>components<br> <li> Details dual-telemetry support and Grafana dashboard delivery</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2029/files#diff-81a990985d497493238589e703ea366222940cc604a571ce21bfeca5bfc442e5">+14/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>spec.md</strong><dd><code>Observability integration specification requirements</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> openspec/changes/add-prometheus-monitoring-bridge/specs/observability-integration/spec.md <ul><li>Defines requirements for Prometheus scrape surfaces across <br>ServiceRadar components<br> <li> Specifies dual telemetry destination support without breaking defaults<br> <li> Documents kube-prom-stack integration with ServiceMonitor/PodMonitor <br>manifests<br> <li> Includes Grafana dashboard and PrometheusRule alerting requirements</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2029/files#diff-21ba4baae6c7dbf1e091af2f2cfde05086826dcd839802d6e26eaa407eab584c">+35/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td> <details> <summary><strong>tasks.md</strong><dd><code>Implementation tasks for monitoring bridge feature</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> openspec/changes/add-prometheus-monitoring-bridge/tasks.md <ul><li>Outlines discovery phase for existing metrics and kube-prom-stack <br>expectations<br> <li> Details Prometheus surfacing tasks including endpoint exposure and <br>ServiceMonitor setup<br> <li> Specifies dual telemetry configuration extension requirements<br> <li> Includes dashboard, alerting, and validation tasks for implementation</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2029/files#diff-c00044e9668389895df2d518bf418b693b3a12ad3488e2926abbdd71b98644ba">+20/-0</a>&nbsp; &nbsp; </td> </tr> </table></td></tr></tr></tbody></table> </details> ___
qodo-code-review[bot] commented 2025-11-28 04:49:42 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2029#issuecomment-3587913416
Original created: 2025-11-28T04:49:42Z

You are nearing your monthly Qodo Merge usage quota. For more information, please visit here.

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No audit scope: The added specs and proposal focus solely on metrics exposure and do not define audit
logging for critical actions, leaving uncertainty whether required audit trails are
implemented elsewhere.

Referred Code
## ADDED Requirements
### Requirement: Prometheus Scrape Surfaces for Core Services
ServiceRadar SHALL expose Prometheus-compatible metrics endpoints for core, registry (identity), poller, sync, and OTEL collector components, addressable from the `monitoring` namespace without bypassing existing auth/TLS controls.

#### Scenario: Monitoring namespace scrapes registry metrics
- **WHEN** kube-prom-stack in namespace `monitoring` discovers ServiceMonitor/PodMonitor objects for `serviceradar-registry`
- **THEN** it can scrape a stable `/metrics` endpoint that includes identity_* metric families without disabling SPIFFE/TLS if enabled

### Requirement: Dual Telemetry Destinations
ServiceRadar telemetry (logs/metrics/traces) SHALL support simultaneous delivery to the built-in OTEL collector and to external Prometheus/remote-write targets via configuration without changing default edge-friendly behavior.

#### Scenario: Enable external Prometheus alongside internal OTEL
- **WHEN** an operator sets configuration to add a Prometheus/remote-write exporter while keeping the default OTLP exporter enabled
- **THEN** metrics continue flowing to the internal OTEL collector AND are exported to the external Prometheus target without duplicate instrumentation or process restarts beyond config reload

### Requirement: kube-prom-stack Integration Artifacts
The repository SHALL ship ServiceMonitor/PodMonitor manifests and Helm values that place monitoring resources in the `monitoring` namespace with label selectors compatible with kube-prom-stack discovery.

#### Scenario: Helm install with monitoring enabled
- **WHEN** Helm values set `monitoring.enabled=true`
- **THEN** the rendered manifests include ServiceMonitor/PodMonitor objects in namespace `monitoring` targeting ServiceRadar components’ metrics ports with appropriate scrape intervals and TLS/auth settings


 ... (clipped 14 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Error handling gaps: The changes are specification and task documents that do not describe runtime error
handling for failures (e.g., scrape/TLS failures, exporter errors), so compliance cannot
be verified from the diff.

Referred Code
### Requirement: Dual Telemetry Destinations
ServiceRadar telemetry (logs/metrics/traces) SHALL support simultaneous delivery to the built-in OTEL collector and to external Prometheus/remote-write targets via configuration without changing default edge-friendly behavior.

#### Scenario: Enable external Prometheus alongside internal OTEL
- **WHEN** an operator sets configuration to add a Prometheus/remote-write exporter while keeping the default OTLP exporter enabled
- **THEN** metrics continue flowing to the internal OTEL collector AND are exported to the external Prometheus target without duplicate instrumentation or process restarts beyond config reload

### Requirement: kube-prom-stack Integration Artifacts
The repository SHALL ship ServiceMonitor/PodMonitor manifests and Helm values that place monitoring resources in the `monitoring` namespace with label selectors compatible with kube-prom-stack discovery.

#### Scenario: Helm install with monitoring enabled
- **WHEN** Helm values set `monitoring.enabled=true`
- **THEN** the rendered manifests include ServiceMonitor/PodMonitor objects in namespace `monitoring` targeting ServiceRadar components’ metrics ports with appropriate scrape intervals and TLS/auth settings

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
No user errors: The proposal introduces monitoring features but does not show user-facing error messages
or logging policies, so we cannot confirm secure error handling from documentation-only
changes.

Referred Code
## What Changes
- Expose ServiceRadar metrics to Prometheus (ServiceMonitors/PodMonitors and scrapeable `/metrics` endpoints) across core, poller, sync, otel-collector, and registry identity metrics.
- Add dual-telemetry support so OTEL exporters can fan out to both our internal collector and external Prometheus/remote-write targets without losing current behavior.
- Ship Grafana dashboards (identity reconciliation, ingestion/poller throughput, OTEL collector health) consumable by kube-prom-stack.
- Provide Helm/kustomize wiring so monitoring artifacts live in the `monitoring` namespace with labels compatible with kube-prom-stack discovery.

## Impact
- Affected specs: `observability-integration`
- Affected code: `pkg/registry/identity_metrics.go`, `pkg/logger/otel.go`, OTEL collector config (`k8s/demo/base/serviceradar-otel.yaml`, Helm charts), Service/ServiceMonitor manifests, Grafana dashboards assets.

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Logging unspecified: The specs cover metrics and telemetry routing but do not specify controls to prevent
sensitive data exposure in logs or metric labels, requiring verification in
implementation.

Referred Code
### Requirement: Grafana Dashboards for ServiceRadar Metrics
Grafana dashboards SHALL be provided that visualize identity reconciliation metrics, poller/sync throughput, OTEL collector health, and key availability indicators, and they SHALL be importable by kube-prom-stack.

#### Scenario: Dashboard import
- **WHEN** an operator imports the shipped JSON dashboards into Grafana
- **THEN** panels render identity_* counters/gauges, scrape success rates, and collector/exporter status using the Prometheus data source without manual query edits

### Requirement: Alerting Rules in Monitoring Namespace
PrometheusRule resources for ServiceRadar SHALL be installable in the `monitoring` namespace, covering identity reconciliation health and scrape/latency regressions, with labels matching kube-prom-stack alertmanager routing.

#### Scenario: Identity reconciliation alert fires
- **WHEN** `identity_promotion_run_age_ms` exceeds the configured threshold or `identity_promotions_blocked_policy_last_batch` stays >0 for the alert window
- **THEN** the PrometheusRule in `monitoring` raises an alert with labels suitable for Alertmanager routing (severity, service)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Validation not shown: While TLS/mTLS alignment is mentioned, the docs do not demonstrate concrete input
validation, authorization checks, or protections against sensitive data exposure in
telemetry, which must be assessed in code/config changes not present here.

Referred Code
- [ ] 2.3 Ensure TLS/mTLS story (SPIFFE or http) and authentication alignment with monitoring stack.

## 3. Dual Telemetry Outputs
- [ ] 3.1 Extend OTEL logger/metric config to support multiple exporters (existing OTLP + Prometheus/remote write) without breaking defaults.
- [ ] 3.2 Provide config samples/values for enabling external Prometheus while keeping edge-friendly OTEL path.

## 4. Dashboards & Alerts
- [ ] 4.1 Add Grafana dashboards for identity reconciliation, ingest/poller health, and OTEL collector status, packaged for kube-prom-stack import.
- [ ] 4.2 Wire alerting rules (e.g., identity rules) into monitoring namespace with labels matching kube-prom-stack.

## 5. Validation
- [ ] 5.1 Validate scrape success in demo cluster (`monitoring` namespace) and verify metrics families (identity_* etc.).

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
- Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2029#issuecomment-3587913416 Original created: 2025-11-28T04:49:42Z --- _You are nearing your monthly Qodo Merge usage quota. For more information, please visit [here](https://qodo-merge-docs.qodo.ai/installation/qodo_merge/#cloud-users)._ ## PR Compliance Guide 🔍 <!-- https://github.com/carverauto/serviceradar/commit/ec53ceadf1b3e864658995619ad911525faf0cf9 --> Below is a summary of compliance checks for this PR:<br> <table><tbody><tr><td colspan='2'><strong>Security Compliance</strong></td></tr> <tr><td>🟢</td><td><details><summary><strong>No security concerns identified</strong></summary> No security vulnerabilities detected by AI analysis. Human verification advised for critical code. </details></td></tr> <tr><td colspan='2'><strong>Ticket Compliance</strong></td></tr> <tr><td>⚪</td><td><details><summary>🎫 <strong>No ticket provided </strong></summary> - [ ] Create ticket/issue <!-- /create_ticket --create_ticket=true --> </details></td></tr> <tr><td colspan='2'><strong>Codebase Duplication Compliance</strong></td></tr> <tr><td>⚪</td><td><details><summary><strong>Codebase context is not defined </strong></summary> Follow the <a href='https://qodo-merge-docs.qodo.ai/core-abilities/rag_context_enrichment/'>guide</a> to enable codebase context checks. </details></td></tr> <tr><td colspan='2'><strong>Custom Compliance</strong></td></tr> <tr><td rowspan=1>🟢</td><td> <details><summary><strong>Generic: Meaningful Naming and Self-Documenting Code</strong></summary><br> **Objective:** Ensure all identifiers clearly express their purpose and intent, making code <br>self-documenting<br> **Status:** Passed<br> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td rowspan=5>⚪</td> <td><details> <summary><strong>Generic: Comprehensive Audit Trails</strong></summary><br> **Objective:** To create a detailed and reliable record of critical system actions for security analysis <br>and compliance.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2029/files#diff-21ba4baae6c7dbf1e091af2f2cfde05086826dcd839802d6e26eaa407eab584cR1-R35'><strong>No audit scope</strong></a>: The added specs and proposal focus solely on metrics exposure and do not define audit <br>logging for critical actions, leaving uncertainty whether required audit trails are <br>implemented elsewhere.<br> <details open><summary>Referred Code</summary> ```markdown ## ADDED Requirements ### Requirement: Prometheus Scrape Surfaces for Core Services ServiceRadar SHALL expose Prometheus-compatible metrics endpoints for core, registry (identity), poller, sync, and OTEL collector components, addressable from the `monitoring` namespace without bypassing existing auth/TLS controls. #### Scenario: Monitoring namespace scrapes registry metrics - **WHEN** kube-prom-stack in namespace `monitoring` discovers ServiceMonitor/PodMonitor objects for `serviceradar-registry` - **THEN** it can scrape a stable `/metrics` endpoint that includes identity_* metric families without disabling SPIFFE/TLS if enabled ### Requirement: Dual Telemetry Destinations ServiceRadar telemetry (logs/metrics/traces) SHALL support simultaneous delivery to the built-in OTEL collector and to external Prometheus/remote-write targets via configuration without changing default edge-friendly behavior. #### Scenario: Enable external Prometheus alongside internal OTEL - **WHEN** an operator sets configuration to add a Prometheus/remote-write exporter while keeping the default OTLP exporter enabled - **THEN** metrics continue flowing to the internal OTEL collector AND are exported to the external Prometheus target without duplicate instrumentation or process restarts beyond config reload ### Requirement: kube-prom-stack Integration Artifacts The repository SHALL ship ServiceMonitor/PodMonitor manifests and Helm values that place monitoring resources in the `monitoring` namespace with label selectors compatible with kube-prom-stack discovery. #### Scenario: Helm install with monitoring enabled - **WHEN** Helm values set `monitoring.enabled=true` - **THEN** the rendered manifests include ServiceMonitor/PodMonitor objects in namespace `monitoring` targeting ServiceRadar components’ metrics ports with appropriate scrape intervals and TLS/auth settings ... (clipped 14 lines) ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Robust Error Handling and Edge Case Management</strong></summary><br> **Objective:** Ensure comprehensive error handling that provides meaningful context and graceful <br>degradation<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2029/files#diff-21ba4baae6c7dbf1e091af2f2cfde05086826dcd839802d6e26eaa407eab584cR9-R22'><strong>Error handling gaps</strong></a>: The changes are specification and task documents that do not describe runtime error <br>handling for failures (e.g., scrape/TLS failures, exporter errors), so compliance cannot <br>be verified from the diff.<br> <details open><summary>Referred Code</summary> ```markdown ### Requirement: Dual Telemetry Destinations ServiceRadar telemetry (logs/metrics/traces) SHALL support simultaneous delivery to the built-in OTEL collector and to external Prometheus/remote-write targets via configuration without changing default edge-friendly behavior. #### Scenario: Enable external Prometheus alongside internal OTEL - **WHEN** an operator sets configuration to add a Prometheus/remote-write exporter while keeping the default OTLP exporter enabled - **THEN** metrics continue flowing to the internal OTEL collector AND are exported to the external Prometheus target without duplicate instrumentation or process restarts beyond config reload ### Requirement: kube-prom-stack Integration Artifacts The repository SHALL ship ServiceMonitor/PodMonitor manifests and Helm values that place monitoring resources in the `monitoring` namespace with label selectors compatible with kube-prom-stack discovery. #### Scenario: Helm install with monitoring enabled - **WHEN** Helm values set `monitoring.enabled=true` - **THEN** the rendered manifests include ServiceMonitor/PodMonitor objects in namespace `monitoring` targeting ServiceRadar components’ metrics ports with appropriate scrape intervals and TLS/auth settings ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Secure Error Handling</strong></summary><br> **Objective:** To prevent the leakage of sensitive system information through error messages while <br>providing sufficient detail for internal debugging.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2029/files#diff-81a990985d497493238589e703ea366222940cc604a571ce21bfeca5bfc442e5R6-R14'><strong>No user errors</strong></a>: The proposal introduces monitoring features but does not show user-facing error messages <br>or logging policies, so we cannot confirm secure error handling from documentation-only <br>changes.<br> <details open><summary>Referred Code</summary> ```markdown ## What Changes - Expose ServiceRadar metrics to Prometheus (ServiceMonitors/PodMonitors and scrapeable `/metrics` endpoints) across core, poller, sync, otel-collector, and registry identity metrics. - Add dual-telemetry support so OTEL exporters can fan out to both our internal collector and external Prometheus/remote-write targets without losing current behavior. - Ship Grafana dashboards (identity reconciliation, ingestion/poller throughput, OTEL collector health) consumable by kube-prom-stack. - Provide Helm/kustomize wiring so monitoring artifacts live in the `monitoring` namespace with labels compatible with kube-prom-stack discovery. ## Impact - Affected specs: `observability-integration` - Affected code: `pkg/registry/identity_metrics.go`, `pkg/logger/otel.go`, OTEL collector config (`k8s/demo/base/serviceradar-otel.yaml`, Helm charts), Service/ServiceMonitor manifests, Grafana dashboards assets. ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Secure Logging Practices</strong></summary><br> **Objective:** To ensure logs are useful for debugging and auditing without exposing sensitive <br>information like PII, PHI, or cardholder data.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2029/files#diff-21ba4baae6c7dbf1e091af2f2cfde05086826dcd839802d6e26eaa407eab584cR23-R35'><strong>Logging unspecified</strong></a>: The specs cover metrics and telemetry routing but do not specify controls to prevent <br>sensitive data exposure in logs or metric labels, requiring verification in <br>implementation.<br> <details open><summary>Referred Code</summary> ```markdown ### Requirement: Grafana Dashboards for ServiceRadar Metrics Grafana dashboards SHALL be provided that visualize identity reconciliation metrics, poller/sync throughput, OTEL collector health, and key availability indicators, and they SHALL be importable by kube-prom-stack. #### Scenario: Dashboard import - **WHEN** an operator imports the shipped JSON dashboards into Grafana - **THEN** panels render identity_* counters/gauges, scrape success rates, and collector/exporter status using the Prometheus data source without manual query edits ### Requirement: Alerting Rules in Monitoring Namespace PrometheusRule resources for ServiceRadar SHALL be installable in the `monitoring` namespace, covering identity reconciliation health and scrape/latency regressions, with labels matching kube-prom-stack alertmanager routing. #### Scenario: Identity reconciliation alert fires - **WHEN** `identity_promotion_run_age_ms` exceeds the configured threshold or `identity_promotions_blocked_policy_last_batch` stays >0 for the alert window - **THEN** the PrometheusRule in `monitoring` raises an alert with labels suitable for Alertmanager routing (severity, service) ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Security-First Input Validation and Data Handling</strong></summary><br> **Objective:** Ensure all data inputs are validated, sanitized, and handled securely to prevent <br>vulnerabilities<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2029/files#diff-c00044e9668389895df2d518bf418b693b3a12ad3488e2926abbdd71b98644baR8-R19'><strong>Validation not shown</strong></a>: While TLS/mTLS alignment is mentioned, the docs do not demonstrate concrete input <br>validation, authorization checks, or protections against sensitive data exposure in <br>telemetry, which must be assessed in code/config changes not present here.<br> <details open><summary>Referred Code</summary> ```markdown - [ ] 2.3 Ensure TLS/mTLS story (SPIFFE or http) and authentication alignment with monitoring stack. ## 3. Dual Telemetry Outputs - [ ] 3.1 Extend OTEL logger/metric config to support multiple exporters (existing OTLP + Prometheus/remote write) without breaking defaults. - [ ] 3.2 Provide config samples/values for enabling external Prometheus while keeping edge-friendly OTEL path. ## 4. Dashboards & Alerts - [ ] 4.1 Add Grafana dashboards for identity reconciliation, ingest/poller health, and OTEL collector status, packaged for kube-prom-stack import. - [ ] 4.2 Wire alerting rules (e.g., identity rules) into monitoring namespace with labels matching kube-prom-stack. ## 5. Validation - [ ] 5.1 Validate scrape success in demo cluster (`monitoring` namespace) and verify metrics families (identity_* etc.). ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td align="center" colspan="2"> - [ ] Update <!-- /compliance --update_compliance=true --> </td></tr></tbody></table> <details><summary>Compliance status legend</summary> 🟢 - Fully Compliant<br> 🟡 - Partial Compliant<br> 🔴 - Not Compliant<br> ⚪ - Requires Further Human Verification<br> 🏷️ - Compliance label<br> </details>
qodo-code-review[bot] commented 2025-11-28 04:50:39 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2029#issuecomment-3587914622
Original created: 2025-11-28T04:50:39Z

You are nearing your monthly Qodo Merge usage quota. For more information, please visit here.

PR Code Suggestions

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Clarify the proposed telemetry architecture

Revise the proposal to resolve the conflict between the diagram and text
regarding the telemetry architecture. The diagram implies an OTEL Collector is
scraped, while the text suggests direct scraping of service /metrics endpoints.

Examples:

openspec/changes/add-prometheus-monitoring-bridge/proposal.md [7-8]
- Expose ServiceRadar metrics to Prometheus (ServiceMonitors/PodMonitors and scrapeable `/metrics` endpoints) across core, poller, sync, otel-collector, and registry identity metrics.
- Add dual-telemetry support so OTEL exporters can fan out to both our internal collector and external Prometheus/remote-write targets without losing current behavior.

Solution Walkthrough:

Before:

# PR Description Diagram
flowchart LR
  SR["ServiceRadar<br/>Components"]
  PROM["Prometheus<br/>Endpoints"]
  OTEL["OTEL<br/>Collector"]
  EXT["External<br/>Prometheus"]
  
  SR -- "Expose /metrics" --> PROM
  PROM -- "ServiceMonitor<br/>Discovery" --> OTEL
  OTEL -- "Dual Export" --> EXT

# proposal.md text
- Expose ServiceRadar metrics to Prometheus (ServiceMonitors/PodMonitors and scrapeable /metrics endpoints) across core, poller, sync...
- Add dual-telemetry support so OTEL exporters can fan out to both our internal collector and external Prometheus/remote-write targets...

After:

# PR Description Diagram (Option 1: Direct Scrape)
flowchart LR
  SR["ServiceRadar<br/>Components"]
  KPS["kube-prom-stack<br/>(Prometheus)"]
  OTEL["Internal OTEL<br/>Collector"]
  
  SR -- "OTLP" --> OTEL
  SR -- "Expose /metrics" --> KPS
  KPS -- "Scrapes via<br/>ServiceMonitor" --> SR

# proposal.md text
- ServiceRadar components will be instrumented to simultaneously send metrics via OTLP to the internal collector AND expose a `/metrics` endpoint.
- `kube-prometheus-stack` will discover and scrape these `/metrics` endpoints directly using `ServiceMonitor` resources.

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a critical ambiguity in the proposal's architecture, where the diagram and text describe conflicting data flows, which could lead to incorrect implementation.

Medium
  • Update
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2029#issuecomment-3587914622 Original created: 2025-11-28T04:50:39Z --- _You are nearing your monthly Qodo Merge usage quota. For more information, please visit [here](https://qodo-merge-docs.qodo.ai/installation/qodo_merge/#cloud-users)._ ## PR Code Suggestions ✨ <!-- ec53cea --> Explore these optional code suggestions: <table><thead><tr><td><strong>Category</strong></td><td align=left><strong>Suggestion&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </strong></td><td align=center><strong>Impact</strong></td></tr><tbody><tr><td rowspan=1>High-level</td> <td> <details><summary>Clarify the proposed telemetry architecture</summary> ___ **Revise the proposal to resolve the conflict between the diagram and text <br>regarding the telemetry architecture. The diagram implies an OTEL Collector is <br>scraped, while the text suggests direct scraping of service <code>/metrics</code> endpoints.** ### Examples: <details> <summary> <a href="https://github.com/carverauto/serviceradar/pull/2029/files#diff-81a990985d497493238589e703ea366222940cc604a571ce21bfeca5bfc442e5R7-R8">openspec/changes/add-prometheus-monitoring-bridge/proposal.md [7-8]</a> </summary> ```markdown - Expose ServiceRadar metrics to Prometheus (ServiceMonitors/PodMonitors and scrapeable `/metrics` endpoints) across core, poller, sync, otel-collector, and registry identity metrics. - Add dual-telemetry support so OTEL exporters can fan out to both our internal collector and external Prometheus/remote-write targets without losing current behavior. ``` </details> ### Solution Walkthrough: #### Before: ```markdown # PR Description Diagram flowchart LR SR["ServiceRadar<br/>Components"] PROM["Prometheus<br/>Endpoints"] OTEL["OTEL<br/>Collector"] EXT["External<br/>Prometheus"] SR -- "Expose /metrics" --> PROM PROM -- "ServiceMonitor<br/>Discovery" --> OTEL OTEL -- "Dual Export" --> EXT # proposal.md text - Expose ServiceRadar metrics to Prometheus (ServiceMonitors/PodMonitors and scrapeable /metrics endpoints) across core, poller, sync... - Add dual-telemetry support so OTEL exporters can fan out to both our internal collector and external Prometheus/remote-write targets... ``` #### After: ```markdown # PR Description Diagram (Option 1: Direct Scrape) flowchart LR SR["ServiceRadar<br/>Components"] KPS["kube-prom-stack<br/>(Prometheus)"] OTEL["Internal OTEL<br/>Collector"] SR -- "OTLP" --> OTEL SR -- "Expose /metrics" --> KPS KPS -- "Scrapes via<br/>ServiceMonitor" --> SR # proposal.md text - ServiceRadar components will be instrumented to simultaneously send metrics via OTLP to the internal collector AND expose a `/metrics` endpoint. - `kube-prometheus-stack` will discover and scrape these `/metrics` endpoints directly using `ServiceMonitor` resources. ``` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The suggestion correctly identifies a critical ambiguity in the proposal's architecture, where the diagram and text describe conflicting data flows, which could lead to incorrect implementation. </details></details></td><td align=center>Medium </td></tr> <tr><td align="center" colspan="2"> - [ ] Update <!-- /improve_multi --more_suggestions=true --> </td><td></td></tr></tbody></table>
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
carverauto/serviceradar!2486
No description provided.