doc updates #2487

Merged
mfreeman451 merged 1 commit from refs/pull/2487/head into main 2025-11-28 18:15:04 +00:00
mfreeman451 commented 2025-11-28 18:14:52 +00:00 (Migrated from github.com)
Owner

Imported from GitHub pull request.

Original GitHub pull request: #2031
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2031
Original created: 2025-11-28T18:14:52Z
Original updated: 2025-11-28T18:16:23Z
Original head: carverauto/serviceradar:chore/docs_updates_nov28
Original base: main
Original merged: 2025-11-28T18:15:04Z by @mfreeman451

User description

IMPORTANT: Please sign the Developer Certificate of Origin

Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:

```
Signed-off-by: J. Doe <j.doe@domain.com>
```

Describe your changes

Code checklist before requesting a review

  • I have signed the DCO?
  • The build completes without errors?
  • All tests are passing when running make test?

PR Type

Documentation


Description

  • Redesigned architecture diagram with clearer Kubernetes cluster structure
  • Updated component organization into logical subgraphs (Ingress, API, Monitoring, Data Plane, Collectors, Identity)
  • Added comprehensive traffic flow documentation and cluster requirements
  • Upgraded Docusaurus dependencies from 3.7.0 to 3.9.2
  • Replaced Timeplus references with Timescale/TimescaleDB terminology

Diagram Walkthrough

```mermaid
flowchart LR
    ArchDoc["Architecture Diagram<br/>Redesign"]
    ArchDoc -->|"Clearer K8s<br/>structure"| NewDiagram["New Mermaid<br/>Flowchart"]
    ArchDoc -->|"Traffic flow<br/>summary"| TrafficDocs["User/Agent/Data<br/>Flow Docs"]
    ArchDoc -->|"Cluster<br/>requirements"| ReqDocs["Ingress/Storage/<br/>CPU/Identity Specs"]

    DepUpgrade["Dependency<br/>Upgrades"]
    DepUpgrade -->|"3.7.0 → 3.9.2"| DocusaurusUpgrade["Docusaurus Core<br/>Preset & Mermaid"]

    TerminologyFix["Terminology<br/>Updates"]
    TerminologyFix -->|"Timeplus →<br/>Timescale"| DBRefs["Database<br/>References"]
```

File Walkthrough

Relevant files
Documentation
architecture.md
Redesign architecture diagram and add cluster requirements

docs/docs/architecture.md

  • Completely redesigned Mermaid architecture diagram from graph TD to
    flowchart TB with improved Kubernetes cluster structure
  • Reorganized components into logical subgraphs: External Access,
    Ingress Layer, API Layer, Monitoring Layer, Data Plane, Telemetry
    Collectors, and Identity & Security
  • Added detailed traffic flow summary explaining user requests, Kong
    validation, edge agent connections, NATS messaging, and SPIRE
    certificate distribution
  • Added comprehensive "Cluster requirements" section documenting Ingress
    configuration, persistent storage needs (~150GiB baseline), CPU/memory
    resource requests, and identity plane requirements
  • Replaced all references to "Timeplus" with "Timescale" or
    "TimescaleDB" for consistency
  • Updated Web UI documentation to reference cluster ingress exposure
    instead of Nginx reverse proxy
+80/-83 
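For reference, the `graph TD` → `flowchart TB` change amounts to swapping the diagram directive while keeping Mermaid's flowchart syntax; subgraph styling and labeled containers are the main gain. A minimal before/after sketch with placeholder nodes (not the actual diagram contents):

```mermaid
%% Before: graph TD    (top-down graph)
%% After:  flowchart TB (top-to-bottom flowchart, full subgraph support)
flowchart TB
    subgraph Cluster["Kubernetes Cluster"]
        A[Service] --> B[Database]
    end
```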
Dependencies
package.json
Upgrade Docusaurus dependencies to 3.9.2                                 

docs/package.json

  • Upgraded @docusaurus/core from 3.7.0 to ^3.9.2
  • Upgraded @docusaurus/preset-classic from 3.7.0 to ^3.9.2
  • Upgraded @docusaurus/theme-mermaid from ^3.7.0 to ^3.9.2
  • Upgraded @docusaurus/module-type-aliases from 3.7.0 to ^3.9.2
  • Upgraded @docusaurus/tsconfig from 3.7.0 to ^3.9.2
  • Upgraded @docusaurus/types from 3.7.0 to ^3.9.2
+6/-6     
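After the bump, the Docusaurus entries in docs/package.json look roughly like this (a sketch showing only the upgraded lines; the dependencies/devDependencies split shown here is the conventional one and surrounding fields are omitted):

```json
{
  "dependencies": {
    "@docusaurus/core": "^3.9.2",
    "@docusaurus/preset-classic": "^3.9.2",
    "@docusaurus/theme-mermaid": "^3.9.2"
  },
  "devDependencies": {
    "@docusaurus/module-type-aliases": "^3.9.2",
    "@docusaurus/tsconfig": "^3.9.2",
    "@docusaurus/types": "^3.9.2"
  }
}
```

Note the move from pinned `3.7.0` to caret ranges (`^3.9.2`), which allows future patch and minor updates within 3.x.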

qodo-code-review[bot] commented 2025-11-28 18:15:26 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2031#issuecomment-3590079079
Original created: 2025-11-28T18:15:26Z

You are nearing your monthly Qodo Merge usage quota. For more information, please visit here.

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢 No security concerns identified: no security vulnerabilities detected by AI analysis. Human verification advised for critical code.

Ticket Compliance
⚪ 🎫 No ticket provided
  • Create ticket/issue

Codebase Duplication Compliance
⚪ Codebase context is not defined. Follow the guide to enable codebase context checks.

Custom Compliance
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No code impact: The PR only updates documentation and package versions without adding or modifying runtime
code paths that would affect audit logging of critical actions.

Referred Code
**Traffic flow summary:**
- **User requests** → Ingress → Web UI (static/SSR) or Kong (API)
- **Kong** validates JWTs and routes to Core (control plane) or SRQL (queries)
- **Edge agents** connect via gRPC mTLS to the Poller
- **NATS JetStream** provides pub/sub messaging and KV storage for all services
- **SPIRE** issues X.509 certificates to all workloads via DaemonSet agents

### Cluster requirements

- **Ingress**: Required for the web UI and API. Default host/class/TLS come from `helm/serviceradar/values.yaml` (`ingress.enabled=true`, `host=demo.serviceradar.cloud`, `className=nginx`, `tls.secretName=serviceradar-prod-tls`, `tls.clusterIssuer=carverauto-issuer`). If you use nginx, mirror the demo annotations (`nginx.ingress.kubernetes.io/proxy-body-size: 100m`, `proxy-buffer-size: 128k`, `proxy-buffers-number: 4`, `proxy-busy-buffers-size: 256k`, `proxy-read-timeout: 86400`, `proxy-send-timeout: 86400`, `proxy-connect-timeout: 60`) to keep SRQL streams and large asset uploads stable (`k8s/demo/prod/ingress.yaml`).

- **Persistent storage (~150GiB/node baseline)**: CNPG consumes the majority (3×100Gi PVCs from `k8s/demo/base/spire/cnpg-cluster.yaml`). JetStream adds 30Gi (`k8s/demo/base/serviceradar-nats.yaml`), OTEL 10Gi (`k8s/demo/base/serviceradar-otel.yaml`), and several 5Gi claims for Core, Datasvc, Mapper, Zen, DB event writer, plus 1Gi claims for Faker/Flowgger/Cert jobs. Spread the CNPG replicas across at least three nodes with SSD-class volumes; the extra PVCs lift per-node needs to roughly 150Gi of usable capacity when co-scheduled with CNPG.

- **CPU / memory (requested)**: Core 1 CPU / 4Gi, Poller 0.5 CPU / 2Gi (`k8s/demo/base/serviceradar-core.yaml`, `serviceradar-poller.yaml`); Kong 0.5 CPU / 1Gi; Web 0.2 CPU / 512Mi; Datasvc 0.5 CPU / 128Mi; SRQL 0.1 CPU / 128Mi; NATS 1 CPU / 8Gi; OTEL 0.2 CPU / 256Mi. The steady-state floor is ~4 vCPU and ~16 GiB for the core path, before adding optional sync/checker pods or horizontal scaling.

- **Identity plane**: SPIRE server (StatefulSet) and daemonset agents must be running; services expect the workload socket at `/run/spire/sockets/agent.sock` and SPIFFE IDs derived from `spire.trustDomain` in `values.yaml`.

- **TLS artifacts**: Pods mount `serviceradar-cert-data` for inter-service TLS and `cnpg-ca` for database verification; ensure these secrets/PVCs are provisioned before rolling workloads.
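The "~4 vCPU and ~16 GiB" steady-state floor quoted above can be sanity-checked by summing the listed per-component requests (a quick sketch; the values are copied from the excerpt, with Mi converted to Gi):

```shell
# Sum the documented CPU/memory requests for the core path:
# Core, Poller, Kong, Web, Datasvc, SRQL, NATS, OTEL
awk 'BEGIN {
  cpu = 1 + 0.5 + 0.5 + 0.2 + 0.5 + 0.1 + 1 + 0.2
  mem = 4 + 2 + 1 + 0.5 + 0.125 + 0.125 + 8 + 0.25  # 512Mi=0.5Gi, 128Mi=0.125Gi, 256Mi=0.25Gi
  printf "%.1f vCPU, %.2f GiB\n", cpu, mem
}'
# prints: 4.0 vCPU, 16.00 GiB
```

The totals match the documented floor exactly, before optional sync/checker pods or horizontal scaling are added.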

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Documentation only: Changes introduce documentation prose and diagram nodes rather than executable code, so
identifier naming in code cannot be evaluated from this diff.

Referred Code
```mermaid
flowchart TB
    subgraph External["External Access"]
        User([User Browser])
        EdgeAgent([Edge Agents])
    end

    subgraph Cluster["Kubernetes Cluster"]
        subgraph Ingress["Edge Layer"]
            ING[Ingress Controller]
            WEB[Web UI<br/>Next.js :3000]
            KONG[Kong Gateway<br/>:8000]
        end

        subgraph API["API Layer"]
            CORE[Core Service<br/>REST :8090 / gRPC :50052]
            SRQL[SRQL Service<br/>:8080]
        end

        subgraph Monitoring["Monitoring Layer"]
            POLLER[Poller<br/>:50053]


    ... (clipped 49 lines)
```

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Dependency upgrade: Only documentation and dependency versions were updated; no new executable code was added
where error handling could be assessed.

Referred Code
```json
"@docusaurus/core": "^3.9.2",
"@docusaurus/preset-classic": "^3.9.2",
"@docusaurus/theme-mermaid": "^3.9.2",
"@mdx-js/react": "^3.0.0",
```

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
No user errors: The diff does not include user-facing error handling code; only docs were changed, so
exposure of internal details in errors cannot be validated.

Referred Code: same traffic-flow and cluster-requirements excerpt as quoted under "Generic: Comprehensive Audit Trails" above.

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
No logging code: No application logging changes are present in this documentation-focused PR; secure
logging practices cannot be assessed.

Referred Code: same traffic-flow and cluster-requirements excerpt as quoted under "Generic: Comprehensive Audit Trails" above.

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Package bumps: Only Docusaurus-related dependency versions and documentation text changed; there are no
new input-handling code paths to validate for security.

Referred Code
```json
"@docusaurus/module-type-aliases": "^3.9.2",
"@docusaurus/tsconfig": "^3.9.2",
"@docusaurus/types": "^3.9.2",
```

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend

🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2031#issuecomment-3590079079 Original created: 2025-11-28T18:15:26Z --- _You are nearing your monthly Qodo Merge usage quota. For more information, please visit [here](https://qodo-merge-docs.qodo.ai/installation/qodo_merge/#cloud-users)._ ## PR Compliance Guide 🔍 <!-- https://github.com/carverauto/serviceradar/commit/febbecf313062a35bb61c685fd2346ac19be73c3 --> Below is a summary of compliance checks for this PR:<br> <table><tbody><tr><td colspan='2'><strong>Security Compliance</strong></td></tr> <tr><td>🟢</td><td><details><summary><strong>No security concerns identified</strong></summary> No security vulnerabilities detected by AI analysis. Human verification advised for critical code. </details></td></tr> <tr><td colspan='2'><strong>Ticket Compliance</strong></td></tr> <tr><td>⚪</td><td><details><summary>🎫 <strong>No ticket provided </strong></summary> - [ ] Create ticket/issue <!-- /create_ticket --create_ticket=true --> </details></td></tr> <tr><td colspan='2'><strong>Codebase Duplication Compliance</strong></td></tr> <tr><td>⚪</td><td><details><summary><strong>Codebase context is not defined </strong></summary> Follow the <a href='https://qodo-merge-docs.qodo.ai/core-abilities/rag_context_enrichment/'>guide</a> to enable codebase context checks. 
</details></td></tr> <tr><td colspan='2'><strong>Custom Compliance</strong></td></tr> <tr><td rowspan=6>⚪</td> <td><details> <summary><strong>Generic: Comprehensive Audit Trails</strong></summary><br> **Objective:** To create a detailed and reliable record of critical system actions for security analysis <br>and compliance.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2031/files#diff-90abd06467420fd89391fd1a4d75ceb1f6a9381de4d13a95fffe606abff38d37R83-R101'><strong>No code impact</strong></a>: The PR only updates documentation and package versions without adding or modifying runtime <br>code paths that would affect audit logging of critical actions.<br> <details open><summary>Referred Code</summary> ```markdown **Traffic flow summary:** - **User requests** → Ingress → Web UI (static/SSR) or Kong (API) - **Kong** validates JWTs and routes to Core (control plane) or SRQL (queries) - **Edge agents** connect via gRPC mTLS to the Poller - **NATS JetStream** provides pub/sub messaging and KV storage for all services - **SPIRE** issues X.509 certificates to all workloads via DaemonSet agents ### Cluster requirements - **Ingress**: Required for the web UI and API. Default host/class/TLS come from `helm/serviceradar/values.yaml` (`ingress.enabled=true`, `host=demo.serviceradar.cloud`, `className=nginx`, `tls.secretName=serviceradar-prod-tls`, `tls.clusterIssuer=carverauto-issuer`). If you use nginx, mirror the demo annotations (`nginx.ingress.kubernetes.io/proxy-body-size: 100m`, `proxy-buffer-size: 128k`, `proxy-buffers-number: 4`, `proxy-busy-buffers-size: 256k`, `proxy-read-timeout: 86400`, `proxy-send-timeout: 86400`, `proxy-connect-timeout: 60`) to keep SRQL streams and large asset uploads stable (`k8s/demo/prod/ingress.yaml`). - **Persistent storage (~150GiB/node baseline)**: CNPG consumes the majority (3×100Gi PVCs from `k8s/demo/base/spire/cnpg-cluster.yaml`). 
JetStream adds 30Gi (`k8s/demo/base/serviceradar-nats.yaml`), OTEL 10Gi (`k8s/demo/base/serviceradar-otel.yaml`), and several 5Gi claims for Core, Datasvc, Mapper, Zen, DB event writer, plus 1Gi claims for Faker/Flowgger/Cert jobs. Spread the CNPG replicas across at least three nodes with SSD-class volumes; the extra PVCs lift per-node needs to roughly 150Gi of usable capacity when co-scheduled with CNPG. - **CPU / memory (requested)**: Core 1 CPU / 4Gi, Poller 0.5 CPU / 2Gi (`k8s/demo/base/serviceradar-core.yaml`, `serviceradar-poller.yaml`); Kong 0.5 CPU / 1Gi; Web 0.2 CPU / 512Mi; Datasvc 0.5 CPU / 128Mi; SRQL 0.1 CPU / 128Mi; NATS 1 CPU / 8Gi; OTEL 0.2 CPU / 256Mi. The steady-state floor is ~4 vCPU and ~16 GiB for the core path, before adding optional sync/checker pods or horizontal scaling. - **Identity plane**: SPIRE server (StatefulSet) and daemonset agents must be running; services expect the workload socket at `/run/spire/sockets/agent.sock` and SPIFFE IDs derived from `spire.trustDomain` in `values.yaml`. - **TLS artifacts**: Pods mount `serviceradar-cert-data` for inter-service TLS and `cnpg-ca` for database verification; ensure these secrets/PVCs are provisioned before rolling workloads. 
``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Meaningful Naming and Self-Documenting Code</strong></summary><br> **Objective:** Ensure all identifiers clearly express their purpose and intent, making code <br>self-documenting<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2031/files#diff-90abd06467420fd89391fd1a4d75ceb1f6a9381de4d13a95fffe606abff38d37R12-R81'><strong>Documentation only</strong></a>: Changes introduce documentation prose and diagram nodes rather than executable code, so <br>identifier naming in code cannot be evaluated from this diff.<br> <details open><summary>Referred Code</summary> ```markdown ```mermaid flowchart TB subgraph External["External Access"] User([User Browser]) EdgeAgent([Edge Agents]) end subgraph Cluster["Kubernetes Cluster"] subgraph Ingress["Edge Layer"] ING[Ingress Controller] WEB[Web UI<br/>Next.js :3000] KONG[Kong Gateway<br/>:8000] end subgraph API["API Layer"] CORE[Core Service<br/>REST :8090 / gRPC :50052] SRQL[SRQL Service<br/>:8080] end subgraph Monitoring["Monitoring Layer"] POLLER[Poller<br/>:50053] ... 
(clipped 49 lines) ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Robust Error Handling and Edge Case Management</strong></summary><br> **Objective:** Ensure comprehensive error handling that provides meaningful context and graceful <br>degradation<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2031/files#diff-adfa337ce44dc2902621da20152a048dac41878cf3716dfc4cc56d03aa212a56R18-R21'><strong>Dependency upgrade</strong></a>: Only documentation and dependency versions were updated; no new executable code was added <br>where error handling could be assessed.<br> <details open><summary>Referred Code</summary> ```json "@docusaurus/core": "^3.9.2", "@docusaurus/preset-classic": "^3.9.2", "@docusaurus/theme-mermaid": "^3.9.2", "@mdx-js/react": "^3.0.0", ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Secure Error Handling</strong></summary><br> **Objective:** To prevent the leakage of sensitive system information through error messages while <br>providing sufficient detail for internal debugging.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2031/files#diff-90abd06467420fd89391fd1a4d75ceb1f6a9381de4d13a95fffe606abff38d37R83-R101'><strong>No user errors</strong></a>: The diff does not include user-facing error handling code; only docs were changed, so <br>exposure of internal details in errors cannot be validated.<br> <details open><summary>Referred 
Code</summary> ```markdown **Traffic flow summary:** - **User requests** → Ingress → Web UI (static/SSR) or Kong (API) - **Kong** validates JWTs and routes to Core (control plane) or SRQL (queries) - **Edge agents** connect via gRPC mTLS to the Poller - **NATS JetStream** provides pub/sub messaging and KV storage for all services - **SPIRE** issues X.509 certificates to all workloads via DaemonSet agents ### Cluster requirements - **Ingress**: Required for the web UI and API. Default host/class/TLS come from `helm/serviceradar/values.yaml` (`ingress.enabled=true`, `host=demo.serviceradar.cloud`, `className=nginx`, `tls.secretName=serviceradar-prod-tls`, `tls.clusterIssuer=carverauto-issuer`). If you use nginx, mirror the demo annotations (`nginx.ingress.kubernetes.io/proxy-body-size: 100m`, `proxy-buffer-size: 128k`, `proxy-buffers-number: 4`, `proxy-busy-buffers-size: 256k`, `proxy-read-timeout: 86400`, `proxy-send-timeout: 86400`, `proxy-connect-timeout: 60`) to keep SRQL streams and large asset uploads stable (`k8s/demo/prod/ingress.yaml`). - **Persistent storage (~150GiB/node baseline)**: CNPG consumes the majority (3×100Gi PVCs from `k8s/demo/base/spire/cnpg-cluster.yaml`). JetStream adds 30Gi (`k8s/demo/base/serviceradar-nats.yaml`), OTEL 10Gi (`k8s/demo/base/serviceradar-otel.yaml`), and several 5Gi claims for Core, Datasvc, Mapper, Zen, DB event writer, plus 1Gi claims for Faker/Flowgger/Cert jobs. Spread the CNPG replicas across at least three nodes with SSD-class volumes; the extra PVCs lift per-node needs to roughly 150Gi of usable capacity when co-scheduled with CNPG. - **CPU / memory (requested)**: Core 1 CPU / 4Gi, Poller 0.5 CPU / 2Gi (`k8s/demo/base/serviceradar-core.yaml`, `serviceradar-poller.yaml`); Kong 0.5 CPU / 1Gi; Web 0.2 CPU / 512Mi; Datasvc 0.5 CPU / 128Mi; SRQL 0.1 CPU / 128Mi; NATS 1 CPU / 8Gi; OTEL 0.2 CPU / 256Mi. 
The steady-state floor is ~4 vCPU and ~16 GiB for the core path, before adding optional sync/checker pods or horizontal scaling.
- **Identity plane**: SPIRE server (StatefulSet) and DaemonSet agents must be running; services expect the workload socket at `/run/spire/sockets/agent.sock` and SPIFFE IDs derived from `spire.trustDomain` in `values.yaml`.
- **TLS artifacts**: Pods mount `serviceradar-cert-data` for inter-service TLS and `cnpg-ca` for database verification; ensure these secrets/PVCs are provisioned before rolling workloads.
```

> Learn more about managing compliance [generic rules](https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options) or creating your own [custom rules](https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance).

**Generic: Secure Logging Practices**

**Objective:** Ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data.

**Status:** No logging code: no application logging changes are present in this documentation-focused PR, so secure logging practices cannot be assessed.

Referred code:

```markdown
**Traffic flow summary:**

- **User requests** → Ingress → Web UI (static/SSR) or Kong (API)
- **Kong** validates JWTs and routes to Core (control plane) or SRQL (queries)
- **Edge agents** connect via gRPC mTLS to the Poller
- **NATS JetStream** provides pub/sub messaging and KV storage for all services
- **SPIRE** issues X.509 certificates to all workloads via DaemonSet agents

### Cluster requirements

- **Ingress**: Required for the web UI and API. Default host/class/TLS come from `helm/serviceradar/values.yaml` (`ingress.enabled=true`, `host=demo.serviceradar.cloud`, `className=nginx`, `tls.secretName=serviceradar-prod-tls`, `tls.clusterIssuer=carverauto-issuer`). If you use nginx, mirror the demo annotations (`nginx.ingress.kubernetes.io/proxy-body-size: 100m`, `proxy-buffer-size: 128k`, `proxy-buffers-number: 4`, `proxy-busy-buffers-size: 256k`, `proxy-read-timeout: 86400`, `proxy-send-timeout: 86400`, `proxy-connect-timeout: 60`) to keep SRQL streams and large asset uploads stable (`k8s/demo/prod/ingress.yaml`).
- **Persistent storage (~150GiB/node baseline)**: CNPG consumes the majority (3×100Gi PVCs from `k8s/demo/base/spire/cnpg-cluster.yaml`). JetStream adds 30Gi (`k8s/demo/base/serviceradar-nats.yaml`), OTEL 10Gi (`k8s/demo/base/serviceradar-otel.yaml`), and several 5Gi claims for Core, Datasvc, Mapper, Zen, and the DB event writer, plus 1Gi claims for Faker/Flowgger/Cert jobs. Spread the CNPG replicas across at least three nodes with SSD-class volumes; the extra PVCs lift per-node needs to roughly 150Gi of usable capacity when co-scheduled with CNPG.
- **CPU / memory (requested)**: Core 1 CPU / 4Gi, Poller 0.5 CPU / 2Gi (`k8s/demo/base/serviceradar-core.yaml`, `serviceradar-poller.yaml`); Kong 0.5 CPU / 1Gi; Web 0.2 CPU / 512Mi; Datasvc 0.5 CPU / 128Mi; SRQL 0.1 CPU / 128Mi; NATS 1 CPU / 8Gi; OTEL 0.2 CPU / 256Mi. The steady-state floor is ~4 vCPU and ~16 GiB for the core path, before adding optional sync/checker pods or horizontal scaling.
- **Identity plane**: SPIRE server (StatefulSet) and DaemonSet agents must be running; services expect the workload socket at `/run/spire/sockets/agent.sock` and SPIFFE IDs derived from `spire.trustDomain` in `values.yaml`.
- **TLS artifacts**: Pods mount `serviceradar-cert-data` for inter-service TLS and `cnpg-ca` for database verification; ensure these secrets/PVCs are provisioned before rolling workloads.
```

**Generic: Security-First Input Validation and Data Handling**

**Objective:** Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities.

**Status:** Package bumps: only Docusaurus-related dependency versions and documentation text changed; there are no new input-handling code paths to validate for security.

Referred code:

```json
"@docusaurus/module-type-aliases": "^3.9.2",
"@docusaurus/tsconfig": "^3.9.2",
"@docusaurus/types": "^3.9.2",
```

Compliance status legend:

- 🟢 Fully Compliant
- 🟡 Partially Compliant
- 🔴 Not Compliant
- ⚪ Requires Further Human Verification
- 🏷️ Compliance label
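For orientation, the nginx annotations quoted in the cluster requirements would sit in an Ingress manifest roughly like the sketch below. This is assembled from the values named above (host, class name, and TLS secret come from the demo values and may differ in your cluster); the object and backend service names are hypothetical, and the real manifest is `k8s/demo/prod/ingress.yaml`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: serviceradar            # hypothetical name for illustration
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
    nginx.ingress.kubernetes.io/proxy-buffers-number: "4"
    nginx.ingress.kubernetes.io/proxy-busy-buffers-size: "256k"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "86400"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "86400"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
spec:
  ingressClassName: nginx
  tls:
    - hosts: [demo.serviceradar.cloud]
      secretName: serviceradar-prod-tls
  rules:
    - host: demo.serviceradar.cloud
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: serviceradar-web   # hypothetical backend service
                port:
                  number: 80
```

The long read/send timeouts (86400s) are what keep long-lived SRQL streams from being cut off by the proxy.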
qodo-code-review[bot] commented 2025-11-28 18:16:22 +00:00 (Migrated from github.com)

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2031#issuecomment-3590080554
Original created: 2025-11-28T18:16:22Z

You are nearing your monthly Qodo Merge usage quota. For more information, please visit [here](https://qodo-merge-docs.qodo.ai/installation/qodo_merge/#cloud-users).

PR Code Suggestions

Explore these optional code suggestions:

Category: General
**Avoid hardcoding configuration-defined storage values**

In the documentation, replace hardcoded storage size values with a general overview and a reference to the configuration files for the exact figures, to improve maintainability.

`docs/docs/architecture.md` [94]

-- **Persistent storage (~150GiB/node baseline)**: CNPG consumes the majority (3×100Gi PVCs from `k8s/demo/base/spire/cnpg-cluster.yaml`). JetStream adds 30Gi (`k8s/demo/base/serviceradar-nats.yaml`), OTEL 10Gi (`k8s/demo/base/serviceradar-otel.yaml`), and several 5Gi claims for Core, Datasvc, Mapper, Zen, DB event writer, plus 1Gi claims for Faker/Flowgger/Cert jobs. Spread the CNPG replicas across at least three nodes with SSD-class volumes; the extra PVCs lift per-node needs to roughly 150Gi of usable capacity when co-scheduled with CNPG.
+- **Persistent storage (~150GiB/node baseline)**: Several components require persistent storage. CNPG (TimescaleDB) is the largest consumer, followed by NATS JetStream, the OTel Collector, and various services like Core and Datasvc. For specific PVC sizes, refer to the manifests in `k8s/demo/base/`. Spread the CNPG replicas across at least three nodes with SSD-class volumes. The combined storage requirements can reach approximately 150Gi of usable capacity per node when co-scheduled with a CNPG replica.

Suggestion importance [1-10]: 6

Why: This is a valid suggestion that improves the long-term maintainability of the documentation by avoiding hardcoded values that can become stale.

Impact: Low
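As a concrete illustration of how those storage figures surface in practice, a claim like the 30Gi JetStream volume is an ordinary PersistentVolumeClaim. The sketch below uses a hypothetical claim name; the real definition lives in `k8s/demo/base/serviceradar-nats.yaml` and may be shaped differently:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nats-jetstream-data    # hypothetical name for illustration
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi            # JetStream figure quoted in the docs
```

Pointing docs at such manifests, rather than repeating the numbers, is exactly what the suggestion above argues for.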
**Avoid hardcoding resource request values**

In the documentation, replace hardcoded CPU and memory resource requests with a high-level summary and a reference to the configuration files for the precise values.

`docs/docs/architecture.md` [96]

-- **CPU / memory (requested)**: Core 1 CPU / 4Gi, Poller 0.5 CPU / 2Gi (`k8s/demo/base/serviceradar-core.yaml`, `serviceradar-poller.yaml`); Kong 0.5 CPU / 1Gi; Web 0.2 CPU / 512Mi; Datasvc 0.5 CPU / 128Mi; SRQL 0.1 CPU / 128Mi; NATS 1 CPU / 8Gi; OTEL 0.2 CPU / 256Mi. The steady-state floor is ~4 vCPU and ~16 GiB for the core path, before adding optional sync/checker pods or horizontal scaling.
+- **CPU / memory (requested)**: The core services require significant resources. For example, the Core service and NATS request 1 CPU each, with 4Gi and 8Gi of memory respectively. For a complete and up-to-date list of resource requests for all components, please refer to the relevant deployment manifests (e.g., `k8s/demo/base/serviceradar-core.yaml`). The steady-state floor is ~4 vCPU and ~16 GiB for the core path, before adding optional sync/checker pods or horizontal scaling.

Suggestion importance [1-10]: 6

Why: This is a valid suggestion that improves the long-term maintainability of the documentation by avoiding hardcoded values that can become stale.

Impact: Low
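For reference, a figure like "Core 1 CPU / 4Gi" translates to a standard Kubernetes `resources.requests` stanza in the pod spec. The fragment below is a sketch with hypothetical container and image names, not the actual manifest; see `k8s/demo/base/serviceradar-core.yaml` for the real values:

```yaml
containers:
  - name: core                 # hypothetical container name
    image: serviceradar/core   # hypothetical image reference
    resources:
      requests:
        cpu: "1"               # Core's quoted CPU request
        memory: 4Gi            # Core's quoted memory request
```

Summing such requests across the core-path pods is how the ~4 vCPU / ~16 GiB steady-state floor cited in the docs is derived.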
Reference: carverauto/serviceradar!2487