fixing resource limits #2276

Merged

mfreeman451 merged 2 commits from refs/pull/2276/head into main

2025-10-05 17:34:26 +00:00

mfreeman451 commented

2025-10-05 17:30:18 +00:00

(Migrated from github.com)

Owner

Imported from GitHub pull request.

Original GitHub pull request: #1704
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/1704
Original created: 2025-10-05T17:30:18Z
Original updated: 2025-10-05T17:34:35Z
Original head: carverauto/serviceradar:k8s/oom_updates
Original base: main
Original merged: 2025-10-05T17:34:26Z by @mfreeman451

PR Type

Enhancement

Description

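Increase CPU and memory requests and limits for the ServiceRadar agent, NATS, poller, and sync manifests under k8s/demo/base (the NATS CPU limit is unchanged)
Scale memory limits from the 512Mi-1Gi range to the 2Gi-4Gi range
Raise CPU requests from 100m to 250m in every changed manifest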
Enhance resource allocation for better performance

Diagram Walkthrough

flowchart LR
  A["Current Resources"] --> B["Increased CPU Limits"]
  A --> C["Increased Memory Limits"]
  B --> D["Better Performance"]
  C --> D

File Walkthrough

Relevant files

Enhancement

serviceradar-agent.yaml (k8s/demo/base/serviceradar-agent.yaml, +4/-4): `Scale agent resource limits significantly`
  CPU limits increased from 200m to 500m
  Memory limits increased from 512Mi to 2Gi
  CPU requests increased from 100m to 250m
  Memory requests increased from 128Mi to 1Gi

serviceradar-nats.yaml (k8s/demo/base/serviceradar-nats.yaml, +3/-3): `Boost NATS server memory allocation`
  Memory limits increased from 1Gi to 4Gi
  CPU requests increased from 100m to 250m
  Memory requests increased from 256Mi to 2Gi

serviceradar-poller.yaml (k8s/demo/base/serviceradar-poller.yaml, +4/-4): `Double poller resource allocations`
  CPU limits increased from 500m to 1 core
  Memory limits increased from 512Mi to 2Gi
  CPU requests increased from 100m to 250m
  Memory requests increased from 256Mi to 1Gi

serviceradar-sync.yaml (k8s/demo/base/serviceradar-sync.yaml, +4/-4): `Maximize sync service resource limits`
  CPU limits increased from 500m to 1 core
  Memory limits increased from 512Mi to 4Gi
  CPU requests increased from 100m to 250m
  Memory requests increased from 128Mi to 2Gi
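
To make the walkthrough concrete, the poller changes listed above would correspond to a container `resources` stanza roughly like the one below. This is a sketch assembled only from the values in the walkthrough. Where the stanza sits inside the poller manifest, and how its keys are ordered, are assumptions rather than details taken from the file.

```yaml
# Sketch of the poller container's resources after this PR.
# Values come from the file walkthrough; placement in the manifest is assumed.
resources:
  requests:
    cpu: "250m"     # previously 100m
    memory: "1Gi"   # previously 256Mi
  limits:
    cpu: "1"        # previously 500m
    memory: "2Gi"   # previously 512Mi
```

The agent, NATS, and sync manifests follow the same shape with the values listed for each file.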


qodo-code-review[bot] commented

2025-10-05 17:30:40 +00:00

(Migrated from github.com)

Author

Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1704#issuecomment-3369203751
Original created: 2025-10-05T17:30:40Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢	No security concerns identified. No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided. Create ticket/issue.
Codebase Duplication Compliance
⚪	Codebase context is not defined. Follow the guide (https://qodo-merge-docs.qodo.ai/core-abilities/rag_context_enrichment/) to enable codebase context checks.
Custom Compliance
⚪	No custom compliance provided. Follow the guide (https://qodo-merge-docs.qodo.ai/tools/compliance/) to enable custom compliance checks.

Compliance status legend

🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label


qodo-code-review[bot] commented

2025-10-05 17:31:45 +00:00

(Migrated from github.com)

Author

Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1704#issuecomment-3369204378
Original created: 2025-10-05T17:31:45Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: High-level
Suggestion: Justify resource increases with performance data
Impact: High

The suggestion states that the substantial CPU and memory resource increases for various services lack supporting performance data. It recommends justifying these changes with metrics to validate their necessity and avoid unnecessary infrastructure costs.

Examples:

k8s/demo/base/serviceradar-sync.yaml [37-42]

  requests:
    cpu: "250m"
    memory: "2Gi"
  limits:
    cpu: "1"
    memory: "4Gi"

k8s/demo/base/serviceradar-nats.yaml [38-43]

  limits:
    cpu: "1"
    memory: "4Gi"
  requests:
    cpu: "250m"
    memory: "2Gi"

Solution Walkthrough:

Before:

  # k8s/demo/base/serviceradar-sync.yaml
  ...
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"
  ...

After:

  # In PR Description:
  # Based on monitoring data showing consistent memory usage near 1.8Gi
  # and frequent CPU throttling, resources are being increased.
  # See dashboard: [link-to-metrics]

  # k8s/demo/base/serviceradar-sync.yaml
  ...
  resources:
    requests:
      cpu: "250m"
      memory: "2Gi" # Sized based on observed usage + buffer
    limits:
      cpu: "1"
      memory: "4Gi" # Sized to prevent OOMKills during spikes
  ...

Suggestion importance[1-10]: 9
Why: This suggestion addresses a critical operational concern, as blindly increasing resource allocations without performance data can lead to significant, unnecessary infrastructure costs.

Category: General
Suggestion: Increase memory request ratio
Impact: Low

Increase the memory request for `serviceradar-agent` from `1Gi` to `1.5Gi` to better align with the `2Gi` limit, potentially improving pod stability under load.

k8s/demo/base/serviceradar-agent.yaml [58-64]

  resources:
    limits:
      cpu: "500m"
      memory: "2Gi"
    requests:
      cpu: "250m"
  -   memory: "1Gi"
  +   memory: "1.5Gi"

Suggestion importance[1-10]: 6
Why: The suggestion provides a valid Kubernetes best practice to align memory requests closer to limits, which can improve pod stability and its Quality of Service (QoS) class.

Category: General
Suggestion: Reduce excessive memory request
Impact: Low

Reduce the memory request for `serviceradar-sync` from `2Gi` to `1Gi` as the current 16x increase from `128Mi` seems excessive and may cause resource waste.

k8s/demo/base/serviceradar-sync.yaml [36-42]

  resources:
    requests:
      cpu: "250m"
  -   memory: "2Gi"
  +   memory: "1Gi"
    limits:
      cpu: "1"
      memory: "4Gi"

Suggestion importance[1-10]: 5
Why: The suggestion correctly questions the large increase in memory request, which could lead to resource waste, but it lacks specific data to definitively state the new value is wrong.
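
One of the rationales above mentions the Quality of Service (QoS) class. For context, Kubernetes assigns the Guaranteed class only when every container's requests equal its limits for both CPU and memory. With requests below limits, as in this PR and in the suggested `1.5Gi` variant, the pod remains Burstable. The stanza below is a hypothetical comparison, not something proposed in this PR. Its values are assumptions that simply reuse the agent's limits as requests.

```yaml
# Hypothetical variant, not part of this PR: requests equal limits for both
# CPU and memory, which is the condition for the Guaranteed QoS class
# (it must hold for every container in the pod).
resources:
  requests:
    cpu: "500m"
    memory: "2Gi"
  limits:
    cpu: "500m"
    memory: "2Gi"
```

Whether the extra reserved capacity is worth it depends on observed usage, which is the point the first suggestion raises.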
