fixing resource limits #2276

Merged
mfreeman451 merged 2 commits from refs/pull/2276/head into main 2025-10-05 17:34:26 +00:00
mfreeman451 commented 2025-10-05 17:30:18 +00:00 (Migrated from github.com)

Imported from GitHub pull request.

Original GitHub pull request: #1704
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/1704
Original created: 2025-10-05T17:30:18Z
Original updated: 2025-10-05T17:34:35Z
Original head: carverauto/serviceradar:k8s/oom_updates
Original base: main
Original merged: 2025-10-05T17:34:26Z by @mfreeman451

PR Type

Enhancement


Description

  • Increase memory limits in the agent, NATS, poller, and sync demo manifests, and CPU limits in all of them except NATS

  • Raise memory limits from the 512Mi-1Gi range to the 2Gi-4Gi range

  • Raise CPU requests from 100m to 250m in all four manifests

  • Enhance resource allocation for better performance


Diagram Walkthrough

flowchart LR
  A["Current Resources"] --> B["Increased CPU Limits"]
  A --> C["Increased Memory Limits"]
  B --> D["Better Performance"]
  C --> D

File Walkthrough

Relevant files
Enhancement
serviceradar-agent.yaml
Raise agent resource requests and limits

k8s/demo/base/serviceradar-agent.yaml

  • CPU limits increased from 200m to 500m
  • Memory limits increased from 512Mi to 2Gi
  • CPU requests increased from 100m to 250m
  • Memory requests increased from 128Mi to 1Gi
+4/-4
serviceradar-nats.yaml
Raise NATS server memory allocation and CPU request

k8s/demo/base/serviceradar-nats.yaml

  • Memory limits increased from 1Gi to 4Gi
  • CPU requests increased from 100m to 250m
  • Memory requests increased from 256Mi to 2Gi
+3/-3
serviceradar-poller.yaml
Raise poller resource requests and limits

k8s/demo/base/serviceradar-poller.yaml

  • CPU limits increased from 500m to 1 core
  • Memory limits increased from 512Mi to 2Gi
  • CPU requests increased from 100m to 250m
  • Memory requests increased from 256Mi to 1Gi
+4/-4
serviceradar-sync.yaml
Raise sync service resource requests and limits

k8s/demo/base/serviceradar-sync.yaml

  • CPU limits increased from 500m to 1 core
  • Memory limits increased from 512Mi to 4Gi
  • CPU requests increased from 100m to 250m
  • Memory requests increased from 128Mi to 2Gi
+4/-4
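
The per-field changes in the walkthrough can be turned into growth factors with a few lines of Python. This is a minimal sketch and not part of the PR: `parse_quantity` and `growth_factor` are hypothetical helper names, and the parser handles only the suffixes that appear here (`m`, `Ki`, `Mi`, `Gi`), not the full Kubernetes quantity grammar.

```python
from fractions import Fraction

# Multipliers for the quantity suffixes used in this PR only.
SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "m": Fraction(1, 1000)}


def parse_quantity(text):
    """Convert a quantity such as '250m', '1', '512Mi' or '1.5Gi' to a Fraction.

    CPU values come back in cores and memory values in bytes.
    """
    for suffix, multiplier in SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * multiplier
    return Fraction(text)  # plain number, e.g. "1" core


def growth_factor(before, after):
    """Return how many times larger `after` is than `before`."""
    return parse_quantity(after) / parse_quantity(before)


# (file, field, before, after) rows copied from the file walkthrough.
CHANGES = [
    ("serviceradar-agent", "memory request", "128Mi", "1Gi"),
    ("serviceradar-nats", "memory limit", "1Gi", "4Gi"),
    ("serviceradar-poller", "cpu limit", "500m", "1"),
    ("serviceradar-sync", "memory request", "128Mi", "2Gi"),
]

for name, field, before, after in CHANGES:
    factor = float(growth_factor(before, after))
    print(f"{name} {field}: {before} -> {after} ({factor:g}x)")
```

Using `Fraction` keeps values such as `250m` exact, so the ratios do not pick up floating-point noise.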

qodo-code-review[bot] commented 2025-10-05 17:30:40 +00:00 (Migrated from github.com)

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1704#issuecomment-3369203751
Original created: 2025-10-05T17:30:40Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢 No security concerns identified
No security vulnerabilities detected by AI analysis. Human verification advised for critical code.

Ticket Compliance
⚪ 🎫 No ticket provided
  • Create ticket/issue

Codebase Duplication Compliance
⚪ Codebase context is not defined
Follow the guide (https://qodo-merge-docs.qodo.ai/core-abilities/rag_context_enrichment/) to enable codebase context checks.

Custom Compliance
⚪ No custom compliance provided
Follow the guide (https://qodo-merge-docs.qodo.ai/tools/compliance/) to enable custom compliance checks.

Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
qodo-code-review[bot] commented 2025-10-05 17:31:45 +00:00 (Migrated from github.com)

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1704#issuecomment-3369204378
Original created: 2025-10-05T17:31:45Z

PR Code Suggestions

Explore these optional code suggestions:

Category: High-level
Suggestion: Justify resource increases with performance data

The substantial CPU and memory increases for several services lack supporting
performance data. Justify these changes with metrics to validate their
necessity and avoid unnecessary infrastructure costs.

Examples:

k8s/demo/base/serviceradar-sync.yaml [37-42]
          requests:
            cpu: "250m"
            memory: "2Gi"
          limits:
            cpu: "1"
            memory: "4Gi"
k8s/demo/base/serviceradar-nats.yaml [38-43]
          limits:
            cpu: "1"
            memory: "4Gi"
          requests:
            cpu: "250m"
            memory: "2Gi"

Solution Walkthrough:

Before:

# k8s/demo/base/serviceradar-sync.yaml
...
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
...

After:

# In PR Description:
# Based on monitoring data showing consistent memory usage near 1.8Gi
# and frequent CPU throttling, resources are being increased.
# See dashboard: [link-to-metrics]

# k8s/demo/base/serviceradar-sync.yaml
...
resources:
  requests:
    cpu: "250m"
    memory: "2Gi" # Sized based on observed usage + buffer
  limits:
    cpu: "1"
    memory: "4Gi" # Sized to prevent OOMKills during spikes
...

Suggestion importance [1-10]: 9

Why: This suggestion addresses a critical operational concern, as blindly increasing resource allocations without performance data can lead to significant, unnecessary infrastructure costs.
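
One way to make "sized based on observed usage + buffer" concrete is to derive the request from a high percentile of measured usage. The sketch below is only an illustration under stated assumptions: `suggest_memory_mib` is a hypothetical helper, the 20% headroom and 2x limit multiplier are arbitrary defaults rather than values from the PR, and the sample list is made up instead of being taken from ServiceRadar monitoring.

```python
def suggest_memory_mib(samples_mib, headroom_pct=20, limit_multiplier=2):
    """Suggest (request, limit) in MiB from observed memory usage samples.

    request = nearest-rank 95th percentile plus `headroom_pct` percent,
    limit   = request times `limit_multiplier`.
    Both knobs are illustrative defaults, not values from the PR.
    """
    if not samples_mib:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_mib)
    # Nearest-rank percentile: ceil(0.95 * n), written with integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    # Add the headroom and round up to a whole MiB.
    request = (p95 * (100 + headroom_pct) + 99) // 100
    return request, request * limit_multiplier


# Hypothetical samples (MiB) for illustration only; not real measurements.
samples = [1700, 1750, 1780, 1790, 1800, 1810, 1820, 1830, 1840, 1850]
request, limit = suggest_memory_mib(samples)
print(f"request: {request}Mi, limit: {limit}Mi")
```

Feeding real usage samples into a calculation like this, and linking the source dashboard in the PR description, would give reviewers the evidence this suggestion asks for.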

Impact: High

Category: General
Suggestion: Increase memory request ratio

Increase the memory request for serviceradar-agent from 1Gi to 1.5Gi to better
align with the 2Gi limit, potentially improving pod stability under load.

k8s/demo/base/serviceradar-agent.yaml [58-64]

 resources:
   limits:
     cpu: "500m"
     memory: "2Gi"
   requests:
     cpu: "250m"
-    memory: "1Gi"
+    memory: "1.5Gi"
Suggestion importance [1-10]: 6

Why: The suggestion provides a valid Kubernetes best practice to align memory requests closer to limits, which can improve pod stability and its Quality of Service (QoS) class.
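
For context on the QoS remark: Kubernetes assigns the Guaranteed class only when every container's CPU and memory requests equal its limits, so a 1.5Gi request against a 2Gi limit would still be Burstable. A minimal sketch of that rule for a single-container pod follows; `qos_class` is a hypothetical helper, and it compares quantity strings verbatim (so "1" and "1000m" would count as different), which is a simplification.

```python
def qos_class(requests, limits):
    """Classify a single-container pod from its cpu/memory requests and limits."""
    if not requests and not limits:
        return "BestEffort"
    # Kubernetes defaults a missing request to the limit for that resource.
    effective = {**limits, **requests}
    guaranteed = all(
        resource in limits and effective[resource] == limits[resource]
        for resource in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


agent_limits = {"cpu": "500m", "memory": "2Gi"}
merged = qos_class({"cpu": "250m", "memory": "1Gi"}, agent_limits)
suggested = qos_class({"cpu": "250m", "memory": "1.5Gi"}, agent_limits)
print(merged, suggested)
```

Both the merged values and the suggested 1.5Gi request classify the same way here, so any stability gain from the suggestion would come from the larger reservation, not from a change of QoS class.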

Impact: Low

Category: General
Suggestion: Reduce excessive memory request

Reduce the memory request for serviceradar-sync from 2Gi to 1Gi as the current
16x increase from 128Mi seems excessive and may cause resource waste.

k8s/demo/base/serviceradar-sync.yaml [36-42]

 resources:
   requests:
     cpu: "250m"
-    memory: "2Gi"
+    memory: "1Gi"
   limits:
     cpu: "1"
     memory: "4Gi"
Suggestion importance [1-10]: 5

Why: The suggestion correctly questions the large increase in memory request, which could lead to resource waste, but it lacks specific data to definitively state the new value is wrong.
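
The scheduling cost behind this concern can be estimated by summing the memory requests across the four manifests. This is a rough sketch that assumes one replica per service, since replica counts are not stated in this PR; `total_requested_mib` is a hypothetical helper name.

```python
# Memory requests in MiB as (before, after), from the file walkthrough (1Gi = 1024Mi).
MEMORY_REQUESTS_MIB = {
    "serviceradar-agent": (128, 1024),
    "serviceradar-nats": (256, 2048),
    "serviceradar-poller": (256, 1024),
    "serviceradar-sync": (128, 2048),
}


def total_requested_mib(requests, overrides=None):
    """Sum the post-change memory requests, optionally overriding some services."""
    merged = {name: after for name, (_, after) in requests.items()}
    merged.update(overrides or {})
    return sum(merged.values())


before = sum(old for old, _ in MEMORY_REQUESTS_MIB.values())
after = total_requested_mib(MEMORY_REQUESTS_MIB)
# Total if the sync request were 1Gi, as this suggestion proposes.
suggested = total_requested_mib(MEMORY_REQUESTS_MIB, {"serviceradar-sync": 1024})

print(f"before: {before}Mi, after: {after}Mi, with 1Gi sync request: {suggested}Mi")
```

Requests are what the scheduler reserves on a node, so the total matters for how many pods fit even when actual usage stays lower.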

Impact: Low