fixing flakey test #2550

Merged
mfreeman451 merged 1 commit from refs/pull/2550/head into main 2025-12-12 04:35:28 +00:00
mfreeman451 commented 2025-12-12 04:34:57 +00:00 (Migrated from github.com)
Owner

Imported from GitHub pull request.

Original GitHub pull request: #2111
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2111
Original created: 2025-12-12T04:34:57Z
Original updated: 2025-12-12T04:36:12Z
Original head: carverauto/serviceradar:update/fix_flakey_kv_tests
Original base: main
Original merged: 2025-12-12T04:35:28Z by @mfreeman451

User description

IMPORTANT: Please sign the Developer Certificate of Origin

Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:

Signed-off-by: J. Doe <j.doe@domain.com>

Describe your changes

Code checklist before requesting a review

  • I have signed the DCO?
  • The build completes without errors?
  • All tests are passing when running make test?

PR Type

Tests


Description

  • Increased timeout values in flaky test cases

  • Expanded watch channel buffer from 1 to 8 to prevent blocking

  • Adjusted timing thresholds to improve test reliability


Diagram Walkthrough

flowchart LR
  A["Test Timeouts"] -->|increased| B["50ms → 500ms<br/>25ms → 100ms<br/>30ms → 100ms"]
  C["Watch Channel"] -->|buffer expanded| D["1 → 8 capacity"]
  B --> E["Improved Test Stability"]
  D --> E

File Walkthrough

Relevant files
Tests
kv_client_test.go
Increase test timeouts and watch channel buffer                   

pkg/config/kv_client_test.go

  • Increased timeout in TestKVManagerStartWatchReloadsOnAnyChange from
    50ms to 500ms for reload detection
  • Increased timeout in same test from 25ms to 100ms for no-reload
    scenario
  • Increased timeout in TestKVManagerStartWatchReappliesPinnedOverlay
    from 30ms to 100ms
  • Expanded watchKVStore.Watch channel buffer from 1 to 8 to prevent emit
    operations from blocking
+4/-4     

Imported from GitHub pull request. Original GitHub pull request: #2111 Original author: @mfreeman451 Original URL: https://github.com/carverauto/serviceradar/pull/2111 Original created: 2025-12-12T04:34:57Z Original updated: 2025-12-12T04:36:12Z Original head: carverauto/serviceradar:update/fix_flakey_kv_tests Original base: main Original merged: 2025-12-12T04:35:28Z by @mfreeman451 --- ### **User description** ## IMPORTANT: Please sign the Developer Certificate of Origin Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include a [DCO sign-off statement]( https://developercertificate.org/) indicating the DCO acceptance in one commit message. Here is an example DCO Signed-off-by line in a commit message: ``` Signed-off-by: J. Doe <j.doe@domain.com> ``` ## Describe your changes ## Issue ticket number and link ## Code checklist before requesting a review - [ ] I have signed the DCO? - [ ] The build completes without errors? - [ ] All tests are passing when running make test? ___ ### **PR Type** Tests ___ ### **Description** - Increased timeout values in flaky test cases - Expanded watch channel buffer from 1 to 8 to prevent blocking - Adjusted timing thresholds to improve test reliability ___ ### Diagram Walkthrough ```mermaid flowchart LR A["Test Timeouts"] -->|increased| B["50ms → 500ms<br/>25ms → 100ms<br/>30ms → 100ms"] C["Watch Channel"] -->|buffer expanded| D["1 → 8 capacity"] B --> E["Improved Test Stability"] D --> E ``` <details><summary><h3>File Walkthrough</h3></summary> <table><thead><tr><th></th><th align="left">Relevant files</th></tr></thead><tbody><tr><td><strong>Tests</strong></td><td><table> <tr> <td> <details> <summary><strong>kv_client_test.go</strong><dd><code>Increase test timeouts and watch channel buffer</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></summary> <hr> pkg/config/kv_client_test.go <ul><li>Increased timeout in <code>TestKVManagerStartWatchReloadsOnAnyChange</code> from <br>50ms to 500ms for reload detection<br> <li> Increased timeout in same test from 25ms to 100ms for no-reload <br>scenario<br> <li> Increased timeout in <code>TestKVManagerStartWatchReappliesPinnedOverlay</code> <br>from 30ms to 100ms<br> <li> Expanded <code>watchKVStore.Watch</code> channel buffer from 1 to 8 to prevent emit <br>operations from blocking</ul> </details> </td> <td><a href="https://github.com/carverauto/serviceradar/pull/2111/files#diff-09814dd9ab05f9d40f40974f297806d71fe2eca5f8120492c9f79311153d3aef">+4/-4</a>&nbsp; &nbsp; &nbsp; </td> </tr> </table></td></tr></tbody></table> </details> ___
qodo-code-review[bot] commented 2025-12-12 04:35:16 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2111#issuecomment-3644869811
Original created: 2025-12-12T04:35:16Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No audit scope: The changes only adjust test timeouts and a buffered channel size without introducing or
affecting audit logging of critical actions, so compliance cannot be determined from this
diff.

Referred Code
	store.emit([]byte(`{"listen_addr":":9090"}`))

	select {
	case <-reloads:
	case <-time.After(500 * time.Millisecond):
		t.Fatalf("timed out waiting for reload triggered by listen_addr change")
	}

	if snap := cfg.snapshot(); snap.ListenAddr != ":9090" {
		t.Fatalf("expected config to apply listen_addr change, got %q", snap.ListenAddr)
	}

	store.emit([]byte(`{"listen_addr":":9090"}`))

	select {
	case <-reloads:
		t.Fatalf("unexpected reload when config payload did not change")
	case <-time.After(100 * time.Millisecond):
	}
}



 ... (clipped 141 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Timeout handling: Tests use fixed timeouts and rely on timing-based selects without explicit handling for
slow environments, which may still be flaky though the change increases thresholds.

Referred Code
	select {
	case <-reloads:
	case <-time.After(500 * time.Millisecond):
		t.Fatalf("timed out waiting for reload triggered by listen_addr change")
	}

	if snap := cfg.snapshot(); snap.ListenAddr != ":9090" {
		t.Fatalf("expected config to apply listen_addr change, got %q", snap.ListenAddr)
	}

	store.emit([]byte(`{"listen_addr":":9090"}`))

	select {
	case <-reloads:
		t.Fatalf("unexpected reload when config payload did not change")
	case <-time.After(100 * time.Millisecond):
	}
}

func TestKVManagerStartWatchReappliesPinnedOverlay(t *testing.T) {
	initialKV := []byte(`{"listen_addr":":8080","logging":{"level":"info"}}`)


 ... (clipped 39 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Test-only changes: The diff only modifies test timeouts and a channel buffer and does not touch input
validation or data handling paths, so security validation compliance cannot be assessed
from this change.

Referred Code
func (s *watchKVStore) Watch(ctx context.Context, _ string) (<-chan []byte, error) {
	s.mu.Lock()
	if s.watchCh == nil {
		s.watchCh = make(chan []byte, 8) // Larger buffer to prevent emit from blocking
		s.readyOnce.Do(func() {

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
- Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2111#issuecomment-3644869811 Original created: 2025-12-12T04:35:16Z --- ## PR Compliance Guide 🔍 <!-- https://github.com/carverauto/serviceradar/commit/2f3279d888412ad91636519aca73d92fda8c6b6d --> Below is a summary of compliance checks for this PR:<br> <table><tbody><tr><td colspan='2'><strong>Security Compliance</strong></td></tr> <tr><td>🟢</td><td><details><summary><strong>No security concerns identified</strong></summary> No security vulnerabilities detected by AI analysis. Human verification advised for critical code. </details></td></tr> <tr><td colspan='2'><strong>Ticket Compliance</strong></td></tr> <tr><td>⚪</td><td><details><summary>🎫 <strong>No ticket provided </strong></summary> - [ ] Create ticket/issue <!-- /create_ticket --create_ticket=true --> </details></td></tr> <tr><td colspan='2'><strong>Codebase Duplication Compliance</strong></td></tr> <tr><td>⚪</td><td><details><summary><strong>Codebase context is not defined </strong></summary> Follow the <a href='https://qodo-merge-docs.qodo.ai/core-abilities/rag_context_enrichment/'>guide</a> to enable codebase context checks. </details></td></tr> <tr><td colspan='2'><strong>Custom Compliance</strong></td></tr> <tr><td rowspan=3>🟢</td><td> <details><summary><strong>Generic: Meaningful Naming and Self-Documenting Code</strong></summary><br> **Objective:** Ensure all identifiers clearly express their purpose and intent, making code <br>self-documenting<br> **Status:** Passed<br> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td> <details><summary><strong>Generic: Secure Error Handling</strong></summary><br> **Objective:** To prevent the leakage of sensitive system information through error messages while <br>providing sufficient detail for internal debugging.<br> **Status:** Passed<br> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td> <details><summary><strong>Generic: Secure Logging Practices</strong></summary><br> **Objective:** To ensure logs are useful for debugging and auditing without exposing sensitive <br>information like PII, PHI, or cardholder data.<br> **Status:** Passed<br> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td rowspan=3>⚪</td> <td><details> <summary><strong>Generic: Comprehensive Audit Trails</strong></summary><br> **Objective:** To create a detailed and reliable record of critical system actions for security analysis <br>and compliance.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2111/files#diff-09814dd9ab05f9d40f40974f297806d71fe2eca5f8120492c9f79311153d3aefR330-R491'><strong>No audit scope</strong></a>: The changes only adjust test timeouts and a buffered channel size without introducing or <br>affecting audit logging of critical actions, so compliance cannot be determined from this <br>diff.<br> <details open><summary>Referred Code</summary> ```go store.emit([]byte(`{"listen_addr":":9090"}`)) select { case <-reloads: case <-time.After(500 * time.Millisecond): t.Fatalf("timed out waiting for reload triggered by listen_addr change") } if snap := cfg.snapshot(); snap.ListenAddr != ":9090" { t.Fatalf("expected config to apply listen_addr change, got %q", snap.ListenAddr) } store.emit([]byte(`{"listen_addr":":9090"}`)) select { case <-reloads: t.Fatalf("unexpected reload when config payload did not change") case <-time.After(100 * time.Millisecond): } } ... (clipped 141 lines) ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Robust Error Handling and Edge Case Management</strong></summary><br> **Objective:** Ensure comprehensive error handling that provides meaningful context and graceful <br>degradation<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2111/files#diff-09814dd9ab05f9d40f40974f297806d71fe2eca5f8120492c9f79311153d3aefR332-R391'><strong>Timeout handling</strong></a>: Tests use fixed timeouts and rely on timing-based selects without explicit handling for <br>slow environments, which may still be flaky though the change increases thresholds.<br> <details open><summary>Referred Code</summary> ```go select { case <-reloads: case <-time.After(500 * time.Millisecond): t.Fatalf("timed out waiting for reload triggered by listen_addr change") } if snap := cfg.snapshot(); snap.ListenAddr != ":9090" { t.Fatalf("expected config to apply listen_addr change, got %q", snap.ListenAddr) } store.emit([]byte(`{"listen_addr":":9090"}`)) select { case <-reloads: t.Fatalf("unexpected reload when config payload did not change") case <-time.After(100 * time.Millisecond): } } func TestKVManagerStartWatchReappliesPinnedOverlay(t *testing.T) { initialKV := []byte(`{"listen_addr":":8080","logging":{"level":"info"}}`) ... (clipped 39 lines) ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td><details> <summary><strong>Generic: Security-First Input Validation and Data Handling</strong></summary><br> **Objective:** Ensure all data inputs are validated, sanitized, and handled securely to prevent <br>vulnerabilities<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/2111/files#diff-09814dd9ab05f9d40f40974f297806d71fe2eca5f8120492c9f79311153d3aefR483-R487'><strong>Test-only changes</strong></a>: The diff only modifies test timeouts and a channel buffer and does not touch input <br>validation or data handling paths, so security validation compliance cannot be assessed <br>from this change.<br> <details open><summary>Referred Code</summary> ```go func (s *watchKVStore) Watch(ctx context.Context, _ string) (<-chan []byte, error) { s.mu.Lock() if s.watchCh == nil { s.watchCh = make(chan []byte, 8) // Larger buffer to prevent emit from blocking s.readyOnce.Do(func() { ``` </details> > Learn more about managing compliance <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#configuration-options'>generic rules</a> or creating your own <a href='https://qodo-merge-docs.qodo.ai/tools/compliance/#custom-compliance'>custom rules</a> </details></td></tr> <tr><td align="center" colspan="2"> <!-- placeholder --> <!-- /compliance --update_compliance=true --> </td></tr></tbody></table> <details><summary>Compliance status legend</summary> 🟢 - Fully Compliant<br> 🟡 - Partial Compliant<br> 🔴 - Not Compliant<br> ⚪ - Requires Further Human Verification<br> 🏷️ - Compliance label<br> </details>
qodo-code-review[bot] commented 2025-12-12 04:36:12 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2111#issuecomment-3644871358
Original created: 2025-12-12T04:36:12Z

PR Code Suggestions

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Use a non-blocking channel read

Replace the timed wait (time.After) with a non-blocking channel read using a
select with a default case to deterministically check for an unexpected reload.

pkg/config/kv_client_test.go [342-348]

 	store.emit([]byte(`{"listen_addr":":9090"}`))
 
 	select {
 	case <-reloads:
 		t.Fatalf("unexpected reload when config payload did not change")
-	case <-time.After(100 * time.Millisecond):
+	default:
+		// No reload, which is the expected behavior.
 	}

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that using time.After to check for the absence of a signal is a source of test flakiness and proposes a more robust, non-blocking select with a default case.

Medium
  • More
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/2111#issuecomment-3644871358 Original created: 2025-12-12T04:36:12Z --- ## PR Code Suggestions ✨ <!-- 2f3279d --> Explore these optional code suggestions: <table><thead><tr><td><strong>Category</strong></td><td align=left><strong>Suggestion&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </strong></td><td align=center><strong>Impact</strong></td></tr><tbody><tr><td rowspan=1>General</td> <td> <details><summary>Use a non-blocking channel read</summary> ___ **Replace the timed wait (<code>time.After</code>) with a non-blocking channel read using a <br><code>select</code> with a <code>default</code> case to deterministically check for an unexpected reload.** [pkg/config/kv_client_test.go [342-348]](https://github.com/carverauto/serviceradar/pull/2111/files#diff-09814dd9ab05f9d40f40974f297806d71fe2eca5f8120492c9f79311153d3aefR342-R348) ```diff store.emit([]byte(`{"listen_addr":":9090"}`)) select { case <-reloads: t.Fatalf("unexpected reload when config payload did not change") - case <-time.After(100 * time.Millisecond): + default: + // No reload, which is the expected behavior. } ``` `[To ensure code accuracy, apply this suggestion manually]` <details><summary>Suggestion importance[1-10]: 7</summary> __ Why: The suggestion correctly identifies that using `time.After` to check for the absence of a signal is a source of test flakiness and proposes a more robust, non-blocking `select` with a `default` case. </details></details></td><td align=center>Medium </td></tr> <tr><td align="center" colspan="2"> - [ ] More <!-- /improve --more_suggestions=true --> </td><td></td></tr></tbody></table>
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
carverauto/serviceradar!2550
No description provided.