poller update missed in last check in #2317

Merged
mfreeman451 merged 2 commits from refs/pull/2317/head into main 2025-10-14 03:36:45 +00:00
mfreeman451 commented 2025-10-14 03:15:15 +00:00 (Migrated from github.com)
Owner

Imported from GitHub pull request.

Original GitHub pull request: #1757
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/1757
Original created: 2025-10-14T03:15:15Z
Original updated: 2025-10-14T03:36:48Z
Original head: carverauto/serviceradar:sysmonvm/integration_tests_2
Original base: main
Original merged: 2025-10-14T03:36:45Z by @mfreeman451

PR Type

Bug fix, Enhancement


Description

  • Fixed missing client assignment after connectToCore call in initialization

  • Refactored connectToCore to return clients instead of setting them directly

  • Improved thread safety in reconnection logic with proper mutex handling

  • Enhanced error handling during client closure operations


Diagram Walkthrough

```mermaid
flowchart LR
  A["connectToCore()"] -- "returns clients" --> B["grpcClient, coreClient"]
  B -- "assigned with mutex" --> C["Poller struct"]
  D["reconnectCore()"] -- "closes old clients" --> E["mutex unlock"]
  E -- "creates new clients" --> B
  F["UpdateConfig()"] -- "reconnects if needed" --> D
```

File Walkthrough

Relevant files
Bug fix
poller.go
Fix client assignment and improve connection thread safety

pkg/poller/poller.go

  • Refactored connectToCore to return client instances instead of
    directly assigning to struct fields
  • Fixed missing client assignment in New function after connectToCore
    call
  • Improved mutex handling in reconnectCore to prevent deadlocks and race
    conditions
  • Enhanced UpdateConfig to properly manage client lifecycle during
    reconnection
+35/-12 

qodo-code-review[bot] commented 2025-10-14 03:15:46 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1757#issuecomment-3399954606
Original created: 2025-10-14T03:15:46Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Potential deadlock

Description: Closing the existing gRPC client while holding the mutex may block other goroutines that
need the lock if Close() internally performs blocking operations or callbacks, potentially
leading to a lock convoy or deadlock under high load.
poller.go [487-494]

Referred Code
```go
p.mu.Lock()
if p.grpcClient != nil {
	// Close the existing client while holding the lock to prevent its use.
	if err := p.grpcClient.Close(); err != nil {
		p.logger.Warn().Err(err).Msg("Failed to close existing core client during reconnect")
	}
	p.grpcClient = nil
	p.coreClient = nil
```
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance check.

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
qodo-code-review[bot] commented 2025-10-14 03:16:53 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1757#issuecomment-3399956226
Original created: 2025-10-14T03:16:53Z

PR Code Suggestions

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Prevent race condition during reconnection
Suggestion Impact: The commit refactored reconnect and shutdown paths to avoid closing the existing client while holding the lock. It now captures the old client under lock, releases the lock, closes it, and only then proceeds, reducing race conditions. Although the commit still connects after releasing the lock (not before locking as suggested), it implements the core idea of not closing under lock and atomically swapping clients.

code diff:

```diff
@@ -484,17 +524,20 @@
 	reconnectCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
 	defer cancel()
 
+	var clientToClose *grpc.Client
 	p.mu.Lock()
 	if p.grpcClient != nil {
-		// Close the existing client while holding the lock to prevent its use.
-		if err := p.grpcClient.Close(); err != nil {
-			p.logger.Warn().Err(err).Msg("Failed to close existing core client during reconnect")
-		}
+		clientToClose = p.grpcClient
 		p.grpcClient = nil
 		p.coreClient = nil
 	}
-
 	p.mu.Unlock()
+
+	if clientToClose != nil {
+		if err := clientToClose.Close(); err != nil {
+			p.logger.Warn().Err(err).Msg("Failed to close existing core client during reconnect")
+		}
+	}
 
 	grpcClient, coreClient, err := p.connectToCore(reconnectCtx)
 	if err != nil {
@@ -627,14 +670,19 @@
```

To prevent a race condition in reconnectCore, first establish the new gRPC
connection, then acquire the lock to atomically swap the old client with the new
one.

pkg/poller/poller.go [483-510]

```diff
 func (p *Poller) reconnectCore(ctx context.Context) error {
 	reconnectCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
 	defer cancel()
 
-	p.mu.Lock()
-	if p.grpcClient != nil {
-		// Close the existing client while holding the lock to prevent its use.
-		if err := p.grpcClient.Close(); err != nil {
-			p.logger.Warn().Err(err).Msg("Failed to close existing core client during reconnect")
-		}
-		p.grpcClient = nil
-		p.coreClient = nil
-	}
-
-	p.mu.Unlock()
-
+	// Create new clients before acquiring the lock.
 	grpcClient, coreClient, err := p.connectToCore(reconnectCtx)
 	if err != nil {
 		return err
 	}
 
 	p.mu.Lock()
+	defer p.mu.Unlock()
+
+	// Close the old client after a new one is ready.
+	if p.grpcClient != nil {
+		if err := p.grpcClient.Close(); err != nil {
+			p.logger.Warn().Err(err).Msg("Failed to close existing core client during reconnect")
+		}
+	}
+
+	// Atomically swap to the new clients.
 	p.grpcClient = grpcClient
 	p.coreClient = coreClient
-	p.mu.Unlock()
 
 	return nil
 }
```

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a critical race condition introduced in the PR that could lead to a nil pointer dereference and a panic, and it proposes a robust fix.

High
Fix race condition on config update
Suggestion Impact: The commit changes the reconnect logic to avoid closing the existing gRPC client while holding the lock; it now grabs the client under lock, releases the lock, closes it outside the lock, and then proceeds to connect and later set the new client. This addresses the race condition the suggestion targeted, though it does not connect before locking as suggested.

code diff:

```diff
@@ -627,14 +670,19 @@
 	}
 	if reconnectCore {
 		p.mu.Lock()
+		var clientToClose *grpc.Client
 		if p.grpcClient != nil {
-			if err := p.grpcClient.Close(); err != nil {
-				p.logger.Warn().Err(err).Msg("Failed to close core client before config reconnect")
-			}
+			clientToClose = p.grpcClient
 			p.grpcClient = nil
 			p.coreClient = nil
 		}
 		p.mu.Unlock()
+
+		if clientToClose != nil {
+			if err := clientToClose.Close(); err != nil {
+				p.logger.Warn().Err(err).Msg("Failed to close core client before config reconnect")
+			}
+		}
```

To fix a race condition in UpdateConfig, first establish the new gRPC
connection, then acquire the lock to atomically replace the old client with the
new one.

pkg/poller/poller.go [628-649]

```diff
 	if reconnectCore {
-		p.mu.Lock()
-		if p.grpcClient != nil {
-			if err := p.grpcClient.Close(); err != nil {
-				p.logger.Warn().Err(err).Msg("Failed to close core client before config reconnect")
-			}
-			p.grpcClient = nil
-			p.coreClient = nil
-		}
-		p.mu.Unlock()
-
 		grpcClient, coreClient, err := p.connectToCore(ctx)
 		if err != nil {
 			p.logger.Error().Err(err).Msg("Failed to reconnect to core")
 		} else {
 			p.mu.Lock()
+			if p.grpcClient != nil {
+				if err := p.grpcClient.Close(); err != nil {
+					p.logger.Warn().Err(err).Msg("Failed to close core client before config reconnect")
+				}
+			}
 			p.grpcClient = grpcClient
 			p.coreClient = coreClient
 			p.mu.Unlock()
 			p.logger.Info().Msg("Reconnected to core after config change")
 		}
 	}
```

[Suggestion processed]

Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a critical race condition in UpdateConfig that was introduced in the PR, which could cause a panic, and provides the correct pattern to fix it.

High
  • Update
Reference
carverauto/serviceradar!2317