1921 bugcore missing agent with ping metrics #2401

Merged
mfreeman451 merged 14 commits from refs/pull/2401/head into main 2025-11-05 03:30:28 +00:00
mfreeman451 commented 2025-11-03 15:33:24 +00:00 (Migrated from github.com)
Owner

Imported from GitHub pull request.

Original GitHub pull request: #1923
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/1923
Original created: 2025-11-03T15:33:24Z
Original updated: 2025-11-05T03:31:01Z
Original head: carverauto/serviceradar:1921-bugcore-missing-agent-with-ping-metrics
Original base: main
Original merged: 2025-11-05T03:30:28Z by @mfreeman451

User description

IMPORTANT: Please sign the Developer Certificate of Origin

Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:

Signed-off-by: J. Doe <j.doe@domain.com>

Describe your changes

Code checklist before requesting a review

  • I have signed the DCO?
  • The build completes without errors?
  • All tests are passing when running make test?

PR Type

Bug fix, Enhancement


Description

  • Fix missing ICMP metrics for agents by implementing device canonicalization and fallback database queries

  • Add service device update buffering and batch processing to improve device registry synchronization

  • Enhance ICMP metric processing with canonical device resolution and metadata enrichment

  • Improve database connection pooling and add IP field to canonical cache snapshots


Diagram Walkthrough

flowchart LR
  A["ICMP Metrics Processing"] -->|resolve canonical device| B["Device Canonicalization"]
  B -->|lookup or fetch| C["Canonical Cache"]
  A -->|enqueue update| D["Service Device Buffer"]
  D -->|batch flush| E["Device Registry"]
  A -->|fallback query| F["Database ICMP Metrics"]
  F -->|return metrics| G["API Response"]

File Walkthrough

Relevant files
Bug fix
2 files
server.go
Add ICMP metrics database fallback lookup                               
+19/-0   
metrics.go
Enhance ICMP metrics with canonical device resolution       
+165/-10
Enhancement
9 files
canonical_cache.go
Store IP address in canonical cache snapshots                       
+3/-0     
devices.go
Implement service device update buffering and batch processing
+52/-10 
flush.go
Add service device updates flush and fix formatting           
+99/-58 
result_processor.go
Store IP in canonical snapshot for sweep identities           
+1/-0     
server.go
Initialize service device buffer in server setup                 
+1/-0     
types.go
Add service device buffer and mutex to server struct         
+2/-0     
db.go
Increase database connection pool and timeout settings     
+3/-3     
interfaces.go
Add GetICMPMetricsForDevice interface method                         
+1/-0     
metrics.go
Implement ICMP metrics query with device and collector IP matching
+53/-0   
Tests
3 files
devices_test.go
Add tests for service device registration logic                   
+106/-0 
server_test.go
Mock unified devices query in test setup                                 
+1/-0     
mock_db.go
Add mock for GetICMPMetricsForDevice method                           
+15/-0   

Imported from GitHub pull request. Original GitHub pull request: #1923 Original author: @mfreeman451 Original URL: https://github.com/carverauto/serviceradar/pull/1923 Original created: 2025-11-03T15:33:24Z Original updated: 2025-11-05T03:31:01Z Original head: carverauto/serviceradar:1921-bugcore-missing-agent-with-ping-metrics Original base: main Original merged: 2025-11-05T03:30:28Z by @mfreeman451 --- ### **User description** ## IMPORTANT: Please sign the Developer Certificate of Origin Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include a [DCO sign-off statement]( https://developercertificate.org/) indicating the DCO acceptance in one commit message. Here is an example DCO Signed-off-by line in a commit message: ``` Signed-off-by: J. Doe <j.doe@domain.com> ``` ## Describe your changes ## Issue ticket number and link ## Code checklist before requesting a review - [ ] I have signed the DCO? - [ ] The build completes without errors? - [ ] All tests are passing when running make test? ___ ### **PR Type** Bug fix, Enhancement ___ ### **Description** - Fix missing ICMP metrics for agents by implementing device canonicalization and fallback database queries - Add service device update buffering and batch processing to improve device registry synchronization - Enhance ICMP metric processing with canonical device resolution and metadata enrichment - Improve database connection pooling and add IP field to canonical cache snapshots ___ ### Diagram Walkthrough ```mermaid flowchart LR A["ICMP Metrics Processing"] -->|resolve canonical device| B["Device Canonicalization"] B -->|lookup or fetch| C["Canonical Cache"] A -->|enqueue update| D["Service Device Buffer"] D -->|batch flush| E["Device Registry"] A -->|fallback query| F["Database ICMP Metrics"] F -->|return metrics| G["API Response"] ``` <details> <summary><h3> File Walkthrough</h3></summary> <table><thead><tr><th></th><th align="left">Relevant files</th></tr></thead><tbody><tr><td><strong>Bug fix</strong></td><td><details><summary>2 files</summary><table> <tr> <td><strong>server.go</strong><dd><code>Add ICMP metrics database fallback lookup</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-1bb99367fdd853c728b7cfbf5893a293f6d217144dfb5282cb8dd32e5261021e">+19/-0</a>&nbsp; &nbsp; </td> </tr> <tr> <td><strong>metrics.go</strong><dd><code>Enhance ICMP metrics with canonical device resolution</code>&nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-6d98e853ce17576c088e77956ae4ecfa8078019e0bff107a79d8d1d6ed2443ad">+165/-10</a></td> </tr> </table></details></td></tr><tr><td><strong>Enhancement</strong></td><td><details><summary>9 files</summary><table> <tr> <td><strong>canonical_cache.go</strong><dd><code>Store IP address in canonical cache snapshots</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-f15007387966f0f7a8fc3fe7dd99304bb6426f02a5b4cc76370999ee4459b2b6">+3/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>devices.go</strong><dd><code>Implement service device update buffering and batch processing</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-fb7f5159558f22afdc809b9cbf5a9c0f71560be2f7389c944da8f1604998ea48">+52/-10</a>&nbsp; </td> </tr> <tr> <td><strong>flush.go</strong><dd><code>Add service device updates flush and fix formatting</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-4ad1cf4b0397bd277cd9c0553e4992ad7c9e9484d49d49d48af015958dc4bf4b">+99/-58</a>&nbsp; </td> </tr> <tr> <td><strong>result_processor.go</strong><dd><code>Store IP in canonical snapshot for sweep identities</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-828f49c36998b59e7eea7ceca2e1982ce06366bca483c9a52439a360ce2a0a50">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>server.go</strong><dd><code>Initialize service device buffer in server setup</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-a12d34beb8640efbbedad28f8610fd58afeea457572d82e4fd295ab2a6bb8ee6">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>types.go</strong><dd><code>Add service device buffer and mutex to server struct</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-717f128517472d3dc4091e67bfc0f4fe4c36e32096c5ef87d78f34cbc64d2399">+2/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>db.go</strong><dd><code>Increase database connection pool and timeout settings</code>&nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-5da3684806835246d262230050593f460b12b6c0e3966df174e6061be0e9e575">+3/-3</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>interfaces.go</strong><dd><code>Add GetICMPMetricsForDevice interface method</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-c230fe0c315251837357bfde4ae7f7b34080398d8e48af6bf78badb2124271f3">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>metrics.go</strong><dd><code>Implement ICMP metrics query with device and collector IP matching</code></dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-2cf937853ac86eb9a879bd370cbe22b835ee0c48be6771b48b0276d93cc2ae93">+53/-0</a>&nbsp; &nbsp; </td> </tr> </table></details></td></tr><tr><td><strong>Tests</strong></td><td><details><summary>3 files</summary><table> <tr> <td><strong>devices_test.go</strong><dd><code>Add tests for service device registration logic</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-133a0e1348e2d2dcfc9645f0409219463138e17380efd4fbc213be14c104043b">+106/-0</a>&nbsp; </td> </tr> <tr> <td><strong>server_test.go</strong><dd><code>Mock unified devices query in test setup</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-a406eabc08b2c0cc98ca61d603c8dceb8e4592c989be63da2735d705d20db86a">+1/-0</a>&nbsp; &nbsp; &nbsp; </td> </tr> <tr> <td><strong>mock_db.go</strong><dd><code>Add mock for GetICMPMetricsForDevice method</code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </dd></td> <td><a href="https://github.com/carverauto/serviceradar/pull/1923/files#diff-30e38f888d4849fc40d7ebb1559c2a84c43aa8cd13b3b89fd7ec6cf873b243c7">+15/-0</a>&nbsp; &nbsp; </td> </tr> </table></details></td></tr></tr></tbody></table> </details> ___
qodo-code-review[bot] commented 2025-11-03 15:34:06 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1923#issuecomment-3481150508
Original created: 2025-11-03T15:34:06Z

PR Compliance Guide 🔍

(Compliance updated until commit github.com/carverauto/serviceradar@a2c9cda150)

Below is a summary of compliance checks for this PR:

Security Compliance
Device ID injection risk

Description: The deviceID is constructed from untrusted input (pingResult.DeviceID or generated from
sourceIP) without validation, potentially allowing device ID spoofing or injection attacks
if the input contains malicious characters.
metrics.go [546-565]

Referred Code
targetHost := strings.TrimSpace(pingResult.Host)
deviceID := strings.TrimSpace(pingResult.DeviceID)

if deviceID == "" {
	deviceID = models.GenerateNetworkDeviceID(partition, sourceIP)
}

resolution := s.resolveCanonicalDevice(ctx, sourceIP, deviceID)
if strings.TrimSpace(resolution.DeviceID) != "" {
	deviceID = strings.TrimSpace(resolution.DeviceID)
}

updateIP := strings.TrimSpace(resolution.IP)
if updateIP == "" {
	updateIP = ipFromDeviceID(deviceID)
}
if updateIP == "" {
	updateIP = sourceIP
}

SQL injection via escapeLiteral

Description: The escapeLiteral function is called to sanitize user input for SQL query construction,
but the function implementation is not visible in the diff, creating potential SQL
injection risk if the escaping is insufficient.
unified_devices.go [285-289]

Referred Code
buildQuery := func(column string, values []string) string {
	placeholders := make([]string, len(values))
	for i, value := range values {
		placeholders[i] = fmt.Sprintf("'%s'", escapeLiteral(value))
	}
Boolean parsing ambiguity

Description: The parseBoolQuery function accepts various string representations of boolean values but
returns false, true for unrecognized values, which could lead to unexpected behavior when
invalid input is treated as false rather than being rejected.
server.go [2255-2268]

Referred Code
func parseBoolQuery(values url.Values, key string) (bool, bool) {
	raw := strings.TrimSpace(values.Get(key))
	if raw == "" {
		return false, false
	}

	switch strings.ToLower(raw) {
	case "1", "true", "t", "yes", "y":
		return true, true
	case "0", "false", "f", "no", "n":
		return false, true
	default:
		return false, true
	}
Ticket Compliance
🟡
🎫 #1921
🟢 Ensure local k8s-agent reporting ICMP metrics appears in inventory
Fix missing agent with ping metrics issue
Re-add deleted devices that are still sending data (healthchecks, metrics) with valid
SPIFFE credentials
Remove tombstone metadata for devices that resume sending data
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

🔴
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Generic variable names: Variables like 'result', 'hits', 'misses',
'snapshot', 'record' use generic names that don't clearly express
their specific purpose in the context of device canonicalization and identity resolution.

Referred Code



 ... (clipped 61 lines)
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing error context: Error handling in resolveCanonicalDevice and fetchCanonicalSnapshot lacks context about
which device or IP failed, making debugging difficult in production.

Referred Code
func (s *Server) resolveCanonicalDevice(ctx context.Context, ip, fallback string) canonicalSnapshot {
	ip = strings.TrimSpace(ip)
	fallback = strings.TrimSpace(fallback)

	result := canonicalSnapshot{
		DeviceID: fallback,
	}

	if ip == "" {
		return result
	}

	if s.canonicalCache != nil {
		if hits, misses := s.canonicalCache.getBatch([]string{ip}); len(misses) == 0 {
			if snap, ok := hits[ip]; ok {
				return ensureSnapshotDeviceID(snap, fallback)
			}
		} else {
			if snap, ok := s.fetchCanonicalSnapshot(ctx, ip); ok {
				return ensureSnapshotDeviceID(snap, fallback)
			}


 ... (clipped 61 lines)
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Missing IP validation: The deviceIP parameter extracted from query string is not validated before being used in
database queries, potentially allowing injection of malicious input.

Referred Code
var (
	deviceIP           string
	collectorCaps      collectorCapabilities
	collectorCapsKnown bool
)

queryValues := r.URL.Query()
if ipParam := strings.TrimSpace(queryValues.Get("device_ip")); ipParam != "" {
	deviceIP = ipParam
} else {
	if _, ip := splitDeviceID(deviceID); ip != "" {
		deviceIP = ip
	}
}

if hasCollector, ok := parseBoolQuery(queryValues, "has_collector"); ok {
	collectorCaps.hasCollector = hasCollector
	collectorCapsKnown = true
}
if supportsICMP, ok := parseBoolQuery(queryValues, "supports_icmp"); ok {
	collectorCaps.supportsICMP = supportsICMP


 ... (clipped 6 lines)
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
- Requires Further Human Verification
🏷️ - Compliance label

Previous compliance checks

Compliance check up to commit 69686f1
Security Compliance
Unbounded input size

Description: Device metadata from ICMP processing is directly included in JSON without strict
validation or size limits, potentially allowing large or unexpected strings (e.g., service
names) to inflate storage or logs and cause resource exhaustion.
metrics.go [741-756]

Referred Code
if err != nil {
	s.logger.Warn().Err(err).Str("ip", ip).Msg("Failed to fetch canonical device by IP")
	return canonicalSnapshot{}, false
}

for _, device := range devices {
	if device == nil {
		continue
	}
	snapshot := canonicalSnapshot{
		DeviceID: strings.TrimSpace(device.DeviceID),
		IP:       strings.TrimSpace(device.IP),
	}
	if device.MAC != nil {
		snapshot.MAC = strings.TrimSpace(device.MAC.Value)
	}
Ticket Compliance
🟡
🎫 #1921
🟢 ICMP metrics from agents should be retrievable even when in-memory ring buffer has no
data.
Canonical device resolution should map agent IP/device to unified device identity for
consistent metrics and metadata.
System should update device registry with ICMP availability and related metadata when ICMP
results are processed.
Improve robustness/performance around device caching and DB access to help recover missing
agents.
Agents reporting ICMP ping metrics should appear as devices even if previously deleted or
tombstoned.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Fallback Query Logging: The new database fallback for ICMP metrics logs only count/device ID and warns on error,
but it is unclear if sufficient audit details (user/request context) are captured for this
critical action.

Referred Code
if len(timeseriesMetrics) == 0 {
	dbMetrics, dbErr := s.dbService.GetICMPMetricsForDevice(ctx, deviceID, deviceIP, startTime, endTime)
	if dbErr != nil {
		s.logger.Warn().
			Err(dbErr).
			Str("device_id", deviceID).
			Msg("Failed to fetch ICMP metrics from database fallback")
	} else {
		timeseriesMetrics = append(timeseriesMetrics, dbMetrics...)
	}
}
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
DB Error Context: New ICMP metrics query adds helpful context but may miss validation of inputs (e.g., empty
deviceID/IP) and detailed logging of filter values on failure paths.

Referred Code
ctx context.Context, deviceID, deviceIP string, start, end time.Time) ([]models.TimeseriesMetric, error) {
query := `
	SELECT metric_name, metric_type, value, metadata, timestamp, target_device_ip,
	       ifIndex, device_id, partition, poller_id
	FROM table(timeseries_metrics)
	WHERE metric_type = 'icmp'
	  AND timestamp BETWEEN $3 AND $4
	  AND (
	        device_id = $1
	     OR json_extract_string(metadata, 'device_id') = $1
	     OR (target_device_ip = $2 AND $2 != '')
	     OR (json_extract_string(metadata, 'collector_ip') = $2 AND $2 != '')
	      )
	ORDER BY timestamp DESC`

rows, err := db.Conn.Query(ctx, query, deviceID, deviceIP, start, end)
if err != nil {
	return nil, fmt.Errorf("failed to query ICMP metrics for device %s: %w", deviceID, err)
}
defer func() { _ = rows.Close() }()



 ... (clipped 29 lines)
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Input Validation: The new GetICMPMetricsForDevice uses parameterized queries but does not validate or
normalize deviceID/deviceIP inputs before query execution, which may lead to unexpected
query breadth.

Referred Code
ctx context.Context, deviceID, deviceIP string, start, end time.Time) ([]models.TimeseriesMetric, error) {
query := `
	SELECT metric_name, metric_type, value, metadata, timestamp, target_device_ip,
	       ifIndex, device_id, partition, poller_id
	FROM table(timeseries_metrics)
	WHERE metric_type = 'icmp'
	  AND timestamp BETWEEN $3 AND $4
	  AND (
	        device_id = $1
	     OR json_extract_string(metadata, 'device_id') = $1
	     OR (target_device_ip = $2 AND $2 != '')
	     OR (json_extract_string(metadata, 'collector_ip') = $2 AND $2 != '')
	      )
	ORDER BY timestamp DESC`

rows, err := db.Conn.Query(ctx, query, deviceID, deviceIP, start, end)
if err != nil {
	return nil, fmt.Errorf("failed to query ICMP metrics for device %s: %w", deviceID, err)
}
defer func() { _ = rows.Close() }()



 ... (clipped 29 lines)
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/1923#issuecomment-3481150508 Original created: 2025-11-03T15:34:06Z --- ## PR Compliance Guide 🔍 <!-- https://github.com/carverauto/serviceradar/commit/a2c9cda15075d08726d73b421a5f2f7cfa9462f3 --> #### (Compliance updated until commit https://github.com/carverauto/serviceradar/commit/a2c9cda15075d08726d73b421a5f2f7cfa9462f3) Below is a summary of compliance checks for this PR:<br> <table><tbody><tr><td colspan='2'><strong>Security Compliance</strong></td></tr> <tr><td rowspan=3>⚪</td> <td><details><summary><strong>Device ID injection risk </strong></summary><br> <b>Description:</b> The <code>deviceID</code> is constructed from untrusted input (<code>pingResult.DeviceID</code> or generated from <br><code>sourceIP</code>) without validation, potentially allowing device ID spoofing or injection attacks <br>if the input contains malicious characters.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/1923/files#diff-6d98e853ce17576c088e77956ae4ecfa8078019e0bff107a79d8d1d6ed2443adR546-R565'>metrics.go [546-565]</a></strong><br> <details open><summary>Referred Code</summary> ```go targetHost := strings.TrimSpace(pingResult.Host) deviceID := strings.TrimSpace(pingResult.DeviceID) if deviceID == "" { deviceID = models.GenerateNetworkDeviceID(partition, sourceIP) } resolution := s.resolveCanonicalDevice(ctx, sourceIP, deviceID) if strings.TrimSpace(resolution.DeviceID) != "" { deviceID = strings.TrimSpace(resolution.DeviceID) } updateIP := strings.TrimSpace(resolution.IP) if updateIP == "" { updateIP = ipFromDeviceID(deviceID) } if updateIP == "" { updateIP = sourceIP } ``` </details></details></td></tr> <tr><td><details><summary><strong>SQL injection via escapeLiteral </strong></summary><br> <b>Description:</b> The <code>escapeLiteral</code> function is called to sanitize user input for SQL query construction, <br>but the function implementation is not visible in the diff, creating potential SQL <br>injection risk if the escaping is insufficient.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/1923/files#diff-165bffede411b9a031b4d85af0ae459a1913960593d59bfc19554cb838e39679R285-R289'>unified_devices.go [285-289]</a></strong><br> <details open><summary>Referred Code</summary> ```go buildQuery := func(column string, values []string) string { placeholders := make([]string, len(values)) for i, value := range values { placeholders[i] = fmt.Sprintf("'%s'", escapeLiteral(value)) } ``` </details></details></td></tr> <tr><td><details><summary><strong>Boolean parsing ambiguity </strong></summary><br> <b>Description:</b> The <code>parseBoolQuery</code> function accepts various string representations of boolean values but <br>returns <code>false, true</code> for unrecognized values, which could lead to unexpected behavior when <br>invalid input is treated as <code>false</code> rather than being rejected.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/1923/files#diff-1bb99367fdd853c728b7cfbf5893a293f6d217144dfb5282cb8dd32e5261021eR2255-R2268'>server.go [2255-2268]</a></strong><br> <details open><summary>Referred Code</summary> ```go func parseBoolQuery(values url.Values, key string) (bool, bool) { raw := strings.TrimSpace(values.Get(key)) if raw == "" { return false, false } switch strings.ToLower(raw) { case "1", "true", "t", "yes", "y": return true, true case "0", "false", "f", "no", "n": return false, true default: return false, true } ``` </details></details></td></tr> <tr><td colspan='2'><strong>Ticket Compliance</strong></td></tr> <tr><td>🟡</td> <td> <details> <summary>🎫 <a href=https://github.com/carverauto/serviceradar/issues/1921>#1921</a></summary> <table width='100%'><tbody> <tr><td rowspan=2>🟢</td> <td>Ensure local k8s-agent reporting ICMP metrics appears in inventory</td></tr> <tr><td>Fix missing agent with ping metrics issue</td></tr> <tr><td rowspan=2>⚪</td> <td>Re-add deleted devices that are still sending data (healthchecks, metrics) with valid <br>SPIFFE credentials</td></tr> <tr><td>Remove tombstone metadata for devices that resume sending data</td></tr> </tbody></table> </details> </td></tr> <tr><td colspan='2'><strong>Codebase Duplication Compliance</strong></td></tr> <tr><td>⚪</td><td><details><summary><strong>Codebase context is not defined </strong></summary> Follow the <a href='https://qodo-merge-docs.qodo.ai/core-abilities/rag_context_enrichment/'>guide</a> to enable codebase context checks. </details></td></tr> <tr><td colspan='2'><strong>Custom Compliance</strong></td></tr> <tr><td rowspan=3>🟢</td><td> <details><summary><strong>Generic: Comprehensive Audit Trails</strong></summary><br> **Objective:** To create a detailed and reliable record of critical system actions for security analysis <br>and compliance.<br> **Status:** Passed<br> </details></td></tr> <tr><td> <details><summary><strong>Generic: Secure Error Handling</strong></summary><br> **Objective:** To prevent the leakage of sensitive system information through error messages while <br>providing sufficient detail for internal debugging.<br> **Status:** Passed<br> </details></td></tr> <tr><td> <details><summary><strong>Generic: Secure Logging Practices</strong></summary><br> **Objective:** To ensure logs are useful for debugging and auditing without exposing sensitive <br>information like PII, PHI, or cardholder data.<br> **Status:** Passed<br> </details></td></tr> <tr><td rowspan=3>🔴</td> <td><details> <summary><strong>Generic: Meaningful Naming and Self-Documenting Code</strong></summary><br> **Objective:** Ensure all identifiers clearly express their purpose and intent, making code <br>self-documenting<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/1923/files#diff-4ad1cf4b0397bd277cd9c0553e4992ad7c9e9484d49d49d48af015958dc4bf4bR727-R808'><strong>Generic variable names</strong></a>: Variables like &#x27;result&#x27;, &#x27;hits&#x27;, &#x27;misses&#x27;, <br>&#x27;snapshot&#x27;, &#x27;record&#x27; use generic names that don&#x27;t clearly express <br>their specific purpose in the context of device canonicalization and identity resolution.<br> <details open><summary>Referred Code</summary> ```go ... (clipped 61 lines) ``` </details></details></td></tr> <tr><td><details> <summary><strong>Generic: Robust Error Handling and Edge Case Management</strong></summary><br> **Objective:** Ensure comprehensive error handling that provides meaningful context and graceful <br>degradation<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/1923/files#diff-6d98e853ce17576c088e77956ae4ecfa8078019e0bff107a79d8d1d6ed2443adR727-R808'><strong>Missing error context</strong></a>: Error handling in <code>resolveCanonicalDevice</code> and <code>fetchCanonicalSnapshot</code> lacks context about <br>which device or IP failed, making debugging difficult in production.<br> <details open><summary>Referred Code</summary> ```go func (s *Server) resolveCanonicalDevice(ctx context.Context, ip, fallback string) canonicalSnapshot { ip = strings.TrimSpace(ip) fallback = strings.TrimSpace(fallback) result := canonicalSnapshot{ DeviceID: fallback, } if ip == "" { return result } if s.canonicalCache != nil { if hits, misses := s.canonicalCache.getBatch([]string{ip}); len(misses) == 0 { if snap, ok := hits[ip]; ok { return ensureSnapshotDeviceID(snap, fallback) } } else { if snap, ok := s.fetchCanonicalSnapshot(ctx, ip); ok { return ensureSnapshotDeviceID(snap, fallback) } ... (clipped 61 lines) ``` </details></details></td></tr> <tr><td><details> <summary><strong>Generic: Security-First Input Validation and Data Handling</strong></summary><br> **Objective:** Ensure all data inputs are validated, sanitized, and handled securely to prevent <br>vulnerabilities<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/1923/files#diff-1bb99367fdd853c728b7cfbf5893a293f6d217144dfb5282cb8dd32e5261021eR2043-R2069'><strong>Missing IP validation</strong></a>: The <code>deviceIP</code> parameter extracted from query string is not validated before being used in <br>database queries, potentially allowing injection of malicious input.<br> <details open><summary>Referred Code</summary> ```go var ( deviceIP string collectorCaps collectorCapabilities collectorCapsKnown bool ) queryValues := r.URL.Query() if ipParam := strings.TrimSpace(queryValues.Get("device_ip")); ipParam != "" { deviceIP = ipParam } else { if _, ip := splitDeviceID(deviceID); ip != "" { deviceIP = ip } } if hasCollector, ok := parseBoolQuery(queryValues, "has_collector"); ok { collectorCaps.hasCollector = hasCollector collectorCapsKnown = true } if supportsICMP, ok := parseBoolQuery(queryValues, "supports_icmp"); ok { collectorCaps.supportsICMP = supportsICMP ... (clipped 6 lines) ``` </details></details></td></tr> <tr><td align="center" colspan="2"> <!-- placeholder --> <!-- /compliance --update_compliance=true --> </td></tr></tbody></table> <details><summary>Compliance status legend</summary> 🟢 - Fully Compliant<br> 🟡 - Partial Compliant<br> 🔴 - Not Compliant<br> ⚪ - Requires Further Human Verification<br> 🏷️ - Compliance label<br> </details> ___ #### Previous compliance checks <details> <summary>Compliance check up to commit <a href='https://github.com/carverauto/serviceradar/commit/69686f1c9ec5bfa1b85a62e1668fb032700521d2'>69686f1</a></summary><br> <table><tbody><tr><td colspan='2'><strong>Security Compliance</strong></td></tr> <tr><td rowspan=1>⚪</td> <td><details><summary><strong>Unbounded input size </strong></summary><br> <b>Description:</b> Device metadata from ICMP processing is directly included in JSON without strict <br>validation or size limits, potentially allowing large or unexpected strings (e.g., service <br>names) to inflate storage or logs and cause resource exhaustion.<br> <strong><a href='https://github.com/carverauto/serviceradar/pull/1923/files#diff-6d98e853ce17576c088e77956ae4ecfa8078019e0bff107a79d8d1d6ed2443adR741-R756'>metrics.go [741-756]</a></strong><br> <details open><summary>Referred Code</summary> ```go if err != nil { s.logger.Warn().Err(err).Str("ip", ip).Msg("Failed to fetch canonical device by IP") return canonicalSnapshot{}, false } for _, device := range devices { if device == nil { continue } snapshot := canonicalSnapshot{ DeviceID: strings.TrimSpace(device.DeviceID), IP: strings.TrimSpace(device.IP), } if device.MAC != nil { snapshot.MAC = strings.TrimSpace(device.MAC.Value) } ``` </details></details></td></tr> <tr><td colspan='2'><strong>Ticket Compliance</strong></td></tr> <tr><td>🟡</td> <td> <details> <summary>🎫 <a href=https://github.com/carverauto/serviceradar/issues/1921>#1921</a></summary> <table width='100%'><tbody> <tr><td rowspan=4>🟢</td> <td>ICMP metrics from agents should be retrievable even when in-memory ring buffer has no <br>data.</td></tr> <tr><td>Canonical device resolution should map agent IP/device to unified device identity for <br>consistent metrics and metadata.</td></tr> <tr><td>System should update device registry with ICMP availability and related metadata when ICMP <br>results are processed.</td></tr> <tr><td>Improve robustness/performance around device caching and DB access to help recover missing <br>agents.</td></tr> <tr><td rowspan=1>⚪</td> <td>Agents reporting ICMP ping metrics should appear as devices even if previously deleted or <br>tombstoned.</td></tr> </tbody></table> </details> </td></tr> <tr><td colspan='2'><strong>Codebase Duplication Compliance</strong></td></tr> <tr><td>⚪</td><td><details><summary><strong>Codebase context is not defined </strong></summary> Follow the <a href='https://qodo-merge-docs.qodo.ai/core-abilities/rag_context_enrichment/'>guide</a> to enable codebase context checks. </details></td></tr> <tr><td colspan='2'><strong>Custom Compliance</strong></td></tr> <tr><td rowspan=3>🟢</td><td> <details><summary><strong>Generic: Meaningful Naming and Self-Documenting Code</strong></summary><br> **Objective:** Ensure all identifiers clearly express their purpose and intent, making code <br>self-documenting<br> **Status:** Passed<br> </details></td></tr> <tr><td> <details><summary><strong>Generic: Secure Error Handling</strong></summary><br> **Objective:** To prevent the leakage of sensitive system information through error messages while <br>providing sufficient detail for internal debugging.<br> **Status:** Passed<br> </details></td></tr> <tr><td> <details><summary><strong>Generic: Secure Logging Practices</strong></summary><br> **Objective:** To ensure logs are useful for debugging and auditing without exposing sensitive <br>information like PII, PHI, or cardholder data.<br> **Status:** Passed<br> </details></td></tr> <tr><td rowspan=3>⚪</td> <td><details> <summary><strong>Generic: Comprehensive Audit Trails</strong></summary><br> **Objective:** To create a detailed and reliable record of critical system actions for security analysis <br>and compliance.<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/1923/files#diff-1bb99367fdd853c728b7cfbf5893a293f6d217144dfb5282cb8dd32e5261021eR1964-R1974'><strong>Fallback Query Logging</strong></a>: The new database fallback for ICMP metrics logs only count/device ID and warns on error, <br>but it is unclear if sufficient audit details (user/request context) are captured for this <br>critical action.<br> <details open><summary>Referred Code</summary> ```go if len(timeseriesMetrics) == 0 { dbMetrics, dbErr := s.dbService.GetICMPMetricsForDevice(ctx, deviceID, deviceIP, startTime, endTime) if dbErr != nil { s.logger.Warn(). Err(dbErr). Str("device_id", deviceID). Msg("Failed to fetch ICMP metrics from database fallback") } else { timeseriesMetrics = append(timeseriesMetrics, dbMetrics...) } } ``` </details></details></td></tr> <tr><td><details> <summary><strong>Generic: Robust Error Handling and Edge Case Management</strong></summary><br> **Objective:** Ensure comprehensive error handling that provides meaningful context and graceful <br>degradation<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/1923/files#diff-2cf937853ac86eb9a879bd370cbe22b835ee0c48be6771b48b0276d93cc2ae93R1140-R1189'><strong>DB Error Context</strong></a>: New ICMP metrics query adds helpful context but may miss validation of inputs (e.g., empty <br>deviceID/IP) and detailed logging of filter values on failure paths.<br> <details open><summary>Referred Code</summary> ```go ctx context.Context, deviceID, deviceIP string, start, end time.Time) ([]models.TimeseriesMetric, error) { query := ` SELECT metric_name, metric_type, value, metadata, timestamp, target_device_ip, ifIndex, device_id, partition, poller_id FROM table(timeseries_metrics) WHERE metric_type = 'icmp' AND timestamp BETWEEN $3 AND $4 AND ( device_id = $1 OR json_extract_string(metadata, 'device_id') = $1 OR (target_device_ip = $2 AND $2 != '') OR (json_extract_string(metadata, 'collector_ip') = $2 AND $2 != '') ) ORDER BY timestamp DESC` rows, err := db.Conn.Query(ctx, query, deviceID, deviceIP, start, end) if err != nil { return nil, fmt.Errorf("failed to query ICMP metrics for device %s: %w", deviceID, err) } defer func() { _ = rows.Close() }() ... (clipped 29 lines) ``` </details></details></td></tr> <tr><td><details> <summary><strong>Generic: Security-First Input Validation and Data Handling</strong></summary><br> **Objective:** Ensure all data inputs are validated, sanitized, and handled securely to prevent <br>vulnerabilities<br> **Status:** <br><a href='https://github.com/carverauto/serviceradar/pull/1923/files#diff-2cf937853ac86eb9a879bd370cbe22b835ee0c48be6771b48b0276d93cc2ae93R1140-R1189'><strong>Input Validation</strong></a>: The new GetICMPMetricsForDevice uses parameterized queries but does not validate or <br>normalize deviceID/deviceIP inputs before query execution, which may lead to unexpected <br>query breadth.<br> <details open><summary>Referred Code</summary> ```go ctx context.Context, deviceID, deviceIP string, start, end time.Time) ([]models.TimeseriesMetric, error) { query := ` SELECT metric_name, metric_type, value, metadata, timestamp, target_device_ip, ifIndex, device_id, partition, poller_id FROM table(timeseries_metrics) WHERE metric_type = 'icmp' AND timestamp BETWEEN $3 AND $4 AND ( device_id = $1 OR json_extract_string(metadata, 'device_id') = $1 OR (target_device_ip = $2 AND $2 != '') OR (json_extract_string(metadata, 'collector_ip') = $2 AND $2 != '') ) ORDER BY timestamp DESC` rows, err := db.Conn.Query(ctx, query, deviceID, deviceIP, start, end) if err != nil { return nil, fmt.Errorf("failed to query ICMP metrics for device %s: %w", deviceID, err) } defer func() { _ = rows.Close() }() ... (clipped 29 lines) ``` </details></details></td></tr> <tr><td align="center" colspan="2"> <!-- /compliance --update_compliance=true --> </td></tr></tbody></table> </details>
qodo-code-review[bot] commented 2025-11-03 15:35:28 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1923#issuecomment-3481156264
Original created: 2025-11-03T15:35:28Z

PR Code Suggestions

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Correct device ID resolution logic

Correct the device ID resolution logic for ICMP metrics to use the targetHost IP
instead of the sourceIP.

pkg/core/metrics.go [546-556]

 	targetHost := strings.TrimSpace(pingResult.Host)
 	deviceID := strings.TrimSpace(pingResult.DeviceID)
 
 	if deviceID == "" {
-		deviceID = models.GenerateNetworkDeviceID(partition, sourceIP)
+		deviceID = models.GenerateNetworkDeviceID(partition, targetHost)
 	}
 
-	resolution := s.resolveCanonicalDevice(ctx, sourceIP, deviceID)
+	resolution := s.resolveCanonicalDevice(ctx, targetHost, deviceID)
 	if strings.TrimSpace(resolution.DeviceID) != "" {
 		deviceID = strings.TrimSpace(resolution.DeviceID)
 	}
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: This suggestion identifies a critical bug where ICMP metrics could be associated with the monitoring agent's IP (sourceIP) instead of the target device's IP (targetHost), leading to incorrect data attribution.

High
  • Update
Imported GitHub PR comment. Original author: @qodo-code-review[bot] Original URL: https://github.com/carverauto/serviceradar/pull/1923#issuecomment-3481156264 Original created: 2025-11-03T15:35:28Z --- ## PR Code Suggestions ✨ <!-- 69686f1 --> Explore these optional code suggestions: <table><thead><tr><td><strong>Category</strong></td><td align=left><strong>Suggestion&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </strong></td><td align=center><strong>Impact</strong></td></tr><tbody><tr><td rowspan=1>Possible issue</td> <td> <details><summary>Correct device ID resolution logic</summary> ___ **Correct the device ID resolution logic for ICMP metrics to use the <code>targetHost</code> IP <br>instead of the <code>sourceIP</code>.** [pkg/core/metrics.go [546-556]](https://github.com/carverauto/serviceradar/pull/1923/files#diff-6d98e853ce17576c088e77956ae4ecfa8078019e0bff107a79d8d1d6ed2443adR546-R556) ```diff targetHost := strings.TrimSpace(pingResult.Host) deviceID := strings.TrimSpace(pingResult.DeviceID) if deviceID == "" { - deviceID = models.GenerateNetworkDeviceID(partition, sourceIP) + deviceID = models.GenerateNetworkDeviceID(partition, targetHost) } - resolution := s.resolveCanonicalDevice(ctx, sourceIP, deviceID) + resolution := s.resolveCanonicalDevice(ctx, targetHost, deviceID) if strings.TrimSpace(resolution.DeviceID) != "" { deviceID = strings.TrimSpace(resolution.DeviceID) } ``` - [ ] **Apply / Chat** <!-- /improve --apply_suggestion=0 --> <details><summary>Suggestion importance[1-10]: 9</summary> __ Why: This suggestion identifies a critical bug where ICMP metrics could be associated with the monitoring agent's IP (`sourceIP`) instead of the target device's IP (`targetHost`), leading to incorrect data attribution. </details></details></td><td align=center>High </td></tr> <tr><td align="center" colspan="2"> - [ ] Update <!-- /improve_multi --more_suggestions=true --> </td><td></td></tr></tbody></table>
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
carverauto/serviceradar!2401
No description provided.