NetBox integration misses paginated devices (only first page fetched) #687

Closed
opened 2026-03-28 04:27:26 +00:00 by mfreeman451 · 0 comments
Owner

Imported from GitHub.

Original GitHub issue: #2149
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2149
Original created: 2025-12-16T05:17:54Z


Summary

  • Context: The NetBox integration fetches device data from NetBox API's /api/dcim/devices/ endpoint to discover devices and perform reconciliation.
  • Bug: The integration does not implement pagination, causing it to only fetch the first page of devices (default: 50 devices).
  • Actual vs. expected: When NetBox has more than 50 devices, only the first 50 are discovered. Expected behavior is to fetch all devices across all pages.
  • Impact: Devices beyond the first page are never discovered, and if they were previously discovered (e.g., when total device count was < 50), they will be incorrectly marked as deleted during reconciliation, causing them to disappear from the inventory.

Code with bug

In pkg/sync/integrations/netbox/netbox.go, the Fetch method:

func (n *NetboxIntegration) Fetch(ctx context.Context) (map[string][]byte, []*models.DeviceUpdate, error) {
	resp, err := n.fetchDevices(ctx)  // <-- BUG 🔴 Only fetches first page
	if err != nil {
		return nil, nil, err
	}
	defer n.closeResponse(resp)

	deviceResp, err := n.decodeResponse(resp)
	if err != nil {
		return nil, nil, err
	}

	// Process current devices from Netbox API
	data, ips, currentEvents := n.processDevices(ctx, deviceResp)  // <-- BUG 🔴 Only processes first page

	n.Logger.Info().
		Int("devices_discovered", len(deviceResp.Results)).
		Int("sweep_results_generated", len(currentEvents)).
		Msg("Completed NetBox discovery operation")

	n.writeSweepConfig(ctx, ips)

	// Return the data for both KV store and sweep agents
	return data, currentEvents, nil
}

The Reconcile method has the same bug:

func (n *NetboxIntegration) Reconcile(ctx context.Context) error {
	// ... querier setup ...

	// Fetch current devices from NetBox to identify retractions
	resp, err := n.fetchDevices(ctx)  // <-- BUG 🔴 Only fetches first page
	if err != nil {
		n.Logger.Error().
			Err(err).
			Msg("Failed to fetch current devices from NetBox during reconciliation")

		return err
	}
	defer n.closeResponse(resp)

	deviceResp, err := n.decodeResponse(resp)
	if err != nil {
		n.Logger.Error().
			Err(err).
			Msg("Failed to decode NetBox response during reconciliation")

		return err
	}

	// Process current devices to get current events
	_, _, currentEvents := n.processDevices(ctx, deviceResp)  // <-- BUG 🔴 Only processes first page

	// Generate retraction events
	retractionEvents := n.generateRetractionEvents(currentEvents, existingRadarDevices)
	// <-- BUG 🔴 Devices on page 2+ will be incorrectly marked as deleted

	// ... rest of reconciliation ...
}

In pkg/sync/integrations/netbox/types.go, the DeviceResponse struct includes pagination fields that are never used:

type DeviceResponse struct {
	Results  []Device `json:"results"`
	Count    int      `json:"count"`
	Next     string   `json:"next"`     // <-- BUG 🔴 Never checked or followed
	Previous string   `json:"previous"` // Pagination URL
}

Evidence

Failing test

Test script

package netbox

import (
	"context"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"

	"github.com/stretchr/testify/require"

	"github.com/carverauto/serviceradar/pkg/logger"
	"github.com/carverauto/serviceradar/pkg/models"
)

// TestNetboxIntegration_Fetch_MissesPaginatedDevices demonstrates that the
// NetBox integration fails to fetch devices beyond the first page when the
// API response includes pagination.
func TestNetboxIntegration_Fetch_MissesPaginatedDevices(t *testing.T) {
	// Track which pages were requested
	requestedPages := make(map[string]bool)

	// Create a test server that simulates NetBox with paginated responses
	var server *httptest.Server
	server = httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/dcim/devices/" {
			http.NotFound(w, r)
			return
		}

		// Track that this page was requested
		pageParam := r.URL.Query().Get("offset")
		if pageParam == "" {
			pageParam = "0" // First page
		}
		requestedPages[pageParam] = true

		// Return different responses based on the page
		var response DeviceResponse

		switch pageParam {
		case "0", "": // First page
			response = DeviceResponse{
				Count: 75, // Total devices across all pages
				Next:  server.URL + "/api/dcim/devices/?offset=50",
				Results: []Device{
					{
						ID:   1,
						Name: "device-page1-1",
						Role: struct {
							ID   int    `json:"id"`
							Name string `json:"name"`
						}{ID: 1, Name: "router"},
						Site: struct {
							ID   int    `json:"id"`
							Name string `json:"name"`
						}{ID: 1, Name: "site1"},
						PrimaryIP4: struct {
							ID      int    `json:"id"`
							Address string `json:"address"`
						}{ID: 1, Address: "10.0.1.1/24"},
					},
					{
						ID:   2,
						Name: "device-page1-2",
						Role: struct {
							ID   int    `json:"id"`
							Name string `json:"name"`
						}{ID: 1, Name: "router"},
						Site: struct {
							ID   int    `json:"id"`
							Name string `json:"name"`
						}{ID: 1, Name: "site1"},
						PrimaryIP4: struct {
							ID      int    `json:"id"`
							Address string `json:"address"`
						}{ID: 2, Address: "10.0.1.2/24"},
					},
				},
			}

		case "50": // Second page
			response = DeviceResponse{
				Count:    75,
				Previous: server.URL + "/api/dcim/devices/",
				Next:     "", // No more pages
				Results: []Device{
					{
						ID:   51,
						Name: "device-page2-1",
						Role: struct {
							ID   int    `json:"id"`
							Name string `json:"name"`
						}{ID: 1, Name: "router"},
						Site: struct {
							ID   int    `json:"id"`
							Name string `json:"name"`
						}{ID: 1, Name: "site1"},
						PrimaryIP4: struct {
							ID      int    `json:"id"`
							Address string `json:"address"`
						}{ID: 51, Address: "10.0.2.1/24"},
					},
				},
			}
		}

		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(response)
	}))
	defer server.Close()

	// Create the NetBox integration
	integration := &NetboxIntegration{
		Config: &models.SourceConfig{
			AgentID:            "test-agent",
			PollerID:           "test-poller",
			Partition:          "test-partition",
			Endpoint:           server.URL,
			Credentials:        map[string]string{"api_token": "test-token"},
			InsecureSkipVerify: true, // Required for test server
		},
		Logger: logger.NewTestLogger(),
	}

	// Fetch devices
	_, events, err := integration.Fetch(context.Background())
	require.NoError(t, err)

	// BUG: The integration only fetches the first page
	// We expect 3 devices (2 from page 1, 1 from page 2)
	t.Logf("Number of events fetched: %d", len(events))
	t.Logf("Pages requested: %v", requestedPages)

	// The devices from page 2 are missing
	deviceIPs := make(map[string]bool)
	for _, event := range events {
		deviceIPs[event.IP] = true
		t.Logf("Fetched device IP: %s", event.IP)
	}

	// Verify that only the first page was requested
	require.True(t, requestedPages["0"], "First page should be requested")

	// BUG: If pagination is not implemented, only page 1 devices are fetched
	if len(events) == 2 && !deviceIPs["10.0.2.1"] {
		t.Logf("BUG CONFIRMED: Only %d devices fetched from page 1. Device from page 2 (10.0.2.1) is missing.", len(events))
		t.Logf("BUG CONFIRMED: Second page was not requested: requestedPages['50'] = %v", requestedPages["50"])

		// This demonstrates the bug - the integration should fetch all 3 devices but only fetches 2
		require.Fail(t, "NetBox integration does not implement pagination. Only first page of devices is fetched.")
	}

	// If we get here, pagination is implemented correctly
	require.True(t, deviceIPs["10.0.1.1"], "Device from page 1 should be present")
	require.True(t, deviceIPs["10.0.1.2"], "Device from page 1 should be present")
	require.True(t, deviceIPs["10.0.2.1"], "Device from page 2 should be present")
	require.True(t, requestedPages["50"], "Second page should be requested")
	require.Len(t, events, 3, "Should fetch all 3 devices across both pages")
}

Test output

=== RUN   TestNetboxIntegration_Fetch_MissesPaginatedDevices
    pagination_bug_test.go:132: Number of events fetched: 2
    pagination_bug_test.go:133: Pages requested: map[0:true]
    pagination_bug_test.go:139: Fetched device IP: 10.0.1.1
    pagination_bug_test.go:139: Fetched device IP: 10.0.1.2
    pagination_bug_test.go:147: BUG CONFIRMED: Only 2 devices fetched from page 1. Device from page 2 (10.0.2.1) is missing.
    pagination_bug_test.go:148: BUG CONFIRMED: Second page was not requested: requestedPages['50'] = false
    pagination_bug_test.go:151:
        	Error Trace:	/home/user/serviceradar/pkg/sync/integrations/netbox/pagination_bug_test.go:151
        	Error:      	NetBox integration does not implement pagination. Only first page of devices is fetched.
        	Test:       	TestNetboxIntegration_Fetch_MissesPaginatedDevices
--- FAIL: TestNetboxIntegration_Fetch_MissesPaginatedDevices (0.00s)
FAIL

Example

Consider a NetBox instance with 75 devices:

  • Devices 1-50 are returned on the first page with "next": "https://netbox.example.com/api/dcim/devices/?offset=50"
  • Devices 51-75 are on the second page
  • The integration calls fetchDevices() once, receives the first page response
  • It processes devices 1-50 and returns
  • It never checks the Next field or fetches the second page
  • Devices 51-75 are never discovered

Worse scenario - reconciliation with previously discovered devices:

  1. Initially, NetBox has 40 devices, all discovered successfully
  2. Over time, 35 more devices are added, bringing total to 75
  3. During the next sync, only devices 1-50 are fetched
  4. Reconciliation compares the fetched 50 devices against the database which has all 75
  5. Devices 51-75 appear to be "missing" from NetBox
  6. Retraction events are generated for devices 51-75 with "_deleted": "true"
  7. These 25 devices disappear from the inventory even though they still exist in NetBox!

Full context

The NetBox integration is part of ServiceRadar's device discovery system. It pulls device information from NetBox (an IP address management and data center infrastructure management tool) to populate ServiceRadar's device inventory.

The Fetch method is called periodically by the sync service to discover new devices from NetBox. It:

  1. Calls the NetBox API /api/dcim/devices/ endpoint
  2. Processes the response to create device discovery events
  3. Writes sweep configuration for discovered IPs
  4. Returns the events to be processed by ServiceRadar's registry

The Reconcile method is called after sweeps complete to identify devices that no longer exist in NetBox:

  1. Queries ServiceRadar for existing devices discovered from NetBox
  2. Fetches current devices from NetBox
  3. Compares the two lists
  4. Generates retraction events (marked with "_deleted": "true") for devices in ServiceRadar but not in the current NetBox fetch

The bug affects both discovery and reconciliation:

  • Discovery: Only the first 50 devices are discovered, limiting the system's view of the network
  • Reconciliation: Devices on page 2+ are incorrectly identified as deleted and removed from inventory

The integration is used by the sync service (pkg/sync/sync.go) which orchestrates multiple integration sources. NetBox is a common source of truth for network infrastructure, so this bug affects any organization using ServiceRadar with a NetBox instance containing more than 50 devices.

External documentation

Pagination

API responses containing multiple objects will be paginated. The default is 50 objects per page,
set by the PAGINATE_COUNT configuration parameter.

The response will include a count of the total number of objects matched by the query. It will
also include links to the previous and next pages (if any).

{
    "count": 2861,
    "next": "http://netbox/api/dcim/devices/?limit=50&offset=50",
    "previous": null,
    "results": [...]
}

You can specify the number of objects per page using the limit parameter (up to MAX_PAGE_SIZE,
which is 1000 by default). You can also specify a starting point using the offset parameter.

Why has this bug gone undetected?

This bug has likely gone undetected for several reasons:

  1. Small test deployments: Development and test environments likely have fewer than 50 devices in NetBox, so pagination is never triggered during testing.

  2. No validation of total count: The code logs "devices_discovered" but never compares it against the Count field from the API response, which would reveal that only a subset was fetched.

  3. Gradual device growth: In production environments, NetBox instances may start with < 50 devices and grow over time. The first 50 devices work correctly, masking the problem until the threshold is crossed.

  4. Reconciliation timing: If reconciliation hasn't run since crossing the 50-device threshold, the incorrect retractions haven't occurred yet. Once they do, they might be attributed to other causes like network issues or configuration changes.

  5. Silent failure mode: The integration doesn't fail or error when pagination is needed - it simply processes fewer devices than expected, which might be mistaken for legitimate filtering or missing data in NetBox.

Recommended fix

Implement pagination by checking the Next field in the DeviceResponse and fetching subsequent pages until Next is empty:

func (n *NetboxIntegration) fetchAllDevices(ctx context.Context) ([]Device, error) {
	var allDevices []Device
	url := n.Config.Endpoint + "/api/dcim/devices/"

	for url != "" {  // <-- FIX 🟢 Loop until no more pages
		resp, err := n.fetchDevicesFromURL(ctx, url)
		if err != nil {
			return nil, err
		}

		deviceResp, err := n.decodeResponse(resp)
		n.closeResponse(resp)
		if err != nil {
			return nil, err
		}

		allDevices = append(allDevices, deviceResp.Results...)

		url = deviceResp.Next  // <-- FIX 🟢 Follow Next URL for pagination
	}

	return allDevices, nil
}

// Helper method to fetch from a specific URL
func (n *NetboxIntegration) fetchDevicesFromURL(ctx context.Context, url string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody)
	if err != nil {
		return nil, err
	}

	req.Header.Set("Authorization", "Token "+n.Config.Credentials["api_token"])
	req.Header.Set("Accept", "application/json")

	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				InsecureSkipVerify: n.Config.InsecureSkipVerify,
			},
		},
	}

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}

	if resp.StatusCode != http.StatusOK {
		resp.Body.Close()
		return nil, fmt.Errorf("%w: %d", errUnexpectedStatusCode, resp.StatusCode)
	}

	return resp, nil
}

Then update Fetch and Reconcile to use fetchAllDevices() instead of fetchDevices().

Imported from GitHub. Original GitHub issue: #2149 Original author: @mfreeman451 Original URL: https://github.com/carverauto/serviceradar/issues/2149 Original created: 2025-12-16T05:17:54Z --- # Summary - **Context**: The NetBox integration fetches device data from NetBox API's `/api/dcim/devices/` endpoint to discover devices and perform reconciliation. - **Bug**: The integration does not implement pagination, causing it to only fetch the first page of devices (default: 50 devices). - **Actual vs. expected**: When NetBox has more than 50 devices, only the first 50 are discovered. Expected behavior is to fetch all devices across all pages. - **Impact**: Devices beyond the first page are never discovered, and if they were previously discovered (e.g., when total device count was < 50), they will be incorrectly marked as deleted during reconciliation, causing them to disappear from the inventory. # Code with bug In `pkg/sync/integrations/netbox/netbox.go`, the `Fetch` method: ```go func (n *NetboxIntegration) Fetch(ctx context.Context) (map[string][]byte, []*models.DeviceUpdate, error) { resp, err := n.fetchDevices(ctx) // <-- BUG 🔴 Only fetches first page if err != nil { return nil, nil, err } defer n.closeResponse(resp) deviceResp, err := n.decodeResponse(resp) if err != nil { return nil, nil, err } // Process current devices from Netbox API data, ips, currentEvents := n.processDevices(ctx, deviceResp) // <-- BUG 🔴 Only processes first page n.Logger.Info(). Int("devices_discovered", len(deviceResp.Results)). Int("sweep_results_generated", len(currentEvents)). Msg("Completed NetBox discovery operation") n.writeSweepConfig(ctx, ips) // Return the data for both KV store and sweep agents return data, currentEvents, nil } ``` The `Reconcile` method has the same bug: ```go func (n *NetboxIntegration) Reconcile(ctx context.Context) error { // ... querier setup ... // Fetch current devices from NetBox to identify retractions resp, err := n.fetchDevices(ctx) // <-- BUG 🔴 Only fetches first page if err != nil { n.Logger.Error(). Err(err). Msg("Failed to fetch current devices from NetBox during reconciliation") return err } defer n.closeResponse(resp) deviceResp, err := n.decodeResponse(resp) if err != nil { n.Logger.Error(). Err(err). Msg("Failed to decode NetBox response during reconciliation") return err } // Process current devices to get current events _, _, currentEvents := n.processDevices(ctx, deviceResp) // <-- BUG 🔴 Only processes first page // Generate retraction events retractionEvents := n.generateRetractionEvents(currentEvents, existingRadarDevices) // <-- BUG 🔴 Devices on page 2+ will be incorrectly marked as deleted // ... rest of reconciliation ... } ``` In `pkg/sync/integrations/netbox/types.go`, the `DeviceResponse` struct includes pagination fields that are never used: ```go type DeviceResponse struct { Results []Device `json:"results"` Count int `json:"count"` Next string `json:"next"` // <-- BUG 🔴 Never checked or followed Previous string `json:"previous"` // Pagination URL } ``` # Evidence ## Failing test ### Test script ```go package netbox import ( "context" "encoding/json" "net/http" "net/http/httptest" "testing" "github.com/stretchr/testify/require" "github.com/carverauto/serviceradar/pkg/logger" "github.com/carverauto/serviceradar/pkg/models" ) // TestNetboxIntegration_Fetch_MissesPaginatedDevices demonstrates that the // NetBox integration fails to fetch devices beyond the first page when the // API response includes pagination. func TestNetboxIntegration_Fetch_MissesPaginatedDevices(t *testing.T) { // Track which pages were requested requestedPages := make(map[string]bool) // Create a test server that simulates NetBox with paginated responses var server *httptest.Server server = httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { if r.URL.Path != "/api/dcim/devices/" { http.NotFound(w, r) return } // Track that this page was requested pageParam := r.URL.Query().Get("offset") if pageParam == "" { pageParam = "0" // First page } requestedPages[pageParam] = true // Return different responses based on the page var response DeviceResponse switch pageParam { case "0", "": // First page response = DeviceResponse{ Count: 75, // Total devices across all pages Next: server.URL + "/api/dcim/devices/?offset=50", Results: []Device{ { ID: 1, Name: "device-page1-1", Role: struct { ID int `json:"id"` Name string `json:"name"` }{ID: 1, Name: "router"}, Site: struct { ID int `json:"id"` Name string `json:"name"` }{ID: 1, Name: "site1"}, PrimaryIP4: struct { ID int `json:"id"` Address string `json:"address"` }{ID: 1, Address: "10.0.1.1/24"}, }, { ID: 2, Name: "device-page1-2", Role: struct { ID int `json:"id"` Name string `json:"name"` }{ID: 1, Name: "router"}, Site: struct { ID int `json:"id"` Name string `json:"name"` }{ID: 1, Name: "site1"}, PrimaryIP4: struct { ID int `json:"id"` Address string `json:"address"` }{ID: 2, Address: "10.0.1.2/24"}, }, }, } case "50": // Second page response = DeviceResponse{ Count: 75, Previous: server.URL + "/api/dcim/devices/", Next: "", // No more pages Results: []Device{ { ID: 51, Name: "device-page2-1", Role: struct { ID int `json:"id"` Name string `json:"name"` }{ID: 1, Name: "router"}, Site: struct { ID int `json:"id"` Name string `json:"name"` }{ID: 1, Name: "site1"}, PrimaryIP4: struct { ID int `json:"id"` Address string `json:"address"` }{ID: 51, Address: "10.0.2.1/24"}, }, }, } } w.Header().Set("Content-Type", "application/json") json.NewEncoder(w).Encode(response) })) defer server.Close() // Create the NetBox integration integration := &NetboxIntegration{ Config: &models.SourceConfig{ AgentID: "test-agent", PollerID: "test-poller", Partition: "test-partition", Endpoint: server.URL, Credentials: map[string]string{"api_token": "test-token"}, InsecureSkipVerify: true, // Required for test server }, Logger: logger.NewTestLogger(), } // Fetch devices _, events, err := integration.Fetch(context.Background()) require.NoError(t, err) // BUG: The integration only fetches the first page // We expect 3 devices (2 from page 1, 1 from page 2) t.Logf("Number of events fetched: %d", len(events)) t.Logf("Pages requested: %v", requestedPages) // The devices from page 2 are missing deviceIPs := make(map[string]bool) for _, event := range events { deviceIPs[event.IP] = true t.Logf("Fetched device IP: %s", event.IP) } // Verify that only the first page was requested require.True(t, requestedPages["0"], "First page should be requested") // BUG: If pagination is not implemented, only page 1 devices are fetched if len(events) == 2 && !deviceIPs["10.0.2.1"] { t.Logf("BUG CONFIRMED: Only %d devices fetched from page 1. Device from page 2 (10.0.2.1) is missing.", len(events)) t.Logf("BUG CONFIRMED: Second page was not requested: requestedPages['50'] = %v", requestedPages["50"]) // This demonstrates the bug - the integration should fetch all 3 devices but only fetches 2 require.Fail(t, "NetBox integration does not implement pagination. Only first page of devices is fetched.") } // If we get here, pagination is implemented correctly require.True(t, deviceIPs["10.0.1.1"], "Device from page 1 should be present") require.True(t, deviceIPs["10.0.1.2"], "Device from page 1 should be present") require.True(t, deviceIPs["10.0.2.1"], "Device from page 2 should be present") require.True(t, requestedPages["50"], "Second page should be requested") require.Len(t, events, 3, "Should fetch all 3 devices across both pages") } ``` ### Test output ``` === RUN TestNetboxIntegration_Fetch_MissesPaginatedDevices pagination_bug_test.go:132: Number of events fetched: 2 pagination_bug_test.go:133: Pages requested: map[0:true] pagination_bug_test.go:139: Fetched device IP: 10.0.1.1 pagination_bug_test.go:139: Fetched device IP: 10.0.1.2 pagination_bug_test.go:147: BUG CONFIRMED: Only 2 devices fetched from page 1. Device from page 2 (10.0.2.1) is missing. pagination_bug_test.go:148: BUG CONFIRMED: Second page was not requested: requestedPages['50'] = false pagination_bug_test.go:151: Error Trace: /home/user/serviceradar/pkg/sync/integrations/netbox/pagination_bug_test.go:151 Error: NetBox integration does not implement pagination. Only first page of devices is fetched. Test: TestNetboxIntegration_Fetch_MissesPaginatedDevices --- FAIL: TestNetboxIntegration_Fetch_MissesPaginatedDevices (0.00s) FAIL ``` ## Example Consider a NetBox instance with 75 devices: - Devices 1-50 are returned on the first page with `"next": "https://netbox.example.com/api/dcim/devices/?offset=50"` - Devices 51-75 are on the second page - The integration calls `fetchDevices()` once, receives the first page response - It processes devices 1-50 and returns - It never checks the `Next` field or fetches the second page - Devices 51-75 are never discovered Worse scenario - reconciliation with previously discovered devices: 1. Initially, NetBox has 40 devices, all discovered successfully 2. Over time, 35 more devices are added, bringing total to 75 3. During the next sync, only devices 1-50 are fetched 4. Reconciliation compares the fetched 50 devices against the database which has all 75 5. Devices 51-75 appear to be "missing" from NetBox 6. Retraction events are generated for devices 51-75 with `"_deleted": "true"` 7. These 25 devices disappear from the inventory even though they still exist in NetBox! # Full context The NetBox integration is part of ServiceRadar's device discovery system. It pulls device information from NetBox (an IP address management and data center infrastructure management tool) to populate ServiceRadar's device inventory. The `Fetch` method is called periodically by the sync service to discover new devices from NetBox. It: 1. Calls the NetBox API `/api/dcim/devices/` endpoint 2. Processes the response to create device discovery events 3. Writes sweep configuration for discovered IPs 4. Returns the events to be processed by ServiceRadar's registry The `Reconcile` method is called after sweeps complete to identify devices that no longer exist in NetBox: 1. Queries ServiceRadar for existing devices discovered from NetBox 2. Fetches current devices from NetBox 3. Compares the two lists 4. Generates retraction events (marked with `"_deleted": "true"`) for devices in ServiceRadar but not in the current NetBox fetch The bug affects both discovery and reconciliation: - **Discovery**: Only the first 50 devices are discovered, limiting the system's view of the network - **Reconciliation**: Devices on page 2+ are incorrectly identified as deleted and removed from inventory The integration is used by the sync service (`pkg/sync/sync.go`) which orchestrates multiple integration sources. NetBox is a common source of truth for network infrastructure, so this bug affects any organization using ServiceRadar with a NetBox instance containing more than 50 devices. ## External documentation - [NetBox REST API documentation](https://netboxlabs.com/docs/netbox/en/stable/integrations/rest-api/) ``` Pagination API responses containing multiple objects will be paginated. The default is 50 objects per page, set by the PAGINATE_COUNT configuration parameter. The response will include a count of the total number of objects matched by the query. It will also include links to the previous and next pages (if any). { "count": 2861, "next": "http://netbox/api/dcim/devices/?limit=50&offset=50", "previous": null, "results": [...] } You can specify the number of objects per page using the limit parameter (up to MAX_PAGE_SIZE, which is 1000 by default). You can also specify a starting point using the offset parameter. ``` # Why has this bug gone undetected? This bug has likely gone undetected for several reasons: 1. **Small test deployments**: Development and test environments likely have fewer than 50 devices in NetBox, so pagination is never triggered during testing. 2. **No validation of total count**: The code logs "devices_discovered" but never compares it against the `Count` field from the API response, which would reveal that only a subset was fetched. 3. **Gradual device growth**: In production environments, NetBox instances may start with < 50 devices and grow over time. The first 50 devices work correctly, masking the problem until the threshold is crossed. 4. **Reconciliation timing**: If reconciliation hasn't run since crossing the 50-device threshold, the incorrect retractions haven't occurred yet. Once they do, they might be attributed to other causes like network issues or configuration changes. 5. **Silent failure mode**: The integration doesn't fail or error when pagination is needed - it simply processes fewer devices than expected, which might be mistaken for legitimate filtering or missing data in NetBox. # Recommended fix Implement pagination by checking the `Next` field in the `DeviceResponse` and fetching subsequent pages until `Next` is empty: ```go func (n *NetboxIntegration) fetchAllDevices(ctx context.Context) ([]Device, error) { var allDevices []Device url := n.Config.Endpoint + "/api/dcim/devices/" for url != "" { // <-- FIX 🟢 Loop until no more pages resp, err := n.fetchDevicesFromURL(ctx, url) if err != nil { return nil, err } deviceResp, err := n.decodeResponse(resp) n.closeResponse(resp) if err != nil { return nil, err } allDevices = append(allDevices, deviceResp.Results...) url = deviceResp.Next // <-- FIX 🟢 Follow Next URL for pagination } return allDevices, nil } // Helper method to fetch from a specific URL func (n *NetboxIntegration) fetchDevicesFromURL(ctx context.Context, url string) (*http.Response, error) { req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody) if err != nil { return nil, err } req.Header.Set("Authorization", "Token "+n.Config.Credentials["api_token"]) req.Header.Set("Accept", "application/json") client := &http.Client{ Transport: &http.Transport{ TLSClientConfig: &tls.Config{ InsecureSkipVerify: n.Config.InsecureSkipVerify, }, }, } resp, err := client.Do(req) if err != nil { return nil, err } if resp.StatusCode != http.StatusOK { resp.Body.Close() return nil, fmt.Errorf("%w: %d", errUnexpectedStatusCode, resp.StatusCode) } return resp, nil } ``` Then update `Fetch` and `Reconcile` to use `fetchAllDevices()` instead of `fetchDevices()`.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
carverauto/serviceradar#687
No description provided.