InsertEvents closes pgx BatchResults without reading results, discarding insert errors #691
Imported from GitHub.
Original GitHub issue: #2153
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2153
Original created: 2025-12-16T05:19:02Z
Summary
The InsertEvents function in pkg/db/events.go persists CloudEvent rows to the PostgreSQL events table using pgx batch operations. It calls br.Close() on BatchResults without first reading individual batch item results via br.Exec(), which can silently discard errors from INSERT operations.
Code with bug
Evidence
Example
Consider a batch with 3 event insertions where the second one has a constraint violation:
1. INSERT INTO events (...) VALUES (...) - valid
2. INSERT INTO events (...) VALUES (...) - violates constraint (e.g., invalid data type, foreign key violation not caught by ON CONFLICT)
3. INSERT INTO events (...) VALUES (...) - valid
With the current implementation calling only br.Close(), the error from the second INSERT may or may not be surfaced reliably.
With the fixed implementation that calls br.Exec() for each item, the br.Exec() call for the second INSERT returns that item's error, so the caller can detect the failure.
Inconsistency within the codebase
Reference code
pkg/db/cnpg_metrics.go:320-336
pkg/db/cnpg_device_updates_retry.go:168-183
Current code
pkg/db/events.go:83-88
Contradiction
The codebase has an established pattern for handling batch operations: explicitly calling br.Exec() for each batch item before br.Close() to properly detect errors. This pattern was introduced specifically to fix the error detection issue (see git commit 05968ef8). The InsertEvents function uses the older, buggy pattern that was replaced elsewhere in the codebase.
Additionally, pkg/db/auth.go also suffers from the same bug (lines 160-163), indicating this is a systemic issue that was partially fixed but not comprehensively addressed across all batch operations.
Full context
The InsertEvents function is called by the db-event-writer consumer (pkg/consumers/db-event-writer/processor.go:639) to persist CloudEvents from various sources into the PostgreSQL events table. CloudEvents are a standardized format for event data, and this function is part of the observability pipeline that stores events for later querying and analysis.
The events table uses a compound primary key (event_timestamp, id) and includes an ON CONFLICT ... DO UPDATE clause to handle duplicate events. While the ON CONFLICT handling prevents many error conditions, other errors can still occur.
When such errors occur during a batch insert but are not properly detected, the db-event-writer believes the events were successfully persisted and continues processing. This leads to gaps in the event log that are difficult to diagnose because there's no error logged or reported.
The batch operation is used for performance: multiple event rows are sent to PostgreSQL in a single network round trip. The issue is not with using batches, but with the error detection mechanism after the batch is sent.
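To make the detection gap concrete, the following is a minimal toy model of the three-statement example above, using only the standard library. The fakeBatchResults type and the closeOnly and readThenClose helpers are hypothetical names introduced for illustration; they are not pgx types and not the project's code. The fake's Close is assumed to drop unread item errors, which models the worst case described in this issue rather than asserting how pgx behaves.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeBatchResults is a hypothetical stand-in for a driver's batch result
// handle. It is NOT pgx.BatchResults; it only models per-item errors.
type fakeBatchResults struct {
	itemErrs []error // one entry per queued statement; nil means success
	next     int     // index of the next unread item
}

// Exec reads the result of the next queued statement.
func (f *fakeBatchResults) Exec() error {
	if f.next >= len(f.itemErrs) {
		return errors.New("no more results in batch")
	}
	err := f.itemErrs[f.next]
	f.next++
	return err
}

// Close releases the handle. Assumption: errors of unread items are dropped,
// which models the worst case described in the issue.
func (f *fakeBatchResults) Close() error {
	f.next = len(f.itemErrs)
	return nil
}

// closeOnly mirrors the reported pattern: close without reading any result.
func closeOnly(br *fakeBatchResults) error {
	return br.Close()
}

// readThenClose reads all n item results before closing and returns the
// first error it saw.
func readThenClose(br *fakeBatchResults, n int) error {
	var firstErr error
	for i := 0; i < n; i++ {
		if err := br.Exec(); err != nil && firstErr == nil {
			firstErr = fmt.Errorf("batch item %d: %w", i, err)
		}
	}
	if err := br.Close(); err != nil && firstErr == nil {
		firstErr = err
	}
	return firstErr
}

// newExampleBatch builds three results where the second statement failed.
func newExampleBatch() *fakeBatchResults {
	return &fakeBatchResults{
		itemErrs: []error{nil, errors.New("constraint violation"), nil},
	}
}

func main() {
	fmt.Println("close only:", closeOnly(newExampleBatch()))
	fmt.Println("read then close:", readThenClose(newExampleBatch(), 3))
}
```

In this model the batch itself is identical in both calls; only the way results are consumed decides whether the caller ever sees the failed statement.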
External documentation
From the openspec proposal that fixed this issue in cnpg_metrics.go:
From the git commit message (05968ef8):
Why has this bug gone undetected?
This bug has gone undetected for several reasons:
ON CONFLICT handling: The INSERT statement includes ON CONFLICT (id, event_timestamp) DO UPDATE, which handles duplicate key violations gracefully. This prevents the most common type of INSERT error (duplicates) from occurring, masking the error detection problem.
Permissive schema: The events table schema is quite permissive - most columns are TEXT without NOT NULL constraints, and the Level column is an INTEGER that accepts any int32 value. This makes it rare for data type violations or constraint errors to occur.
Successful common path: In normal operation, events are well-formed and inserts succeed. The bug only manifests when there's an actual error during insertion, which is uncommon.
Silent failure: When the bug does occur, it fails silently - no error is logged, no alert is raised, and the application continues normally. Only careful inspection of missing data would reveal the problem.
Recent discovery: The development team only recently discovered this pattern was problematic (December 2025, commit 05968ef8) when debugging why sysmon metrics weren't appearing. They fixed it in cnpg_metrics.go and cnpg_device_updates_retry.go but did not audit other batch operations in the codebase for the same issue.
No comprehensive test coverage: There are no tests that verify error handling for batch operations with constraint violations or other error conditions. The test suite focuses on the happy path where all inserts succeed.
Recommended fix
Apply the same fix that was used in commit 05968ef8 for cnpg_metrics.go: call br.Exec() for each batch item before calling br.Close().
Related bugs
pkg/db/auth.go (lines 160-163) has the same bug in the StoreUsers function. This should also be fixed using the same pattern.
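Because the same pattern is needed in more than one place, it could be factored into a small shared helper. The sketch below is one possible shape, not the code from commit 05968ef8. The batchReader interface, drainBatch, and stubResults are hypothetical names, and the interface is a simplified view of a batch result handle: the real pgx BatchResults.Exec also returns a command tag, which is omitted here so the example runs with the standard library only.

```go
package main

import (
	"errors"
	"fmt"
)

// batchReader is a simplified, hypothetical view of a batch result handle.
// pgx's BatchResults.Exec also returns a command tag; it is omitted here.
type batchReader interface {
	Exec() error
	Close() error
}

// drainBatch reads the result of each of the n queued statements and then
// closes the handle. It stops at the first failing item and reports its
// 1-based position in the batch.
func drainBatch(br batchReader, n int) (err error) {
	defer func() {
		// Always close; keep the earlier item error if there was one.
		if cerr := br.Close(); cerr != nil && err == nil {
			err = fmt.Errorf("close batch: %w", cerr)
		}
	}()
	for i := 0; i < n; i++ {
		if execErr := br.Exec(); execErr != nil {
			return fmt.Errorf("batch item %d of %d: %w", i+1, n, execErr)
		}
	}
	return nil
}

// stubResults is a test double that fails at a chosen item
// (1-based; 0 means it never fails).
type stubResults struct {
	failAt int
	reads  int
	closed bool
}

func (s *stubResults) Exec() error {
	s.reads++
	if s.reads == s.failAt {
		return errors.New("insert failed")
	}
	return nil
}

func (s *stubResults) Close() error {
	s.closed = true
	return nil
}

func main() {
	ok := &stubResults{}
	fmt.Println(drainBatch(ok, 3), ok.closed)

	bad := &stubResults{failAt: 2}
	fmt.Println(drainBatch(bad, 3), bad.closed)
}
```

A call site would pass the number of statements it queued, so the helper reads exactly one result per statement. Returning on the first failure is a design choice in this sketch; collecting every item error would also be reasonable if callers need to know which rows were rejected.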
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2153#issuecomment-3662853121
Original created: 2025-12-16T23:20:39Z
closing as completed