Chore/fixing helm #2443
Imported from GitHub pull request.
Original GitHub pull request: #1975
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/1975
Original created: 2025-11-22T07:32:11Z
Original updated: 2025-11-22T22:51:57Z
Original head: carverauto/serviceradar:chore/fixing_helm
Original base: main
Original merged: 2025-11-22T22:49:56Z by @mfreeman451
User description
IMPORTANT: Please sign the Developer Certificate of Origin
Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:
Describe your changes
Issue ticket number and link
Code checklist before requesting a review
PR Type
Enhancement, Bug fix, Documentation
Description
Major migration from Proton to CNPG/Timescale database: Removed all Proton-related services, configurations, and references throughout the codebase, replacing them with CNPG/Timescale terminology and configurations
SPIFFE/SPIRE security enhancements: Added SPIFFE support to KV client TLS configuration with retry logic, improved SPIRE agent and server configurations with projected service account tokens (PSAT), and enhanced credential loading resilience
Helm chart improvements: Refactored helpers for SPIFFE and RBAC, added secret auto-generation job, added CNPG application database bootstrap job, updated core and db-event-writer deployments with proper credential management
Database schema updates: Refactored device metrics continuous aggregates (CAGGs) for Timescale 2.24 compatibility, fixed event upsert conflict resolution with composite key
Configuration standardization: Updated default database name from telemetry to serviceradar, added environment variable support for CNPG password injection, updated Docker Compose and Docker images
Documentation updates: Completed CNCF General Technical Review documentation, updated architecture decision records (ADRs) and product requirement documents (PRDs) to reference CNPG, removed Proton-specific TLS and setup documentation
Code quality improvements: Enhanced error handling in zen engine with matches! macro, improved build script error formatting, added packaging rules validation test
Diagram Walkthrough
File Walkthrough
5 files
publisher_test.go
Rename ProtonPublisher to RegistryPublisher in tests
pkg/mapper/publisher_test.go
TestNewProtonPublisher to TestNewRegistryPublisher
NewProtonPublisher to NewRegistryPublisher
*ProtonPublisher to *RegistryPublisher
protonPublisher to registryPublisher throughout test cases
publisher.go
Rename ProtonPublisher to RegistryPublisher struct
pkg/mapper/publisher.go
ProtonPublisher to RegistryPublisher
NewProtonPublisher to NewRegistryPublisher
*ProtonPublisher to *RegistryPublisher
hydrate.go
Replace Proton references with database terminology
pkg/registry/hydrate.go
functions
protonCount to dbCount
server.go
Update API server comments to use legacy terminology
pkg/core/api/server.go
WithDeviceRegistryEnforcement function
device_registry.go
Update device registry log messages
pkg/core/api/device_registry.go
"Proton"
14 files
main.go
Add environment variable support for CNPG password
cmd/consumers/db-event-writer/main.go
os package import for environment variable access
CNPG_PASSWORD from environment if password is empty in config
lib.rs
Add SPIFFE support to KV client TLS configuration
rust/kvutil/src/lib.rs
load_spiffe_tls function with retry logic and error handling
SpiffeSourceGuard struct for managing SPIFFE credentials
spiffe.rs
Improve SPIFFE credential loading resilience
cmd/consumers/zen/src/spiffe.rs
backoff
message_processor.rs
Improve zen engine error handling
cmd/consumers/zen/src/message_processor.rs
matches! macro for cleaner pattern matching
_helpers.tpl
Refactor Helm helpers for SPIFFE and RBAC
helm/serviceradar/templates/_helpers.tpl
serviceradar.kvServerSPIFFEID helper template
serviceradar.coreServerSPIFFEID helper template
serviceradar.kvEnv template and refactored to use new SPIFFE ID helper
secret-generator-job.yaml
Add Helm secret auto-generation job for deployment
helm/serviceradar/templates/secret-generator-job.yaml
jwt-secret, api-key, admin-password, admin-bcrypt-hash, edge-onboarding-key)
Kubernetes API calls
Role, RoleBinding)
variable
cnpg-app-bootstrap-job.yaml
Add CNPG application database bootstrap job
helm/serviceradar/templates/cnpg-app-bootstrap-job.yaml
application database and user
serviceradar database and application role with proper privileges
serviceradar-db-credentials secret
core.yaml
Update core deployment for CNPG and credential management
helm/serviceradar/templates/core.yaml
serviceradar-core ServiceAccount creation
defaults
serviceradar-db-credentials secret instead of cnpg-superuser
EDGE_ONBOARDING_ENCRYPTION_KEY from secrets
00000000000003_device_metrics_summary_cagg.up.sql
Refactor device metrics CAGGs for Timescale compatibility
pkg/db/cnpg/migrations/00000000000003_device_metrics_summary_cagg.up.sql
single-hypertable CAGGs (device_metrics_summary_cpu, device_metrics_summary_disk, device_metrics_summary_memory)
backward compatibility
metric type
db-event-writer.yaml
Update db-event-writer for CNPG and SPIFFE configuration
helm/serviceradar/templates/db-event-writer.yaml
values
serviceradar-db-credentials secret
variables
SPIFFE_ENDPOINT_SOCKET for proper SPIFFE workload API initialization
ENABLE_DB_MIGRATIONS flag
spire-server.yaml
Update SPIRE server for PSAT and multi-namespace support
helm/serviceradar/templates/spire-server.yaml
tokenAudience configuration for SPIRE token validation
k8s_sat to k8s_psat (projected service account tokens)
additionalAgentNamespaces
00000000000005_device_metrics_summary_cagg_fix.up.sql
Add CAGG rebuild migration for Timescale compatibility
pkg/db/cnpg/migrations/00000000000005_device_metrics_summary_cagg_fix.up.sql
single-hypertable design
constraints
compatibility
spire-agent.yaml
Update SPIRE agent for PSAT and token projection
helm/serviceradar/templates/spire-agent.yaml
k8s_sat to k8s_psat (projected service account tokens)
tokenAudience configuration for token validation
status.podIP to spec.nodeName
automountServiceAccountToken: true and dnsPolicy: ClusterFirstWithHostNet
00000000000005_device_metrics_summary_cagg_fix.up.sql
Add device metrics CAGG rebuild migration
pkg/db/cnpg/migrations/00000000000005_device_metrics_summary_cagg_fix.up.sql
compatibility
constraints
13 files
main.go
Update default CNPG database name
cmd/tools/cnpg-migrate/main.go
tools-profile.sh
Remove Proton references from Docker Compose
docker/compose/tools-profile.sh
generate-certs.sh
Simplify certificate generation for non-Proton setup
docker/compose/generate-certs.sh
logic
entrypoint-certs.sh
Remove Proton certificate generation
docker/compose/entrypoint-certs.sh
entrypoint-srql.sh
Update SRQL entrypoint database name
docker/compose/entrypoint-srql.sh
BUILD.bazel
Update flowgger base image to Debian
docker/images/BUILD.bazel
ubuntu_noble_linux_amd64 to debian_testing_slim_linux_amd64
db-event-writer.docker.json
Update db-event-writer database configuration
docker/compose/db-event-writer.docker.json
serviceradar-config.yaml
Helm configuration templating and OTEL service integration
helm/serviceradar/files/serviceradar-config.yaml
$otelSA template variable for OTEL service account configuration
trustDomain, spireNamespace, and CNPG configuration
telemetry to serviceradar and added CA file path templating
edge_onboarding.encryption_key templating and updated security configurations to use template variables
spacing
otel.toml, trapd.json, sync.json, faker.json, rperf.json, and updated datasvc.json with NATS security and OTEL service account RBAC
core-k8s-init.sh to support edge onboarding encryption key and CNPG CA file configuration
docker-compose.yml
Remove Proton database service from Docker Compose
docker-compose.yml
containers (proton, credentials-permissions-fixer)
core service
telemetry to serviceradar and credential permissions fixer
docker-compose.dev.yml
Remove Proton service from Docker Compose
docker-compose.dev.yml
environment variables, and health checks
values.yaml
Update Helm values for CNPG and SPIRE configuration
helm/serviceradar/values.yaml
proton image tag from configuration
flowgger image tag to 1.0.56 (fixed OpenSSL version)
cnpg configuration section with host, port, database, credentials, and TLS settings
agent resource limits and requests configuration
dbEventWriter configuration with config source and KV bootstrap options
tokenAudience and additionalAgentNamespaces to SPIRE configuration
otelServiceAccount to SPIRE service accounts
proton resource and storage configuration section
secrets.autoGenerate and secrets.edgeOnboardingKey configuration
Makefile.docker
Remove Proton targets from Docker Makefile
Makefile.docker
logs-proton and db-shell make targets
up target to start only core service instead of core + proton
db-query and docker-build-proton targets
db-event-writer.json
Update db-event-writer config for serviceradar database
packaging/event-writer/config/db-event-writer.json
telemetry to serviceradar
postgres to serviceradar
31 files
diagnostics.go
Update diagnostics documentation terminology
pkg/registry/diagnostics.go
SampleMissingDeviceIDs documentation
discovery.go
Update discovery model comment terminology
pkg/models/discovery.go
LocalIfIndex field from "Proton driver" to "Postgres driver"
time_utils.go
Update timestamp validation comment
pkg/core/time_utils.go
"Proton"
sanitize.go
Update metadata size limit comment
pkg/deviceupdate/sanitize.go
limits"
stats.go
Update stats model documentation
pkg/models/stats.go
device stats computation
device_transform.go
Update device transform documentation
pkg/registry/device_transform.go
documentation
device.go
Update device registry comment
pkg/registry/device.go
device hydration
traceTimestamp.test.ts
Update web test descriptions terminology
web/src/utils/traceTimestamp.test.ts
DateTime64"
formatted"
streaming-client.ts
Update streaming client log message
web/src/lib/streaming-client.ts
batch completion"
traceTimestamp.ts
Update timestamp normalization documentation
web/src/utils/traceTimestamp.ts
instead of "Proton"
fix-cert-permissions.sh
Update certificate permissions comment
docker/compose/fix-cert-permissions.sh
CNCF_GTR.md
Complete CNCF General Technical Review documentation
docs/LF/CNCF_GTR.md
information
compliance details
ADR-02.md
Update ADR-02 to reference CNPG instead of Proton
sr-architecture-and-design/adr/ADR-02.md
engine
06-snmp-discovery.md
Update SNMP discovery PRD to reference CNPG
sr-architecture-and-design/prd/06-snmp-discovery.md
tls-security.md
Remove Proton TLS configuration documentation
docs/docs/tls-security.md
components
CHANGELOG
Add ServiceRadar v1.0.56 release notes
CHANGELOG
flowgger OpenSSL fix and agent/poller stability improvements
creation
DOCKER_QUICKSTART.md
Update Docker quickstart documentation for Proton removal
DOCKER_QUICKSTART.md
list
NATS, web, API)
CNCF_DAY0.md
Replace Proton database references with CNPG/Timescale
docs/CNCF/CNCF_DAY0.md
Timeplus Proton with CNPG/Timescale throughout the documentation
the new database technology
proton.resources.memory Helm parameter)
terminology
docker-setup.md
Update Docker setup documentation for CNPG migration
docs/docs/docker-setup.md
tooling
psql instead of Proton client
references
project.md
Update project context documentation for CNPG
openspec/project.md
Timeplus Proton references with CNPG/Timescale throughout project context
of Proton streams
docker-setup.md
Remove Proton references from Docker setup
docs/docs/docker-setup.md
04-device-mgmt.md
Update device management PRD for CNPG database
sr-architecture-and-design/prd/04-device-mgmt.md
Proton database references with CNPG
of Proton streams
use CNPG terminology
agents.md
Update agents documentation for CNPG database naming
docs/docs/agents.md
serviceradar database instead of telemetry
serviceradar database
CNCF_security_self_assessment.md
Update security assessment for CNPG migration
docs/CNCF/CNCF_security_self_assessment.md
serviceradar-proton from actors list
db-event-writer description to reference CNPG database instead of Proton
serviceradar-rperf-checker to reference CNPG database
serviceradar-tools description to reference CNPG instead of Proton
07-cdp-discovery.md
Update CDP discovery PRD for CNPG database
sr-architecture-and-design/prd/07-cdp-discovery.md
Proton stream references with CNPG stream references
Timeplus Proton
09-bgp-discovery.md
Update BGP discovery PRD for CNPG database
sr-architecture-and-design/prd/09-bgp-discovery.md
Proton stream references with CNPG stream references
Timeplus Proton
08-lldp-discovery.md
Update LLDP discovery PRD for CNPG database
sr-architecture-and-design/prd/08-lldp-discovery.md
Proton stream references with CNPG stream references
Timeplus Proton
README.md
Update Docker README for CNPG migration
docker/README.md
tasks.md
Add Helm demo chart update tasks documentation
openspec/changes/update-helm-demo-chart/tasks.md
generation
SPIFFE
bootstrap
agents.md
Update agents documentation for CNPG database
docs/docs/agents.md
telemetry to serviceradar
AGENTS.md
Update AGENTS.md for CNPG migration
AGENTS.md
Proton/Timeplus with CNPG/Timescale in project overview
Proton
1 files
events.go
Fix event upsert conflict resolution
pkg/db/events.go
ON CONFLICT clause to include both id and event_timestamp columns
1 files
tests.rs
Add packaging rules validation test
cmd/consumers/zen/src/tests.rs
packaging_rules_parse to validate all packaging rules JSON files
1 files
build.rs
Improve build script error handling
cmd/checkers/rperf-client/build.rs
RUNFILES_DIR environment variable check
1 files
MODULE.bazel
Update Bazel module dependencies for CNPG migration
MODULE.bazel
debian_testing_slim OCI image pull for Debian testing base
timeplus_proton OCI image pull and related configuration
http_archive attributes for cmake_linux_amd64_prebuilt for consistency
101 files
Imported GitHub PR comment.
Original author: @gitguardian[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1975#issuecomment-3565938987
Original created: 2025-11-22T07:32:17Z
✅ There are no secrets present in this pull request anymore.
If these secrets were true positive and are still valid, we highly recommend you to revoke them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1975#issuecomment-3565943498
Original created: 2025-11-22T07:34:41Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
Private key overexposure
Description: Script sets 644 permissions on all certificate and private key files (including
"*-key.pem"), making private keys world-readable within the container filesystem, which
can expose sensitive keys to non-privileged processes/users.
fix-cert-permissions.sh [7-16]
Unbounded retry loop
Description: SPIFFE TLS retry/backoff loop in load_spiffe_tls retries indefinitely on
transport/availability errors without a max retry count or overall timeout, which could
cause unbounded hangs or resource retention when the Workload API is permanently
unavailable.
lib.rs [246-265]
Unbounded retry loop
Description: The SPIFFE credential loading loop retries forever with fixed 2s delay and no global
timeout or cap, risking indefinite startup hang and denial of service if the Workload API
never becomes available.
spiffe.rs [26-88]
Env secret handling
Description: CNPG password is sourced from the environment and stored back into the in-memory config
without masking; while common, this increases risk of accidental exposure via logs or
memory dumps—ensure no logging of full config occurs and prefer file/secret mounts.
main.go [67-73]
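The flagged lines of main.go are not reproduced on this page. As a minimal sketch of the pattern the check describes, assuming a hypothetical CNPGConfig struct and a hypothetical host name, the environment fallback can be paired with a redacting String method so that printing the config never exposes the password. Only the CNPG_PASSWORD variable name and the serviceradar database name come from the PR description.

```go
package main

import (
	"fmt"
	"os"
)

// CNPGConfig is a hypothetical stand-in for the writer's database settings.
type CNPGConfig struct {
	Host     string
	Database string
	Password string
}

// applyPasswordFromEnv fills Password from CNPG_PASSWORD only when the config
// left it empty, so an explicit config value always wins. The lookup function
// is injected (os.LookupEnv in production) to keep the logic testable.
func applyPasswordFromEnv(cfg *CNPGConfig, lookup func(string) (string, bool)) {
	if cfg.Password != "" {
		return
	}
	if v, ok := lookup("CNPG_PASSWORD"); ok {
		cfg.Password = v
	}
}

// String redacts the password so that logging the config cannot leak it.
func (c CNPGConfig) String() string {
	masked := ""
	if c.Password != "" {
		masked = "***"
	}
	return fmt.Sprintf("host=%s database=%s password=%s", c.Host, c.Database, masked)
}

func main() {
	// "cnpg-rw" is an assumed host name used only for illustration.
	cfg := CNPGConfig{Host: "cnpg-rw", Database: "serviceradar"}
	applyPasswordFromEnv(&cfg, os.LookupEnv)
	fmt.Println(cfg) // the password itself is never printed
}
```

Redaction in String only covers formatted output of the struct as a whole; code that prints the Password field directly would still expose it, which is why the check also suggests file or secret mounts.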
🎫 No ticket provided
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status: Passed
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Audit context: Error paths and rule-skip decisions are logged minimally without user/device identifiers
or request correlation, which may limit auditability of critical rule evaluation outcomes.
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status:
Error detail: Returned errors wrap underlying evaluation errors which may include internal rule loader
details; confirm these are not surfaced to end users or insecure logs.
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status:
Console logging: Added console logs on WebSocket close events may leak operational context in browser
consoles; verify no sensitive data is included and that logging level is appropriate for
production.
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Secret handling: The process reads CNPG password from environment and injects into config; confirm it is
not logged or exposed elsewhere and that downstream components avoid printing it.
Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/1975#issuecomment-3565946382
Original created: 2025-11-22T07:36:18Z
PR Code Suggestions ✨
Explore these optional code suggestions:
Fix potential duplicate data insertion
Revert the ON CONFLICT clause to use only (id) as the conflict target to prevent potential data duplication for events.
pkg/db/events.go [52-56]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 8
Why: The suggestion correctly identifies that changing the ON CONFLICT key from (id) to (id, event_timestamp) can lead to unintended data duplication, which is a significant data integrity and correctness issue.
✅ Prevent indefinite retry loop on failure
Suggestion Impact: The commit introduced a maximum retry count (configurable via env var) and stopped retrying after exceeding it, returning detailed errors instead of looping indefinitely. Although it did not implement exponential backoff, it addressed the core issue of an infinite retry loop by adding a retry cap.
Modify the retry logic in load_spiffe_tls to use a retry limit and exponential backoff instead of an indefinite loop with a fixed delay to prevent permanent busy-loops on non-transient errors.
rust/kvutil/src/lib.rs [142-198]
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 7
Why: The suggestion correctly points out that the indefinite retry loop can hide permanent failures and consume resources, proposing a more robust retry mechanism with a limit and backoff, which is a good practice for resilience.