Chore/fix edge onboarding #2489
No reviewers
Labels
No labels
1week
2weeks
Failed compliance check
IP cameras
NATS
Possible security concern
Review effort 1/5
Review effort 2/5
Review effort 3/5
Review effort 4/5
Review effort 5/5
UI
aardvark
accessibility
amd64
api
arm64
auth
back-end
bgp
blog
bug
build
checkers
ci-cd
cleanup
cnpg
codex
core
dependencies
device-management
documentation
duplicate
dusk
ebpf
enhancement
eta 1d
eta 1hr
eta 3d
eta 3hr
feature
fieldsurvey
github_actions
go
good first issue
help wanted
invalid
javascript
k8s
log-collector
mapper
mtr
needs-triage
netflow
network-sweep
observability
oracle
otel
plug-in
proton
python
question
reddit
redhat
research
rperf
rperf-checker
rust
sdk
security
serviceradar-agent
serviceradar-agent-gateway
serviceradar-web
serviceradar-web-ng
siem
snmp
sysmon
topology
ubiquiti
wasm
wontfix
zen-engine
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
carverauto/serviceradar!2489
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "refs/pull/2489/head"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Imported from GitHub pull request.
Original GitHub pull request: #2033
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2033
Original created: 2025-11-29T03:33:20Z
Original updated: 2025-11-30T18:54:10Z
Original head: carverauto/serviceradar:chore/fix_edge_onboarding
Original base: main
Original merged: 2025-11-30T18:53:54Z by @mfreeman451
User description
IMPORTANT: Please sign the Developer Certificate of Origin
Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:
Describe your changes
Issue ticket number and link
Code checklist before requesting a review
PR Type
Enhancement, Bug fix
Description
Embed SPIRE workload agent in sysmon-vm for standalone laptop deployments without shared sockets
Implement checker onboarding with join token/bundle delivery and SPIRE address resolution
Add service registry method to lookup agent poller IDs for edge onboarding fallback
Extend Helm SPIRE server trust to include Core and Datasvc service accounts
Enhance edge package UI with checker kind, config JSON, and parent agent poller auto-derivation
Diagram Walkthrough
File Walkthrough
28 files
Add GetAgentPollerID method to ServiceManager interfaceImplement GetAgentPollerID adapter for agent poller lookupAdd embeddedAgent field to Bootstrapper structGenerate sysmon-vm specific checker config with SPIRE settingsImplement SPIRE address resolution and socket wait utilitiesConfigure checker SPIRE with embedded agent startup logicNew file implementing embedded SPIRE agent process managementNew entrypoint script for sysmon-vm Docker containerInstall embedded SPIRE agent binary for macOS sysmon-vmDownload and install SPIRE agent for embedded onboarding flowAdd SPIRE socket host path helper templateAdd sysmon-vm Docker image definitions for AMD64 and ARM64Use configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathUse configurable SPIRE socket host path instead of hardcoded pathAdd checker kind and config JSON fields to edge package UI2 files
Add SPIRE Linux ARM64 binary dependency for multi-arch buildsAdd SPIRE agent binary filegroups for AMD64 and ARM647 files
Document embedded SPIRE agent installation for macOSNew design document for embedded SPIRE agent architectureNew proposal for sysmon-vm embedded SPIRE agent featureNew specification for sysmon-vm SPIFFE identity requirementsNew task list for embedded SPIRE agent implementationNew proposal for SPIRE admin trust stabilizationNew task list for SPIRE admin trust fixes3 files
Add checker metadata with SPIRE upstream address configurationUpdate app tag and add socketHostPath configuration optionAdd checker metadata with SPIRE upstream address configuration1 files
Add Core and Datasvc service accounts to SPIRE trust allow-listImported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2033#issuecomment-3590952456
Original created: 2025-11-29T03:34:10Z
You are nearing your monthly Qodo Merge usage quota. For more information, please visit here.
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
Unsigned binary download
Description: The installer downloads and extracts a
spire-agentbinary over HTTPS without signature orchecksum verification, risking supply-chain compromise if the download is tampered or a
MITM occurs.
host-setup.sh [191-231]
Referred Code
Sensitive log exposure
Description: The embedded SPIRE agent process writes logs to a world-readable file (
0644) and redirectsboth stdout/stderr to it, potentially exposing sensitive join/attestation details or
tokens to other local users on multi-tenant hosts.
spire_agent.go [70-88]
Referred Code
Insecure file probing
Description: The
waitForSocketfunction polls for the presence of arbitrary socket paths; when combinedwith configurable paths from metadata/env in other flows, this can be abused to probe
filesystem state (information disclosure) or race against creation of sockets outside
intended directories.
deployment.go [167-183]
Referred Code
Trust configuration spoofing
Description: Checker configuration generation accepts
server_spiffe_idand workload socket from packagemetadata without strict validation, enabling a malicious or tampered package to redirect
trust to an attacker-controlled SPIFFE ID or socket path (trust confusion / SSRF-like
local endpoint abuse).
config.go [281-324]
Referred Code
Unverified socket trust
Description: In Docker deployments, the code auto-accepts any detected workload socket from fixed paths
without verifying ownership/permissions, which could allow a compromised container/host
path to supply a malicious socket endpoint.
spire.go [149-183]
Referred Code
Unverified binary install
Description: The script installs a
spire-agentbinary if present without checksum/signatureverification or provenance checks, allowing a malicious local binary to be installed with
execute permissions.
host-install-macos.sh [43-48]
Referred Code
Config injection risk
Description: The entrypoint executes the checker with a config path that may point to a writable path
in the container (
/var/lib/serviceradar/config/checker.json), enabling config injection ifthat path is writable by untrusted users or mounted from an untrusted source.
entrypoint-sysmon-vm.sh [1-11]
Referred Code
Insufficient server-side validation
Description: The client-side validator accepts arbitrary
metadata_jsonandchecker_config_jsonforpackage creation, but if the backend does not robustly validate/limit these fields, this
increases risk of injecting untrusted configuration leading to trust manipulation or path
injection; ensure server-side validation and strict schemas.
page.tsx [640-736]
Referred Code
🎫 No ticket provided
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status: Passed
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status:
Secrets in logs: The embedded agent process redirects stdout/stderr of spire-agent to a log file without
safeguards, risking exposure of sensitive join tokens or credentials in logs.
Referred Code
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
Missing audit logs: New critical actions like starting an embedded SPIRE agent and generating configs are
logged at info/debug but lack explicit, structured audit logging with actor context and
outcomes.
Referred Code
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
Edge cases partial: While many errors are wrapped with context, functions like waitForSocket use fixed
attempts without configurable backoff and some paths rely on environment/metadata without
validating value formats (e.g., ports), which may require additional handling.
Referred Code
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status:
Raw error surfacing: The UI parses and displays backend error messages (parseErrorMessage) which may expose
internal details to end users if the API returns sensitive content.
Referred Code
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
Address validation: SPIRE upstream address and port are taken from env/metadata without validating format or
range, which could permit invalid or unsafe inputs.
Referred Code
Compliance status legend
🟢 - Fully Compliant🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2033#issuecomment-3590953110
Original created: 2025-11-29T03:35:30Z
PR Code Suggestions ✨
Latest suggestions up to
850d878✅
Sanitize CA file pathsSuggestion Impact:
The commit added validation to resolve absolute paths and ensure ca_cert_path and ca_key_path reside within certDir, and then used the sanitized absolute paths for file reads, mitigating path traversal risks.code diff:
Sanitize the
ca_cert_path,ca_key_path, andcert_dirmetadata values to preventpath traversal attacks. Ensure all file access is constrained to a trusted base
directory.
pkg/core/edge_onboarding.go [1949-1965]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 9
__
Why: This suggestion addresses a critical path traversal vulnerability, where a malicious user could provide crafted metadata to read arbitrary files from the server's filesystem.
Enforce HTTPS and timeouts
Enforce HTTPS for Core API connections and set a default timeout on the HTTP
client. This prevents insecure communication and potential hangs during API
requests.
pkg/edgeonboarding/mtls/bootstrap.go [145-159]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 8
__
Why: This suggestion improves security by enforcing HTTPS for API communication and adds robustness by setting a default HTTP client timeout, preventing the application from hanging.
Avoid forcing TLS on non-TLS modes
Conditionally apply the
.cnpg.tlsconfiguration injqonly whenCNPG_SSL_MODEisnot
disable, and addserver_nameto the TLS settings for proper hostnameverification.
docker/compose/update-config.sh [139-161]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies that the script unconditionally sets TLS configuration, which would cause connection failures if
CNPG_SSL_MODEisdisable, and also addsserver_namewhich is crucial forverify-fullmode.Regenerate when key is missing
In
generate-certs.sh, check for the existence of both the certificate and itskey file. If the key is missing, remove the existing certificate and regenerate
the pair to prevent startup failures.
docker/compose/generate-certs.sh [67-74]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies a state where a certificate exists but its key is missing, which would prevent services from starting; the proposed fix to regenerate the pair is a robust solution.
Fix NATS mTLS verification
In
db-event-writer.mtls.json, update thenats_securityconfiguration to alignthe
server_namewith other services and add the missingclient_ca_fileto ensuresuccessful mTLS connections.
docker/compose/db-event-writer.mtls.json [1-19]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies inconsistent mTLS configuration that would likely cause connection failures, making it a critical fix for the service's functionality.
Suppress join token for mTLS
Explicitly clear the
JoinTokenandBundlePEMfrom the API response when anMTLSBundleis present. This ensures that SPIRE-related artifacts are not sentfor mTLS packages.
pkg/core/api/edge_onboarding.go [543-546]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: This is a good defense-in-depth measure to ensure that a
JoinTokenis never sent for mTLS packages, even if populated by mistake, thus preventing potential confusion or misuse.Enforce required poller for children
In the
handleCreatefunction, add a validation check to ensure apoller_idispresent before submitting the form for 'agent' or 'checker' component types to
prevent backend errors.
web/src/app/admin/edge-packages/page.tsx [713-831]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies a missing validation for
poller_idwhich could lead to backend errors, improving the form's robustness by adding a client-side guard.Avoid duplicate column migration
Remove the redundant
security_modecolumn definition from either the initialschema or the subsequent migration file to prevent potential migration failures
and ensure schema consistency.
pkg/db/cnpg/migrations/00000000000001_timescale_schema.up.sql [373-381]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies a redundant database migration, which is a significant maintainability and correctness issue that could lead to future migration failures.
Validate CA certificate capabilities
In
parseCACertificate, validate that the parsed certificate has theIsCAflagset and includes the
KeyUsageCertSignkey usage before returning it.pkg/core/edge_onboarding.go [2085-2097]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 8
__
Why: This is a valid security enhancement that ensures only proper CA certificates can be used for signing, preventing misconfiguration and potential security weaknesses.
Reject encrypted CA keys explicitly
In
parsePrivateKey, add a check usingx509.IsEncryptedPEMBlockto explicitlyreject encrypted private keys, as they are not supported.
pkg/core/edge_onboarding.go [2099-2124]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies that encrypted private keys are not supported and adds an explicit check, which improves error handling and provides clearer feedback to the user.
Fix permissions for existing certs
Add a
chmod 644command for the existing certificate file(
$CERT_DIR/$component.pem) to ensure its permissions are correctly set, matchingthe behavior for newly generated certificates.
docker/compose/generate-certs.sh [67-74]
Suggestion importance[1-10]: 5
__
Why: The suggestion correctly points out an inconsistency in file permissions and improves security by ensuring existing certificate files have the same strict permissions as newly generated ones.
Cap client TTL to CA expiry
Before calling
mintClientCertificate, ensure the client certificate's TTL doesnot extend beyond the CA certificate's expiration date by capping it if
necessary.
pkg/core/edge_onboarding.go [2012-2015]
[To ensure code accuracy, apply this suggestion manually]Suggestion importance[1-10]: 8
__
Why: This is a valid correctness fix that prevents issuing client certificates that outlive their signing CA, which would lead to validation failures.
Fix mTLS server name verification
Update the
server_namevalues forkv_securityandsecurityto match the DockerCompose service names (
datasvcandagent) to prevent TLS hostname verificationfailures.
docker/compose/agent.mtls.json [1-53]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies a likely mTLS handshake failure due to a
server_namemismatch in the Docker Compose environment, which is a critical bug for the new feature's functionality.Add NATS client TLS parameters
Add a
nats_tlsconfiguration block with CA, client certificate, and key paths tothe
netflow-consumer.mtls.jsonfile to enable a successful mTLS connection tothe NATS server.
docker/compose/netflow-consumer.mtls.json [1-17]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies that the NATS connection will fail because the
nats_urlspecifies TLS but no TLS configuration is provided, which is a critical bug in the new configuration.Conditionally apply CNPG TLS config
Modify the
jqcommand to conditionally create the.cnpg.tlsobject only when thessl_modeis notdisable.docker/compose/entrypoint-db-event-writer.sh [139-160]
Suggestion importance[1-10]: 7
__
Why: The suggestion fixes a bug where the script generates an incorrect configuration by always including a
tlsblock, even whenssl_modeisdisable.Validate metadata JSON before jq
Add a check to validate the
$EDGE_DEFAULT_METAvariable as JSON before using itin the
jqcommand, falling back to an empty object if it's invalid to preventscript failure.
docker/compose/update-config.sh [275-287]
Suggestion importance[1-10]: 6
__
Why: The suggestion correctly identifies that invalid JSON in the
$EDGE_DEFAULT_METAvariable would crash the script and proposes a robust validation check to prevent this failure.Previous suggestions
✅ Suggestions up to commit
3e659bdConsider a more robust process supervision model
The embedded SPIRE agent is launched without supervision. Implement a strategy
where the main application exits if the agent process dies, enabling external
supervisors to manage restarts correctly.
Examples:
pkg/edgeonboarding/spire_agent.go [80-86]
Solution Walkthrough:
Before:
After:
Suggestion importance[1-10]: 9
__
Why: This suggestion correctly identifies a critical reliability flaw in the new embedded agent feature, where the parent process does not handle the unexpected termination of the child
spire-agent, leading to silent failures.Handle embedded agent process exit
Handle the exit status of the embedded SPIRE agent process to avoid silent
failures. Use channels and a
selectstatement to concurrently wait for theagent's socket to become available or for the process to exit, reporting any
premature exit as an error.
pkg/edgeonboarding/spire_agent.go [80-90]
Suggestion importance[1-10]: 8
__
Why: The suggestion correctly identifies that the agent process could exit prematurely with an error that would be silently ignored, leading to a confusing timeout. The proposed fix using channels is a robust way to handle this concurrency and improve error reporting.
✅
Prevent file descriptor leakSuggestion Impact:
A goroutine was added to wait for the agent process to exit, but it only calls cmd.Wait() without closing the log file. While not a full implementation, it reflects a change likely prompted by the suggestion area; however, the actual file descriptor leak fix (logFile.Close()) was not applied.code diff:
Close the log file for the embedded SPIRE agent to prevent a file descriptor
leak. The file should be closed after the agent process, which writes to it, has
finished executing.
pkg/edgeonboarding/spire_agent.go [70-78]
Suggestion importance[1-10]: 7
__
Why: The suggestion correctly identifies a file descriptor leak by not closing the opened log file. The proposed fix is appropriate, ensuring the resource is released after the agent process terminates.
Improve parsing of nested errors
Improve the
parseErrorMessagefunction to handle nested JSON error responses.Implement a recursive search to find
messageorerrorkeys within the parsedobject, ensuring that detailed error messages are extracted correctly.
web/src/app/admin/edge-packages/page.tsx [273-295]
Suggestion importance[1-10]: 6
__
Why: The suggestion correctly points out that the current error parsing logic is shallow and will fail to extract messages from nested JSON objects, which is a common pattern in API error responses. The proposed recursive approach makes the error handling more robust.
Imported GitHub PR review comment.
Original author: @github-advanced-security[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2033#discussion_r2573290620
Original created: 2025-11-30T01:42:01Z
Original path: pkg/core/edge_onboarding.go
Original line: 1932
Uncontrolled data used in path expression
This path depends on a user-provided value.
Show more details
Imported GitHub PR review comment.
Original author: @github-advanced-security[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2033#discussion_r2573299154
Original created: 2025-11-30T02:11:57Z
Original path: pkg/core/edge_onboarding.go
Original line: 1930
Uncontrolled data used in path expression
This path depends on a user-provided value.
Show more details
Imported GitHub PR review comment.
Original author: @github-advanced-security[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2033#discussion_r2573299157
Original created: 2025-11-30T02:11:57Z
Original path: pkg/core/edge_onboarding.go
Original line: 1939
Uncontrolled data used in path expression
This path depends on a user-provided value.
Show more details