Release staging pipeline (OpenSpec): manual promotion + pinned demo #2551
Imported from GitHub pull request.
Original GitHub pull request: #2112
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2112
Original created: 2025-12-12T05:00:27Z
Original updated: 2025-12-12T21:40:45Z
Original head: carverauto/serviceradar:chore/fix_releases_oci_image_versioning
Original base: main
Original merged: 2025-12-12T21:40:41Z by @mfreeman451
What
- … `latest` images; demo pins chart + image versions for stability.
- … `valueFiles` (less inline YAML).
Key files
- `k8s/argocd/applications/demo-staging.yaml`
- `k8s/argocd/applications/demo-prod.yaml`
- `.github/workflows/e2e-tests.yml`
- `helm/serviceradar/values-demo.yaml`
- `helm/serviceradar/values-demo-staging.yaml`
- `openspec/changes/add-release-staging-pipeline/*`
Next steps (pipeline test)
After merge, run a pre-release tag (example):
scripts/cut-release.sh --version 1.0.76-pre1 --push
This should publish images + chart, then trigger `E2E Tests` against `demo-staging`.
Promotion to demo is a follow-up PR that bumps `targetRevision` + `global.imageTag` in `k8s/argocd/applications/demo-prod.yaml`.
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2112#issuecomment-3644915500
Original created: 2025-12-12T05:00:53Z
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:
No security concerns identified
No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
🎫 No ticket provided
Codebase context is not defined
Follow the guide to enable codebase context checks.
Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.
Status:
No runtime logs: The PR adds documentation/specs only and introduces no executable code that could
implement audit logging of critical actions, so compliance cannot be verified from this
diff.
Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting
Status:
No code added: Only markdown documentation was added and no identifiers or implementations are present to
assess naming conventions.
Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation
Status:
No error logic: The changes are specifications in markdown with no executable error handling paths to
validate robustness or edge case management.
Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.
Status:
No user errors: No user-facing error messages or handling were introduced in this PR diff, so security of
error outputs cannot be evaluated.
Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.
Status:
Logging not present: The PR adds planning docs and tasks only; no logging statements or configurations are
introduced to assess secure logging practices.
Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities
Status:
No input handling: This PR contains only markdown specifications and does not add executable input handling
to assess validation, sanitization, or secret management beyond high-level guidance.
Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2112#issuecomment-3644917740
Original created: 2025-12-12T05:01:53Z
PR Code Suggestions ✨
Explore these optional code suggestions:
Test hotfixes in staging before release
Modify the hotfix bypass scenario to include an expedited, automated test run in
the staging environment before release, instead of skipping testing entirely.
openspec/changes/add-release-staging-pipeline/specs/release-automation/spec.md [94-97]
Suggestion importance[1-10]: 9
Why: The suggestion correctly identifies a significant risk in the proposed hotfix process, which completely bypasses staging tests, and proposes a safer, more robust alternative that still allows for an expedited release.
✅ Consider a simpler Git-based promotion flow
Suggestion Impact: The design was changed to "Manual PR-Based Promotion (Simplified)" and "Environment Promotion (SIMPLIFIED)," explicitly dropping the GitOps Promoter/Source Hydrator and adopting promotion via manual PRs (with possible future GitHub Action automation). This aligns with the suggested simpler Git-based promotion approach.
code diff:
+Note: ArgoCD requires OCI URLs without the `oci://` prefix for Helm sources.
Decision 4: Version Tag Strategy
Rationale: Use `v<VERSION>` tag (e.g., `v1.0.70`) as the primary release tag, with `latest` for dev convenience and `sha-<commit>` for immutability.
@@ -115,6 +107,20 @@
- `v1.0.70` - Primary release tag
- `sha-<commit>` - Immutable reference
- `latest` - Development convenience (staging only)
+Implementation:
+- Modified `docker/images/container_tags.bzl` to add `version_tags` attribute to `immutable_push_tags` rule
+- Modified `docker/images/push_targets.bzl` to generate version tag via `expand_template` using `{{STABLE_VERSION}}`
+- Version is read from `VERSION` file via `scripts/workspace_status.sh` when `--stamp` is used
+- The "vdev" tag is excluded when building without proper workspace status (local dev builds)
+
+Generated Tags (example):
+
+sha-486cbfcbc1027b4255e8287df1c7ced48402b1c4 # commit SHA
+v1.0.70 # semantic version
+latest # static tag
+sha-c30cd42eb275 # short digest
Decision 5: E2E Test Credentials via GitHub Environments
Rationale: Cannot expose kubectl/kubeadm API credentials to GitHub Actions. Instead, store application-level credentials in GitHub Secrets using deployment environments for isolation.
@@ -168,40 +174,133 @@
Risk: Staging environment divergence
Mitigation: Use same Helm chart with minimal value overrides. Document differences clearly.
-### Trade-off: GitHub Pages vs dedicated chart registry
-Accepted: GitHub Pages has rate limits but sufficient for internal use. Can migrate later if needed.
+### Trade-off: OCI registry visibility
+Accepted: OCI charts in ghcr.io are less discoverable than GitHub Pages but provide better integration with existing GHCR authentication and image workflows.
+
+### Decision 6: Helm Chart and Image Versioning Strategy
+Rationale: Keep chart version in sync with app version (standard practice for charts in the same repo as the app). Image tags default to `latest` in values.yaml; deployments override via `global.imageTag`.
+
+Release updates required:
+1. `VERSION` file - app version
+2. `helm/serviceradar/Chart.yaml` - chart version + appVersion (via `cut-release.sh`)
+
+Not updated on release:
+- `values.yaml` - keeps `appTag: "latest"` as default
+- ArgoCD Applications override with `global.imageTag: "v1.0.71"` for specific versions
+
+Benefits:
+- Minimal files to update during release
+- Flexible: local dev uses `latest`, deployments pin to specific versions
+- Standard Helm versioning practice
Migration Plan
-### Phase 1: Helm Chart Repository (Week 1)
-1. Create `gh-pages` branch with index.yaml
-2. Add helm-release workflow
-3. Publish initial chart version
-### Phase 2: Image Version Tagging (Week 1)
-1. Update push_targets.bzl for version tags
-2. Modify release.yml to pass version
-3. Verify with dry-run release
-### Phase 3: Demo-Staging Setup (Week 2)
-1. Create demo-staging ArgoCD Application
-2. Deploy via Helm chart
-3. Validate staging deployment works
-### Phase 4: GitOps Promoter (Week 2-3)
-1. Install promoter CRDs
-2. Configure staging->demo promotion
-3. Integrate e2e test gate
-### Phase 5: Full Pipeline (Week 3)
-1. Update release workflow for staged deployment
-2. Test complete flow with pre-release
-3. Document and train team
+### Phase 1: Helm Chart OCI Registry (DONE)
+1. ~~Create gh-pages branch~~ Using OCI registry instead
+2. Added helm package/push step to release.yml
+3. Published chart: `oci://ghcr.io/carverauto/charts/serviceradar:1.0.75`
+4. Updated `cut-release.sh` to bump Chart.yaml version automatically
+5. Created ArgoCD repo credentials template (not needed - chart made public)
+
+### Phase 2: Helm Values Modernization (DONE)
+1. Added `global.imageTag` and `global.imagePullPolicy` to values.yaml
+2. Set default `image.tags.appTag` to `latest`
+3. Added helper templates for image tag/policy resolution
+4. Updated key templates (core, web, datasvc, agent, poller, srql)
+5. Fixed db-event-writer-config.yaml template whitespace issue (malformed apiVersion)
+6. Fixed db-event-writer.yaml duplicate volume/volumeMount definitions
+
+### Phase 3: Demo-Staging Setup (DONE)
+1. Created demo-staging ArgoCD Application
+2. Configured to use OCI Helm chart with inline values
+3. Made Helm chart public in GHCR (no credentials needed)
+4. Copied ghcr-io-cred image pull secret to demo-staging namespace
+5. Fixed CNPG secret name to use dynamic cluster name (`$cnpgClusterName-ca`) in templates
+6. Fixed CNPG host to use dynamic cluster name (`$cnpgClusterName-rw`) in templates
+7. Published chart v1.0.75 with all template fixes
+8. Successfully deployed demo-staging: Sync: Synced, Health: Healthy (all 19 deployments running)
+
+### Phase 4: Environment Promotion (SIMPLIFIED)
+Approach Changed: After evaluating GitOps Promoter with Source Hydrator, we found the complexity didn't match our needs. The Source Hydrator requires specific credential configurations for write operations that proved difficult to set up with GitHub Apps.
+
+Current Simple Approach:
+1. Demo-staging deploys automatically when chart is updated
+2. Demo-staging validation happens (manual or via e2e tests)
+3. Production promotion is via manual PR to update demo-prod version
+4. ArgoCD syncs demo-prod when PR is merged
+
+What was tried and removed:
+- GitOps Promoter v0.18.3 CRDs
+- ArgoCD Source Hydrator (argocd-commit-server)
+- Hydrated branch model (environments/demo-staging, environments/demo)
+- GitHub App for SCM access (sr-argocd-promoter)
+
+Future consideration: Could add GitHub Action to automate PR creation when staging e2e tests pass.
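The manual promotion step in the simplified approach (a PR that updates the demo-prod version) amounts to changing two values in the Application manifest. Below is a minimal Python sketch of that bump, operating on a plain dict rather than the real file. The function name `promote` and the `demo_prod` dict are hypothetical, and the sketch assumes `global.imageTag` is set as a Helm parameter on the Application; the actual `k8s/argocd/applications/demo-prod.yaml` may set it through inline values or `valueFiles` instead.

```python
import copy


def promote(app: dict, chart_version: str, image_tag: str) -> dict:
    """Return a copy of an Argo CD Application pinned to a chart version and image tag."""
    promoted = copy.deepcopy(app)  # leave the input manifest untouched
    source = promoted["spec"]["source"]
    source["targetRevision"] = chart_version  # chart version Argo CD should sync

    # Assumption: global.imageTag is carried as a Helm parameter on the Application.
    params = source.setdefault("helm", {}).setdefault("parameters", [])
    for param in params:
        if param.get("name") == "global.imageTag":
            param["value"] = image_tag
            break
    else:
        # No existing pin found, so add one.
        params.append({"name": "global.imageTag", "value": image_tag})
    return promoted


# Hypothetical stand-in for the demo-prod Application; values are illustrative.
demo_prod = {
    "spec": {
        "source": {
            "repoURL": "ghcr.io/carverauto/charts",  # OCI URL without the oci:// prefix
            "chart": "serviceradar",
            "targetRevision": "1.0.75",
            "helm": {"parameters": [{"name": "global.imageTag", "value": "v1.0.75"}]},
        }
    }
}

bumped = promote(demo_prod, "1.0.76", "v1.0.76")
print(bumped["spec"]["source"]["targetRevision"])
print(bumped["spec"]["source"]["helm"]["parameters"])
```

Returning a copy keeps the before and after manifests side by side, which is convenient if the bump is later automated by a GitHub Action that opens the promotion PR.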
Suggestion importance[1-10]: 8
Why: The suggestion proposes a valid and simpler architectural alternative to a core component of the design, the ArgoCD GitOps Promoter, which significantly impacts the implementation's complexity and dependencies.
Imported GitHub PR comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2112#issuecomment-3647439411
Original created: 2025-12-12T17:17:39Z
Validated locally:
- `openspec validate add-release-staging-pipeline --strict`
- `helm lint helm/serviceradar` + `helm template` with demo + staging values
- `scripts/cut-release.sh --version 1.0.76-pre1 --dry-run` (clean tree)
After merge, to exercise the full pipeline:
scripts/cut-release.sh --version 1.0.76-pre1 --push
Then confirm `Publish Release Artifacts` and follow-on `E2E Tests` (staging) succeed, and promote by bumping `targetRevision` + `global.imageTag` in `k8s/argocd/applications/demo-prod.yaml`.
Imported GitHub PR comment.
Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2112#issuecomment-3647445223
Original created: 2025-12-12T17:19:30Z
CI Feedback 🧐
A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
Action: test-go
Failed stage: Run Go Tests [❌]
Failed test name: TestSanitizeTOML
Failure summary:
The action failed because the Go test `github.com/carverauto/serviceradar/pkg/config` timed out and panicked after 3s. The failing test is `TestSanitizeTOML`, which repeatedly attempted to read a configuration file at `/etc/serviceradar/core.json` and logged errors:
- `open /etc/serviceradar/core.json: no such file or directory`
The repeated read attempts caused the test to hang until the timeout, leading to:
- panic: test timed out after 3s
Relevant frames:
- `pkg/config/toml_mask.go:57`
- `pkg/config/toml_mask_test.go:28`
Relevant error logs: