Refactor/zen tenant isolation #2658

Merged
mfreeman451 merged 7 commits from refs/pull/2658/head into staging 2026-01-13 06:12:32 +00:00
mfreeman451 commented 2026-01-13 05:57:41 +00:00 (Migrated from github.com)
Owner

Imported from GitHub pull request.

Original GitHub pull request: #2275
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/pull/2275
Original created: 2026-01-13T05:57:41Z
Original updated: 2026-01-13T06:12:34Z
Original head: carverauto/serviceradar:refactor/zen-tenant-isolation
Original base: staging
Original merged: 2026-01-13T06:12:32Z by @mfreeman451

User description

IMPORTANT: Please sign the Developer Certificate of Origin

Thank you for your contribution to ServiceRadar. Please note, when contributing, the developer must include
a DCO sign-off statement indicating the DCO acceptance in one commit message. Here
is an example DCO Signed-off-by line in a commit message:

Signed-off-by: J. Doe <j.doe@domain.com>

Describe your changes

Code checklist before requesting a review

  • I have signed the DCO?
  • The build completes without errors?
  • All tests are passing when running make test?

PR Type

Enhancement, Bug fix


Description

  • Migrate core service to Elixir implementation with clustering support

  • Fix JetStream subject configuration by removing invalid leading wildcards

  • Correct tenant migration execution order for zen_rule_templates index updates

  • Add platform tenant lifecycle event publishing with retry mechanism

  • Implement CNPG certificate generation and client certificate authentication


Diagram Walkthrough

flowchart LR
  A["Core Service"] -->|"Migrate to"| B["Core-ELX Elixir"]
  B -->|"Cluster via"| C["Kubernetes Discovery"]
  D["JetStream Config"] -->|"Fix subjects"| E["Remove wildcards"]
  E -->|"Match patterns"| F["logs.syslog/snmp/otel"]
  G["Migration Order"] -->|"Correct timing"| H["zen_rule_templates first"]
  H -->|"Then update"| I["Index creation"]
  J["Tenant Bootstrap"] -->|"Publish events"| K["TenantLifecyclePublisher"]
  K -->|"With retry"| L["Exponential backoff"]
  M["CNPG Certs"] -->|"Generate"| N["CA and Client Certs"]
  N -->|"Configure"| O["Database connections"]

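The JetStream subject fix above hinges on NATS token-matching semantics: `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens, so a stream configured with a leading wildcard such as `*.logs.syslog` (a hypothetical example of the removed patterns) can never capture a subject published as `logs.syslog`. A minimal Python sketch of those matching rules:

```python
def nats_subject_matches(pattern: str, subject: str) -> bool:
    """Token-wise NATS-style subject matching: '*' matches exactly one
    token, '>' matches one or more trailing tokens."""
    p, s = pattern.split("."), subject.split(".")
    for i, tok in enumerate(p):
        if tok == ">":
            return len(s) > i  # '>' must cover at least one token
        if i >= len(s):
            return False
        if tok != "*" and tok != s[i]:
            return False
    return len(p) == len(s)

# A stream configured with '*.logs.syslog' never sees 'logs.syslog':
assert nats_subject_matches("logs.syslog", "logs.syslog")
assert not nats_subject_matches("*.logs.syslog", "logs.syslog")
```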
File Walkthrough

Relevant files
Enhancement
7 files
platform_tenant_bootstrap.ex
Add tenant lifecycle event publishing with retries             
+61/-0   
generate-certs.sh
Add CNPG CA and client certificate generation                       
+48/-1   
cnpg-client-ca-job.yaml
Add Helm hooks and simplify secret reconciliation logic   
+24/-39 
core.yaml
Replace core with core-elx Elixir implementation                 
+127/-135
db-event-writer-config.yaml
Use configurable CNPG client certificate paths                     
+2/-2     
spire-server.yaml
Add CNPG client certificate configuration variables           
+6/-1     
web.yaml
Use configurable CNPG client certificate paths                     
+2/-2     
Bug fix
2 files
20260111082000_update-zen-rule-identities.exs
Make index creation idempotent with IF NOT EXISTS               
+11/-6   
serviceradar-config.yaml
Fix zen JetStream subject configuration patterns                 
+4/-12   
Configuration changes
2 files
values-demo-staging.yaml
Update image tag to latest commit                                               
+1/-1     
values.yaml
Add CNPG client certificate name configuration options     
+2/-0     

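The migration fix in the Bug fix table makes index creation re-runnable. The actual change lives in an Ecto migration (20260111082000_update-zen-rule-identities.exs); purely as an illustration of the SQL-level effect, the sketch below reproduces the `IF NOT EXISTS` behavior against an in-memory SQLite database (the index name is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zen_rule_templates (id INTEGER, tenant_id TEXT)")

ddl = ("CREATE INDEX IF NOT EXISTS zen_rule_templates_tenant_idx "
       "ON zen_rule_templates (tenant_id)")
conn.execute(ddl)
conn.execute(ddl)  # re-running the migration is now a no-op, not an error

try:
    # Without IF NOT EXISTS, the second creation attempt raises.
    conn.execute("CREATE INDEX zen_rule_templates_tenant_idx "
                 "ON zen_rule_templates (tenant_id)")
    rerun_failed = False
except sqlite3.OperationalError:
    rerun_failed = True
```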
qodo-code-review[bot] commented 2026-01-13 05:58:25 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2275#issuecomment-3742112403
Original created: 2026-01-13T05:58:25Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Private key exposure

Description: The script generates and stores long-lived private CA and client keys (e.g.,
cnpg-ca-key.pem, cnpg-client-key.pem) under CERT_DIR (typically a shared PVC), and other
workloads mount this same cert directory, so a compromise of any pod with access to that
volume could exfiltrate the CA private key and mint trusted CNPG client certificates to
access the database.
generate-certs.sh [22-107]

Referred Code
if [ ! -f "$CERT_DIR/cnpg-ca.pem" ] || [ "$FORCE_REGENERATE" = "true" ]; then
  openssl ecparam -name prime256v1 -genkey -noout -out "$CERT_DIR/cnpg-ca-key.pem"
  cat > "$CERT_DIR/cnpg-ca.conf" <<EOF
[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_ca
prompt = no
[req_distinguished_name]
C = $COUNTRY
ST = $STATE
L = $LOCALITY
O = $ORGANIZATION
OU = $ORG_UNIT
CN = ServiceRadar CNPG CA
[v3_ca]
basicConstraints = critical, CA:TRUE, pathlen:0
keyUsage = critical, keyCertSign, cRLSign
subjectKeyIdentifier = hash
EOF
  openssl req -new -x509 -sha256 -key "$CERT_DIR/cnpg-ca-key.pem" -out "$CERT_DIR/cnpg-ca.pem" -days $DAYS_VALID -config "$CERT_DIR/cnpg-ca.conf"
  rm "$CERT_DIR/cnpg-ca.conf"


 ... (clipped 65 lines)
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Unbounded retry loop: The new tenant lifecycle publish retry mechanism re-schedules retries indefinitely with a
fixed delay and no max attempts/backoff, which can cause endless retries and log noise
during persistent failure.

Referred Code
@impl true
def handle_info({:publish_platform_event, tenant_id, event}, state) do
  case fetch_platform_tenant(tenant_id) do
    {:ok, tenant} ->
      if publish_event(event, tenant) != :ok do
        schedule_publish_retry(tenant_id, event)
      end

    {:error, reason} ->
      Logger.warning("[PlatformTenantBootstrap] Retry publish failed to load tenant",
        tenant_id: tenant_id,
        reason: inspect(reason)
      )

      schedule_publish_retry(tenant_id, event)
  end

  {:noreply, state}
end

defp publish_platform_lifecycle_event(event, tenant) do


 ... (clipped 18 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing success audit: Platform tenant create/update triggers lifecycle events but the new code only logs
failures (not successful outcomes with actor/user context), so it is unclear whether audit
trail requirements are met.

Referred Code
  |> Ash.Query.for_read(:read)
  |> Ash.Query.filter(is_platform_tenant == true)
  |> Ash.Query.select([:id, :slug, :is_platform_tenant])

case Ash.read(query, actor: actor) do
  {:ok, []} ->
    tenant = create_platform_tenant!(platform_slug)
    set_platform_tenant_id!(tenant.id, platform_slug)
    publish_platform_lifecycle_event(:created, tenant)
    :ok

  {:ok, [tenant]} ->
    validate_platform_tenant!(tenant, platform_slug)
    ensure_platform_tenant_infrastructure!(tenant)
    set_platform_tenant_id!(tenant.id, platform_slug)
    publish_platform_lifecycle_event(:updated, tenant)
    :ok

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Possible sensitive reason: Logging reason: inspect(reason) from Ash.read/2 failures may include internal details from
underlying adapters, so it is unclear from the diff alone whether sensitive information
could be emitted to logs.

Referred Code
  {:error, reason} ->
    Logger.warning("[PlatformTenantBootstrap] Retry publish failed to load tenant",
      tenant_id: tenant_id,
      reason: inspect(reason)
    )

    schedule_publish_retry(tenant_id, event)
end

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
qodo-code-review[bot] commented 2026-01-13 05:59:52 +00:00 (Migrated from github.com)
Author
Owner

Imported GitHub PR comment.

Original author: @qodo-code-review[bot]
Original URL: https://github.com/carverauto/serviceradar/pull/2275#issuecomment-3742116331
Original created: 2026-01-13T05:59:52Z

PR Code Suggestions

Explore these optional code suggestions:

Category: Possible issue
Prevent an infinite retry loop
Suggestion Impact: The commit adds explicit handle_info/2 clauses for {:error, :not_found} (and also {:error, {:unexpected_count, count}}) that log and stop scheduling further retries, addressing the infinite-retry concern for unrecoverable tenant-fetch errors. It also introduces attempt tracking and exponential backoff with jitter for retriable errors, though it still retries on generic {:error, reason}.

code diff:

   def handle_info({:publish_platform_event, tenant_id, event}, state) do
+    handle_info({:publish_platform_event, tenant_id, event, 1}, state)
+  end
+
+  def handle_info({:publish_platform_event, tenant_id, event, attempt}, state) do
     case fetch_platform_tenant(tenant_id) do
       {:ok, tenant} ->
         if publish_event(event, tenant) != :ok do
-          schedule_publish_retry(tenant_id, event)
+          schedule_publish_retry(tenant_id, event, attempt + 1)
         end
+
+      {:error, :not_found} ->
+        Logger.warning(
+          "[PlatformTenantBootstrap] Tenant not found during publish retry; stopping retries.",
+          tenant_id: tenant_id,
+          event: event
+        )
+
+      {:error, {:unexpected_count, count}} ->
+        Logger.error(
+          "[PlatformTenantBootstrap] Multiple tenants found during publish retry; stopping retries.",
+          tenant_id: tenant_id,
+          count: count,
+          event: event
+        )
 
       {:error, reason} ->
         Logger.warning("[PlatformTenantBootstrap] Retry publish failed to load tenant",
           tenant_id: tenant_id,
-          reason: inspect(reason)
-        )
-
-        schedule_publish_retry(tenant_id, event)
+          reason: inspect(reason),
+          event: event
+        )
+
+        schedule_publish_retry(tenant_id, event, attempt + 1)
     end
 
     {:noreply, state}
@@ -222,7 +243,7 @@
         event: event
       )
 
-      schedule_publish_retry(tenant.id, event)
+      schedule_publish_retry(tenant.id, event, 1)
     end
   end
 
@@ -231,8 +252,20 @@
   defp publish_event(:deleted, tenant), do: TenantLifecyclePublisher.publish_deleted(tenant)
   defp publish_event(_event, tenant), do: TenantLifecyclePublisher.publish_updated(tenant)
 
-  defp schedule_publish_retry(tenant_id, event) do
-    Process.send_after(self(), {:publish_platform_event, tenant_id, event}, @publish_retry_delay)
+  defp schedule_publish_retry(tenant_id, event, attempt) do
+    delay = calculate_publish_backoff(attempt)
+    Process.send_after(self(), {:publish_platform_event, tenant_id, event, attempt}, delay)
+  end
+
+  defp calculate_publish_backoff(attempt) do
+    exponent = min(max(attempt - 1, 0), 16)
+    base_delay =
+      @initial_publish_retry_delay
+      |> Kernel.*(Integer.pow(2, exponent))
+      |> min(@max_publish_retry_delay)
+
+    jitter = :rand.uniform(max(div(base_delay, 2), 1))
+    min(base_delay + jitter, @max_publish_retry_delay)
   end

In handle_info/2, add specific case clauses for {:error, :not_found} and other
unrecoverable errors to prevent an infinite retry loop when a tenant cannot be
fetched.

elixir/serviceradar_core/lib/serviceradar/identity/platform_tenant_bootstrap.ex [199-216]

 def handle_info({:publish_platform_event, tenant_id, event}, state) do
   case fetch_platform_tenant(tenant_id) do
     {:ok, tenant} ->
       if publish_event(event, tenant) != :ok do
         schedule_publish_retry(tenant_id, event)
       end
 
-    {:error, reason} ->
-      Logger.warning("[PlatformTenantBootstrap] Retry publish failed to load tenant",
+    {:error, :not_found} ->
+      Logger.warning(
+        "[PlatformTenantBootstrap] Tenant not found during publish retry; stopping retries.",
         tenant_id: tenant_id,
-        reason: inspect(reason)
+        event: event
       )
 
-      schedule_publish_retry(tenant_id, event)
+    {:error, reason} ->
+      Logger.error(
+        "[PlatformTenantBootstrap] Unrecoverable error fetching tenant during publish retry; stopping retries.",
+        tenant_id: tenant_id,
+        reason: inspect(reason),
+        event: event
+      )
   end
 
   {:noreply, state}
 end

[Suggestion processed]

Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a potential infinite retry loop if a tenant is not found, which is a significant bug that could lead to resource exhaustion and log spam.

Impact: High

Category: High-level
Implement exponential backoff for retries
Suggestion Impact: The fixed @publish_retry_delay retry was replaced with an exponential backoff retry mechanism: retry messages now include an attempt count, delays grow exponentially up to a maximum with added jitter, and retry scheduling uses the computed delay. The handler was updated to support the new message shape and initial attempts.

code diff:

-  @publish_retry_delay 10_000
+  @initial_publish_retry_delay 1_000
+  @max_publish_retry_delay 60_000
 
   def start_link(opts \\ []) do
     GenServer.start_link(__MODULE__, opts, name: __MODULE__)
@@ -197,19 +198,39 @@
 
   @impl true
   def handle_info({:publish_platform_event, tenant_id, event}, state) do
+    handle_info({:publish_platform_event, tenant_id, event, 1}, state)
+  end
+
+  def handle_info({:publish_platform_event, tenant_id, event, attempt}, state) do
     case fetch_platform_tenant(tenant_id) do
       {:ok, tenant} ->
         if publish_event(event, tenant) != :ok do
-          schedule_publish_retry(tenant_id, event)
+          schedule_publish_retry(tenant_id, event, attempt + 1)
         end
+
+      {:error, :not_found} ->
+        Logger.warning(
+          "[PlatformTenantBootstrap] Tenant not found during publish retry; stopping retries.",
+          tenant_id: tenant_id,
+          event: event
+        )
+
+      {:error, {:unexpected_count, count}} ->
+        Logger.error(
+          "[PlatformTenantBootstrap] Multiple tenants found during publish retry; stopping retries.",
+          tenant_id: tenant_id,
+          count: count,
+          event: event
+        )
 
       {:error, reason} ->
         Logger.warning("[PlatformTenantBootstrap] Retry publish failed to load tenant",
           tenant_id: tenant_id,
-          reason: inspect(reason)
-        )
-
-        schedule_publish_retry(tenant_id, event)
+          reason: inspect(reason),
+          event: event
+        )
+
+        schedule_publish_retry(tenant_id, event, attempt + 1)
     end
 
     {:noreply, state}
@@ -222,7 +243,7 @@
         event: event
       )
 
-      schedule_publish_retry(tenant.id, event)
+      schedule_publish_retry(tenant.id, event, 1)
     end
   end
 
@@ -231,8 +252,20 @@
   defp publish_event(:deleted, tenant), do: TenantLifecyclePublisher.publish_deleted(tenant)
   defp publish_event(_event, tenant), do: TenantLifecyclePublisher.publish_updated(tenant)
 
-  defp schedule_publish_retry(tenant_id, event) do
-    Process.send_after(self(), {:publish_platform_event, tenant_id, event}, @publish_retry_delay)
+  defp schedule_publish_retry(tenant_id, event, attempt) do
+    delay = calculate_publish_backoff(attempt)
+    Process.send_after(self(), {:publish_platform_event, tenant_id, event, attempt}, delay)
+  end
+
+  defp calculate_publish_backoff(attempt) do
+    exponent = min(max(attempt - 1, 0), 16)
+    base_delay =
+      @initial_publish_retry_delay
+      |> Kernel.*(Integer.pow(2, exponent))
+      |> min(@max_publish_retry_delay)
+
+    jitter = :rand.uniform(max(div(base_delay, 2), 1))
+    min(base_delay + jitter, @max_publish_retry_delay)
   end
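
The ordering of the new clauses in that diff matters: the specific terminal errors must match before the catch-all `{:error, reason}` clause, or every failure would keep retrying. A Python sketch of that dispatch (the tuple encodings are illustrative stand-ins for the Elixir terms, not the shipped code):

```python
def handle_fetch_result(result):
    """Decide what the diff's handle_info clauses would do with a fetch result.

    Encodings (stand-ins for the Elixir terms):
      ("ok", tenant)                     -> publish the event
      ("error", "not_found")             -> terminal: tenant is gone
      ("error", ("unexpected_count", n)) -> terminal: data integrity issue
      ("error", anything_else)           -> transient: retry with backoff
    """
    tag, detail = result
    if tag == "ok":
        return "publish"
    # Specific terminal cases are checked first, mirroring the Elixir clause order.
    if detail == "not_found":
        return "stop"
    if isinstance(detail, tuple) and detail[0] == "unexpected_count":
        return "stop"
    return "retry"
```

With this ordering a deleted tenant stops the loop instead of retrying forever, while timeouts and connection errors still back off and retry.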

Replace the fixed-delay retry mechanism for platform tenant event publishing
with an exponential backoff strategy. This will improve system resilience during
outages.

Examples:

elixir/serviceradar_core/lib/serviceradar/identity/platform_tenant_bootstrap.ex [198-236]
  @impl true
  def handle_info({:publish_platform_event, tenant_id, event}, state) do
    case fetch_platform_tenant(tenant_id) do
      {:ok, tenant} ->
        if publish_event(event, tenant) != :ok do
          schedule_publish_retry(tenant_id, event)
        end

      {:error, reason} ->
        Logger.warning("[PlatformTenantBootstrap] Retry publish failed to load tenant",

 ... (clipped 29 lines)

Solution Walkthrough:

Before:

defmodule ServiceRadar.Identity.PlatformTenantBootstrap do
  @publish_retry_delay 10_000 # Fixed 10-second delay

  defp publish_platform_lifecycle_event(event, tenant) do
    if publish_event(event, tenant) != :ok do
      schedule_publish_retry(tenant.id, event)
    end
  end

  def handle_info({:publish_platform_event, tenant_id, event}, state) do
    # ... attempt to publish ...
    if publish_failed do
      # Always retries with the same fixed delay
      schedule_publish_retry(tenant_id, event)
    end
    ...
  end

  defp schedule_publish_retry(tenant_id, event) do
    Process.send_after(self(), {:publish_platform_event, tenant_id, event}, @publish_retry_delay)
  end
end

After:

defmodule ServiceRadar.Identity.PlatformTenantBootstrap do
  @initial_delay 1_000
  @max_delay 60_000

  defp publish_platform_lifecycle_event(event, tenant) do
    if publish_event(event, tenant) != :ok do
      schedule_publish_retry(tenant.id, event, 1) # Start with attempt 1
    end
  end

  def handle_info({:publish_platform_event, tenant_id, event, attempt}, state) do
    # ... attempt to publish ...
    if publish_failed do
      schedule_publish_retry(tenant_id, event, attempt + 1)
    end
    ...
  end

  defp schedule_publish_retry(tenant_id, event, attempt) do
    delay = calculate_backoff(attempt)
    Process.send_after(self(), {:publish_platform_event, tenant_id, event, attempt}, delay)
  end

  defp calculate_backoff(attempt) do
    # e.g., min(@max_delay, @initial_delay * 2 ^ (attempt - 1) + jitter)
    ...
  end
end
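
The `calculate_backoff` body above is elided; a Python rendering of the concrete formula from the applied diff (constants and the jitter range mirror `calculate_publish_backoff`; this is a translation for illustration only):

```python
import random

INITIAL_DELAY_MS = 1_000   # @initial_publish_retry_delay
MAX_DELAY_MS = 60_000      # @max_publish_retry_delay

def calculate_publish_backoff(attempt: int) -> int:
    # Clamp the exponent so huge attempt counts cannot overflow the growth term.
    exponent = min(max(attempt - 1, 0), 16)
    base_delay = min(INITIAL_DELAY_MS * 2 ** exponent, MAX_DELAY_MS)
    # :rand.uniform(n) returns 1..n; random.randint(1, n) matches that range.
    jitter = random.randint(1, max(base_delay // 2, 1))
    return min(base_delay + jitter, MAX_DELAY_MS)

# Attempts 1, 2, 3, ... produce roughly 1s, 2s, 4s, ... of delay;
# attempt 1 lands in 1001..1500 ms, and from attempt 7 on the 60s cap dominates.
```

The cap keeps a long outage from scheduling ever-growing gaps, and the jitter spreads retries from many tenants so they do not all fire in the same tick.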

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that the implemented retry mechanism uses a fixed delay, contrary to the PR's stated goal of exponential backoff, and proposes a more resilient best-practice strategy.

Medium
Security
Improve security of root key

Change the permissions of root-key.pem from 640 to 600 to restrict access to the
owner only, enhancing security and aligning with the principle of least
privilege.

helm/serviceradar/files/generate-certs.sh [17-21]

 if [ ! -f "$CERT_DIR/root.pem" ]; then
   openssl genrsa -out "$CERT_DIR/root-key.pem" 4096
   openssl req -new -x509 -sha256 -key "$CERT_DIR/root-key.pem" -out "$CERT_DIR/root.pem" -days $DAYS_VALID -subj "/C=$COUNTRY/ST=$STATE/L=$LOCALITY/O=$ORGANIZATION/OU=$ORG_UNIT/CN=ServiceRadar Root CA"
-  chmod 644 "$CERT_DIR/root.pem"; chmod 640 "$CERT_DIR/root-key.pem"
+  chmod 644 "$CERT_DIR/root.pem"; chmod 600 "$CERT_DIR/root-key.pem"
 fi

[To ensure code accuracy, apply this suggestion manually]
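
The difference between 640 and 600 is the group-read bit. A small Python check of the tightened mode on a POSIX system (the temp file stands in for root-key.pem):

```python
import os
import stat
import tempfile

# Create a stand-in for the private key and restrict it to the owner only.
fd, key_path = tempfile.mkstemp(suffix="-key.pem")
os.close(fd)

os.chmod(key_path, 0o600)  # rw for owner; no group or world access

mode = stat.S_IMODE(os.stat(key_path).st_mode)
assert mode == 0o600
# 640 would additionally grant group read; 600 does not:
assert not (mode & stat.S_IRGRP)

os.unlink(key_path)
```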

Suggestion importance[1-10]: 6

__

Why: The suggestion correctly points out an opportunity to improve security by restricting permissions on the root-key.pem file, aligning it with best practices and other keys in the script.

Low
Imported GitHub PR comment. Original author: @qodo-code-review[bot]. Original URL: https://github.com/carverauto/serviceradar/pull/2275#issuecomment-3742116331. Original created: 2026-01-13T05:59:52Z.