Feat: SNMP MIB Enrichment #1103

Open
opened 2026-03-28 04:31:42 +00:00 by mfreeman451 · 0 comments
Owner

Imported from GitHub.

Original GitHub issue: #3018
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/3018
Original created: 2026-03-09T03:29:00Z


Product Requirements Document (PRD): SNMP MIB Enrichment Pipeline

1. Overview and Objective

Project: Open-Source Network Management Software
Component: MIB Enrichment Service
Tech Stack: Elixir, Broadway, NATS JetStream, CNPG (PostgreSQL)

Objective:
Build a high-throughput, low-latency stream processing service that sits between the SNMP Collector and the Log Normalization Engine (Zen-Engine). This service will translate raw, numeric SNMP OIDs and varbinds into human-readable, context-rich JSON payloads using Ahead-of-Time (AOT) compiled MIB dictionaries.

2. Problem Statement

Currently, SNMP traps flow into the system as raw numeric OIDs (e.g., .1.3.6.1.2.1.2.2.1.8.4). Because there are no MIBs loaded, downstream components (Zen-Engine) and end-users must deal with raw OIDs, which makes writing normalization rules and searching data practically impossible.

Parsing raw ASN.1 MIB files from disk for every trap at ingestion is an anti-pattern that will cause CPU bottlenecks, memory spikes, and pipeline backpressure during a trap storm. We need a scalable way to load hundreds of thousands of OID mappings into memory and enrich traps at line rate.

3. Proposed Solution

Introduce a dedicated Elixir/Broadway Consumer to handle OID enrichment.

  1. Ahead-of-Time (AOT) Compilation: MIBs are pre-compiled offline into flattened JSON and stored in the CNPG database (leveraging ParadeDB for UI searchability).
  2. Memory Efficiency: The Elixir service loads this JSON on startup into :persistent_term, so thousands of concurrent Broadway workers can read the MIB dictionary with constant-time (O(1)) lookups and without copying entries onto their own process heaps. This avoids per-worker memory duplication and adds no Garbage Collection (GC) pressure on reads.
  3. Longest Prefix Match (LPM): The service resolves SNMP table instances dynamically by recursively truncating the OID list until a base MIB match is found.

4. System Architecture & Data Flow

flowchart LR
    A[SNMP Collector] -->|Raw OID Trap| B[(NATS JetStream\nRAW_TRAPS)]
    B --> C{Broadway Enricher\nElixir / OTP}
    
    subgraph Elixir Node
    C -->|Reads OIDs| D[persistent_term\nGlobal Shared Memory]
    D -.->|Loads on Boot / Reload| E[(CNPG / ParadeDB\nCompiled MIBs)]
    end
    
    C -->|Enriched JSON| F[(NATS JetStream\nENRICHED_TRAPS)]
    F --> G[Zen-Engine Normalizer\nRust]
    G --> H[(NATS JetStream)]
    H --> I[DB Event Writer\nGolang]
    I --> J[(CNPG Database)]

5. Functional Requirements

5.1 System Initialization (Boot)

  • FR-1: On startup, the service must query the CNPG database to retrieve the fully compiled, flattened MIB dictionary.
  • FR-2: The service must format the OIDs as lists of integers and store the mappings in Erlang's :persistent_term.
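FR-1 and FR-2 could be sketched roughly as follows. This is a minimal illustration, not the service's actual loader: the module name `MibLoaderSketch` and the row shape (`oid`, `name`, `syntax`, `enums` keys) are assumptions, and `rows` stands in for whatever the CNPG query returns.

```elixir
defmodule MibLoaderSketch do
  @moduledoc "Hypothetical loader: turns compiled MIB rows into :persistent_term entries."

  # Converts ".1.3.6.1" or "1.3.6.1" into [1, 3, 6, 1].
  def parse_oid(dotted) when is_binary(dotted) do
    dotted
    |> String.trim_leading(".")
    |> String.split(".")
    |> Enum.map(&String.to_integer/1)
  end

  # `rows` is an assumed shape for the result of the CNPG query.
  # Returns the number of rows stored.
  def load(rows) when is_list(rows) do
    Enum.each(rows, fn %{oid: oid, name: name, syntax: syntax, enums: enums} ->
      :persistent_term.put(
        {:oid, parse_oid(oid)},
        %{name: name, syntax: syntax, enums: enums}
      )
    end)

    length(rows)
  end
end
```

Parsing the dotted string once at boot keeps the per-trap path free of string handling for dictionary keys; incoming trap OIDs still need the same conversion before lookup.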

5.2 Trap Processing (Broadway Pipeline)

  • FR-3: The service must consume messages from the RAW_TRAPS NATS JetStream stream using Broadway backpressure mechanisms.
  • FR-4: For every snmpTrapOID and varbind OID in the payload, the service must perform a Longest Prefix Match (LPM) lookup against :persistent_term.
  • FR-5: If an OID resolves to a known MIB, the service must extract the base symbolic name (e.g., ifOperStatus) and the instance suffix (e.g., [4]).
  • FR-6: If a varbind contains an integer value and the mapped MIB defines ENUMs (e.g., 1: up, 2: down), the service must translate the integer to the human-readable string.
  • FR-7: The service must output a newly structured JSON payload containing both the raw OIDs/values and the enriched symbolic names/string values.
  • FR-8: Enriched traps must be batched and published to the ENRICHED_TRAPS NATS JetStream stream.

5.3 Hot Reloading

  • FR-9: The service must subscribe to a NATS control subject (e.g., SYS.MIBS.UPDATED).
  • FR-10: When a message is received on this topic (triggered by a user uploading a new MIB to the UI), the service must re-query CNPG and update :persistent_term without dropping the NATS JetStream connection or requiring a container restart.
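One way the reload in FR-9 and FR-10 might look is sketched below. Everything named here is hypothetical: `MibReloaderSketch`, the `:mibs_updated` message (a real NATS client would deliver its own message tuple), and `fetch_fun`, a zero-arity function standing in for the CNPG query that returns `{oid_list, info}` pairs.

```elixir
defmodule MibReloaderSketch do
  use GenServer

  # `fetch_fun` is a hypothetical stand-in for the CNPG query.
  def start_link(fetch_fun) when is_function(fetch_fun, 0) do
    GenServer.start_link(__MODULE__, fetch_fun, name: __MODULE__)
  end

  @impl true
  def init(fetch_fun) do
    {:ok, %{fetch_fun: fetch_fun, keys: refresh(fetch_fun, MapSet.new())}}
  end

  # Assumed message shape for "a control message arrived".
  @impl true
  def handle_info(:mibs_updated, state) do
    {:noreply, %{state | keys: refresh(state.fetch_fun, state.keys)}}
  end

  # Writes the fresh dictionary, then erases keys that disappeared from it.
  defp refresh(fetch_fun, old_keys) do
    entries = fetch_fun.()
    new_keys = MapSet.new(entries, fn {oid, _info} -> {:oid, oid} end)

    Enum.each(entries, fn {oid, info} ->
      :persistent_term.put({:oid, oid}, info)
    end)

    old_keys
    |> MapSet.difference(new_keys)
    |> Enum.each(&:persistent_term.erase/1)

    new_keys
  end
end
```

A design point worth weighing: per the Erlang documentation, replacing or erasing an existing persistent term triggers a global garbage-collection scan of all processes, while a `put` with a value equal to the stored one returns quickly. Reloads that touch many changed entries are therefore expensive and are best kept infrequent.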

6. Non-Functional Requirements (NFRs)

  • Performance: Must be capable of processing >10,000 traps per second per container.
  • Memory Footprint: The application footprint should remain relatively flat (estimated < 500MB), regardless of worker concurrency, due to :persistent_term shared references.
  • Resilience: Malformed SNMP traps must not crash the broader pipeline. Broadway workers encountering bad payloads must log the error, dead-letter the message, and continue processing the stream (standard OTP supervision).
  • Backpressure: If Zen-Engine slows down and the ENRICHED_TRAPS stream fills, Broadway must slow its consumption from RAW_TRAPS rather than buffer messages in memory, to prevent Out-Of-Memory (OOM) failures.
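For the resilience requirement, a worker could validate each decoded trap and return a tagged result instead of raising, leaving the caller to log and dead-letter on `{:error, reason}`. The sketch below assumes an input shape (`"trap_oid"` and `"varbinds"` with an `"oid"` field) that this PRD does not define; the module name is also hypothetical.

```elixir
defmodule TrapValidationSketch do
  # Dotted numeric OID with an optional leading dot, e.g. ".1.3.6.1".
  @oid_pattern ~r/\A\.?\d+(\.\d+)*\z/

  # Returns {:ok, trap} for a well-formed decoded trap,
  # {:error, reason} otherwise. Field names are assumptions.
  def validate(%{"trap_oid" => oid, "varbinds" => varbinds} = trap)
      when is_binary(oid) and is_list(varbinds) do
    cond do
      not Regex.match?(@oid_pattern, oid) -> {:error, :bad_trap_oid}
      not Enum.all?(varbinds, &valid_varbind?/1) -> {:error, :bad_varbind}
      true -> {:ok, trap}
    end
  end

  def validate(_other), do: {:error, :bad_shape}

  defp valid_varbind?(%{"oid" => oid}) when is_binary(oid) do
    Regex.match?(@oid_pattern, oid)
  end

  defp valid_varbind?(_other), do: false
end
```

Rejecting bad OID strings up front also protects the later string-to-integer conversion, which would otherwise raise on non-numeric input.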

7. Implementation Specifications

7.1 Persistent Term Data Structure

To ensure O(1) lookups, the :persistent_term keys should be structured as Elixir tuples:

# Key: {:oid, [integer()]} 
# Value: %{name: String.t(), syntax: String.t(), enums: map()}

:persistent_term.put(
  {:oid, [1, 3, 6, 1, 2, 1, 2, 2, 1, 8]},
  %{name: "ifOperStatus", syntax: "INTEGER", enums: %{1 => "up", 2 => "down", 3 => "testing"}}
)
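Reading an entry back is a single call with the same tuple key. A minimal sketch, assuming the key layout above; `MibLookupSketch` is a hypothetical name, and `:persistent_term.get/2` is used with a default so a missing key does not raise.

```elixir
defmodule MibLookupSketch do
  # Returns {:ok, info} for an exact match, or :miss when the key is absent.
  def exact(oid) when is_list(oid) do
    case :persistent_term.get({:oid, oid}, :miss) do
      :miss -> :miss
      info -> {:ok, info}
    end
  end
end
```

An OID that carries an instance suffix, such as `[1, 3, 6, 1, 2, 1, 2, 2, 1, 8, 4]`, returns `:miss` here, which is why the exact lookup alone is not sufficient.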

7.2 Longest Prefix Match Algorithm (LPM)

Because SNMP appends instance data to OIDs (e.g., .1.3.6.1.2.1.2.2.1.8.4), an exact match will fail. The system must implement a recursive truncation algorithm:

  1. Lookup full list [1, 3, 6, 1, 2, 1, 2, 2, 1, 8, 4] -> MISS.
  2. Pop last integer 4, prepend to instance accumulator, lookup [1, 3, 6, 1, 2, 1, 2, 2, 1, 8] -> HIT.
  3. Return Base MIB Info + Instance [4].
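The three steps above can be sketched as a small recursive function. This is an illustration under the key layout from section 7.1, not a tuned implementation; `LpmSketch` is a hypothetical module name.

```elixir
defmodule LpmSketch do
  # Returns {:ok, info, instance} for the longest matching prefix, or :miss.
  def lookup(oid) when is_list(oid), do: do_lookup(Enum.reverse(oid), [])

  # Every prefix has been tried without a hit.
  defp do_lookup([], _instance), do: :miss

  # The OID is held reversed so the last sub-identifier is the list head.
  defp do_lookup([last | rest] = reversed, instance) do
    case :persistent_term.get({:oid, Enum.reverse(reversed)}, nil) do
      # MISS: pop the last integer and prepend it to the instance accumulator.
      nil -> do_lookup(rest, [last | instance])
      # HIT: return the base MIB info plus the accumulated instance.
      info -> {:ok, info, instance}
    end
  end
end
```

Because popped integers are prepended, a multi-part index keeps its original order: a suffix of `.4.7` comes back as `[4, 7]`. Each step re-reverses the remaining prefix, so the cost grows with the square of the OID length, which stays small for typical OIDs but is worth measuring against the throughput target.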

7.3 Schema of Enriched Payload

Example Output Payload:

{
  "timestamp": "2026-03-08T10:00:00Z",
  "source_ip": "10.0.0.5",
  "trap_oid": ".1.3.6.1.6.3.1.1.5.3",
  "trap_name": "linkDown",
  "varbinds": [
    {
      "raw_oid": ".1.3.6.1.2.1.2.2.1.8.4",
      "raw_value": 2,
      "mib_name": "ifOperStatus",
      "instance": "4",
      "enriched_value": "down"
    }
  ]
}
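A varbind entry of this shape could be assembled from a lookup result along the following lines. The sketch makes several assumptions: the module name `VarbindPayloadSketch` is hypothetical, the lookup result is passed in as `{:ok, info, instance}` or `:miss`, the instance is rendered as a dotted string to match the example, and unknown OIDs pass through with only their raw fields (the PRD does not specify that case).

```elixir
defmodule VarbindPayloadSketch do
  # Builds one entry of the "varbinds" array from an integer-list OID,
  # the raw value, and the result of a longest-prefix lookup.
  def build(oid, raw_value, {:ok, info, instance}) when is_list(oid) do
    enums = Map.get(info, :enums) || %{}

    %{
      "raw_oid" => dotted(oid),
      "raw_value" => raw_value,
      "mib_name" => info.name,
      "instance" => Enum.join(instance, "."),
      # nil when the value has no enum label.
      "enriched_value" => Map.get(enums, raw_value)
    }
  end

  # Assumption: unresolved OIDs keep only their raw fields.
  def build(oid, raw_value, :miss) when is_list(oid) do
    %{"raw_oid" => dotted(oid), "raw_value" => raw_value}
  end

  defp dotted(oid), do: "." <> Enum.join(oid, ".")
end
```

Keeping the raw fields in both branches follows FR-7, so downstream rules can still match on numeric OIDs when no MIB is loaded for them.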

8. Out of Scope

  • MIB Compilation Process: The actual offline parsing of .txt / .my ASN.1 files into flattened JSON. This PRD assumes the JSON already exists in CNPG.
  • Zen-Engine Rules: The creation of business logic and normalization rules inside the Rust engine.
  • Database Writes: Writing the final events to TimescaleDB/ParadeDB (handled by the Golang DB-writer).

9. Acceptance Criteria

  • Service connects to NATS and successfully reads from RAW_TRAPS.
  • Service successfully pulls MIB dictionaries from CNPG on startup and populates :persistent_term.
  • Traps containing known OIDs with instance suffixes are successfully resolved to their base names using the LPM algorithm.
  • Integer values mapped to MIB ENUMs are successfully translated to strings.
  • Output JSON matches the defined schema and is successfully published to ENRICHED_TRAPS.
  • Emitting a NATS control message successfully triggers a background refresh of the MIB memory without halting trap processing.
  • Load testing confirms stable memory usage (<500MB) when processing 10k traps/sec.