feat: add BGP/BMP collector #700

Closed
opened 2026-03-28 04:27:35 +00:00 by mfreeman451 · 1 comment
Owner

Imported from GitHub.

Original GitHub issue: #2183
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2183
Original created: 2025-12-18T06:35:21Z


Is your feature request related to a problem?

We need to integrate a BMP (BGP Monitoring Protocol) collector into ServiceRadar that can sit on the edge or in-cluster,
receive BMP data, and write it to NATS JetStream. The interface to the message broker should be abstracted so we can support additional or different message brokers later; we are currently targeting NATS JetStream but may consider something like iggy.rs or some kind of hybrid architecture in the future.

Once data is written to the message broker, we have several options for what to do next. Using the stateless rule-based zen-engine (serviceradar-zen), we can do very fast ETL to get the data into the correct shape for an OCSF-based schema, write it to a different message subject, and then the db-event-writer consumer would process it off the queue and write it to the DB.

Describe the solution you'd like

We have identified https://github.com/nxthdr/risotto as a prime candidate; it is MIT licensed, and the upstream maintainers have been contacted in the past and are open to PRs. We would likely need to create a new trait for the messaging interface, similar to https://github.com/carverauto/serviceradar/blob/staging/sr-architecture-and-design/prd/14-netflow-collector-rust.md
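A minimal sketch of what such a messaging trait could look like. All names here (`MessageSink`, `MemorySink`) are hypothetical, not the actual ServiceRadar API; the point is that a NATS JetStream backend and any future broker would sit behind the same interface:

```rust
// Hypothetical broker-agnostic publish trait; names are illustrative only.
pub trait MessageSink {
    type Error: std::fmt::Debug;

    /// Publish one encoded BMP message to a subject/topic.
    fn publish(&mut self, subject: &str, payload: &[u8]) -> Result<(), Self::Error>;
}

/// In-memory implementation, useful for tests; a NATS JetStream backend
/// would wrap its client behind the same trait.
#[derive(Default)]
pub struct MemorySink {
    pub messages: Vec<(String, Vec<u8>)>,
}

impl MessageSink for MemorySink {
    type Error = ();

    fn publish(&mut self, subject: &str, payload: &[u8]) -> Result<(), ()> {
        self.messages.push((subject.to_string(), payload.to_vec()));
        Ok(())
    }
}
```

The collector would only ever hold a `dyn MessageSink` (or a generic bound), so swapping brokers is a new impl, not a collector change.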

I think it makes more sense to do the ETL in our pipeline rather than in the Rust BMP collector itself, so the collector stays generic (or keeps the data in whatever format it already arrives in) and we can keep using the upstream version.

https://schema.ocsf.io/1.7.0/classes/network_activity

- [x] #2184
- [ ] Rust-based BMP collector/daemon that writes to a message broker
- [ ] Zen rule for processing into OCSF format, if possible/necessary
- [ ] db-event-writer updated
- [ ] UI dashboards

PRD:
https://github.com/carverauto/serviceradar/blob/staging/sr-architecture-and-design/prd/15-bmp-bgp-collector-rust.md

Future work would be around analysis/data processing and could involve @marvin-hansen and his causal computation library (https://github.com/deepcausality-rs)

Describe alternatives you've considered


Additional context

Related to https://github.com/carverauto/serviceradar/issues/859
Tangentially related to https://github.com/carverauto/serviceradar/issues/2181


Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2183#issuecomment-3906229640
Original created: 2026-02-16T03:20:14Z


This PRD outlines the integration of BGP/BMP (BGP Monitoring Protocol) into the ServiceRadar ecosystem. By leveraging the Rust risotto crate for ingestion and an Elixir Broadway consumer for the data pipeline, we bypass the complexity of Zen rules while gaining deep visibility into internal routing decisions (Calico, OSPF) and external ISP health.


PRD: ServiceRadar BGP/BMP "Routing Intelligence" Engine

1. Vision & Purpose

To transform ServiceRadar from a status-monitor into a Decision-Monitor. By capturing the real-time routing conversations between the UDM Pro Max (FRR), K8s Clusters (Calico), and internal OSPF routers (farm01/tonka01), we provide an absolute visual "Source of Truth" for how data travels through the infrastructure.


2. Technical Architecture: The "Decision Pipeline"

2.1 The Ingestor (Rust + Risotto)

  • Role: A standalone Rust binary using the risotto crate.
  • Mechanism: Listens on port 11019 for BMP streams from the UDM Pro Max.
  • Action: Decodes raw BGP messages (Route Monitoring, Peer Up/Down, Stats) into structured JSON/Protobuf and publishes to NATS JetStream on the events.bgp.raw subject.
  • Performance: Zero-copy decoding in Rust ensures the UDM’s minimal BGP table or a full global table can be handled with the same sub-millisecond latency.
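For reference, the first thing any BMP receiver does on a connection is read the 6-byte common header defined in RFC 7854 §4.1. A minimal sketch of that step, not risotto's actual API:

```rust
// Parse the 6-byte BMP common header (RFC 7854 §4.1).
#[derive(Debug, PartialEq)]
pub struct BmpHeader {
    pub version: u8,  // must be 3 for BMPv3
    pub msg_len: u32, // total message length, header included
    pub msg_type: u8, // 0 = Route Monitoring, 1 = Stats, 2 = Peer Down, 3 = Peer Up, ...
}

pub fn parse_bmp_header(buf: &[u8]) -> Option<BmpHeader> {
    if buf.len() < 6 || buf[0] != 3 {
        return None; // too short, or not BMPv3
    }
    Some(BmpHeader {
        version: buf[0],
        msg_len: u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]),
        msg_type: buf[5],
    })
}
```

The `msg_len` field tells the listener how many bytes to pull off the TCP stream before the next header, which is what makes framing on port 11019 straightforward.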

2.2 The Pipeline (Elixir + Broadway)

  • Role: A high-throughput Broadway consumer in the ServiceRadar Core.
  • Mechanism: Consumes from NATS events.bgp.raw.
  • Logic:
    • State Tracking: Maintains a real-time "Routing Table" state in CNPG/TimescaleDB.
    • Normalization: Maps BGP attributes (AS_PATH, Next Hop, Local Pref) into OCSF Network Activity classes.
    • OSPF Mapping: Specifically parses redistributed OSPF routes (via the UDM) to identify internal physical router-to-router links.
  • Benefit: Provides standard Elixir error handling and scalability without the DSL overhead of Zen.
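The normalization step could look roughly like the following. This is an illustrative Rust sketch only: the real pipeline would do this in the Elixir Broadway consumer, and the field names are a simplified stand-in for the OCSF Network Activity class (whose `class_uid` is 4001), not a complete mapping:

```rust
// Illustrative shapes; the actual OCSF mapping has many more fields.
pub struct BgpUpdate {
    pub prefix: String,
    pub as_path: Vec<u32>,
    pub next_hop: String,
    pub withdrawn: bool,
}

pub struct OcsfNetworkActivity {
    pub class_uid: u32,        // 4001 = OCSF Network Activity
    pub activity_name: String, // "announce" / "withdraw" (illustrative values)
    pub dst_prefix: String,
    pub as_path: String,
    pub next_hop: String,
}

pub fn normalize(u: &BgpUpdate) -> OcsfNetworkActivity {
    OcsfNetworkActivity {
        class_uid: 4001,
        activity_name: (if u.withdrawn { "withdraw" } else { "announce" }).to_string(),
        dst_prefix: u.prefix.clone(),
        // AS_PATH rendered as a space-separated string, matching `show ip bgp` output.
        as_path: u.as_path.iter().map(|a| a.to_string()).collect::<Vec<_>>().join(" "),
        next_hop: u.next_hop.clone(),
    }
}
```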

3. Cyber-Physical Routing Topology

3.1 The "K8s-to-Core" Visibility (Calico)

  • The Problem: Currently, pod networking is a "black box" to the router.
  • The BMP Solution: Calico announces pod CIDRs to the UDM via BGP. BMP streams these announcements to ServiceRadar.
  • God-View Visual: Renders Dynamic Arcs between specific K8s Nodes and the UDM. If a pod route shifts to a different node, the Arc physically moves in 3D space.

3.2 OSPF Redistribution Visibility

  • Mechanism: Configure FRR on the UDM to redistribute ospf into the BGP process monitored by BMP.
  • Result: ServiceRadar gains visibility into the OSPF adjacencies between farm01 and tonka01.
  • Causal Link: If the OSPF link between farm01 and tonka01 fails, the BGP BMP stream will report the route withdrawal.

4. Integration with the "Four Pillars"

Pillar 1: Apache Arrow

  • BGP updates (Announcements/Withdrawals) are pushed into the global Arrow buffer.
  • Data Shape: [Source_Node_ID, Destination_Node_ID, Prefix, Metric, Status].
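That row shape, laid out column-wise as the Arrow buffer would hold it. Plain `Vec` columns stand in for Arrow arrays here so the sketch stays dependency-free; a real implementation would use the arrow crate's builders:

```rust
// Columnar stand-in for the BGP Arrow buffer described above.
#[derive(Default)]
pub struct BgpColumns {
    pub source_node_id: Vec<String>,
    pub dest_node_id: Vec<String>,
    pub prefix: Vec<String>,
    pub metric: Vec<u32>,
    pub status: Vec<String>, // "announced" | "withdrawn"
}

impl BgpColumns {
    /// Append one update row across all five columns.
    pub fn push(&mut self, src: &str, dst: &str, prefix: &str, metric: u32, status: &str) {
        self.source_node_id.push(src.to_string());
        self.dest_node_id.push(dst.to_string());
        self.prefix.push(prefix.to_string());
        self.metric.push(metric);
        self.status.push(status.to_string());
    }

    pub fn len(&self) -> usize {
        self.prefix.len()
    }
}
```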

Pillar 2: Wasm-Arrow Bridge

  • Wasm engine in the browser monitors the BGP Arrow column.
  • Instant Interaction: If an admin clicks farm01, Wasm calculates all active BGP/OSPF routes originating from that router in <1ms.

Pillar 3: Deep Causality (The Brain)

  • Links BMP PeerDown events with SNMP Interface events.
  • Example: If risotto reports an ISP PeerDown and SNMP reports eth0 is UP, the engine flags a Provider Routing Failure (Logic) rather than a Cable Failure (Physical).
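That correlation rule reduces to a small decision table. A toy version in Rust (the enum and function names are illustrative, not the deepcausality API):

```rust
// Toy version of the BMP-vs-SNMP correlation rule described above.
#[derive(Debug, PartialEq)]
pub enum FaultClass {
    ProviderRoutingFailure, // BGP peer down but interface still up => logical fault
    PhysicalFailure,        // interface down explains the peer loss => physical fault
    Healthy,
}

pub fn classify(bmp_peer_down: bool, snmp_iface_up: bool) -> FaultClass {
    match (bmp_peer_down, snmp_iface_up) {
        (true, true) => FaultClass::ProviderRoutingFailure,
        (true, false) => FaultClass::PhysicalFailure,
        (false, _) => FaultClass::Healthy,
    }
}
```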

Pillar 4: Animated Particle Shaders

  • Atmosphere Layer: 3D arcs represent the BGP/OSPF paths.
  • Visual Logic:
    • Standard: Cyan/Blue particles flowing along arcs.
    • Route Withdrawal: The arc turns Red and Dissipates (ghosting).
    • Flapping Route: The arc Vibrates and turns Amber, signaling instability to the "Bored Admin."
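The visual-state table above is a direct match in the renderer. A sketch with the spec's colors and effects; the enum and function names are illustrative:

```rust
// Route state -> (arc color, shader effect), per the visual logic above.
pub enum RouteState {
    Stable,
    Withdrawn,
    Flapping,
}

pub fn arc_style(state: &RouteState) -> (&'static str, &'static str) {
    match state {
        RouteState::Stable => ("cyan", "flow"),
        RouteState::Withdrawn => ("red", "dissipate"),
        RouteState::Flapping => ("amber", "vibrate"),
    }
}
```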

5. Critical Use Cases

5.1 Calico Pod Egress Verification

  • Signal: risotto sees a Calico route withdrawal for Pod CIDR 10.42.50.0/24.
  • Action: ServiceRadar "Ghosts" the K8s cluster node in the God-View and triggers an alert: "Route to Pod Subnet Lost - Check Calico Node BGP Peer."
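The trigger condition for that alert is a prefix-containment check: does the withdrawn route fall inside a monitored pod CIDR? A self-contained sketch using only `std::net`:

```rust
// True if `addr` falls inside the network `net`/`prefix_len`.
pub fn in_cidr(addr: std::net::Ipv4Addr, net: std::net::Ipv4Addr, prefix_len: u8) -> bool {
    // Build the netmask; a /0 matches everything (avoid shifting by 32).
    let mask = if prefix_len == 0 {
        0
    } else {
        u32::MAX << (32 - prefix_len)
    };
    (u32::from(addr) & mask) == (u32::from(net) & mask)
}
```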

5.2 Internal Router Path Shift (Farm -> Tonka)

  • Signal: BMP reports a Next-Hop change for internal traffic from farm01 to tonka01.
  • Visual: The God-View reroutes the Animated Particle Stream from the primary link to the backup link in real-time.

6. Success Metrics

  1. Ingestion Latency: BGP route changes on the UDM must appear in the God-View UI in < 500ms.
  2. State Accuracy: The ServiceRadar routing table must match the FRR show ip bgp table with 100% parity.
  3. Visual Scale: Handle 1,000 internal routes (Calico) and 5,000 redistributed OSPF/External routes without dropping frames on the 3D ArcLayer.

7. Implementation Steps for Today

  1. FRR Config: Enable BMP targets on the UDM Pro Max.
  2. Redistribution: Set router ospf to redistribute into router bgp.
  3. Rust Build: Deploy risotto collector as a sidecar/container.
  4. Broadway Build: Create the Broadway NATS consumer to update the bgp_routing_state hypertable in CNPG.
  5. Shaders: Update deck.gl ArcLayer to subscribe to the BGP Arrow buffer.
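Steps 1 and 2 might look roughly like the following FRR configuration. The AS number, target group name, and collector address are placeholders, and the exact `bmp` command syntax should be verified against the FRR documentation for the installed version:

```
router bgp 65000
 bmp targets serviceradar
  bmp connect 10.0.0.5 port 11019 min-retry 1000 max-retry 2000
  bmp monitor ipv4 unicast pre-policy
 exit
 address-family ipv4 unicast
  redistribute ospf
 exit-address-family
```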