feat: NG visualizations / deepcausality integration #1028

Closed
opened 2026-03-28 04:30:57 +00:00 by mfreeman451 · 2 comments
Owner

Imported from GitHub.

Original GitHub issue: #2834
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2834
Original created: 2026-02-14T04:30:53Z


This is the Definitive Product Requirements Document (PRD) for the ServiceRadar "God-View" Topology Platform. This edition integrates the Hybrid Filter Strategy, ensuring a clean decoupling between the high-performance backend and the GPU-accelerated frontend.


PRD: ServiceRadar "God-View" Visualization Engine (Integrated Edition)

1. Vision & Executive Summary

To transform "Network Monitoring" into a Cyber-Physical Radar experience. ServiceRadar visualizes massive-scale global infrastructure (100k+ nodes) as a living, breathing organism. By combining Zero-Copy Data Streaming, GPU-Native Rendering, and Deep Causal Inference, we eliminate "Alert Fatigue" and provide an instant, visual "Blast Radius" for every incident.


2. The High-Performance Technical Stack (The "Three Pillars")

To achieve 60fps performance and sub-second data updates at a scale of 100k nodes/250k edges, we bypass the "JSON/REST Bottleneck" entirely.

Pillar 1: The Vehicle (Apache Arrow IPC)

  • Implementation: Elixir fetches data from PostgreSQL/AGE and passes pointers to a Rustler NIF (Rust). Rust lays out the graph data (coordinates, colors, status) into memory-mapped binary buffers. These buffers are streamed via Phoenix Channels and mapped directly into GPU memory using deck.gl.
  • Result: 90% reduction in payload size; 0ms CPU parsing time in the browser.

Pillar 2: The Filter (Hybrid Roaring Bitmaps)

  • Implementation: The backend maintains compressed bitsets for every attribute (e.g., is_cisco, is_critical).
  • Strategy: The server performs the logic (the "What") and sends a tiny Bitmask to the client. The GPU performs the rendering (the "How") by using the mask to toggle visibility or styles.
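The server-side "What" is just set algebra over per-attribute bitsets. A minimal std-only sketch, using plain u64 words in place of compressed Roaring bitmaps (a real build would use a crate such as roaring); the attribute names are illustrative:

```rust
// Sketch: server-side filter composition over plain u64 bitset words.
// Production code would use compressed Roaring bitmaps; the attribute
// names (is_cisco, is_critical) are illustrative.
fn intersect(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b).map(|(x, y)| x & y).collect()
}

fn set_bit(words: &mut [u64], node_id: usize) {
    words[node_id / 64] |= 1 << (node_id % 64);
}

fn is_set(words: &[u64], node_id: usize) -> bool {
    words[node_id / 64] >> (node_id % 64) & 1 == 1
}

fn main() {
    let n_words = 2; // enough for 128 nodes
    let mut is_cisco = vec![0u64; n_words];
    let mut is_critical = vec![0u64; n_words];
    for id in [3, 70, 99] { set_bit(&mut is_cisco, id); }
    for id in [70, 99, 100] { set_bit(&mut is_critical, id); }

    // "What": the server answers "critical Cisco devices" with one AND.
    let mask = intersect(&is_cisco, &is_critical);
    assert!(is_set(&mask, 70) && is_set(&mask, 99));
    assert!(!is_set(&mask, 3) && !is_set(&mask, 100));
    // `mask` is the tiny bitmask shipped to the client.
}
```

The client never re-evaluates the predicate; it only consumes the mask.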

Pillar 3: The Brain (Deep Causality & Rustler)

  • Implementation: A multi-stage causal reasoning engine (via the deep_causality Rust crate) evaluates telemetry (SNMP, Flow, BGP, Security) to distinguish between a Root Cause and an Inferred Symptom.

3. The Data Pipeline: "Telemetry to Vision"

  1. Ingestion: Elixir (Phoenix) ingests high-velocity telemetry (Availability, Traffic, Security, Routing).
  2. Causal Fusion: The Rust NIF processes these signals. If a link is congested and Netflow shows a "Top Talker" with a "Malicious IP," it is flagged as Security-Induced Congestion.
  3. Encoding: Rust generates the Arrow buffer (spatial data) and the Selection Bitmaps (state metadata).
  4. Streaming: The binary blob is pushed to the frontend.
  5. Direct-Map: deck.gl (WebGL2/WebGPU) receives the buffer and updates the 100k nodes in a single draw call.
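Step 3 ("Encoding") amounts to flattening node records into one contiguous, columnar buffer per attribute. A minimal sketch; the field names are illustrative, not the actual ServiceRadar schema:

```rust
// Sketch of the Encoding step: struct-of-arrays flattening into the
// kind of columnar buffers an Arrow IPC stream carries. Field names
// are illustrative.
struct Node { x: f32, y: f32, status: u8 }

/// One contiguous buffer per attribute, ready to be mapped by the GPU
/// without any per-node parsing on the client.
fn encode(nodes: &[Node]) -> (Vec<f32>, Vec<u8>) {
    let mut positions = Vec::with_capacity(nodes.len() * 2);
    let mut status = Vec::with_capacity(nodes.len());
    for n in nodes {
        positions.extend_from_slice(&[n.x, n.y]);
        status.push(n.status);
    }
    (positions, status)
}

fn main() {
    let nodes = vec![
        Node { x: 1.0, y: 2.0, status: 0 }, // healthy
        Node { x: 3.0, y: 4.0, status: 2 }, // critical
    ];
    let (positions, status) = encode(&nodes);
    assert_eq!(positions, vec![1.0, 2.0, 3.0, 4.0]);
    assert_eq!(status, vec![0, 2]);
}
```

Because every attribute is a flat typed array, the browser can hand the bytes to WebGL as-is, which is what makes the single-draw-call update possible.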

4. The Hybrid Filtering & Ghosting Engine (Architectural Core)

To maintain backend/frontend decoupling while ensuring 60fps performance, we utilize a Hybrid Filter Strategy. This ensures the backend remains GPU-agnostic while the frontend remains logic-light.

4.1 Separation of Concerns

  • Server-Side Logic (The Source of Truth):
    • The Deep Causality engine identifies the "Blast Radius."
    • It generates a Roaring Bitmap of Node IDs categorizing them as "Root Cause," "Affected," or "Healthy."
    • This bitmap is sent as a small binary metadata attachment in the Arrow stream.
    • Backend Benefit: Stays focused on graph math and causality without needing to know WebGL specifics.
  • Client-Side Rendering (The GPU Presentation):
    • deck.gl receives the bitmap and passes it to the GPU as a Vertex Attribute via the DataFilterExtension.
    • The GPU Shader performs the visual transformation:
      • Root Cause: Bright Red + Pulse Animation.
      • Affected: 50% Opacity + Hollow Icon.
      • Healthy/Unrelated: 10% Opacity ("Ghosted").
    • Frontend Benefit: Filtering happens in real-time (0ms) on the GPU. Users can toggle complex filters without the CPU "looping" through 100k elements.
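The bitmap-to-vertex-attribute handoff can be sketched as follows; the category codes (0 = root cause, 1 = affected, 2 = healthy/ghosted) are assumptions for illustration, not the actual wire format:

```rust
// Sketch: decoding the server's category bitmaps into the per-vertex
// attribute the GPU filter consumes. Category codes are illustrative:
// 0 = root cause, 1 = affected, 2 = healthy/ghosted.
fn is_set(words: &[u64], id: usize) -> bool {
    words[id / 64] >> (id % 64) & 1 == 1
}

fn categorize(n_nodes: usize, root: &[u64], affected: &[u64]) -> Vec<u8> {
    (0..n_nodes)
        .map(|id| if is_set(root, id) { 0 } else if is_set(affected, id) { 1 } else { 2 })
        .collect()
}

fn main() {
    // Node 1 is the root cause, nodes 2 and 3 are affected, rest healthy.
    let root = [0b0010u64];
    let affected = [0b1100u64];
    let attr = categorize(5, &root, &affected);
    assert_eq!(attr, vec![2, 0, 1, 1, 2]);
    // `attr` is uploaded once as a vertex attribute; the shader then maps
    // 0 -> pulsing red, 1 -> 50% opacity, 2 -> 10% opacity.
}
```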

4.2 The "Reshape" vs. "Visual" Logic

  • Visual-Only (GPU): Actions like "Hide Cisco devices" or "Highlight the Blast Radius" stay on the client. We only update the bitmask.
  • Structural Reshape (Backend): Actions that require a layout change (e.g., "Collapse all healthy sites into a single meta-node") require the Elixir/Rust backend to recalculate coordinates and push a new Arrow buffer.

5. Multi-Layer Visualization Architecture

We use a "Layered Projection" model to maintain clarity across physical and logical planes.

Layer 1: The Mantle (Physical Infrastructure)

  • Content: Cables, physical switches, routers, chassis.
  • Visuals: Low-saturation, thin charcoal lines (#2A2A2A). Fixed coordinates via background Force-Directed Layout.

Layer 2: The Crust (Logical Topology)

  • Content: BGP Peerings, OSPF Areas, SD-WAN Tunnels.
  • Visuals: ArcLayers (3D curved lines) hovering above the physical base.

Layer 3: The Atmosphere (Telemetry Flow)

  • Content: Real-time Netflow and rperf throughput.
  • Visuals: Animated Particle Shaders. Speed is inversely proportional to Latency; density is proportional to Throughput.

Layer 4: The Security & Causal Perimeter

  • Content: Active threats and Root Cause analysis.
  • Visuals: The "Radar Ripple" (Pulse) and the "Blast Radius" (Ghosting) driven by the Hybrid Filter bitmasks.

6. Advanced UI Features

6.1 Semantic Zoom & Fractal Navigation

  • Global View: Nodes cluster into "Site Orbs."
  • The "Explosion": Smooth fractal transition as users zoom from Site -> Rack -> Chassis -> Port.
  • Teleportation: Spatial search index allows "Flying" the camera to an IP's exact X/Y coordinate.

6.2 Radial Subnet Layouts

To prevent "Sprawl," Leaf/Access nodes are arranged in Compact Radial Clusters around Distribution switches, reducing the visual footprint of large subnets.


7. Aesthetic Specification ("Cyber-Punk Nocturne")

  • Background: Deep Charcoal / Obsidian (#0A0A0A).
  • Healthy State: Neon Cyan / Electric Blue.
  • Congestion: Amber / Electric Orange.
  • BGP/Logical Layer: Vibrant Purple / Magenta.
  • Critical / Root Cause: High-Intensity Pulsing Red.
  • Icons: SDF (Signed Distance Field) icons for crisp scaling at any zoom level.

8. Success Metrics

  1. Scale: Support 100,000 nodes and 250,000 edges without browser lag.
  2. Visual Transition: Switching between "Global View" and "Causal Blast Radius View" (Ghosting) must occur in < 16ms (instantaneous).
  3. Load Time: < 3 seconds from authentication to "God-View."
  4. Framerate: Steady 60fps during pan/zoom on standard GPU hardware.
  5. Network Efficiency: 90% reduction in data overhead compared to JSON-based competitors.

9. Critical Use Case: The "Security-Exfiltration" Incident

  • Signal: A 100Gbps Backbone Link shows 98% utilization.
  • Engine: deep_causality identifies a Server IP as the source, notes an unauthorized Falco process, and matches the destination to a known C2 server.
  • Visual Outcome:
    1. The Backbone Link turns Pulsing Amber.
    2. The Backend pushes a Causal Bitmask to the client.
    3. The GPU instantly Ghosts the rest of the network, draws a Magenta Causal Line to the Server, and initiates a Red Radar Pulse on the source node.
    4. The Operator clicks the line to see the specific PID responsible.
Author
Owner

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2834#issuecomment-3901207836
Original created: 2026-02-14T06:10:42Z


v2:

This updated Definitive Product Requirements Document (PRD) integrates the Wasm-Arrow Bridge into the ServiceRadar "God-View" architecture. This addition elevates the platform from a high-performance web app to a "computationally elite" visualization engine, eliminating the "JavaScript Tax" to ensure a locked 60fps at 100k+ nodes.


PRD: ServiceRadar "God-View" Visualization Engine (Wasm-Arrow Edition)

1. Vision & Executive Summary

To transform "Network Monitoring" into a Cyber-Physical Radar experience. ServiceRadar visualizes massive-scale global infrastructure (100k+ nodes) as a living, breathing organism. By combining Zero-Copy Data Streaming, Wasm-Native Logic, and GPU-Accelerated Rendering, we eliminate "Alert Fatigue" and provide an instant, visual "Blast Radius" for every incident.


2. The High-Performance Technical Stack (The "Four Pillars")

To achieve 60fps performance and sub-second data updates at a scale of 100k nodes/250k edges, we bypass the "JSON/REST Bottleneck" and JavaScript Garbage Collection stutters entirely.

Pillar 1: The Vehicle (Apache Arrow IPC)

  • Implementation: Elixir fetches data from PostgreSQL/AGE and passes pointers to a Rustler NIF (Rust). Rust lays out the graph data (coordinates, colors, status) into memory-mapped binary buffers.
  • Result: 90% reduction in payload size; zero-copy serialization.

Pillar 2: The Engine (Wasm-Arrow Bridge)

  • Implementation: Arrow buffers are streamed via Phoenix Channels directly into WebAssembly (Wasm) Linear Memory.
  • Function: All client-side "logic" (filtering, 3-hop neighbor traversal, coordinate interpolation) happens in Wasm. JavaScript never "touches" individual nodes, preventing Garbage Collection (GC) stutters.
  • Result: A "Game Engine" architecture where the UI remains fluid even during heavy data ingestion.

Pillar 3: The Filter (Hybrid Roaring Bitmaps)

  • Implementation: The backend and Wasm engine maintain compressed bitsets for every attribute.
  • Strategy: The server sends tiny Bitmasks. The GPU performs the rendering (the "How") by using these masks as vertex attributes to toggle visibility or styles in 0ms.

Pillar 4: The Brain (Deep Causality & Rustler)

  • Implementation: A multi-stage causal reasoning engine (via the deep_causality Rust crate) evaluates telemetry (SNMP, Flow, BGP, Security) to distinguish between a Root Cause and an Inferred Symptom.

3. The Data Pipeline: "Telemetry to Vision"

  1. Ingestion: Elixir (Phoenix) ingests high-velocity telemetry signals.
  2. Causal Fusion: The Rust NIF processes signals to identify the "Blast Radius."
  3. Encoding: Rust generates the Arrow buffer (spatial data) and Selection Bitmaps.
  4. Streaming: The binary blob is pushed to the frontend.
  5. Direct-Map: The browser moves the buffer into Wasm Memory. deck.gl (via GeoArrow patterns) reads coordinates directly from the Wasm heap to update 100k nodes in a single draw call.

4. The Hybrid Filtering & Ghosting Engine

4.1 Separation of Concerns

  • Server-Side Logic: Identifies the "Blast Radius" and generates the source-of-truth Roaring Bitmaps.
  • Wasm Logic (Local Interaction): Handles instant user queries (e.g., "Show me all Cisco devices with >50ms latency"). Wasm performs a Columnar Scan on the local Arrow buffer to update the visibility mask without hitting the backend.
  • GPU Rendering: Receives masks from Wasm and applies visual shaders:
    • Root Cause: Bright Red + Pulse Animation.
    • Affected: 50% Opacity + Hollow Icon.
    • Healthy/Unrelated: 10% Opacity ("Ghosted").
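A local query like "Cisco devices with >50 ms latency" reduces to a scan over two columns. A minimal sketch of the Wasm-side columnar scan; the vendor codes and threshold are illustrative:

```rust
// Sketch of the Wasm-side columnar scan: answering "Cisco devices with
// >50 ms latency" by scanning two local Arrow-style columns into a
// visibility mask, never materializing per-node JS objects.
// Vendor codes and the threshold are illustrative.
fn columnar_scan(vendor: &[u16], latency_ms: &[f32], wanted: u16, min_ms: f32) -> Vec<bool> {
    vendor.iter().zip(latency_ms)
        .map(|(&v, &l)| v == wanted && l > min_ms)
        .collect()
}

fn main() {
    const CISCO: u16 = 9; // hypothetical vendor code
    let vendor = [9u16, 9, 2, 9];
    let latency = [80.0f32, 10.0, 90.0, 51.0];
    let mask = columnar_scan(&vendor, &latency, CISCO, 50.0);
    assert_eq!(mask, vec![true, false, false, true]);
}
```

The resulting boolean mask (or its bit-packed form) is what gets pushed to the GPU as the updated visibility attribute.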

4.2 The "3-Hop" Rule (Local Traversal)

When a user clicks a node, the Wasm Engine traverses the graph adjacency list in memory. It identifies neighbors within N hops and updates the "Ghosting Mask" in < 1ms, providing instantaneous visual isolation.
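The traversal above is a bounded breadth-first search. A std-only sketch, with an illustrative adjacency layout:

```rust
use std::collections::VecDeque;

// Sketch of the local traversal: BFS over an in-memory adjacency list,
// keeping every node within `max_hops` of the clicked node visible and
// ghosting the rest. The graph layout is illustrative.
fn ghosting_mask(adj: &[Vec<usize>], clicked: usize, max_hops: usize) -> Vec<bool> {
    let mut visible = vec![false; adj.len()];
    let mut queue = VecDeque::from([(clicked, 0usize)]);
    visible[clicked] = true;
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_hops { continue; }
        for &next in &adj[node] {
            if !visible[next] {
                visible[next] = true;
                queue.push_back((next, depth + 1));
            }
        }
    }
    visible
}

fn main() {
    // Chain 0 - 1 - 2 - 3, plus an isolated node 4.
    let adj = vec![vec![1], vec![0, 2], vec![1, 3], vec![2], vec![]];
    // 2 hops from node 0 reaches nodes 1 and 2; 3 and 4 stay ghosted.
    assert_eq!(ghosting_mask(&adj, 0, 2), vec![true, true, true, false, false]);
}
```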


5. Multi-Layer Visualization Architecture

Layer 1: The Mantle (Physical Infrastructure)

  • Content: Cables, physical switches, routers. Charcoal lines (#2A2A2A).

Layer 2: The Crust (Logical Topology)

  • Content: BGP Peerings, OSPF Areas, SD-WAN Tunnels. ArcLayers (3D curved lines).

Layer 3: The Atmosphere (Telemetry Flow)

  • Content: Netflow/rperf throughput. Animated Particle Shaders calculated in Wasm for 100k+ particles.

Layer 4: The Security & Causal Perimeter

  • Content: Active threats. Driven by the "Radar Ripple" pulse.

6. Advanced UI Features

6.1 Semantic Zoom & Wasm Interpolation

  • The "Explosion": As users zoom from Site -> Rack, Wasm calculates the intermediate X/Y coordinates for 100k nodes in real-time. This ensures "Liquid Motion" transitions that are impossible in standard JavaScript.
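The per-frame math is a bulk linear interpolation over the packed coordinate buffer. A minimal sketch of that inner loop:

```rust
// Sketch of the zoom transition: linearly interpolating every node's
// packed (x, y) coordinates between the "Site" layout and the exploded
// "Rack" layout, the kind of bulk math kept inside Wasm linear memory.
fn lerp_positions(from: &[f32], to: &[f32], t: f32) -> Vec<f32> {
    from.iter().zip(to).map(|(a, b)| a + (b - a) * t).collect()
}

fn main() {
    let site = [0.0f32, 0.0, 10.0, 10.0]; // two nodes, packed x,y
    let rack = [4.0f32, 8.0, 14.0, 2.0];
    // Halfway through the transition (t = 0.5):
    assert_eq!(lerp_positions(&site, &rack, 0.5), vec![2.0, 4.0, 12.0, 6.0]);
}
```

At 100k nodes this is 200k multiply-adds per frame, which is cheap in Wasm and allocation-free if the output buffer is reused.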

6.2 Radial Subnet Layouts

To prevent "Sprawl," Leaf/Access nodes are arranged in Compact Radial Clusters. Wasm handles the radial coordinate math locally to keep the layout snappy.


7. Aesthetic Specification ("Cyber-Punk Nocturne")

  • Background: Deep Charcoal / Obsidian (#0A0A0A).
  • Healthy State: Neon Cyan / Electric Blue.
  • Congestion: Amber / Electric Orange.
  • Critical / Root Cause: High-Intensity Pulsing Red.
  • Icons: SDF (Signed Distance Field) icons for crisp scaling.

8. Success Metrics

  1. Scale: Support 100,000 nodes and 250,000 edges at 60fps.
  2. Zero-Stutter: 0 instances of JavaScript Garbage Collection pauses > 5ms during interaction.
  3. Local Query Latency: "3-Hop" traversal and attribute filtering (Columnar Scan) must complete in < 2ms.
  4. Visual Transition: Switching between "Global" and "Causal" views must occur in < 16ms.
  5. Network Efficiency: 90% reduction in data overhead vs JSON.

9. Critical Use Case: The "Security-Exfiltration" Incident

  • Signal: A 100Gbps Backbone Link shows 98% utilization.
  • Engine: deep_causality identifies a Server IP as the source and a malicious destination.
  • Visual Outcome:
    1. The Backbone Link turns Pulsing Amber.
    2. The Backend pushes a Causal Bitmask to the client.
    3. Wasm receives the bitmask and updates the GPU vertex attributes.
    4. The GPU instantly Ghosts the rest of the network, draws a Magenta Causal Line, and initiates a Red Radar Pulse.
    5. The Operator clicks the node; Wasm instantly highlights all 1st-degree neighbors to show the server's immediate blast radius.
Author
Owner

Imported GitHub comment.

Original author: @marvin-hansen
Original URL: https://github.com/carverauto/serviceradar/issues/2834#issuecomment-3904074197
Original created: 2026-02-15T09:57:17Z


Okay, I took some time to write up my thoughts on the DC integration. No AI, just my humble brain dump:

Service Radar DeepCausality integration

Big idea:

Constructing and updating a context hypergraph in real-time as the various devices in the network are discovered.

DeepCausality enables multi-contextual reasoning across arbitrarily complex hypergraphs. Moreover, because the model abstraction that wraps a causal model and its context holds the context as an Arc<RwLock<...>>, it is also possible to share a global context across different models: Arc is Clone, and the RwLock (a fine-grained reader/writer lock) ensures read/write protection. Thus one can experiment with various causal models reasoning over the same shared global network context graph.

In practice, it is advisable to build and update the graph in tandem with the persistence layer, e.g. the database upsert operation, to ensure data synchronization.
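The sharing pattern can be sketched with std types alone; the Graph struct below is a stand-in for the context, not the deep_causality API:

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Sketch of the shared-context idea: several "models" (threads here)
// read one network context graph behind Arc<RwLock<..>> while the
// discovery path takes the write lock in tandem with the DB upsert.
// `Graph` is a stand-in type, not the deep_causality context API.
#[derive(Default)]
struct Graph { nodes: Vec<String> }

fn main() {
    let ctx = Arc::new(RwLock::new(Graph::default()));

    // Discovery path: write lock, then (conceptually) the same data
    // is upserted into the database.
    ctx.write().unwrap().nodes.push("router-1".into());

    // Two causal models reasoning over the same shared context.
    let handles: Vec<_> = (0..2).map(|_| {
        let ctx = Arc::clone(&ctx);
        thread::spawn(move || ctx.read().unwrap().nodes.len())
    }).collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 1);
    }
}
```

Readers proceed concurrently; only the discovery/upsert path blocks them briefly while holding the write lock.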

Why?

A handful of use cases become trivial to solve with the context graph:

Network diagnostic and reliability detection

A) Detecting mission critical choke points.

Problem:
In large networks, it is rarely fully known where all the non-obvious bottlenecks are buried. However, if just one of those highly centralized routers or gateways were to fail, the bulk of the network would instantly be disconnected.

Solution: This is actually trivial: one only needs to create a deep copy (clone) of the current network context graph, freeze it, run the betweenness_centrality() algo, sort the results by centrality score, and highlight the top N nodes. Betweenness centrality measures relative path-through connectivity, meaning a high score implies that the highest number of network paths go through this node, and it is therefore implicitly mission-critical. In the UI it is advisable to set N to a sensible default, e.g. N=5 to identify the top 5 mission-critical network nodes. However, the user should also be able to set N to a custom value.
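For intuition, here is a std-only sketch of what a betweenness-centrality pass computes (Brandes' algorithm on an unweighted, undirected graph); the betweenness_centrality() call itself belongs to the graph API, and this toy topology is illustrative:

```rust
use std::collections::VecDeque;

// Brandes' betweenness centrality, unweighted and undirected.
// Std-only sketch for intuition; not the deep_causality API.
fn betweenness(adj: &[Vec<usize>]) -> Vec<f64> {
    let n = adj.len();
    let mut bc = vec![0.0; n];
    for s in 0..n {
        let mut stack = Vec::new();
        let mut pred: Vec<Vec<usize>> = vec![Vec::new(); n];
        let mut sigma = vec![0.0f64; n]; // shortest-path counts
        let mut dist = vec![-1i64; n];
        sigma[s] = 1.0;
        dist[s] = 0;
        let mut queue = VecDeque::from([s]);
        while let Some(v) = queue.pop_front() {
            stack.push(v);
            for &w in &adj[v] {
                if dist[w] < 0 {
                    dist[w] = dist[v] + 1;
                    queue.push_back(w);
                }
                if dist[w] == dist[v] + 1 {
                    sigma[w] += sigma[v];
                    pred[w].push(v);
                }
            }
        }
        // Accumulate dependencies in reverse BFS order.
        let mut delta = vec![0.0f64; n];
        while let Some(w) = stack.pop() {
            for &v in &pred[w] {
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w]);
            }
            if w != s {
                bc[w] += delta[w];
            }
        }
    }
    // Undirected graph: every pair was counted from both endpoints.
    bc.into_iter().map(|x| x / 2.0).collect()
}

fn main() {
    // Star: node 1 sits on every path between leaves 0, 2 and 3.
    let adj = vec![vec![1], vec![0, 2, 3], vec![1], vec![1]];
    let bc = betweenness(&adj);
    assert_eq!(bc[1], 3.0); // on all 3 leaf pairs: the choke point
    assert_eq!(bc[0], 0.0);
}
```

Sorting the scores descending and taking the first N gives exactly the "top N mission-critical nodes" list described above.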

Value:
If just a single unmitigated choke point is upgraded to HA failover, a complete network takedown can be prevented.
The most important incident is always the one that never occurred because of effective mitigation.

B) Identifying over-centralized nodes

Problem:
As networks grow large, it is possible that certain services become over-centralized and thereby a structural risk:

  • Enabling trivial Denial of Service by taking out one central service e.g. DNS / DHCP
  • Enabling central malware distribution if not adequately secured

Solution:
Trivial, just clone the graph, freeze and run

strongly_connected_components()

This returns node sets, where each set represents one strongly connected component. These would be central routers, DNS servers, and core network services. Once these are identified, a network security audit can begin with mitigation.
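For intuition, a std-only sketch of what strongly_connected_components() computes (Kosaraju's algorithm on a directed graph); the real call belongs to the graph API, and the toy graph is illustrative:

```rust
// Kosaraju's strongly connected components on a directed graph.
// Std-only sketch for intuition; not the deep_causality API.
fn sccs(adj: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let n = adj.len();
    // Pass 1: DFS finish order on the original graph.
    let mut order = Vec::new();
    let mut seen = vec![false; n];
    fn dfs(v: usize, adj: &[Vec<usize>], seen: &mut [bool], order: &mut Vec<usize>) {
        seen[v] = true;
        for &w in &adj[v] { if !seen[w] { dfs(w, adj, seen, order); } }
        order.push(v);
    }
    for v in 0..n { if !seen[v] { dfs(v, adj, &mut seen, &mut order); } }

    // Pass 2: collect components on the reversed graph.
    let mut radj = vec![Vec::new(); n];
    for v in 0..n { for &w in &adj[v] { radj[w].push(v); } }
    let mut comp = vec![usize::MAX; n];
    let mut components: Vec<Vec<usize>> = Vec::new();
    for &v in order.iter().rev() {
        if comp[v] == usize::MAX {
            let id = components.len();
            let mut stack = vec![v];
            let mut members = Vec::new();
            comp[v] = id;
            while let Some(x) = stack.pop() {
                members.push(x);
                for &w in &radj[x] {
                    if comp[w] == usize::MAX { comp[w] = id; stack.push(w); }
                }
            }
            components.push(members);
        }
    }
    components
}

fn main() {
    // 0 <-> 1 form a cycle; 2 is only downstream of it.
    let adj = vec![vec![1], vec![0, 2], vec![]];
    let mut sizes: Vec<usize> = sccs(&adj).iter().map(|c| c.len()).collect();
    sizes.sort();
    assert_eq!(sizes, vec![1, 2]); // one 2-node component, one singleton
}
```

Large components on a directed dependency graph are exactly the tightly coupled service clusters worth auditing first.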

Value:
If just one of those core services is made redundant through proper HA failover, another potential network takedown has been mitigated before it could happen.

C) Testing network pathways

Problem:
As networks grow large, debugging connectivity issues becomes increasingly complex. Also, for security reasons, some network nodes should not be reachable from some network segments.

Solution:
Trivial, just clone the graph, freeze and run

is_reachable(start_index, stop_index)

This shows instantly whether the stop node is reachable from the start node. It verifies that security policies have been enforced correctly or, equally valid, answers the question of why a certain service is not reachable.
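A std-only sketch of what such an is_reachable(start, stop) check does (a plain BFS over the frozen graph's adjacency list); the segment layout is illustrative:

```rust
use std::collections::VecDeque;

// BFS reachability over an adjacency list: the essence of an
// is_reachable(start, stop) check on the frozen context graph.
// Std-only sketch for intuition; not the deep_causality API.
fn is_reachable(adj: &[Vec<usize>], start: usize, stop: usize) -> bool {
    let mut seen = vec![false; adj.len()];
    let mut queue = VecDeque::from([start]);
    seen[start] = true;
    while let Some(v) = queue.pop_front() {
        if v == stop { return true; }
        for &w in &adj[v] {
            if !seen[w] { seen[w] = true; queue.push_back(w); }
        }
    }
    false
}

fn main() {
    // Segment nodes 0 -> 1 feed core node 2; node 3 is firewalled off.
    let adj = vec![vec![1], vec![2], vec![], vec![2]];
    assert!(is_reachable(&adj, 0, 2));  // a route exists
    assert!(!is_reachable(&adj, 0, 3)); // policy holds: no path
}
```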

Value:

  • In case of connectivity issues, this gives an instant answer as to whether there even is a route from the source to the target.
  • It validates whether network security policies have been correctly implemented and thus supports (external) security audits.

Network security and intrusion detection

Problem:

Advanced Persistent Threats (APT) pose a significant challenge because adversaries spread out network infiltration over time and camouflage their activities as regular traffic that would normally remain undetected.

Here, larger network size becomes a major complication because it’s impractical to deploy an in-depth IDS on every single device, mainly because of the heterogeneous platforms and systems of all connected devices.

Note: The WiFi scan and monitor capabilities would massively help here to capture the network 360 degrees by keeping an eye on all wired and wireless devices. That way, one can block wireless devices the moment they try to do anything stupid long before anything else breaks deep down in the network.

Approach:

Because the network graph represents all discovered network devices and captures traffic between devices, one can deploy multiple causal models to watch for multiple anomalies in the network hypergraph.

For one, one can deploy certain rules, e.g. workstations of network segment X are only allowed to connect to printers and SMB shares of the same segment, but not to certain other core services. If such a rule were violated, an alert and/or silent mitigation could be triggered.
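Such a segment rule reduces to a table lookup over observed edges. The following is a hypothetical Rust sketch; the device names, segment labels, and the policy_violations helper are all invented for the example:

```rust
use std::collections::{HashMap, HashSet};

/// Flag every observed edge whose (source segment, destination segment)
/// pair is not listed in the allow table.
fn policy_violations<'a>(
    segment_of: &HashMap<&'a str, &'a str>,
    allowed: &HashSet<(&'a str, &'a str)>,
    observed_edges: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    observed_edges
        .iter()
        .filter(|&&(src, dst)| !allowed.contains(&(segment_of[src], segment_of[dst])))
        .copied()
        .collect()
}

fn main() {
    let segment_of = HashMap::from([
        ("ws-01", "office"),
        ("printer-01", "office"),
        ("db-01", "core"),
    ]);
    // Office machines may only talk within the office segment.
    let allowed = HashSet::from([("office", "office")]);
    let observed = [("ws-01", "printer-01"), ("ws-01", "db-01")];
    let violations = policy_violations(&segment_of, &allowed, &observed);
    println!("{:?}", violations); // prints "[(\"ws-01\", \"db-01\")]"
}
```

A causal model wrapping this check could then emit the alert or trigger the silent mitigation mentioned above.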

Then one can instantly detect and capture the blast radius of a compromised machine, provided an anomaly has been detected, by simply querying for all edges of that node. How “dangerous” a compromised node is can be determined by testing whether there is any pathway from the compromised node to a number of mission-critical nodes. The more pathways exist, the higher the danger and the swifter the countermeasure must be.
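A sketch of that blast-radius query: one BFS from the compromised node yields both its direct edges and the subset of mission-critical nodes it can still reach. The blast_radius helper and the toy graph are invented for the example:

```rust
use std::collections::VecDeque;

/// Returns the direct neighbors of the compromised node (its immediate
/// blast radius) and how many mission-critical nodes it can reach at all.
fn blast_radius(
    adj: &[Vec<usize>],
    compromised: usize,
    critical: &[usize],
) -> (Vec<usize>, usize) {
    // One BFS marks everything reachable from the compromised node.
    let mut seen = vec![false; adj.len()];
    seen[compromised] = true;
    let mut queue = VecDeque::new();
    queue.push_back(compromised);
    while let Some(v) = queue.pop_front() {
        for &w in &adj[v] {
            if !seen[w] {
                seen[w] = true;
                queue.push_back(w);
            }
        }
    }
    let reachable_critical = critical
        .iter()
        .filter(|&&c| c != compromised && seen[c])
        .count();
    (adj[compromised].clone(), reachable_critical)
}

fn main() {
    // Node 0 is compromised; nodes 2 and 3 are mission critical,
    // but only node 2 is on a path from node 0.
    let adj = vec![vec![1], vec![2], vec![], vec![]];
    let (neighbors, hits) = blast_radius(&adj, 0, &[2, 3]);
    println!("direct edges: {:?}, critical nodes exposed: {}", neighbors, hits);
}
```

The count of exposed critical nodes is a natural input for prioritizing the countermeasure.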

A central challenge is anomaly detection itself because, as stated before, APTs tend to camouflage themselves as regular network traffic. A compromised SMB server, for example, would send out some kind of SMB traffic to adjacent SMB servers to obtain access to other file servers. One key distinction between “normal” and “anomalous” traffic lies in the details of the handshake, or network header. For example, there were historic CVEs where SMB was compromised by a buffer overflow caused by an oversized network header. Likewise, a classic Denial of Service usually aborts TCP handshakes in an attempt to exhaust the host’s open-connection limit.

Therefore, the causal rules can only be effective when combined with deep packet inspection that scans each protocol for standard conformance. This part is comparatively easy to implement because it’s rather hard to trigger a buffer overflow on the receiving host when network packets with non-standard headers simply never arrive.

Disallowing certain hosts from connecting to certain network nodes should, in theory, be handled on an internal router, but in practice it’s useful to have the functionality in place for those networks that don’t secure internal routes.

Preliminary solution:

Priority 1:

  • Implement WiFi network and WiFi device discovery to enable 360° visibility across all network types.

  • Implement a real-time network graph for both wired and wireless networking.

Priority 2:

  • Implement the trivial use cases for network diagnostics and reliability detection. Even though these are fairly basic, the value is obvious enough to justify the context graph.

Priority 3:

Once 360° wired and wireless visibility is in place and the network hypergraph runs stably in the background, it’s time to design an end-to-end advanced APT detection system.

Imported GitHub comment. Original author: @marvin-hansen Original URL: https://github.com/carverauto/serviceradar/issues/2834#issuecomment-3904074197 Original created: 2026-02-15T09:57:17Z