feat: NG visualizations / deepcausality integration #1028
Imported from GitHub.
Original GitHub issue: #2834
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2834
Original created: 2026-02-14T04:30:53Z
This is the Definitive Product Requirements Document (PRD) for the ServiceRadar "God-View" Topology Platform. This edition integrates the Hybrid Filter Strategy, ensuring a clean decoupling between the high-performance backend and the GPU-accelerated frontend.
PRD: ServiceRadar "God-View" Visualization Engine (Integrated Edition)
1. Vision & Executive Summary
To transform "Network Monitoring" into a Cyber-Physical Radar experience. ServiceRadar visualizes massive-scale global infrastructure (100k+ nodes) as a living, breathing organism. By combining Zero-Copy Data Streaming, GPU-Native Rendering, and Deep Causal Inference, we eliminate "Alert Fatigue" and provide an instant, visual "Blast Radius" for every incident.
2. The High-Performance Technical Stack (The "Three Pillars")
To achieve 60fps performance and sub-second data updates at a scale of 100k nodes/250k edges, we bypass the "JSON/REST Bottleneck" entirely.
Pillar 1: The Vehicle (Apache Arrow IPC)
Telemetry is streamed as columnar Apache Arrow IPC buffers and consumed zero-copy by deck.gl.

Pillar 2: The Filter (Hybrid Roaring Bitmaps)
Compressed Roaring bitmaps encode categorical node predicates (e.g., is_cisco, is_critical).

Pillar 3: The Brain (Deep Causality & Rustler)
The causal engine (deep_causality Rust crate) evaluates telemetry (SNMP, Flow, BGP, Security) to distinguish between a Root Cause and an Inferred Symptom.

3. The Data Pipeline: "Telemetry to Vision"
deck.gl (WebGL2/WebGPU) receives the buffer and updates the 100k nodes in a single draw call.

4. The Hybrid Filtering & Ghosting Engine (Architectural Core)
To maintain backend/frontend decoupling while ensuring 60fps performance, we utilize a Hybrid Filter Strategy. This ensures the backend remains GPU-agnostic while the frontend remains logic-light.
4.1 Separation of Concerns
deck.gl receives the bitmap and passes it to the GPU as a Vertex Attribute via the DataFilterExtension.

4.2 The "Reshape" vs. "Visual" Logic
5. Multi-Layer Visualization Architecture
We use a "Layered Projection" model to maintain clarity across physical and logical planes.
Layer 1: The Mantle (Physical Infrastructure)
Layer 2: The Crust (Logical Topology)
Layer 3: The Atmosphere (Telemetry Flow)
Particle flows scaled by rperf throughput.

Layer 4: The Security & Causal Perimeter
6. Advanced UI Features
6.1 Semantic Zoom & Fractal Navigation
6.2 Radial Subnet Layouts
To prevent "Sprawl," Leaf/Access nodes are arranged in Compact Radial Clusters around Distribution switches, reducing the visual footprint of large subnets.
7. Aesthetic Specification ("Cyber-Punk Nocturne")
8. Success Metrics
9. Critical Use Case: The "Security-Exfiltration" Incident
deep_causality identifies a Server IP as the source, notes an unauthorized Falco process, and matches the destination to a known C2 server.

Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2834#issuecomment-3901207836
Original created: 2026-02-14T06:10:42Z
v2:
This updated Definitive Product Requirements Document (PRD) integrates the Wasm-Arrow Bridge into the ServiceRadar "God-View" architecture. This addition elevates the platform from a high-performance web app to a "computationally elite" visualization engine, eliminating the "JavaScript Tax" to ensure a locked 60fps at 100k+ nodes.
PRD: ServiceRadar "God-View" Visualization Engine (Wasm-Arrow Edition)
1. Vision & Executive Summary
To transform "Network Monitoring" into a Cyber-Physical Radar experience. ServiceRadar visualizes massive-scale global infrastructure (100k+ nodes) as a living, breathing organism. By combining Zero-Copy Data Streaming, Wasm-Native Logic, and GPU-Accelerated Rendering, we eliminate "Alert Fatigue" and provide an instant, visual "Blast Radius" for every incident.
2. The High-Performance Technical Stack (The "Four Pillars")
To achieve 60fps performance and sub-second data updates at a scale of 100k nodes/250k edges, we bypass the "JSON/REST Bottleneck" and JavaScript Garbage Collection stutters entirely.
Pillar 1: The Vehicle (Apache Arrow IPC)
Pillar 2: The Engine (Wasm-Arrow Bridge)
Pillar 3: The Filter (Hybrid Roaring Bitmaps)
Pillar 4: The Brain (Deep Causality & Rustler)
The causal engine (deep_causality Rust crate) evaluates telemetry (SNMP, Flow, BGP, Security) to distinguish between a Root Cause and an Inferred Symptom.

3. The Data Pipeline: "Telemetry to Vision"
deck.gl (via GeoArrow patterns) reads coordinates directly from the Wasm heap to update 100k nodes in a single draw call.

4. The Hybrid Filtering & Ghosting Engine
4.1 Separation of Concerns
4.2 The "3-Hop" Rule (Local Traversal)
When a user clicks a node, the Wasm Engine traverses the graph adjacency list in memory. It identifies neighbors within N hops and updates the "Ghosting Mask" in < 1ms, providing instantaneous visual isolation.

5. Multi-Layer Visualization Architecture
Layer 1: The Mantle (Physical Infrastructure)
Layer 2: The Crust (Logical Topology)
Layer 3: The Atmosphere (Telemetry Flow)
Particle flows scaled by rperf throughput. Animated Particle Shaders calculated in Wasm for 100k+ particles.

Layer 4: The Security & Causal Perimeter
6. Advanced UI Features
6.1 Semantic Zoom & Wasm Interpolation
6.2 Radial Subnet Layouts
To prevent "Sprawl," Leaf/Access nodes are arranged in Compact Radial Clusters. Wasm handles the radial coordinate math locally to keep the layout snappy.
7. Aesthetic Specification ("Cyber-Punk Nocturne")
8. Success Metrics
9. Critical Use Case: The "Security-Exfiltration" Incident
deep_causality identifies a Server IP as the source and a malicious destination.

Imported GitHub comment.
Original author: @marvin-hansen
Original URL: https://github.com/carverauto/serviceradar/issues/2834#issuecomment-3904074197
Original created: 2026-02-15T09:57:17Z
Okay, I took some time to write up my thoughts on the DC integration. No AI, just my humble brain dump:
Service Radar DeepCausality integration
Big idea:
Constructing and updating a context hyper graph in real-time as the various devices in the network are discovered.
DeepCausality enables multi-contextual reasoning across arbitrarily complex hypergraphs. Moreover, because the model abstraction that wraps a causal model and its context defines the context as an Arc<RwLock<Context>>, it is also possible to share a global context across different models: Arc is Clone, and the RwLock (a fine-grained mutex) ensures read/write protection. Thus one can experiment with various causal models reasoning over the same shared global network context graph.
In practice, it is advisable to build and update the graph in tandem with persistence, e.g. the database upsert operation, to ensure data synchronization.
Why?
A handful of use cases become trivial to solve with the context graph:
Network diagnostic and reliability detection
A) Detecting mission critical choke points.
Problem:
In large networks, it is rarely fully known where all the bottlenecks are buried. However, if just one of those highly centralized routers or gateways were to fail, the bulk of the network would instantly be disconnected.
Solution: This is actually trivial because one only needs to create a deep copy (clone) of the current network context graph, freeze it, run the betweenness_centrality() algorithm, sort the results by centrality score, and highlight the top N nodes. Betweenness centrality measures relative path-through connectivity: a high score implies that a large number of network paths go through this node, and therefore it is implicitly mission critical. In the UI it is advisable to set N to a sensible default, e.g. N=5, to identify the top 5 mission-critical network nodes; however, the user should also be able to set N to a custom value.
Value:
If just one unmitigated choke point were upgraded to HA failover, a complete network takedown could be prevented.
The most important incident is always the one that never occurred because of effective mitigation.
B) Identifying over-centralized nodes
Problem:
As networks grow large, it is possible that certain services become over-centralized and thereby a structural risk.
Solution:
Trivial: clone the graph, freeze it, and run

strongly_connected_components()

which returns node sets, where each set represents one strongly connected component. In practice these will be central routers, DNS servers, and core network services. Once these are identified, a network security audit can begin with mitigation.
Value:
If just one of those core services is made redundant through proper HA failover, another potential network takedown has been mitigated before it could happen.
C) Testing network pathways
Problem:
As networks grow large, debugging connectivity issues becomes increasingly complex. Also, for security reasons, some network nodes should not be reachable from certain network segments.
Solution:
Trivial: clone the graph, freeze it, and run

is_reachable(start_index, stop_index)

which shows instantly whether the stop node is reachable from the start node. This verifies that security policies have been enforced correctly or, equally valid, answers the question why a certain service is not reachable.
Value:
Network security and intrusion detection
Problem:
Advanced Persistent Threats (APTs) pose a significant challenge because adversaries spread network infiltration out over time and camouflage their activities as regular traffic that would normally remain undetected.
Here, larger network size becomes a major complication because it is impractical to deploy an in-depth IDS on every single device, mainly because of the heterogeneous platforms and systems of all connected devices.
Note: The WiFi scan and monitor capabilities would massively help here to capture the network in 360 degrees by keeping an eye on all wired and wireless devices. That way, one can block wireless devices the moment they try to do anything stupid, long before anything else breaks deep down in the network.
Approach:
Because the network graph represents all discovered network devices and captures traffic between devices, one can deploy multiple causal models watching for multiple anomalies in the network hypergraph.
For one, one can deploy certain rules, e.g. workstations in network segment X are only allowed to connect to printers and SMB shares in the same segment, but not to certain other core services. If that rule were violated, an alert and/or silent mitigation could be triggered.
Then one can instantly detect and capture the blast radius of a compromised machine, provided one has detected an anomaly, by simply querying for all edges of that node. How "dangerous" a compromised node is can be determined by testing whether there is any pathway from the compromised node to a number of mission-critical nodes. The more pathways exist, the higher the danger and the swifter the countermeasure.
A central challenge is anomaly detection itself because, as stated before, APTs tend to camouflage as regular network traffic. Meaning, a compromised SMB server would try to send out some kind of SMB traffic to adjacent SMB servers to obtain access to other file servers. One key distinction between "normal" and "anomalous" network traffic lies in the details of the handshake or network header. For example, there were historic CVEs where SMB was compromised by a buffer overflow caused by an oversized network header. Likewise, a classic Denial of Service usually aborts TCP handshakes in an attempt to exhaust the host's open-connection limit.
Therefore, the causal rules can only be effective when combined with deep packet inspection that scans each protocol for standards conformance. This part is easier to implement because it is relatively hard to trigger a buffer overflow on the receiving host when network packets with non-standard headers simply never arrive.
Disallowing certain hosts from connecting to certain network nodes should, in theory, be handled on an internal router, but in practice it is useful to have the functionality in place for those networks that don't secure internal routes.
Preliminary solution:
Priority 1:
Implement WiFi network and WiFi device discovery to enable 360-degree visibility across all network types.
Implement a real-time network graph for both wired and wireless networking.
Priority 2:
Priority 3:
Once 360-degree wired and wireless visibility is in place and the network hypergraph runs stably in the background, it is time to design an end-to-end advanced APT detection system.