feat(agent): Agent updates/SIEM #1071
Imported from GitHub.
Original GitHub issue: #2936
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2936
Original created: 2026-02-27T21:15:40Z
Product Requirements Document: ServiceRadar Next-Gen SIEM & Observability Platform
Version: 2.0
Author: Carver Automation Corporation
Date: February 2026
Status: Draft
1. Executive Summary
ServiceRadar is evolving from a network visibility tool into a true Single Pane of Glass (SPOG) for automated datacenter management, observability, and next-generation SIEM. Target environments include bare metal servers, Proxmox (VMs/LXCs), and Kubernetes clusters.
By utilizing a pure-Golang single-binary agent, a high-speed gRPC/NATS data plane, an Elixir/BEAM backend, and a heavily extended CloudNativePG (CNPG) database, ServiceRadar will provide Datadog/CrowdStrike-tier capabilities—entirely open-source and natively within our own infrastructure.
1.1 Design Principles
serviceradar-agent). No Wazuh, no Fleet/osquery, no legacy C/C++ agents on endpoints.
2. Architectural Overview
The platform is divided into four strictly controlled layers, with one specific exception for Kubernetes-native security telemetry.
2.1 Layer Diagram
2.2 Data Flow Summary
Subjects: telemetry.metrics.edge (gopsutil), security.falco.alerts, scan.trivy.results, security.xdp.drops, topology.connections, commands.agent.{id}
3. Core Epics & Feature Requirements
Epic 1: The Edge Go Agent (serviceradar-agent)
Goal: Eliminate agent fatigue at the edge with one highly optimized Go binary. No Wazuh, no Fleet/osquery: deploy one binary that does everything.
1.1 Native Telemetry (gopsutil + Container SDKs)
shirou/gopsutil.
Docker Go SDK (github.com/docker/docker/client) and LXC Go SDK (github.com/lxc/go-lxc) to map raw Linux PIDs to specific container names, image tags, and Proxmox LXC IDs.
gopsutil/net.Connections() to map active network connections for topology discovery.
1.2 Embedded Vulnerability Scanner (Trivy)
dpkg/rpm parsing; its rootfs scan mode outputs structured JSON of all installed OS packages, Python packages, Node modules, etc.
1.3 eBPF / XDP Network Security (Edge Firewalling)
cilium/ebpf to drop malicious packets at wire-speed on bare metal and Proxmox hosts.
1.4 Kubernetes Runtime Security (The Falco Exception)
security.falco.alerts JetStream subject.
agent-gateway would be an unnecessary bottleneck.
Epic 2: The Data Plane (gRPC + NATS JetStream)
Goal: Create a massively scalable, back-pressure-resistant pipeline. Edge agents are strictly proxied through the gateway, while trusted K8s workloads get direct NATS access.
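As a rough illustration of this split, the sketch below maps each edge telemetry kind to the JetStream subject named in this document and checks a subject against a NATS-style wildcard pattern. Only the subject names come from this PRD; the function names, the kind strings, and the idea of a gateway allowlist are assumptions made for illustration.

```python
# Illustrative only: subject names come from this PRD, everything else is hypothetical.
EDGE_SUBJECTS = {
    "metrics": "telemetry.metrics.edge",
    "trivy": "scan.trivy.results",
    "xdp_drop": "security.xdp.drops",
}
FALCO_SUBJECT = "security.falco.alerts"  # published directly by trusted K8s workloads


def command_subject(agent_id: str) -> str:
    """Per-agent command subject, following the commands.agent.{agent_id} pattern."""
    return f"commands.agent.{agent_id}"


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style matching: '*' matches one token, '>' matches one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least one token to match
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def gateway_may_publish(subject: str) -> bool:
    """Hypothetical gateway allowlist: the edge subjects only, never the Falco subject."""
    return subject in EDGE_SUBJECTS.values()
```

A pattern such as commands.agent.* would let a single consumer observe commands for every agent, while each agent subscribes only to its own subject.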
2.1 Command & Control (Bi-Directional gRPC Streaming)
agent-gateway.
commands.agent.{agent_id}.
2.2 Edge Telemetry Firehose (Client-Side gRPC Streaming)
gopsutil metrics, Trivy results, and XDP drop events, streaming them to the gateway via client-side gRPC streaming.
telemetry.metrics.edge, scan.trivy.results, security.xdp.drops).
2.3 The Kubernetes Exception (Direct NATS Ingestion)
agent-gateway entirely.
security.falco.alerts.
2.4 Air-Gapped Trivy DB Distribution (Server-Side gRPC Streaming)
The Trivy CVE database can be 50–100MB+. This pipeline ensures agents can scan offline without hitting GitHub rate limits.
{"event": "trivy_db_update", "version": "v2.1.0"}.
Epic 3: Smart Execution & Remediation (AWX Engine)
Goal: ServiceRadar acts as the brain, AWX acts as the hands. AWX is already deployed in the K8s cluster and provides a robust REST API over Ansible.
3.1 Event-Driven Patching
POST /api/v2/job_templates/{id}/launch/), passing the target Proxmox LXC/VM hostname as an extra-var.
apt, yum, win_updates modules), and sends a webhook back to ServiceRadar when the job completes.
3.2 Running vs. Dormant Vulnerability Correlation
This is the key differentiator—correlating static vulnerability data with live runtime state.
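A minimal sketch of that correlation, assuming the scanner yields findings that name the affected binary and the agent reports the executable paths of running processes. The Finding shape, its field names, and the "running"/"dormant" return values are illustrative assumptions, not the project's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve_id: str
    package: str
    binary_path: str  # hypothetical: path of a binary shipped by the vulnerable package


def classify(findings, running_paths):
    """Label each finding 'running' if its binary is executing now, else 'dormant'."""
    running = set(running_paths)
    return {
        (f.cve_id, f.binary_path): ("running" if f.binary_path in running else "dormant")
        for f in findings
    }
```

A real implementation would also need to account for shared libraries loaded by running processes, which a plain executable-path comparison does not capture.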
gopsutil process data shows which binaries are currently executing in memory.
3.3 AWX Integration Setup
patch_server.yml, isolate_host.yml).
community.general.proxmox) to manage VMs/LXCs, and kubernetes.core for K8s cluster state.
Epic 4: Next-Gen SIEM & Observability Datastore (CNPG)
Goal: Replace Elasticsearch, OpenSearch, and Neo4j by pushing all advanced analytics directly into heavily optimized PostgreSQL extensions within CloudNativePG.
4.1 Time-Series Metrics & Logs (TimescaleDB)
gopsutil system metrics (CPU, RAM, disk, network I/O), XDP firewall drop logs, and structured log streams.
4.2 Relational Tables (Standard PostgreSQL)
has_many packages, a package belongs_to_many CVEs. Standard Postgres handles this naturally with Ash Framework resources.
4.3 Network Graphing & Blast Radius (Apache AGE)
gopsutil/net.Connections() and sends connection data to core-elx over NATS.
(Process A on Host 1) -[CONNECTS_TO]-> (Port 5432 on Host 2).
4.4 Full-Text Search (ParadeDB / BM25)
bm25 index within CNPG.
4.5 Semantic & AI Search (pgvector / HNSW)
tail -n 10 /etc/shadow instead of cat /etc/shadow). Keywords miss this; vector similarity catches it.
all-MiniLM-L6-v2, 22–90MB) entirely within the BEAM VM on CPU.
4. Feature Deep-Dive: SRQL (ServiceRadar Query Language)
To provide a world-class threat hunting experience, ServiceRadar introduces SRQL, a custom DSL parsed by Elixir that compiles down to highly optimized Postgres queries leveraging ParadeDB (BM25) and pgvector (semantic search).
4.1 SRQL Syntax & Compilation
SRQL allows analysts to pipe (|) exact keyword matches into semantic filters natively in the UI search bar.
Example 1: Pure ParadeDB Search (Keyword)
Elixir compiles this to a ParadeDB paradedb.search() query using BM25 scoring. Extremely fast exact matches.
Example 2: Semantic Threat Hunting (pgvector)
Elixir uses Bumblebee to vectorize the string, then compiles to:
Example 3: Hybrid Query (Keyword → Semantic Pipeline)
Compilation:
host:proxmox-node-01 AND severity:high (reducing 50M rows to ~5,000).
Result: AI-driven threat hunting with minimal latency, entirely on-premise.
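The keyword-then-semantic pipeline can be sketched in plain Python: an exact-match filter narrows the candidate set first, and only the survivors are ranked by vector similarity. In ServiceRadar the first stage would be a ParadeDB query and the second a pgvector search; the in-memory rows, field names, and function names here are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(rows, filters, query_vec, top_k=3):
    """Stage 1: exact keyword filter. Stage 2: rank survivors by cosine similarity."""
    candidates = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r["embedding"], query_vec),
        reverse=True,
    )
    return ranked[:top_k]
```

Running the cheap filter first is what keeps the expensive similarity ranking bounded to a few thousand rows rather than the full table.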
4.2 Automated Semantic Deduplication ("Smart Alert Grouping")
Instead of overwhelming the UI with 10,000 Falco syscall alerts or XDP drop logs, core-elx runs a vector clustering algorithm. The UI displays: "10,000 events occurred, but they represent only 3 semantically unique attack patterns."
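One simple way to get this kind of grouping is greedy threshold clustering over the event embeddings: each event joins the first cluster whose representative vector is similar enough, otherwise it starts a new cluster. The PRD does not name the clustering algorithm, so the approach, the 0.9 threshold, and the function names below are assumptions.

```python
import math


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_events(embeddings, threshold=0.9):
    """Greedy clustering: returns a list of clusters, each a list of event indices."""
    clusters = []  # each entry is (representative_vector, [member indices])
    for idx, vec in enumerate(embeddings):
        for rep, members in clusters:
            if _cosine(rep, vec) >= threshold:
                members.append(idx)
                break
        else:
            # No existing cluster was close enough, so this event starts a new one.
            clusters.append((vec, [idx]))
    return [members for _, members in clusters]
```

The number of clusters returned corresponds to the count of "semantically unique attack patterns" shown in the UI, and the cluster sizes give the per-pattern event counts.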
5. Embedding Strategy: CPU-First, GPU-Optional
A GPU must not be a hard requirement for deployment. Embedding models are small mathematical functions, not LLMs.
5.1 Model Selection
Use micro-models only: all-MiniLM-L6-v2 or bge-micro-v2 (22–90MB). These fit entirely in L3 cache and generate an embedding for a log line in under 5–10ms on CPU.
5.2 EXLA + CPU SIMD Optimization
Nx can use EXLA (bindings to Google's XLA compiler) as its compiler and backend. At application boot, EXLA JIT-compiles the Bumblebee embedding model to native machine code that targets the host CPU's SIMD instructions (AVX-512 where available), enabling multiple vector operations per clock cycle.
5.3 Broadway Batching (Never Embed 1-by-1)
Broadway pulls messages from NATS JetStream and groups them. Configure Broadway to batch 256 logs or wait 500ms (whichever comes first). Pass the entire batch to Bumblebee at once—the CPU vectorizes all 256 logs simultaneously via matrix multiplication, increasing throughput by orders of magnitude over sequential processing.
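The size-or-timeout rule can be illustrated outside of Elixir. The sketch below is not Broadway; it is a small Python model of the same policy (flush at 256 items, or once 500ms have passed since the batch was opened), with timestamps passed in explicitly so the behavior is deterministic. The class and method names are hypothetical.

```python
class Batcher:
    """Collects items and flushes when the batch is full or has waited too long."""

    def __init__(self, max_size=256, max_wait_s=0.5):
        self.max_size = max_size
        self.max_wait_s = max_wait_s
        self._items = []
        self._opened_at = None

    def add(self, item, now):
        """Add one item at time `now` (seconds); return a full batch or None."""
        if not self._items:
            self._opened_at = now  # the timeout is measured from the first item
        self._items.append(item)
        if len(self._items) >= self.max_size:
            return self._flush()
        return None

    def tick(self, now):
        """Called periodically; return the pending batch if it has waited long enough."""
        if self._items and now - self._opened_at >= self.max_wait_s:
            return self._flush()
        return None

    def _flush(self):
        batch, self._items = self._items, []
        self._opened_at = None
        return batch
```

Each flushed batch would then be handed to the embedding model in a single call, so a quiet stream still gets embedded within the timeout while a busy stream fills batches to capacity.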
5.4 Selective Vectorization
Not all logs warrant embedding. The Elixir pipeline applies a fast pattern-match filter before the embedding stage.
Route to Bumblebee (embed):
level: warning, error, or critical
Bypass Bumblebee (keyword-only via ParadeDB):
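A sketch of such a pre-filter, assuming each log is a dict with a level field. Only the level rule (warning, error, critical) is taken from this document; treating every other log as keyword-only is an assumption, as are the function names.

```python
EMBED_LEVELS = {"warning", "error", "critical"}


def route(log):
    """Return 'embed' for logs worth vectorizing, 'keyword_only' otherwise."""
    level = str(log.get("level", "")).lower()
    return "embed" if level in EMBED_LEVELS else "keyword_only"


def partition(logs):
    """Split a batch into (to_embed, keyword_only) lists, preserving order."""
    to_embed = [log for log in logs if route(log) == "embed"]
    keyword_only = [log for log in logs if route(log) != "embed"]
    return to_embed, keyword_only
```

Because the check is a set lookup on one field, it adds negligible cost in front of the embedding stage.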
5.5 Auto-Detect GPU (Progressive Enhancement)
At application boot in runtime.exs, check for NVIDIA/CUDA drivers via System.cmd("nvidia-smi", ...). If present, configure Nx to use the CUDA backend for dramatically faster embedding throughput. If absent, fall back to CPU EXLA, which is still performant for the micro-models used.
6. Deployment & Tech Stack Summary
gopsutil, lxc/go-lxc, Trivy, cilium/ebpf
7. Rejected Alternatives & Rationale
gopsutil + Docker/LXC Go SDKs + Trivy SBOMs already provide equivalent system state natively in the Go agent.
8. Phase 1 Deliverables & Next Steps
Phase 1A: Core Infrastructure
security.falco.alerts.
gopsutil → agent → gRPC → gateway → NATS → Broadway (Elixir) → TimescaleDB hypertable.
Phase 1B: Vulnerability Pipeline
gopsutil running processes to auto-classify vulnerability severity.
Phase 1C: SIEM Search
all-MiniLM-L6-v2 via Bumblebee. Implement selective vectorization in the Broadway pipeline. Validate "Find Similar" queries against test security events.
Phase 1D: Topology & Visualization
gopsutil/net.Connections() data into Apache AGE via Cypher. Build initial blast radius query and visualization in the ServiceRadar UI.
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2936#issuecomment-3975186653
Original created: 2026-02-27T21:32:09Z
might want to include this as part of the update as well https://github.com/carverauto/serviceradar/issues/2787
Imported GitHub comment.
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2936#issuecomment-4018075164
Original created: 2026-03-08T03:16:22Z
https://awesomeagents.ai/news/claude-code-sandbox-escape-denylist/