feat(netprobe): move p0f encoding to userspace; fix eBPF verifier (#3425) #3514

Merged
mfreeman451 merged 1 commit from feat/netprobe-p0f-userspace-ebpf-verifier into staging 2026-06-02 06:05:15 +00:00
Owner

Summary

Part of the native add-on rollout (#3425). The netprobe eBPF object could not load on the target kernel: the TCP-SYN / p0f path first exceeded the BPF verifier's 1M-instruction complexity limit (the p0f::encode option/quirk string builder), and once that was removed it hit the 512-byte BPF stack limit. This moves p0f signature encoding out of the eBPF program into userspace and restructures the eBPF datapath so every program verifies cheaply, while making p0f matching actually work.

eBPF (rust/netprobe/ebpf/src/lib.rs)

  • Drop p0f from the kernel: remove p0f::encode, emit_p0f_signature, the P0fRecord struct and the p0f_signatures ring buffer. The TC-SYN path now only parses the SYN into a raw TcpSynSignatureRecord and emits it to tcp_syn_signatures. The record gains source_endpoint + df/id+/id- IP-level quirk bits and a deterministic #[repr(C)] 104-byte layout.
  • Tail-call split for stack budget: the SYN parse is a tail-call target (netprobe_tc_syn_signature) so it verifies with its own fresh 512-byte stack; the entry classifiers do flow accounting then bpf_tail_call. The tail call lives in the entry program because the verifier rejects bpf_tail_call inside bpf-to-bpf subprograms.
  • Stack reductions: split the flow_table update out of the parse chain, de-inline the IPv4/IPv6 parse + SYN-header helpers, and compare 16-byte addresses as a single big-endian u128 instead of a byte loop. The byte loop unrolled into per-byte stack spills (~392-byte frames); the u128 compare dropped parse_ipv6_flow_key 392→64 and record_flow_pid 480→168 bytes.

Userspace

  • New rust/netprobe/src/p0f_encode.rs: the no_std/no-alloc p0f encoder moved from the eBPF crate, extended to render df/id+/id- quirks.
  • fingerprint.rs reads the raw tcp_syn_signatures records, builds the p0f signature string in userspace, and matches the bundled corpus exactly as before (P0fSignatureRuntime repointed from p0f_signaturestcp_syn_signatures).
  • The df/id+ quirks make real SYNs match the bundled corpus — the old eBPF-encoded path emitted no IP-level quirks and therefore matched nothing.

Verification

Built hermetically (RBE) and validated on a kernel-6.8 box (agent-sr-test-pve04, amd64):

  • All eBPF programs load (no verifier grind: ~12k insns / 64ms; no stack overflow) and attach (TC ingress/egress, XDP, kprobes, tracepoints).
  • Flow accounting populates flow_table; p0f fingerprint events flow via the userspace path (netprobe_events_emitted_total{stream="fingerprint"} increments).
  • netprobe idles at ~0% CPU / ~7 MB RSS.

Unit tests updated for the raw-record format; bazel test //rust/netprobe:netprobe_test passes.

🤖 Generated with Claude Code

## Summary Part of the native add-on rollout (#3425). The netprobe eBPF object could not load on the target kernel: the TCP-SYN / p0f path first exceeded the BPF verifier's **1M-instruction complexity limit** (the `p0f::encode` option/quirk string builder), and once that was removed it hit the **512-byte BPF stack limit**. This moves p0f signature *encoding* out of the eBPF program into userspace and restructures the eBPF datapath so every program verifies cheaply, while making p0f matching actually work. ## eBPF (`rust/netprobe/ebpf/src/lib.rs`) - **Drop p0f from the kernel**: remove `p0f::encode`, `emit_p0f_signature`, the `P0fRecord` struct and the `p0f_signatures` ring buffer. The TC-SYN path now only parses the SYN into a raw `TcpSynSignatureRecord` and emits it to `tcp_syn_signatures`. The record gains `source_endpoint` + `df`/`id+`/`id-` IP-level quirk bits and a deterministic `#[repr(C)]` 104-byte layout. - **Tail-call split for stack budget**: the SYN parse is a tail-call target (`netprobe_tc_syn_signature`) so it verifies with its own fresh 512-byte stack; the entry classifiers do flow accounting then `bpf_tail_call`. The tail call lives in the entry program because the verifier rejects `bpf_tail_call` inside bpf-to-bpf subprograms. - **Stack reductions**: split the `flow_table` update out of the parse chain, de-inline the IPv4/IPv6 parse + SYN-header helpers, and compare 16-byte addresses as a single big-endian `u128` instead of a byte loop. The byte loop unrolled into per-byte stack spills (~392-byte frames); the `u128` compare dropped `parse_ipv6_flow_key` 392→64 and `record_flow_pid` 480→168 bytes. ## Userspace - New `rust/netprobe/src/p0f_encode.rs`: the `no_std`/no-alloc p0f encoder moved from the eBPF crate, extended to render `df`/`id+`/`id-` quirks. - `fingerprint.rs` reads the raw `tcp_syn_signatures` records, builds the p0f signature string in userspace, and matches the bundled corpus exactly as before (`P0fSignatureRuntime` repointed from `p0f_signatures` → `tcp_syn_signatures`). - The `df`/`id+` quirks make real SYNs match the bundled corpus — the old eBPF-encoded path emitted no IP-level quirks and therefore matched nothing. ## Verification Built hermetically (RBE) and validated on a **kernel-6.8 box** (`agent-sr-test-pve04`, amd64): - All eBPF programs **load** (no verifier grind: ~12k insns / 64ms; no stack overflow) and **attach** (TC ingress/egress, XDP, kprobes, tracepoints). - Flow accounting populates `flow_table`; p0f **fingerprint events flow** via the userspace path (`netprobe_events_emitted_total{stream="fingerprint"}` increments). - netprobe idles at **~0% CPU / ~7 MB RSS**. Unit tests updated for the raw-record format; `bazel test //rust/netprobe:netprobe_test` passes. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
feat(netprobe): move p0f encoding to userspace; fix eBPF verifier (#3425)
Some checks failed
Secret Scan / gitleaks (push) Successful in 35s
Source Security Scan / source-security (push) Successful in 1m0s
Publish OCI Images / publish (push) Has been cancelled
Netprobe eBPF Verifier / Verify eBPF programs on Linux 5.15 (push) Has been cancelled
Netprobe eBPF Verifier / Verify eBPF programs on Linux 5.8 (push) Has been cancelled
Netprobe eBPF Verifier / Verify eBPF programs on Linux 6.x (push) Has been cancelled
Netprobe eBPF Verifier / Verify eBPF refusal on Linux 5.4 (push) Has been cancelled
CI / build (push) Has been cancelled
Fingerprint Licensing / netprobe-fingerprint-licenses (push) Failing after 55s
Rust Tests / test-rust (rust/rdp-adapter, cargo) (push) Successful in 1m2s
lint / lint (push) Successful in 1m34s
Rust Tests / test-rust (//rust/netprobe:netprobe, //build/platforms:linux_aarch64_musl, rust/netprobe, bazel-static) (push) Successful in 1m58s
Rust Tests / test-rust (//rust/netprobe:netprobe, //build/platforms:linux_x86_64_musl, rust/netprobe, bazel-static) (push) Successful in 2m3s
Rust Tests / test-rust (rust/rperf-client, cargo) (push) Successful in 1m57s
Rust Tests / test-rust (//rust/rperf-server:rperf, rust/rperf-server, bazel) (push) Successful in 2m7s
Rust Tests / test-rust (//rust/netprobe:netprobe_test, rust/netprobe, bazel-test) (push) Failing after 2m34s
Secret Scan / gitleaks (pull_request) Successful in 28s
Rust Tests / test-rust (rust/trapd, cargo) (push) Successful in 2m58s
Rust Tests / test-rust (rust/consumers/zen, cargo) (push) Successful in 3m15s
Rust Tests / test-rust (rust/log-collector, cargo) (push) Successful in 3m15s
Fingerprint Licensing / netprobe-fingerprint-licenses (pull_request) Failing after 50s
lint / lint (pull_request) Successful in 1m43s
Rust Tests / test-rust (rust/rdp-connector-probe, cargo) (push) Successful in 4m14s
Rust Tests / test-rust (rust/srql, cargo) (push) Successful in 6m14s
CI / build (pull_request) Failing after 9m56s
57386101aa
The eBPF TCP-SYN/p0f path could not load: it first blew the BPF verifier's
1M-instruction complexity limit (the p0f::encode option/quirk string builder),
and once that was removed it hit the 512-byte BPF stack limit. This moves p0f
signature *encoding* out of the eBPF program into userspace and restructures the
eBPF datapath so every program verifies cheaply.

eBPF (rust/netprobe/ebpf/src/lib.rs):
- Remove p0f::encode, emit_p0f_signature, the P0fRecord struct and the
  p0f_signatures ring buffer. The TC-SYN path now only parses the SYN into a
  raw TcpSynSignatureRecord (now carrying source_endpoint + df/id+/id- IP-level
  quirk bits, deterministic #[repr(C)] 104-byte layout) emitted to
  tcp_syn_signatures.
- Split the SYN parse into a tail-call target (netprobe_tc_syn_signature) so it
  verifies with its own fresh 512-byte stack budget; the entry classifiers do
  flow accounting then bpf_tail_call. The tail call lives in the entry program
  because the verifier rejects bpf_tail_call inside bpf-to-bpf subprograms.
- Split the flow_table update out of the parse chain and de-inline the IPv4/IPv6
  parse + SYN-header helpers so no single call chain exceeds the stack limit.
- Compare 16-byte addresses as a single big-endian u128 instead of a byte loop:
  the loop unrolled into per-byte stack spills (~392-byte frames) that overflowed
  the stack. This dropped parse_ipv6_flow_key 392->64 and record_flow_pid 480->168.

userspace:
- New rust/netprobe/src/p0f_encode.rs: the no_std/no-alloc p0f encoder moved from
  the eBPF crate, extended to render df/id+/id- quirks. Declared in both lib.rs
  and main.rs (dual lib+bin crate).
- fingerprint.rs reads the raw tcp_syn_signatures records, builds the p0f
  signature string in userspace, and matches the corpus exactly as before
  (P0fSignatureRuntime repointed from p0f_signatures to tcp_syn_signatures).
- The df/id+ quirks make real SYNs match the bundled corpus, which the old
  eBPF-encoded path never did (it emitted no IP-level quirks).

Verified on a kernel-6.8 box (agent-sr-test-pve04, amd64): all programs load
with no verifier grind and no stack overflow, attach (TC ingress/egress, XDP,
kprobes, tracepoints), flow accounting populates flow_table, and p0f fingerprint
events flow via the userspace path. netprobe idles at ~0% CPU / ~7 MB RSS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mfreeman451 left a comment

lgtm

lgtm
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
carverauto/serviceradar!3514
No description provided.