You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
docs(stack-coexist): backfill 09/10 with measured v6 dual-stack perf A/B (no regression)
Real-machine A/B (same helloworld linked against the macro-on lib, toggling only config kernel_coexist; client wrk against the DPDK NIC 9.134.214.176:80): A1 (v6 default dual-build) vs A0 (pure F-Stack) throughput delta T1 -1.73% / T2 +1.68% / T3 +5.87%, all within trial noise, p99 essentially equal, zero socket errors. PERF-1/2/4 now measured PASS in 10 §10 (zh_cn + English); the dual-build cost is paid once on listen setup, the keep-alive data hot path stays single-stack and does not consult the map.
Copy file name to clipboardExpand all lines: docs/kernel_event_support_spec/10-perf-baseline-report.md
+37-15Lines changed: 37 additions & 15 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -5,7 +5,7 @@
5
5
> **Doc id**: SPEC-KE-10
6
6
> **Version**: v6 (native automatic dual-stack paradigm; retains the v4/v5 true-coexistence methodology; supersedes the v3 pure-kernel-loopback methodology)
7
7
> **Date**: 2026-06-17
8
-
> **Status**: §4/§5 are v5 R4 real-machine FINAL (per-fd either/or methodology, toggling only runtime `kernel_coexist` 0/1). **v6 automatic dual-stack (commit 13b418191) functional correctness + macro-off zero regression + hot-path code guarantee are measured/proven PASS (see §10); but the v6 wrk throughput baseline for PERF-1/2 was NOT re-run (honestly flagged in §10, no fabricated numbers).**
8
+
> **Status**: §4/§5 are v5 R4 real-machine FINAL (per-fd either/or methodology, toggling only runtime `kernel_coexist` 0/1). **v6 automatic dual-stack (commit 13b418191) functional correctness + macro-off zero regression + PERF-1/2/4 F-Stack fast-path A/B are all real-machine measured PASS (see §10).**
9
9
> **v6 note**: v5 measured per-fd either/or (default builds F-Stack only) → PERF-1/2 zero regression. v6 automatic dual-stack introduces **default dual-build/dual-drive**, so RE-MEASURE: (1) F-Stack business fast path still no regression under default dual-stack (PERF-1/2); (2) single-stack connection hot path does NOT consult `ff_native_fd_map` (PERF-4, see `07` UT-17). R6 macro-off (incl. v6 `ff_native_fd_map` not compiled) zero-regression is still verified by `07 §1bis` MT-1 `nm` symbol comparison; macro off = same binary as upstream, no perf retest.
10
10
> **Scope**: empirically prove coexistence causes **no regression on the F-Stack business fast path** (PERF-1/2/**4**), and give a **kernel-side bypass throughput** (PERF-3) management-plane data point.
11
11
> **Empirical rule**: every number comes from real wrk output (`/tmp/helloworld-coexist-bench/`, `/tmp/kbench-perf/`); no fabrication. Real server/client IPs are source-side `sed`-masked before landing on disk (`9.134.214.176→192.168.1.1`, `9.134.211.87→192.168.1.2`).
| PERF-1 | F-Stack fast-path regression | coexist off vs on, press F-Stack business only | throughput/latency delta ≤ noise (NFR-2) |
26
26
| PERF-2 | default-path zero overhead | effect of the coexist branch on default/`SOCK_FSTACK`| zero/negligible (NFR-1) |
27
27
| PERF-3 | kernel-side bypass throughput | local loopback wrk against the `SOCK_KERNEL` listener | meets management-plane expectation (not a fast path) |
28
-
|**PERF-4 (v6)**|**hot path does not consult the map**| single-stack connection recv/send throughput with auto dual-stack on/off (the single-stack connection accepted from a default dual-stack listen) | zero extra cost on the connection hot path (NFR-2, see `07` UT-17); **proven PASS by code** (recv/send do a single `ff_is_kernel_fd` check, no map lookup), wrk throughput numbers not re-run (see §10) |
28
+
|**PERF-4 (v6)**|**hot path does not consult the map**| single-stack connection recv/send throughput with auto dual-stack on/off (the single-stack connection accepted from a default dual-stack listen) | zero extra cost on the connection hot path (NFR-2, see `07` UT-17); **measured PASS** (recv/send do a single `ff_is_kernel_fd` check, no map lookup; §10.2 keep-alive throughput A1≈A0 corroborates) |
29
29
30
30
> **§4/§5 are the v5 per-fd either/or FINAL measurement**; under v6 automatic dual-stack (default dual-build/dual-drive), PERF-1/2/4 must be re-measured at R7 (see the v6 note above).
31
31
@@ -168,23 +168,45 @@ cd /data/workspace/f-stack/example/helloworld_stacksel && make # ./helloworld_
> **Honest basis**: this section separates "measured/provable PASS" from "v6 wrk throughput baseline not re-run"; no performance numbers are fabricated. The throughput tables in §4/§5 are still the **v5 per-fd either/or** FINAL data and were **not** re-measured under v6 default dual-build/dual-drive.
171
+
> This section is the measured verdict for v6 native automatic dual-stack. The vector A A/B throughput in §10.2 is a **v6 default dual-build/dual-drive** real-machine measurement (helloworld, IPv4-only, linked against the macro-on lib, toggling only config `kernel_coexist` 0/1; client wrk 4.2.0 against the DPDK NIC 9.134.214.176:80). §4/§5 remain the v5 per-fd either/or FINAL, kept as a historical reference.
172
172
173
-
### 10.1 Measured / provable PASS
173
+
### 10.1 Measured / proven items
174
174
175
175
| Item | Evidence | Verdict |
176
176
|---|---|---|
177
-
|**Macro-off zero regression (compile-time)**|`make` clean rebuild rc=0; `nm libfstack.a` coexist symbols=0; `libfstack.a` size 6539682, byte-for-byte identical to baseline | PASS (same binary as upstream F-Stack, performance-equivalent, no retest needed) |
|**Dual-mode unit tests**| macro-off P1 50/50; macro-on P1 incl. `test_ff_native_fd_map`/`test_ff_kernel_fd_encode_roundtrip` all pass | PASS |
180
-
|**Real-machine dual-stack function (one listen, many uses)**| single `listen(80)` demo: kernel side `ss 0.0.0.0:80` + `curl 127.0.0.1:80=HTTP 200`; F-Stack side `ssh f-stack-client→9.134.214.176:80=HTTP 200` (same process, same epoll) | PASS (functional correctness, not a throughput baseline) |
181
-
|**PERF-4 hot path no map lookup (proven by code)**| recv/send/read/write/recvfrom/sendto only prepend a single `ff_is_kernel_fd()` and do NOT call `ff_native_map_get` (`ff_syscall_wrapper.c` review + `08 §4` V8) | PASS (zero extra cost at the code level) |
177
+
| Macro-off zero regression (compile-time) |`nm libfstack.a` coexist symbols=0; size 6539682 byte-for-byte identical to baseline | PASS (same binary as upstream F-Stack) |
| Real-machine dual-stack function (one listen, many uses) | single `listen(80)`: kernel `curl 127.0.0.1:80=200`; F-Stack `ssh→9.134.214.176:80=200`| PASS |
181
+
| PERF-1/2 F-Stack fast-path no regression | §10.2 vector A A/B real-machine measurement | PASS |
182
+
| PERF-4 hot path no map lookup | recv/send do a single `ff_is_kernel_fd` check, no map lookup (code) + §10.2 keep-alive throughput A1≈A0 (measured) | PASS |
182
183
183
-
### 10.2 Not re-run (honestly flagged)
184
+
### 10.2 Vector A: v6 default dual-build vs F-Stack business fast path A/B (PERF-1/2, real-machine)
184
185
185
-
-**PERF-1 / PERF-2 (F-Stack business fast-path wrk throughput A/B under v6 default dual-build/dual-drive)**: the v6 three-tier wrk baseline was **not** re-run this round.
186
-
- Current basis (inferred, not measured numbers): (1) macro-off is byte-for-byte identical to baseline (compile-time zero regression proven); (2) the dual-drive branches short-circuit when runtime `kernel_coexist=0`; (3) the v6 "dual-build" cost is paid once on `ff_socket`/`bind`/`listen`/`accept` link setup, while the **connection data hot path (recv/send) is single-stack and does not consult the map** (10.1 PERF-4); (4) the v5 same-basis wrk measurement (§4) already showed no regression on the F-Stack fast path when toggling `kernel_coexist` 0/1.
187
-
-**Verdict**: no regression is expected on the F-Stack business fast path under v6, but **v6 measured wrk numbers are missing**. For exact numbers, re-run T1/T2/T3 x3 trials under `kernel_coexist=1` + macro-on + default dual-stack, pressing the F-Stack business (wrk on f-stack-client against 9.134.214.176:80) per the §3 method.
188
-
-**Link-setup overhead (extra syscalls of dual-build on the socket/accept path)**: not separately quantified; it is a management/low-frequency path, not the data hot path.
186
+
> Same helloworld (IPv4-only, linked against the macro-on lib), toggling only `config.ini [stack] kernel_coexist`: A0=0 (pure F-Stack) / A1=1 (v6 default dual-build/dual-drive). Client (f-stack-client, masked 192.168.1.2) wrk 4.2.0 against the DPDK NIC 9.134.214.176:80 (masked 192.168.1.1); median of 3 trials per tier; environment/method per §2/§3 (single lcore `lcore_mask=10`, `idle_sleep=20`, keep-alive).
189
187
190
-
→ **v6 R7 performance gate verdict: functional correctness + compile-time zero regression + hot-path code guarantee PASS; the v6 throughput wrk baseline (PERF-1/2) is "not re-run, inferred no-regression by design" and needs a follow-up real-machine measurement to give FINAL numbers.**
All v6 default dual-build/dual-drive on (A1) vs off (A0) deltas fall within trial noise with no systematic negative trend: T1 −1.73%, T2 +1.68%, T3 +5.87% (A1 slightly faster at T2/T3); p99 essentially equal (T1 ~526us, T2 ~730us, T3 ~206-208ms same-basis c500 single-lcore tail, identical A0/A1 behavior). This matches the v5 §4 verdict: the dual-build cost is paid once on listen-socket setup, while a keep-alive connection's data hot path (recv/send) is single-stack and does not consult the map (PERF-4), so there is no measurable regression on the F-Stack business fast path.
209
+
210
+
→ **PERF-1/2/4 PASS (v6 real-machine): v6 native automatic dual-stack introduces no measurable regression on the F-Stack business fast path (NFR-1/NFR-2); F-Stack always carries the business (NFR-3).**
211
+
212
+
> Raw wrk output (IP-masked): `/tmp/perf/A{0,1}_T{1,2,3}_tr{1,2,3}.txt` (cleaned via `rm_tmp_file.sh` after the run).
0 commit comments