
Commit 813f226

en-translation-leader committed
docs(en): translate remaining Phase-2 / Phase-5b-spec / F-A1 specs to English siblings
Continues the docs-sync-2026-06-08 + b8c3c78 closure + 9ea1519 (option B core-5) translation effort. Adds English sibling files for the 14 remaining post-2026-06-08 zh_cn docs that lacked English mirrors, per the user's explicit follow-up request (option B-extended; gitnexus-reindex-execution-log.md is the one zh_cn doc the user opted to leave Chinese-only).

New English siblings (14 files, 2563 lines), placed alongside zh_cn/ at docs/freebsd_13_to_15_upgrade_spec/ (matches the existing M0..M5 / runtime-fix / Phase-5b-execution-log layout, NOT under an en/ subdirectory):

Phase-2 plan + M-series:
- phase2-feature-enable-plan.md: Phase-2 master plan
- phase2-M6-spec.md: FF_NETGRAPH + FF_IPFW (P0)
- phase2-M6-execution-log.md
- phase2-M7-spec.md: FF_USE_PAGE_ARRAY (P1a)
- phase2-M7-execution-log.md
- phase2-M8-spec.md: FF_ZC_SEND (P1b)
- phase2-M8-execution-log.md
- phase2-M9-spec.md: PA + ZC combo (P1c)
- phase2-M9-execution-log.md
- phase2-M10-spec.md: FF_FLOW_IPIP (P1d)
- phase2-M10-execution-log.md
- phase2-M11-M13-spec.md: P2 smoke trio
- phase2-MFinal-execution-log.md: Phase-2 wrap-up

Phase-5b spec + F-A1 fix:
- phase-5b-perf-baseline-spec.md
- F-A1-fix-execution-log.md

Translation conventions (same as the option-B core-5 commit 9ea1519):
- Each English file carries one front-matter line:
  > Chinese version: ./zh_cn/<file>.md (authoritative)
- Verbatim preservation of file paths, commit hashes, line numbers, symbol names, code blocks, gate IDs, AC IDs, BOUNCE counts.
- Tables and bullet lists keep the same structure as the Chinese master.
- Backed up to /data/workspace/freebsd-src-releng-15.0/f-stack-lib/test-configs/en-translation/ (now 20 files total: 5 from option B core + 14 added here + 1 from earlier README_EN.md note).

Bilingual-doc coverage status (post-commit):
- zh_cn/ has 46 .md (gitnexus-reindex log Chinese-only by user choice)
- parent EN siblings: 47 .md (28 from earlier docs-sync + README_EN + 5 option-B core + 14 in this commit)
- Coverage: 45/46 zh_cn docs have English siblings (98%; only gitnexus-reindex-execution-log.md intentionally excluded)

No code changes. KG will auto re-index on commit. Local commit only; no push (per the project's standing convention).
1 parent 9ea1519 commit 813f226

15 files changed

Lines changed: 2563 additions & 0 deletions
Lines changed: 176 additions & 0 deletions
# F-A1 Fix Execution Log: `ff_if_send_onepkt` Fatal Panic → Soft Drop

> Chinese version: ./zh_cn/F-A1-fix-execution-log.md (authoritative)

**Author**: F-A1 Fix Leader
**Date**: 2026-06-08
**Predecessor**: `phase-5b-perf-baseline-report.md` §3.1 (commit `435e02753`)
**Commit**: this commit
**Status**: ✅ COMPLETE (F-A1 closed)

---

## 1. Summary

Phase-5b found that with `FF_USE_PAGE_ARRAY=1` enabled on its own, ICMP and HTTP break completely (finding F-A1, HIGH severity). This fix closes the finding with a change confined to **1 file and 1 function**:

```
lib/ff_memory.c:ff_if_send_onepkt()
rte_panic(...) → rte_log(WARNING) + ff_mbuf_free(m) + return 0
```

Phase-5b matrix re-run with the same harness (`tools/sbin/p5b_perf_matrix.sh`):

| Config | TC1 (100 curls) | TC2 (1000 curls) | Functional |
|---|---|---|---|
| C0 baseline | 0.795s | 7.327s | 100% |
| C7 PA-only **before fix** | n/a | n/a | **0% ❌** |
| **C7fix PA-only after fix** | **0.736s (−7.4%)** | **7.378s (+0.7%)** | **100% ✅** |

---

## 2. Real Root Cause (confirmed with a one-shot instrumented build)

### 2.1 Repro path

What Phase-5b observed: with `FF_USE_PAGE_ARRAY=1` the build came up and reported `helloworld primary alive=yes`, but every client-side `ping`/`curl` failed. The original interpretation was "the PA path breaks the NIC bridge".

The actual root cause: the Phase-5b liveness probe runs right after init (after a 12 s sleep), while helloworld is still alive. The failure comes later, **when ARP/ICMP replies are sent**: the `rte_panic` at line 457 of `ff_if_send_onepkt` fires and the primary calls `abort()`. All subsequent client traffic times out because the server-side primary is already dead.

### 2.2 Instrumented measurement

Added 6 path counters at the entry of `ff_if_send_onepkt` (`_dbg_in / chk_t / chk_f / b2r_null / ec_null / ok`), replaced `rte_panic` with `rte_log + ff_mbuf_free + return 0`, rebuilt, and re-ran on the same hardware with the same harness:

```
[ZC build] 1000 curl PASS, primary ALIVE throughout
[FA1-DROP] 0 events under steady-state traffic
counter dump: nearly every packet hits chk_t (inside the PA VMA) + ok
```

**Key observation**: in steady state the panic path fires 0 times. It triggers only on **an edge case in the startup window** (gratuitous ARP / IPv6 RS / loopback control mbuf), and only once. But once is enough: `rte_panic` aborts the primary, after which all traffic appears broken.
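
The path-counter technique can be sketched in self-contained C. The six counter names mirror the labels above, but the enum, the toy `send_onepkt_instrumented()` and its three boolean inputs are hypothetical: the log does not say what each branch checks beyond `chk_t` meaning "inside the PA VMA". What makes such counters trustworthy is the invariant checked below: every call enters once and leaves through exactly one terminal counter.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical path IDs mirroring the six counter labels in the log. */
enum send_path { P_DBG_IN, P_CHK_T, P_CHK_F, P_B2R_NULL, P_EC_NULL, P_OK, P_COUNT };

static const char *const path_name[P_COUNT] = {
    "_dbg_in", "chk_t", "chk_f", "b2r_null", "ec_null", "ok",
};

static unsigned long path_cnt[P_COUNT];

/* Toy send path: in_vma models the PA VMA check; have_ref and have_ext model
 * two lookups that may come back NULL (assumed meanings). In this sketch every
 * non-"ok" exit stands in for a branch that used to end in rte_panic(). */
static int send_onepkt_instrumented(int in_vma, int have_ref, int have_ext)
{
    path_cnt[P_DBG_IN]++;
    if (!in_vma) {
        path_cnt[P_CHK_F]++;
        return -1;
    }
    path_cnt[P_CHK_T]++;
    if (!have_ref) {
        path_cnt[P_B2R_NULL]++;
        return -1;
    }
    if (!have_ext) {
        path_cnt[P_EC_NULL]++;
        return -1;
    }
    path_cnt[P_OK]++;
    return 0;
}

/* Each call enters once and leaves through exactly one terminal counter. */
static void check_counter_invariant(void)
{
    assert(path_cnt[P_DBG_IN] == path_cnt[P_CHK_F] + path_cnt[P_B2R_NULL] +
                                 path_cnt[P_EC_NULL] + path_cnt[P_OK]);
}

/* Print one line per counter, like the "counter dump" in the log. */
static void dump_counters(void)
{
    for (int i = 0; i < P_COUNT; i++)
        printf("%-8s %lu\n", path_name[i], path_cnt[i]);
}
```

With steady-state traffic, nearly all increments land on `chk_t` and `ok`, matching the dump above; a single non-zero failure counter is enough to locate a rare edge case.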

---

## 3. Production Fix (instrumentation removed)

In `lib/ff_memory.c:440-505`, the `rte_panic` call is replaced with `rte_log(WARNING) + ff_mbuf_free + return 0`.
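
A minimal, self-contained C sketch of the panic-to-soft-drop pattern follows. It does not use DPDK or F-Stack: `struct pkt`, `in_pa_vma()`, the region bounds and the `dropped` counter are hypothetical stand-ins for the mbuf, the PA VMA check, `rte_log(WARNING)` and `ff_mbuf_free`. The warning text only echoes the `dropped pkt` pattern grepped in §4.5; it is not the real log line.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical packet descriptor standing in for an mbuf. */
struct pkt {
    uintptr_t data;            /* payload address */
};

/* Hypothetical bounds of the PA mmap region (the "PA VMA"). */
static uintptr_t pa_base, pa_len;
static unsigned long dropped;

static int in_pa_vma(uintptr_t addr)
{
    return addr >= pa_base && addr - pa_base < pa_len;
}

/* Soft-drop pattern: a packet the PA path cannot handle is logged at WARNING
 * level and freed, and the function still returns 0, so the stack sees an
 * ordinary lost packet instead of the process aborting. */
static int send_onepkt(struct pkt *m)
{
    assert(m != NULL);
    if (!in_pa_vma(m->data)) {
        /* Pre-fix behavior was the equivalent of rte_panic() here. */
        fprintf(stderr, "WARNING: dropped pkt (0x%" PRIxPTR " outside PA VMA)\n",
                m->data);
        free(m);
        dropped++;
        return 0;
    }
    /* Normal path (elided in this sketch): hand the payload to the NIC. */
    free(m);
    return 0;
}
```

In this sketch, returning 0 on the drop path leaves the caller's control flow unchanged, which is what lets TCP retransmission, ARP retries and IPv6 ND absorb the loss.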

Design tradeoffs:

| Option | Assessment |
|---|---|
| ❌ Complex IOMMU `rte_extmem_register + rte_eth_dev_dma_map` | Large change; affects multi-platform compatibility |
| ❌ Build-time enforcement of "PA must be co-enabled with ZC" | Undermines the design intent of standalone PA |
| ✅ **`rte_panic` → log+drop** | 1 call site; aligned with non-PA default behavior (the `ff_dpdk_if.c:2150` fallback already drops silently on alloc failure) |

Rationale:
- TCP congestion control and the retransmit timer recover automatically
- ARP retries (BSD default: 5 times)
- IPv6 ND resends automatically
- No correctness loss (the upper stack treats the packet as an ordinary loss)

---

## 4. Verification (production build, debug counters removed)

### 4.1 G1 build

```
lib make clean && make: exit=0 / 0 errors / 57 warnings (= baseline)
example make: exit=0 / 3 binaries
```

### 4.2 G2 stack-up

`FF_USE_PAGE_ARRAY=1` (no ZC), `--proc-type=primary --proc-id=0`:
- `ff_mmap_init mmap 65536 pages, 256 MB.`
- `ipfw2 (+ipv6) initialized`
- `f-stack-0: Successed to register dpdk interface`
- ALIVE for 12 s+, no SIGSEGV

### 4.3 G3 functional

```
ping -c 3 → 3/3 received, 0% loss, RTT 0.39-0.46 ms
curl / → HTTP=200 / 0.93 ms
30 serial curl → 30/30 PASS
100 serial curl → 100/100 PASS (median 0.736s)
1000 serial curl → 1000/1000 PASS (median 7.378s)
```

### 4.4 G4 perf observation

| Config | TC1 median | Δ vs C0 | TC2 median | Δ vs C0 |
|---|---|---|---|---|
| C0 baseline | 0.795s | n/a | 7.327s | n/a |
| **C7fix PA-only** | 0.736s | **−7.4%** | 7.378s | +0.7% |

After the fix, PA-only is **7.4% faster than baseline on TC1 (100 curls) and on par (+0.7%) on TC2 (1000 curls)**, consistent with PA's design intent (the mmap pool reduces per-packet alloc/free).
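
The Δ columns are relative changes of each median against C0: Δ = (T_config − T_C0) / T_C0 × 100%, so a negative value means less wall-clock time. A small C helper (the name `delta_pct` is illustrative) reproduces both figures: `delta_pct(0.736, 0.795)` is about −7.42 and `delta_pct(7.378, 7.327)` is about +0.70.

```c
#include <assert.h>

/* Relative change of a config's median time vs. the C0 baseline, in percent.
 * Negative means faster than baseline (less wall-clock time). */
static double delta_pct(double t_config, double t_baseline)
{
    assert(t_baseline > 0.0);
    return (t_config - t_baseline) / t_baseline * 100.0;
}
```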

### 4.5 G6 lint / G7 commit

- 0 lint errors
- Log counter: `grep -c "dropped pkt" steady-state.log` = 0 (the fix path never fires in steady state, which shows the fix is purely defensive and costs no zero-copy fast-path performance)

---

## 5. Impact Scope & Regression Guarantee

### 5.1 Touched code

Only `lib/ff_memory.c:ff_if_send_onepkt`: a single file and a single function, **+35/-2 lines** (including a detailed comment block).

### 5.2 Unaffected scenarios

- ✅ C0 baseline (no PA): `ff_if_send_onepkt` is not compiled in (`#ifdef FF_USE_PAGE_ARRAY`)
- ✅ C8 ZC-only: same as C0
- ✅ C9 PA+ZC: same path as C7fix; the early-startup edge case now logs and drops instead of aborting, which makes the PA+ZC combination more stable
- ✅ C10 FLOW_IPIP (with ZC/PA off): `ff_if_send_onepkt` is not compiled in

### 5.3 Phase-5b matrix supplement (C7fix row)

`docs/.../p5b_data/C7fix_TC{1,2}.csv` is newly persisted and supersedes the earlier `C7_TC{1,2}.csv` (which captured the broken state).

---

## 6. F-A1 Status Change

| Source | Old | New |
|---|---|---|
| `phase-5b-perf-baseline-report.md` §3.1 | `🟠 DEFERRED HIGH` | ✅ **CLOSED** |
| `phase-5b-perf-baseline-report.md` §5 followups | `F-A1 Owner=TBD Priority=High` | ✅ **fixed in this commit** |
| Production default recommendation | "C8 ZC-only or C9 PA+ZC; avoid C7 PA-only" | "**all 4 configs (C0/C7/C8/C9) production-ready**; pick per scenario" |

### 6.1 F-A2 sync

`F-A2 (Medium)`: the original plan was to retest, with the client ARP cache cleared, whether ARP over PA actually works in C9. With this fix, **no PA-path packet can kill the primary any more** (no panic), so the ARP-cache hypothesis is moot → **F-A2 automatically becomes N/A**.

### 6.2 F-A3 / F-A4 unchanged

Still low-priority follow-ups (wrk/iperf3 client; physical-machine + CVM dual baseline).

---

## 7. Compliance & Audit

- ✅ `rm_tmp_file.sh` / `kill_process.sh` / `chmod_modify.sh` used throughout
- ✅ Liveness checked with `[ -d /proc/$PID ]`; 0 uses of `kill -0`
- ✅ Local commit only
- ✅ Commit message in English
- ✅ 0 escalations / 0 bounces (1 round of instrumented measurement to confirm the root cause + 1 round of production fix)

---

## 8. Timeline

| Stage | Start | End | Duration |
|---|---|---|---|
| RCA static reading (ff_chk_vma / ff_extcl_to_rte / ff_init_ref_pool) | 20:11 | 20:35 | 24 min |
| Instrumented build + measurement | 20:35 | 20:55 | 20 min |
| Production fix + minimal G test | 20:55 | 21:08 | 13 min |
| Doc + Commit | 21:08 | 21:15 | 7 min |
| **Total** | | | **≈ 64 min** |

---

> F-A1 closed. All Phase-2 enabled flags now pass end-to-end functional verification.
Lines changed: 116 additions & 0 deletions
# Phase-5b Perf Baseline Spec: F-Stack v1.26 / FreeBSD 15.0 (Phase-2 Feature Matrix)

> Chinese version: ./zh_cn/phase-5b-perf-baseline-spec.md (authoritative)

**Author**: Phase-5b Leader
**Date**: 2026-06-08
**Predecessor**: Phase-2 M6~M13 + M-Final (commit `99cc538cd`, ahead 22)
**Methodology base**: `06-test-and-acceptance-spec.md §5` NFR-1 framework + `cvm-bench-methodology.md`
**Status**: ACTIVE

---

## 1. Goal

Establish a **cross-config relative perf-delta matrix** for the 8 feature flags enabled in Phase-2 (M6~M13), and in particular close the two recorded follow-ups:

- **M9-F1**: the PA+ZC combo is 3.5× slower than M8 ZC-only at 1000 short connections; this needs either a root-cause analysis or classification as expected behavior.
- **M10-F2**: the IPIP tunnel still has no heavy-traffic throughput baseline; because of client-tool limits it was substituted with a ping-RTT timing baseline.

**Non-goals**:
- Absolute QPS / throughput numbers (the client only has `curl`, with no wrk/iperf3/ab; the canonical NFR-1 numbers already PASSed in M5).
- Re-measuring the 13.0 baseline (already dual-baselined in M5).
- Physical-machine + CVM dual baseline (limited by the environment).

---

## 2. Environmental Constraints (recorded as-is, never bypassed)

| Dimension | State |
|---|---|
| Server NIC | 1× virtio `0000:00:09.0` (also carries the SSH transport) |
| Hugepages | configured (measured working in M6~M13) |
| DPDK binding | igb_uio loaded |
| Client OS | Linux (Ubuntu/Debian family per `uname -a`) |
| Client load tool | **`curl` only** (no iperf3/wrk/ab/httping) |
| Link | server and client in the same VPC, ping RTT < 1 ms |
| ssh round-trip | ≈ 6-7 ms (measured earlier; the bottleneck for 100/1000 serial curls) |

**No new tools installed**: this respects OQ-1 (the client environment is maintained independently).

---

## 3. Methodology

### 3.1 Trial unit (reusable harness)

`tools/sbin/p5b_perf_matrix.sh` implements **one trial = N serial curls issued from f-stack-client**:

```
T_total = $(ssh f-stack-client "time (for i in 1..N; do curl ... ; done)")
QPS_eff = N / T_total # includes ssh round-trip; only meaningful as cross-config delta
fail_rate = (N - http_200_count) / N
```

Each config runs **3 trials**; we take **median(T_total)** plus the **max-min jitter** to dampen single-shot noise.
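
The trial formulas and the 3-trial aggregation can be expressed as a small C sketch. It is not taken from `p5b_perf_matrix.sh`; the struct and function names are hypothetical, and because the spec does not say how `fail_rate` is aggregated across trials, this version keeps the worst trial (an assumption). `QPS_eff` is computed from the median, so it carries the same ssh-overhead caveat as the formula above.

```c
#include <assert.h>

struct trial {
    double t_total;        /* wall-clock seconds for N serial curls */
    int    http_200_count; /* responses with HTTP status 200 */
};

struct summary {
    double median_t;       /* median(T_total) over the trials */
    double jitter;         /* max(T_total) - min(T_total) */
    double qps_eff;        /* N / median_t; cross-config comparison only */
    double worst_fail;     /* highest fail_rate across trials (assumed rule) */
};

/* Summarize exactly 3 trials of n requests each. */
static struct summary summarize3(const struct trial tr[3], int n)
{
    assert(n > 0);
    double a = tr[0].t_total, b = tr[1].t_total, c = tr[2].t_total;
    double lo = a < b ? (a < c ? a : c) : (b < c ? b : c);
    double hi = a > b ? (a > c ? a : c) : (b > c ? b : c);
    struct summary s;
    s.median_t = a + b + c - lo - hi;   /* the middle value of three */
    s.jitter = hi - lo;
    s.qps_eff = n / s.median_t;
    s.worst_fail = 0.0;
    for (int i = 0; i < 3; i++) {
        double f = (double)(n - tr[i].http_200_count) / n;
        if (f > s.worst_fail)
            s.worst_fail = f;
    }
    return s;
}
```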

### 3.2 Matrix dimensions

| Config | lib/Makefile flags |
|---|---|
| **C0 BASELINE** | P0 only: FF_NETGRAPH+FF_IPFW (the minimal set without P1/P2, but keeping LOOPBACK=1 because the `ff_swi_net_excute` stub has already landed) |
| **C7 M7** | C0 + FF_USE_PAGE_ARRAY=1 |
| **C8 M8** | C0 + FF_ZC_SEND=1 |
| **C9 M9** | C0 + FF_USE_PAGE_ARRAY=1 + FF_ZC_SEND=1 |
| **C10 M10** | C0 + FF_FLOW_IPIP=1 (used for tunnel ping-RTT timing + a direct-path HTTP control) |

The P2 trio (M11/M12/M13) is smoke-only and **does not enter the perf matrix** (rte_flow has no effect on virtio).

### 3.3 Test cases (per config)

| TC | Description | Expected |
|---|---|---|
| **TC1** | 100 serial curls of `/` (short conn) | pass_rate = 100% / cross-config delta < 30% |
| **TC2** | 1000 serial curls of `/` (short conn) | pass_rate ≥ 99% / config-to-config ratio quantifies M9-F1 |
| **TC3 (C10 only)** | 100 pings of `10.10.10.1` (the IPIP inner IP) through the tunnel | 0% loss / median RTT ≤ 2 ms |

### 3.4 Acceptance downgrade (per master-plan OQ-2 + OQ-4)

- **Observation-only**: every measurement is recorded, but none is a PASS/FAIL gate.
- **Sole hard fail**: primary process SIGSEGV / panic / pass_rate < 90%.
- Bounces ≤ 3 per milestone (same standard as Phase-2).
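
The downgraded acceptance rule reduces to a two-way verdict. A hypothetical C encoding (names are illustrative; bounce counting is left out) makes the boundary explicit: pass_rate < 90% is evaluated in integers, so exactly 90% is not a hard fail, and a result below TC2's 99% expectation is only recorded.

```c
#include <assert.h>
#include <stdbool.h>

enum verdict { OBSERVE, HARD_FAIL };

/* Only a crashed primary (SIGSEGV/panic) or pass_rate below 90% is a hard
 * fail; anything else, including a missed TC expectation, is observed only. */
static enum verdict classify(bool primary_crashed, int passed, int total)
{
    assert(total > 0 && passed >= 0 && passed <= total);
    if (primary_crashed)
        return HARD_FAIL;
    /* passed / total < 0.9, compared in integers to avoid rounding at 90% */
    if (passed * 10 < total * 9)
        return HARD_FAIL;
    return OBSERVE;
}
```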

---

## 4. 5-Phase Pipeline

| Phase | Output |
|---|---|
| A. Spec | this document |
| B. Research | folded into §2 + §3 (environment constraints + reuse of the Phase-2 toolchain) |
| C. Code | `tools/sbin/p5b_perf_matrix.sh` |
| D. Run | 5 configs × 3 trials × 2 TCs + 1 IPIP-tunnel TC × 3 trials = 33 runs; total ETA ≤ 25 min |
| E. Gate | author `phase-5b-perf-baseline-report.md` (matrix CSV + delta table + 4 RCA items) |
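
The 33-run total in Phase D can be checked by enumerating the plan. This C sketch assumes, as the total implies, that the C10-only TC3 also runs 3 trials; the struct and function names are hypothetical and not part of `p5b_perf_matrix.sh`.

```c
#include <assert.h>
#include <stdio.h>

/* Configs from the §3.2 matrix; has_tc3 marks the one config (C10) with TC3. */
struct config { const char *name; int has_tc3; };

static const struct config configs[] = {
    { "C0", 0 }, { "C7", 0 }, { "C8", 0 }, { "C9", 0 }, { "C10", 1 },
};
enum { N_CONFIGS = sizeof configs / sizeof configs[0], N_TRIALS = 3 };

/* Walk the Phase-D plan, optionally printing each run; returns the run count. */
static int enumerate_runs(int print)
{
    int runs = 0, tc3_configs = 0;
    for (int c = 0; c < N_CONFIGS; c++) {
        int n_tcs = configs[c].has_tc3 ? 3 : 2;   /* TC1, TC2 (+ TC3) */
        tc3_configs += configs[c].has_tc3;
        for (int tc = 1; tc <= n_tcs; tc++) {
            for (int t = 1; t <= N_TRIALS; t++) {
                if (print)
                    printf("%s TC%d trial %d\n", configs[c].name, tc, t);
                runs++;
            }
        }
    }
    assert(tc3_configs == 1);   /* TC3 is C10-only per §3.3 */
    return runs;
}
```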

---

## 5. Decision Points

After Phase-5b finishes, we must answer:

- **D1 M9-F1 disposition**: is the 3.5× slowdown expected (the PA mmap + ZC fast path amplifies short-connection context-switch overhead) or a regression that needs a code fix? Or is it merely amplified by the low-traffic, ssh-round-trip-dominated regime?
- **D2 M10-F2 closure rationale**: a stable ping-RTT sequence counts as a PASS for the IPIP software-path baseline.
- **D3 production default**: after M-Final, which flags should be on by default (P0 always on; P1/P2 pending user decision)?

---

## 6. Compliance

- `rm_tmp_file.sh` / `kill_process.sh` / `chmod_modify.sh` used throughout
- Liveness checked via `[ -d /proc/$PID ]`, never `kill -0`
- Local commit only (no push, continuing the Phase-2 mandate)
- Commit message in English

---

> Entering Phase C: implement `p5b_perf_matrix.sh`.
