
Commit 5ba71d0

author
zc-doc-translator
committed
docs(zc_stack_user_spec): English translation of all zh_cn specs
Add English versions of all 35 Chinese spec docs into the parent docs/zc_stack_user_spec/ directory (same filenames). Prose, headings, tables and diagram labels translated; code blocks, file:line refs, identifiers, paths and .sh wrappers kept verbatim. Produced by a 6-way parallel translator agent team; verified: 35/35 files, code-fence parity, zero residual Chinese.
1 parent d6c78d4 commit 5ba71d0

35 files changed

Lines changed: 4286 additions & 0 deletions

docs/zc_stack_user_spec/00-plan.md

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
1+
# FSTACK_ZC_RECV (Zero-Copy read) Feasibility Study — Plan
2+
3+
> Status: PLAN-READY → (pending execution) BUILDING → FINISHED
4+
> Scope: **feasibility study + design + documentation only**; no implementation code this round
5+
> Baseline reference: the already-landed FSTACK_ZC_SEND (M8)
6+
> Documentation language: **Chinese only** (docs/zc_stack_user_spec/zh_cn/); the English version will be considered after manual audit
7+
> Method: harness engineering + spec-driven + agent team (leader + sub-agents)
8+
> Anti-drift iron rule: all conclusions are **grounded in actually-measured code**; docs/external web are inspiration only; where inconsistent, code prevails; never give a conclusion without execution
9+
10+
---
11+
12+
## §1 Background and Goals
13+
14+
f-stack already supports zero-copy **write** (FSTACK_ZC_SEND / M8): the APP uses `ff_zc_mbuf_get`+`ff_zc_mbuf_write` to directly fill the BSD mbuf chain, then uses `ff_zc_send` with the `FSTACK_ZC_MAGIC` sentinel to let the kernel `m_uiotombuf` skip the copy and directly attach the mbuf.
15+
16+
Goal of this study: assess the feasibility of symmetrically supporting zero-copy **read** (tentatively named FSTACK_ZC_RECV), that is, letting the APP directly obtain the BSD mbuf chain in the socket receive buffer (whose data already points into the DPDK mbuf without a copy), eliminating the sole mbuf→user-buffer copy in `soreceive → uiomove`.
17+
18+
**Deliverables**: feasibility conclusion (feasible/partially feasible/infeasible + rationale), kernel- and user-space design, API design, lifecycle/ownership scheme, risk and effort assessment, and follow-up implementation milestone recommendations.
19+
20+
---
21+
22+
## §2 Phase 0 Hands-On Reconnaissance Conclusions (completed, code prevails)
23+
24+
> All of the following are grep/read measurements of this phase, not speculation. Line numbers are per the current working tree.
25+
26+
### 2.1 ZC-SEND Baseline Reference (already-landed M8)
27+
| Item | Location | Description |
28+
|---|---|---|
29+
| Data structure | `ff_api.h:347` | `struct ff_zc_mbuf{ void *bsd_mbuf; void *bsd_mbuf_off; int off; int len; }` |
30+
| Allocate mbuf | `ff_veth.c:306` | `ff_zc_mbuf_get(m,len)` |
31+
| Write mbuf | `ff_veth.c:326` | `ff_zc_mbuf_write(zm,data,len)` (bcopy into M_TRAILINGSPACE) |
32+
| Send entry | `ff_syscall_wrapper.c:1199` | `ff_zc_send(fd,mb,nbytes)` sets `uio_offset=FSTACK_ZC_MAGIC` → `kern_writev` |
33+
| Kernel hook | `freebsd/kern/uipc_mbuf.c:2028-2046` | `#ifdef FSTACK_ZC_SEND` detects magic→attaches mbuf, skips copy |
34+
| Sentinel | `freebsd/sys/mbuf.h:1868` | `FSTACK_ZC_MAGIC ((off_t)0xF8AC2C00F8AC2C00LL)` |
35+
| Compile switch | `lib/Makefile:212` | `CFLAGS+= -DFSTACK_ZC_SEND` |
36+
| Mis-trigger guard | `ff_syscall_wrapper.c:1151/1175` | ordinary `ff_write`/`ff_writev` explicitly set `uio_offset=0` to opt out |
37+
| Example | `example/main_zc.c` ||
38+
39+
### 2.2 ZC-READ Current State (target hook point)
40+
- `ff_veth.c:359 ff_zc_mbuf_read(...)` = **empty stub** (`// DOTO: Support read zero copy; return 0;`)
41+
- `ff_api.h:400` declared, comment "not implemented now"
42+
- **Missing**: `ff_zc_recv` entry, `FSTACK_ZC_RECV` macro, Makefile switch, release/consume API
43+
44+
### 2.3 RX-Side Zero-Copy Foundation (already exists = favorable)
45+
- `ff_mbuf_gethdr` (ff_veth.c) uses `m_extadd(m,data,len,ff_mbuf_ext_free,pkt,...,EXT_DISPOSABLE)` to zero-copy attach DPDK mbuf data as an external mbuf; `ff_mbuf_ext_free` is responsible for returning the DPDK mbuf.
46+
- Conclusion: **NIC→DPDK mbuf→BSD mbuf(ext) is already zero-copy; the sole copy point is soreceive→uiomove**.
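
The external-buffer attachment above can be pictured with a small stand-alone model. This is only a sketch: `toy_pkt`, `toy_ext`, `toy_ext_hold` and `toy_ext_release` are hypothetical names invented for illustration, and the reference counting shown is an assumption about how a holder-aware free callback could behave, not a description of how `m_extadd` or `ff_mbuf_ext_free` actually work in f-stack.

```c
#include <assert.h>
#include <stddef.h>

struct toy_pkt {              /* stand-in for a DPDK mbuf (hypothetical) */
    int returned;             /* set once the packet is back in its pool */
};

typedef void (*toy_ext_free_fn)(struct toy_pkt *pkt);

struct toy_ext {              /* stand-in for an mbuf's external storage */
    const char *data;         /* points into the packet; never copied */
    int refcnt;               /* number of current holders */
    toy_ext_free_fn ext_free; /* plays the role of a free callback */
    struct toy_pkt *pkt;      /* argument handed to the callback */
};

/* Free callback: "returns" the packet to its pool. */
static void toy_pkt_return(struct toy_pkt *pkt)
{
    pkt->returned = 1;
}

/* A new holder (for example the application) takes a reference. */
static void toy_ext_hold(struct toy_ext *e)
{
    assert(e != NULL && e->refcnt > 0);
    e->refcnt++;
}

/* Drop one reference; the callback runs only when the last holder lets go. */
static void toy_ext_release(struct toy_ext *e)
{
    assert(e != NULL && e->refcnt > 0);
    if (--e->refcnt == 0)
        e->ext_free(e->pkt);
}
```

In this model the packet goes back to its pool only after every holder has released it, which is the property a read-side lifecycle scheme would have to guarantee for the real DPDK mbuf.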
47+
48+
### 2.4 READ Path and Symmetric Hook Point
49+
- User entries: `ff_read`(1077)/`ff_readv`(1105)/`ff_recv`(1313)/`ff_recvfrom`(1319)/`ff_recvmsg`(1359) → `kern_readv`/`kern_recvit`
50+
- Kernel copy point: inside `freebsd/kern/uipc_socket.c soreceive_generic_locked`(L2744), `uiomove(mtod(m,...),len,uio)`(L3031)
51+
- **Key advantage**: `soreceive_generic(so,psa,uio,mp0,controlp,flagsp)` natively carries the `mp0` out-parameter — when `mp0!=NULL`, FreeBSD hands out the sockbuf mbuf directly without uiomove, a natural in-kernel mechanism candidate for ZC-read.
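
To make the difference between the copy path and the `mp0` hand-out concrete, here is a deliberately simplified model. All names (`rx_mbuf`, `rx_sockbuf`, `rx_soreceive`) are hypothetical, and the function drains the whole queue in one call, which the real `soreceive_generic` does not necessarily do; it is a sketch of the idea, not of the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct rx_mbuf {              /* stand-in for a BSD mbuf (hypothetical) */
    struct rx_mbuf *m_next;
    int m_len;
    const char *m_data;
};

struct rx_sockbuf {           /* stand-in for a socket receive buffer */
    struct rx_mbuf *sb_mb;    /* queued chain */
    int sb_cc;                /* queued byte count */
};

/*
 * Drain the whole queue. With mp0 == NULL the bytes are copied into buf
 * (the uiomove-style path); with mp0 != NULL the chain itself is handed
 * out and nothing is copied. Returns the byte count, or -1 if buf is
 * missing or too small on the copy path.
 */
static int rx_soreceive(struct rx_sockbuf *sb, char *buf, int buflen,
                        struct rx_mbuf **mp0)
{
    assert(sb != NULL);
    int total = sb->sb_cc;

    if (mp0 != NULL) {
        *mp0 = sb->sb_mb;     /* the caller now owns the chain */
    } else {
        if (buf == NULL || buflen < total)
            return -1;
        int off = 0;
        for (struct rx_mbuf *m = sb->sb_mb; m != NULL; m = m->m_next) {
            memcpy(buf + off, m->m_data, (size_t)m->m_len);
            off += m->m_len;
        }
    }
    sb->sb_mb = NULL;         /* either way the data leaves the buffer */
    sb->sb_cc = 0;            /* accounting drops even without a copy */
    return total;
}
```

Note that the byte accounting is updated on both paths; only the copy disappears when the chain is handed out.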
52+
53+
---
54+
55+
## §3 Key Design Questions to Investigate (answered in Phase 2-3, no speculation allowed)
56+
57+
1. **Kernel mechanism selection**: reuse `soreceive`'s `mp0` out-parameter (FreeBSD native, small change) vs. mimic send's `FSTACK_ZC_MAGIC` sentinel hook (symmetric with send). Compare pros/cons, change surface, risk.
58+
2. **mbuf ownership/lifecycle**: while the APP holds the ext-mbuf, when is the original DPDK mbuf returned? How do the `m_ext` refcount and `ff_mbuf_ext_free` cooperate to avoid use-after-free or premature recycling.
59+
3. **API form**: the current `ff_zc_mbuf_read(struct ff_zc_mbuf*, const char *data, int len)`'s `const char*data` contradicts "read out" semantics and needs redesign; whether to add `ff_zc_recv(fd, struct ff_zc_mbuf*, len)`.
60+
4. **Release semantics**: symmetric to send's get/write/send, the read side needs three stages receive→consume→release; how the APP explicitly returns the mbuf chain after reading.
61+
5. **sockbuf accounting**: after soreceive takes the mbuf, `sbfree`/`sb_cc` accounting and window update.
62+
6. **Boundaries**: MSG_PEEK / MSG_WAITALL / non-blocking / partial packet / spanning multiple mbufs / control messages (SCM_RIGHTS etc.) / OOB.
63+
7. **Relationship and conflicts with LD_PRELOAD ring mode and FF_USE_PAGE_ARRAY**.
64+
8. **Performance expectations and applicable scenarios** (large-packet receiving, proxy forwarding, splice, etc.) + non-applicable scenarios.
65+
66+
---
67+
68+
## §4 Agent Team Topology (leader + sub-agents, harness+spec)
69+
70+
> Execution uses a hybrid mode: parallel spawn of multiple code-explorer sub-agents during the probe phase, serial during the design/review phase.
71+
72+
| Role | Responsibility | Tool/Mode |
73+
|---|---|---|
74+
| **leader (main agent)** | Coordination, task dispatch, result aggregation, gate adjudication, doc landing, rollback decision | main agent |
75+
| **probe-zcsend** | Map the full ZC-SEND chain (API→syscall→m_uiotombuf→sbappend) as the "baseline reference" | async code-explorer (read-only) |
76+
| **probe-recvpath** | Map the full READ chain (ff_recv*→kern_recvit→soreceive_generic_locked→uiomove/mp0 path) + sockbuf accounting | async code-explorer (read-only) |
77+
| **probe-extmbuf** | Map the RX-side ext-mbuf lifecycle (ff_mbuf_gethdr/m_extadd/ff_mbuf_ext_free/refcount/DPDK mbuf return) | async code-explorer (read-only) |
78+
| **research-ext** | External research (GitHub F-Stack issue/PR/wiki, DPDK docs, FreeBSD soreceive/mp0 material, related blogs/official accounts) → inspiration but requires code verification | web_search/web_fetch |
79+
| **design-arch** | Synthesize and produce the design (kernel mechanism selection + API + lifecycle + boundaries) | serial (scheduled by leader) |
80+
| **gatekeeper** | Doc review gate: for each doc, verify 4 dimensions: whether the measured code references really exist, whether line numbers/symbols match, whether conclusions have a rationale, and whether any speculation remains; a failure on any dimension bounces the doc back to design/probe for rework | serial |
81+
82+
**Failure-rollback SOP**: gatekeeper fails → bounce back to the corresponding probe/design sub-agent to re-gather evidence/rewrite; if a probe finds contradiction with a doc → code prevails and the correction is annotated.
83+
84+
---
85+
86+
## §5 Phase Schedule
87+
88+
| Phase | Name | Output | Sub-agent |
89+
|---|---|---|---|
90+
| **P0** | Hands-on reconnaissance (✅ completed, see §2) | this plan §2 | leader |
91+
| **P1** | Doc skeleton landing | skeletons of `00-plan.md`(this doc) / `01-zcsend-baseline.md` / `02-recv-path-analysis.md` / `03-extmbuf-lifecycle.md` / `04-external-research.md` / `05-design-and-feasibility.md` / `09-review.md` | leader |
92+
| **P2** | Parallel architecture probing | fill in 01/02/03 (all measured code references + line-number cross-verification) | probe-zcsend / probe-recvpath / probe-extmbuf |
93+
| **P3** | External research | fill in 04 (GitHub/DPDK/FreeBSD/blogs, annotated for consistency with code) | research-ext |
94+
| **P4** | Design | fill in 05 (kernel mechanism selection comparison + API design + lifecycle scheme + boundary matrix + risk + effort + milestone recommendations + feasibility conclusion) | design-arch |
95+
| **P5** | Doc review gate | 09-review (4-dimension verification results); on failure, roll back P2/P4 | gatekeeper |
96+
| **P6** | Wrap-up | aggregate conclusions; local commit (concise English message) | leader |
97+
98+
---
99+
100+
## §6 Document List (docs/zc_stack_user_spec/zh_cn/, Chinese only)
101+
102+
- `00-plan.md` —— this plan
103+
- `01-zcsend-baseline.md` —— full ZC-SEND baseline reference chain
104+
- `02-recv-path-analysis.md` —— READ path + soreceive/mp0 + sockbuf accounting analysis
105+
- `03-extmbuf-lifecycle.md` —— RX ext-mbuf lifecycle and ownership
106+
- `04-external-research.md` —— external material research and cross-verification
107+
- `05-design-and-feasibility.md` —— design + feasibility conclusion + risk/effort/milestones
108+
- `09-review.md` —— doc review gate record
109+
110+
---
111+
112+
## §7 Acceptance Gate (this study's acceptance gate)
113+
114+
| Gate | Content |
115+
|---|---|
116+
| G-ZCR-1 | All code references (file:line:symbol) verified by measurement to really exist; gatekeeper spot-check pass rate is 100% |
117+
| G-ZCR-2 | Kernel mechanism selection compares ≥2 alternatives (mp0 reuse vs ZC_MAGIC symmetric hook) and gives an explicit recommendation |
118+
| G-ZCR-3 | mbuf lifecycle/ownership scheme is closed-loop (argument for no use-after-free / no leak) |
119+
| G-ZCR-4 | Boundary matrix covers all scenarios in §3.6 |
120+
| G-ZCR-5 | Explicit feasibility conclusion (feasible/partially feasible/infeasible) + effort and milestone recommendations |
121+
| G-ZCR-6 | Where external material and code are inconsistent, explicitly let code prevail and annotate |
122+
| G-ZCR-7 | Chinese-only docs; no speculative wording (every conclusion traceable to code or annotated "pending implementation verification") |
123+
124+
---
125+
126+
## §8 Scope and Constraints
127+
128+
**In-scope**: feasibility study, design/API/lifecycle design, risk and effort assessment, documentation.
129+
**Out-of-scope (not done this round)**: implementation code, kernel patch, unit/integration tests, performance stress tests, English docs.
130+
**Workspace conventions**: deletion→`rm_tmp_file.sh`; process termination→`kill_process.sh`; permission change→`chmod_modify.sh`; non-direct-chmod commands such as `make install` may be executed. Commit message in English.
131+
132+
---
133+
134+
## §9 Risks
135+
136+
| Risk | Mitigation |
137+
|---|---|
138+
| The soreceive mp0 path may already have been modified in f-stack's reworked kernel | P2 probe-recvpath must measure the f-stack version of uipc_socket.c, not assume native FreeBSD behavior |
139+
| ext-mbuf refcount and DPDK mbuf return timing are complex | dedicated P2 probe-extmbuf + P4 lifecycle closed-loop argument |
140+
| External material outdated / inconsistent with the f-stack branch | always let measured code prevail; external is inspiration only |
141+
| Sub-agent output contains speculation | gatekeeper 4-dimension gate; bounce back if not traceable to source |
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
# 01 · ZC-SEND Baseline Reference Full Chain (FSTACK_ZC_SEND / M8)
2+
3+
> Probe: probe-zcsend (read-only measurement). All file:line:symbol references verified. Where code and comments conflict, code prevails.
4+
5+
## 0. One-Sentence Overview
6+
APP `ff_zc_mbuf_get` allocates the BSD mbuf chain → `ff_zc_mbuf_write` uses `bcopy` to fill the mbuf trailing space → `ff_zc_send` treats the **mbuf chain head pointer** as `iov_base` and stamps the sentinel `FSTACK_ZC_MAGIC` into `uio_offset` → `kern_writev → dofilewrite` (offset=-1 preserves magic) → the kernel `m_uiotombuf` detects the magic → **directly treats iov_base as the mbuf chain and attaches it, skipping m_getm2 allocation and the uiomove copy**.
7+
**Direction: user-space constructs the mbuf, the kernel takes over (user → kernel).**
8+
9+
## 1. API Layer
10+
### 1.1 `struct ff_zc_mbuf` (ff_api.h:347-352)
11+
| Field | Meaning |
12+
|---|---|
13+
| `bsd_mbuf` | points to the **head** of the mbuf chain (the handle passed to the kernel) |
14+
| `bsd_mbuf_off` | the mbuf at the current write position (locates continuation across multiple writes) |
15+
| `off` | the accumulated written offset; the APP should not modify it (ff_api.h:350 comment) |
16+
| `len` | total capacity of the chain allocated by get |
17+
18+
### 1.2 `ff_zc_mbuf_get` (ff_veth.c:306-323)
19+
- `m_getm2(NULL, max(len,1), M_WAITOK, MT_DATA, 0)` (L313) allocates in one shot an mbuf chain able to hold len bytes;
20+
- the 5th argument flags=0 → **no M_PKTHDR**;
21+
- `bsd_mbuf=bsd_mbuf_off=mb; off=0; len=len` (L318-320); returns -1 on failure.
22+
23+
### 1.3 `ff_zc_mbuf_write` (ff_veth.c:325-356)
24+
- continues writing from `bsd_mbuf_off`, traversing along m_next, writing `min(M_TRAILINGSPACE(mb), remaining)` per mbuf (L341);
25+
- **still `bcopy` (L342)** —— what ZC saves is the "user→kernel" copy; filling the mbuf itself still copies once;
26+
- only updates each mbuf's `m_len`, **does not update m_pkthdr.len** (L349-350 are commented out); the total length is instead carried by ff_zc_send's nbytes, through m_uiotombuf's total.
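
The write-cursor behavior described in 1.3 can be sketched with a toy chain. Everything here is an assumption made for illustration: `toy_mbuf`, `toy_zc_mbuf`, `toy_zc_write` and the 8-byte capacity are hypothetical, the struct only mirrors the four field names of `struct ff_zc_mbuf`, and `memcpy` stands in for `bcopy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MBUF_CAP 8        /* tiny capacity so writes span buffers */

struct toy_mbuf {             /* stand-in for a BSD mbuf (hypothetical) */
    struct toy_mbuf *m_next;
    int m_len;                /* bytes currently stored */
    char data[TOY_MBUF_CAP];
};

struct toy_zc_mbuf {          /* mirrors the four ff_zc_mbuf fields */
    struct toy_mbuf *bsd_mbuf;     /* chain head */
    struct toy_mbuf *bsd_mbuf_off; /* buffer at the write position */
    int off;                       /* bytes written so far */
    int len;                       /* total capacity of the chain */
};

/* Append len bytes, resuming at the cursor; returns len, or -1 on error. */
static int toy_zc_write(struct toy_zc_mbuf *zm, const char *src, int len)
{
    assert(zm != NULL && src != NULL);
    if (len < 0 || zm->off + len > zm->len)
        return -1;            /* would overflow the chain */

    struct toy_mbuf *mb = zm->bsd_mbuf_off;
    int remaining = len;
    while (remaining > 0) {
        if (mb == NULL)
            return -1;        /* chain is shorter than zm->len claims */
        int space = TOY_MBUF_CAP - mb->m_len;  /* like M_TRAILINGSPACE */
        int n = remaining < space ? remaining : space;
        memcpy(mb->data + mb->m_len, src, (size_t)n);
        mb->m_len += n;       /* only the per-buffer length is updated */
        src += n;
        remaining -= n;
        if (remaining > 0)
            mb = mb->m_next;  /* current buffer is full, move on */
    }
    zm->bsd_mbuf_off = mb;    /* the next write resumes here */
    zm->off += len;
    return len;
}
```

Two successive calls of 5 and 6 bytes on a two-buffer chain fill the first buffer completely and leave 3 bytes in the second, with the cursor pointing at the second buffer.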
27+
28+
### 1.4 `ff_zc_mbuf_read` (ff_veth.c:358-363) = empty stub
29+
`// DOTO: Support read zero copy; return 0;`. The signature `const char *data` does not match "read out" semantics; it is an undesigned placeholder.
30+
31+
## 2. User-Space syscall Layer
32+
- `ff_zc_send` (ff_syscall_wrapper.c:1199-1225, `#ifdef FSTACK_ZC_SEND`): `aiov.iov_base=mb` (L1210), `auio.uio_offset=FSTACK_ZC_MAGIC` (L1216) → `kern_writev` (L1217).
33+
- Mis-trigger guard: ordinary `ff_write` (L1151) / `ff_writev` (L1175) explicitly set `uio_offset=0` to opt out (an uninitialized uio_offset in the stack-allocated auio could falsely match the magic and cause an m_demote GPF; see the comment at L1146-1150).
34+
35+
## 3. Kernel Hook Layer (freebsd/kern/uipc_mbuf.c)
36+
- The `#ifdef FSTACK_ZC_SEND` branch of `m_uiotombuf` (L2028-2046): when `uio->uio_offset == FSTACK_ZC_MAGIC` (L2041) → directly attaches iov_base as the mbuf chain, skips the regular copy loop, and rewrites `uio->uio_offset` to `total` (L2046).
37+
- Sentinel: `FSTACK_ZC_MAGIC ((off_t)0xF8AC2C00F8AC2C00LL)` (freebsd/sys/mbuf.h:1868).
38+
- kern_writev → dofilewrite passes `offset=(off_t)-1`, not overwriting auio.uio_offset, so the magic is preserved until m_uiotombuf.
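
The sentinel dispatch described in this section can be modeled in a few lines of user-space C. This is a sketch under stated assumptions: `toy_uio`, `toy_chain` and `toy_uiotombuf` are hypothetical, `int64_t` replaces `off_t`, and the toy takes `iov_len` as the "total" written back into `uio_offset`. Only the sentinel bit pattern is taken from the document.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same bit pattern as FSTACK_ZC_MAGIC; int64_t stands in for off_t. */
#define TOY_ZC_MAGIC ((int64_t)0xF8AC2C00F8AC2C00ULL)
#define TOY_DATA_MAX 64

struct toy_uio {              /* stand-in for struct uio (hypothetical) */
    void   *iov_base;         /* user bytes, or a chain head when ZC */
    size_t  iov_len;
    int64_t uio_offset;       /* carries the sentinel */
};

struct toy_chain {            /* stand-in for an mbuf chain */
    size_t len;
    char   data[TOY_DATA_MAX];
};

/* Returns the chain to queue; *copied reports which path ran. */
static struct toy_chain *toy_uiotombuf(struct toy_uio *uio, int *copied)
{
    assert(uio != NULL && copied != NULL);

    if (uio->uio_offset == TOY_ZC_MAGIC) {
        /* Zero-copy path: iov_base already is a chain, adopt it as-is. */
        struct toy_chain *zc = uio->iov_base;
        uio->uio_offset = (int64_t)uio->iov_len; /* magic is overwritten */
        *copied = 0;
        return zc;
    }

    /* Regular path: allocate a fresh chain and copy the user bytes. */
    if (uio->iov_len > TOY_DATA_MAX)
        return NULL;
    struct toy_chain *m = calloc(1, sizeof(*m));
    if (m == NULL)
        return NULL;
    memcpy(m->data, uio->iov_base, uio->iov_len);
    m->len = uio->iov_len;
    *copied = 1;
    return m;
}
```

The model also shows why the ordinary write path has to clear the offset field: if stack garbage happened to equal the sentinel, a plain byte buffer would be interpreted as a chain.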
39+
40+
## 4. Compile Switch and Example
41+
- `lib/Makefile:212 CFLAGS+= -DFSTACK_ZC_SEND`
42+
- `example/main_zc.c`: call sequence `ff_zc_mbuf_get → ff_zc_mbuf_write (can be multiple times) → ff_zc_send`.
43+
44+
## 5. Mirrorable / Non-Mirrorable Points for ZC-READ
45+
| Dimension | SEND (existing) | READ (target) | Mirrorable? |
46+
|---|---|---|---|
47+
| Direction | user constructs mbuf→kernel | kernel mbuf→handed to user | ✗ opposite direction; the mechanism cannot be simply symmetric |
48+
| Sentinel hook | uio_offset=MAGIC triggers m_uiotombuf | the read-side kernel already has the mp0 out-parameter (see 02) | ⚠ no need to mimic magic; reusing mp0 is better |
49+
| struct ff_zc_mbuf | filled by get/write | can be reused to carry "the chain handed out by the kernel" | ✓ reusable/extensible |
50+
| Data copy | bcopy fill + kernel zero-copy attach | target: eliminate soreceive→uiomove | ✓ this is the value point of ZC-read |
51+
| Release contract | kernel frees on its own after taking over | **the DPDK mbuf cannot be reclaimed while the APP holds it; needs release(m_freem)** | ✗ new; a difficulty unique to the read side (see 03) |
52+
53+
**Key judgment**: ZC-send is "user gives the kernel an mbuf", ZC-read is "kernel gives the user an mbuf"; the two are opposite in direction, so **the FSTACK_ZC_MAGIC mechanism cannot be simply replicated symmetrically**; the read side should preferentially reuse the `soreceive` mp0 out-parameter that already exists in the kernel (see 02, 05 for details).
