Commit b6ce588

zc-send-impl-leader committed
feat(zc-send): native kern_zc_sendit via sosend(top), drop magic hack
Replace the FSTACK_ZC_MAGIC + m_uiotombuf zero-copy-send hack with a symmetric kern_zc_sendit that calls sosend(uio=NULL, top=chain).

M0 kernel:
- add kern_zc_sendit (uipc_syscalls.c) + decl (syscallsubr.h)
- remove FSTACK_ZC_MAGIC (mbuf.h), m_uiotombuf ZC fast-path branch (uipc_mbuf.c), dofilewrite uio_offset guard + mbuf.h include (sys_generic.c)

M1 userspace:
- ff_zc_send -> kern_zc_sendit (no uio/magic)
- ff_zc_mbuf_get: M_PKTHDR; ff_zc_mbuf_write: maintain head pkthdr.len
- drop ff_write/ff_writev uio_offset opt-out; ff_api.h doc

ABI unchanged; example/main_zc.c untouched.
Builds clean for default / FF_ZC_SEND / FF_ZC_RECV / both.

Note: full m_uiotombuf vanilla revert not feasible (m_uiotombuf_nomap/mc_uiotomc are #ifndef FSTACK).
1 parent a8876ec commit b6ce588

9 files changed

Lines changed: 278 additions & 102 deletions

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
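# 41. Native zero-copy send (user space → protocol stack): implementation-phase plan

> Phase: implementation (specs 30-40 are ready and have passed the gatekeeper)
> Mode: harness engineering + spec-driven + agent team (leader + sub-agents)
> Goal: switch `FSTACK_ZC_SEND` from the `FSTACK_ZC_MAGIC + m_uiotombuf` hack to
> the FreeBSD-native `sosend(uio=NULL, top=chain)` path (adding a symmetric `kern_zc_sendit`),
> eliminating the m_uiotombuf kernel patch and lowering the risk of accidental triggering and the cost of upgrade maintenance.
> Mandatory rules: DP-10 (rm/kill/chmod must always go through `/data/workspace/{rm_tmp_file,kill_process,chmod_modify}.sh`);
> measure first, never guess; a failed gate bounces back to the previous step, with at most 3 bounces per step, beyond which work stops for a human decision.

---
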
## §0 Phase-0 reconnaissance results (done; every file:line is measured, the code is authoritative)

### 0.1 Existing hack touch points: 5 kernel sites (rows 1-5) plus 5 user-space sites (rows 6-10), confirmed consistent with spec 31

| # | File | Measured lines | Current state | Action |
|---|---|---|---|---|
| 1 | `freebsd/sys/mbuf.h` | 1856-1869 | `#ifdef FSTACK_ZC_SEND` + `#define FSTACK_ZC_MAGIC ((off_t)0xF8AC2C00F8AC2C00LL)` | DELETE |
| 2 | `freebsd/kern/uipc_mbuf.c` | 1955-2077 | m_uiotombuf wrapped in `#ifndef FSTACK` (vanilla 15.0) / `#else` (simplified 13.0 version + ZC branch) / `#endif` | RESTORE→vanilla |
| 3 | `freebsd/kern/uipc_mbuf.c` | 2028-2049, 2070-2072 | `#ifdef FSTACK_ZC_SEND` fast path inside m_uiotombuf | DELETE (along with #2) |
| 4 | `freebsd/kern/sys_generic.c` | 57 | `#include <sys/mbuf.h> /* M8 */` | Delete if no longer needed |
| 5 | `freebsd/kern/sys_generic.c` | 560-573 | dofilewrite `#ifdef FSTACK_ZC_SEND` uio_offset guard | DELETE→single line |
| 6 | `lib/ff_syscall_wrapper.c` | 1146-1151 | ff_write `auio.uio_offset = 0;` opt-out + comment | DELETE |
| 7 | `lib/ff_syscall_wrapper.c` | 1175 | ff_writev `auio.uio_offset = 0;` | DELETE |
| 8 | `lib/ff_syscall_wrapper.c` | 1186-1226 | old ff_zc_send (builds uio + MAGIC + kern_writev) | REWRITE |
| 9 | `lib/ff_veth.c` | 306-323 | ff_zc_mbuf_get calls `m_getm2(...,MT_DATA,0)` without M_PKTHDR | REWRITE +M_PKTHDR |
| 10 | `lib/ff_veth.c` | 325-356 | ff_zc_mbuf_write has the pkthdr.len accumulation commented out (L349-350) | REWRITE to maintain pkthdr.len |

### 0.2 Native sosend(top) entry point (confirmed)
- `kern_zc_recvit` is measured at `uipc_syscalls.c:1064-1108`, immediately followed by `#endif /* FSTACK_ZC_RECV */` (L1109): this is the symmetric insertion point for `kern_zc_sendit`.
- `syscallsubr.h:304-310` holds the `kern_zc_recvit` declaration; the new declaration goes right after it (after L310).
- Measured `kern_sendit` pattern: `getsock(td, s, &cap_send_rights, &fp)` (L745/750) + `mac_socket_check_send` (L770), which `kern_zc_sendit` reuses.

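### 0.3 Feasibility of reverting m_uiotombuf (key compile risk, pre-verified)
- In the f-stack tree, `m_uiotombuf_nomap` (L1865), `mc_split`/`mc_first` (L1122/1127), and `struct mchain` all exist;
- In vanilla `freebsd-src-releng-15.0/sys/kern/uipc_mbuf.c`, `m_uiotombuf` is at L1950, which lines up with the f-stack `#ifndef FSTACK` branch (L1956-1992);
- Revert = delete the `#else` 13.0 branch and remove the `#ifndef FSTACK/#else/#endif` wrapper, keeping the vanilla branch as unconditional code;
- ⚠ M0 must verify by **actually compiling** that the vanilla branch links (in particular whether mc_uiotomc is fully defined); on failure, bounce back.
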
### 0.4 Build environment (confirmed ready)
- DPDK 23.11.5 pkgconfig @ `/usr/local/lib64/pkgconfig`; cmocka 1.1.7;
- `uipc_mbuf.c` is compiled into lib (lib/Makefile:361); the `lib/libfstack.a` artifact from the previous M2 build exists.

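### 0.5 ⚠ Key cross-validation gaps (spec documents vs. measured code; the code wins)
1. **The `tests/integration/test_ff_zc_recv_integration.c` cited by spec 37 §8.3 (said to have been created by commit 8a06862cd) does not exist**: measurement shows 8a06862cd only changes docs + example/main_zc.c + ff_api.symlist and adds no zc test files. Currently neither `tests/unit` nor `tests/integration` contains any zc tests. → ZC-send tests must follow patterns that **actually exist** (the EAL+cmocka setup in `test_ff_dpdk_kni.c`, the mbuf construction in `test_ff_dpdk_pcap.c`, the real EAL in `test_ff_dpdk_if_integration.c`).
2. **`ff_veth.c` / `ff_syscall_wrapper.c` depend heavily on FreeBSD kernel headers** (`sys/socketvar.h`, `net/if_var.h`, `netinet/in.h`, …); the existing host-based unit harness (which compiles `lib/*.c` against host headers into `lib_objs`) **cannot host-compile** these two files. → The pure-logic unit-test assumption behind U1-U12 in spec 37 does not hold directly; M2 must adjust pragmatically (see §3 M2).

---
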
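## §1 Agent team topology (team: zc-send-impl)

| Role | Implementation | Responsibilities |
|---|---|---|
| **Leader** (this conversation) | main agent | milestone scheduling, actual code editing/compiling/test execution, gate rulings, bounce counter, commits |
| **impl-review** | `Task(code-explorer)`, read-only | reviews the code changes after each milestone: consistency with spec and measurements, minimal diff, rule compliance |
| **gatekeeper** | `Task(code-explorer)`, read-only | final gate review per milestone: spot checks of grep/build artifacts/symbols/diff-vs-vanilla, PASS/FAIL |

> Edits are performed by the Leader using the `c-precision-surgery` (precise kernel patching) and `c-unittest-expert` (unit tests) skills;
> sub-agents do read-only analysis (the same efficient mode used in the spec phase).

### Gate bounce rules (per memory 86071475)
- Any milestone gate FAIL → bounce that milestone back for fixing;
- **At most 3 bounces per milestone**; if the 4th attempt still FAILs → **stop the task and escalate to a human decision**;
- Maintain a `bounce[Mx]` counter per milestone, recorded in the 49/4x review.

---
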
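## §2 Milestone overview and gates

| Milestone | Content | Gate |
|---|---|---|
| **M0 kernel** | K1 kern_zc_sendit declaration + K2 implementation; D1 delete the MAGIC macro; D2 revert m_uiotombuf→vanilla; D3 delete the sys_generic guard | 4 build combinations `-Werror` clean; `nm` shows `kern_zc_sendit`; grep C2/K-G3/4/5=0; diff of m_uiotombuf vs vanilla=0 |
| **M1 user space** | U1 rewrite ff_zc_send→kern_zc_sendit; U2 ff_zc_mbuf_get +M_PKTHDR; U3 ff_zc_mbuf_write maintains pkthdr.len; U4 delete ff_write/writev opt-out; U5 ff_api.h comment | build clean; `nm` shows `ff_zc_send`; `example/main_zc.c` diff=0; U-G1/3/4/5 grep; helloworld_zc links |
| **M2 unit tests** | pragmatic approach per §0.5: pure-logic cases that can be host-compiled, plus compile-time/integration-time coverage for the parts that cannot | tests compile + run PASS; valgrind 0 definite leaks; no new rule violations |
| **M3 functional/integration** | replicate the E2E path from 21-m2-report (helloworld_zc + HTTP, send path); run it for real if the environment allows | http=200 (zero-copy send path); or honestly record the environment limits + fall back to libfstack link-level verification |
| **M4 performance baseline** | wrk T1/T2/T3 + large packets; compare against baseline | best-effort; if the environment is insufficient, **honestly mark as deferred; fabricating data is forbidden** |
| **M5 wrap-up** | implementation review document (42-impl-review.md) + bounce summary + batched commits (short English messages) | gatekeeper final review PASS |

> M3/M4 are constrained by the DPDK runtime (hugepage/NIC/vdev); anything that cannot actually be run is **recorded honestly**, with no fabricated results (rule #4).
> Scope addendum: if **receive-side (ZC-recv) problems** are found during implementation, fix them as well (user-authorized).

---
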
## §3 Milestone details

### M0: kernel patch (c-precision-surgery)
- **K1** In `freebsd/sys/syscallsubr.h`, insert `#ifdef FSTACK_ZC_SEND ... int kern_zc_sendit(...); #endif` after L310 (symmetric with kern_zc_recvit).
- **K2** In `freebsd/kern/uipc_syscalls.c`, insert the `kern_zc_sendit` implementation after L1109 (the 33 §2.2 version: argument validation with m_freem → getsock → MAC → sosend(so,NULL,NULL,top,NULL,flags,td) → on success td_retval=len / on failure SIGPIPE → fdrop).
- **D1** In `freebsd/sys/mbuf.h`, delete the whole block at 1856-1869.
- **D2** In `freebsd/kern/uipc_mbuf.c`, delete the `#else` (13.0+ZC) branch and remove the `#ifndef FSTACK/#else/#endif` wrapper, keeping the vanilla branch.
- **D3** In `freebsd/kern/sys_generic.c`, collapse 560-573 into the single line `auio->uio_offset = offset;`; use grep to check whether the L57 include can be removed.
- **Gate M0**:
  1. `cd lib && PKG_CONFIG_PATH=... make clean && FF_ZC_SEND=1 make` (then also the default / FF_ZC_RECV=1 / both-enabled builds): all four combinations `-Werror` clean;
  2. `nm libfstack.a | grep kern_zc_sendit` = 1 T symbol;
  3. `grep FSTACK_ZC_SEND freebsd/kern/uipc_mbuf.c freebsd/kern/sys_generic.c freebsd/sys/mbuf.h` = 0;
  4. `grep FSTACK_ZC_MAGIC freebsd/ lib/` = 0 hits in source;
  5. diff of the m_uiotombuf region vs `freebsd-src-releng-15.0/sys/kern/uipc_mbuf.c` = 0.
- FAIL → bounce[M0]++ (≤3).

### M1: user-space API (c-precision-surgery)
- **U1** `lib/ff_syscall_wrapper.c:1186-1226`: rewrite ff_zc_send (the 34 §3.3 version: cast top + kern_zc_sendit(curthread,fd,top,0)).
- **U2** `lib/ff_veth.c:306-323` ff_zc_mbuf_get: change `m_getm2(...,MT_DATA,0)` to pass `M_PKTHDR`, and add a `len<0` check.
- **U3** `lib/ff_veth.c:325-356` ff_zc_mbuf_write: update the chain head with `head->m_pkthdr.len += progress` (O(1)), add data!=NULL/len<0/len==0 handling, and remove the unused ret.
- **U4** Delete `lib/ff_syscall_wrapper.c:1146-1151` (ff_write opt-out) + `:1175` (ff_writev opt-out).
- **U5** `lib/ff_api.h:437-446`: update the doc block (signature unchanged).
- **Gate M1**: build clean; `nm libfstack.a | grep ff_zc_send`=1; `grep -n "auio.uio_offset = 0" lib/ff_syscall_wrapper.c`=0; ff_zc_mbuf_get contains M_PKTHDR and ff_zc_mbuf_write contains an uncommented m_pkthdr.len; `git diff --stat example/main_zc.c`=empty; `cd example && FF_PATH=... make` links helloworld_zc successfully.
- FAIL → bounce[M1]++ (≤3).

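The U2/U3 contract (a `M_PKTHDR`-flagged head whose `m_pkthdr.len` equals the sum of all segment lengths, updated once per write) can be illustrated with a small host-compilable model. This is only a sketch under stated assumptions: `toy_mbuf`, `toy_chain_get`, `toy_chain_write`, `toy_chain_sum`, `TOY_SEG_CAP`, and `TOY_PKTHDR` are hypothetical stand-ins, not the real `struct mbuf` or the `ff_zc_mbuf_*` code in `lib/ff_veth.c`, and the way free space is located is invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SEG_CAP 8   /* hypothetical per-segment capacity in bytes */
#define TOY_PKTHDR  0x1 /* stands in for M_PKTHDR */

struct toy_mbuf {
    struct toy_mbuf *next;
    int flags;
    int len;            /* bytes used in this segment */
    int pkthdr_len;     /* meaningful on the head only */
    char data[TOY_SEG_CAP];
};

/* Free a whole chain, in the spirit of m_freem(). */
void toy_chain_free(struct toy_mbuf *m)
{
    while (m != NULL) {
        struct toy_mbuf *next = m->next;
        free(m);
        m = next;
    }
}

/* Sum of per-segment lengths; used only to check the invariant. */
int toy_chain_sum(const struct toy_mbuf *m)
{
    int sum = 0;
    for (; m != NULL; m = m->next)
        sum += m->len;
    return sum;
}

/* Allocate a chain that can hold `len` bytes; the head is flagged TOY_PKTHDR. */
struct toy_mbuf *toy_chain_get(int len)
{
    struct toy_mbuf *head = NULL, **tail = &head;
    int remaining = len > 0 ? len : 1;  /* always at least one segment */

    if (len < 0)
        return NULL;
    while (remaining > 0) {
        struct toy_mbuf *m = calloc(1, sizeof(*m));
        if (m == NULL) {
            toy_chain_free(head);
            return NULL;
        }
        *tail = m;
        tail = &m->next;
        remaining -= TOY_SEG_CAP;
    }
    head->flags |= TOY_PKTHDR;
    return head;
}

/*
 * Copy up to `len` bytes into the free space of the chain.
 * Returns the number of bytes stored, or -1 on invalid arguments.
 */
int toy_chain_write(struct toy_mbuf *head, const char *data, int len)
{
    struct toy_mbuf *m = head;
    int progress = 0;

    if (head == NULL || data == NULL || len < 0 ||
        (head->flags & TOY_PKTHDR) == 0)
        return -1;
    while (m != NULL && progress < len) {
        int room = TOY_SEG_CAP - m->len;
        int n = (len - progress < room) ? len - progress : room;

        memcpy(m->data + m->len, data + progress, (size_t)n);
        m->len += n;
        progress += n;
        if (m->len == TOY_SEG_CAP)
            m = m->next;
    }
    head->pkthdr_len += progress;   /* single O(1) update on the head */
    assert(head->pkthdr_len == toy_chain_sum(head)); /* debug-only check */
    return progress;
}
```

A caller would obtain a chain with `toy_chain_get(20)`, call `toy_chain_write` one or more times, and could then rely on `head->pkthdr_len` matching the bytes actually stored, which is the property `kern_zc_sendit` depends on when it reads `top->m_pkthdr.len`.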
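### M2: unit tests (pragmatic, c-unittest-expert)
- Because of §0.5(2), `ff_veth.c`/`ff_syscall_wrapper.c` cannot be host-compiled. Pragmatic strategy:
  - **Plan A (preferred)**: create `tests/unit/test_ff_zc_send_logic.c`, which **extracts the pure logic of the functions under test** (ff_zc_mbuf_write's pkthdr.len accumulation and boundaries + ff_zc_send's argument validation) and covers the U5/U6/U7/U8/U9/U10/U11/U12-style assertions with a local mbuf shim + `--wrap`; the Makefile adds a per-target `-DFSTACK_ZC_SEND`, modeled on `test_ff_dpdk_pcap`.
  - **If Plan A is infeasible because of BSD header coupling → bounce once** and switch to **Plan B**: M_PKTHDR/pkthdr.len correctness is covered by **M0/M1 compile-time assertions + M3 integration functionality**, the unit tests keep only the independently compilable ff_zc_send argument-validation stub, and the harness limitation is honestly recorded in 42-impl-review.
- **Gate M2**: test binary compiles + run PASS; `make check` (valgrind) 0 definite leaks.
- FAIL → bounce[M2]++ (≤3).
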
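### M3: functional/integration
- Replicate the E2E from `21-m2-test-report.md`: with `FF_PATH`/`FF_DPDK` set up, start the service with `example/helloworld_zc` (FF_ZC_SEND=1) and issue an HTTP GET to verify http=200 on the zero-copy send path; process cleanup goes through `/data/workspace/kill_process.sh`, temporary files through `rm_tmp_file.sh`.
- Insufficient environment (no hugepage/NIC/vdev) → fall back to link-level + symbol verification of libfstack.a + helloworld_zc, and **record it honestly**.
- **Gate M3**: http=200 (ideal) or link-level PASS + recorded limitations. FAIL → bounce[M3]++ (≤3).
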
### M4: performance (best-effort)
- Environment permitting: wrk T1/T2/T3 + large packets vs baseline, Δ within ±3%; otherwise **honestly mark as deferred**, with no fabricated data.

### M5: wrap-up
- Write `docs/zc_stack_user_spec/zh_cn/42-impl-review.md` (per-milestone results + bounce counter + gate spot checks + explanation of deviations from spec).
- Batched `git commit`s (short English messages, per rule #5 + memory 73362122): M0 kernel / M1 user space / M2 unit tests / (M3 if it produces artifacts) / M5 review.
- gatekeeper final review.

---

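## §4 Risks and fallbacks

| Risk | Trigger | Handling |
|---|---|---|
| R-M0-1 | link failure after reverting to vanilla m_uiotombuf (mc_uiotomc undefined) | bounce M0; switch to the compromise "delete only the ZC branch, keep the simplified 13.0 version" (still achieves the core goal of eliminating the hack; the AC4 diff item is marked as downgraded) |
| R-M0-2 | kern_zc_sendit is unreferenced, triggering `-Werror=unused` | M1's ff_zc_send is the referencing caller; when M0 is compiled alone the function is gated but referenced by the symlist/caller; if necessary, verify M0+M1 in a combined build |
| R-M2-1 | ff_veth.c cannot be host-compiled | fall back from Plan A to Plan B (§3 M2) |
| R-M3-1 | DPDK runtime unavailable | fall back to link-level verification + honest record; do not fabricate http=200 |
| General | single-step bounce ≥3 | **stop the task and escalate to a human decision** (memory 86071475) |

---
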
## §5 Bounce counter (maintained during execution)

```
M0=0 M1=0 M2=0 M3=0 M4=0 M5=0
```

(Updated in real time during execution; any counter reaching 3 triggers human intervention.)

freebsd/kern/sys_generic.c

Lines changed: 0 additions & 14 deletions
@@ -54,7 +54,6 @@
 #include <sys/protosw.h>
 #include <sys/socketvar.h>
 #include <sys/uio.h>
-#include <sys/mbuf.h> /* M8: FSTACK_ZC_MAGIC for ZC fast-path */
 #include <sys/eventfd.h>
 #include <sys/kernel.h>
 #include <sys/ktr.h>
@@ -557,20 +556,7 @@ dofilewrite(struct thread *td, int fd, struct file *fp, struct uio *auio,
 	AUDIT_ARG_FD(fd);
 	auio->uio_rw = UIO_WRITE;
 	auio->uio_td = td;
-#ifdef FSTACK_ZC_SEND
-	/*
-	 * M8: preserve FSTACK_ZC_MAGIC sentinel set by ff_zc_send so it
-	 * survives down to m_uiotombuf where the ZC fast path tests for
-	 * it. Plain ff_write callers pass uio_offset = 0, which is
-	 * indistinguishable from default offset = -1 here, so we still
-	 * overwrite for them (the fast-path predicate also checks
-	 * UIO_SYSSPACE/UIO_WRITE which everyone has).
-	 */
-	if (auio->uio_offset != FSTACK_ZC_MAGIC)
-		auio->uio_offset = offset;
-#else
 	auio->uio_offset = offset;
-#endif
 #ifdef KTRACE
 	if (KTRPOINT(td, KTR_GENIO))
 		ktruio = cloneuio(auio);

freebsd/kern/uipc_mbuf.c

Lines changed: 3 additions & 29 deletions
@@ -1992,10 +1992,9 @@ m_uiotombuf(struct uio *uio, int how, int len, int lspace, int flags)
 }
 #else
 /*
- * F-Stack m_uiotombuf compat: keep the simple 13.0-era semantics so
- * the FSTACK_ZC_SEND zero-copy fast path remains valid. F-Stack does
- * not implement unmapped mbuf pools (M_EXTPG) nor mchain, so the
- * 14.0+ branches are never reached at runtime.
+ * F-Stack m_uiotombuf compat: keep the simple 13.0-era semantics.
+ * F-Stack does not implement unmapped mbuf pools (M_EXTPG) nor
+ * mchain, so the 14.0+ branches are never reached at runtime.
  */
 struct mbuf *
 m_uiotombuf(struct uio *uio, int how, int len, int align, int flags)
@@ -2025,28 +2024,6 @@ m_uiotombuf(struct uio *uio, int how, int len, int align, int flags)
 	 * Give us the full allocation or nothing.
 	 * If len is zero return the smallest empty mbuf.
 	 */
-#ifdef FSTACK_ZC_SEND
-	/*
-	 * M8: tighten the ZC fast-path predicate. The original 13.0
-	 * F-Stack baseline relied on (UIO_SYSSPACE && UIO_WRITE) which
-	 * matches *every* ff_write/ff_writev call (lib/ff_syscall_wrapper.c
-	 * sets uio_segflg = UIO_SYSSPACE unconditionally), causing a
-	 * GPF in m_demote when callers passed plain char buffers.
-	 *
-	 * The new contract: callers must use ff_zc_send (libfstack) which
-	 * stamps uio->uio_offset = FSTACK_ZC_MAGIC. Plain ff_write paths
-	 * now explicitly set uio_offset = 0 to opt out.
-	 */
-	if (uio->uio_segflg == UIO_SYSSPACE && uio->uio_rw == UIO_WRITE &&
-	    uio->uio_offset == FSTACK_ZC_MAGIC) {
-		m = (struct mbuf *)uio->uio_iov->iov_base;
-		uio->uio_iov->iov_base = (char *)(uio->uio_iov->iov_base) + total;
-		uio->uio_iov->iov_len = 0;
-		uio->uio_resid = 0;
-		uio->uio_offset = total;
-		progress = total;
-	} else {
-#endif
 	m = m_getm2(NULL, max(total + align, 1), how, MT_DATA, flags);
 	if (m == NULL)
 		return (NULL);
@@ -2067,9 +2044,6 @@ m_uiotombuf(struct uio *uio, int how, int len, int align, int flags)
 		if (flags & M_PKTHDR)
 			m->m_pkthdr.len += length;
 	}
-#ifdef FSTACK_ZC_SEND
-	}
-#endif
 	KASSERT(progress == total, ("%s: progress != total", __func__));

 	return (m);
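The removed M8 comment explains why the fast-path predicate needed an in-band sentinel: the 13.0-era test matched every kernel-space write, so a plain buffer could be reinterpreted as an mbuf pointer. The following host-side model is a sketch of that reasoning only. `toy_uio`, `toy_predicate_baseline`, `toy_predicate_m8`, and the `TOY_*` names are hypothetical; only the three compared fields and the constant's bit pattern are taken from the deleted code, and the offset is modeled as unsigned purely to keep the constant portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the uio fields the removed predicate tested. */
enum toy_segflg { TOY_UIO_USERSPACE, TOY_UIO_SYSSPACE };
enum toy_rw { TOY_UIO_READ, TOY_UIO_WRITE };

struct toy_uio {
    enum toy_segflg segflg;
    enum toy_rw rw;
    uint64_t offset;    /* unsigned here; the real uio_offset is an off_t */
};

/* Same bit pattern as the deleted FSTACK_ZC_MAGIC. */
#define TOY_ZC_MAGIC UINT64_C(0xF8AC2C00F8AC2C00)

/* Baseline test described in the removed comment: true for every such write. */
bool toy_predicate_baseline(const struct toy_uio *u)
{
    return u->segflg == TOY_UIO_SYSSPACE && u->rw == TOY_UIO_WRITE;
}

/* M8 test (now deleted): additionally requires the in-band sentinel. */
bool toy_predicate_m8(const struct toy_uio *u)
{
    return toy_predicate_baseline(u) && u->offset == TOY_ZC_MAGIC;
}

/* What a plain ff_write-style caller would pass (offset opt-out of 0). */
struct toy_uio toy_plain_write_uio(void)
{
    struct toy_uio u = { TOY_UIO_SYSSPACE, TOY_UIO_WRITE, 0 };
    return u;
}

/* What the old ff_zc_send would pass (offset stamped with the sentinel). */
struct toy_uio toy_zc_send_uio(void)
{
    struct toy_uio u = { TOY_UIO_SYSSPACE, TOY_UIO_WRITE, TOY_ZC_MAGIC };
    return u;
}

/* Walk through the three cases the sentinel scheme has to distinguish. */
void toy_predicate_demo(void)
{
    struct toy_uio plain = toy_plain_write_uio();
    struct toy_uio zc = toy_zc_send_uio();
    struct toy_uio unlucky = plain;

    assert(toy_predicate_baseline(&plain)); /* baseline misfires on plain data */
    assert(!toy_predicate_m8(&plain));      /* sentinel filters plain writes */
    assert(toy_predicate_m8(&zc));          /* sentinel admits ZC sends */

    unlucky.offset = TOY_ZC_MAGIC;          /* a colliding ordinary offset */
    assert(toy_predicate_m8(&unlucky));     /* in-band signalling still misfires */
}
```

The last case is the residual weakness of any in-band marker. Passing the chain as a typed `top` argument to `sosend`, as this commit does, removes the need to infer intent from an offset value at all.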

freebsd/kern/uipc_syscalls.c

Lines changed: 70 additions & 0 deletions
@@ -1108,6 +1108,76 @@ kern_zc_recvit(struct thread *td, int s, struct uio *uio, struct mbuf **mp0)
 }
 #endif /* FSTACK_ZC_RECV */

+#ifdef FSTACK_ZC_SEND
+/*
+ * FSTACK_ZC_SEND: zero-copy send.
+ *
+ * A compact sibling of kern_sendit that calls sosend() directly with a
+ * non-NULL `top` mbuf chain and uio == NULL. Per the FreeBSD sosend(9)
+ * contract, when uio is NULL sosend takes resid from top->m_pkthdr.len
+ * and skips the m_uiotombuf copy. No address (sendto) / control (SCM)
+ * handling here: ZC send targets the bulk data fast path. Caller MUST
+ * pass a M_PKTHDR-headed chain with pkthdr.len == sum-of-segments
+ * (see lib/ff_veth.c ff_zc_mbuf_get/write).
+ *
+ * Ownership: on success sosend adopts `top`. On error before sosend
+ * (validation/getsock/MAC) kern_zc_sendit frees `top`; once sosend is
+ * entered the protosw layer frees top on its own error paths, so we do
+ * not double-free (see 35-mbuf-lifecycle-spec INV-3).
+ */
+int
+kern_zc_sendit(struct thread *td, int s, struct mbuf *top, int flags)
+{
+	struct file *fp;
+	struct socket *so;
+	ssize_t len;
+	int error;
+
+	if (top == NULL || (top->m_flags & M_PKTHDR) == 0) {
+		if (top != NULL)
+			m_freem(top);
+		return (EINVAL);
+	}
+	len = top->m_pkthdr.len;
+	if (len < 0) {
+		m_freem(top);
+		return (EINVAL);
+	}
+
+	AUDIT_ARG_FD(s);
+	error = getsock(td, s, &cap_send_rights, &fp);
+	if (error != 0) {
+		m_freem(top);
+		return (error);
+	}
+	so = fp->f_data;
+
+#ifdef MAC
+	error = mac_socket_check_send(td->td_ucred, so);
+	if (error != 0) {
+		m_freem(top);
+		fdrop(fp, td);
+		return (error);
+	}
+#endif
+
+	error = sosend(so, NULL, NULL, top, NULL, flags, td);
+	/* top adopted by sosend on success; freed by protosw on error. */
+
+	if (error == 0) {
+		td->td_retval[0] = len;
+	} else if (error == EPIPE && (so->so_options & SO_NOSIGPIPE) == 0 &&
+	    (flags & MSG_NOSIGNAL) == 0) {
+		PROC_LOCK(td->td_proc);
+		tdsignal(td, SIGPIPE);
+		PROC_UNLOCK(td->td_proc);
+	}
+
+	fdrop(fp, td);
+	return (error);
+}
+#endif /* FSTACK_ZC_SEND */
+
 static int
 recvit(struct thread *td, int s, struct msghdr *mp, void *namelenp)
 {
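The ownership rule in the comment above (every early return frees `top`, and nothing touches `top` once `sosend` has been entered) is easy to get wrong, so a small host-side model may help when reviewing it. This is a sketch, not the kernel code: `toy_mbuf`, `toy_sosend`, `toy_zc_sendit`, and the `sock_error`/`send_error` knobs are hypothetical, and the mock assumes, as the diff's comment states, that the layer below always consumes the chain whether it succeeds or fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PKTHDR 0x1  /* stands in for M_PKTHDR */

struct toy_mbuf {
    int flags;
    int pkthdr_len;
};

static int toy_frees;   /* release counter: makes leaks and double frees visible */

void toy_reset(void) { toy_frees = 0; }
int toy_free_count(void) { return toy_frees; }

struct toy_mbuf *toy_mbuf_new(int flags, int pkthdr_len)
{
    struct toy_mbuf *m = calloc(1, sizeof(*m));
    if (m != NULL) {
        m->flags = flags;
        m->pkthdr_len = pkthdr_len;
    }
    return m;
}

static void toy_freem(struct toy_mbuf *m)
{
    assert(m != NULL);
    free(m);
    toy_frees++;
}

/* Mock of the layer below: it consumes `top` on success and on error alike. */
static int toy_sosend(struct toy_mbuf *top, int forced_error)
{
    toy_freem(top);
    return forced_error;
}

/*
 * Mirrors the control flow of kern_zc_sendit: each early return frees
 * `top`; after toy_sosend() is entered, `top` is never touched again.
 * `sock_error` replaces the getsock()/MAC result and `send_error`
 * replaces the real sosend() result (both are hypothetical knobs).
 */
int toy_zc_sendit(struct toy_mbuf *top, int sock_error, int send_error,
    long *sent)
{
    long len;
    int error;

    if (top == NULL || (top->flags & TOY_PKTHDR) == 0) {
        if (top != NULL)
            toy_freem(top);
        return EINVAL;
    }
    len = top->pkthdr_len;
    if (len < 0) {
        toy_freem(top);
        return EINVAL;
    }
    if (sock_error != 0) {
        toy_freem(top);
        return sock_error;
    }
    error = toy_sosend(top, send_error);
    if (error == 0)
        *sent = len;    /* corresponds to td_retval[0] = len */
    return error;
}
```

Driving each path once and checking that `toy_free_count()` rises by exactly one per non-NULL chain is the kind of assertion a Plan A logic test could make without pulling in kernel headers.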

freebsd/sys/mbuf.h

Lines changed: 0 additions & 15 deletions
@@ -1853,20 +1853,5 @@ mbuf_has_tls_session(struct mbuf *m)
 	return (false);
 }

-#ifdef FSTACK_ZC_SEND
-/*
- * M8: sentinel placed in uio->uio_offset by ff_zc_send to opt-in to
- * the FSTACK_ZC_SEND fast path in m_uiotombuf (uipc_mbuf.c). Plain
- * ff_write/ff_writev callers leave uio_offset = 0 and therefore
- * never trigger the fast path, avoiding mis-interpretation of plain
- * char buffers as mbuf pointers (which previously crashed in
- * m_demote with rbx="HTTP/1.1...", see phase2-M8 RCA).
- *
- * Value chosen as a 64-bit non-zero pattern unlikely to collide
- * with any legitimate file offset, and recognizable in coredumps.
- */
-#define FSTACK_ZC_MAGIC ((off_t)0xF8AC2C00F8AC2C00LL)
-#endif
-
 #endif /* _KERNEL */
 #endif /* !_SYS_MBUF_H_ */

freebsd/sys/syscallsubr.h

Lines changed: 8 additions & 0 deletions
@@ -308,6 +308,14 @@ int kern_recvit(struct thread *td, int s, struct msghdr *mp,
 int kern_zc_recvit(struct thread *td, int s, struct uio *uio,
 	    struct mbuf **mp0);
 #endif
+#ifdef FSTACK_ZC_SEND
+/* FSTACK_ZC_SEND: zero-copy send variant — hands a pre-built mbuf chain
+ * (top) directly to sosend(uio=NULL, top=chain), avoiding the m_uiotombuf
+ * copy. Caller relinquishes top ownership on success; on error
+ * kern_zc_sendit frees top via m_freem (see 35-mbuf-lifecycle-spec). */
+int kern_zc_sendit(struct thread *td, int s, struct mbuf *top,
+	    int flags);
+#endif
 int kern_renameat(struct thread *td, int oldfd, const char *old, int newfd,
 	    const char *new, enum uio_seg pathseg);
 int kern_frmdirat(struct thread *td, int dfd, const char *path, int fd,

lib/ff_api.h

Lines changed: 6 additions & 3 deletions
@@ -435,9 +435,12 @@ void ff_zc_recv_free(struct ff_zc_mbuf *zm);
 #endif /* FSTACK_ZC_RECV */

 /*
- * M8: zero-copy send entry. Caller must pass the mbuf chain
- * obtained from ff_zc_mbuf_get + ff_zc_mbuf_write as `mb`. The
- * returned bytes match `nbytes` on success, -1 on error (errno set).
+ * Zero-copy send entry. Caller must pass the mbuf chain obtained from
+ * ff_zc_mbuf_get + ff_zc_mbuf_write as `mb`. Returns the sent byte
+ * count on success, -1 on error (errno set). Internally calls
+ * kern_zc_sendit -> sosend(uio=NULL, top=chain), the FreeBSD-native
+ * zero-copy send path. On success the kernel adopts the chain; on
+ * error the kernel frees it. Reuse requires another ff_zc_mbuf_get.
  *
  * Plain ff_write() / ff_writev() / ff_send() / ff_sendto() must NOT
  * be used to send a zero-copy mbuf chain — they take char buffers
