Follow-up to #114 (audit of fd-translation bugs across grates).
Motivation
Every Rust grate that maintains its own fdtables entries has been duplicating the same fd-translation boilerplate — and getting it wrong. The IPC grate work on fix-ipc-grate-fd-translation wired ~30 syscall handlers just to translate the user-supplied grate-vfd to the runtime-vfd before forwarding. The audit found that 5 of 6 grates that allocate their own fdtable entries are broken in the same way.
The boilerplate should live in grate-rs (or a small companion crate), with sane defaults and per-syscall override.
Proposed design
Builder API
GrateBuilder::with_fdtable()                    // opts into defaults
    .fdkind(IPC_PIPE)
        .on_close(ipc_pipe_close_handler)       // wraps fdtables::register_close_handlers
    .fdkind(IPC_SOCKET)
        .on_close(ipc_socket_close_handler)
    .register(SYS_PIPE, ipc_pipe_handler)       // overrides default (last-write-wins)
    .register(SYS_READ, ipc_read_handler)
    .register(SYS_WRITE, ipc_write_handler)
    // ... only the syscalls the grate actually wants to customize
    .run(argv);
GrateBuilder::new() stays minimal so trivial grates (geteuid, strace) don't pay for handlers they don't need.
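To make the last-write-wins override rule concrete, here is a minimal sketch of how the builder could store handlers. It is not the grate-rs implementation: the `Handler` signature, the `dispatch` method, and the stub handler bodies are assumptions made for illustration, and only two defaults are registered to keep it short.

```rust
use std::collections::HashMap;

// Hypothetical handler signature; the real grate-rs type may differ.
type Handler = fn(u64, [u64; 6]) -> i32;

// Syscall numbers as on x86-64 Linux, used here only as map keys.
const SYS_READ: u64 = 0;
const SYS_WRITE: u64 = 1;

// Stand-in for the "translate arg1, then forward" default.
fn default_fd1_handler(_cage: u64, _args: [u64; 6]) -> i32 {
    0
}

// Stand-in for a grate-specific override.
fn ipc_read_handler(_cage: u64, _args: [u64; 6]) -> i32 {
    7
}

struct GrateBuilder {
    handlers: HashMap<u64, Handler>,
}

impl GrateBuilder {
    // Minimal builder: no handlers registered.
    fn new() -> Self {
        GrateBuilder { handlers: HashMap::new() }
    }

    // Opt-in builder: pre-populates the defaults.
    fn with_fdtable() -> Self {
        let mut b = Self::new();
        for sys in [SYS_READ, SYS_WRITE] {
            b.handlers.insert(sys, default_fd1_handler as Handler);
        }
        b
    }

    // Last write wins: inserting replaces any default for the same syscall.
    fn register(mut self, syscall: u64, handler: Handler) -> Self {
        self.handlers.insert(syscall, handler);
        self
    }

    // Hypothetical dispatch entry point; None means "no handler registered".
    fn dispatch(&self, syscall: u64, cage: u64, args: [u64; 6]) -> Option<i32> {
        self.handlers.get(&syscall).map(|h| h(cage, args))
    }
}
```

With this shape, `GrateBuilder::with_fdtable().register(SYS_READ, ipc_read_handler)` routes reads to the override while writes still hit the default, and `GrateBuilder::new()` carries no handlers at all.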
Defaults registered by with_fdtable()
fd-creating — forward, then register_kernel_fd (allocates fresh grate-vfd over the runtime-vfd):
SYS_OPEN, SYS_OPENAT, SYS_DUP, SYS_DUP2, SYS_DUP3
SYS_SOCKET, SYS_ACCEPT, SYS_ACCEPT4, SYS_PIPE, SYS_PIPE2, SYS_SOCKETPAIR
SYS_EPOLL_CREATE, SYS_EPOLL_CREATE1
fd-using (arg1) — translate arg1 to underfd, forward:
SYS_READ, SYS_WRITE, SYS_CLOSE, SYS_FCNTL, SYS_LSEEK, SYS_IOCTL
SYS_FSTAT, SYS_FSYNC, SYS_FDATASYNC, SYS_FTRUNCATE, SYS_FLOCK
SYS_FCHDIR, SYS_FCHMOD, SYS_GETDENTS, SYS_FSTATFS, SYS_SYNC_FILE_RANGE
SYS_PREAD, SYS_PWRITE, SYS_READV, SYS_WRITEV, SYS_PREADV, SYS_PWRITEV
SYS_BIND, SYS_LISTEN, SYS_CONNECT, SYS_SHUTDOWN
SYS_SENDTO, SYS_RECVFROM, SYS_SENDMSG, SYS_RECVMSG
SYS_SETSOCKOPT, SYS_GETSOCKOPT, SYS_GETSOCKNAME, SYS_GETPEERNAME
SYS_EPOLL_WAIT
Special-arg defaults:
SYS_MMAP — translate arg5, skip if MAP_ANON
SYS_EPOLL_CTL — translate arg1 (epfd) and arg3 (target fd)
SYS_OPENAT and other *at syscalls — translate dirfd unless it's AT_FDCWD (-100)
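The special-arg rules above can be sketched as two small functions. This is an illustration under assumptions: the `FdMap` type stands in for a per-cage fdtables lookup (the proposed helpers take a `cage` argument instead), `translate_mmap_args` is a hypothetical name, and `MAP_ANONYMOUS` uses the x86-64 Linux value.

```rust
use std::collections::HashMap;

const AT_FDCWD: i32 = -100;
const MAP_ANONYMOUS: u64 = 0x20; // x86-64 Linux value

// Stand-in for one cage's fdtable: grate-vfd -> runtime-vfd.
type FdMap = HashMap<u64, u64>;

// Translate a dirfd, passing AT_FDCWD through untouched.
// Truncating to i32 handles both sign-extended and 32-bit encodings of -100.
fn translate_dirfd(map: &FdMap, fd: u64) -> Option<u64> {
    if fd as i32 == AT_FDCWD {
        return Some(fd);
    }
    map.get(&fd).copied()
}

// mmap(addr, len, prot, flags, fd, off): flags is arg4 and fd is arg5,
// so in the zero-based array they sit at index 3 and index 4.
fn translate_mmap_args(map: &FdMap, mut args: [u64; 6]) -> Option<[u64; 6]> {
    if args[3] & MAP_ANONYMOUS == 0 {
        // File-backed mapping: the fd must translate, otherwise give up.
        args[4] = *map.get(&args[4])?;
    }
    // Anonymous mapping: fd (often -1) is left exactly as the caller passed it.
    Some(args)
}
```

Returning `None` leaves the choice of errno to the caller, which keeps the helpers usable from override handlers that want different error behaviour.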
Embedded-in-cage-memory defaults:
SYS_POLL, SYS_PPOLL — translate each fd in the pollfd[] buffer; reverse on return
SYS_SELECT — translate fds in the fd_set bitmaps; reverse-map on return; pass runtime_nfds = max(underfd) + 1
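For select, translation moves bits to different positions, so a reverse map is needed on return. The sketch below handles a single fd_set and is only an illustration: the `FdSet` layout (16 words of 64 bits), the helper names, and the choice to return `None` for an untranslatable fd are all assumptions. A real handler would run this over the read, write, and except sets and take the largest `runtime_nfds` across them.

```rust
use std::collections::HashMap;

const BITS: usize = 64;
const SETSIZE: usize = 1024;
// 1024 bits, modelled as 16 u64 words (assumed layout).
type FdSet = [u64; 16];

fn is_set(s: &FdSet, fd: usize) -> bool {
    s[fd / BITS] & (1u64 << (fd % BITS)) != 0
}

fn set_bit(s: &mut FdSet, fd: usize) {
    s[fd / BITS] |= 1u64 << (fd % BITS);
}

// Build the runtime-side set. Returns (runtime set, runtime_nfds, reverse map),
// or None if a requested grate-vfd has no usable translation.
fn translate_fd_set(
    map: &HashMap<usize, usize>,
    user: &FdSet,
    nfds: usize,
) -> Option<(FdSet, usize, HashMap<usize, usize>)> {
    let mut runtime: FdSet = [0; 16];
    let mut reverse = HashMap::new();
    let mut runtime_nfds = 0;
    for fd in 0..nfds.min(SETSIZE) {
        if is_set(user, fd) {
            let under = *map.get(&fd)?;
            if under >= SETSIZE {
                return None;
            }
            set_bit(&mut runtime, under);
            reverse.insert(under, fd);
            // runtime_nfds = max(underfd) + 1
            runtime_nfds = runtime_nfds.max(under + 1);
        }
    }
    Some((runtime, runtime_nfds, reverse))
}

// After the runtime returns, map ready runtime-vfds back to grate-vfds.
fn reverse_fd_set(reverse: &HashMap<usize, usize>, ready: &FdSet) -> FdSet {
    let mut user: FdSet = [0; 16];
    for (&under, &fd) in reverse {
        if is_set(ready, under) {
            set_bit(&mut user, fd);
        }
    }
    user
}
```

Iterating only over the reverse map on the way back means stray bits the runtime might set for fds the cage never asked about are dropped rather than leaked.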
Lifecycle defaults:
SYS_CLONE — copy_fdtable_for_cage + copy_handler_table_to_cage + re-overlay every parent entry whose fdkind ≠ FDKIND_KERNEL (so RawPOSIX populating the child first doesn't matter)
SYS_EXIT, SYS_EXIT_GROUP — remove_cage_from_fdtable so registered close handlers fire on cage exit
preexec hook — install identity 0/1/2 entries for stdio (the IPC, devnull, fdtables-test grates all duplicate this today)
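The re-overlay step in the SYS_CLONE default can be sketched as follows. This models the fdtable as a plain map and is not the fdtables API: `FdEntry`, `FdTable`, `overlay_custom_entries`, and the numeric fdkind values are hypothetical stand-ins.

```rust
use std::collections::HashMap;

const FDKIND_KERNEL: u32 = 0; // placeholder value
const IPC_PIPE: u32 = 1; // placeholder value

#[derive(Clone, Copy, Debug, PartialEq)]
struct FdEntry {
    fdkind: u32,
    underfd: u64,
}

// Stand-in for one cage's fdtable: grate-vfd -> entry.
type FdTable = HashMap<u64, FdEntry>;

// Re-apply every non-kernel parent entry on top of whatever the child holds.
// Because custom entries overwrite, it does not matter whether the runtime
// populated the child before or after the table copy.
fn overlay_custom_entries(parent: &FdTable, child: &mut FdTable) {
    for (&vfd, entry) in parent {
        if entry.fdkind != FDKIND_KERNEL {
            child.insert(vfd, *entry);
        }
    }
}
```

Kernel-kind entries are deliberately left alone, so anything the child already holds for them survives the overlay.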
Custom-fdkind handling
Default read_handler is fine for FDKIND_KERNEL (translate + forward), but if a grate has IPC_PIPE entries, the default will translate pipe_id as if it were a runtime-vfd → EBADF.
v1 approach: the default checks entry.fdkind and only translates for FDKIND_KERNEL; for any other fdkind it returns -EBADF. The grate overrides with its own handler that dispatches on kind. (Strict subset of v2; users can layer their own dispatch.)
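A minimal sketch of the v1 check, assuming a hypothetical `FdEntry` shape and placeholder fdkind values. Treating a missing entry as -EBADF is also an assumption; the proposal only specifies the behaviour for non-kernel kinds.

```rust
const FDKIND_KERNEL: u32 = 0; // placeholder value
const IPC_PIPE: u32 = 1; // placeholder value
const EBADF: i32 = 9;

// Hypothetical view of an fdtable entry.
struct FdEntry {
    fdkind: u32,
    underfd: u64,
}

// v1 default: translate only kernel-kind entries. For a custom kind, underfd
// is something like a pipe_id, so forwarding it to the runtime would be wrong.
fn default_translate(entry: Option<&FdEntry>) -> Result<u64, i32> {
    match entry {
        Some(e) if e.fdkind == FDKIND_KERNEL => Ok(e.underfd),
        _ => Err(-EBADF),
    }
}
```

A grate with custom kinds registers its own handler for the syscall and can still call a function like this for its kernel-kind fds.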
v2 sugar (follow-up): per-fdkind callbacks
.fdkind_read(IPC_PIPE, my_pipe_read)
.fdkind_write(IPC_PIPE, my_pipe_write)
Default handler dispatches on kind. Cleaner for grates with rich custom kinds.
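One possible shape for the v2 dispatch, shown for reads only. Everything here is hypothetical: the `KindRead` callback signature, the `ReadDispatch` type, and the stub read functions exist only to show kernel-kind entries taking the translate-and-forward path while custom kinds go to their registered callback.

```rust
use std::collections::HashMap;

const FDKIND_KERNEL: u32 = 0; // placeholder value
const IPC_PIPE: u32 = 1; // placeholder value
const EBADF: i32 = 9;

struct FdEntry {
    fdkind: u32,
    underfd: u64,
}

// Hypothetical per-kind callback: (underfd, len) -> result.
type KindRead = fn(u64, usize) -> i32;

// Stand-in for "translate + forward to the runtime".
fn forward_kernel_read(_underfd: u64, len: usize) -> i32 {
    len as i32
}

// Stand-in for a grate's pipe read; for IPC_PIPE the underfd is a pipe_id.
fn my_pipe_read(pipe_id: u64, _len: usize) -> i32 {
    pipe_id as i32
}

struct ReadDispatch {
    by_kind: HashMap<u32, KindRead>,
}

impl ReadDispatch {
    fn new() -> Self {
        ReadDispatch { by_kind: HashMap::new() }
    }

    fn fdkind_read(mut self, kind: u32, cb: KindRead) -> Self {
        self.by_kind.insert(kind, cb);
        self
    }

    // Default handler: dispatch on the entry's kind.
    fn read(&self, entry: &FdEntry, len: usize) -> i32 {
        if entry.fdkind == FDKIND_KERNEL {
            return forward_kernel_read(entry.underfd, len);
        }
        match self.by_kind.get(&entry.fdkind) {
            Some(cb) => cb(entry.underfd, len),
            None => -EBADF, // same answer as v1 for unregistered kinds
        }
    }
}
```

Falling back to -EBADF for unregistered kinds keeps v1 behaviour as the base case, which matches the "strict subset" framing.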
Helpers to expose publicly
So override handlers don't reimplement these:
pub fn translate_to_underfd(cage: u64, fd: u64) -> Option<u64>;
pub fn translate_dirfd(cage: u64, fd: u64) -> Option<u64>;
pub fn register_kernel_fd(cage: u64, runtime_vfd: i32, cloexec: bool, perfdinfo: u64) -> i32;
pub fn forward_with_fd1(syscall: u64, cage: u64, args: [u64; 6], arg_cages: [u64; 6]) -> i32;
pub fn forward_with_dirfd1(...);
These are already in rust-grates/ipc-grate/src/handlers.rs on the fix-ipc-grate-fd-translation branch — pull them up into lib/grate-rs/src/.
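For readers without the branch checked out, the core of a `forward_with_fd1`-style helper might look roughly like the sketch below. It is not the code from handlers.rs: the table and the forwarding function are injected as parameters so the example stands alone, the `arg_cages` parameter is omitted, and returning -EBADF on a failed lookup is an assumption.

```rust
use std::collections::HashMap;

const EBADF: i32 = 9;

// Stand-in for fdtables: (cage, grate-vfd) -> runtime-vfd.
type Table = HashMap<(u64, u64), u64>;

// Translate arg1 (args[0]) to the runtime-vfd, then hand the call to `forward`.
// `forward` stands in for whatever actually issues the syscall to the runtime.
fn forward_with_fd1<F>(table: &Table, forward: F, syscall: u64, cage: u64, mut args: [u64; 6]) -> i32
where
    F: Fn(u64, u64, [u64; 6]) -> i32,
{
    match table.get(&(cage, args[0])) {
        Some(&under) => {
            args[0] = under;
            forward(syscall, cage, args)
        }
        None => -EBADF,
    }
}
```

An override handler can then do its grate-specific work and delegate to the helper for the plain forwarding case.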
Where it lives
Extend grate-rs rather than spinning a new crate. Most users already pull grate-rs, and "if you have an fdtable, opt into this" is one builder call.
Quirks the implementation must respect
These are the patterns the IPC grate had to special-case; the audit (#114) confirms other grates kept getting them wrong:
dup2 / dup3 — never forward grate-vfd as newfd to the runtime. Pattern: SYS_DUP(old_under) to get a fresh runtime-vfd, then get_specific_virtual_fd(cage, newfd, FDKIND_KERNEL, new_runtime_vfd, …) on the grate side.
*at syscalls — AT_FDCWD (-100) must NOT be translated.
mmap — fd is in arg5, skip translation when MAP_ANON is set (fd may be -1).
epoll_ctl — translate arg1 AND arg3.
poll/ppoll/select — fds are inside the cage buffer/bitmask, not in the syscall args. Translate, forward, reverse-translate on return so the user sees their original grate-vfds.
AF_INET loopback take-over (or any "convert FDKIND_KERNEL into custom fdkind in place") — close the runtime vfd first via translate_to_underfd, then get_specific_virtual_fd overwrites.
Fork — the parent's grate-side custom-fdkind entries must be re-overlaid in the child after copy_fdtable_for_cage, because RawPOSIX may have already populated the child cage from its own fork path.
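The dup2 quirk is the easiest to get wrong, so here is a toy model of the pattern. The `Runtime` struct and the map-based table are stand-ins, not the real runtime or fdtables API. Closing the runtime-vfd previously behind `newfd` is an assumption, borrowed from the take-over rule in the list above.

```rust
use std::collections::HashMap;

const EBADF: i32 = 9;

// Toy runtime: hands out fresh runtime-vfds and records closes.
struct Runtime {
    next_fd: u64,
    closed: Vec<u64>,
}

impl Runtime {
    // Models SYS_DUP: always returns a fresh runtime-vfd.
    fn dup(&mut self, _old_under: u64) -> u64 {
        let fd = self.next_fd;
        self.next_fd += 1;
        fd
    }

    fn close(&mut self, under: u64) {
        self.closed.push(under);
    }
}

// table maps grate-vfd -> runtime-vfd for one cage.
fn dup2_handler(
    table: &mut HashMap<u64, u64>,
    rt: &mut Runtime,
    oldfd: u64,
    newfd: u64,
) -> Result<u64, i32> {
    let old_under = *table.get(&oldfd).ok_or(-EBADF)?;
    if oldfd == newfd {
        // dup2 with identical valid fds does nothing and returns newfd.
        return Ok(newfd);
    }
    // Never pass the grate-side newfd to the runtime: plain dup, then pin.
    let new_under = rt.dup(old_under);
    // Pin newfd on the grate side; release whatever it pointed at before.
    if let Some(prev_under) = table.insert(newfd, new_under) {
        rt.close(prev_under);
    }
    Ok(newfd)
}
```

The runtime only ever sees `dup` and `close` on runtime-vfds, so a grate-vfd number can never collide with an unrelated runtime descriptor.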
Validation plan
Once landed, port the at-risk grates from #114 onto the new defaults — that converts the audit punch list into a regression test:
devnull-grate — should shrink dramatically.
write-filter-grate — should shrink dramatically.
mtls-grate — replace half-translated handlers with the defaults; keep TLS-specific overrides.
imfs-grate — keep IMFS dispatch on IMFS_FDKIND, drop the host-libc stdio bypass.
resource-grate — remove identity-pinning hack; fix underfd to be the runtime-vfd.
ipc-grate — replace the ~500 lines of translation boilerplate on fix-ipc-grate-fd-translation with overrides on the defaults.
Reference
Working example of all the translation logic (just embedded in one grate today): rust-grates/ipc-grate/src/handlers.rs on the fix-ipc-grate-fd-translation branch — see forward_with_fd1, translate_to_underfd, register_kernel_fd, translate_fd1_handler! macro.