libbpf-tools: Fix ringbuf leaks that leak prior events to userspace by vdasu · Pull Request #5506 · iovisor/bcc

vdasu · 2026-05-01T15:55:33Z

Five libbpf-tools (mountsnoop, opensnoop, filelife, oomkill, and tcppktlat) call reserve_buf() to obtain an event buffer, populate a subset of the event's fields, and emit the full sizeof(*eventp) bytes via submit_buf(). Neither path of reserve_buf() (bpf_ringbuf_reserve() or the per-CPU array heap fallback) zero-initializes the returned memory. Any bytes the source-level path leaves unwritten retain whatever the slot held previously. This can leak previously emitted record content back to userspace on subsequent events.

The mountsnoop leak was tested on Linux 6.8. Since struct event contains a tagged union with one active arm per syscall operation, and only the arm matching argp->op is written, the inactive union bytes contained prior MOUNT records' src, dest, fs, and data paths.

A natural approach to zero the memory would be to use __builtin_memset(eventp, 0, sizeof(*eventp)). This does not seem to work reliably for larger BPF event structs: once the size exceeds LLVM's inline-store budget, LLVM may lower the memset into a libcall. BPF programs cannot link against libc, so the BPF backend has no memset symbol to resolve and compilation fails. The added zero_buf() helper avoids this by writing the byte loop explicitly in a __noinline subprogram, so it lands as a single BPF-to-BPF call shared across sites, and by using volatile writes so LLVM's loop-idiom recognition does not re-lower it back into __builtin_memset.

Copilot

Pull request overview

This PR mitigates information leaks in several libbpf-tools eBPF programs by ensuring reserved event buffers are explicitly zero-initialized before only a subset of fields are populated and the full event struct is emitted to userspace.

Changes:

Add a shared zero_buf() helper in compat.bpf.h to reliably zero event buffers without relying on __builtin_memset codegen.
Call zero_buf() immediately after reserve_buf() in mountsnoop, opensnoop, filelife, oomkill, and tcppktlat BPF programs.
Ensure that inactive/unused bytes in event structs (e.g., union arms) don’t retain prior record contents.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
libbpf-tools/compat.bpf.h	Adds `zero_buf()` helper to clear reserved event buffers safely for BPF.
libbpf-tools/mountsnoop.bpf.c	Zero-initializes `struct event` after reserving buffer to avoid union residue leaks.
libbpf-tools/opensnoop.bpf.c	Zero-initializes `struct event` after reserving buffer before populating fields.
libbpf-tools/filelife.bpf.c	Zero-initializes `struct event` after reserving buffer prior to filling output.
libbpf-tools/oomkill.bpf.c	Zero-initializes `struct data_t` after reserving buffer prior to filling output.
libbpf-tools/tcppktlat.bpf.c	Zero-initializes `struct event` after reserving buffer before populating fields.

vdasu · 2026-05-04T19:51:46Z

+ * recognition, which would otherwise re-lower this back to __builtin_memset
+ * (which the BPF backend cannot inline for buffers larger than ~256 bytes). */
+static __noinline void zero_buf(void *p, __u64 sz)
+{
+	for (__u64 i = 0; i < sz; i++)
+		*(volatile char *)((char *)p + i) = 0;


@ekyooo I can commit this optimization if you think it is necessary.

ekyooo · 2026-05-18T10:55:14Z

 	eventp = reserve_buf(sizeof(*eventp));
 	if (!eventp)
 		goto cleanup;
+	zero_buf(eventp, sizeof(*eventp));


For mountsnoop, userspace already accesses only the active union arm via switch(e->op), so stale bytes in inactive arms are never read. Is "previously emitted record leak" the right framing for this case?

Would it be simpler to check the return value of bpf_probe_read_user_str and skip submit_buf on failure, rather than zeroing the whole buffer? Any case where a partial event is still useful?

Stale bytes are never read in the current userspace program. However, the ringbuf does contain stale bytes from prior events that are accessible from userspace. Any other consumer of the same ringbuf (e.g., a custom loader) sees the full record submitted to userspace, not just the bytes in the active arm.

I confirmed the leak empirically by reading the ringbuf directly (i.e., not through mountsnoop.c) on Linux 6.8. When you first reserve ringbuf memory (bpf_ringbuf_reserve(size)), the kernel returns a zeroed-out chunk of memory. However, since the ringbuf is 256 KB and each event is 8768 bytes, the buffer holds about ~29 events before the write position wraps. From event 30 onwards, each write lands on cycled slots, and the inactive-arm bytes contain fully decodable previous records. Across 200 events under a mount -t tmpfs / umount loop, 11,396 bytes of non-zero residue leaked, including PIDs, command names, and mount paths like /run/user/<UID>/credentials/mnt-test.mount, which reveal which user triggered earlier mount operations.

Regarding checking the return value of bpf_probe_read_user_str, that is good practice, but it's orthogonal to this leak. When the active arm is smaller than mount (e.g., FSMOUNT writes 12 bytes, UMOUNT writes 4104 bytes), the trailing bytes of the union (8720 - active arm size) are never targeted by any helper. Even if every bpf_probe_read_user_str succeeds and we never skip submit_buf, those bytes remain unwritten.

Partial events might be useful, since they are better than silently dropping the event. However, I believe that is a separate question since it would not fix the leak.

I confirmed the leak empirically by reading the ringbuf directly (i.e., not through mountsnoop.c) on Linux 6.8. When you first reserve ringbuf memory (bpf_ringbuf_reserve(size)), the kernel returns a zeroed-out chunk of memory. However, since the ringbuf is 256 KB and each event is 8768 bytes, the buffer holds about ~29 events before the write position wraps. From event 30 onwards, each write lands on cycled slots, and the inactive-arm bytes contain fully decodable previous records. Across 200 events under a mount -t tmpfs / umount loop, 11,396 bytes of non-zero residue leaked, including PIDs, command names, and mount paths like /run/user//credentials/mnt-test.mount, which reveal which user triggered earlier mount operations.
Regarding checking the return value of bpf_probe_read_user_str, that is good practice, but it's orthogonal to this leak. When the active arm is smaller than mount (e.g., FSMOUNT writes 12 bytes, UMOUNT writes 4104 bytes), the trailing bytes of the union (8720 - active arm size) are never targeted by any helper.

Your observation is valid, and this seems to be an expected result of mountsnoop’s design.
Can you clarify the exact issue you are trying to solve?

Previously emitted records were already delivered to userspace. In that sense, is this really a leak in the security sense?

When using mountsnoop.c, inactive union fields are not read, so there is no practical issue. Do you have another real consumer besides mountsnoop that is affected?

ekyooo · 2026-05-18T10:57:59Z

+ * subprogram shared across call sites; volatile defeats LLVM's loop-idiom
+ * recognition, which would otherwise re-lower this back to __builtin_memset
+ * (which the BPF backend cannot inline for buffers larger than ~256 bytes). */
+static __noinline void zero_buf(void *p, __u64 sz)


Two concerns with zero_buf():

opensnoop is a high-frequency probe and struct event is ~8 KB — per-event zeroing may add noticeable overhead?

The variable-bounded loop requires Linux 5.3+. Kernels below that would reject the program at load time?

Sorry, I missed this.

Yes, I zeroing out will add some overhead. Copilot's suggestion to optimize the for loop should speed things up significantly. Would you like me to run some tests to see the ovehead zero_buf adds? Also, another way to speed things up would be to zero out only the hole in the struct. If we do this, we have will have optimized ringbuf zero functions for each eBPF program that leaks data.

This should not be a problem for libbpf-tools since it requires CO-RE which requires kernel version 5.4+. Also, at this point in time, it looks like most libbpf-tools programs require version 5.5+ (This issue (Allow libbpf-tools to run on old kernels #4231) seems to address the backwards compatibility problem, but at this point in time, most programs are still not backward compatible).

Fix ringbuf leaks

1144f94

vdasu requested review from brendangregg, chenhengqi, ekyooo and yonghong-song as code owners May 1, 2026 15:55

ekyooo requested a review from Copilot May 4, 2026 02:16

Copilot started reviewing on behalf of ekyooo May 4, 2026 02:17 View session

Copilot AI reviewed May 4, 2026

View reviewed changes

ekyooo reviewed May 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

libbpf-tools: Fix ringbuf leaks that leak prior events to userspace#5506

libbpf-tools: Fix ringbuf leaks that leak prior events to userspace#5506
vdasu wants to merge 1 commit into
iovisor:masterfrom
vdasu:ringbuf_leaks

vdasu commented May 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

vdasu May 4, 2026

Uh oh!

ekyooo May 18, 2026

Uh oh!

vdasu May 28, 2026

Uh oh!

ekyooo Jun 5, 2026

Uh oh!

ekyooo May 18, 2026

Uh oh!

vdasu Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

vdasu commented May 1, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

vdasu May 4, 2026

Choose a reason for hiding this comment

Uh oh!

ekyooo May 18, 2026

Choose a reason for hiding this comment

Uh oh!

vdasu May 28, 2026

Choose a reason for hiding this comment

Uh oh!

ekyooo Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

ekyooo May 18, 2026

Choose a reason for hiding this comment

Uh oh!

vdasu Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants