Commit 9e86764 ("verifier study") by lipeng hao, 1 parent bbff032

4 files changed: +776 −0 lines. This file: 388 additions, 0 deletions (new file).
# eBPF Stack Limit Bypass: Per-CPU Array in Practice

## 1. Background

### 1.1 eBPF Stack Limit

eBPF programs run in kernel space. For safety, the kernel strictly limits the stack space of an eBPF program to **512 bytes**.

```
┌─────────────────────────────────────┐
│      eBPF Program Stack Space       │
│                                     │
│   ┌─────────────────────────┐       │
│   │    Maximum 512 bytes    │       │
│   │                         │       │
│   │  Local variables, temp  │       │
│   │                         │       │
│   └─────────────────────────┘       │
│                                     │
│  Exceed limit → Verifier rejects    │
└─────────────────────────────────────┘
```

### 1.2 Why This Limit?

| Reason | Description |
|--------|-------------|
| **Limited kernel stack** | The kernel stack is typically 8KB-16KB and must be shared with the rest of the kernel code |
| **Prevent stack overflow** | A stack overflow could crash the kernel or open security vulnerabilities |
| **Predictability** | A fixed limit lets the verifier statically analyze stack usage |

### 1.3 Real-World Problems

In practice, eBPF programs often need to handle large data structures:

```c
// This will fail!
SEC("tracepoint/...")
int my_prog(void *ctx) {
    struct big_event e;     // 544 bytes
    struct extra_buffer ex; // 768 bytes
    struct local_data ld;   // 256 bytes
    // verifier rejects: total ~1568 bytes exceeds the 512B limit
}
```

Common scenarios that require large buffers:

- Process monitoring: store process name, path, and arguments
- Network analysis: store packet contents
- Security auditing: collect detailed context information
- File monitoring: store file paths and contents

## 2. Solution: Per-CPU Array

### 2.1 Core Idea

Store large data structures in BPF Maps instead of on the stack:

```
┌─────────────────────────────────────────────────────┐
│               Traditional Way (Fails)               │
├─────────────────────────────────────────────────────┤
│  Stack allocation:                                  │
│    struct big_event e;     // 544B ─┐               │
│    struct extra_buffer ex; // 768B  ├→ 1568B > 512B │
│    struct local_data ld;   // 256B ─┘               │
│                                                     │
│  Result: Verifier rejects                           │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│                Per-CPU Array (Works)                │
├─────────────────────────────────────────────────────┤
│  Map allocation:                                    │
│    __u32 key = 0;                    // 4B  ─┐      │
│    struct big_event *e = lookup(&map, &key); ├→ ~12B│
│    struct extra_buffer *ex = lookup(...);    │      │
│    struct local_data *ld = lookup(...);     ─┘      │
│                                                     │
│  Result: Stack usage < 512B, Verifier passes        │
└─────────────────────────────────────────────────────┘
```
84+
### 2.2 Why Per-CPU Array?
85+
86+
| Map Type | Concurrency Safe | Performance | Use Case |
87+
|----------|-----------------|-------------|----------|
88+
| `BPF_MAP_TYPE_ARRAY` | Needs locking | Medium | Shared data |
89+
| `BPF_MAP_TYPE_PERCPU_ARRAY` | Naturally safe | High | Temp buffers |
90+
| `BPF_MAP_TYPE_HASH` | Needs locking | Medium | Dynamic keys |
91+
92+
**Per-CPU Array advantages**:
93+
- Each CPU gets an independent buffer copy
94+
- No lock contention, no cacheline bouncing
95+
- O(1) lookup time
96+
- Perfect for temporary work buffers
97+

## 3. Implementation

### 3.1 BPF Kernel Program

```c
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "Dual BSD/GPL";

// Large struct: event data (exceeds the 512B stack limit)
struct big_event {
    __u32 pid;
    __u64 timestamp;
    char comm[16];
    char data[512];  // This field pushes the struct past 512B
};

// Per-CPU Array definition
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct big_event);
} event_buffer SEC(".maps");

// Ring Buffer: pass events to userspace
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} events SEC(".maps");

SEC("tracepoint/sched/sched_process_exec")
int trace_exec(struct trace_event_raw_sched_process_exec *ctx)
{
    struct big_event *e;
    __u32 key = 0;

    // Get the buffer from the Per-CPU Array
    e = bpf_map_lookup_elem(&event_buffer, &key);
    if (!e)
        return 0;

    // Fill event data
    e->pid = bpf_get_current_pid_tgid() >> 32;
    e->timestamp = bpf_ktime_get_ns();
    bpf_get_current_comm(e->comm, sizeof(e->comm));

    // Fill the data field
    e->data[0] = e->pid & 0xFF;
    e->data[1] = (e->timestamp >> 8) & 0xFF;
    e->data[2] = (e->pid >> 16) & 0xFF;

    // Send to the Ring Buffer
    bpf_ringbuf_output(&events, e, sizeof(*e), 0);

    return 0;
}
```
158+
159+
### 3.2 Key Code Analysis
160+
161+
#### Per-CPU Array Definition
162+
163+
```c
164+
struct {
165+
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
166+
__uint(max_entries, 1); // Only need 1 slot
167+
__type(key, __u32);
168+
__type(value, struct big_event);
169+
} event_buffer SEC(".maps");
170+
```
171+
172+
- `max_entries = 1`: As a temp buffer, only need one element
173+
- Each CPU automatically gets independent copy
174+
175+
#### Getting the Buffer
176+
177+
```c
178+
__u32 key = 0;
179+
struct big_event *e = bpf_map_lookup_elem(&event_buffer, &key);
180+
if (!e)
181+
return 0; // Must check for NULL
182+
```
183+
184+
- Always use `key = 0`
185+
- Returns pointer to current CPU's dedicated buffer
186+
- **Must** check for NULL, or verifier rejects
187+

### 3.3 Userspace Program

```c
#include <stdio.h>
#include <signal.h>
#include <bpf/libbpf.h>
#include "stack_limit_bypass.skel.h"

// Must match the struct layout on the BPF side
struct big_event {
    __u32 pid;
    __u64 timestamp;
    char comm[16];
    char data[512];
};

static volatile sig_atomic_t exiting = 0;

static void sig_handler(int sig) { exiting = 1; }

static int handle_event(void *ctx, void *data, size_t data_sz)
{
    struct big_event *e = data;
    printf("[%llu.%03llu] PID: %-6u | comm: %-16s | data[0-2]: 0x%02x 0x%02x 0x%02x\n",
           e->timestamp / 1000000000ULL, (e->timestamp / 1000000ULL) % 1000,
           e->pid, e->comm,
           (unsigned char)e->data[0], (unsigned char)e->data[1],
           (unsigned char)e->data[2]);
    return 0;
}

int main(int argc, char **argv)
{
    struct stack_limit_bypass_bpf *skel;
    struct ring_buffer *rb = NULL;
    int err;

    signal(SIGINT, sig_handler);
    signal(SIGTERM, sig_handler);

    // Load BPF program
    skel = stack_limit_bypass_bpf__open_and_load();
    if (!skel) {
        fprintf(stderr, "Failed to load BPF program\n");
        return 1;
    }

    // Attach to the tracepoint
    err = stack_limit_bypass_bpf__attach(skel);
    if (err) {
        fprintf(stderr, "Failed to attach BPF program\n");
        goto cleanup;
    }

    // Create the Ring Buffer
    rb = ring_buffer__new(bpf_map__fd(skel->maps.events),
                          handle_event, NULL, NULL);
    if (!rb) {
        fprintf(stderr, "Failed to create ring buffer\n");
        goto cleanup;
    }

    printf("Monitoring process exec events... (Ctrl+C to exit)\n");

    while (!exiting) {
        ring_buffer__poll(rb, 100);
    }

cleanup:
    ring_buffer__free(rb);
    stack_limit_bypass_bpf__destroy(skel);
    return 0;
}
```

## 4. Build and Run

### 4.1 Normal Build

```bash
cd src/19-bypass-stack-limit
make clean && make
sudo ./stack_limit_bypass
```

Expected output:

```
========================================
Per-CPU Array Demo - Bypass eBPF 512B Stack Limit
========================================
Struct sizes:
  - big_event: 544 bytes
  - Total stack: ~1568 bytes (if using local variables)
  - eBPF limit: 512 bytes
========================================
Monitoring process exec events... (Ctrl+C to exit)

[12345.678] PID: 1234   | comm: bash             | data[0-2]: 0x12 0x34 0x56
```

### 4.2 Trigger Stack Limit Error (Demo)

Set `BAD_EXAMPLE_STACK` to 1 in the code, or use a compile flag:

```bash
make clean && make EXTRA_CFLAGS="-DBAD_EXAMPLE_STACK=1"
sudo ./stack_limit_bypass
```

Expected output:

```
libbpf: prog 'trace_exec': BPF program is too large
libbpf: prog 'trace_exec': -- BEGIN PROG LOAD LOG --
...
combined stack size of 1568 exceeds limit 512
...
Failed to load BPF program
```
295+
## 5. Stack Usage Analysis
296+
297+
### 5.1 Struct Sizes
298+
299+
| Struct | Size | Description |
300+
|--------|------|-------------|
301+
| `big_event` | ~544 bytes | pid(4) + timestamp(8) + comm(16) + data(512) + padding |
302+

### 5.2 Stack Usage Comparison

| Method | Stack Usage | Result |
|--------|-------------|--------|
| Stack allocation `struct big_event e;` | 544+ bytes | Verifier rejects |
| Per-CPU Array pointer | ~12 bytes | Verifier passes |

## 6. Preventing Compiler Optimization

When demonstrating the error case, prevent the compiler from optimizing away unused stack variables:

### 6.1 Memory Barrier

```c
#define barrier() asm volatile("" ::: "memory")

struct big_event stack_event = {};
barrier();  // Tell the compiler memory may be modified; don't optimize away
```

### 6.2 Explicitly Use Variables

```c
// Ensure the variables are actually used
e->data[0] = pid & 0xFF;
e->data[100] = (pid >> 8) & 0xFF;
e->data[200] = (pid >> 16) & 0xFF;
```
332+
## 7. Best Practices
333+
334+
### 7.1 When to Use Per-CPU Array
335+
336+
| Scenario | Recommendation |
337+
|----------|----------------|
338+
| Temporary work buffers | Highly recommended |
339+
| Event data collection | Recommended |
340+
| Large string handling | Recommended |
341+
| Cross-CPU sharing needed | Not suitable, use regular Array |
342+
343+
### 7.2 Usage Tips
344+
345+
1. **Fixed key = 0**: Only need one slot for buffer
346+
2. **Must check NULL**: `bpf_map_lookup_elem` may return NULL
347+
3. **Clear before reuse**: Consider zeroing buffer to avoid stale data
348+
4. **Mind the size**: Single Per-CPU Array element also has size limits
349+

### 7.3 Common Mistakes

```c
// Wrong: forgot the NULL check
e = bpf_map_lookup_elem(&buffer, &key);
e->pid = 123;  // Verifier rejects!

// Correct: always check
e = bpf_map_lookup_elem(&buffer, &key);
if (!e)
    return 0;
e->pid = 123;  // OK
```
363+
## 8. Kernel Version Compatibility
364+
365+
| Kernel Version | Stack Limit Behavior |
366+
|----------------|---------------------|
367+
| < 5.x | Strict 512 byte limit |
368+
| 5.x+ | Supports BPF-to-BPF calls, 512B per function frame |
369+
| 6.x+ | Smarter verifier, but basic limit remains |
370+
371+
The Per-CPU Array solution works on all kernel versions that support eBPF.
372+
373+
## 9. Summary
374+
375+
This lesson covered the eBPF 512-byte stack limit and its solution:
376+
377+
1. **Problem**: eBPF program stack is limited to 512 bytes
378+
2. **Impact**: Cannot allocate large data structures on stack
379+
3. **Solution**: Use Per-CPU Array as temporary buffer
380+
4. **Benefits**: Concurrency safe, high performance, lock-free
381+
382+
With this technique, you can freely handle large data structures in eBPF programs without stack limit constraints.
383+
384+
## 10. References
385+
386+
- [BPF Design Q&A - Stack Space](https://docs.kernel.org/bpf/bpf_design_QA.html)
387+
- [Per-CPU Variables](https://lwn.net/Articles/258238/)
388+
- [libbpf Documentation](https://libbpf.readthedocs.io/)