This document should help end-users troubleshoot their eBPF programs, with a primary focus on the programs under the kernel's samples/bpf directory.
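eBPF maps use locked memory, and the default limit is very low: a typical Ubuntu system sets RLIMIT_MEMLOCK to 64 KiB (check with ulimit -l).
Your program will likely need to raise this limit with setrlimit(2), for example to RLIM_INFINITY, before creating maps.
The bpf_create_map call does not adjust the limit itself; it returns errno EPERM (Operation not
permitted) when the RLIMIT_MEMLOCK memory size limit is exceeded.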
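
A minimal sketch of how a loader program might adjust the limit before creating any maps. The helper name set_memlock_rlimit is hypothetical; setrlimit(2) and RLIMIT_MEMLOCK are the standard interfaces. It assumes the process has enough privilege for the values it requests.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: set the locked-memory limit for this process.
 * Returns 0 on success, -1 on failure (after printing the reason). */
static int set_memlock_rlimit(rlim_t soft, rlim_t hard)
{
	struct rlimit r = { .rlim_cur = soft, .rlim_max = hard };

	if (setrlimit(RLIMIT_MEMLOCK, &r) != 0) {
		fprintf(stderr, "setrlimit(RLIMIT_MEMLOCK): %s\n",
			strerror(errno));
		return -1;
	}
	return 0;
}
```

Calling set_memlock_rlimit(RLIM_INFINITY, RLIM_INFINITY) early in main() removes the limit. Raising the hard limit normally requires root privileges, so an unprivileged caller should expect this to fail.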
Are you not seeing the expected performance, with perf top showing
__bpf_prog_run() as the top CPU consumer?
Did you remember to enable JIT'ing of the BPF code? Like::

 $ sysctl net/core/bpf_jit_enable=1
 net.core.bpf_jit_enable = 1
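
A program can also check the setting itself before reporting performance numbers. The sketch below reads the value from procfs. The helper name read_bpf_jit_enable is hypothetical, and the path is passed in by the caller rather than hard-coded.

```c
#include <stdio.h>

/* Hypothetical helper: read the JIT setting from the given procfs path.
 * Returns the integer stored in the file, or -1 if it cannot be read. */
static int read_bpf_jit_enable(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

With the path /proc/sys/net/core/bpf_jit_enable, a return value of 0 means the JIT is disabled, 1 means it is enabled, and 2 means it is enabled with debug output.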
Note that the kernel implements JIT compilers for both eBPF and cBPF (Classical BPF), and support varies per architecture. You can see which cBPF and eBPF JITs the kernel currently supports via::

 $ git grep BPF_JIT | grep select
 arch/arm/Kconfig:       select HAVE_CBPF_JIT
 arch/arm64/Kconfig:     select HAVE_EBPF_JIT
 arch/mips/Kconfig:      select HAVE_CBPF_JIT if !CPU_MICROMIPS
 arch/powerpc/Kconfig:   select HAVE_CBPF_JIT if !PPC64
 arch/powerpc/Kconfig:   select HAVE_EBPF_JIT if PPC64
 arch/s390/Kconfig:      select HAVE_EBPF_JIT if PACK_STACK && HAVE_MARCH_Z196_FEATURES
 arch/sparc/Kconfig:     select HAVE_CBPF_JIT
 arch/x86/Kconfig:       select HAVE_EBPF_JIT if X86_64
The binary containing the eBPF program, which is generated by the
LLVM compiler, is a normal ELF binary. For samples/bpf/ this is the
file named xxx_kern.o. It is possible to inspect this ELF file
with tools like readelf or llvm-objdump, for example::

 $ llvm-objdump -h xdp_ddos01_blacklist_kern.o

 xdp_ddos01_blacklist_kern.o:    file format ELF64-unknown

 Sections:
 Idx Name          Size      Address          Type
   0               00000000 0000000000000000
   1 .strtab       00000072 0000000000000000
   2 .text         00000000 0000000000000000 TEXT DATA
   3 xdp_prog      000001b8 0000000000000000 TEXT DATA
   4 .relxdp_prog  00000020 0000000000000000
   5 maps          00000028 0000000000000000 DATA
   6 license       00000004 0000000000000000 DATA
   7 .symtab       000000d8 0000000000000000
Some basic information can be extracted from the above output. This
is an XDP program, as the name of the program section (Idx 3) starts
with the letters "xdp". The Size column of the same line shows the
program size in hex: 0x1b8 equals 440 bytes, or 55 BPF instructions, as
each instruction is 8 bytes (see struct bpf_insn). A shell trick for the
conversion is echo $((0x1b8)) insns=$((0x1b8 / 8)). Note that this is
not the size of the JIT'ed program.
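
The same arithmetic can be sketched in C. To keep the example buildable without kernel headers, it uses a local struct, insn_layout, that mirrors the field layout of struct bpf_insn. Both that name and the insn_count helper are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the kernel's struct bpf_insn, redefined here only so
 * the sketch builds without kernel headers (an assumption of this example). */
struct insn_layout {
	uint8_t code;        /* opcode */
	uint8_t dst_reg:4;   /* destination register */
	uint8_t src_reg:4;   /* source register */
	int16_t off;         /* signed offset */
	int32_t imm;         /* signed immediate constant */
};

/* Number of eBPF instructions in a program section of the given size. */
static size_t insn_count(size_t section_bytes)
{
	return section_bytes / sizeof(struct insn_layout);
}
```

For the xdp_prog section above, insn_count(0x1b8) evaluates to 55.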
The loader code in samples/bpf/bpf_load.c parses this ELF file and extracts the needed program sections. It uses the maps section and the relocation section (here .relxdp_prog) to rewrite the BPF_PSEUDO_MAP_FD instructions so that they point to the correct map. The maps themselves are created while the maps section is parsed, via the standard bpf-syscall wrapper bpf_create_map.
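
The relocation step can be illustrated with a simplified sketch. It is not the code from bpf_load.c: the struct map_reloc record, the apply_map_relocs helper and the local insn_layout mirror of struct bpf_insn are hypothetical names, and ELF parsing is left out entirely. The two constants copy the kernel values of BPF_LD | BPF_IMM | BPF_DW and BPF_PSEUDO_MAP_FD.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct bpf_insn, so the sketch needs no kernel headers. */
struct insn_layout {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

#define LD_IMM64_OPCODE 0x18 /* value of BPF_LD | BPF_IMM | BPF_DW */
#define PSEUDO_MAP_FD   1    /* value of BPF_PSEUDO_MAP_FD */

/* Hypothetical, already-decoded relocation: the byte offset of the
 * instruction inside the program section and the map it refers to. */
struct map_reloc {
	size_t insn_offset;
	size_t map_index;
};

/* Patch every relocated instruction so it carries the fd of its map.
 * Returns 0 on success, -1 if a relocation is out of range or does not
 * point at a 64-bit immediate load. */
static int apply_map_relocs(struct insn_layout *insns, size_t insn_cnt,
			    const struct map_reloc *relocs, size_t reloc_cnt,
			    const int *map_fds, size_t map_cnt)
{
	for (size_t i = 0; i < reloc_cnt; i++) {
		size_t idx = relocs[i].insn_offset / sizeof(struct insn_layout);

		if (idx >= insn_cnt || relocs[i].map_index >= map_cnt)
			return -1;
		if (insns[idx].code != LD_IMM64_OPCODE)
			return -1;
		insns[idx].src_reg = PSEUDO_MAP_FD;
		insns[idx].imm = map_fds[relocs[i].map_index];
	}
	return 0;
}
```

After this pass the kernel can resolve each marked instruction to the map behind the file descriptor when the program is loaded.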
.. TODO:: Document which LLVM version added this "-S" option
In newer versions of LLVM, the llvm-objdump tool supports showing
section names, asm code and the original C code, if the program was
compiled with -g::

 llvm-objdump -S prog_kern.o
.. TODO:: What does the option -no-show-raw-insn do?
Until this section is improved, see this mailing list response: https://www.spinics.net/lists/netdev/msg407045.html
To debug or inspect the generated JIT code, it is possible to change this proc sysctl, which makes the kernel dump the JIT image to the kernel log::

 sysctl net.core.bpf_jit_enable=2
The output looks like::

 flen=55 proglen=335 pass=4 image=ffffffffa0006820 from=xdp_ddos01_blac pid=13333
 JIT code: 00000000: 55 48 89 e5 48 81 ec 28 02 00 00 48 89 9d d8 fd
 JIT code: 00000010: ff ff 4c 89 ad e0 fd ff ff 4c 89 b5 e8 fd ff ff
 JIT code: 00000020: 4c 89 bd f0 fd ff ff 31 c0 48 89 85 f8 fd ff ff
 JIT code: 00000030: bb 02 00 00 00 48 8b 77 08 48 8b 7f 00 48 89 fa
 JIT code: 00000040: 48 83 c2 0e 48 39 f2 0f 87 e1 00 00 00 48 0f b6
 JIT code: 00000050: 4f 0c 48 0f b6 57 0d 48 c1 e2 08 48 09 ca 48 89
 JIT code: 00000060: d1 48 81 e1 ff 00 00 00 41 b8 06 00 00 00 49 39
 JIT code: 00000070: c8 0f 87 b7 00 00 00 48 81 fa 88 a8 00 00 74 0e
 JIT code: 00000080: b9 0e 00 00 00 48 81 fa 81 00 00 00 75 1a 48 89
 JIT code: 00000090: fa 48 83 c2 12 48 39 f2 0f 87 90 00 00 00 b9 12
 JIT code: 000000a0: 00 00 00 48 0f b7 57 10 bb 02 00 00 00 48 81 e2
 JIT code: 000000b0: ff ff 00 00 48 83 fa 08 75 49 48 01 cf 31 db 48
 JIT code: 000000c0: 89 fa 48 83 c2 14 48 39 f2 77 38 8b 7f 0c 89 7d
 JIT code: 000000d0: fc 48 89 ee 48 83 c6 fc 48 bf 00 9c 24 5f 07 88
 JIT code: 000000e0: ff ff e8 29 cd 13 e1 bb 02 00 00 00 48 83 f8 00
 JIT code: 000000f0: 74 11 48 8b 78 00 48 83 c7 01 48 89 78 00 bb 01
 JIT code: 00000100: 00 00 00 89 5d f8 48 89 ee 48 83 c6 f8 48 bf c0
 JIT code: 00000110: 76 12 13 04 88 ff ff e8 f4 cc 13 e1 48 83 f8 00
 JIT code: 00000120: 74 0c 48 8b 78 00 48 83 c7 01 48 89 78 00 48 89
 JIT code: 00000130: d8 48 8b 9d d8 fd ff ff 4c 8b ad e0 fd ff ff 4c
 JIT code: 00000140: 8b b5 e8 fd ff ff 4c 8b bd f0 fd ff ff c9 c3
The proglen field is the length in bytes of the generated opcode
sequence, and flen is the number of BPF instructions. You can use
tools/net/bpf_jit_disasm.c to disassemble that output; bpf_jit_disasm -o
dumps the related opcodes as well.
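
When collecting such dumps from the kernel log, the header line can be picked apart with a few lines of C. This is a sketch that assumes the header has the field order shown in the output above. The struct jit_dump_header type and the parse_jit_dump_header helper are hypothetical names.

```c
#include <stdio.h>

/* Fields taken from the first line of a JIT dump. */
struct jit_dump_header {
	unsigned int flen;    /* number of BPF instructions */
	unsigned int proglen; /* length of the JIT image in bytes */
	unsigned int pass;    /* pass count reported by the JIT */
};

/* Parse "flen=... proglen=... pass=..." from the start of a log line.
 * Returns 0 when all three fields were found, -1 otherwise. */
static int parse_jit_dump_header(const char *line, struct jit_dump_header *h)
{
	int n = sscanf(line, "flen=%u proglen=%u pass=%u",
		       &h->flen, &h->proglen, &h->pass);

	return n == 3 ? 0 : -1;
}
```

For the dump above this yields flen 55 and proglen 335, so the JIT emitted roughly six bytes of native code per BPF instruction.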