| 1 | +# llvm-nanobind Issues & Gotchas |
| 2 | + |
| 3 | +Issues encountered while porting Pluto obfuscation passes to Python using llvm-nanobind. |
| 4 | + |
| 5 | +## API Surprises |
| 6 | + |
| 7 | +### `ctx.types.ptr` is a property, not a method |
| 8 | +```python |
| 9 | +# WRONG — TypeError: 'llvm.Type' object is not callable |
| 10 | +ptr_ty = ctx.types.ptr() |
| 11 | + |
| 12 | +# CORRECT |
| 13 | +ptr_ty = ctx.types.ptr |
| 14 | +``` |
| 15 | +Other type accessors like `ctx.types.i32`, `ctx.types.void` are also properties. |
| 16 | +Only `ctx.types.function(ret, args)` and `ctx.types.array(elem, count)` are methods. |
| 17 | + |
| 18 | +### `ctx.create_module()` returns `ModuleManager`, not `Module` |
| 19 | +Use it as a context manager to get the actual `Module`: |
| 20 | +```python |
| 21 | +# WRONG — returns ModuleManager, has no add_function/add_global/etc. |
| 22 | +mod = ctx.create_module("test") |
| 23 | + |
| 24 | +# CORRECT |
| 25 | +with ctx.create_module("test") as mod: |
| 26 | + mod.add_function(...) |
| 27 | +``` |
| 28 | + |
| 29 | +### `mod.target_triple`, not `mod.triple` |
| 30 | +```python |
| 31 | +# WRONG |
| 32 | +mod.triple = "x86_64-pc-windows-msvc" |
| 33 | + |
| 34 | +# CORRECT |
| 35 | +mod.target_triple = "x86_64-pc-windows-msvc" |
| 36 | +``` |
| 37 | + |
| 38 | +### `llvm.create_target_machine()` is a module-level function |
| 39 | +```python |
| 40 | +# WRONG — Target has no create_target_machine method |
| 41 | +tm = target.create_target_machine(triple, cpu, features) |
| 42 | + |
| 43 | +# CORRECT |
| 44 | +tm = llvm.create_target_machine(target, triple, cpu, features) |
| 45 | +``` |
| 46 | + |
| 47 | +### `inst.block` for parent block, not `inst.parent` |
| 48 | +```python |
| 49 | +bb = inst.block # correct |
| 50 | +``` |
| 51 | + |
| 52 | +### `gv.global_value_type` for content type |
| 53 | +`gv.type` returns the pointer type. Use `gv.global_value_type` for the actual stored type. |
| 54 | + |
| 55 | +## Call Instruction Limitations |
| 56 | + |
| 57 | +### No setter for call callee |
| 58 | +There is no `set_called_operand()` or `set_callee()` on call instructions. |
| 59 | +`called_value` is read-only. To change a call's target, rebuild the call: |
| 60 | + |
| 61 | +```python |
| 62 | +# Build new indirect call and replace the old one |
| 63 | +with bb.create_builder() as builder: |
| 64 | + builder.position_before(call_inst) |
| 65 | + loaded = builder.load(ptr_ty, gv, "fn.ptr") |
| 66 | + new_call = builder.call(func_ty, loaded, args, "result") |
| 67 | +call_inst.replace_all_uses_with(new_call) |
| 68 | +call_inst.erase_from_parent() |
| 69 | +``` |
| 70 | + |
| 71 | +### Two overloads for `builder.call()` |
| 72 | +```python |
| 73 | +# Direct call (infers function type from Function object) |
| 74 | +builder.call(func, args, name) |
| 75 | + |
| 76 | +# Indirect call (explicit function type, callee can be any value/pointer) |
| 77 | +builder.call(func_ty, loaded_ptr, args, name) |
| 78 | +``` |
| 79 | +Passing a loaded pointer to the direct form (the one without an explicit function type) raises `LLVMAssertionError`. |
| 80 | + |
| 81 | +## Segfaults & Crashes |
| 82 | + |
| 83 | +### ConstantDataArray element access crashes |
| 84 | +Accessing elements of array initializers via `init.get_operand(i)` on arrays created |
| 85 | +with `const_array` causes a segfault. No workaround found — we removed array encryption |
| 86 | +from the GlobalEncryption pass entirely. |
| 87 | + |
| 88 | +### `func.dll_storage_class` required for Windows DLL exports |
| 89 | +Functions emitted to object files for Windows DLLs must have: |
| 90 | +```python |
| 91 | +func.dll_storage_class = llvm.DLLExport |
| 92 | +``` |
| 93 | +Otherwise the symbol won't be exported and `ctypes.CDLL` can't find it. |
| 94 | + |
| 95 | +## Missing APIs |
| 96 | + |
| 97 | +### No `splitBasicBlock()` |
| 98 | +Cannot split a basic block at an arbitrary instruction. The Bogus Control Flow pass |
| 99 | +had to be redesigned to work without block splitting — it inserts opaque predicates |
| 100 | +before existing terminators instead of cloning block contents. |
| 101 | + |
| 102 | +## Initialization |
| 103 | + |
| 104 | +### Must initialize ASM printers for `emit_to_file()` |
| 105 | +```python |
| 106 | +llvm.initialize_all_targets() |
| 107 | +llvm.initialize_all_target_mcs() |
| 108 | +llvm.initialize_all_target_infos() |
| 109 | +llvm.initialize_all_asm_printers() # without this: "can't emit a file of this type" |
| 110 | +``` |
| 111 | + |
| 112 | +## Integer Constant Overflow |
| 113 | + |
| 114 | +### Constants must fit the type's bit width |
| 115 | +`vtype.constant(value)` raises `TypeError` if `value` exceeds the type's range. |
| 116 | +When generating random keys for XOR encryption, mask to the bit width: |
| 117 | +```python |
| 118 | +bit_width = vtype.int_width |
| 119 | +mask = (1 << bit_width) - 1 |
| 120 | +key = rng.get_uint64() & mask |
| 121 | +``` |
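A minimal sketch of the masking step in plain Python, with no llvm-nanobind dependency. `random_key` is a hypothetical helper, and `random.Random.getrandbits(64)` stands in for the `rng.get_uint64()` call above. It only illustrates that the masked key always lies in `[0, 2**bit_width)`.

```python
import random


def random_key(bit_width: int, rng: random.Random) -> int:
    """Return a random XOR key that fits an unsigned integer of bit_width bits."""
    if bit_width <= 0:
        raise ValueError("bit_width must be positive")
    mask = (1 << bit_width) - 1
    # getrandbits(64) plays the role of rng.get_uint64() in the snippet above
    # (an assumption made so the sketch runs without the bindings).
    return rng.getrandbits(64) & mask


rng = random.Random(0)
# One key per common integer width; each is guaranteed to fit its width.
keys = {width: random_key(width, rng) for width in (1, 8, 16, 32, 64)}
```

With the real bindings, the masked value would then be passed to `vtype.constant(key)`, using `vtype.int_width` as the bit width.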
| 122 | + |
| 123 | +## PHI Node Maintenance |
| 124 | + |
| 125 | +### New predecessors require PHI incoming entries |
| 126 | +When adding a new block that branches to an existing block containing PHI nodes, |
| 127 | +you must add incoming values for the new predecessor: |
| 128 | +```python |
| 129 | +for inst in target_bb.instructions: |
| 130 | + if inst.opcode == llvm.Opcode.PHI: |
| 131 | + inst.add_incoming(inst.type.undef(), new_bb) |
| 132 | + else: |
| 133 | + break |
| 134 | +``` |
| 135 | +Without this, the module verifier fails with: |
| 136 | +`PHINode should have one entry for each predecessor of its parent basic block!` |
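The loop above can be wrapped in a small reusable helper. This is a sketch under assumptions: `add_undef_incoming` is a hypothetical name, and the PHI opcode is passed in as a parameter (`llvm.Opcode.PHI` in practice), so the function relies only on the attributes already used above (`instructions`, `opcode`, `type.undef()`, `add_incoming`).

```python
def add_undef_incoming(target_bb, new_bb, phi_opcode):
    """Add an undef incoming entry for new_bb to every leading PHI in target_bb.

    Returns the number of PHI nodes that were patched.
    """
    patched = 0
    for inst in target_bb.instructions:
        if inst.opcode != phi_opcode:
            # PHI nodes are grouped at the start of a block, so stop at the
            # first non-PHI instruction.
            break
        inst.add_incoming(inst.type.undef(), new_bb)
        patched += 1
    return patched
```

It would be called as `add_undef_incoming(target_bb, new_bb, llvm.Opcode.PHI)`. An `undef` incoming value is only appropriate when the PHI result is never meaningfully used along the new edge. Otherwise, supply the real value for that predecessor.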