runtime: hold ctypes c_char_p byte buffers in Python locals to prevent GC

ehsan6sha · claude · ehsan6sha · commit dec8471504d9 · 2026-05-27T18:01:34.000-04:00
Lab-observed bug (2026-05-27): after the role-based v1.2.3 refactor,
the model emitted Chinese tokens and random text fragments. The root
cause was a ctypes lifetime issue: ctypes c_char_p does NOT own the
bytes it points at, so temporaries like role.encode('utf-8') get
garbage-collected before the C side (rkllm_run) finishes reading them.

The pre-refactor code had only ONE c_char_p assignment per call
(input_data.prompt_input), whose bytes happened to stay alive through
a reference held by the inp struct. The refactor added a second one
(inp.role = role.encode('utf-8')); after that change both temporaries
became collectable, and the C side read freed (possibly reused) memory.

Fix: hold both encoded bytes objects in NAMED Python locals so they
stay reachable for the full duration of the rkllm_run call. This
mirrors the pattern in Rockchip's reference Python example, which
wraps with ctypes.c_char_p(...) for the same reason.
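The same pattern can be factored into a helper that returns the encoded bytes alongside the struct, so the caller keeps them bound to a name for as long as the C call runs. This is a minimal sketch, not the code in rkllm_runtime.py: SketchInput, _InputData and build_input are hypothetical names, and their field layout is a simplified assumption. The real RKLLMInput layout comes from Rockchip's rkllm.h and is not reproduced here.

```python
import ctypes


# Hypothetical, simplified stand-ins for RKLLMInput and its input_data union.
# The field layout is an assumption for illustration; it is NOT the layout
# declared in Rockchip's rkllm.h.
class _InputData(ctypes.Union):
    _fields_ = [("prompt_input", ctypes.c_char_p)]


class SketchInput(ctypes.Structure):
    _fields_ = [
        ("role", ctypes.c_char_p),
        ("enable_thinking", ctypes.c_bool),
        ("input_type", ctypes.c_int),
        ("input_data", _InputData),
    ]


def build_input(role: str, prompt: str, enable_thinking: bool, input_type: int):
    """Fill a SketchInput and return it with the bytes its pointers refer to.

    The caller must keep the returned bytes tuple bound to a name until the
    C function that reads the struct has returned.
    """
    # Named locals: these bytes objects are what the c_char_p fields point at.
    role_bytes = role.encode("utf-8")
    prompt_bytes = prompt.encode("utf-8")

    inp = SketchInput()
    # Zero the struct, mirroring the memset in the diff.
    ctypes.memset(ctypes.byref(inp), 0, ctypes.sizeof(inp))
    inp.role = role_bytes
    inp.enable_thinking = enable_thinking
    inp.input_type = input_type
    inp.input_data.prompt_input = prompt_bytes

    # Hand the backing buffers to the caller so their lifetime is explicit.
    return inp, (role_bytes, prompt_bytes)


if __name__ == "__main__":
    inp, keepalive = build_input("user", "Hello", False, 0)
    # Reading a c_char_p field returns a bytes copy of the NUL-terminated string.
    print(inp.role, inp.input_data.prompt_input)  # b'user' b'Hello'
```

At a real call site this would look like `inp, keepalive = build_input(role, prompt, enable_thinking, RKLLM_INPUT_PROMPT)`, with `keepalive` left bound until rkllm_run has returned. Returning the bytes explicitly puts the lifetime requirement in the function signature instead of relying only on a comment.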

55/55 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
diff --git a/src/runtime/rkllm_runtime.py b/src/runtime/rkllm_runtime.py
@@ -470,12 +470,21 @@ def generate(
                 except queue.Empty:
                     break
 
+            # CRITICAL: ctypes c_char_p does NOT own the bytes. If the
+            # Python bytes object is garbage-collected before the C call
+            # finishes reading it, the C side reads freed memory.
+            # This caused the lab-observed gibberish-output bug 2026-05-27
+            # (model emitted "用户" + random text fragments). The
+            # encoded bytes objects MUST be held in named Python locals
+            # for the duration of the rkllm_run call.
+            role_bytes = role.encode("utf-8")
+            prompt_bytes = prompt.encode("utf-8")
             inp = RKLLMInput()
             ctypes.memset(ctypes.byref(inp), 0, ctypes.sizeof(inp))
-            inp.role = role.encode("utf-8")
+            inp.role = role_bytes
             inp.enable_thinking = enable_thinking
             inp.input_type = RKLLM_INPUT_PROMPT
-            inp.input_data.prompt_input = prompt.encode("utf-8")
+            inp.input_data.prompt_input = prompt_bytes
             infer = RKLLMInferParam()
             ctypes.memset(ctypes.byref(infer), 0, ctypes.sizeof(infer))
             infer.mode = RKLLM_INFER_GENERATE