You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
runtime: update ctypes wrappers + init flow for RKLLM v1.2.3 ABI
The v1.1.4 -> v1.2.3 runtime bump (committed earlier today) made the
existing struct layout incompatible with rkllm_init. Lab device
hit "E rkllm: The n_batch must be between 1 and 100, but got 0" on
first call because RKLLMExtendParam in our Python wrapper still had
the old "base_domain_id + 112 bytes reserved" layout. v1.2.3 split
those reserved bytes into n_batch + several other required fields.
This commit ports the ctypes definitions to the v1.2.3 ABI as
documented in rkllm-runtime/Linux/librkllm_api/include/rkllm.h at
the release-v1.2.3 tag.
Struct changes:
RKLLMExtendParam: added embed_flash, enabled_cpus_num,
enabled_cpus_mask, n_batch, use_cross_attn;
reserved shrunk 112 -> 104 bytes
RKLLMParam: added n_keep between top_k and top_p
RKLLMInput: restructured — role + enable_thinking +
input_type now prefix the union (was just
input_mode)
RKLLMInferParam: added keep_history at the end
RKLLMResult: added token_id + logits + perf fields;
dropped legacy `size` (not in v1.2.3 C struct)
RKLLMMultiModalInput: added n_image, image_width, image_height
Callback signature: returns int (was void in v1.1.4). _on_token now
ends with `return 0` and traceback-guards the queue puts so the
callback never raises into the C side.
init_model flow updates:
- zero the full RKLLMParam via ctypes.memset before populating
(defensive; with new fields any uninitialized bytes could be
interpreted as garbage)
- set n_keep = -1 (runtime default — typically keeps the
system-prompt portion of KV cache when context shifts)
- set extend_param.embed_flash = 1 (lower RAM)
- set extend_param.enabled_cpus_num = 4 + enabled_cpus_mask
targeting RK3588 big cores (4-7, Cortex-A76) for best
per-token latency
- set extend_param.n_batch = 1 (single-sample; v1.2.3 rejects 0)
- set extend_param.use_cross_attn = 0
- call rkllm_set_chat_template(handle, "", "", "") AFTER init
to make the runtime pass our pre-formatted ChatML through
verbatim (we build the full envelope including <think>\n
prefix ourselves in _build_chat_prompt; without this override
the runtime would double-wrap with its built-in Qwen 3 template)
generate flow updates:
- zero RKLLMInput via memset
- set role = "user" (default for our pre-formatted ChatML path)
- set enable_thinking = False (we handle thinking-mode injection
in _build_chat_prompt; setting True would re-inject and
double-think)
- set input_type = RKLLM_INPUT_PROMPT (was input_mode in v1.1.4)
- set infer.keep_history = 0 (we manage multi-turn history
ourselves per turn; let the runtime discard its KV history
between calls)
Existing unit tests 55/55 still pass — struct field additions are
backward-compatible at the Python level since tests don't exercise
the ctypes layout directly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
0 commit comments