
Commit cdcc915

Fix C++ -Werror regressions in llama runner (#19326)
Summary:

Fixes three `-Werror` diagnostics that broke the qualcomm llama runner build on `cfg:android-arm64-clang19-no-san` and disabled the following test infra targets:

- `xplat/executorch/examples/qualcomm/oss_scripts/llama:runner_lib`
- `xplat/executorch/examples/qualcomm/oss_scripts/llama:runner_lib_static`
- `xplat/executorch/examples/qualcomm/oss_scripts/llama:qnn_llama_runner`
- `xplat/executorch/examples/qualcomm/oss_scripts/llama:qnn_llama_runner_static`

Three diagnostics fixed:

1. `-Wreorder-ctor` in `runner.cpp`: `attention_sink_rope_module_` is declared as the 2nd field of `Runner<T>` (right after `module_`), but the constructor initializer list appended it last, after `tokenizer_`. Since C++ always constructs members in declaration order regardless of how the initializer list is written, the out-of-order list was misleading; moved the entry to the correct position so the list matches declaration order. Recent regression introduced in the attention-sink diff (#16574).

2. `-Woverloaded-virtual` in `lhd_token_generator.h` and `multimodal_lhd_token_generator.h`: the derived classes define a `prepare_io(std::vector<uint64_t>, std::vector<int32_t>)` overload that hides the base class virtual `prepare_io(uint64_t, int64_t)`. Added a `using TokenGenerator<T>::prepare_io;` declaration (and the equivalent for the multimodal hierarchy) so the base virtual stays in scope and the warning is silenced without changing behavior. Latent bug surfaced by the clang19 toolchain bump.

3. `-Wdelete-non-abstract-non-virtual-dtor` in `prompt_processor.h`: `PromptProcessor<T>` has virtual member functions but no virtual destructor, so deleting through `std::unique_ptr<PromptProcessor<T>>` in `Runner` was undefined behavior; the warning flags exactly this pattern. Added `virtual ~PromptProcessor() = default;`, mirroring the pattern already used in `TokenGenerator` (`token_generator.h`). This also transitively fixes `MultimodalPromptProcessor<T>`.

Reviewed By: rascani

Differential Revision: D103991803
1 parent 3ffaf27

4 files changed

Lines changed: 10 additions & 2 deletions

File tree

examples/qualcomm/oss_scripts/llama/runner/lhd_token_generator.h

Lines changed: 3 additions & 0 deletions
@@ -102,6 +102,9 @@ class LhdTokenGenerator : public TokenGenerator<T> {
       AttentionSinkRopeRunner* attention_sink_rope_runner) override;

  private:
+  // Bring base class's virtual prepare_io into scope so the overload below
+  // does not hide it (-Woverloaded-virtual).
+  using TokenGenerator<T>::prepare_io;
   /**
    * @brief Fill in I/O buffers with prompt token and position.
    * @param cur_token Current token.

examples/qualcomm/oss_scripts/llama/runner/multimodal_runner/multimodal_lhd_token_generator.h

Lines changed: 3 additions & 0 deletions
@@ -108,6 +108,9 @@ class MultimodalLhdTokenGenerator
       AttentionSinkRopeRunner* attention_sink_rope_runner) override;

  private:
+  // Bring base class's virtual prepare_io into scope so the overload below
+  // does not hide it (-Woverloaded-virtual).
+  using TokenGenerator<T>::prepare_io;
   /**
    * @brief Fill in I/O buffers with prompt token and position.
    * @param cur_token Current token.
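
For context, a minimal self-contained sketch of the name-hiding rule these `using`-declarations address (hypothetical `Base`/`Derived` names, not the real classes): an overload declared in a derived class hides every base-class function with the same name, including virtuals, unless they are explicitly re-imported. Clang's `-Woverloaded-virtual` fires on the hidden virtual; the `using`-declaration keeps it visible without changing dispatch.

```cpp
#include <cstdint>
#include <vector>

struct Base {
  virtual ~Base() = default;
  virtual void prepare_io(uint64_t token, int64_t pos) {}
};

struct Derived : Base {
  // Without this using-declaration, the overload below hides
  // Base::prepare_io(uint64_t, int64_t) and clang emits
  // -Woverloaded-virtual; with it, both overloads stay callable.
  using Base::prepare_io;
  void prepare_io(
      const std::vector<uint64_t>& tokens,
      const std::vector<int32_t>& positions) {}
};

int main() {
  Derived d;
  d.prepare_io(uint64_t{1}, int64_t{0});  // base overload, still visible
  d.prepare_io({1, 2, 3}, {0, 1, 2});     // derived batch overload
}
```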

examples/qualcomm/oss_scripts/llama/runner/prompt_processor.h

Lines changed: 2 additions & 0 deletions
@@ -40,6 +40,8 @@ class PromptProcessor {
       const std::string& method_name,
       Metadata metadata);

+  virtual ~PromptProcessor() = default;
+
   /**
    * @brief Initialize I/O tensor and allocate I/O data buffer.
    * @param buffer_manager Pointer to IMemAlloc instance; by default, it uses a
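
A minimal sketch of why the defaulted virtual destructor matters (hypothetical `Processor` names, not the real classes): deleting a derived object through a base-class pointer, which is exactly what `std::unique_ptr<PromptProcessor<T>>` does in `Runner`, is undefined behavior unless the base destructor is virtual.

```cpp
#include <memory>

struct Processor {
  // Without this virtual destructor, the delete inside ~unique_ptr below
  // would destroy only the Processor subobject: undefined behavior,
  // flagged by -Wdelete-non-abstract-non-virtual-dtor.
  virtual ~Processor() = default;
  virtual void init() {}
};

struct MultimodalProcessor : Processor {
  void init() override {}
};

int main() {
  // Correctly runs ~MultimodalProcessor, then ~Processor.
  std::unique_ptr<Processor> p = std::make_unique<MultimodalProcessor>();
}
```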

examples/qualcomm/oss_scripts/llama/runner/runner.cpp

Lines changed: 2 additions & 2 deletions
@@ -102,6 +102,7 @@ Runner<T>::Runner(
     std::unique_ptr<tokenizers::Tokenizer> tokenizer,
     std::unique_ptr<executorch::extension::Module> attention_sink_rope_module)
     : module_(std::move(module)),
+      attention_sink_rope_module_(std::move(attention_sink_rope_module)),
       ngram_(ngram),
       window_(window),
       gcap_(gcap),
@@ -111,8 +112,7 @@ Runner<T>::Runner(
       temperature_(temperature),
       eval_mode_(static_cast<EvalMode>(eval_mode)),
       shared_buffer_(shared_buffer),
-      tokenizer_(std::move(tokenizer)),
-      attention_sink_rope_module_(std::move(attention_sink_rope_module)) {
+      tokenizer_(std::move(tokenizer)) {
   stats_.reset();

   if (decoder_model_version == "llama2") {
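
A minimal sketch of the rule behind `-Wreorder-ctor` (a hypothetical `Runner` shape, not the real one): members are constructed in declaration order no matter how the initializer list is written, so a mismatched list is at best misleading and at worst hides a read of a not-yet-initialized member.

```cpp
#include <memory>
#include <utility>

struct Module {};

class Runner {
 public:
  Runner(std::unique_ptr<Module> module, std::unique_ptr<Module> rope_module)
      // Initializers must follow declaration order (module_, rope_module_,
      // count_); listing rope_module_ after count_ would trigger
      // -Wreorder-ctor because actual construction order would not match
      // the written list.
      : module_(std::move(module)),
        rope_module_(std::move(rope_module)),
        count_(0) {}

 private:
  std::unique_ptr<Module> module_;       // constructed first
  std::unique_ptr<Module> rope_module_;  // constructed second
  int count_;                            // constructed last
};

int main() {
  Runner r(std::make_unique<Module>(), std::make_unique<Module>());
}
```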
