Commit dcd8d77

Merge pull request #9 from sujitvasanth/fix/iswa-get-can-shift-gemma4
fix: Gemma 4 time-to-first-token drops from 8-12s to <1s by unblocking cache reuse
2 parents e381dc9 + d1333b0 commit dcd8d77

1 file changed

Lines changed: 1 addition & 2 deletions

src/llama-kv-cache-iswa.cpp

@@ -233,8 +233,7 @@ llama_memory_context_ptr llama_kv_cache_iswa::init_mtp(llama_seq_id seq_id, llam
 
 bool llama_kv_cache_iswa::get_can_shift() const {
     return kv_base->get_can_shift() &&
-           kv_swa->get_can_shift() &&
-           kv_base->get_size() == kv_swa->get_size();
+           kv_swa->get_can_shift();
 }
 
 void llama_kv_cache_iswa::state_write(llama_io_write_i & io, llama_seq_id seq_id, llama_state_seq_flags flags) const {
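
The effect of the change can be illustrated with a minimal sketch. The `mock_kv_cache` struct, the helper names, and the sizes below are hypothetical stand-ins for illustration, not the llama.cpp classes; only the two boolean conditions are copied from the diff. When the base and SWA caches have different sizes, the old condition reports that shifting is unavailable even though both caches support it, while the new condition reports that it is available. Per the commit title, this is the check that was blocking cache reuse.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one underlying KV cache; NOT the llama.cpp class.
struct mock_kv_cache {
    uint32_t size;
    bool     can_shift;

    uint32_t get_size()      const { return size; }
    bool     get_can_shift() const { return can_shift; }
};

// Condition before this commit: shifting also required equal cache sizes.
inline bool can_shift_before(const mock_kv_cache & kv_base, const mock_kv_cache & kv_swa) {
    return kv_base.get_can_shift() &&
           kv_swa.get_can_shift() &&
           kv_base.get_size() == kv_swa.get_size();
}

// Condition after this commit: only the two per-cache checks remain.
inline bool can_shift_after(const mock_kv_cache & kv_base, const mock_kv_cache & kv_swa) {
    return kv_base.get_can_shift() &&
           kv_swa.get_can_shift();
}

// Illustrative check with made-up sizes (assumed values, not from the source).
inline void demo_can_shift() {
    const mock_kv_cache kv_base{8192, true};
    const mock_kv_cache kv_swa{1024, true};

    assert(!can_shift_before(kv_base, kv_swa)); // size mismatch disabled shifting
    assert(can_shift_after(kv_base, kv_swa));   // both caches can shift, so it is enabled
}
```

If either cache reports that it cannot shift, both versions still return false, so the commit only changes behavior for caches of unequal size.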
