Question on Quest implementation in KV cache selection #4

Thanks for your great work and for the accompanying baseline implementations! They will really benefit future work.
I have a question about how KV cache selection is implemented for Quest.

It looks like in this repo the Quest cache always keeps all generated tokens (see https://github.com/Infini-AI-Lab/MagicPIG/blob/ac9aa36c866330ca6ad2ce342a7848d7df6f49bb/evaluations/RULER/pred/quest_cache.py#L127 ).
Token selection is therefore limited to the prompt tokens, because the KV pages are built only during prefill.

In contrast, it looks like the original Quest implementation (https://github.com/mit-han-lab/Quest/blob/main/evaluation/quest_attention.py ) dynamically updates the KV pages during decoding.
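
To make the difference concrete, here is a minimal sketch of what I mean by dynamic page updating. It is not the code from either repository: the class `PagedKeyStats`, its method names, and the tiny page size are all hypothetical, and it handles a single head with plain Python lists. It assumes Quest-style page metadata (the element-wise min and max of the keys in each page) and scores a page by the upper bound of `q·k` over that range.

```python
from typing import List


class PagedKeyStats:
    """Per-page element-wise min/max of key vectors (hypothetical, single head)."""

    def __init__(self, page_size: int) -> None:
        self.page_size = page_size
        self.num_tokens = 0
        self.page_min: List[List[float]] = []
        self.page_max: List[List[float]] = []

    def append(self, key: List[float]) -> None:
        """Add one key during prefill OR decoding, keeping page bounds current."""
        if self.num_tokens % self.page_size == 0:
            # The last page is full (or there is none yet): open a new page.
            self.page_min.append(list(key))
            self.page_max.append(list(key))
        else:
            # Widen the bounds of the partially filled last page.
            last_min, last_max = self.page_min[-1], self.page_max[-1]
            for i, k in enumerate(key):
                last_min[i] = min(last_min[i], k)
                last_max[i] = max(last_max[i], k)
        self.num_tokens += 1

    def page_scores(self, query: List[float]) -> List[float]:
        """Upper bound of q.k over each page's per-channel [min, max] range."""
        return [
            sum(max(q * lo, q * hi) for q, lo, hi in zip(query, mn, mx))
            for mn, mx in zip(self.page_min, self.page_max)
        ]

    def select_pages(self, query: List[float], top_k: int) -> List[int]:
        """Indices of the top_k highest-scoring pages, in position order."""
        scores = self.page_scores(query)
        ranked = sorted(range(len(scores)), key=lambda p: scores[p], reverse=True)
        return sorted(ranked[:top_k])


stats = PagedKeyStats(page_size=2)
for key in [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]:  # "prefill" keys
    stats.append(key)
stats.append([3.0, -1.0])  # a decoded token opens page 2
print(stats.select_pages([1.0, 0.0], top_k=2))  # [0, 2]
```

With this kind of update, pages that contain generated tokens compete in the top-k selection like any other page, instead of all generated tokens being kept unconditionally.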

Would you consider supporting dynamic KV page updating? I implemented a very simple, imperfect version of it here: https://github.com/Monstertail/MagicPIG/blob/b635d06ae2c68c1d2949f2e95f358fb5746f6108/RULER/RULER/scripts/pred/quest_cache.py#L253 . If you are interested, we can work together on improving it.

