Hi, thanks for the amazing work and for releasing this paper.
I have a question about long-horizon rollouts. From my reading, HyDRA seems to be trained with a
fixed context window (e.g. the paper mentions encoding 77 context frames), and the memory
tokenizer / retrieval operate over that context. In practice, if one wants to use HyDRA in an
autoregressive rollout setting where the generated video keeps getting longer, is the method
still limited to a fixed-length context window (for example via a sliding window), or is there a
mechanism to accumulate memory over arbitrarily long horizons?