Hi, thank you for your work and for sharing it. I wanted to ask a question somewhat related to 214. I'm looking into fine-tuning the model so that it keeps only the last K frames as conditioning. However, in the current implementation of assigning IDs to images, it seems that the oldest frame gets 0 and the newest gets N. Is there a reason it's not the other way around? It would seem a bit more natural to me for the embeddings to be fixed from the perspective of the current generation (the frame right before it always gets 0). I'm wondering if it's worth experimenting with fine-tuning this model with the order reversed.
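To make the question concrete, here is a minimal sketch of the two numbering schemes as I understand them. The function names (`forward_ids`, `reversed_ids`, `keep_last_k`) are made up for illustration and are not taken from the repository; frames are assumed to be listed oldest to newest.

```python
def forward_ids(num_frames):
    # How I read the current scheme: oldest frame gets 0, newest gets num_frames - 1.
    return list(range(num_frames))


def reversed_ids(num_frames):
    # Proposed scheme: newest frame gets 0, oldest gets num_frames - 1.
    return list(range(num_frames - 1, -1, -1))


def keep_last_k(ids, k):
    # Hypothetical helper: keep only the IDs of the last k frames.
    return ids[-k:] if k > 0 else []


if __name__ == "__main__":
    for history in (5, 8):
        print(history, keep_last_k(forward_ids(history), 3),
              keep_last_k(reversed_ids(history), 3))
```

In this sketch, the IDs of the kept frames under the forward scheme depend on how long the history is, while under the reversed scheme the last K frames get the same IDs regardless of history length.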
Would you have some thoughts on this?
Thank you