I think one fundamental issue with our strategy of using RNNs or SSMs is that being able to 'look into the future' can indeed bring a large advantage: RNNs and SSMs have no clue at which angle values the kinematics are starting (at t=0). A CNN, or even more so a transformer, can look at the entire sequence and 'generate' a full sequence of kinetics that makes sense from start to end.
(The envisioned diffusion approach might thus really be a decent choice.)
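
To make the 'look into the future' point concrete, here is a toy sketch in Python. It is not our actual models: `causal_model` (an exponential moving average standing in for an RNN/SSM) and `full_context_model` (a global mean standing in for a CNN/transformer that sees the whole sequence) are hypothetical stand-ins, and the angle values are made up. It only checks whether changing the last time step affects the output at t=0.

```python
def causal_model(xs, alpha=0.5):
    """Toy recurrent model: the output at step t depends only on xs[0..t]."""
    h, out = 0.0, []
    for x in xs:
        h = alpha * h + (1 - alpha) * x  # state update uses past state and current input only
        out.append(h)
    return out


def full_context_model(xs):
    """Toy non-causal model: every output mixes in the mean of the whole sequence."""
    mean = sum(xs) / len(xs)
    return [0.5 * x + 0.5 * mean for x in xs]


# Made-up joint-angle sequence; only the final time step differs between the two.
angles = [0.0, 0.1, 0.3, 0.6, 1.0]
perturbed = angles[:-1] + [2.0]

# Does the output at t=0 react to a change at the end of the sequence?
causal_first_changed = causal_model(angles)[0] != causal_model(perturbed)[0]
full_first_changed = full_context_model(angles)[0] != full_context_model(perturbed)[0]

print(causal_first_changed, full_first_changed)  # False True
```

In this sketch the causal model's first output cannot react to anything that happens later, while the full-context model's first output does, which is the kind of start-to-end consistency mentioned above.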