The [paper pdf](https://arxiv.org/pdf/2409.13598) mentions on page 6: "With these choices we are dealing with 51,840 tokens per sample". How exactly do you arrive at that number?