You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For a full runtime analysis of incremental BPE tokenization, see Jiang and Gong, ["Incremental BPE Tokenization"](https://arxiv.org/abs/2605.30813) (ICML 2026), which presents an algorithm with a worst-case `O(n log^2 t)` complexity (where `n` is the input length and `t` is the maximum token length).
305
+
Their work builds on our incremental algorithm and takes it one step further by combining the aho-corasick search with the compatibility test into a single automaton.
306
+
Their implementation is available at [ModelTC/mtc-inc-bpe](https://github.com/ModelTC/mtc-inc-bpe).
0 commit comments