github · gorzell · Jun 17, 2026 · Jun 17, 2026 · Jun 17, 2026 · Jun 17, 2026
@@ -300,3 +300,7 @@ This case is particularly challenging for tiktoken, which shows a quadratic grow
 The Huggingface encoder scales better, but becomes slower and slower compared to our implementation as input size increases.
 
 ![worst-case encoding runtime comparison](./images/performance-worstcase.svg)
+
+For a full runtime analysis of incremental BPE tokenization, see Jiang and Gong, ["Incremental BPE Tokenization"](https://arxiv.org/abs/2605.30813) (ICML 2026), which presents an algorithm with a worst-case `O(n log^2 t)` complexity (where `n` is the input length and `t` is the maximum token length).
+Their work builds on our incremental algorithm and takes it one step further by combining the aho-corasick search with the compatibility test into a single automaton.
+Their implementation is available at [ModelTC/mtc-inc-bpe](https://github.com/ModelTC/mtc-inc-bpe).