Inline Char16Trie traversal methods #8142

Draft
Xuanwo wants to merge 1 commit into unicode-org:main from Xuanwo:xuanwo/icu-char16trie-inline

Xuanwo (Contributor) commented Jun 26, 2026

This PR isolates the Char16Trie traversal inline annotations from #8113.

The change is intentionally small: it only adds #[inline] to the hot Char16TrieIterator traversal entry points and helper paths in components/collections/src/char16trie/trie.rs. It does not change public APIs and does not include the ResultCache, zero-copy complex segmentation, or benchmark implementation changes from the larger performance branch.
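
To illustrate the kind of change described above, here is a minimal sketch of a trie iterator whose per-code-unit step and lookup helper carry `#[inline]`. The names `ToyTrie`, `ToyTrieIter`, `Step`, `advance`, and `find_child` are hypothetical and the node layout is invented for illustration; this is not the `Char16Trie` data format or the code in `trie.rs`. `#[inline]` is only a hint to the compiler, and it tends to matter most when the caller lives in a different crate, as the segmenter does relative to `icu_collections`.

```rust
// Hypothetical toy trie over UTF-16 code units.
// NOT the icu_collections Char16Trie layout or API.

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Step {
    NoMatch,    // the unit does not continue any stored word
    Prefix,     // valid prefix, but no word ends here
    Value(u32), // a stored word ends here with this value
}

struct Node {
    edges: Vec<(u16, usize)>, // (code unit, child index), kept sorted by unit
    value: Option<u32>,
}

pub struct ToyTrie {
    nodes: Vec<Node>,
}

impl ToyTrie {
    pub fn new() -> Self {
        ToyTrie { nodes: vec![Node { edges: Vec::new(), value: None }] }
    }

    pub fn insert(&mut self, word: &[u16], value: u32) {
        let mut cur = 0;
        for &unit in word {
            cur = match self.nodes[cur].edges.binary_search_by_key(&unit, |e| e.0) {
                Ok(i) => self.nodes[cur].edges[i].1,
                Err(i) => {
                    let child = self.nodes.len();
                    self.nodes.push(Node { edges: Vec::new(), value: None });
                    self.nodes[cur].edges.insert(i, (unit, child));
                    child
                }
            };
        }
        self.nodes[cur].value = Some(value);
    }

    pub fn iter(&self) -> ToyTrieIter<'_> {
        ToyTrieIter { trie: self, pos: Some(0) }
    }
}

pub struct ToyTrieIter<'a> {
    trie: &'a ToyTrie,
    pos: Option<usize>, // None once traversal has fallen off the trie
}

impl<'a> ToyTrieIter<'a> {
    /// Hot entry point: called once per code unit by the caller's loop.
    #[inline]
    pub fn advance(&mut self, unit: u16) -> Step {
        let cur = match self.pos {
            Some(cur) => cur,
            None => return Step::NoMatch,
        };
        match self.find_child(cur, unit) {
            Some(child) => {
                self.pos = Some(child);
                match self.trie.nodes[child].value {
                    Some(v) => Step::Value(v),
                    None => Step::Prefix,
                }
            }
            None => {
                self.pos = None;
                Step::NoMatch
            }
        }
    }

    /// Helper on the hot path; also marked as an inlining candidate.
    #[inline]
    fn find_child(&self, node: usize, unit: u16) -> Option<usize> {
        let edges = &self.trie.nodes[node].edges;
        edges.binary_search_by_key(&unit, |e| e.0).ok().map(|i| edges[i].1)
    }
}
```

A caller would build the trie with `insert`, obtain an iterator with `iter`, and feed it one code unit at a time through `advance`. Whether the hint actually helps depends on the call site, which is why the measurements below are needed.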

Why separate:

  • Char16TrieIterator is shared by published segmenter code and the in-progress neo segmenter.
  • The inline behavior is an independent optimization from the segmenter cache and slicing changes.
  • Keeping this PR scoped makes the performance impact easier to review and attribute.

Benchmark setup:

  • Baseline: latest origin/main at 4f4c9f1fa068b6e4e0331b385104a7b2872489c5.
  • This branch: 665a7db7fa (Inline Char16Trie traversal methods).
  • Benchmark code: PR #8141 (Add segmenter comparison benchmarks), head 044cc38edec9ae46d03db66e9f5016bcd85e5835.
  • Method: the benchmark patch from #8141 was temporarily applied to both the baseline worktree and this branch's worktree. The benchmark file is intentionally not included in this PR diff.
  • Reproduction outline: apply the benchmark patch from #8141 to origin/main and run the command below; then apply the same patch to this branch and rerun the same command.
  • Command: cargo bench -p icu_segmenter --bench bench --features unstable -- dictionary --sample-size 10 --measurement-time 1 --warm-up-time 0.5
  • Values below are criterion median point estimates for the benchmarks selected by the dictionary filter. This was a short local run (10 samples, 1 s measurement time), so single-digit percentage changes and the short English/Latin1 cases should be treated as noise. The tables are intended to show the direction of change on the published and neo dictionary/Char16Trie paths under the latest benchmark code.
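
The tables below mix `ns` and `us` values, so a small helper makes it easier to spot-check a row. The sketch assumes Delta is `(after - before) / before` expressed as a percentage, which agrees with the rows it was checked against; the function names are hypothetical and are not part of the benchmark code in #8141.

```rust
/// Parse a time such as "1.863 us" or "906.4 ns" into nanoseconds.
/// Returns None for unknown units or malformed numbers.
pub fn parse_ns(s: &str) -> Option<f64> {
    let (num, unit) = s.trim().split_once(' ')?;
    let value: f64 = num.parse().ok()?;
    let scale = match unit.trim() {
        "ns" => 1.0,
        "us" | "µs" => 1e3,
        "ms" => 1e6,
        "s" => 1e9,
        _ => return None,
    };
    Some(value * scale)
}

/// Relative change from `before` to `after`, in percent.
/// Negative means the "after" run was faster.
pub fn delta_percent(before: &str, after: &str) -> Option<f64> {
    let b = parse_ns(before)?;
    let a = parse_ns(after)?;
    if b == 0.0 {
        return None;
    }
    Some((a - b) / b * 100.0)
}

/// Round to one decimal place, as the Delta column is displayed.
pub fn round1(x: f64) -> f64 {
    (x * 10.0).round() / 10.0
}
```

For example, `delta_percent("1.863 us", "906.4 ns")` rounds to -51.3, matching the `Segmenter_word_segment_str/neo_dictionary_japanese` row.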

Representative dictionary / Char16Trie-related results:

| Case | Before | After | Delta |
| --- | --- | --- | --- |
| Segmenter_word_segment_str/published_dictionary_thai | 2.355 us | 2.347 us | -0.3% |
| Segmenter_word_segment_str/published_dictionary_japanese | 1.661 us | 1.663 us | +0.1% |
| Segmenter_word_segment_str/published_dictionary_han | 1.530 us | 1.530 us | -0.0% |
| Segmenter_word_segment_str/neo_dictionary_thai | 1.452 us | 1.373 us | -5.5% |
| Segmenter_word_segment_utf8/neo_dictionary_thai | 1.720 us | 1.727 us | +0.5% |
| Segmenter_word_segment_utf16/neo_dictionary_thai | 1.271 us | 1.202 us | -5.4% |
| Segmenter_line_segment_str/published_dictionary_thai | 2.461 us | 2.427 us | -1.4% |
| Segmenter_line_segment_str/neo_dictionary_thai | 1.768 us | 1.681 us | -5.0% |
| Segmenter_line_segment_utf8/published_dictionary_thai | 2.545 us | 2.684 us | +5.5% |
| Segmenter_line_segment_utf8/neo_dictionary_thai | 2.391 us | 2.130 us | -10.9% |

Dictionary benchmark delta:

| Case | Before | After | Delta |
| --- | --- | --- | --- |
| Segmenter_line_segment_latin1/neo_dictionary_css_english | 3.010 us | 4.279 us | +42.1% |
| Segmenter_line_segment_latin1/neo_dictionary_english | 2.105 us | 4.481 us | +112.9% |
| Segmenter_line_segment_latin1/published_dictionary_css_english | 2.730 us | 3.825 us | +40.1% |
| Segmenter_line_segment_latin1/published_dictionary_english | 4.048 us | 4.101 us | +1.3% |
| Segmenter_line_segment_str/neo_dictionary_css_english | 6.446 us | 5.552 us | -13.9% |
| Segmenter_line_segment_str/neo_dictionary_english | 3.902 us | 3.763 us | -3.6% |
| Segmenter_line_segment_str/neo_dictionary_han | 309.7 ns | 251.5 ns | -18.8% |
| Segmenter_line_segment_str/neo_dictionary_japanese | 400.0 ns | 344.5 ns | -13.9% |
| Segmenter_line_segment_str/neo_dictionary_thai | 1.768 us | 1.681 us | -5.0% |
| Segmenter_line_segment_str/neo_dictionary_thai_han | 675.3 ns | 712.1 ns | +5.5% |
| Segmenter_line_segment_str/neo_dictionary_thai_japanese | 793.6 ns | 748.7 ns | -5.7% |
| Segmenter_line_segment_str/published_dictionary_css_english | 7.274 us | 7.726 us | +6.2% |
| Segmenter_line_segment_str/published_dictionary_english | 6.482 us | 6.230 us | -3.9% |
| Segmenter_line_segment_str/published_dictionary_han | 285.3 ns | 288.9 ns | +1.3% |
| Segmenter_line_segment_str/published_dictionary_japanese | 381.9 ns | 375.9 ns | -1.6% |
| Segmenter_line_segment_str/published_dictionary_thai | 2.461 us | 2.427 us | -1.4% |
| Segmenter_line_segment_str/published_dictionary_thai_han | 838.3 ns | 821.0 ns | -2.1% |
| Segmenter_line_segment_str/published_dictionary_thai_japanese | 961.9 ns | 954.0 ns | -0.8% |
| Segmenter_line_segment_utf16/neo_dictionary_css_english | 7.386 us | 5.719 us | -22.6% |
| Segmenter_line_segment_utf16/neo_dictionary_english | 4.931 us | 4.774 us | -3.2% |
| Segmenter_line_segment_utf16/neo_dictionary_han | 346.1 ns | 221.8 ns | -35.9% |
| Segmenter_line_segment_utf16/neo_dictionary_japanese | 378.4 ns | 286.6 ns | -24.3% |
| Segmenter_line_segment_utf16/neo_dictionary_thai | 1.528 us | 1.518 us | -0.7% |
| Segmenter_line_segment_utf16/neo_dictionary_thai_han | 621.8 ns | 517.1 ns | -16.8% |
| Segmenter_line_segment_utf16/neo_dictionary_thai_japanese | 740.8 ns | 603.5 ns | -18.5% |
| Segmenter_line_segment_utf16/published_dictionary_css_english | 6.348 us | 6.004 us | -5.4% |
| Segmenter_line_segment_utf16/published_dictionary_english | 6.063 us | 5.969 us | -1.5% |
| Segmenter_line_segment_utf16/published_dictionary_han | 293.0 ns | 285.0 ns | -2.7% |
| Segmenter_line_segment_utf16/published_dictionary_japanese | 384.7 ns | 387.6 ns | +0.8% |
| Segmenter_line_segment_utf16/published_dictionary_thai | 2.154 us | 2.420 us | +12.3% |
| Segmenter_line_segment_utf16/published_dictionary_thai_han | 691.1 ns | 691.6 ns | +0.1% |
| Segmenter_line_segment_utf16/published_dictionary_thai_japanese | 831.6 ns | 851.2 ns | +2.4% |
| Segmenter_line_segment_utf8/neo_dictionary_css_english | 11.801 us | 7.075 us | -40.1% |
| Segmenter_line_segment_utf8/neo_dictionary_english | 7.939 us | 5.882 us | -25.9% |
| Segmenter_line_segment_utf8/neo_dictionary_han | 458.5 ns | 345.9 ns | -24.6% |
| Segmenter_line_segment_utf8/neo_dictionary_japanese | 595.4 ns | 446.6 ns | -25.0% |
| Segmenter_line_segment_utf8/neo_dictionary_thai | 2.391 us | 2.130 us | -10.9% |
| Segmenter_line_segment_utf8/neo_dictionary_thai_han | 924.4 ns | 842.1 ns | -8.9% |
| Segmenter_line_segment_utf8/neo_dictionary_thai_japanese | 1.369 us | 1.087 us | -20.6% |
| Segmenter_line_segment_utf8/published_dictionary_css_english | 8.242 us | 7.735 us | -6.2% |
| Segmenter_line_segment_utf8/published_dictionary_english | 6.856 us | 8.144 us | +18.8% |
| Segmenter_line_segment_utf8/published_dictionary_han | 314.3 ns | 339.4 ns | +8.0% |
| Segmenter_line_segment_utf8/published_dictionary_japanese | 414.5 ns | 435.9 ns | +5.2% |
| Segmenter_line_segment_utf8/published_dictionary_thai | 2.545 us | 2.684 us | +5.5% |
| Segmenter_line_segment_utf8/published_dictionary_thai_han | 899.9 ns | 895.1 ns | -0.5% |
| Segmenter_line_segment_utf8/published_dictionary_thai_japanese | 1.007 us | 1.037 us | +3.0% |
| Segmenter_word_segment_str/neo_dictionary_english | 4.043 us | 4.010 us | -0.8% |
| Segmenter_word_segment_str/neo_dictionary_han | 980.5 ns | 770.5 ns | -21.4% |
| Segmenter_word_segment_str/neo_dictionary_japanese | 1.863 us | 906.4 ns | -51.3% |
| Segmenter_word_segment_str/neo_dictionary_thai | 1.452 us | 1.373 us | -5.5% |
| Segmenter_word_segment_str/neo_dictionary_thai_han | 623.2 ns | 579.5 ns | -7.0% |
| Segmenter_word_segment_str/neo_dictionary_thai_japanese | 1.259 us | 864.7 ns | -31.3% |
| Segmenter_word_segment_str/published_dictionary_english | 4.412 us | 4.392 us | -0.5% |
| Segmenter_word_segment_str/published_dictionary_han | 1.530 us | 1.530 us | -0.0% |
| Segmenter_word_segment_str/published_dictionary_japanese | 1.661 us | 1.663 us | +0.1% |
| Segmenter_word_segment_str/published_dictionary_thai | 2.355 us | 2.347 us | -0.3% |
| Segmenter_word_segment_str/published_dictionary_thai_han | 1.042 us | 1.020 us | -2.1% |
| Segmenter_word_segment_str/published_dictionary_thai_japanese | 1.518 us | 1.483 us | -2.3% |
| Segmenter_word_segment_utf16/neo_dictionary_english | 4.930 us | 5.135 us | +4.1% |
| Segmenter_word_segment_utf16/neo_dictionary_han | 776.6 ns | 699.4 ns | -9.9% |
| Segmenter_word_segment_utf16/neo_dictionary_japanese | 878.8 ns | 818.8 ns | -6.8% |
| Segmenter_word_segment_utf16/neo_dictionary_thai | 1.271 us | 1.202 us | -5.4% |
| Segmenter_word_segment_utf16/neo_dictionary_thai_han | 609.4 ns | 510.8 ns | -16.2% |
| Segmenter_word_segment_utf16/neo_dictionary_thai_japanese | 1.113 us | 767.8 ns | -31.0% |
| Segmenter_word_segment_utf16/published_dictionary_english | 4.344 us | 3.782 us | -12.9% |
| Segmenter_word_segment_utf16/published_dictionary_han | 2.601 us | 1.410 us | -45.8% |
| Segmenter_word_segment_utf16/published_dictionary_japanese | 2.151 us | 1.554 us | -27.8% |
| Segmenter_word_segment_utf16/published_dictionary_thai | 2.159 us | 2.147 us | -0.6% |
| Segmenter_word_segment_utf16/published_dictionary_thai_han | 993.2 ns | 946.1 ns | -4.7% |
| Segmenter_word_segment_utf16/published_dictionary_thai_japanese | 1.702 us | 1.376 us | -19.1% |
| Segmenter_word_segment_utf8/neo_dictionary_english | 6.351 us | 6.594 us | +3.8% |
| Segmenter_word_segment_utf8/neo_dictionary_han | 933.5 ns | 909.3 ns | -2.6% |
| Segmenter_word_segment_utf8/neo_dictionary_japanese | 1.091 us | 1.075 us | -1.4% |
| Segmenter_word_segment_utf8/neo_dictionary_thai | 1.720 us | 1.727 us | +0.5% |
| Segmenter_word_segment_utf8/neo_dictionary_thai_han | 795.5 ns | 748.7 ns | -5.9% |
| Segmenter_word_segment_utf8/neo_dictionary_thai_japanese | 1.177 us | 1.130 us | -4.0% |
| Segmenter_word_segment_utf8/published_dictionary_english | 5.042 us | 4.975 us | -1.3% |
| Segmenter_word_segment_utf8/published_dictionary_han | 1.665 us | 1.624 us | -2.5% |
| Segmenter_word_segment_utf8/published_dictionary_japanese | 1.759 us | 1.756 us | -0.1% |
| Segmenter_word_segment_utf8/published_dictionary_thai | 2.594 us | 2.556 us | -1.5% |
| Segmenter_word_segment_utf8/published_dictionary_thai_han | 1.120 us | 1.155 us | +3.2% |
| Segmenter_word_segment_utf8/published_dictionary_thai_japanese | 1.616 us | 1.585 us | -1.9% |

Validation:

  • cargo fmt -p icu_collections -p icu_segmenter
  • git diff --check
  • cargo test -p icu_collections --features serde,databake
  • cargo test -p icu_segmenter --features unstable

Changelog

icu_collections: Improve Char16Trie traversal performance with inline annotations.
