|
| 1 | +# Benchmarks — v0.11.0 |
| 2 | + |
| 3 | +Date: 2026-05-03 |
| 4 | +Platform: macOS (Apple Silicon) |
| 5 | + |
| 6 | +Criterion's reported `change` column compares against the previous saved local |
| 7 | +run, which on this machine is v0.10.0 stable. Deltas here measure the work that |
| 8 | +landed between v0.10.0 and v0.11.0: AWS spec restoration (#101), local-project |
| 9 | +providers (#97), OSC 7772 buffer framing (#94), async suggestion feedback (#98), |
| 10 | +multi-word alias expansion (#95), static `args.suggestions` (#96), popup page |
| 11 | +navigation (#99), and the popup rendering hardening pass (#100). Two new |
| 12 | +benchmark groups (`osc7772_decode`, `feedback_render`) plus new cases (`alias_hit_*` |
| 13 | +in `engine_suggest_sync` and `with_static_suggestions_tar` in `spec_resolution`) |
| 14 | +ship as new baselines. |
| 15 | + |
| 16 | +## vt_parse_throughput |
| 17 | + |
| 18 | +| Benchmark | Time | vs v0.10.0 | |
| 19 | +|-----------|------|------------| |
| 20 | +| plain_text | 197.36 µs | +10.6% | |
| 21 | +| ansi_colored | 246.11 µs | +10.5% | |
| 22 | +| cursor_heavy | 266.70 µs | −2.9% improvement | |
| 23 | + |
| 24 | +The plain_text and ansi_colored regressions trace to the OSC dispatch fan-out |
| 25 | +in `gc-parser`: the OSC handler now matches OSC 7770 (legacy), 7771 (cursor), |
| 26 | +and 7772 (percent-encoded buffer) sequences, and emits a per-frame deprecation |
| 27 | +warning for 7770. The plain_text input contains no OSCs, so the cost is purely |
| 28 | +the dispatch table growth; cursor_heavy is dominated by CUP/CUF/CUB sequences, |
| 29 | +which sit on a different code path and improve marginally as a side benefit of |
| 30 | +the parser refactor. |
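
The fan-out described above can be pictured as a match on the numeric OSC prefix. This is a minimal sketch, not the `gc-parser` implementation: `OscKind` and `classify_osc` are hypothetical names, and it assumes the payload is the byte run between `ESC ]` and the terminator, with the code separated from its arguments by `;`.

```rust
/// Hypothetical classification of the private OSC codes named in this report.
#[derive(Debug, PartialEq, Eq)]
pub enum OscKind {
    Legacy7770,
    Cursor7771,
    Buffer7772,
    Other,
}

/// Classify an OSC payload by its numeric prefix (everything before the first `;`).
/// Assumption: `payload` excludes the `ESC ]` introducer and the terminator.
pub fn classify_osc(payload: &[u8]) -> OscKind {
    // `split` always yields at least one (possibly empty) slice.
    let code = payload.split(|&b| b == b';').next().unwrap_or(&[]);
    match code {
        b"7770" => OscKind::Legacy7770,
        b"7771" => OscKind::Cursor7771,
        b"7772" => OscKind::Buffer7772,
        _ => OscKind::Other,
    }
}
```

Each extra arm adds a comparison to the dispatch step, which is the kind of growth the paragraph above attributes the regression to.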
| 31 | + |
| 32 | +## osc7772_decode (new in v0.11.0) |
| 33 | + |
| 34 | +| Benchmark | Time | Notes | |
| 35 | +|-----------|------|-------| |
| 36 | +| 100 | 574.03 ns | 100 byte payload (typical command line) | |
| 37 | +| 1024 | 4.0154 µs | 1 KiB payload (long pasted line) | |
| 38 | +| 8192 | 28.623 µs | 8 KiB payload (worst-case heredoc body) | |
| 39 | + |
| 40 | +Per-byte percent-decode cost is ~5.7 ns/byte at 100 bytes, falling to ~3.5 ns/byte |
| 41 | +at 8 KiB, well under the keystroke budget. The shell-side encoder benchmark is left |
| 42 | +out: emission happens once per prompt redraw at human typing cadence, never on the hot path. |
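
For orientation, the work being timed is a single linear pass over the payload. The sketch below is an illustrative percent-decoder, not the project's decoder: `percent_decode` is a hypothetical name, and passing malformed escapes through unchanged is an assumption about the desired behaviour.

```rust
/// Decode `%XX` escapes in `input`. Bytes that are not part of a valid
/// escape are copied through unchanged (an assumption of this sketch).
pub fn percent_decode(input: &[u8]) -> Vec<u8> {
    // Map one ASCII hex digit to its value.
    fn hex(b: u8) -> Option<u8> {
        match b {
            b'0'..=b'9' => Some(b - b'0'),
            b'a'..=b'f' => Some(b - b'a' + 10),
            b'A'..=b'F' => Some(b - b'A' + 10),
            _ => None,
        }
    }

    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        // A valid escape needs two more bytes after the `%`.
        if input[i] == b'%' && i + 2 < input.len() {
            if let (Some(hi), Some(lo)) = (hex(input[i + 1]), hex(input[i + 2])) {
                out.push((hi << 4) | lo);
                i += 3;
                continue;
            }
        }
        out.push(input[i]);
        i += 1;
    }
    out
}
```

Every input byte is visited once, so decode time should grow roughly linearly with payload size, which matches the shape of the table above.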
| 43 | + |
| 44 | +## feedback_render (new in v0.11.0) |
| 45 | + |
| 46 | +| Benchmark | Time | Notes | |
| 47 | +|-----------|------|-------| |
| 48 | +| indicator_row_width_60 | 121.96 ns | Steady-state spinner repaint | |
| 49 | +| indicator_row_width_60_varying_frame | 123.15 ns | Frame-varying input (defeats const-folding) | |
| 50 | + |
| 51 | +The async-feedback indicator row repaints at ≤16 fps in the worst case |
| 52 | +(spinner cadence). Sub-microsecond per-frame cost means the feature is free |
| 53 | +on the render path. |
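
The "free on the render path" claim follows from simple arithmetic on the figures above. The helpers below are hypothetical and exist only to make that arithmetic explicit.

```rust
/// Total repaint time spent per wall-clock second, in nanoseconds.
pub fn repaint_ns_per_second(ns_per_frame: f64, fps: f64) -> f64 {
    ns_per_frame * fps
}

/// Fraction of each second spent repainting (1.0 would mean fully busy).
pub fn repaint_duty_cycle(ns_per_frame: f64, fps: f64) -> f64 {
    repaint_ns_per_second(ns_per_frame, fps) / 1e9
}
```

With the measured 121.96 ns per frame at 16 fps, this comes to about 1.95 µs of repaint work per second.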
| 54 | + |
| 55 | +## fuzzy_ranking |
| 56 | + |
| 57 | +| Benchmark | Time | vs v0.10.0 | |
| 58 | +|-----------|------|------------| |
| 59 | +| 1k_3char | 101.44 µs | +1.5% (flat) | |
| 60 | +| 10k_3char | 1.0743 ms | +4.5% within noise | |
| 61 | +| 10k_empty | 350.17 µs | +2.0% (flat) | |
| 62 | + |
| 63 | +Within run-to-run noise. Nucleo's hot path is unchanged; the small drift is |
| 64 | +attributable to the `EnumValue` `SuggestionKind` variant added for static |
| 65 | +`args.suggestions` priority routing. |
| 66 | + |
| 67 | +## spec_loading |
| 68 | + |
| 69 | +| Benchmark | Time | vs v0.10.0 | |
| 70 | +|-----------|------|------------| |
| 71 | +| load_717_specs | 288.14 ms | +241% expected | |
| 72 | + |
| 73 | +**Expected.** This is the AWS spec restoration showing up at startup-load |
| 74 | +time. The bench loads the embedded spec corpus once, and the corpus grew |
| 75 | +from ~47 MB minified pre-AWS to ~83 MB minified with AWS — 17 139 added |
| 76 | +subcommands and 99 537 added options. The load happens once per process at |
| 77 | +init, so the increase is on cold-start, not on the keystroke path. The embedded |
| 78 | +specs heap walk (`memory/embedded_specs_heap_walk` = 4.23 ms) confirms the |
| 79 | +runtime walk over the full corpus is still well-bounded. |
| 80 | + |
| 81 | +zstd-compressing the embedded JSON corpus (separate plan tracked in |
| 82 | +follow-on work) is the principled reclaim path here and would drop both |
| 83 | +binary size and load time meaningfully. |
| 84 | + |
| 85 | +## spec_resolution |
| 86 | + |
| 87 | +| Benchmark | Time | vs v0.10.0 | |
| 88 | +|-----------|------|------------| |
| 89 | +| shallow | 2.0979 µs | +8.4% | |
| 90 | +| deep | 1.2872 µs | +1.7% (flat) | |
| 91 | +| with_static_suggestions_tar | 6.0840 µs | new in v0.11.0 | |
| 92 | + |
| 93 | +The shallow regression (~163 ns absolute) covers the alias-aware spec walk |
| 94 | +(`resolve_ctx_for_spec_walk` helper + cycle guard) added for multi-word |
| 95 | +alias expansion. Deep walks are dominated by descent time, so the alias preamble is |
| 96 | +amortised away. |
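
The ~163 ns figure can be recovered from the table row alone: divide the current time by one plus the relative change to get the v0.10.0 time, then subtract. The helper names below are hypothetical and only illustrate the arithmetic.

```rust
/// Recover the previous measurement from the current one and a relative
/// change expressed in percent (e.g. 8.4 for "+8.4%").
pub fn baseline_from_change(current: f64, change_pct: f64) -> f64 {
    current / (1.0 + change_pct / 100.0)
}

/// Absolute difference between the current and previous measurement.
pub fn absolute_delta(current: f64, change_pct: f64) -> f64 {
    current - baseline_from_change(current, change_pct)
}
```

For the shallow case, `absolute_delta(2097.9, 8.4)` (times in nanoseconds) gives roughly 163 ns, against an implied v0.10.0 time of about 1.935 µs.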
| 97 | + |
| 98 | +## transform_pipeline |
| 99 | + |
| 100 | +| Benchmark | Time | vs v0.10.0 | |
| 101 | +|-----------|------|------------| |
| 102 | +| simple | 26.598 µs | +2.5% (flat) | |
| 103 | +| regex | 113.72 µs | +2.0% (flat) | |
| 104 | +| json | 35.870 µs | +33.6% expected | |
| 105 | + |
| 106 | +The json regression traces to the `json_extract_array` extension that landed in |
| 107 | +the docs/correctness pass — `json_extract` now branches on whether the |
| 108 | +target is a scalar or an array projection, adding a per-row clone for the |
| 109 | +array case. The bench exercises the new array path, so the +33% is the |
| 110 | +cost of correctness, not avoidable overhead. Previous behaviour silently |
| 111 | +emitted wrong completions on array-shaped JSON (caught in v0.10.0 |
| 112 | +`### Corrected`). |
| 113 | + |
| 114 | +## engine_suggest_sync |
| 115 | + |
| 116 | +| Benchmark | Time | vs v0.10.0 | |
| 117 | +|-----------|------|------------| |
| 118 | +| command_position | 17.677 µs | +2.4% within noise | |
| 119 | +| subcommand_with_spec | 16.863 µs | +0.6% (flat) | |
| 120 | +| filesystem_fallback | 1.0955 ms | +2.3% within noise | |
| 121 | +| alias_hit_single | 16.034 µs | new in v0.11.0 (apples-to-apples vs subcommand_with_spec, ±2 µs) | |
| 122 | +| alias_hit_multi | 1.1587 ms | new in v0.11.0 (multi-word alias path) | |
| 123 | + |
| 124 | +`alias_hit_single` (alias `g=git`, ctx `g ch<TAB>`) lands within ±2 µs of |
| 125 | +`subcommand_with_spec` — alias expansion plus the spec walk is in the same |
| 126 | +budget as a non-aliased spec walk. `alias_hit_multi` (alias `gco='git |
| 127 | +checkout'`, ctx `gco m<TAB>`) is dominated by the filesystem walk over the |
| 128 | +bench's 2 k-file tmp dir (git checkout has a `filepaths` template), not by |
| 129 | +the alias machinery. |
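
The alias step that both cases exercise can be sketched as head-word expansion with a cycle guard. This is an assumption-laden illustration, not the project's `resolve_ctx_for_spec_walk` helper (whose signature is not shown here): `expand_alias` is a hypothetical name, and only the first word of the line is treated as an alias candidate.

```rust
use std::collections::{HashMap, HashSet};

/// Expand the first word of `line` through `aliases` until it is no longer
/// an alias. A visited set stops self-referential aliases (for example
/// `ls='ls -G'`) and mutual cycles from looping forever.
pub fn expand_alias(line: &str, aliases: &HashMap<String, String>) -> String {
    let mut current = line.to_string();
    let mut seen: HashSet<String> = HashSet::new();
    loop {
        // Split into the head word and the (optional) remainder.
        let (head, rest) = match current.split_once(' ') {
            Some((h, r)) => (h.to_string(), Some(r.to_string())),
            None => (current.clone(), None),
        };
        // Stop when the head is not an alias.
        let Some(expansion) = aliases.get(&head) else {
            return current;
        };
        // Cycle guard: stop when this alias was already expanded once.
        if !seen.insert(head) {
            return current;
        }
        current = match rest {
            Some(r) => format!("{expansion} {r}"),
            None => expansion.clone(),
        };
    }
}
```

With `g=git` the line `g ch` becomes `git ch`, and with `gco='git checkout'` the line `gco m` becomes `git checkout m`, after which the normal spec walk proceeds on the expanded line.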
| 130 | + |
| 131 | +## priority_sort |
| 132 | + |
| 133 | +| Benchmark | Time | vs v0.10.0 | |
| 134 | +|-----------|------|------------| |
| 135 | +| empty_query_10k | 723.97 µs | +1.4% (flat) | |
| 136 | +| fuzzy_query_10k | 650.20 µs | +5.7% within noise | |
| 137 | + |
| 138 | +## memory |
| 139 | + |
| 140 | +| Benchmark | Time | Notes | |
| 141 | +|-----------|------|-------| |
| 142 | +| embedded_specs_heap_walk | 4.23 ms | Walks every spec on the heap once | |
| 143 | + |
| 144 | +Up from ~3.0 ms in v0.10.0; the raw walk now traverses the AWS subtree. |
| 145 | +Still well under the runtime memory budget — the embedded-spec heap test |
| 146 | +budget rose from 64 MiB to 128 MiB to admit AWS, with ~104 MiB used. |
| 147 | + |
| 148 | +## Analysis |
| 149 | + |
| 150 | +New baselines added: two groups (`osc7772_decode`, `feedback_render`) and new cases |
| 151 | +(`engine_suggest_sync/alias_hit_*`, `spec_resolution/with_static_suggestions_tar`). |
| 152 | + |
| 153 | +The two material regressions are both **expected and acceptable**: |
| 154 | + |
| 155 | +1. `spec_loading/load_717_specs` +241% — AWS spec restoration. One-time |
| 156 | + cold-start cost, not on the keystroke path. Reclaim plan: zstd-compress |
| 157 | + embedded specs (separate spec). |
| 158 | +2. `transform_pipeline/json` +33.6% — `json_extract_array` branch added |
| 159 | + for correctness. Previous v0.10.0 behaviour was silently wrong on |
| 160 | + array-shaped JSON; the new cost is structural, not optimisable. |
| 161 | + |
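| 162 | +Everything else lands within about ±10% of v0.10.0; the largest remaining deltas are `vt_parse_throughput` plain_text (+10.6%) and ansi_colored (+10.5%), traced above to OSC dispatch fan-out. |
| 163 | +No keystroke-path regression is large enough to block the release. |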
| 164 | + |
| 165 | +The `osc7772_decode` baseline (574 ns at typical buffer size) confirms the |
| 166 | +new buffer framing is invisible on the hot path. The `feedback_render` |
| 167 | +indicator row at 122 ns/frame confirms the async feedback UI ships free. |