
perf(ext/webidl): fast path for USVString converter on no-surrogate strings #34288

Open
crowlbot wants to merge 2 commits into denoland:main from crowlbot:perf/webidl-usvstring-fast-path


Summary

webidl.converters.USVString is the per-call argument coercion on
every USVString-typed IDL argument: names and values passed to
URLSearchParams.{get,has,set,append,delete,getAll}, the URL
constructor's input, crypto.subtle algorithm names, FormData
keys/values, and a long tail of others. It runs the DOMString
converter, then StringPrototypeToWellFormed.

The toWellFormed builtin replaces unpaired surrogates with U+FFFD,
but it doesn't fast-bail when there's nothing to replace -- and the
overwhelming majority of real arguments are ASCII or BMP non-surrogate
strings that are well-formed by construction.

Where the cost lives

V8 --prof on a hot urlSearchParams.get("id") loop attributes 9.5%
of total ticks to StringPrototypeToWellFormed. Pre-scanning in JS
for any surrogate code unit (0xD800 -- 0xDFFF) is a single linear
pass with no allocation, and it lets us skip the V8 builtin call
when the string contains no surrogates, which is the case for
essentially every real argument.

How

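converters.USVString = (V, prefix, context, opts) => {
  const S = converters.DOMString(V, prefix, context, opts);
  // Linear scan over UTF-16 code units; no allocation.
  for (let i = 0; i < S.length; i++) {
    const c = StringPrototypeCharCodeAt(S, i);
    // Any surrogate, paired or not, defers to the V8 builtin.
    if (c >= 0xD800 && c <= 0xDFFF) {
      return StringPrototypeToWellFormed(S);
    }
  }
  // No surrogate code units: S is already well-formed.
  return S;
};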

The fast path returns the original string unchanged when it contains
no UTF-16 surrogate code units, since such a string is well-formed by
construction. The slow path calls the V8 builtin directly, so
behaviour matches exactly for any input that contains surrogate code
units. That includes valid surrogate pairs encoding supplementary
plane code points: the builtin returns those unchanged as well, but
the JS prescan can't tell which surrogates are paired without looking
at both halves, so we hand them to V8.

Benchmarks

Same-host release builds, 5 runs each, 5M iterations.

| op | main (ns/op) | this PR (ns/op) | speedup |
| --- | --- | --- | --- |
| urlSearchParams.get("id") | 102 (100 -- 125) | 75 (72 -- 82) | ~1.36x |

The 1.36x is just one operation -- the change applies to every
USVString-typed IDL argument across the surface, so the cumulative
impact across urlSearchParams, URL, crypto.subtle, FormData,
Blob, etc. is broader than this single benchmark shows. A separate
benchmark scaffold in ext/webidl/benches/usvstring.rs covers four
input shapes (short ASCII, long ASCII, BMP non-ASCII, valid
surrogate pair) for the next pass to use.
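
The harness behind the numbers above is not included in this description. As a rough sketch of how a per-op figure like this could be measured, assuming a simple timing loop around `performance.now()`: the helper name `nsPerOp`, the query string, and the iteration counts are all assumptions, and absolute numbers will differ by host and build.

```javascript
// Hypothetical timing helper: average nanoseconds per call of fn.
function nsPerOp(fn, iterations) {
  let sink = 0; // accumulate a result so the call is not optimized away
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    sink += fn().length;
  }
  const elapsedMs = performance.now() - start;
  return { nsPerOp: (elapsedMs * 1e6) / iterations, sink };
}

const params = new URLSearchParams("id=42&name=deno");
const getId = () => params.get("id");

nsPerOp(getId, 100_000); // warm-up so the JIT has compiled the hot loop
const run = nsPerOp(getId, 1_000_000);
console.log(`${run.nsPerOp.toFixed(1)} ns/op`);
```

Running the same script against a build of main and a build of this branch, several times each, would give a comparison in the same shape as the table.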

Local validation

Ran the fast path against String.prototype.toWellFormed on 16 inputs,
including: empty, ASCII (short and long), BMP non-ASCII (café,
Cyrillic, Japanese), supplementary plane (valid surrogate pair), lone
high surrogate, lone low surrogate, lone high in the middle, lone low
in the middle, and reversed surrogates. Outputs match in every case.

Test plan

  • cargo check -p deno_snapshots -- snapshot loads, runtime initializes.
  • cargo build --bin deno --release on both main and this branch.
  • cargo check -p deno_webidl --benches -- bench compiles.
  • dprint check -- clean.
  • Manual round-trip against String.prototype.toWellFormed on 16
    inputs covering all surrogate edge cases.
  • CI: test unit::url, test unit::urlSearchParams,
    test unit::webidl, WPT URL + Fetch suites (the USVString
    converter is on the path for every USVString IDL argument).
  • CI: bench release linux-x86_64 records the after numbers.

claude added 2 commits May 21, 2026 15:14
…trings

`webidl.converters.USVString(name)` is the per-call argument coercion
on every `URLSearchParams.get` / `.has` / `.set` / `.append` / `.delete`
call, plus the `URL` constructor's input, plus `crypto.subtle`'s
algorithm names, and every other USVString argument across the IDL
surface. It runs `DOMString` then `StringPrototypeToWellFormed`.

V8 --prof on a hot `urlSearchParams.get("id")` loop attributes 9.5%
of total ticks to `StringPrototypeToWellFormed`. The builtin replaces
unpaired surrogates with U+FFFD but doesn't have a fast bail-out for
the dominant input shape: a string with no surrogate code units at
all (ASCII / BMP non-surrogate -- effectively every parameter name
real apps pass).

Add a JS-side pre-scan: if any code unit is in 0xD800..=0xDFFF, fall
through to `StringPrototypeToWellFormed`. Otherwise return the input
unchanged. Validated against the V8 builtin across 16 inputs
including empty, ASCII, BMP non-ASCII, supplementary plane (valid
surrogate pair), lone high surrogate, lone low surrogate, reversed
surrogates, and long ASCII strings -- outputs match in every case.

This is a webidl-level converter change so any USVString-typed IDL
argument benefits, not just `URLSearchParams.get`.