perf(ext/webidl): fast path for USVString converter on no-surrogate strings#34288
Open
crowlbot wants to merge 2 commits into
`webidl.converters.USVString(name)` is the per-call argument coercion
on every `URLSearchParams.get` / `.has` / `.set` / `.append` / `.delete`
call, plus the `URL` constructor's input, plus `crypto.subtle`'s
algorithm names, and every other USVString argument across the IDL
surface. It runs `DOMString` then `StringPrototypeToWellFormed`.
V8 --prof on a hot `urlSearchParams.get("id")` loop attributes 9.5%
of total ticks to `StringPrototypeToWellFormed`. The builtin replaces
unpaired surrogates with U+FFFD but doesn't have a fast bail-out for
the dominant input shape: a string with no surrogate code units at
all (ASCII / BMP non-surrogate -- effectively every parameter name
real apps pass).
Add a JS-side pre-scan: if any code unit is in 0xD800..=0xDFFF, fall
through to `StringPrototypeToWellFormed`. Otherwise return the input
unchanged. Validated against the V8 builtin across 16 inputs
including empty, ASCII, BMP non-ASCII, supplementary plane (valid
surrogate pair), lone high surrogate, lone low surrogate, reversed
surrogates, and long ASCII strings -- outputs match in every case.
This is a webidl-level converter change so any USVString-typed IDL
argument benefits, not just `URLSearchParams.get`.
Summary
`webidl.converters.USVString` is the per-call argument coercion on
every USVString-typed IDL argument:
`URLSearchParams.{get,has,set,append,delete,getAll}` names + values,
the `URL` constructor's input, `crypto.subtle` algorithm names,
`FormData` keys/values, and a long tail of others. It runs
`DOMString` then `StringPrototypeToWellFormed`.
The `toWellFormed` builtin replaces unpaired surrogates with U+FFFD,
but it doesn't fast-bail when there's nothing to replace -- and the
overwhelming majority of real arguments are ASCII or BMP non-surrogate
strings that are well-formed by construction.
Where the cost lives
V8 `--prof` on a hot `urlSearchParams.get("id")` loop attributes 9.5%
of total ticks to `StringPrototypeToWellFormed`. Pre-scanning in JS
for any surrogate code unit (0xD800 to 0xDFFF) -- which is a single
linear pass with no allocation -- lets us skip the V8 builtin call
when there are no surrogates to look at, which is essentially every
real argument.
How
The fast path returns the original string verbatim when there are no
UTF-16 surrogate code units (so it's well-formed by construction).
The slow path runs the V8 builtin verbatim, so behaviour matches
exactly for any input that contains surrogate code units -- including
valid surrogate pairs encoding supplementary plane code points
(those get returned unchanged by the builtin as well, but the JS
prescan can't tell which surrogates are paired without looking at
both halves, so we hand them to V8).
Benchmarks
Same-host release builds, 5 runs each, 5M iterations.
Benchmarked operation: `urlSearchParams.get("id")`.
The 1.36x is just one operation -- the change applies to every
USVString-typed IDL argument across the surface, so the cumulative
impact across `urlSearchParams`, `URL`, `crypto.subtle`, `FormData`,
`Blob`, etc. is broader than this single benchmark shows. A separate
benchmark scaffold in `ext/webidl/benches/usvstring.rs` covers four
input shapes (short ASCII, long ASCII, BMP non-ASCII, valid
surrogate pair) for the next pass to use.
Local validation
Ran the fast path against `String.prototype.toWellFormed` on 16
inputs: empty, ASCII (short and long), BMP non-ASCII (`café`,
Cyrillic, Japanese), supplementary plane (valid surrogate pair), lone
high surrogate, lone low surrogate, lone high in the middle, lone low
in the middle, reversed surrogates, and long ASCII. Outputs match in
every case.
Test plan
- `cargo check -p deno_snapshots` -- snapshot loads, runtime initializes.
- `cargo build --bin deno --release` on both main and this branch.
- `cargo check -p deno_webidl --benches` -- bench compiles.
- `dprint check` -- clean.
- Fast path checked against `String.prototype.toWellFormed` on 16 inputs covering all surrogate edge cases.
- `test unit::url`, `test unit::urlSearchParams`, `test unit::webidl`, WPT URL + Fetch suites (the USVString converter is on the path for every USVString IDL argument).
- `bench release linux-x86_64` records the after numbers.