spec: conform to latest WHATWG URL Pattern WPT test data #9
Merged
Refresh testdata/urlpatterntestdata.json from the current WPT snapshot
(365 cases, up from 313) and align the implementation with the new
expectations:
- createComponentMatchResult now bounds iteration by the group name
  list length so user-provided regex groups (e.g. :foo((?<x>a))) no
  longer index past the name list and panic. Groups without a name
  are simply not exposed on result.Groups, matching WPT test 364.
- canonicalizeHostname: drop the broad reject-on-'/ ? # \\' guard.
  The URL parser now canonicalizes those boundary chars correctly;
  keep only the ':' check since without it "bad:hostname" would be
  silently split into host + port.
- hostnameParser: drop WithFailOnValidationError so tab/LF/CR in a
  hostname are stripped per the WHATWG URL spec instead of errored.
- canonicalizePort: rewritten to
  1. skip leading ASCII tab / LF / CR and require the first
     significant byte to be a digit (to reject "invalid80"),
  2. always parse against a non-special scheme when no protocol
     is supplied so the library doesn't collapse default ports
     like 80/443 to empty,
  3. remove the former "portValue != canonicalized" rejection
     which also rejected valid canonicalizations like "8\t0" -> "80"
     and "80x" -> "80" (state-override truncation).
All 365 WPT cases pass (2 skipped for advanced Unicode features that
Go's regexp engine does not support).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
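
For illustration, the three canonicalizePort rules can be sketched as a standalone function. This is a simplified approximation under stated assumptions, not the code in parser.go: the names stripTabNewline and sketchCanonicalizePort are hypothetical, the real implementation delegates to the URL parser, and scheme-dependent default-port handling is left out. Passing an empty port through unchanged is also an assumption about the desired behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// stripTabNewline removes every ASCII tab, LF, and CR from s.
func stripTabNewline(s string) string {
	return strings.Map(func(r rune) rune {
		if r == '\t' || r == '\n' || r == '\r' {
			return -1 // a negative rune drops the character
		}
		return r
	}, s)
}

// sketchCanonicalizePort is a hypothetical stand-in for the rules in the
// commit message; it is not the library's canonicalizePort.
func sketchCanonicalizePort(port string) (string, error) {
	if port == "" {
		return "", nil // assumption: an empty port is returned unchanged
	}
	s := stripTabNewline(port)

	// Collect the leading run of ASCII digits; anything after it is
	// dropped, which mimics the state-override truncation ("80x" -> "80").
	n := 0
	for n < len(s) && s[n] >= '0' && s[n] <= '9' {
		n++
	}
	if n == 0 {
		// The first significant byte is not a digit ("invalid80").
		return "", errors.New("port must start with a digit")
	}

	// bitSize 16 makes ParseUint reject anything above 65535.
	v, err := strconv.ParseUint(s[:n], 10, 16)
	if err != nil {
		return "", errors.New("port out of range")
	}
	return strconv.FormatUint(v, 10), nil
}

func main() {
	for _, in := range []string{"invalid80", "8\t0", "80x", "443", "65536"} {
		out, err := sketchCanonicalizePort(in)
		fmt.Printf("%q -> %q, err=%v\n", in, out, err)
	}
}
```

Note that there is no comparison of the output against the input, so "8\t0" and "80x" both yield "80" rather than being rejected.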
Pull request overview
Updates this library’s URLPattern conformance by syncing to the latest WHATWG URL Pattern WPT snapshot and adjusting canonicalization/match-result behavior to align with the new cases.
Changes:
- Refreshes `testdata/urlpatterntestdata.json` to the latest WPT dataset (365 cases) and updates expected outcomes.
- Prevents panics in `createComponentMatchResult` by bounding subgroup iteration to the pattern’s known group-name list (dropping extra user-regex captures).
- Aligns hostname/port canonicalization with the URL spec by relaxing validation-error behavior (strip tab/LF/CR) and updating port parsing rules (leading significant digit requirement; non-special scheme when protocol is absent).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| urlpattern.go | Bounds capture-group extraction to avoid indexing past the group-name list. |
| parser.go | Adjusts hostname parser strictness and rewrites port canonicalization to match updated WPT/spec behavior. |
| testdata/urlpatterntestdata.json | Updates WPT-derived test corpus and expected results for new conformance cases. |
Summary
Refresh `testdata/urlpatterntestdata.json` from the current WPT snapshot (365 cases, up from 313) and align the implementation:
- `createComponentMatchResult`: bound iteration by the group name list length so user-provided regex groups (e.g. `:foo((?<x>a))`) no longer index past the name list and panic. Unnamed captures are simply omitted from `Groups`, matching WPT 364.
- `canonicalizeHostname`: drop the broad reject-on-`/ ? # \` guard, since the URL parser now truncates hostnames correctly at those boundaries. The `:` check stays because the parser silently splits `host:port` with no surface-level error.
- `hostnameParser`: drop `WithFailOnValidationError` so tab / LF / CR are stripped per the URL spec instead of errored.
- `canonicalizePort`: rewritten to (1) skip leading tab/LF/CR and require the first significant byte to be a digit (rejects `"invalid80"`), (2) parse against a non-special scheme when no protocol is supplied so default ports are not collapsed to `""`, and (3) drop the former equality-with-input check that also rejected valid canonicalizations like `"8\t0"` → `"80"`.

All 365 WPT cases pass (2 skipped: advanced Unicode features Go's regexp engine does not support).
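
The two hostname rules above (strip tab / LF / CR, keep rejecting `:`) can be sketched as a small pre-check. This is an assumption-laden illustration rather than the code in `parser.go`: `sketchCheckHostname` is a hypothetical name, the function does not perform the rest of host canonicalization, and exempting colons inside `[...]` for IPv6 literals is an assumption modeled on the bracket tracking in the WHATWG URL parser.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sketchCheckHostname is a hypothetical pre-check, not the library's
// canonicalizeHostname. It strips ASCII tab, LF, and CR, then rejects a
// ':' that appears outside square brackets.
func sketchCheckHostname(host string) (string, error) {
	cleaned := strings.Map(func(r rune) rune {
		if r == '\t' || r == '\n' || r == '\r' {
			return -1 // drop the character instead of failing
		}
		return r
	}, host)

	insideBrackets := false
	for _, r := range cleaned {
		switch r {
		case '[':
			insideBrackets = true
		case ']':
			insideBrackets = false
		case ':':
			if !insideBrackets {
				// Without this check "bad:hostname" would be read as
				// host "bad" plus a port, with no error reported.
				return "", errors.New("hostname must not contain ':'")
			}
		}
	}
	return cleaned, nil
}

func main() {
	for _, in := range []string{"exa\tmple.com", "bad:hostname", "[::1]"} {
		out, err := sketchCheckHostname(in)
		fmt.Printf("%q -> %q, err=%v\n", in, out, err)
	}
}
```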