
spec: conform to latest WHATWG URL Pattern WPT test data#9

Merged: dunglas merged 1 commit into main from spec/conformance on Apr 20, 2026
Conversation

@dunglas (Owner) commented Apr 20, 2026

Summary

Refresh testdata/urlpatterntestdata.json from the current WPT snapshot (365 cases, up from 313) and align the implementation:

  • createComponentMatchResult: bound iteration by the group name list length so user-provided regex groups (e.g. :foo((?<x>a))) no longer index past the name list and panic. Unnamed captures are simply omitted from Groups, matching WPT 364.
  • canonicalizeHostname: drop the broad reject-on-/?#\ guard — the URL parser now truncates hostnames correctly at those boundaries. The : check stays because the parser silently splits host:port with no surface-level error.
  • hostnameParser: drop WithFailOnValidationError so tab / LF / CR are stripped per the URL spec instead of raising an error.
  • canonicalizePort: rewritten to (1) skip leading tab/LF/CR and require the first significant byte to be a digit (rejects "invalid80"), (2) parse against a non-special scheme when no protocol is supplied so default ports are not collapsed to "", and (3) drop the former equality-with-input check that also rejected valid canonicalizations like "8\t0" -> "80".
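
To illustrate the createComponentMatchResult change, here is a minimal sketch of bounding the loop by the group name list rather than by the number of regex submatches. The function name `boundedGroups`, its signature, and the hand-written regex are assumptions made for illustration; they are not taken from the library's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// boundedGroups is a hypothetical stand-in for the group-extraction step.
// names lists the pattern's own group names in order; submatches is the
// slice returned by FindStringSubmatch (index 0 is the whole match).
// Iterating over names, not submatches, means extra captures contributed
// by a user-supplied regex are never used to index past names.
func boundedGroups(names, submatches []string) map[string]string {
	groups := make(map[string]string, len(names))
	for i, name := range names {
		idx := i + 1 // skip the whole-match entry
		if idx >= len(submatches) {
			break
		}
		groups[name] = submatches[idx]
	}
	return groups
}

func main() {
	// Assumed shape of a compiled ":foo((?<x>a))" component: one outer
	// capture for :foo plus the user's inner named capture.
	re := regexp.MustCompile(`^/((?P<x>a))$`)
	m := re.FindStringSubmatch("/a") // three entries, but only one name
	fmt.Println(boundedGroups([]string{"foo"}, m)) // map[foo:a]
}
```

A loop over the submatches would reach an index that has no entry in the name list, which is the out-of-range access the PR describes.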

All 365 WPT cases pass (2 skipped: they rely on advanced Unicode features that Go's regexp engine does not support).
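
The first step of the canonicalizePort rewrite can be sketched as follows. This is only an approximation of the check described above: the helper name `hasLeadingDigit` is hypothetical, and treating an empty or whitespace-only input as "no leading digit" is an assumption, since the PR does not say how that case is handled.

```go
package main

import "fmt"

// hasLeadingDigit is a hypothetical helper: it skips leading ASCII tab,
// LF and CR, then reports whether the first remaining byte is a digit.
// Returning false for input with no significant byte is an assumption.
func hasLeadingDigit(port string) bool {
	for i := 0; i < len(port); i++ {
		switch c := port[i]; c {
		case '\t', '\n', '\r':
			continue // ignored, as the URL parser would strip these
		default:
			return c >= '0' && c <= '9'
		}
	}
	return false
}

func main() {
	for _, p := range []string{"80", "\t80", "8\t0", "80x", "invalid80"} {
		fmt.Printf("%q -> %v\n", p, hasLeadingDigit(p))
	}
}
```

Only "invalid80" fails this pre-check. Inputs such as "8\t0" and "80x" pass it and are left for the URL parser to canonicalize.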

Refresh testdata/urlpatterntestdata.json from the current WPT snapshot
(365 cases, up from 313) and align the implementation with the new
expectations:

- createComponentMatchResult now bounds iteration by the group name
  list length so user-provided regex groups (e.g. :foo((?<x>a))) no
  longer index past the name list and panic. Groups without a name
  are simply not exposed on result.Groups, matching WPT test 364.
- canonicalizeHostname: drop the broad reject-on-'/ ? # \\' guard.
  The URL parser now canonicalizes those boundary chars correctly;
  keep only the ':' check since without it "bad:hostname" would be
  silently split into host + port.
- hostnameParser: drop WithFailOnValidationError so tab/LF/CR in a
  hostname are stripped per the WHATWG URL spec instead of errored.
- canonicalizePort: rewritten to
    1. skip leading ASCII tab / LF / CR and require the first
       significant byte to be a digit (to reject "invalid80"),
    2. always parse against a non-special scheme when no protocol
       is supplied so the library doesn't collapse default ports
       like 80/443 to empty,
    3. remove the former "portValue != canonicalized" rejection
       which also rejected valid canonicalizations like "8\t0" -> "80"
       and "80x" -> "80" (state-override truncation).

All 365 WPT cases pass (2 skipped for advanced unicode features Go's
regexp engine does not support).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 20, 2026 11:09

Copilot AI left a comment


Pull request overview

Updates this library’s URLPattern conformance by syncing to the latest WHATWG URL Pattern WPT snapshot and adjusting canonicalization/match-result behavior to align with the new cases.

Changes:

  • Refreshes testdata/urlpatterntestdata.json to the latest WPT dataset (365 cases) and updates expected outcomes.
  • Prevents panics in createComponentMatchResult by bounding subgroup iteration to the pattern’s known group-name list (dropping extra user-regex captures).
  • Aligns hostname/port canonicalization with the URL spec by relaxing validation-error behavior (strip tab/LF/CR) and updating port parsing rules (leading significant digit requirement; non-special scheme when protocol is absent).
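
As a rough illustration of the "strip tab/LF/CR" behavior mentioned above, the sketch below removes ASCII tab and newline characters from a string the way the URL spec's preprocessing step does. The function name `stripTabAndNewline` is hypothetical; in the library itself the stripping is done by the underlying URL parser, not by a helper like this one.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTabAndNewline is a hypothetical helper that drops ASCII tab, LF
// and CR from s and leaves every other character untouched.
func stripTabAndNewline(s string) string {
	return strings.Map(func(r rune) rune {
		switch r {
		case '\t', '\n', '\r':
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	fmt.Printf("%q\n", stripTabAndNewline("exa\tmple.\ncom\r")) // "example.com"
}
```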

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| urlpattern.go | Bounds capture-group extraction to avoid indexing past the group-name list. |
| parser.go | Adjusts hostname parser strictness and rewrites port canonicalization to match updated WPT/spec behavior. |
| testdata/urlpatterntestdata.json | Updates WPT-derived test corpus and expected results for new conformance cases. |


@dunglas dunglas merged commit fc2a960 into main Apr 20, 2026
8 checks passed