Skip to content

fix(intl): #5585 — ECMA-402 option/locale validation for the six small constructors#5727

Merged
proggeramlug merged 2 commits into
mainfrom
worktree-fix-5585-intl-small-constructors
Jun 27, 2026
Merged

fix(intl): #5585 — ECMA-402 option/locale validation for the six small constructors#5727
proggeramlug merged 2 commits into
mainfrom
worktree-fix-5585-intl-small-constructors

Conversation

@proggeramlug

@proggeramlug proggeramlug commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Closes #5585 (substantially — the structural tail; see "Remaining" below).

What

Brings the six small Intl constructors up to ECMA-402 spec on option coercion, locale-list handling, and several behavioral gaps that the test262 intl402 tail exercises.

Measured against the pinned test262 (scripts/test262_subset.py --dir intl402/Collator intl402/Segmenter intl402/ListFormat intl402/RelativeTimeFormat intl402/PluralRules intl402/DisplayNames):

pass runtime-fail parity
before 241 146 62.3%
after 297 90 76.7%

56 cases fixed, 0 regressions (verified by diffing the baseline vs. final failure sets; the few tests "newly" in the fail list were already failing pre-change but absent from the capped baseline sample, and now advance further).

Changes

Shared (intl.rs)

  • get_option_string_coerced / enum_option_strict — GetOption with full ToString coercion (null"null", Symbol → TypeError) so options-*-invalid values are rejected, not silently defaulted.
  • get_options_object (GetOptionsObject) and coerce_options_reject_null (ToObject, null → TypeError), wired per the spec step each constructor uses (ListFormat/Segmenter/PluralRules = GetOptionsObject; RelativeTimeFormat/Collator = ToObject).
  • locales_from_value now follows CanonicalizeLocaleList: null → TypeError, array-like Objects iterated ({0:"DE",length:1}["de"]), element type-checks, and a throwing/Symbol length propagates.
  • supportedLocalesOf validates options (localeMatcher) and filters the request list via a BestAvailableLocale check (so e.g. zxx is dropped and the length matches).
  • Each constructor reads localeMatcher (+ numberingSystem for RTF) in the exact GetOption order the options-order/read-order tests assert.

Collator (date_collator.rs)

  • NFD-normalize before comparing → canonically-equivalent strings collate 0.
  • Resolve numeric/caseFirst/collation from options or the locale's -u-kn/-u-kf/-u-co keywords; reflect them (+ usage, sensitivity, ignorePunctuation) in resolvedOptions in spec order; ignorePunctuation strips ignorable punctuation in compare.

RelativeTimeFormat / PluralRules (list_relative_plural.rs)

  • RTF format/formatToParts use full ToNumber (Symbol → TypeError, object valueOf honoured); units are case-sensitive; formatToParts attaches the unit field to numeric parts.
  • PluralRules reads options in SetNumberFormatDigitOptions order, adds notation/compactDisplay, and selectRange throws TypeError for undefined/Symbol args.

Remaining (out of scope — need CLDR data or larger runtime work)

Locale-specific formatting/collation ("foo y bar", phonebook/pinyin ordering, non-en plural categories), the prototype accessor-getter shape for compare/format, %Segments.prototype%.containing, subclassing/new.target, and @@toStringTag inheritance on instances.

Testing

  • Full perry-runtime test suite green (the lone builtin_prototype_methods_reject_dynamic_new failure is the known pre-existing test-isolation flake; passes in isolation and on main).
  • Spot-checked NumberFormat/DateTimeFormat supportedLocalesOf + locales handling for regressions from the shared locales_from_value change — none.

Per maintainer convention, no version bump / CHANGELOG entry (folded in at merge).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • supportedLocalesOf and Intl constructors now apply more consistent locale/option handling, including canonicalization and improved resolvedOptions reporting (with spec-aligned ordering and defaults).
    • Intl.RelativeTimeFormat.formatToParts() now returns the resolved unit for non-literal parts.
  • Bug Fixes

    • Intl.Collator comparisons now normalize strings (when enabled) and honor ignorePunctuation by ignoring whitespace and ignorable punctuation; resolvedOptions now reflects the actual ignorePunctuation, collation, numeric, and caseFirst.
    • Stricter validation/throw behavior across Intl types, including Intl.PluralRules.selectRange throwing TypeError when start/end is undefined, and more accurate notation/compactDisplay in resolvedOptions.

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

The Intl runtime now canonicalizes locale inputs, applies stricter option coercion, and updates Collator, Segmenter, ListFormat, RelativeTimeFormat, PluralRules, and supportedLocalesOf behavior. Collator comparison and resolvedOptions now read stored state, and RelativeTimeFormat/PluralRules outputs use updated validation and defaults.

Changes

Intl locale and option semantics

Layer / File(s) Summary
Locale and option contracts
crates/perry-runtime/src/intl.rs
Adds internal slot keys, locale canonicalization and availability filtering, Unicode extension keyword lookup, strict option coercion helpers, and supportedLocalesOf argument handling.
Collator constructor and comparison
crates/perry-runtime/src/intl.rs, crates/perry-runtime/src/intl/date_collator.rs
Collator constructor state now comes from strict coercion and locale keywords, and compare/resolvedOptions use normalization, ignorePunctuation filtering, and stored collator fields.
Segmenter, ListFormat, and RelativeTimeFormat constructors
crates/perry-runtime/src/intl.rs
Segmenter, ListFormat, and RelativeTimeFormat constructors now use strict option handling, including localeMatcher, granularity, and numberingSystem validation.
RelativeTimeFormat formatToParts
crates/perry-runtime/src/intl/list_relative_plural.rs
RelativeTimeFormat formatToParts now shares coercion and JS array construction across call paths, and attaches the resolved unit to non-literal parts.
PluralRules constructor and outputs
crates/perry-runtime/src/intl.rs, crates/perry-runtime/src/intl/list_relative_plural.rs
PluralRules constructor, selectRange, and resolvedOptions now use strict option handling, undefined checks, and notation and compactDisplay fields from stored state.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

  • PerryTS/perry#5320: Shares the Intl constructor and formatToParts / resolvedOptions implementation area that this PR further adjusts.
  • PerryTS/perry#5729: Also changes intl.rs option coercion and null handling on overlapping ECMA-402 option paths.

Poem

I hopped through locales, nibbling at the breeze,
Found collator carrots tucked between the trees.
formatToParts went patter, resolvedOptions went bright,
With punctuation in my whiskers, I bounded through the night.
🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately summarizes the main ECMA-402 Intl validation changes.
Description check ✅ Passed The description covers summary, concrete changes, testing, and remaining scope, so it mostly matches the template.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-fix-5585-intl-small-constructors

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-runtime/src/intl.rs`:
- Around line 1521-1527: In supportedLocalesOf, the locale list must be
canonicalized before any options validation so locale-list errors and side
effects happen first. Move the CanonicalizeLocaleList/locales_from_value step
ahead of the localeMatcher handling in the intl.rs logic, then validate options
afterward via coerce_options_reject_null and enum_option_strict. Keep the fix
scoped to the supportedLocalesOf path and preserve the existing locale filtering
behavior.
- Around line 1208-1214: Reject reserved Collator collation values in the Intl
options validation path: update the collation handling in intl.rs (the
get_option_string_coerced / is_well_formed_numbering_system check) so that
standard and search are treated as invalid option values and trigger
throw_range_error before locale-extension resolution. Keep the existing
validation flow in this options parsing block and add the reserved-value check
alongside the current collation validation so invalid input is rejected
immediately.
- Around line 616-620: The Unicode extension scan in intl.rs is incorrectly
treating private-use sequences like x-u-* as a real Unicode extension. Update
the parsing logic around the iterator loop that looks for "u" so it first
detects a preceding "x" private-use section and stops scanning there, ensuring
tags like en-x-u-kn do not set Unicode keyword flags such as numeric in the
Unicode extension parser.
- Around line 549-554: In CanonicalizeLocaleList, the locale array/object
iteration currently reads every index directly via js_array_get_f64 and can pass
missing entries into push_locale_element; add a HasProperty-style check before
reading each element so sparse arrays and objects with holes are skipped instead
of throwing. Apply the same fix in the second locale-list loop near the other
occurrence of push_locale_element, keeping the behavior consistent across both
paths.
- Around line 523-525: The remaining CanonicalizeLocaleList branch in intl.rs
still calls canonical_locale for tag normalization/validation; update this path
to use canonicalize_language_tag instead, matching the feature-aware helper
already used in intl/locales.rs. Keep the surrounding throw_invalid_language_tag
handling intact, but ensure the CanonicalizeLocaleList flow consistently routes
through the same locale-canonicalization helper in the relevant locale-list code
paths.

In `@crates/perry-runtime/src/intl/date_collator.rs`:
- Around line 403-411: The is_punctuation helper is too broad and is stripping
non-punctuation characters like letters, symbols, and superscript digits under
ignorePunctuation. Tighten the logic in is_punctuation in date_collator.rs by
replacing the wide Unicode ranges with a punctuation/category-based check or a
narrower explicit set so only true punctuation is ignored and distinct strings
do not compare equal.
- Around line 345-365: The normalization in collation_normalize is breaking the
Swedish fast path because compare_strings now feeds decomposed accents into
swedish_collation_key. Change the normalization strategy to NFC, or update
swedish_collation_key to treat decomposed å/ä/ö the same as precomposed forms,
so the sv locale branch still orders those characters correctly while preserving
canonical-equivalence handling.

In `@crates/perry-runtime/src/intl/list_relative_plural.rs`:
- Around line 316-317: The `ListRelativePlural` number conversion is using
`js_number_coerce`, which incorrectly accepts BigInt instead of throwing in the
`format` and `selectRange` paths. Replace this with a strict ToNumber-style
helper in `list_relative_plural.rs` so `ToPrimitive` rejects BigInt for both
call sites, and keep the existing `unit_arg` string conversion unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: af3e1bd5-6cc7-456e-9099-b7786ad05eec

📥 Commits

Reviewing files that changed from the base of the PR and between 9fb1940 and d4e624c.

📒 Files selected for processing (3)
  • crates/perry-runtime/src/intl.rs
  • crates/perry-runtime/src/intl/date_collator.rs
  • crates/perry-runtime/src/intl/list_relative_plural.rs

Comment thread crates/perry-runtime/src/intl.rs Outdated
Comment thread crates/perry-runtime/src/intl.rs
Comment thread crates/perry-runtime/src/intl.rs Outdated
Comment thread crates/perry-runtime/src/intl.rs Outdated
Comment thread crates/perry-runtime/src/intl.rs Outdated
Comment thread crates/perry-runtime/src/intl/date_collator.rs
Comment thread crates/perry-runtime/src/intl/date_collator.rs
Comment thread crates/perry-runtime/src/intl/list_relative_plural.rs Outdated
Ralph and others added 2 commits June 27, 2026 01:58
…l constructors

Brings the small Intl constructors (Collator, Segmenter, ListFormat,
RelativeTimeFormat, PluralRules) up to spec on option coercion, locale-list
handling, and a few behavioral gaps. test262 intl402 parity across these six
dirs goes from 241/387 (62%) to 297/387 (77%) — 56 cases fixed, no regressions.

Shared helpers (intl.rs):
- `get_option_string_coerced` / `enum_option_strict` — GetOption with full
  ToString coercion (null → "null", Symbol → TypeError), so `options-*-invalid`
  values are rejected instead of silently defaulting.
- `get_options_object` (GetOptionsObject: non-Object/non-undefined → TypeError)
  and `coerce_options_reject_null` (ToObject: null → TypeError), wired per the
  spec step each constructor uses.
- `locales_from_value` now follows CanonicalizeLocaleList: null → TypeError,
  array-like Objects are iterated (`{0:"DE",length:1}` → `["de"]`), element
  type-checks throw TypeError, and a throwing/Symbol `length` propagates.
- `supportedLocalesOf` validates its `options` arg (localeMatcher) and filters
  the request list through a BestAvailableLocale check, so the result length
  matches (e.g. `zxx` is dropped).
- Per-constructor: read `localeMatcher` (+ `numberingSystem` for RTF) in the
  exact GetOption order the `options-order` / read-order tests assert.

Collator (date_collator.rs):
- NFD-normalize before comparing so canonically-equivalent strings collate 0.
- Resolve `numeric`/`caseFirst`/`collation` from options or the locale's
  `-u-kn`/`-u-kf`/`-u-co` extension keywords; reflect them (plus `usage`,
  `sensitivity`, `ignorePunctuation`, `collation`) in resolvedOptions in spec
  order. `ignorePunctuation` now strips ignorable punctuation in compare.

RelativeTimeFormat / PluralRules (list_relative_plural.rs):
- RTF format/formatToParts use full ToNumber (Symbol → TypeError, object valueOf
  honoured); units are case-sensitive; formatToParts attaches the `unit` field
  to the numeric parts.
- PluralRules reads options in SetNumberFormatDigitOptions order, adds
  `notation`/`compactDisplay`, and `selectRange` throws TypeError for undefined
  or Symbol arguments.

Out of scope (still failing, need CLDR data or larger runtime work): locale-
specific formatting/collation, the prototype accessor-getter shape for
compare/format, %Segments.prototype%.containing, subclassing/new.target, and
@@toStringTag inheritance on instances.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- cargo fmt (the lint check failed on formatting).
- Route the CanonicalizeLocaleList branches through `canonicalize_language_tag`
  (feature-aware, matches intl/locales.rs) instead of `canonical_locale`.
- Skip absent indices via HasProperty when walking array-like locale objects
  (`{length:3,0:"en"}` → `["en"]`, not a TypeError on the holes).
- `unicode_extension_keyword` stops at the `x` private-use singleton so
  `en-x-u-kn` is not misread as a `kn` Unicode extension.
- Collator `collation` option rejects the reserved `standard`/`search` values
  (and malformed types); a valid option now wins over the `-u-co-` keyword.
- `supportedLocalesOf` runs CanonicalizeLocaleList(locales) before reading
  `options` (locale errors/side-effects precede option validation).
- Collator compare normalizes with NFC (not NFD) so the Swedish fast path keeps
  its precomposed å/ä/ö; `ignorePunctuation` stripping uses an explicit
  punctuation set instead of whole Latin-1 ranges (no longer eats ª/µ/º/¹²³/¼).
- RTF `format`/`formatToParts` and PluralRules `selectRange` use a ToNumber that
  rejects BigInt with TypeError (`format(1n,"day")` throws).

test262 intl402 across the six dirs: 307/387 pass (79%), up from 297 — 10 more
cases, no regressions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@proggeramlug proggeramlug force-pushed the worktree-fix-5585-intl-small-constructors branch from c1c42c9 to b02914c Compare June 27, 2026 08:58

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-runtime/src/intl/date_collator.rs`:
- Around line 345-349: Update the stale documentation in date_collator’s
normalization comment so it matches the actual behavior in the Collator.compare
path. The header currently says NFD, but the implementation and nearby comment
in the same block use s.nfc() and intentionally preserve the Swedish precomposed
fast path; change the wording to NFC and keep the explanation aligned with the
canonical-equivalence behavior described in the Collator logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 94f68f15-05ff-4374-92bf-e268a1a1f629

📥 Commits

Reviewing files that changed from the base of the PR and between c1c42c9 and b02914c.

📒 Files selected for processing (3)
  • crates/perry-runtime/src/intl.rs
  • crates/perry-runtime/src/intl/date_collator.rs
  • crates/perry-runtime/src/intl/list_relative_plural.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/perry-runtime/src/intl/list_relative_plural.rs
  • crates/perry-runtime/src/intl.rs

Comment on lines +345 to +349
/// Normalize to NFD so canonically-equivalent strings (e.g. `"ö"` precomposed
/// vs. `"ö"` decomposed) collate equal — the ECMA-402 requirement that
/// `Collator.compare` treats canonical equivalents as 0 (canonically-equivalent
/// -strings.js). Without `string-normalize` this is an identity passthrough, so
/// the precomposed/decomposed pair still compares unequal (best effort).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Stale doc header: says NFD but code/behavior is NFC.

The header comment states "Normalize to NFD", which contradicts both the inline comment on Lines 353-354 ("NFC (composition), not NFD") and the implementation (s.nfc()). Since the switch to NFC was the deliberate fix to preserve the Swedish precomposed å/ä/ö fast path, the header is misleading. Align it with the actual normalization form.

📝 Proposed doc fix
-/// Normalize to NFD so canonically-equivalent strings (e.g. `"ö"` precomposed
-/// vs. `"ö"` decomposed) collate equal — the ECMA-402 requirement that
+/// Normalize to NFC so canonically-equivalent strings (e.g. `"ö"` precomposed
+/// vs. `"ö"` decomposed) collate equal — the ECMA-402 requirement that
 /// `Collator.compare` treats canonical equivalents as 0 (canonically-equivalent
 /// -strings.js). Without `string-normalize` this is an identity passthrough, so
 /// the precomposed/decomposed pair still compares unequal (best effort).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Normalize to NFD so canonically-equivalent strings (e.g. `"ö"` precomposed
/// vs. `"ö"` decomposed) collate equal — the ECMA-402 requirement that
/// `Collator.compare` treats canonical equivalents as 0 (canonically-equivalent
/// -strings.js). Without `string-normalize` this is an identity passthrough, so
/// the precomposed/decomposed pair still compares unequal (best effort).
/// Normalize to NFC so canonically-equivalent strings (e.g. `"ö"` precomposed
/// vs. `"ö"` decomposed) collate equal — the ECMA-402 requirement that
/// `Collator.compare` treats canonical equivalents as 0 (canonically-equivalent
/// -strings.js). Without `string-normalize` this is an identity passthrough, so
/// the precomposed/decomposed pair still compares unequal (best effort).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry-runtime/src/intl/date_collator.rs` around lines 345 - 349,
Update the stale documentation in date_collator’s normalization comment so it
matches the actual behavior in the Collator.compare path. The header currently
says NFD, but the implementation and nearby comment in the same block use
s.nfc() and intentionally preserve the Swedish precomposed fast path; change the
wording to NFC and keep the explanation aligned with the canonical-equivalence
behavior described in the Collator logic.

@proggeramlug proggeramlug merged commit d359615 into main Jun 27, 2026
15 checks passed
@proggeramlug proggeramlug deleted the worktree-fix-5585-intl-small-constructors branch June 27, 2026 09:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test262 intl402 small constructors — 80 fails (RelativeTimeFormat/Segmenter/Collator/ListFormat/PluralRules/DisplayNames)

1 participant