fix(intl): #5585 — ECMA-402 option/locale validation for the six small constructors#5727
Conversation
📝 WalkthroughWalkthroughThe Intl runtime now canonicalizes locale inputs, applies stricter option coercion, and updates Collator, Segmenter, ListFormat, RelativeTimeFormat, PluralRules, and supportedLocalesOf behavior. Collator comparison and resolvedOptions now read stored state, and RelativeTimeFormat/PluralRules outputs use updated validation and defaults. ChangesIntl locale and option semantics
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/perry-runtime/src/intl.rs`:
- Around line 1521-1527: In supportedLocalesOf, the locale list must be
canonicalized before any options validation so locale-list errors and side
effects happen first. Move the CanonicalizeLocaleList/locales_from_value step
ahead of the localeMatcher handling in the intl.rs logic, then validate options
afterward via coerce_options_reject_null and enum_option_strict. Keep the fix
scoped to the supportedLocalesOf path and preserve the existing locale filtering
behavior.
- Around line 1208-1214: Reject reserved Collator collation values in the Intl
options validation path: update the collation handling in intl.rs (the
get_option_string_coerced / is_well_formed_numbering_system check) so that
standard and search are treated as invalid option values and trigger
throw_range_error before locale-extension resolution. Keep the existing
validation flow in this options parsing block and add the reserved-value check
alongside the current collation validation so invalid input is rejected
immediately.
- Around line 616-620: The Unicode extension scan in intl.rs is incorrectly
treating private-use sequences like x-u-* as a real Unicode extension. Update
the parsing logic around the iterator loop that looks for "u" so it first
detects a preceding "x" private-use section and stops scanning there, ensuring
tags like en-x-u-kn do not set Unicode keyword flags such as numeric in the
Unicode extension parser.
- Around line 549-554: In CanonicalizeLocaleList, the locale array/object
iteration currently reads every index directly via js_array_get_f64 and can pass
missing entries into push_locale_element; add a HasProperty-style check before
reading each element so sparse arrays and objects with holes are skipped instead
of throwing. Apply the same fix in the second locale-list loop near the other
occurrence of push_locale_element, keeping the behavior consistent across both
paths.
- Around line 523-525: The remaining CanonicalizeLocaleList branch in intl.rs
still calls canonical_locale for tag normalization/validation; update this path
to use canonicalize_language_tag instead, matching the feature-aware helper
already used in intl/locales.rs. Keep the surrounding throw_invalid_language_tag
handling intact, but ensure the CanonicalizeLocaleList flow consistently routes
through the same locale-canonicalization helper in the relevant locale-list code
paths.
In `@crates/perry-runtime/src/intl/date_collator.rs`:
- Around line 403-411: The is_punctuation helper is too broad and is stripping
non-punctuation characters like letters, symbols, and superscript digits under
ignorePunctuation. Tighten the logic in is_punctuation in date_collator.rs by
replacing the wide Unicode ranges with a punctuation/category-based check or a
narrower explicit set so only true punctuation is ignored and distinct strings
do not compare equal.
- Around line 345-365: The normalization in collation_normalize is breaking the
Swedish fast path because compare_strings now feeds decomposed accents into
swedish_collation_key. Change the normalization strategy to NFC, or update
swedish_collation_key to treat decomposed å/ä/ö the same as precomposed forms,
so the sv locale branch still orders those characters correctly while preserving
canonical-equivalence handling.
In `@crates/perry-runtime/src/intl/list_relative_plural.rs`:
- Around line 316-317: The `ListRelativePlural` number conversion is using
`js_number_coerce`, which incorrectly accepts BigInt instead of throwing in the
`format` and `selectRange` paths. Replace this with a strict ToNumber-style
helper in `list_relative_plural.rs` so `ToPrimitive` rejects BigInt for both
call sites, and keep the existing `unit_arg` string conversion unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: af3e1bd5-6cc7-456e-9099-b7786ad05eec
📒 Files selected for processing (3)
crates/perry-runtime/src/intl.rscrates/perry-runtime/src/intl/date_collator.rscrates/perry-runtime/src/intl/list_relative_plural.rs
…l constructors
Brings the small Intl constructors (Collator, Segmenter, ListFormat,
RelativeTimeFormat, PluralRules) up to spec on option coercion, locale-list
handling, and a few behavioral gaps. test262 intl402 parity across these six
dirs goes from 241/387 (62%) to 297/387 (77%) — 56 cases fixed, no regressions.
Shared helpers (intl.rs):
- `get_option_string_coerced` / `enum_option_strict` — GetOption with full
ToString coercion (null → "null", Symbol → TypeError), so `options-*-invalid`
values are rejected instead of silently defaulting.
- `get_options_object` (GetOptionsObject: non-Object/non-undefined → TypeError)
and `coerce_options_reject_null` (ToObject: null → TypeError), wired per the
spec step each constructor uses.
- `locales_from_value` now follows CanonicalizeLocaleList: null → TypeError,
array-like Objects are iterated (`{0:"DE",length:1}` → `["de"]`), element
type-checks throw TypeError, and a throwing/Symbol `length` propagates.
- `supportedLocalesOf` validates its `options` arg (localeMatcher) and filters
the request list through a BestAvailableLocale check, so the result length
matches (e.g. `zxx` is dropped).
- Per-constructor: read `localeMatcher` (+ `numberingSystem` for RTF) in the
exact GetOption order the `options-order` / read-order tests assert.
Collator (date_collator.rs):
- NFD-normalize before comparing so canonically-equivalent strings collate 0.
- Resolve `numeric`/`caseFirst`/`collation` from options or the locale's
`-u-kn`/`-u-kf`/`-u-co` extension keywords; reflect them (plus `usage`,
`sensitivity`, `ignorePunctuation`, `collation`) in resolvedOptions in spec
order. `ignorePunctuation` now strips ignorable punctuation in compare.
RelativeTimeFormat / PluralRules (list_relative_plural.rs):
- RTF format/formatToParts use full ToNumber (Symbol → TypeError, object valueOf
honoured); units are case-sensitive; formatToParts attaches the `unit` field
to the numeric parts.
- PluralRules reads options in SetNumberFormatDigitOptions order, adds
`notation`/`compactDisplay`, and `selectRange` throws TypeError for undefined
or Symbol arguments.
Out of scope (still failing, need CLDR data or larger runtime work): locale-
specific formatting/collation, the prototype accessor-getter shape for
compare/format, %Segments.prototype%.containing, subclassing/new.target, and
@@toStringTag inheritance on instances.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- cargo fmt (the lint check failed on formatting).
- Route the CanonicalizeLocaleList branches through `canonicalize_language_tag`
(feature-aware, matches intl/locales.rs) instead of `canonical_locale`.
- Skip absent indices via HasProperty when walking array-like locale objects
(`{length:3,0:"en"}` → `["en"]`, not a TypeError on the holes).
- `unicode_extension_keyword` stops at the `x` private-use singleton so
`en-x-u-kn` is not misread as a `kn` Unicode extension.
- Collator `collation` option rejects the reserved `standard`/`search` values
(and malformed types); a valid option now wins over the `-u-co-` keyword.
- `supportedLocalesOf` runs CanonicalizeLocaleList(locales) before reading
`options` (locale errors/side-effects precede option validation).
- Collator compare normalizes with NFC (not NFD) so the Swedish fast path keeps
its precomposed å/ä/ö; `ignorePunctuation` stripping uses an explicit
punctuation set instead of whole Latin-1 ranges (no longer eats ª/µ/º/¹²³/¼).
- RTF `format`/`formatToParts` and PluralRules `selectRange` use a ToNumber that
rejects BigInt with TypeError (`format(1n,"day")` throws).
test262 intl402 across the six dirs: 307/387 pass (79%), up from 297 — 10 more
cases, no regressions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
c1c42c9 to
b02914c
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/perry-runtime/src/intl/date_collator.rs`:
- Around line 345-349: Update the stale documentation in date_collator’s
normalization comment so it matches the actual behavior in the Collator.compare
path. The header currently says NFD, but the implementation and nearby comment
in the same block use s.nfc() and intentionally preserve the Swedish precomposed
fast path; change the wording to NFC and keep the explanation aligned with the
canonical-equivalence behavior described in the Collator logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 94f68f15-05ff-4374-92bf-e268a1a1f629
📒 Files selected for processing (3)
crates/perry-runtime/src/intl.rscrates/perry-runtime/src/intl/date_collator.rscrates/perry-runtime/src/intl/list_relative_plural.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/perry-runtime/src/intl/list_relative_plural.rs
- crates/perry-runtime/src/intl.rs
| /// Normalize to NFD so canonically-equivalent strings (e.g. `"ö"` precomposed | ||
| /// vs. `"ö"` decomposed) collate equal — the ECMA-402 requirement that | ||
| /// `Collator.compare` treats canonical equivalents as 0 (canonically-equivalent | ||
| /// -strings.js). Without `string-normalize` this is an identity passthrough, so | ||
| /// the precomposed/decomposed pair still compares unequal (best effort). |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Stale doc header: says NFD but code/behavior is NFC.
The header comment states "Normalize to NFD", which contradicts both the inline comment on Lines 353-354 ("NFC (composition), not NFD") and the implementation (s.nfc()). Since the switch to NFC was the deliberate fix to preserve the Swedish precomposed å/ä/ö fast path, the header is misleading. Align it with the actual normalization form.
📝 Proposed doc fix
-/// Normalize to NFD so canonically-equivalent strings (e.g. `"ö"` precomposed
-/// vs. `"ö"` decomposed) collate equal — the ECMA-402 requirement that
+/// Normalize to NFC so canonically-equivalent strings (e.g. `"ö"` precomposed
+/// vs. `"ö"` decomposed) collate equal — the ECMA-402 requirement that
/// `Collator.compare` treats canonical equivalents as 0 (canonically-equivalent
/// -strings.js). Without `string-normalize` this is an identity passthrough, so
/// the precomposed/decomposed pair still compares unequal (best effort).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Normalize to NFD so canonically-equivalent strings (e.g. `"ö"` precomposed | |
| /// vs. `"ö"` decomposed) collate equal — the ECMA-402 requirement that | |
| /// `Collator.compare` treats canonical equivalents as 0 (canonically-equivalent | |
| /// -strings.js). Without `string-normalize` this is an identity passthrough, so | |
| /// the precomposed/decomposed pair still compares unequal (best effort). | |
| /// Normalize to NFC so canonically-equivalent strings (e.g. `"ö"` precomposed | |
| /// vs. `"ö"` decomposed) collate equal — the ECMA-402 requirement that | |
| /// `Collator.compare` treats canonical equivalents as 0 (canonically-equivalent | |
| /// -strings.js). Without `string-normalize` this is an identity passthrough, so | |
| /// the precomposed/decomposed pair still compares unequal (best effort). |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/perry-runtime/src/intl/date_collator.rs` around lines 345 - 349,
Update the stale documentation in date_collator’s normalization comment so it
matches the actual behavior in the Collator.compare path. The header currently
says NFD, but the implementation and nearby comment in the same block use
s.nfc() and intentionally preserve the Swedish precomposed fast path; change the
wording to NFC and keep the explanation aligned with the canonical-equivalence
behavior described in the Collator logic.
Closes #5585 (substantially — the structural tail; see "Remaining" below).
What
Brings the six small
Intlconstructors up to ECMA-402 spec on option coercion, locale-list handling, and several behavioral gaps that the test262intl402tail exercises.Measured against the pinned test262 (
scripts/test262_subset.py --dir intl402/Collator intl402/Segmenter intl402/ListFormat intl402/RelativeTimeFormat intl402/PluralRules intl402/DisplayNames):56 cases fixed, 0 regressions (verified by diffing the baseline vs. final failure sets; the few tests "newly" in the fail list were already failing pre-change but absent from the capped baseline sample, and now advance further).
Changes
Shared (
intl.rs)get_option_string_coerced/enum_option_strict— GetOption with fullToStringcoercion (null→"null", Symbol → TypeError) sooptions-*-invalidvalues are rejected, not silently defaulted.get_options_object(GetOptionsObject) andcoerce_options_reject_null(ToObject,null→ TypeError), wired per the spec step each constructor uses (ListFormat/Segmenter/PluralRules = GetOptionsObject; RelativeTimeFormat/Collator = ToObject).locales_from_valuenow follows CanonicalizeLocaleList:null→ TypeError, array-like Objects iterated ({0:"DE",length:1}→["de"]), element type-checks, and a throwing/Symbollengthpropagates.supportedLocalesOfvalidatesoptions(localeMatcher) and filters the request list via a BestAvailableLocale check (so e.g.zxxis dropped and the length matches).localeMatcher(+numberingSystemfor RTF) in the exact GetOption order theoptions-order/read-order tests assert.Collator (
date_collator.rs)0.numeric/caseFirst/collationfrom options or the locale's-u-kn/-u-kf/-u-cokeywords; reflect them (+usage,sensitivity,ignorePunctuation) inresolvedOptionsin spec order;ignorePunctuationstrips ignorable punctuation incompare.RelativeTimeFormat / PluralRules (
list_relative_plural.rs)format/formatToPartsuse full ToNumber (Symbol → TypeError, objectvalueOfhonoured); units are case-sensitive;formatToPartsattaches theunitfield to numeric parts.notation/compactDisplay, andselectRangethrows TypeError forundefined/Symbol args.Remaining (out of scope — need CLDR data or larger runtime work)
Locale-specific formatting/collation (
"foo y bar", phonebook/pinyin ordering, non-en plural categories), the prototype accessor-getter shape forcompare/format,%Segments.prototype%.containing, subclassing/new.target, and@@toStringTaginheritance on instances.Testing
builtin_prototype_methods_reject_dynamic_newfailure is the known pre-existing test-isolation flake; passes in isolation and onmain).supportedLocalesOf+ locales handling for regressions from the sharedlocales_from_valuechange — none.Per maintainer convention, no version bump / CHANGELOG entry (folded in at merge).
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
supportedLocalesOfand Intl constructors now apply more consistent locale/option handling, including canonicalization and improvedresolvedOptionsreporting (with spec-aligned ordering and defaults).Intl.RelativeTimeFormat.formatToParts()now returns the resolved unit for non-literal parts.Bug Fixes
Intl.Collatorcomparisons now normalize strings (when enabled) and honorignorePunctuationby ignoring whitespace and ignorable punctuation;resolvedOptionsnow reflects the actualignorePunctuation,collation,numeric, andcaseFirst.Intl.PluralRules.selectRangethrowingTypeErrorwhenstart/endisundefined, and more accuratenotation/compactDisplayinresolvedOptions.