test(idn-hostname): reject fullwidth digits#927
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds test coverage to ensure idn-hostname rejects RFC 5892–disallowed fullwidth digits across multiple JSON Schema draft test suites.
Changes:
- Added a negative test case asserting fullwidth digits (U+FF11..U+FF13) are invalid for
idn-hostname. - Applied the same test addition to v1, draft7, draft2019-09, and draft2020-12 optional format suites.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/v1/format/idn-hostname.json | Adds invalid-case test for fullwidth digits in idn-hostname. |
| tests/draft7/optional/format/idn-hostname.json | Mirrors the new invalid-case test for draft7 optional formats. |
| tests/draft2019-09/optional/format/idn-hostname.json | Mirrors the new invalid-case test for draft2019-09 optional formats. |
| tests/draft2020-12/optional/format/idn-hostname.json | Mirrors the new invalid-case test for draft2020-12 optional formats. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
jviotti
left a comment
There was a problem hiding this comment.
Awesome. Very interesting that some implementations fail on this!
d56d669 to
2fd334c
Compare
|
This isn't correct. See my comment in a spec discussion about this stuff for a little more detail. In short, IDNA2008 is just a framework. It's not a complete specification for IDNs by itself. It defines a "mapping" step and leaves that to other specifications to define. UTS 46 is the mapping step. You can't have an IDN validator without a mapper. The spec doesn't specify what mapping to use, but since UTS 46 is the only one that's been formally defined and it's what everyone implements, I think it's fair to assume UTS 46 as the mapper. |
2fd334c to
335f8bd
Compare
|
You're right @jdesrosiers. I had these as strict IDNA2008 with no mapping step. Under UTS 46, U+FF11-13 map to |
|
I spent some time digging into this and I think you are right @jdesrosiers . This is a tricky subtle one. Unicode releases a test suite here in fact (https://github.com/unicode-org/unicodetools/blob/final-17.0-20250910/unicodetools/data/idna/16.0.0/IdnaTestV2.txt), that does assume the UTS 46 mapper. Aligning to that suite, which according to my read, would mean aligning with @jdesrosiers 's last comment, seems reasonable? Looks like I'll need to improve my implementation then 😅 |
See: json-schema-org/JSON-Schema-Test-Suite#927 Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
See: json-schema-org/JSON-Schema-Test-Suite#927 Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
See: json-schema-org/JSON-Schema-Test-Suite#927 Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
See: json-schema-org/JSON-Schema-Test-Suite#927 Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
Following the methodology I used for ipv4 (#907) and the earlier idn-hostname tests, I read RFC 5892 section 2.6 and found the current idn-hostname.json has no test for fullwidth digits in a label.
Fullwidth digits U+FF11-U+FF13 have the DISALLOWED property under IDNA2008, so a label made of them is not a valid U-label. UTS46 processors map them to ASCII
123and accept; strict IDNA2008 rejects. This parallels the merged ipv4 fullwidth/astral digit test (#907).Changes
123(U+FF11 U+FF12 U+FF13) is invalid.Ecosystem Impact
is_idn_hostnamecallsidna.encode, which raises InvalidCodepoint on U+FF11.RFC References
Reproduction commands and the idn-hostname cross-implementation matrix are in my evidence repo: https://github.com/vtushar06/JSON-Schema-format-test-Evidence/blob/main/idn-hostname.md
Related: #965