Skip to content

test(idn-hostname): reject fullwidth digits#927

Open
vtushar06 wants to merge 1 commit into
json-schema-org:mainfrom
vtushar06:idn-hostname-fullwidth-digits
Open

test(idn-hostname): reject fullwidth digits#927
vtushar06 wants to merge 1 commit into
json-schema-org:mainfrom
vtushar06:idn-hostname-fullwidth-digits

Conversation

@vtushar06

@vtushar06 vtushar06 commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Following the methodology I used for ipv4 (#907) and the earlier idn-hostname tests, I read RFC 5892 section 2.6 and found the current idn-hostname.json has no test for fullwidth digits in a label.

Fullwidth digits U+FF11-U+FF13 have the DISALLOWED property under IDNA2008, so a label made of them is not a valid U-label. UTS46 processors map them to ASCII 123 and accept; strict IDNA2008 rejects. This parallels the merged ipv4 fullwidth/astral digit test (#907).

Changes

  • Added 1 test case across draft7, draft2019-09, draft2020-12, and v1.
  • 123 (U+FF11 U+FF12 U+FF13) is invalid.

Ecosystem Impact

  1. python-jsonschema 4.x: PASSES (rejects it). is_idn_hostname calls idna.encode, which raises InvalidCodepoint on U+FF11.
  2. libidn2, Go x/net/idna, Node (WHATWG), PHP idn_to_ascii, Rust idna, java.net.IDN, ICU4J, Guava: FAIL (accept it). UTS46 maps the fullwidth digits to ASCII and validates the mapped form. ICU4J, the reference UTS46 implementation, accepts it even with CHECK_BIDI and CHECK_CONTEXTJ enabled.

RFC References

Reproduction commands and the idn-hostname cross-implementation matrix are in my evidence repo: https://github.com/vtushar06/JSON-Schema-format-test-Evidence/blob/main/idn-hostname.md

Related: #965

@vtushar06 vtushar06 requested a review from a team as a code owner June 10, 2026 14:46
Copilot AI review requested due to automatic review settings June 10, 2026 14:46

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds test coverage to ensure idn-hostname rejects RFC 5892–disallowed fullwidth digits across multiple JSON Schema draft test suites.

Changes:

  • Added a negative test case asserting fullwidth digits (U+FF11..U+FF13) are invalid for idn-hostname.
  • Applied the same test addition to v1, draft7, draft2019-09, and draft2020-12 optional format suites.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
tests/v1/format/idn-hostname.json Adds invalid-case test for fullwidth digits in idn-hostname.
tests/draft7/optional/format/idn-hostname.json Mirrors the new invalid-case test for draft7 optional formats.
tests/draft2019-09/optional/format/idn-hostname.json Mirrors the new invalid-case test for draft2019-09 optional formats.
tests/draft2020-12/optional/format/idn-hostname.json Mirrors the new invalid-case test for draft2020-12 optional formats.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/v1/format/idn-hostname.json

@jviotti jviotti left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Very interesting that some implementations fail on this!

@vtushar06 vtushar06 force-pushed the idn-hostname-fullwidth-digits branch from d56d669 to 2fd334c Compare June 24, 2026 20:55
@jdesrosiers

Copy link
Copy Markdown
Member

This isn't correct. See my comment in a spec discussion about this stuff for a little more detail.

In short, IDNA2008 is just a framework. It's not a complete specification for IDNs by itself. It defines a "mapping" step and leaves that to other specifications to define. UTS 46 is the mapping step. You can't have an IDN validator without a mapper. The spec doesn't specify what mapping to use, but since UTS 46 is the only one that's been formally defined and it's what everyone implements, I think it's fair to assume UTS 46 as the mapper.

@vtushar06 vtushar06 force-pushed the idn-hostname-fullwidth-digits branch from 2fd334c to 335f8bd Compare July 1, 2026 12:54
@vtushar06

Copy link
Copy Markdown
Contributor Author

You're right @jdesrosiers. I had these as strict IDNA2008 with no mapping step. Under UTS 46, U+FF11-13 map to 123 and U+200B is ignored, so 123 and a‍b both resolve to valid labels (123, ab), I confirmed against the Unicode 16 IdnaMappingT able and idna's uts46 mode. Happy to flip both to valid, or hold them until #1636 settles the mapping - whichever you prefer.

@jviotti

jviotti commented Jul 1, 2026

Copy link
Copy Markdown
Member

I spent some time digging into this and I think you are right @jdesrosiers . This is a tricky subtle one. Unicode releases a test suite here in fact (https://github.com/unicode-org/unicodetools/blob/final-17.0-20250910/unicodetools/data/idna/16.0.0/IdnaTestV2.txt), that does assume the UTS 46 mapper. Aligning to that suite, which according to my read, would mean aligning with @jdesrosiers 's last comment, seems reasonable?

Looks like I'll need to improve my implementation then 😅

jviotti added a commit to sourcemeta/core that referenced this pull request Jul 1, 2026
jviotti added a commit to sourcemeta/core that referenced this pull request Jul 1, 2026
jviotti added a commit to sourcemeta/core that referenced this pull request Jul 1, 2026
jviotti added a commit to sourcemeta/core that referenced this pull request Jul 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants