Skip to content

test(idn-email): add Unicode NFC normalization cases#945

Open
vtushar06 wants to merge 1 commit into
json-schema-org:mainfrom
vtushar06:idn-email-nfc-normalization
Open

test(idn-email): add Unicode NFC normalization cases#945
vtushar06 wants to merge 1 commit into
json-schema-org:mainfrom
vtushar06:idn-email-nfc-normalization

Conversation

@vtushar06

Copy link
Copy Markdown
Contributor

Following the methodology I used for ipv4 and uuid, I read RFC 6531 section 3.3 and RFC 5890 section 2.3.2.1, and found the suite has no test for Unicode normalization in an idn-email.

RFC 5890 section 2.3.2.1 says a domain U-label is "in Normalization Form C (NFC)", so a decomposed domain label is invalid. RFC 6531 builds the local part on RFC 5321 and adds a UTF8-non-ascii alternative, but it sets no normalization requirement on the local part. So the same decomposed sequence "cafe" + U+0301 (combining acute) is invalid as a domain label but valid as a local part. The existing internationalized test (the Hangul address) is already in NFC, so neither case is covered today.

Changes

  • Added 2 test cases across draft7, draft2019-09, draft2020-12, and v1.
  • user@cafe<U+0301>.com - a domain label that is not in NFC - invalid.
  • cafe<U+0301>@example.com - the same decomposed sequence in the local part - valid.

Ecosystem Impact

  1. python-jsonschema 4.x: accepts both. Its idn-email check is only "@" in instance, so it never looks at normalization - it is wrong on the first case.
  2. ajv-formats 3.x: has no idn-email format at all, so it does not validate the format and accepts both - also wrong on the first case.
  3. A strict IDNA check on the domain (for example python idna.encode): rejects the decomposed domain label, which is the correct verdict. Validators that normalize the domain (UTS-46 / NFC) before checking accept it instead, so the decomposed form passes silently.

A question on the second case

RFC 6531 does not require the local part to be in NFC - I read every section and there is no normalization rule on the local part, only on the domain U-label (RFC 5890). So I marked the decomposed local part valid. If the intent is that idn-email should normalize or reject a non-NFC local part, this is the one to flip - happy to do that. The domain case is not in question.

RFC References

Reproduction and the idn-email cross-implementation matrix are in my evidence repo: https://github.com/vtushar06/JSON-Schema-format-test-Evidence/blob/main/idn-email.md

Related: #965

Copilot AI review requested due to automatic review settings June 16, 2026 16:39
@vtushar06 vtushar06 requested a review from a team as a code owner June 16, 2026 16:39

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds additional idn-email format tests to cover Unicode normalization (NFC vs non-NFC) handling for the domain label and local-part.

Changes:

  • Add a failing case for a non-NFC domain label (user@cafe\u0301.com)
  • Add a passing case for a non-NFC local part (cafe\u0301@example.com)
  • Apply the same new test cases across v1, draft7, draft2019-09, and draft2020-12 suites

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
tests/v1/format/idn-email.json Adds NFC normalization edge cases for idn-email in v1 tests.
tests/draft7/optional/format/idn-email.json Adds NFC normalization edge cases for idn-email in draft7 optional tests.
tests/draft2019-09/optional/format/idn-email.json Adds NFC normalization edge cases for idn-email in draft2019-09 optional tests.
tests/draft2020-12/optional/format/idn-email.json Adds NFC normalization edge cases for idn-email in draft2020-12 optional tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/v1/format/idn-email.json Outdated
Comment thread tests/draft7/optional/format/idn-email.json
@jviotti

jviotti commented Jun 16, 2026

Copy link
Copy Markdown
Member

This is an interesting one. I guess it comes down to whether we enforce the SHOULD or not. I propose either:

  1. We interpret the SHOULD as a MUST on the test suite for robustness
  2. We extract this as a yet another optional test file that depends on the NFC support

@jdesrosiers cc @json-schema-org/tsc what do you think?

@jviotti

jviotti commented Jun 16, 2026

Copy link
Copy Markdown
Member

Actually, because it is a SHOULD, I guess we must not reject it (as it is technically valid as per the spec?)

@jviotti

jviotti commented Jun 19, 2026

Copy link
Copy Markdown
Member

I went through this again. My analysis is that NFC is indeed a SHOULD for this case, so we CANNOT reject an idn-email that is not NFC. We should play it conservative even though the RFC does mention non-NFC is discouraged

@jdesrosiers jdesrosiers left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jviotti. These should both be allowed. The spec doesn't require NFC, it just recommends it.

@vtushar06 vtushar06 force-pushed the idn-email-nfc-normalization branch from b6a2263 to 41fc4d9 Compare July 1, 2026 12:46
@vtushar06

vtushar06 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

@jdesrosiers @jviotti Makes sense since NFC is a SHOULD here we can't reject it. I've flipped the domain label case to valid, so both tests now assert a non-NFC idn-email is accepted rather than rejected.

NFC is a SHOULD not a MUST for idn-email, so a non-NFC local part or domain label is valid and must not be rejected.
@vtushar06 vtushar06 force-pushed the idn-email-nfc-normalization branch from 41fc4d9 to 43d53ff Compare July 1, 2026 12:56
@vtushar06

Copy link
Copy Markdown
Contributor Author

@jviotti one knock-on I want to flag from this - Core's is_idn_hostname rejects a non-NFC domain, and is_idn_email just delegates the domain to it, so Core itself will fail this new valid:true test once it merges. and since the idn-email domain is basically an idn-hostname (same RFC 5890 §2.3.2.1 U-label NFC rule), doesn't the same SHOULD apply to idn-hostname too?

so two things really: do you want Core to relax the domain NFC check to match, and should idn-hostname accept non-NFC as well - or keep that a separate call?

@jviotti

jviotti commented Jul 1, 2026

Copy link
Copy Markdown
Member

@vtushar06 Yeah, I'll be relaxing this on the Core side. What I had built there was under the assumption that NFC was mandatory. I'll handle it!

jviotti added a commit to sourcemeta/core that referenced this pull request Jul 1, 2026
See: json-schema-org/JSON-Schema-Test-Suite#945
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants