test(idn-email): add Unicode NFC normalization cases#945
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds additional idn-email format tests to cover Unicode normalization (NFC vs non-NFC) handling for the domain label and local-part.
Changes:
- Add a failing case for a non-NFC domain label (
user@cafe\u0301.com) - Add a passing case for a non-NFC local part (
cafe\u0301@example.com) - Apply the same new test cases across v1, draft7, draft2019-09, and draft2020-12 suites
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/v1/format/idn-email.json | Adds NFC normalization edge cases for idn-email in v1 tests. |
| tests/draft7/optional/format/idn-email.json | Adds NFC normalization edge cases for idn-email in draft7 optional tests. |
| tests/draft2019-09/optional/format/idn-email.json | Adds NFC normalization edge cases for idn-email in draft2019-09 optional tests. |
| tests/draft2020-12/optional/format/idn-email.json | Adds NFC normalization edge cases for idn-email in draft2020-12 optional tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
This is an interesting one. I guess it comes down to whether we enforce the SHOULD or not. I propose either:
@jdesrosiers cc @json-schema-org/tsc what do you think? |
|
Actually, because it is a SHOULD, I guess we must not reject it (as it is technically valid as per the spec?) |
|
I went through this again. My analysis is that NFC is indeed a SHOULD for this case, so we CANNOT reject an |
jdesrosiers
left a comment
There was a problem hiding this comment.
I agree with @jviotti. These should both be allowed. The spec doesn't require NFC, it just recommends it.
b6a2263 to
41fc4d9
Compare
|
@jdesrosiers @jviotti Makes sense since NFC is a SHOULD here we can't reject it. I've flipped the domain label case to valid, so both tests now assert a non-NFC idn-email is accepted rather than rejected. |
NFC is a SHOULD not a MUST for idn-email, so a non-NFC local part or domain label is valid and must not be rejected.
41fc4d9 to
43d53ff
Compare
|
@jviotti one knock-on I want to flag from this - Core's so two things really: do you want Core to relax the domain NFC check to match, and should idn-hostname accept non-NFC as well - or keep that a separate call? |
|
@vtushar06 Yeah, I'll be relaxing this on the Core side. What I had built there was under the assumption that NFC was mandatory. I'll handle it! |
See: json-schema-org/JSON-Schema-Test-Suite#945 Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
Following the methodology I used for ipv4 and uuid, I read RFC 6531 section 3.3 and RFC 5890 section 2.3.2.1, and found the suite has no test for Unicode normalization in an idn-email.
RFC 5890 section 2.3.2.1 says a domain U-label is "in Normalization Form C (NFC)", so a decomposed domain label is invalid. RFC 6531 builds the local part on RFC 5321 and adds a UTF8-non-ascii alternative, but it sets no normalization requirement on the local part. So the same decomposed sequence "cafe" + U+0301 (combining acute) is invalid as a domain label but valid as a local part. The existing internationalized test (the Hangul address) is already in NFC, so neither case is covered today.
Changes
user@cafe<U+0301>.com- a domain label that is not in NFC - invalid.cafe<U+0301>@example.com- the same decomposed sequence in the local part - valid.Ecosystem Impact
idn-emailcheck is only"@" in instance, so it never looks at normalization - it is wrong on the first case.idn-emailformat at all, so it does not validate the format and accepts both - also wrong on the first case.idna.encode): rejects the decomposed domain label, which is the correct verdict. Validators that normalize the domain (UTS-46 / NFC) before checking accept it instead, so the decomposed form passes silently.A question on the second case
RFC 6531 does not require the local part to be in NFC - I read every section and there is no normalization rule on the local part, only on the domain U-label (RFC 5890). So I marked the decomposed local part valid. If the intent is that idn-email should normalize or reject a non-NFC local part, this is the one to flip - happy to do that. The domain case is not in question.
RFC References
Reproduction and the idn-email cross-implementation matrix are in my evidence repo: https://github.com/vtushar06/JSON-Schema-format-test-Evidence/blob/main/idn-email.md
Related: #965