test(format/date): add boundary whitespace, Unicode dash, and max boundary cases #951
AcEKaycgR wants to merge 1 commit into
jviotti
left a comment
Looks spec-correct, but again it would be nice to see whether any existing implementation gets tripped up by any of these new cases, to give more insight into their value compared to the existing ones!
Thanks for the approval, @jviotti. Under Bowtie verification, these cases caught a live bug in opis-json-schema (PHP), which incorrectly accepts the trailing-newline case "2020-01-01\n" as valid due to a regex anchoring gap. Other strict engines, such as the Go and Rust ones, correctly reject it.
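For readers unfamiliar with this class of bug, here is a minimal sketch of the anchoring gap, written in Python rather than PHP. Python's `re` module has the same default behaviour: `$` matches at the end of the string and also just before a single trailing newline. The patterns and function names below are illustrative assumptions, not the actual opis-json-schema code, and they check only the shape of the string, not calendar validity.

```python
import re

# Hypothetical "loose" pattern: `$` also matches just before a trailing "\n".
LOOSE_DATE = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$")

# Hypothetical "strict" pattern: `\Z` matches only at the very end of the string.
STRICT_DATE = re.compile(r"\A[0-9]{4}-[0-9]{2}-[0-9]{2}\Z")


def loose_is_date_shaped(value: str) -> bool:
    """Shape check with the anchoring gap (illustrative only)."""
    return LOOSE_DATE.match(value) is not None


def strict_is_date_shaped(value: str) -> bool:
    """Shape check that rejects anything after the day component."""
    return STRICT_DATE.match(value) is not None


for candidate in ["2020-01-01", "2020-01-01\n"]:
    print(repr(candidate),
          loose_is_date_shaped(candidate),
          strict_is_date_shaped(candidate))
```

The loose check accepts both strings, while the strict check accepts only the first. Using `re.fullmatch` without explicit anchors would be another way to close the gap in Python.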
jdesrosiers
left a comment
The only test here that I think we should keep is the year boundary test.
{
    "description": "invalid: Unicode en dash separators",
    "data": "2020–01–01",
    "valid": false
},
{
    "description": "invalid: Unicode minus sign separators",
    "data": "2020−01−01",
    "valid": false
},
I'm not convinced these are needed. These characters aren't special just because they look similar to -. IMO, this is the same test as 2020/01/01, which we're already testing.
Of course I would like to see proof, but I wouldn't be surprised if some implementations try to be lenient about these characters because they look similar to the human eye, without considering that there are standards behind the format.
    "description": "invalid: partial date-time prefix with T suffix",
    "data": "1985-04-12T",
    "valid": false
},
{
    "description": "invalid: partial date-time prefix with hour component",
    "data": "1985-04-12T00",
    "valid": false
}
These aren't meaningfully different than the tests that check for anything after the date.
79b9b46 to
4c4c64e
Hi @jdesrosiers, you're completely right about the Unicode dash and prefix cases. I've removed those to keep the PR minimal. Regarding the trailing newline ( This is a common quirk of PHP's regex engine where
Applies to: draft-v1, draft-07, draft/2019-09, draft/2020-12
Adds 8 new edge cases validating strict string anchoring constraints, Unicode whitespace characters, alternative Unicode dash variations, and maximum boundary years for the date format.
Added:
Rationale for separating Unicode en dash and Unicode minus sign:
We explicitly test both – (U+2013, General Punctuation block) and − (U+2212, Mathematical Operators block) because they belong to completely separate Unicode blocks. Implementations that perform partial Unicode normalization or range checks (e.g., stripping punctuation blocks vs. mathematical symbol blocks) could reject one while erroneously normalizing and accepting the other.
@jviotti @jdesrosiers @karenetheridge Ready for review.
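To make the distinction between the two look-alike separators concrete, the following Python sketch prints the code point, Unicode name, and general category of each character and runs a strict ASCII-only shape check against a date built with it. The helper names and the pattern are assumptions made for illustration; they are not taken from the test suite or from any implementation discussed in this PR.

```python
import re
import unicodedata

# Hypothetical strict shape check: ASCII digits and ASCII hyphen-minus only.
STRICT_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def has_date_shape(value: str) -> bool:
    """True only if the whole string is NNNN-NN-NN with ASCII characters."""
    return STRICT_DATE_SHAPE.fullmatch(value) is not None


# ASCII hyphen-minus, en dash, and minus sign, in that order.
for sep in ["-", "\u2013", "\u2212"]:
    candidate = f"2020{sep}01{sep}01"
    print(describe(sep), has_date_shape(candidate))
```

The en dash is reported with general category `Pd` (dash punctuation) and the minus sign with `Sm` (math symbol), so a filter keyed on one category would not catch the other. The strict check accepts only the ASCII hyphen-minus variant.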