test(format/date): add boundary whitespace, Unicode dash, and max boundary cases#951

Open
AcEKaycgR wants to merge 1 commit into json-schema-org:main from AcEKaycgR:date-edge-cases

@AcEKaycgR

Contributor

Applies to: draft-v1, draft-07, draft/2019-09, draft/2020-12

Adds 8 new edge cases validating strict string anchoring constraints, Unicode whitespace characters, alternative Unicode dash variations, and maximum boundary years for the date format.

Added:

  • Leading tab character before valid date (invalid)
  • Trailing newline after valid date (invalid)
  • Leading non-breaking space before valid date (invalid)
  • Unicode en dash separators (invalid)
  • Unicode minus sign separators (invalid)
  • Maximum boundary full-date year 9999 (valid)
  • Partial date-time prefix with T suffix (invalid)
  • Partial date-time prefix with hour component (invalid)
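As a rough illustration of what these cases demand from an implementation, a strict checker could look like the sketch below. It is written in Python for brevity and is not taken from any implementation in the suite; `is_full_date` is a hypothetical helper, and `9999-12-31` is an assumed sample value for the year 9999 case (the actual data string lives in the test files).

```python
import re

# ASCII digits and ASCII hyphen-minus only; fullmatch() anchors both ends,
# so leading or trailing whitespace and any "T..." suffix are rejected.
_FULL_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def _days_in_month(year: int, month: int) -> int:
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def is_full_date(value: str) -> bool:
    """Hypothetical strict check for an RFC 3339 full-date string."""
    match = _FULL_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(group) for group in match.groups())
    return 1 <= month <= 12 and 1 <= day <= _days_in_month(year, month)


# (data, expected) pairs mirroring the eight cases listed above.
CASES = [
    ("\t2020-01-01", False),            # leading tab
    ("2020-01-01\n", False),            # trailing newline
    ("\u00a02020-01-01", False),        # leading non-breaking space
    ("2020\u201301\u201301", False),    # en dash separators
    ("2020\u221201\u221201", False),    # minus sign separators
    ("9999-12-31", True),               # assumed sample for year 9999
    ("1985-04-12T", False),             # date-time prefix with T
    ("1985-04-12T00", False),           # date-time prefix with hour
]

for data, expected in CASES:
    assert is_full_date(data) is expected, repr(data)
```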

Rationale for separating Unicode en dash and Unicode minus sign:

We explicitly test both the en dash (U+2013, General Punctuation block) and the minus sign (U+2212, Mathematical Operators block) because they belong to completely separate Unicode blocks. Implementations that perform partial Unicode normalization or range checks (e.g., stripping punctuation blocks vs. mathematical symbol blocks) could reject one while erroneously normalizing and accepting the other.
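To make that failure mode concrete, here is a small Python sketch of a hypothetical lenient validator. `lenient_is_date` is invented for illustration and does not describe any real implementation: it folds every character in the Unicode "Pd" (dash punctuation) category to `-` before matching. The en dash is in that category while the minus sign is category "Sm", so the two inputs get different verdicts. Only the syntax is checked here, not calendar validity.

```python
import re
import unicodedata

EN_DASH = "\u2013"     # General Punctuation block, category Pd
MINUS_SIGN = "\u2212"  # Mathematical Operators block, category Sm

_FULL_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def strict_is_date(value: str) -> bool:
    """Accept only the ASCII hyphen-minus (U+002D) as separator."""
    return _FULL_DATE.fullmatch(value) is not None


def lenient_is_date(value: str) -> bool:
    """Hypothetical buggy validator: folds every 'Pd' character to '-'."""
    folded = "".join(
        "-" if unicodedata.category(ch) == "Pd" else ch for ch in value
    )
    return _FULL_DATE.fullmatch(folded) is not None


en_dash_date = f"2020{EN_DASH}01{EN_DASH}01"
minus_sign_date = f"2020{MINUS_SIGN}01{MINUS_SIGN}01"

# The strict check rejects both look-alike separators.
assert not strict_is_date(en_dash_date)
assert not strict_is_date(minus_sign_date)

# The lenient check wrongly accepts the en dash but still rejects the minus sign.
assert lenient_is_date(en_dash_date)
assert not lenient_is_date(minus_sign_date)
```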

@jviotti @jdesrosiers @karenetheridge Ready for review.

@AcEKaycgR AcEKaycgR requested a review from a team as a code owner June 24, 2026 14:00

@jviotti jviotti left a comment

Member

Looks spec-correct, but again it would be nice to see whether any existing implementation gets tripped up by these new cases, to get more insight into their value compared to the existing ones!

@AcEKaycgR

Contributor Author

Thanks for the approval, @jviotti

Under Bowtie verification, these cases caught a live bug in opis-json-schema (PHP), which incorrectly accepts the trailing newline case "2020-01-01\n" as valid due to a regex anchoring gap. Stricter implementations, such as those written in Go and Rust, correctly reject it.

@jdesrosiers jdesrosiers left a comment

Member

The only test here that I think we should keep is the year boundary test.

Comment thread tests/draft2019-09/optional/format/date.json
Comment on lines +376 to +385
{
    "description": "invalid: Unicode en dash separators",
    "data": "2020–01–01",
    "valid": false
},
{
    "description": "invalid: Unicode minus sign separators",
    "data": "2020−01−01",
    "valid": false
},

Member

I'm not convinced these are needed. These characters aren't special just because they look similar to -. IMO, this is the same test as 2020/01/01, which we're already testing.

Member

Of course I would like to see proof, but I wouldn't be surprised if, because these characters look similar to the human eye, some implementations try to be lenient about them without considering that there are standards behind them.

Comment on lines 392 to 400
    "description": "invalid: partial date-time prefix with T suffix",
    "data": "1985-04-12T",
    "valid": false
},
{
    "description": "invalid: partial date-time prefix with hour component",
    "data": "1985-04-12T00",
    "valid": false
}

@jdesrosiers jdesrosiers Jun 30, 2026

Member

These aren't meaningfully different than the tests that check for anything after the date.

@AcEKaycgR

AcEKaycgR commented Jul 1, 2026

Contributor Author

Hi @jdesrosiers,

You're completely right on the Unicode dash and prefix cases. I've removed those to keep the PR minimal.

Regarding the trailing newline (\n), we actually have a concrete example of this exact discrepancy. In the PHP ecosystem, php-opis-json-schema incorrectly accepts 2020-01-01\n as valid while correctly rejecting 2020-01-01 as invalid.

This is a common quirk of PHP's PCRE-based regex engine, where $ also matches just before a trailing newline at the end of the string unless the /D modifier (PCRE_DOLLAR_ENDONLY) is explicitly set. The trailing newline test catches this specific ecosystem gap.
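The same class of bug is easy to reproduce outside PHP. Python's `re` module shares this default for `$`, so the sketch below (an illustration in Python, not the opis-json-schema code; the pattern and names are assumptions) shows the gap and two ways to close it: anchoring with `\Z`, or using `fullmatch`.

```python
import re

DATE = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"

# "$" matches at the very end of the string OR just before a final newline.
LOOSE = re.compile(rf"^{DATE}$")

# "\Z" matches only at the very end of the string.
STRICT = re.compile(rf"^{DATE}\Z")

with_newline = "2020-01-01\n"

loose_accepts = LOOSE.match(with_newline) is not None
strict_accepts = STRICT.match(with_newline) is not None
fullmatch_accepts = re.fullmatch(DATE, with_newline) is not None

assert loose_accepts          # the anchoring gap: trailing "\n" slips through
assert not strict_accepts     # "\Z" closes it
assert not fullmatch_accepts  # so does fullmatch()

# A plain valid date is accepted by all three variants.
assert LOOSE.match("2020-01-01") and STRICT.match("2020-01-01")
assert re.fullmatch(DATE, "2020-01-01")
```

In PHP the analogous fixes would be the /D modifier mentioned above or an end-of-subject anchor instead of `$`.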
