Commit 2fe74de
Fix ISO 8601 date pattern accepting impossible month/day values (#2113)
* Fix ISO 8601 date pattern accepting impossible month/day values
The "ISO 8601 datetime" pattern in DateRecognizer used `[01]\d` for the
month and `[0-3]\d` for the day. These ranges admit impossible values:
month `00` and `13`-`19`, and day `00` and `32`-`39`. As a result,
strings such as `2024-13-15T14:30:00Z` and `2024-12-32T14:30Z` were
detected as DATE_TIME.
Every other date pattern in this same file already constrains the
month to `01`-`12` and the day to `01`-`31`; only the ISO 8601 pattern
was loose. Tighten the ISO month/day fields to match (using
non-capturing groups so existing capture-group positions are
unaffected). No valid ISO 8601 datetime is lost, since those values are
not valid dates to begin with.
Adds parametrized cases for invalid month (00, 13) and day (00, 32).
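
The difference between the loose and tightened month/day fields can be sketched as follows. This is only an illustration: `LOOSE_DATE`, `TIGHT_DATE`, and `accepts` are hypothetical names, and the patterns cover just the `yyyy-mm-dd` portion rather than the full pattern in DateRecognizer, which also handles the time and timezone.

```python
import re

# Hypothetical stand-ins for the date portion only (not the actual
# DateRecognizer pattern, which also matches the time and timezone).
LOOSE_DATE = re.compile(r"\d{4}-[01]\d-[0-3]\d")
TIGHT_DATE = re.compile(
    r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])"
)


def accepts(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


candidates = ["2024-12-31", "2024-00-15", "2024-13-15", "2024-12-00", "2024-12-32"]
for candidate in candidates:
    # The loose pattern accepts every candidate; the tight one only the first.
    print(candidate, accepts(LOOSE_DATE, candidate), accepts(TIGHT_DATE, candidate))
```

Note that a range check of this kind is still not calendar validation: the tightened sketch accepts `2024-02-31`, for example.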
* Address review: apply word boundary across all alternatives, rename pattern
Two review points from #2113:
1. `|` has lower precedence than concatenation, so the pattern
`\b A | B | C \b` was parsed as `(\b A) | B | (C \b)`. The leading
`\b` only guarded the first alternative (the full-fractional form)
and the trailing `\b` only guarded the last (the minutes-only form).
The seconds-only alternative in the middle had no word-boundary
anchor at all, so a datetime in the seconds form could match
mid-word (e.g. `Today is2024-03-15T14:30:00+02:00`). Wrap the
alternation in a non-capturing group so both `\b` anchors apply to
every alternative.
2. Rename the pattern from "ISO 8601 datetime" to
"Datetime (yyyy-mm-ddThh:mm[:ss[.f]] with timezone)" — the pattern
doesn't fully validate ISO 8601 (e.g. the hour field admits 24–29).
The new name honestly describes the shape it accepts.
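
The precedence issue from point 1 can be reproduced with a simplified sketch. The three alternatives below are hypothetical approximations of the fractional, seconds-only, and minutes-only forms (the names `A`, `B`, `C`, `UNGROUPED`, and `GROUPED` are illustrative and not taken from the recognizer), and the field ranges are deliberately left unconstrained to keep the example short.

```python
import re

# Simplified, hypothetical building blocks; field ranges are not validated.
DATE = r"\d{4}-\d{2}-\d{2}"
TZ = r"(?:Z|[+-]\d{2}:\d{2})"
A = DATE + r"T\d{2}:\d{2}:\d{2}\.\d+" + TZ  # full-fractional form
B = DATE + r"T\d{2}:\d{2}:\d{2}" + TZ       # seconds-only form
C = DATE + r"T\d{2}:\d{2}" + TZ             # minutes-only form

# Parsed as (\b A) | B | (C \b): the middle alternative has no anchors.
UNGROUPED = re.compile(r"\b" + A + "|" + B + "|" + C + r"\b")
# The non-capturing group makes both \b anchors apply to every alternative.
GROUPED = re.compile(r"\b(?:" + A + "|" + B + "|" + C + r")\b")

glued = "Today is2024-03-15T14:30:00+02:00"
spaced = "Today is 2024-03-15T14:30:00+02:00"

print(UNGROUPED.search(glued))  # matches mid-word via alternative B
print(GROUPED.search(glued))    # no match: no word boundary before "2024"
print(GROUPED.search(spaced))   # matches the standalone datetime
```

Using `(?:...)` rather than a capturing group keeps any existing capture-group numbering unchanged.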
---------
Co-authored-by: Sharon Hart <sharonh.dev@gmail.com>
1 parent ac56751 commit 2fe74de
2 files changed
Lines changed: 12 additions & 2 deletions
File tree
- presidio-analyzer
- presidio_analyzer/predefined_recognizers/generic
- tests
Lines changed: 2 additions & 2 deletions