ICU-23431 Add link detection/formatting tools#4048

Open
macchiati wants to merge 5 commits into main from ICU-23431-Add-link-detection/formatting-tools

@macchiati macchiati commented Jun 27, 2026

Member

ICU-23431

Notes

  • The bulk of the code is in an impl class, ported from UTS58. It could use some further cleanup (notably removing some unused code), but I suggest doing that in a later PR.
  • If and when the right properties are available in ICU, the hardcoded internals can be changed to use them. This might not happen in v79.1, depending on resources. If not, it needs to be done before v80.1.
  • There are three test cases that need to be fixed in the UTS58 test data; those are listed at the top.
    • two of them are in minimal escaping, where the result is slightly less minimal than possible
    • one is in detection, because UTS58 checks for valid TLDs, and the test data should be neutral in that regard
    • when those are fixed (ideally in v18), the test data needs to be regenerated
  • UTS58 should have some additional description of certain strategies it uses for detection
    • local-parts and domain names are considered broken if they have bad dots:
      • local-parts: abc..def or .abc.def or abc.def.
      • domain names: .abc.com or .abc..com
      • Broken parts are completely skipped. E.g., john..smith.com is not detected as john.smith.com
    • domain names are also broken if they are invalid according to UTS46.
    • if there is a broken local-part in front of an @, or a broken domain name after the @, then the whole candidate is skipped.
      • E.g., john..smith@example.com is not detected as john..smith@example.com
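
To make the bad-dot rule above concrete, here is a minimal sketch in Java. It is not the code in this PR: the class and method names (`BadDotSketch`, `hasBadDots`, `acceptEmailCandidate`) are hypothetical, an empty part is assumed to count as broken, and the UTS46 validity check on the domain name is left out entirely.

```java
import java.util.Optional;

/** Hypothetical illustration of the bad-dot rule; not the implementation in this PR. */
final class BadDotSketch {

    /**
     * Returns true if a dot-separated part (local-part or domain name) has a
     * leading dot, a trailing dot, or two adjacent dots.
     * Assumption: an empty part is also treated as broken.
     */
    static boolean hasBadDots(String part) {
        return part.isEmpty()
                || part.startsWith(".")
                || part.endsWith(".")
                || part.contains("..");
    }

    /**
     * Returns the candidate unchanged if both sides of the first '@' have
     * well-formed dots; otherwise returns empty, meaning the whole candidate
     * is skipped rather than repaired or shortened.
     * The UTS46 validity check on the domain name is deliberately omitted here.
     */
    static Optional<String> acceptEmailCandidate(String candidate) {
        int at = candidate.indexOf('@');
        if (at < 0) {
            return Optional.empty(); // no '@', so not an email candidate
        }
        String localPart = candidate.substring(0, at);
        String domain = candidate.substring(at + 1);
        if (hasBadDots(localPart) || hasBadDots(domain)) {
            return Optional.empty(); // a broken side invalidates the whole candidate
        }
        return Optional.of(candidate);
    }
}
```

Under these assumptions, `acceptEmailCandidate("john..smith@example.com")` returns an empty `Optional`, while `acceptEmailCandidate("john.smith@example.com")` returns the input unchanged.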

Checklist

  • Required: Issue filed: ICU-23431
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable
  • Approver: Feel free to merge on my behalf

@macchiati macchiati marked this pull request as draft June 27, 2026 10:19
@macchiati macchiati marked this pull request as ready for review July 3, 2026 09:43
@macchiati macchiati requested a review from eggrobin July 3, 2026 10:09
@macchiati

Member Author

All, I think it is ready for review; it passes the UTS58 test data (with the 3 cases noted in the description). I plan on further cleanup (mostly unused code), but suggest doing that in a subsequent PR.

@macchiati

Member Author

Also, I wasn't quite sure about the conventions for where to put files nowadays. (It's been a long time since I contributed to ICU!)

