ICU-23440 Merge the end-of-text and Sentence_Break=Sep symbols in the sentence breaking state machine #4036
eggrobin wants to merge 1 commit into
Conversation
|
The TC only glanced at this and has no opinion... it would be nice if the descriptions in the ticket and in the PR were less heavy on abbreviations that are obscure to segmentation outsiders... |
It would be very nice if the long aliases for Sentence_Break and Word_Break values were not abbreviations. Sep is the long alias for sb=SE, and it actually means something like paragraph separator. Apparently those cryptic names were coined by one mark.davis@us.ibm.com twenty-five years ago in https://www.unicode.org/reports/tr29/tr29-1.html. (I also find Word_Break=ALetter annoying; that one first shows up in https://www.unicode.org/reports/tr29/tr29-2.html.) |
… sentence breaking state machine
|
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
|
Just saying that the PR description and ticket should be more readable to non-segmenters than “eot=Sep for sent” which means nothing to all but a handful of people. |
|
Well, the ticket is more verbose. But despite that verbosity, even I find any discussion of sentence breaking quite impenetrable, because the value aliases are too short to be descriptive… |
|
Thanks for updating the description! |
No functional change, but saves a few bytes. See the changes to the state machine: eggrobin/unicodetools@1665e41.
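As a rough illustration of why a merge like this can be behavior-preserving: if two input symbols trigger the same transition in every state, one can stand in for the other and its column can be dropped from the table. The sketch below assumes exactly that. The toy table, the state names, and the helper `merge_equivalent_symbols` are hypothetical and are not the actual ICU or unicodetools data structures.

```python
def merge_equivalent_symbols(table):
    """Merge input symbols whose transition columns are identical in every state.

    `table` maps state -> {symbol: next_state}. Returns (alias, merged), where
    alias maps each symbol to its representative and merged is the reduced table.
    """
    symbols = sorted({sym for row in table.values() for sym in row})
    states = sorted(table)
    representative = {}  # column signature -> first symbol seen with that column
    alias = {}           # symbol -> representative symbol
    for sym in symbols:
        column = tuple(table[state].get(sym) for state in states)
        alias[sym] = representative.setdefault(column, sym)
    # Keep only the columns of representative symbols.
    merged = {
        state: {sym: nxt for sym, nxt in row.items() if alias[sym] == sym}
        for state, row in table.items()
    }
    return alias, merged


# Hypothetical toy table, NOT the real sentence-break state machine.
toy_table = {
    "start": {"Other": "text", "Sep": "break", "eot": "break"},
    "text": {"Other": "text", "Sep": "break", "eot": "break"},
    "break": {"Other": "text", "Sep": "break", "eot": "break"},
}

alias, merged = merge_equivalent_symbols(toy_table)
```

In this toy case `eot` is folded into `Sep`, so every row loses one entry while each input still leads to the same next state.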