Add APIs for case folding to the standard library #154742
Jules-Bertholet wants to merge 10 commits into
|
These commits modify the If this was unintentional then you should revert the changes before this PR is merged.
If you want to modify |
|
r? @scottmcm
rustbot has assigned @scottmcm.
|
Force-pushed: 5b5e617 to bf4ee7c
|
@rustbot reroll |
Force-pushed: bf4ee7c to f504859
Force-pushed: f504859 to b0d7515
Force-pushed: dd25c4f to b14b43b
|
LGTM. |
|
r? libs-api |
|
I don't mind, but any particular reason for the reassign? I thought it was good to go. |
|
The API needs libs-API approval, I believe. They expressed interest in something like this, but there was never an ACP. (I also need to add a tracking issue after I get that) |
|
// If, after updating the Unicode data
// to a new Unicode version, the below
// assertion starts to fail in tests,
And we're confident those tests run with debug-asserts enabled? Why can't this be done in the generator (or as dedicated tests produced by the generator)?
Yes, confident, I checked. I've also now added a bunch of comments to document these assertions.
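For readers following the thread, here is a minimal sketch of why the question matters: `debug_assert!` is only evaluated when debug assertions are enabled, so an invariant guarded this way is silently skipped in builds that disable them. The table and function names below are hypothetical stand-ins, not the generated `core::unicode` data or the PR's code.

```rust
/// Hypothetical stand-in for a generated lookup table (NOT the real Unicode data).
/// The lookup below relies on the keys being sorted in ascending order.
const TABLE: &[(char, char)] = &[('A', 'a'), ('B', 'b'), ('C', 'c')];

/// Looks up `c` in the table with a binary search.
fn lookup(c: char) -> Option<char> {
    // This check runs only when debug assertions are enabled; otherwise it is
    // compiled out, which is why it matters how the test suite is built.
    debug_assert!(
        TABLE.windows(2).all(|w| w[0].0 < w[1].0),
        "table keys must be strictly ascending"
    );
    TABLE
        .binary_search_by_key(&c, |&(key, _)| key)
        .ok()
        .map(|i| TABLE[i].1)
}

/// Reports whether the current build evaluates `debug_assert!`.
fn debug_assertions_enabled() -> bool {
    cfg!(debug_assertions)
}
```

Calling `debug_assertions_enabled()` from a test is one way to confirm the build configuration rather than assume it.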
|
Reminder, once the PR becomes ready for a review, use |
|
@rustbot ready |
|
@bors r+ rollup |
…imulacrum

Add APIs for case folding to the standard library

[Libs-api requested these](rust-lang#154287 (comment)), so here they are.

New public API (gated behind `#[feature(casefold)]`):

```rust
impl char {
    pub fn to_casefold(self) -> ToCasefold;
}

impl str {
    pub fn to_casefold(&self) -> String;
    pub fn eq_ignore_case(&self) -> bool;
}

pub struct ToCasefold { ... }
impl Iterator for ToCasefold { type Item = char; ... }
impl DoubleEndedIterator for ToCasefold { ... }
impl FusedIterator for ToCasefold { }
impl ExactSizeIterator for ToCasefold { ... }
impl fmt::Display for ToCasefold { ... }
```

## Notes

- This only adds a negligible amount of static data to `core::unicode`. To accomplish that, we compute the case-folding for most characters as the lowercase of their uppercase; this double mapping adds some complexity to the implementation.
- No normalization (e.g. NFC) is performed, so visually and semantically equivalent strings can compare unequal.
- I have not put any effort into optimizing `eq_ignore_case()`; there may be a more performant implementation.
- `char::eq_ignore_case()` is left to future work: it's a potential footgun, so we may want to think more deeply about how to expose and document that API.

@rustbot label T-libs-api A-unicode
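The "lowercase of their uppercase" idea from the notes can be approximated on stable Rust with the existing `char::to_uppercase` and `char::to_lowercase` iterators. The sketch below is an illustration only: the helper names are hypothetical, it does not use the unstable `casefold` feature, and it omits whatever special handling the PR applies to the characters for which the double mapping differs from true Unicode case folding.

```rust
/// Hypothetical helper: yields lowercase(uppercase(c)) for each char of `s`.
/// This approximates case folding for most characters, not all of them.
fn folded(s: &str) -> impl Iterator<Item = char> + '_ {
    s.chars()
        .flat_map(char::to_uppercase)
        .flat_map(char::to_lowercase)
}

/// Hypothetical helper: collects the approximate folding into a `String`.
fn approx_casefold(s: &str) -> String {
    folded(s).collect()
}

/// Hypothetical helper: compares two strings by their approximate foldings
/// without allocating. No normalization (e.g. NFC) is applied.
fn approx_eq_ignore_case(a: &str, b: &str) -> bool {
    folded(a).eq(folded(b))
}
```

Under this sketch, `"Straße"` and `"STRASSE"` compare equal because `ß` uppercases to `SS`, while a precomposed `"\u{e9}"` and the decomposed `"e\u{301}"` compare unequal, which is the normalization caveat the notes describe.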
Rollup of 8 pull requests

Successful merges:

- #157240 (Enable Enzyme for aarch64-apple-darwin)
- #157276 (miri subtree update)
- #154742 (Add APIs for case folding to the standard library)
- #157130 (Use a `ArrayVec` in `CastTarget`)
- #157195 (Move feature gating to the new attr parsing infrastructure)
- #157256 (tests: adapt for LLVM codegen change)
- #157265 (Update books)
- #157277 (triagebot.toml: add LawnGnome to libs reviewers)
|
This pull request was unapproved. This PR was contained in a rollup (#157279), which was unapproved. |
This reverts commit 1ec2ee9, which unfortunately prevented `convert_while_ascii` from vectorizing :( "LLVM gave, and LLVM hath taken away"
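For context on why vectorization of the ASCII path is worth protecting, here is a rough sketch of the general technique: convert the leading ASCII run with a simple byte-wise operation that a compiler can often vectorize, then fall back to the per-`char` Unicode mapping for the rest. The function names are hypothetical and this is not the standard library's `convert_while_ascii`.

```rust
/// Hypothetical helper: lowercases the leading ASCII run of `s` and returns it
/// together with the unprocessed remainder.
fn lowercase_ascii_prefix(s: &str) -> (String, &str) {
    let bytes = s.as_bytes();
    // Index of the first non-ASCII byte; always a char boundary, because
    // every byte before it is a complete one-byte character.
    let split = bytes
        .iter()
        .position(|b| !b.is_ascii())
        .unwrap_or(bytes.len());
    let mut out = String::with_capacity(s.len());
    out.push_str(&s[..split].to_ascii_lowercase());
    (out, &s[split..])
}

/// Hypothetical helper: ASCII fast path followed by the general char mapping.
fn to_lowercase_with_ascii_fast_path(s: &str) -> String {
    let (mut out, rest) = lowercase_ascii_prefix(s);
    out.extend(rest.chars().flat_map(char::to_lowercase));
    out
}
```

The fast path pays off only if the simple loop stays simple, so extra branching inside it can defeat the optimization the commit message refers to.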
|
@rustbot ready |