Add an optional group_sep bool parameter for str.split #146540
Description
Feature or enhancement
Rationale:
Currently, we have two different behaviors for str.split depending on the sep value:
- If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings.
- If sep is not specified or is None, runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
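For reference, the two existing behaviors can be seen side by side with a short example. The sample string is arbitrary; only the current str.split is used.

```python
text = "  a  b "

# Explicit separator: every single space delimits, so empty strings appear.
with_sep = text.split(" ")
print(with_sep)      # ['', '', 'a', '', 'b', '']

# No separator (or None): runs of whitespace collapse, no empty strings.
without_sep = text.split()
print(without_sep)   # ['a', 'b']
```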
While both behaviors are useful, the second one (grouping separators) is limited to whitespace and cannot be used with custom separators, even though the logic is already present in the code.
Proposal:
I suggest adding an optional group_sep bool parameter, which will be used as follows:
- By default, group_sep = False.
- If sep is not specified or is None, group_sep has no effect (no changes to the existing behavior).
- If sep is given, group_sep = False will preserve the current behavior.
- If sep is given, group_sep = True will treat consecutive sep characters as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing sep characters (mimicking the current behavior for non-specified sep).
The above will preserve the existing behavior if group_sep is not specified, and will allow grouping separators by explicitly setting this parameter to True.
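A rough way to picture the proposed semantics is a pure-Python stand-in. The function name split_with_group_sep is hypothetical, group_sep does not exist in str.split today, and this sketch ignores maxsplit; it only approximates the four cases listed above by dropping empty fields.

```python
def split_with_group_sep(s, sep=None, group_sep=False):
    """Hypothetical stand-in for the proposed s.split(sep, group_sep=...)."""
    if sep is None:
        # sep not given: group_sep has no effect, whitespace is already grouped.
        return s.split()
    parts = s.split(sep)
    if group_sep:
        # Grouping separators is equivalent here to discarding empty fields,
        # which also removes the empty strings at the start and end.
        parts = [p for p in parts if p]
    return parts


print(split_with_group_sep("a,,b,", ","))                   # ['a', '', 'b', '']
print(split_with_group_sep("a,,b,", ",", group_sep=True))   # ['a', 'b']
print(split_with_group_sep("  a  b ", group_sep=True))      # ['a', 'b']
```

Without maxsplit, filtering is enough; with maxsplit the remainder of the string has to be handled separately, which is where reusing the whitespace logic would matter.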
Implementation:
The needed logic is already defined in STRINGLIB(split_whitespace). To reuse it, we could convert this function to one that accepts an is_separator function as a parameter, and then declare STRINGLIB(split_whitespace) and STRINGLIB(split_char_group) (or whatever) as its wrappers, passing the right parameter value.
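The refactoring idea can be modeled in Python as one generic routine parameterized by an is_separator predicate, with thin wrappers on top. This is only a sketch of the approach, not the C code in stringlib: the names split_grouped, split_whitespace_model and split_char_group_model are made up for illustration, and the loop is written to reproduce what str.split() does for whitespace, including maxsplit.

```python
import sys
from typing import Callable, List


def split_grouped(s: str, is_separator: Callable[[str], bool],
                  maxsplit: int = -1) -> List[str]:
    """Split s on runs of characters for which is_separator returns True."""
    if maxsplit < 0:
        maxsplit = sys.maxsize
    parts = []
    i, n = 0, len(s)
    while maxsplit > 0:
        maxsplit -= 1
        # Skip a run of separators.
        while i < n and is_separator(s[i]):
            i += 1
        if i == n:
            break
        # Collect one field up to the next separator.
        j = i
        i += 1
        while i < n and not is_separator(s[i]):
            i += 1
        parts.append(s[j:i])
    if i < n:
        # maxsplit was reached: skip separators, keep the rest unsplit.
        while i < n and is_separator(s[i]):
            i += 1
        if i != n:
            parts.append(s[i:])
    return parts


def split_whitespace_model(s: str, maxsplit: int = -1) -> List[str]:
    # Wrapper corresponding to the existing whitespace behavior.
    return split_grouped(s, str.isspace, maxsplit)


def split_char_group_model(s: str, sep: str, maxsplit: int = -1) -> List[str]:
    # Wrapper for a single-character custom separator (an assumption here).
    return split_grouped(s, lambda ch: ch == sep, maxsplit)
```

For example, split_char_group_model(",,a,,b,", ",") gives ['a', 'b'], and split_whitespace_model("  a  b  c ", 1) gives ['a', 'b  c '], the same as "  a  b  c ".split(None, 1).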
If you are on board with the suggestion, I could submit a PR.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response