Add an optional group_sep bool parameter for str.split #146540

@denyshon

Description

Feature or enhancement

Rationale:

Currently, we have two different behaviors for str.split depending on the sep value:

  • If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings.
  • If sep is not specified or is None, runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
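
For illustration, the two existing behaviors can be seen with the built-in str.split alone (the sample strings are arbitrary):

```python
# Behavior 1: sep is given, so every delimiter counts on its own
# and adjacent delimiters produce empty strings.
with_sep = "a,,b,".split(",")
print(with_sep)   # ['a', '', 'b', '']

# Behavior 2: sep is omitted, so runs of whitespace act as one
# separator and leading/trailing whitespace yields no empty strings.
without_sep = "  a   b ".split()
print(without_sep)   # ['a', 'b']
```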

Both behaviors are useful, but the second one (grouping separators) is available only for whitespace. It cannot be used with a custom separator, even though the logic is already present in the code.

Proposal:

I suggest adding an optional group_sep bool parameter, which will be used as follows:

  • By default, group_sep = False.
  • If sep is not specified or is None, group_sep has no effect (no changes to the existing behavior).
  • If sep is given, group_sep = False will preserve the current behavior.
  • If sep is given, group_sep = True will treat consecutive sep characters as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing sep characters (mimicking the current behavior for non-specified sep).

The above will preserve the existing behavior if group_sep is not specified, and will allow grouping separators by explicitly setting this parameter to True.
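
As a rough sketch of the intended result for group_sep = True, the behavior can be approximated in pure Python today. The helper name split_grouped is hypothetical and not part of any existing API. The sketch ignores maxsplit and treats a run as repeated occurrences of the whole sep string, which is an assumption for multi-character separators:

```python
def split_grouped(s: str, sep: str) -> list[str]:
    """Approximate str.split(sep, group_sep=True) as proposed (hypothetical helper).

    Splitting normally and then dropping empty strings has the same effect as
    treating each run of separators as one and trimming leading/trailing ones.
    """
    # str.split itself raises ValueError for an empty separator.
    return [part for part in s.split(sep) if part]


print(split_grouped(",,a,,b,", ","))   # ['a', 'b']
print(",,a,,b,".split(","))            # ['', '', 'a', '', 'b', ''] (current behavior)
```

A built-in parameter would avoid building and then filtering the intermediate list, and could define how maxsplit interacts with grouping.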

Implementation:

The needed logic is already defined in STRINGLIB(split_whitespace). To reuse it, we could convert this function to one that accepts an is_separator function as a parameter, and then declare STRINGLIB(split_whitespace) and STRINGLIB(split_char_group) (or whatever) as its wrappers, passing the right parameter value.
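
To make the refactoring idea concrete, here is a Python model of it. It is only a sketch of the structure, not the actual C code: split_by_predicate is a hypothetical name, the two wrappers borrow the names used above, and the custom-separator wrapper assumes a single-character sep.

```python
from typing import Callable


def split_by_predicate(
    s: str, is_separator: Callable[[str], bool], maxsplit: int = -1
) -> list[str]:
    """Generic grouping split, parameterized by a per-character predicate."""
    parts: list[str] = []
    i, n = 0, len(s)
    # A string of length n holds at most n fields, so n splits is "unlimited".
    remaining = maxsplit if maxsplit >= 0 else n
    while remaining > 0:
        while i < n and is_separator(s[i]):   # skip a run of separators
            i += 1
        if i == n:
            break
        start = i
        while i < n and not is_separator(s[i]):   # consume one field
            i += 1
        parts.append(s[start:i])
        remaining -= 1
    # After maxsplit is exhausted, the rest (minus leading separators) is one field.
    while i < n and is_separator(s[i]):
        i += 1
    if i < n:
        parts.append(s[i:])
    return parts


def split_whitespace(s: str, maxsplit: int = -1) -> list[str]:
    # Wrapper modelling the existing whitespace path.
    return split_by_predicate(s, str.isspace, maxsplit)


def split_char_group(s: str, sep: str, maxsplit: int = -1) -> list[str]:
    # Wrapper modelling the proposed path; assumes len(sep) == 1.
    return split_by_predicate(s, lambda ch: ch == sep, maxsplit)
```

With this shape, split_whitespace("  a  b  ", 1) gives ['a', 'b  '], matching "  a  b  ".split(None, 1), and split_char_group(",,a,,b,", ",") gives ['a', 'b'].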

If you are on board with the suggestion, I could submit a PR.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere.

Links to previous discussion of this feature:

No response

Metadata

    Labels

    type-feature: A feature request or enhancement
