Deprecate the multi-patterns cudf::strings::replace_re API #22380

Merged
rapids-bot[bot] merged 4 commits into rapidsai:main from davidwendt:dep-multi-replace-re on May 12, 2026

Conversation

@davidwendt
Contributor

@davidwendt davidwendt commented May 5, 2026

Description

Deprecates the cudf::strings::replace_re function that accepts multiple regex patterns and replacements. This API does not follow the other regex APIs, which all accept a regex_program parameter, and it has become difficult to maintain. This function pattern is not supported by pandas, and there is no JNI wrapper for it either.
After trying to create a libcudf benchmark for this API, I found that the function crashes if called with more than a few dozen rows, even with only 2 patterns. The crash is due to a bug in the code that has never been reported (the bug was introduced 4 years ago according to git). Therefore, I have complete confidence that this API has never been used and can be removed in a future release.

The gtests have also been removed to prevent deprecation warnings.
This PR also includes a fix for the bug for completeness.
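
For any caller that does rely on the multi-pattern form, one possible migration is to apply the patterns one at a time through a single-pattern replace. The sketch below illustrates only that chaining idea in plain Python, with the standard `re` module standing in for the libcudf regex engine. `replace_each` is a hypothetical helper and not part of cuDF, and chained passes may give different results from the deprecated function when a later pattern matches text produced by an earlier replacement.

```python
import re


def replace_each(strings, patterns, replacements):
    """Apply each (pattern, replacement) pair in order, one single-pattern pass at a time.

    Hypothetical helper for illustration only; not a cuDF or libcudf API.
    """
    if len(patterns) != len(replacements):
        raise ValueError("patterns and replacements must have the same length")

    # Compile each pattern once, loosely analogous to building one regex program per pattern.
    compiled = [re.compile(p) for p in patterns]

    result = list(strings)
    for prog, repl in zip(compiled, replacements):
        # None stands in for a null row and is passed through unchanged.
        result = [s if s is None else prog.sub(repl, s) for s in result]
    return result


rows = ["the quick fox", "a b  c", None]
print(replace_each(rows, [r"\s+", r"quick"], ["_", "slow"]))
```

Each pass sees the output of the previous one, so the order of the patterns matters.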

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt self-assigned this May 5, 2026
@davidwendt davidwendt requested a review from a team as a code owner May 5, 2026 16:50
@davidwendt davidwendt added the bug and 3 - Ready for Review labels May 5, 2026
@davidwendt davidwendt requested review from ttnghia and vyasr May 5, 2026 16:50
@davidwendt davidwendt added the libcudf and non-breaking labels May 5, 2026
@davidwendt davidwendt requested a review from a team as a code owner May 6, 2026 14:47
@github-actions github-actions Bot added the Python label May 6, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python May 6, 2026
Contributor

@mroeschke mroeschke left a comment

Just a comment about the Python documentation

Comment thread python/cudf/cudf/core/accessors/string.py
@davidwendt davidwendt moved this to Burndown in libcudf May 11, 2026
@davidwendt
Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit 05bc74e into rapidsai:main May 12, 2026
252 of 254 checks passed
@davidwendt davidwendt deleted the dep-multi-replace-re branch May 12, 2026 17:15
@github-project-automation github-project-automation Bot moved this from In Progress to Done in cuDF Python May 12, 2026
TomAugspurger pushed a commit to TomAugspurger/pygdf that referenced this pull request May 12, 2026
…22380)

Deprecates the `cudf::strings::replace_re` function that accepts multiple regex patterns and replacements. This API does not follow the other regex APIs, which all accept a `regex_program` parameter, and it has become difficult to maintain. This function pattern is not supported by pandas, and there is no JNI wrapper for it either.
After trying to create a libcudf benchmark for this API, I found that the function crashes if called with more than a few dozen rows, even with only 2 patterns. The crash is due to a bug in the code that has never been reported (the bug was introduced 4 years ago according to git). Therefore, I have complete confidence that this API has never been used and can be removed in a future release.

The gtests have also been removed to prevent deprecation warnings. 
This PR also includes a fix for the bug for completeness.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Bradley Dice (https://github.com/bdice)
  - Lawrence Mitchell (https://github.com/wence-)
  - Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#22380
shrshi pushed a commit to shrshi/cudf that referenced this pull request May 12, 2026
@vuule vuule moved this from Burndown to Landed in libcudf May 12, 2026

Labels

  • 3 - Ready for Review: Ready for review by team
  • bug: Something isn't working
  • libcudf: Affects libcudf (C++/CUDA) code.
  • non-breaking: Non-breaking change
  • Python: Affects Python cuDF API.

Projects

cuDF Python, Status: Done
libcudf, Status: Landed

7 participants