Skip to content

fix: replace with empty search string should be a no-op#22497

Open
Amogh-2404 wants to merge 1 commit into
apache:mainfrom
Amogh-2404:fix/issue-22253-replace-empty-search
Open

fix: replace with empty search string should be a no-op#22497
Amogh-2404 wants to merge 1 commit into
apache:mainfrom
Amogh-2404:fix/issue-22253-replace-empty-search

Conversation

@Amogh-2404
Copy link
Copy Markdown

Which issue does this PR close?

Rationale for this change

PostgreSQL returns the input unchanged when replace is called with an empty from. DataFusion was instead inserting to before every character and at both ends, so replace('abc', '', 'x') returned xaxbxcx. This PR brings the behaviour in line with PostgreSQL. Part of the PG-compatibility cleanup tracked in #22247.

What changes are included in this PR?

  • datafusion/functions/src/string/replace.rs: the empty-from branch in apply_replace now writes the input verbatim instead of inserting to. Added a LargeUtf8 unit test for the new behaviour.
  • datafusion/sqllogictest/test_files/string/string_literal.slt: four new SLT asserts covering the Utf8, Dictionary, Utf8View, and LargeUtf8 paths.
  • datafusion/sqllogictest/test_files/string/string_query.slt.part: updated four expected rows that were asserting the old buggy output.

Are these changes tested?

Yes. The unit test in replace.rs covers the LargeUtf8 path, and the four new SLT asserts in string_literal.slt cover the remaining Arrow string encodings end-to-end. The full SLT suite passes locally.

Are there any user-facing changes?

Yes. replace(str, '', x) now returns str unchanged instead of inserting x between every character. This matches PostgreSQL.

PostgreSQL returns the input unchanged when `replace` is called with an empty `from`. DataFusion was instead inserting `to` before every character and at both ends - so `replace('abc', '', 'x')` returned `xaxbxcx`. Fix the empty-`from` branch in `apply_replace` to write the input verbatim. Update the regression tests in `string_query.slt.part` that were asserting the buggy output.

Closes apache#22253

Signed-off-by: Amogh Ramesh <ramogh2404@gmail.com>
@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 24, 2026
@Omega359
Copy link
Copy Markdown
Contributor

LGTM, thanks!

@Jefffrey
Copy link
Copy Markdown
Contributor

We have an existing PR for this issue:

@Amogh-2404
Copy link
Copy Markdown
Author

We have an existing PR for this issue:

Thanks @Jefffrey — I missed #22357 when I picked this up, apologies for the duplicate. Happy to close this in favour of it.
In case it helps unblock #22357's CI: the cargo test failure looks like the empty-from regression rows in string_query.slt.part still asserting the old output (this PR updates those four rows), and the benchmark failure is likely the benches/replace.rs comment @neilconway mentioned. Glad to help either way.

@Jefffrey
Copy link
Copy Markdown
Contributor

I suppose if #22357 doesn't become active again we can proceed with this PR instead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PostgreSQL compatibility: replace with an empty search string should be a no-op

3 participants