fix: align regexp_instr empty pattern #22286

Open
Sean-Kenneth-Doherty wants to merge 2 commits into apache:main from Sean-Kenneth-Doherty:codex/regexp-instr-empty-pattern

Conversation

@Sean-Kenneth-Doherty

Which issue does this PR close?

Rationale for this change

PostgreSQL treats an empty regular expression in regexp_instr as a zero-width match. DataFusion previously special-cased empty patterns to return 0, so regexp_instr('abc', '') returned 0 in DataFusion where PostgreSQL returns 1.

What changes are included in this PR?

  • Handles empty regexp_instr patterns as zero-width matches at start + N - 1, returning 0 when that position is past the end of the string.
  • Keeps existing regex/flag validation for empty patterns.
  • Lets start/N validation run before empty-string handling.
  • Adds Rust unit coverage and sqllogictest coverage for empty-pattern behavior.
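
The position rule in the list above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `empty_pattern_instr`, its signature, and the error messages are hypothetical. It also assumes positions are counted in characters and that the position just after the last character still counts as a match; the description does not spell out either point.

```rust
/// Hypothetical helper illustrating the empty-pattern rule; not DataFusion's actual code.
/// `start` and `n` are 1-based, as in the SQL function.
fn empty_pattern_instr(value: &str, start: i64, n: i64) -> Result<i64, String> {
    // Argument validation runs first, before any empty-pattern shortcut.
    if start < 1 {
        return Err(format!("start must be at least 1, got {start}"));
    }
    if n < 1 {
        return Err(format!("N must be at least 1, got {n}"));
    }

    // Assumption: positions are counted in characters, not bytes.
    let char_count = value.chars().count() as i64;

    // The N-th zero-width match counted from `start` sits at start + N - 1.
    // saturating_add avoids overflow for very large arguments.
    let position = start.saturating_add(n - 1);

    // Assumption: position char_count + 1 (just after the last character)
    // is still a valid zero-width match; anything beyond it returns 0.
    if position > char_count + 1 {
        Ok(0)
    } else {
        Ok(position)
    }
}
```

Under these assumptions, `empty_pattern_instr("abc", 1, 1)` yields `Ok(1)`, matching the PostgreSQL result named in the linked issue, while an invalid `start` or `N` is rejected before the empty-pattern branch is reached.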

Are these changes tested?

  • cargo fmt --all
  • cargo test -p datafusion-functions regex::regexpinstr::tests::test_regexp_instr -- --nocapture
  • cargo test --profile=ci --test sqllogictests -- regexp/regexp_instr.slt
  • TMPDIR=/home/sean/Projects/datafusion-contrib/target/tmp cargo clippy --all-targets --all-features -- -D warnings
  • git diff --check

Are there any user-facing changes?

Yes. regexp_instr now matches PostgreSQL behavior for empty regular expression patterns.

@github-actions (bot) added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels on May 17, 2026
Labels

functions (Changes to functions implementation), sqllogictest (SQL Logic Tests (.slt))

Development

Successfully merging this pull request may close these issues.

PostgreSQL compatibility: regexp_instr with an empty pattern should return 1