
fix: avoid unicode filepath suffix panic #393

Open

tmdgusya wants to merge 2 commits into dmtrKovalenko:main from tmdgusya:fix/unicode-filepath-suffix-panic


@tmdgusya

Summary

  • fix a remaining UTF-8 boundary panic in path_ends_with_suffix()
  • return false instead of panicking when a byte-derived suffix offset lands inside a multibyte character
  • add regression tests for the helper and for apply_constraints(Constraint::FilePath(...))

Similar PR / duplicate check

I checked existing PRs before opening this:

  • closest prior fix: #373 (fix: Unicode segmentation crash)
  • no open PR currently covers this remaining Constraint::FilePath / path_ends_with_suffix() panic path

This PR is intentionally narrow: it fixes the unchecked path[start..] slice in crates/fff-core/src/constraints.rs without changing matching semantics.

Root cause

path_ends_with_suffix() computed:

let start = path.len() - suffix.len();

and then sliced with:

path[start..]

start is a byte offset, not guaranteed to be a UTF-8 char boundary. For Unicode filenames, a non-matching suffix can make start land inside a multibyte codepoint, which panics before constraint filtering can return false.
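A minimal sketch of the offset arithmetic, using the synthetic fixture name from the regression tests. Each Hangul syllable is 3 bytes in UTF-8, so the 38-byte path data/유니코드_파일_테스트.csv has many byte offsets that are not char boundaries. A 5-byte suffix such as x.csv puts start at byte 33, inside 트, while .csv puts it at byte 34, on the '.'. The helper names suffix_byte_offset and slice_would_panic are hypothetical and exist only for illustration.

```rust
/// Hypothetical helper: the byte offset where a suffix of `suffix.len()` bytes
/// would begin inside `path`. Mirrors `path.len() - suffix.len()`, but returns
/// None instead of underflowing when the suffix is longer than the path.
fn suffix_byte_offset(path: &str, suffix: &str) -> Option<usize> {
    path.len().checked_sub(suffix.len())
}

/// Hypothetical helper: `&path[start..]` panics exactly when `start` is not a
/// UTF-8 char boundary (or lies past the end of the string).
fn slice_would_panic(path: &str, start: usize) -> bool {
    !path.is_char_boundary(start)
}
```

Calling slice_would_panic(path, 33) on the fixture path returns true, which is the situation where the old path[start..] indexing aborted instead of letting the constraint return false.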

Fix

Use path.get(start..) instead of unchecked indexing:

  • if start is not a valid char boundary, return false
  • otherwise, preserve the existing ASCII case-insensitive suffix comparison and the '/' boundary behavior
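A minimal sketch of the patched helper, assuming a fn(&str, &str) -> bool signature. Only the path.get(start..) guard and the ASCII case-insensitive comparison come from the description above. The exact '/' rule used here (the suffix must start at the beginning of the path or immediately after a '/') is an assumption for illustration; the real code in crates/fff-core/src/constraints.rs may differ.

```rust
/// Sketch of the fixed suffix check. ASSUMPTION: the '/' boundary rule below is
/// illustrative and may not match the real implementation exactly.
fn path_ends_with_suffix(path: &str, suffix: &str) -> bool {
    // Avoid underflow in `path.len() - suffix.len()`.
    if suffix.len() > path.len() {
        return false;
    }
    let start = path.len() - suffix.len();

    // `str::get` returns None instead of panicking when `start` is not on a
    // UTF-8 char boundary; that case can never be a match, so return false.
    let Some(tail) = path.get(start..) else {
        return false;
    };

    // ASCII case-insensitive comparison; non-ASCII bytes must match exactly.
    if !tail.eq_ignore_ascii_case(suffix) {
        return false;
    }

    // Assumed boundary rule: the suffix must begin a path component.
    start == 0 || path.as_bytes()[start - 1] == b'/'
}
```

Returning false on a non-boundary offset does not lose any real match: if start is not a char boundary, the tail begins with a UTF-8 continuation byte (0x80..=0xBF), while a valid &str suffix never starts with one, and ASCII case folding leaves such bytes untouched. This is why the fix can prevent the panic without changing matching semantics.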

Verification

I avoided using any user-specific filename in tests and instead used synthetic Unicode fixture names.

Reproduction guard added

New tests in crates/fff-core/src/constraints.rs:

  • test_path_ends_with_suffix_does_not_panic_on_unicode_suffix
  • test_apply_constraints_file_path_with_unicode_suffix
  • test_path_contains_segment_does_not_panic_on_unicode_segment

The important regression case uses a synthetic filename such as:

  • data/유니코드_파일_테스트.csv

and a deliberately non-matching suffix that would previously place the byte offset in the middle of a multibyte character.
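One way to make this regression guard exhaustive rather than relying on a single hand-picked suffix is to enumerate every byte offset that falls inside a multibyte character and build an ASCII filler suffix of the matching length. This is a sketch; mid_char_suffixes is a hypothetical test utility, not something the PR adds.

```rust
/// Hypothetical test utility: for every byte offset that falls inside a
/// multibyte character of `path`, return an ASCII suffix whose length puts
/// `path.len() - suffix.len()` exactly on that offset.
fn mid_char_suffixes(path: &str) -> Vec<String> {
    (1..path.len())
        .filter(|&i| !path.is_char_boundary(i))
        .map(|i| "x".repeat(path.len() - i))
        .collect()
}
```

For data/유니코드_파일_테스트.csv this yields 18 suffixes (9 Hangul syllables, each with 2 interior bytes). Each one can be passed to path_ends_with_suffix() or to apply_constraints(Constraint::FilePath(...)) with an assertion that the result is false and that no panic occurs.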

Commands run

cargo test -p fff-search constraints::tests -- --nocapture

Result

All constraint tests pass locally after the fix, including the new Unicode regression coverage.

Scope notes

  • no parser changes
  • no matching behavior expansion
  • no Unicode normalization/case-folding changes
  • only panic prevention for valid UTF-8 input in the file path suffix constraint path
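To illustrate the case-folding scope note: the comparison stays ASCII case-insensitive, so non-ASCII letters are still compared case-sensitively. The snippet uses only standard-library str methods to show the difference from full Unicode lowercasing; the two function names are hypothetical.

```rust
/// ASCII-only case folding, as done by `str::eq_ignore_ascii_case`:
/// bytes >= 0x80 are compared exactly.
fn ascii_insensitive_eq(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}

/// Full Unicode lowercasing via `str::to_lowercase`, which this PR does NOT
/// introduce.
fn unicode_lowercase_eq(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}
```

With these definitions, "REPORT.CSV" and "report.csv" compare equal under both functions, but "ÉTÉ.csv" and "été.csv" are equal only under full Unicode lowercasing.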

@dmtrKovalenko
Owner

@copilot resolve the merge conflicts in this pull request
