Skip to content

fix(server): warn and skip non-numeric filenames in consumer offset directories instead of panicking#3135

Merged
hubcio merged 3 commits into
apache:masterfrom
atharvalade:fix/panic-on-non-numeric-consumer-offset-filenames
May 19, 2026
Merged

fix(server): warn and skip non-numeric filenames in consumer offset directories instead of panicking#3135
hubcio merged 3 commits into
apache:masterfrom
atharvalade:fix/panic-on-non-numeric-consumer-offset-filenames

Conversation

@atharvalade

@atharvalade atharvalade commented Apr 15, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #3132

Rationale

An operator accidentally placing a file (e.g. .DS_Store, editor swap file, backup) in the consumer offsets directory crashes the server on startup due to a panic! on non-numeric filenames.

What changed?

Both load_consumer_offsets and load_consumer_group_offsets panicked on stray files in the consumer offset directories. Several panic and abort paths existed: into_string().unwrap() on non-UTF-8 filenames, dir_entry.unwrap() on per-entry IO errors, metadata.is_err() -> break silently dropping remaining valid offsets, the non-numeric filename panic, and read_exact/File::open via ? aborting the entire load on a single truncated or unreadable file.

All panic and abort paths in both functions are now replaced with warn! + continue, so one bad file never takes down the rest of the load. Note: a genuinely corrupt or renamed offset file that should be numeric is now silently skipped (with a warning), meaning that consumer loses its committed offset and will re-consume on restart. The warn! gives observability so operators can detect this.

Local Execution

  • Passed
  • Pre-commit hooks ran

AI Usage

  1. Opus 4.6
  2. Minimal AI used
  3. All tests verified locally with cargo test, cargo clippy, cargo fmt --check, and CI scripts
  4. Yes, all code can be explained

@codecov

codecov Bot commented Apr 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 64.00000% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.07%. Comparing base (645a4e9) to head (df87087).

Files with missing lines Patch % Lines
core/server/src/streaming/partitions/storage.rs 64.00% 18 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3135       +/-   ##
=============================================
- Coverage     73.77%   51.07%   -22.71%     
  Complexity      943      943               
=============================================
  Files          1200     1198        -2     
  Lines        109106    94848    -14258     
  Branches      85996    71755    -14241     
=============================================
- Hits          80496    48439    -32057     
- Misses        25873    43857    +17984     
+ Partials       2737     2552      -185     
Components Coverage Δ
Rust Core 44.73% <64.00%> (-30.18%) ⬇️
Java SDK 58.44% <ø> (ø)
C# SDK 69.13% <ø> (-0.31%) ⬇️
Python SDK 81.43% <ø> (ø)
Node SDK 91.53% <ø> (ø)
Go SDK 39.91% <ø> (ø)
Files with missing lines Coverage Δ
core/server/src/streaming/partitions/storage.rs 72.36% <64.00%> (+5.16%) ⬆️

... and 300 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR.

Thank you for your contribution!

@github-actions github-actions Bot added the S-stale Inactive issue or pull request label Apr 23, 2026
@hubcio

hubcio commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

can you make it a integration test instead of unit? code in core/server will be replaced when we introduce clustering, thus test will be lost.

@github-actions github-actions Bot removed the S-stale Inactive issue or pull request label Apr 24, 2026
@atharvalade atharvalade force-pushed the fix/panic-on-non-numeric-consumer-offset-filenames branch from 4fc2476 to 39db2e4 Compare April 24, 2026 15:30
@atharvalade

Copy link
Copy Markdown
Contributor Author

can you make it a integration test instead of unit? code in core/server will be replaced when we introduce clustering, thus test will be lost.

Good call, moved the tests from core/server unit tests to core/integration/tests/storage/consumer_offsets.rs. They follow the same pattern as tests/state/ (direct server:: imports, no full harness needed). This way they survive the server ti server-ng migration and can be re-run against the new binary.

@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR.

Thank you for your contribution!

@github-actions github-actions Bot added S-stale Inactive issue or pull request and removed S-stale Inactive issue or pull request labels May 7, 2026
@hubcio

hubcio commented May 14, 2026

Copy link
Copy Markdown
Contributor

/ready

@github-actions github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label May 14, 2026

@hubcio hubcio left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core idea is right - replacing the panic! on non-numeric filenames with warn + skip is the correct direction. but as posted this is a partial fix: into_string().unwrap() and dir_entry.unwrap() in both functions keep the exact startup-panic-on-stray-file behavior #3132 targets, and the regression tests only feed inputs the patch already handles (.DS_Store, backup.bak - all valid utf-8), so they confirm the patch runs but don't prove the bug class is gone. inline comments below.

/author

Comment thread core/server/src/streaming/partitions/storage.rs Outdated
Comment thread core/server/src/streaming/partitions/storage.rs
Comment thread core/server/src/streaming/partitions/storage.rs Outdated
Comment thread core/integration/tests/storage/consumer_offsets.rs
Comment thread core/integration/tests/storage/consumer_offsets.rs
Comment thread core/integration/tests/storage/consumer_offsets.rs Outdated
@hubcio

hubcio commented May 14, 2026

Copy link
Copy Markdown
Contributor

/author

@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels May 14, 2026
@atharvalade atharvalade force-pushed the fix/panic-on-non-numeric-consumer-offset-filenames branch from 5ef2977 to 3a6493b Compare May 17, 2026 07:14
@atharvalade

Copy link
Copy Markdown
Contributor Author

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels May 17, 2026
@hubcio hubcio merged commit a86be3a into apache:master May 19, 2026
80 checks passed
@github-actions github-actions Bot removed the S-waiting-on-review PR is waiting on a reviewer label May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

server: panic on non-numeric filenames in consumer offset directories

4 participants