Skip to content

fix: S3Downloader - improve concurrent behavior#3327

Merged
anakin87 merged 4 commits into
mainfrom
s3-downloader-cache
May 19, 2026
Merged

fix: S3Downloader - improve concurrent behavior#3327
anakin87 merged 4 commits into
mainfrom
s3-downloader-cache

Conversation

@anakin87
Copy link
Copy Markdown
Member

@anakin87 anakin87 commented May 18, 2026

Related Issues

Proposed Changes:

  • bug: _download_file writes straight to the final cache path, so a concurrent reader can observe a partial PDF mid-write -> download to a temp path and finally move the file to its expected location (no more partial files)
  • other minor race condition: if file_path.is_file(): file_path.touch(). Between the condition and the touch command, another downloader running could potentially delete the file and touch recreates an empty one. -> use utime that does not create the file but only updates timestamp

How did you test it?

CI

Notes for the reviewer

Another major race condition (_cleanup_cache only protects files referenced by the current run, so it can delete files another in-flight run is about to read) is not touched by this PR because it's hard to avoid. A workaround is setting max_cache_size to a high number of files (similar to disabling cache deletion).

Checklist

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 18, 2026

Coverage report (amazon_bedrock)

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  integrations/amazon_bedrock/src/haystack_integrations/components/downloaders/s3
  s3_downloader.py
Project Total  

This report was generated by python-coverage-comment-action

@anakin87 anakin87 marked this pull request as ready for review May 18, 2026 16:42
@anakin87 anakin87 requested a review from a team as a code owner May 18, 2026 16:42
@anakin87 anakin87 requested review from julian-risch and sjrl and removed request for a team and julian-risch May 18, 2026 16:42
Copy link
Copy Markdown
Contributor

@sjrl sjrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good!

@anakin87 anakin87 merged commit 4fdaece into main May 19, 2026
16 checks passed
@anakin87 anakin87 deleted the s3-downloader-cache branch May 19, 2026 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants