Fix: FileNotFoundError on Context initialization with stale cache (#5712) #5740

Open
Pavan-Rana wants to merge 6 commits into SQLMesh:main from Pavan-Rana:fix/sqlmesh-stale-cache-file-not-found

@Pavan-Rana

Description

Fixes #5712

When FileCache.__init__() scans the cache directory, it calls file.stat() on every file returned by glob(). In environments with persistent cache directories, a file can be deleted between the glob() call and the subsequent stat() call, resulting in a FileNotFoundError that prevents Context initialization entirely.

This fix wraps the stat() call in a try/except FileNotFoundError block, allowing the cache scan to skip stale entries gracefully rather than crashing.
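The shape of the guard can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual FileCache code: the function name `prune_cache_dir`, its parameters, and the flat cache directory layout are hypothetical.

```python
import time
from pathlib import Path


def prune_cache_dir(cache_dir: Path, version_prefix: str, max_age_seconds: float) -> list:
    """Delete outdated cache files and return the paths that were kept.

    Hypothetical helper for illustration only; it assumes a flat directory
    of regular files and is not SQLMesh's implementation.
    """
    threshold = time.time() - max_age_seconds
    kept = []
    for file in cache_dir.glob("*"):
        # Files from other cache versions are removed without calling stat().
        if not file.name.startswith(version_prefix):
            file.unlink(missing_ok=True)
            continue
        try:
            # Another process may have deleted the file after glob() listed it.
            last_access = file.stat().st_atime
        except FileNotFoundError:
            continue  # stale directory entry: skip it instead of failing
        if last_access < threshold:
            file.unlink(missing_ok=True)
        else:
            kept.append(file)
    return kept
```

Passing `missing_ok=True` to `unlink()` covers the same race on the delete path, where the file may disappear just before it is removed.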

Note: the issue's Option 1 suggested narrowing the glob to glob(f"{self._cache_version}*"), which would skip the startswith check. This implementation keeps the original glob("*") to preserve the existing behaviour of cleaning up files from old cache versions and expired files, while still handling the race condition.
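To illustrate the trade-off, here is a small sketch with made-up file names and version prefixes (`v1`, `v2`), comparing what each glob pattern returns. A narrowed pattern never yields files from older cache versions, so a cleanup loop built on it could not remove them.

```python
import tempfile
from pathlib import Path


def names(paths):
    """Return sorted file names so the comparison is order-independent."""
    return sorted(p.name for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    # Two entries for the current (hypothetical) version "v2", one leftover from "v1".
    for name in ("v2_model_a", "v2_model_b", "v1_model_a"):
        (cache_dir / name).touch()

    everything = names(cache_dir.glob("*"))      # includes the old-version file
    current_only = names(cache_dir.glob("v2*"))  # the old-version file is never seen

print(everything)
print(current_only)
```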

Test Plan

Added test_file_cache_init_handles_stale_file in tests/utils/test_cache.py which:

  • Creates a cache file with the correct version prefix so stat() is forced to be called
  • Monkeypatches Path.stat to raise FileNotFoundError for that specific file, simulating the race condition
  • Asserts that FileCache.__init__() completes without raising

All existing tests pass.
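The approach of the test can be sketched in a self-contained form. This is not the test added in tests/utils/test_cache.py: `TinyCache` is a hypothetical stand-in for FileCache, the `v1` prefix is made up, and `unittest.mock.patch.object` is used here in place of pytest's monkeypatch fixture so that the sketch runs without pytest.

```python
from pathlib import Path
from unittest import mock


class TinyCache:
    """Hypothetical stand-in for a file cache; not SQLMesh's FileCache."""

    def __init__(self, path: Path, version: str) -> None:
        self.entries = []
        for file in path.glob("*"):
            if not file.name.startswith(version):
                continue
            try:
                file.stat()
            except FileNotFoundError:
                continue  # file vanished between glob() and stat()
            self.entries.append(file.name)


def check_init_handles_stale_file(cache_dir: Path) -> TinyCache:
    # The correct version prefix ensures the scan reaches the stat() call.
    stale = cache_dir / "v1_stale_entry"
    stale.touch()
    real_stat = Path.stat

    def flaky_stat(self, *args, **kwargs):
        # Pretend this one file was deleted after glob() listed it.
        if self == stale:
            raise FileNotFoundError(str(self))
        return real_stat(self, *args, **kwargs)

    with mock.patch.object(Path, "stat", flaky_stat):
        return TinyCache(cache_dir, "v1")  # must complete without raising
```

Raising only for the one target path keeps every other `stat()` call working, including any that `glob()` makes on the directory itself.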

Checklist

  • I have run make style and fixed any issues
  • I have added tests for my changes (if applicable)
  • All existing tests pass (make fast-test)
  • My commits are signed off (git commit -s) per the DCO

…nitialisation

Signed-off-by: Pavan-Rana <psrbr157@gmail.com>
Signed-off-by: Pavan-Rana <psrbr157@gmail.com>
@Pavan-Rana
Author

Looping in @themisvaltinos for visibility in case this falls under your area.

This PR fixes a race condition in FileCache.__init__() where a file can be deleted between the glob() and stat() calls, causing a FileNotFoundError during Context initialization.

Summary:

  • Non-breaking fix (skips stale cache entries instead of raising)
  • Preserves existing cache cleanup behavior
  • Adds a unit test to simulate the race condition via monkeypatching

It looks like CI workflows are currently awaiting approval, so checks haven’t run yet.

Would a maintainer be able to:

  1. approve the pending workflows so CI can run
  2. take a look or tag the right reviewer for this area

Happy to make any changes if needed.

Development

Successfully merging this pull request may close these issues.

# SQLMesh FileNotFoundError on Context Initialization with Stale Cache