Add option to cache unknown bucket type #844

Open
Yonghui-Lee wants to merge 1 commit into fsspec:main from Yonghui-Lee:cache-unknown-buckets-pure

Conversation

@Yonghui-Lee
Collaborator

This PR adds a `cache_unknown_buckets` configuration option that allows the UNKNOWN bucket type to be cached.

Currently, `ExtendedGcsFileSystem` attempts to detect the bucket type (zonal, HNS, etc.) using the Storage Control API. If the user lacks permissions for this API, or is using an emulator that doesn't support it, the lookup fails and the bucket type falls back to UNKNOWN.

Since UNKNOWN types are not cached by default, every subsequent operation on the bucket triggers another failing API call, and these repeated slow lookups degrade performance significantly.
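
To illustrate the behavior described above, here is a minimal, self-contained sketch. It is not the gcsfs implementation: `BucketType`, `BucketTypeResolver`, `failing_lookup`, and the `lookup_calls` counter are hypothetical names invented for this example, and only the `cache_unknown_buckets` option name and its default of `False` come from this PR. The sketch counts how often the slow lookup runs with and without the option.

```python
from enum import Enum


class BucketType(Enum):
    # Hypothetical stand-in for the library's bucket-type values.
    ZONAL = "zonal"
    HNS = "hns"
    UNKNOWN = "unknown"


class BucketTypeResolver:
    """Toy model of bucket-type detection with an optional UNKNOWN cache."""

    def __init__(self, lookup, cache_unknown_buckets=False):
        self._lookup = lookup  # callable: bucket name -> BucketType, may raise
        self._cache_unknown_buckets = cache_unknown_buckets
        self._cache = {}
        self.lookup_calls = 0  # number of times the slow lookup was invoked

    def get_bucket_type(self, bucket):
        if bucket in self._cache:
            return self._cache[bucket]
        self.lookup_calls += 1
        try:
            bucket_type = self._lookup(bucket)
        except Exception:
            # Lookup failed (missing permission, unsupported emulator, ...).
            bucket_type = BucketType.UNKNOWN
        # Known types are always cached; UNKNOWN only when the user opted in.
        if bucket_type is not BucketType.UNKNOWN or self._cache_unknown_buckets:
            self._cache[bucket] = bucket_type
        return bucket_type


def failing_lookup(bucket):
    # Simulates a caller without access to the bucket-type API.
    raise PermissionError(f"cannot read bucket metadata for {bucket}")


default = BucketTypeResolver(failing_lookup)
opted_in = BucketTypeResolver(failing_lookup, cache_unknown_buckets=True)
for _ in range(3):
    default.get_bucket_type("my-bucket")
    opted_in.get_bucket_type("my-bucket")

print(default.lookup_calls, opted_in.lookup_calls)  # prints: 3 1
```

With the default setting, each of the three operations repeats the failing lookup; with the option enabled, only the first one does.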

@codecov

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.56%. Comparing base (991faba) to head (65c0c9b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #844      +/-   ##
==========================================
+ Coverage   88.52%   88.56%   +0.03%     
==========================================
  Files          15       15              
  Lines        2989     2990       +1     
==========================================
+ Hits         2646     2648       +2     
+ Misses        343      342       -1     


Comment thread gcsfs/extended_gcsfs.py
- retry_multiplier: Multiplier for delay between retries.
These map to `google.api_core.retry.AsyncRetry` arguments (without 'retry_' prefix).
"""
self._cache_unknown_buckets = kwargs.pop("cache_unknown_buckets", False)
Collaborator

@zhixiangli zhixiangli May 13, 2026

If a user sets `cache_unknown_buckets` to `True` and has the required permissions, is the UNKNOWN result still cached when a transient error occurs? Could we check whether it is a permission error, or cache with a TTL, so that access granted later is picked up? WDYT?
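
One possible shape for the suggestion above, as a rough sketch rather than a proposal for the actual code: cache the UNKNOWN result only for permission failures, and let the entry expire after a TTL. Everything here is hypothetical (`UnknownBucketCache`, `record_failure`, `is_cached_unknown`, the 300-second default), and the built-in `PermissionError` stands in for whatever exception the Storage Control client actually raises.

```python
import time


class UnknownBucketCache:
    """Remembers buckets whose type lookup was denied, for a limited time."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._expires_at = {}

    def record_failure(self, bucket, error):
        # Cache only permission failures; transient errors should be retried.
        if isinstance(error, PermissionError):
            self._expires_at[bucket] = self._clock() + self._ttl

    def is_cached_unknown(self, bucket):
        expires_at = self._expires_at.get(bucket)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            # Entry expired: forget it so the next operation retries the lookup.
            del self._expires_at[bucket]
            return False
        return True


# Usage with a fake clock, so no real waiting is needed.
now = [0.0]
cache = UnknownBucketCache(ttl_seconds=60.0, clock=lambda: now[0])
cache.record_failure("denied-bucket", PermissionError("403"))
cache.record_failure("flaky-bucket", TimeoutError("deadline exceeded"))
```

In this sketch a timeout never populates the cache, so a user with valid permissions is not stuck on UNKNOWN after a transient failure, and a denied bucket is re-checked once the TTL elapses.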

@Yonghui-Lee Yonghui-Lee force-pushed the cache-unknown-buckets-pure branch from 6adac07 to 65c0c9b Compare May 13, 2026 07:50
2 participants