Summary
Following review feedback on PR #1311, we should refactor the storage layer to use consistent URL representation for all data sources, including local files.
Current Behavior
- Remote paths use URLs: s3://bucket/path, gs://bucket/path
- Local paths use raw filesystem paths: /path/to/file
- The is_remote_url() function distinguishes between the two
Proposed Change
- Accept both formats from users: /path/to/file and file:///path/to/file
- Normalize to URLs internally: Convert all paths to URL format (file:// for local)
- Store URLs consistently in the database
- Leverage fsspec uniformity: fsspec already treats all backends (including local) uniformly via URLs
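A minimal sketch of the normalization step, to make the proposal concrete. The name normalize_to_url and the handling of relative paths (made absolute against the current working directory, symlinks left unresolved) are assumptions for illustration, not decisions already made in the codebase:

```python
import os
import re
from pathlib import Path

# Matches a leading URL scheme such as "s3://", "gs://", or "file://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def normalize_to_url(path: str) -> str:
    """Return `path` unchanged if it is already a URL, else a file:// URL.

    Hypothetical helper: the name and the relative-path policy are
    assumptions, not part of the existing storage layer.
    """
    if _SCHEME_RE.match(path):
        # Already a URL (remote or file://), so pass it through untouched.
        return path
    # Assumed policy: expand "~" and make the path absolute without
    # resolving symlinks, then let pathlib build the file:// URL.
    absolute = os.path.abspath(os.path.expanduser(path))
    return Path(absolute).as_uri()
```

With this in place, user input in either format would reach the database in a single form, and remote URLs would be stored exactly as given.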
Benefits
- Coherent internal representation
- Simpler codebase - no special-casing for local vs remote
- Better alignment with fsspec's design philosophy
- Avoids potential bugs from inconsistent handling
Implementation Notes
- Add file:// to REMOTE_PROTOCOLS (or rename to URL_PROTOCOLS)
- Create helper to normalize user paths to URLs
- Update StorageBackend to work with URLs consistently
- Ensure backward compatibility for existing stored paths
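One possible shape for the backward-compatibility piece is a read-side shim that treats legacy raw paths as file URLs. This is only a sketch: the contents of URL_PROTOCOLS are assumed from the schemes mentioned in this issue, and split_protocol is a hypothetical name, not an existing function in the storage layer:

```python
# Assumed contents: only the schemes mentioned in this issue.
URL_PROTOCOLS = ("s3", "gs", "file")


def split_protocol(stored: str) -> tuple[str, str]:
    """Split a stored value into (protocol, path).

    Values written before the migration have no scheme, so they are
    treated as local files. Hypothetical helper for illustration.
    """
    protocol, sep, rest = stored.partition("://")
    if not sep:
        # Legacy raw filesystem path stored before URLs were normalized.
        return "file", stored
    if protocol not in URL_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return protocol, rest
```

Reading through a shim like this would let existing rows keep working without an immediate data migration. Rewriting stored paths to file:// URLs could then happen separately.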