[WIP][java][python] Add block-level local disk cache for file reads #7699

Open
JingsongLi wants to merge 1 commit into apache:master from JingsongLi:python_cache

@JingsongLi JingsongLi commented Apr 26, 2026

Purpose

Introduce a CachingFileIO wrapper that transparently caches remote file reads at block granularity on local disk. Files are classified by FileType, and only the META, GLOBAL_INDEX, and BUCKET_INDEX types are cached; DATA and FILE_INDEX files are read directly.

Enable via table.copy({"file-cache.enabled": "true"}).
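
For readers unfamiliar with block-level caching, here is a minimal sketch of the idea, not the code in this PR. The FileType names come from the description above; everything else (the `BlockCachingReader` class, the `read_remote(path, offset, length)` callable, the 1 MiB default block size, the hashed cache file names) is a hypothetical stand-in chosen for illustration.

```python
import hashlib
import os
import tempfile
from enum import Enum
from typing import Callable


class FileType(Enum):
    # Names taken from the PR description; the values are illustrative.
    META = "meta"
    GLOBAL_INDEX = "global_index"
    BUCKET_INDEX = "bucket_index"
    DATA = "data"
    FILE_INDEX = "file_index"


# Only these types go through the local disk cache.
CACHEABLE = {FileType.META, FileType.GLOBAL_INDEX, FileType.BUCKET_INDEX}

# Hypothetical remote read: (path, offset, length) -> bytes, short at EOF.
RemoteRead = Callable[[str, int, int], bytes]


class BlockCachingReader:
    """Illustrative stand-in for a caching file IO wrapper (assumed design)."""

    def __init__(self, read_remote: RemoteRead, cache_dir: str,
                 block_size: int = 1024 * 1024):
        self.read_remote = read_remote
        self.cache_dir = cache_dir
        self.block_size = block_size

    def _block_path(self, path: str, index: int) -> str:
        # One local file per (remote path, block index) pair.
        key = hashlib.sha256(f"{path}#{index}".encode()).hexdigest()
        return os.path.join(self.cache_dir, key)

    def _read_block(self, path: str, index: int) -> bytes:
        local = self._block_path(path, index)
        if os.path.exists(local):
            with open(local, "rb") as f:
                return f.read()
        # Cache miss: fetch the whole block, then publish it atomically.
        data = self.read_remote(path, index * self.block_size, self.block_size)
        tmp = local + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, local)
        return data

    def read(self, path: str, file_type: FileType,
             offset: int, length: int) -> bytes:
        if length <= 0:
            return b""
        if file_type not in CACHEABLE:
            # DATA and FILE_INDEX bypass the cache entirely.
            return self.read_remote(path, offset, length)
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        joined = b"".join(self._read_block(path, i)
                          for i in range(first, last + 1))
        start = offset - first * self.block_size
        return joined[start:start + length]
```

In this sketch a read is widened to whole blocks, each block is stored as its own local file, and later reads that fall in the same blocks are served from disk. Eviction, concurrency control, and invalidation are deliberately left out.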

Tests

@JingsongLi JingsongLi changed the title [python] Add block-level local disk cache for file reads [java][python] Add block-level local disk cache for file reads Apr 28, 2026
Introduce a CachingFileIO wrapper that transparently caches remote file
reads at block granularity on local disk. Files are classified by FileType
(ported from Java) and only META, GLOBAL_INDEX, BUCKET_INDEX types are
cached; DATA and FILE_INDEX are read directly.

Enable via table.copy({"file-cache.enabled": "true"}).
@JingsongLi JingsongLi changed the title [java][python] Add block-level local disk cache for file reads [WIP][java][python] Add block-level local disk cache for file reads Apr 30, 2026
@JingsongLi (Contributor, Author)

This configuration option should be catalog-level, and it would be better to name it local-disk.cache.
