| title | Local Disk Cache | ||
|---|---|---|---|
| weight | 8 | ||
| type | docs | ||
| aliases | | ||
When reading files from remote storage (S3, OSS, HDFS, etc.), each seek+read goes over the network. Paimon provides a block-level local disk cache that transparently caches file reads on local disk, significantly reducing remote I/O for repeated access patterns.
The cache classifies files by type. By default, only the `meta` and `global-index` types are cached. You can customize this via the `file-cache.whitelist` option.
| File Type | Config Name | Examples | Default Cached |
|---|---|---|---|
| META | meta | snapshot, schema, manifest, statistics, tag | Yes |
| GLOBAL_INDEX | global-index | BTree, Lumina, Tantivy index files | Yes |
| BUCKET_INDEX | bucket-index | Hash, deletion vector index files | No |
| DATA | data | Data files (ORC, Parquet, etc.) | No |
| FILE_INDEX | file-index | Data-file level bloom filter, bitmap | No |
Any of the file types above can be added to the whitelist. The default whitelist is `meta,global-index`.
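As a rough illustration of the whitelist semantics, the sketch below parses a comma-separated whitelist and checks whether a given file type would be cached. The helper names `parse_whitelist` and `is_cached` are hypothetical and not part of the Paimon API, and rejecting unknown values is an assumption of this sketch rather than documented behavior.

```python
from typing import Set

# Config names taken from the table above.
SUPPORTED_TYPES = {"meta", "global-index", "bucket-index", "data", "file-index"}
DEFAULT_WHITELIST = "meta,global-index"


def parse_whitelist(value: str = DEFAULT_WHITELIST) -> Set[str]:
    """Split a comma-separated whitelist into a set of file types.

    Hypothetical helper: raising on unknown types is an assumption of this sketch.
    """
    types = {item.strip() for item in value.split(",") if item.strip()}
    unknown = types - SUPPORTED_TYPES
    if unknown:
        raise ValueError(f"Unsupported file types: {sorted(unknown)}")
    return types


def is_cached(file_type: str, whitelist: Set[str]) -> bool:
    """Return True if files of this type would go through the cache."""
    return file_type in whitelist


# Options that additionally cache data files and file-level indexes.
options = {
    "file-cache.enabled": "true",
    "file-cache.whitelist": "meta,global-index,data,file-index",
}
whitelist = parse_whitelist(options["file-cache.whitelist"])
```

With these options, `is_cached("data", whitelist)` is true while `is_cached("bucket-index", whitelist)` is false, because `bucket-index` was not listed.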
Use `table.copy()` to pass cache options as dynamic parameters:
{{< tabs "enable-cache" >}}
{{< tab "Java" >}}
```java
import org.apache.paimon.catalog.Identifier;
import org.apache.paimon.table.Table;

import java.util.HashMap;
import java.util.Map;

// `catalog` is an existing org.apache.paimon.catalog.Catalog instance
Table table = catalog.getTable(Identifier.create("my_db", "my_table"));

Map<String, String> options = new HashMap<>();
options.put("file-cache.enabled", "true");
// optional: customize cache directory and limits
options.put("file-cache.dir", "/tmp/paimon-file-cache");
options.put("file-cache.max-size", "2gb");
options.put("file-cache.block-size", "1mb");

// All subsequent reads on this table instance will use the cache
table = table.copy(options);
```
{{< /tab >}}
{{< tab "Python" >}}
```python
# `catalog` is an existing catalog instance
table = catalog.get_table("db.my_table")

# Enable cache with dynamic options
table = table.copy({
    "file-cache.enabled": "true",
    # optional: customize cache directory and limits
    "file-cache.dir": "/tmp/paimon-file-cache",
    "file-cache.max-size": "2gb",
    "file-cache.block-size": "1mb",
})
# All subsequent reads on this table instance will use the cache
```
{{< /tab >}}
{{< /tabs >}}
| Option | Type | Default | Description |
|---|---|---|---|
| file-cache.enabled | Boolean | false | Whether to enable local disk block cache. |
| file-cache.dir | String | <tmpdir>/paimon-file-cache | Directory for storing cached blocks. |
| file-cache.max-size | MemorySize | unlimited | Maximum total size of the cache. When exceeded, the least recently used blocks are evicted. |
| file-cache.block-size | MemorySize | 1 mb | Block size for caching. Files are logically divided into fixed-size blocks and cached independently. |
| file-cache.whitelist | String | meta,global-index | Comma-separated list of file types to cache. Supported values: meta, global-index, bucket-index, data, file-index. |
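
To make the effect of `file-cache.block-size` concrete, the following sketch computes which fixed-size blocks a single read touches. The function `blocks_for_read` is a hypothetical name used only for illustration; it shows the arithmetic implied by fixed-size blocks, not Paimon's internal code.

```python
from typing import List

BLOCK_SIZE = 1024 * 1024  # 1 MB, matching the documented default


def blocks_for_read(offset: int, length: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the starting offsets of all blocks touched by a read.

    A read of `length` bytes starting at `offset` covers the byte range
    [offset, offset + length - 1], so it touches every block between the
    block containing the first byte and the block containing the last byte.
    """
    if length <= 0:
        return []
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return [index * block_size for index in range(first_block, last_block + 1)]
```

A small read that straddles a block boundary, such as `blocks_for_read(1048575, 2)`, touches two blocks, so a smaller block size reduces over-fetching for scattered reads while a larger one reduces the number of remote requests for sequential scans.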
- Files are logically divided into fixed-size blocks (default 1 MB).
- On the first read, blocks are downloaded from remote storage and saved to local disk.
- Subsequent reads of the same block are served from local disk, skipping remote I/O.
- Cache files are keyed by remote file path and block offset, so they persist across process restarts and can be reused.
- When the cache exceeds `file-cache.max-size`, the least recently used blocks are evicted automatically.
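
The steps above can be sketched as a toy block cache. This is a minimal illustration under stated assumptions, not Paimon's implementation: the class `BlockDiskCache`, the `fetch` callback standing in for remote I/O, and the use of a SHA-256 hash of path and block offset as the local file name are all hypothetical choices made for this example.

```python
import hashlib
import os
import tempfile
from collections import OrderedDict


class BlockDiskCache:
    """Toy cache: fixed-size blocks stored on local disk with LRU eviction."""

    def __init__(self, cache_dir, max_size, block_size, fetch):
        self.cache_dir = cache_dir
        self.max_size = max_size      # maximum total bytes kept on disk
        self.block_size = block_size
        self.fetch = fetch            # fetch(path, offset, length) -> bytes (remote read stand-in)
        self.lru = OrderedDict()      # key -> block size, least recently used first
        self.total = 0                # bytes currently tracked
        os.makedirs(cache_dir, exist_ok=True)

    def _key(self, path, block_offset):
        # Assumption: key derived from remote path + block offset, hashed into a file name.
        return hashlib.sha256(f"{path}@{block_offset}".encode()).hexdigest()

    def _read_block(self, path, block_offset):
        key = self._key(path, block_offset)
        local = os.path.join(self.cache_dir, key)
        if os.path.exists(local):
            # Cache hit: serve from local disk, skipping remote I/O.
            with open(local, "rb") as f:
                data = f.read()
            if key not in self.lru:
                # Block left on disk by an earlier process: start tracking it.
                self.total += len(data)
            self.lru[key] = len(data)
            self.lru.move_to_end(key)
            return data
        # Cache miss: download the block and save it locally.
        data = self.fetch(path, block_offset, self.block_size)
        with open(local, "wb") as f:
            f.write(data)
        self.lru[key] = len(data)
        self.total += len(data)
        self._evict()
        return data

    def _evict(self):
        # Drop least recently used blocks until the cache fits again,
        # always keeping the block that was just written.
        while self.total > self.max_size and len(self.lru) > 1:
            key, size = self.lru.popitem(last=False)
            os.remove(os.path.join(self.cache_dir, key))
            self.total -= size

    def read(self, path, offset, length):
        if length <= 0:
            return b""
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        data = b"".join(
            self._read_block(path, index * self.block_size)
            for index in range(first, last + 1)
        )
        start = offset - first * self.block_size
        return data[start:start + length]


# Usage with an in-memory stand-in for remote storage.
remote = {"s3://bucket/file": bytes(range(256)) * 40}  # 10240 bytes
calls = []


def fake_fetch(path, offset, length):
    calls.append((path, offset))
    return remote[path][offset:offset + length]


cache = BlockDiskCache(tempfile.mkdtemp(), max_size=4096, block_size=1024, fetch=fake_fetch)
first = cache.read("s3://bucket/file", 1000, 100)   # touches blocks at offsets 0 and 1024
second = cache.read("s3://bucket/file", 1000, 100)  # both blocks now served from local disk
```

In this sketch the second read triggers no additional `fake_fetch` calls. Because the local file name depends only on the remote path and block offset, a new `BlockDiskCache` pointed at the same directory would find the existing block files and reuse them.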