
[Python] Add metadata index cache for TsFileDataFrame #857

Open
ColinLeeo wants to merge 1 commit into apache:develop from ColinLeeo:opt_dataframe


@ColinLeeo

Persist each shard's MetadataCatalog to a fixed-name index file in the dataset directory so repeated loads skip the expensive native metadata walk. A new use_cache flag (default True) enables it only when a single directory is passed; single-file and list inputs are unchanged.
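The enabling rule described above could be sketched roughly as follows. This is an illustration only, not the PR's code: `resolve_cache_path` and `INDEX_FILE_NAME` are hypothetical names, and treating a one-element list as "list input" (and so uncached) is an assumption about how the rule is applied.

```python
import os

# Hypothetical file name; the PR only states that the index file has a fixed name.
INDEX_FILE_NAME = ".tsfile_metadata_index"


def resolve_cache_path(paths, use_cache=True):
    """Return the index-file path, or None when caching does not apply."""
    if not use_cache:
        return None
    # Assumption: any list/tuple input bypasses the cache, even with one element.
    if not isinstance(paths, (str, os.PathLike)):
        return None
    # A single file bypasses the cache; only a single directory enables it.
    if not os.path.isdir(paths):
        return None
    return os.path.join(os.fspath(paths), INDEX_FILE_NAME)
```

Under these assumptions, passing a dataset directory yields a path inside that directory, while a file path, a list of paths, or `use_cache=False` yields `None` and the loader would take the existing uncached route.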

The cache is binary: a pickled sidecar for the small table and device tables, plus one numpy int64 structured array per shard for the bulk series stats. Writes are atomic (temp file + os.replace); loading falls back to a fresh build on a bad magic or version, or when the file set has changed. By design, source files are not validated.
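A minimal sketch of the write and load paths described above, for readers who want to see the pattern. The on-disk layout, the `MAGIC` and `VERSION` values, the field names in `SERIES_DTYPE`, and the function names are all assumptions made for illustration; the PR's actual format may differ, and this sketch stores only a single shard's array.

```python
import os
import pickle
import struct
import tempfile

import numpy as np

MAGIC = b"TSIX"  # hypothetical magic bytes
VERSION = 1      # hypothetical format version
# Hypothetical fields; the PR only says "int64 structured array".
SERIES_DTYPE = np.dtype(
    [("start_time", "<i8"), ("end_time", "<i8"), ("count", "<i8")]
)


def write_index(path, file_names, sidecar, series_stats):
    """Write the index atomically: temp file in the same directory, then os.replace."""
    header = pickle.dumps({"files": sorted(file_names), "sidecar": sidecar})
    stats = np.ascontiguousarray(series_stats, dtype=SERIES_DTYPE)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(MAGIC)
            f.write(struct.pack("<IQ", VERSION, len(header)))
            f.write(header)
            f.write(stats.tobytes())
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_index(path, current_file_names):
    """Return (sidecar, stats), or None so the caller rebuilds from scratch."""
    try:
        with open(path, "rb") as f:
            if f.read(4) != MAGIC:
                return None
            version, header_len = struct.unpack("<IQ", f.read(12))
            if version != VERSION:
                return None
            header = pickle.loads(f.read(header_len))
            # Only the set of file names is compared; file contents are not checked.
            if header["files"] != sorted(current_file_names):
                return None
            stats = np.frombuffer(f.read(), dtype=SERIES_DTYPE)
    except (OSError, struct.error, pickle.UnpicklingError,
            EOFError, KeyError, TypeError, ValueError):
        return None
    return header["sidecar"], stats
```

Creating the temp file in the target directory keeps `os.replace` on one filesystem, which is what makes the rename atomic. Because the sidecar is pickled, an index file like this should only be loaded from a directory the user trusts.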
