feat(checkpoint): add file-path identity checkpoint mode #6869
chenghuichen wants to merge 12 commits
Greptile Summary
This PR adds a file-path checkpoint mode that derives checkpoint keys from scan-task metadata (file paths + chunk specs) instead of requiring a user-supplied
Confidence Score: 4/5
Safe to merge after addressing the silent error swallowing in the distributed file-path checkpoint path.
One P1 finding in the distributed path where store errors are silently ignored, which can cause duplicate processing on transient failures. All other components are well-structured with good test coverage.
src/daft-distributed/src/pipeline_node/stage_checkpoint_keys.rs: error handling in the file-path checkpoint set loading.
Important Files Changed
Reviews (2): Last reviewed commit: "file-path identity checkpoint mode"
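To make the P1 finding concrete, here is a minimal Python sketch of the failure mode, not the PR's Rust code. The names `load_completed_keys_silently`, `load_completed_keys`, and `CheckpointStoreError` are hypothetical. The point is only that treating a failed store read as "nothing checkpointed yet" makes every task look pending, so already-finished work runs again.

```python
from typing import Callable, Iterable, Set


class CheckpointStoreError(Exception):
    """Hypothetical error type: the checkpoint store could not be read."""


def load_completed_keys_silently(read_keys: Callable[[], Iterable[str]]) -> Set[str]:
    # Anti-pattern: a transient store failure is indistinguishable from an
    # empty checkpoint set, so the caller will reprocess completed tasks.
    try:
        return set(read_keys())
    except OSError:
        return set()


def load_completed_keys(read_keys: Callable[[], Iterable[str]]) -> Set[str]:
    # Safer variant: surface the failure so the caller can retry or abort
    # instead of scheduling duplicate work.
    try:
        return set(read_keys())
    except OSError as err:
        raise CheckpointStoreError("could not load checkpointed keys") from err
```

With the second variant the caller decides whether a transient failure is retried or fails the stage; either choice avoids silently duplicating output.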
Merging this PR will not alter performance
Changes Made
Add a file-path checkpoint mode as a fast path that derives checkpoint keys from scan-task metadata (file paths + chunk specs like row groups and byte ranges) instead of requiring a user-specified key column. Activated when on= is omitted from CheckpointConfig.
For the full design doc, see #6446 (comment)
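As a rough illustration of the idea (not the PR's implementation), the sketch below derives a stable key from a file path plus an optional chunk spec and uses it to skip work that is already checkpointed. `ChunkSpec`, `file_path_checkpoint_key`, and `pending_tasks` are hypothetical names, and hashing a canonical JSON encoding with SHA-256 is an assumption made for the example; the actual key format is defined in the design doc.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ChunkSpec:
    """Hypothetical description of which part of a file a scan task reads."""
    row_groups: Optional[Tuple[int, ...]] = None
    byte_range: Optional[Tuple[int, int]] = None


def file_path_checkpoint_key(path: str, chunk: Optional[ChunkSpec] = None) -> str:
    # Build a canonical, order-stable encoding so the same (path, chunk)
    # always maps to the same key across runs.
    chunk = chunk or ChunkSpec()
    payload = {
        "path": path,
        "row_groups": None if chunk.row_groups is None else list(chunk.row_groups),
        "byte_range": None if chunk.byte_range is None else list(chunk.byte_range),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pending_tasks(
    tasks: Iterable[Tuple[str, Optional[ChunkSpec]]],
    completed_keys: Set[str],
) -> List[Tuple[str, Optional[ChunkSpec]]]:
    # Keep only the scan tasks whose key has not been checkpointed yet.
    return [t for t in tasks if file_path_checkpoint_key(*t) not in completed_keys]
```

Including the chunk spec in the key matters when one file is split across several scan tasks: two tasks that read different row groups of the same file must not share a key, or finishing one would mark the other as done.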