feat: Phase 3a — merge metadata aggregation, message types, replaced_split_ids#6352
Merged
Contributor
Author
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e5afacaa10
mattmkim
approved these changes
May 1, 2026
Contributor
mattmkim
left a comment
LGTM, seems like we have an unnecessary diff in the *.json files?
…it_ids (Phase 3a)

Phase 3 pipeline integration, first PR:
- merge_parquet_split_metadata(): aggregates input split metadata with MergeOutputFile physical metadata to produce complete ParquetSplitMetadata for merged output. Validates invariant fields, unions metric_names and tags, finalizes tag cardinality after merge. 17 tests.
- ParquetNewSplits, ParquetMergeTask, ParquetMergeScratch message types for the merge actor chain (planner → scheduler → downloader → executor).
- Add replaced_split_ids to ParquetSplitBatch and propagate through ParquetUploader (was hardcoded Vec::new()). Enables merge executor to specify which splits are being replaced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…it ID

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Phase 3 (pipeline integration), first PR. Building on Phase 1 (merge engine, #6335) and Phase 2 (merge policy, #6351).
- merge_parquet_split_metadata(): aggregates input split metadata with MergeOutputFile physical metadata to produce complete ParquetSplitMetadata for the merged output. It validates the invariant fields (kind, index_uid, partition_id, sort_fields, window), unions metric_names and tags, and finalizes tag cardinality after the merge. 17 unit tests.
- Message types: ParquetNewSplits, ParquetMergeTask, and ParquetMergeScratch for the merge actor chain (planner → scheduler → downloader → executor).
- replaced_split_ids: added to ParquetSplitBatch and propagated through ParquetUploader (was hardcoded Vec::new()). This lets the merge executor specify which splits are being replaced during atomic publish-and-replace.

Test plan
- merge_parquet_split_metadata() unit tests
- ParquetUploader tests pass with new field
- cargo clippy clean, cargo doc compiles, license headers OK

🤖 Generated with Claude Code
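For readers who have not opened the diff, the aggregation described in the Summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: SplitMeta, OutputFile, merge_split_metadata, and sample_split are hypothetical, simplified stand-ins for ParquetSplitMetadata, MergeOutputFile, and merge_parquet_split_metadata(), and the field types are guesses. Tag cardinality finalization is not modeled.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical, simplified stand-in for ParquetSplitMetadata.
#[derive(Clone, Debug, PartialEq)]
struct SplitMeta {
    split_id: String,
    kind: String,
    index_uid: String,
    partition_id: u64,
    sort_fields: String,
    window: (i64, i64),
    metric_names: BTreeSet<String>,
    tags: BTreeMap<String, BTreeSet<String>>,
    num_rows: u64,
    size_bytes: u64,
}

// Hypothetical stand-in for the physical metadata of the merged file.
struct OutputFile {
    num_rows: u64,
    size_bytes: u64,
}

// Returns the merged metadata plus the IDs of the splits it replaces.
fn merge_split_metadata(
    inputs: &[SplitMeta],
    output: &OutputFile,
    merged_split_id: &str,
) -> Result<(SplitMeta, Vec<String>), String> {
    let first = inputs.first().ok_or_else(|| "no input splits".to_string())?;

    // Invariant fields must agree across every input split.
    for split in &inputs[1..] {
        let same = split.kind == first.kind
            && split.index_uid == first.index_uid
            && split.partition_id == first.partition_id
            && split.sort_fields == first.sort_fields
            && split.window == first.window;
        if !same {
            return Err(format!(
                "split {} disagrees with {} on an invariant field",
                split.split_id, first.split_id
            ));
        }
    }

    // Union metric names and tag values across all inputs.
    let mut metric_names = BTreeSet::new();
    let mut tags: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for split in inputs {
        metric_names.extend(split.metric_names.iter().cloned());
        for (key, values) in &split.tags {
            tags.entry(key.clone()).or_default().extend(values.iter().cloned());
        }
    }

    // Physical fields come from the merged output file, not from the inputs.
    let merged = SplitMeta {
        split_id: merged_split_id.to_string(),
        kind: first.kind.clone(),
        index_uid: first.index_uid.clone(),
        partition_id: first.partition_id,
        sort_fields: first.sort_fields.clone(),
        window: first.window,
        metric_names,
        tags,
        num_rows: output.num_rows,
        size_bytes: output.size_bytes,
    };
    let replaced_split_ids = inputs.iter().map(|s| s.split_id.clone()).collect();
    Ok((merged, replaced_split_ids))
}

// Builds a small input split for experimenting with the sketch (values are made up).
fn sample_split(id: &str, metric: &str, region: &str) -> SplitMeta {
    SplitMeta {
        split_id: id.to_string(),
        kind: "metrics".to_string(),
        index_uid: "index:0".to_string(),
        partition_id: 1,
        sort_fields: "metric_name|timestamp".to_string(),
        window: (0, 3600),
        metric_names: BTreeSet::from([metric.to_string()]),
        tags: BTreeMap::from([("region".to_string(), BTreeSet::from([region.to_string()]))]),
        num_rows: 10,
        size_bytes: 1024,
    }
}
```

In this sketch the second return value plays the role of replaced_split_ids: the caller would attach it to the outgoing batch so the publish step knows which input splits the merged split supersedes. Merging two sample_split values that differ only in metric and region yields one split carrying both metric names and both region values, while changing partition_id on one input makes the call return an error.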