[BugFix] Fix ORC file metric construction in AddFilesProcedure#71220
Conversation
|
Hi @Youngwb , please help to |
Signed-off-by: chrisyguo <511955993@qq.com>
80b7252 to
1994e9a
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. Swish! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 2 / 2 (100.00%) file detail
|
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
|
There was a problem hiding this comment.
Pull request overview
Fixes ORC metric extraction in the Iceberg add_files procedure by correcting how ORC Reader.getStatistics() entries are mapped to Iceberg fields, and adds a unit test to validate the expected mapping.
Changes:
- Adjust ORC statistics iteration to skip the file-level/root statistics entry at index 0.
- Offset ORC schema field-name lookup by
-1to align statistics indexes with top-level columns. - Add a unit test covering ORC metric extraction for a simple struct schema.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedure.java | Updates ORC metrics extraction loop and column-name mapping to address off-by-one indexing. |
| fe/fe-core/src/test/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedureTest.java | Adds a reflective unit test validating ORC metrics (recordCount/valueCounts/nullValueCounts) mapping. |
|
Hey @kevincai, thanks for the feedback! What should I do to help push this PR forward? |
@Guosmilesmile |
|
@Mergifyio backport branch-4.1 branch-4.0 |
✅ Backports have been createdDetails
Cherry-pick of ca4bd59 has failed: To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Cherry-pick of ca4bd59 has failed: To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally |
Signed-off-by: chrisyguo <511955993@qq.com> (cherry picked from commit ca4bd59) # Conflicts: # fe/fe-core/src/test/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedureTest.java
Signed-off-by: chrisyguo <511955993@qq.com> (cherry picked from commit ca4bd59) # Conflicts: # fe/fe-core/src/main/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedure.java # fe/fe-core/src/test/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedureTest.java
|
@kevincai @stephen-shelby Thanks for the review! Do I need to backport it manually? |
I will take care of them. |
|
ignore backport branch-4.0, the code is only added since 4.1 |




Why I'm doing:
In AddFilesProcedure extractOrcMetrics, ORC's Reader.getStatistics() returns file-level stats at index 0, with actual column stats starting from index 1. The current code iterates from index 0, causing all column metrics to be shifted by one position.
What I'm doing:
Fix the off-by-one bug by starting the loop from colId = 1 and passing colId - 1 to getColumnNameFromOrcSchem to create correct metric in orc
Fixes #71222
Issue: #71222
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: