[BugFix] Fix ORC file metric construction in AddFilesProcedure #71220

Merged
kevincai merged 1 commit into StarRocks:main from Guosmilesmile:fix_orc_metric on Apr 7, 2026

@Guosmilesmile
Contributor

@Guosmilesmile Guosmilesmile commented Apr 2, 2026

Why I'm doing:

In AddFilesProcedure's extractOrcMetrics, ORC's Reader.getStatistics() returns file-level stats at index 0, with the actual column stats starting at index 1. The current code iterates from index 0, so every column's metrics are shifted by one position.

What I'm doing:

Fix the off-by-one bug by starting the loop from colId = 1 and passing colId - 1 to getColumnNameFromOrcSchem, so that the ORC metrics are built for the correct columns.
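
The shift described above can be illustrated with a minimal sketch. It does not use the ORC library or the real AddFilesProcedure code: the STATS array, the TOP_LEVEL_COLUMNS array, and both method names are hypothetical stand-ins, assuming only what the description states (index 0 holds the file-level entry and column entries start at index 1).

```java
import java.util.ArrayList;
import java.util.List;

class OrcStatsOffsetSketch {
    // Hypothetical stand-in for the array returned by Reader.getStatistics():
    // entry 0 is the file-level entry, entries 1..n belong to the columns.
    static final String[] STATS = {"root(rows=3)", "stats(id)", "stats(name)"};
    // Hypothetical top-level column names, in schema order.
    static final String[] TOP_LEVEL_COLUMNS = {"id", "name"};

    // Shifted pairing: starting at index 0 hands the file-level entry to "id".
    static List<String> pairFromZero() {
        List<String> out = new ArrayList<>();
        for (int colId = 0; colId < STATS.length && colId < TOP_LEVEL_COLUMNS.length; colId++) {
            out.add(TOP_LEVEL_COLUMNS[colId] + " -> " + STATS[colId]);
        }
        return out;
    }

    // Corrected pairing: skip index 0 and look the name up with colId - 1.
    static List<String> pairFromOne() {
        List<String> out = new ArrayList<>();
        for (int colId = 1; colId < STATS.length; colId++) {
            out.add(TOP_LEVEL_COLUMNS[colId - 1] + " -> " + STATS[colId]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("from 0: " + pairFromZero());
        System.out.println("from 1: " + pairFromOne());
    }
}
```

In this toy setup the loop that starts at 0 pairs "id" with the file-level entry and "name" with the statistics of "id", while the loop that starts at 1 pairs each name with its own entry.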

Fixes #71222
Issue: #71222

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

@Guosmilesmile
Contributor Author

Hi @Youngwb, please help review this if you have time. Thanks.

Signed-off-by: chrisyguo <511955993@qq.com>
@CelerData-Reviewer

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Swish!

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

[FE Incremental Coverage Report]

pass : 2 / 2 (100.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 com/starrocks/connector/iceberg/procedure/AddFilesProcedure.java | 2 | 2 | 100.00% | [] |

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@sonarqubecloud

sonarqubecloud Bot commented Apr 2, 2026

Quality Gate failed

Failed conditions
E Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Contributor

Copilot AI left a comment

Pull request overview

Fixes ORC metric extraction in the Iceberg add_files procedure by correcting how ORC Reader.getStatistics() entries are mapped to Iceberg fields, and adds a unit test to validate the expected mapping.

Changes:

  • Adjust ORC statistics iteration to skip the file-level/root statistics entry at index 0.
  • Offset ORC schema field-name lookup by -1 to align statistics indexes with top-level columns.
  • Add a unit test covering ORC metric extraction for a simple struct schema.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| fe/fe-core/src/main/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedure.java | Updates ORC metrics extraction loop and column-name mapping to address off-by-one indexing. |
| fe/fe-core/src/test/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedureTest.java | Adds a reflective unit test validating ORC metrics (recordCount/valueCounts/nullValueCounts) mapping. |

@Guosmilesmile
Contributor Author

Hey @kevincai, thanks for the feedback! What should I do to help push this PR forward?

@kevincai
Contributor

kevincai commented Apr 7, 2026

> Hey @kevincai, thanks for the feedback! What should I do to help push this PR forward?

@Guosmilesmile
I am trying to find the part of the ORC spec that describes the column statistics structure, where stats[0] is the file-level stat rather than a column stat.
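
For context on why index 0 is not a user column: the ORC specification flattens the type tree by a pre-order traversal and gives each type the next id, so the root struct is always id 0, and the file statistics carry one entry per id. The sketch below only imitates that numbering. The Node class and all method names are hypothetical and are not ORC's TypeDescription API.

```java
import java.util.ArrayList;
import java.util.List;

class OrcColumnIdSketch {
    // Minimal hypothetical type node; not the ORC TypeDescription class.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Pre-order walk: a node receives the next id before any of its children.
    static void assignIds(Node node, List<String> byId) {
        byId.add(node.name);
        for (Node child : node.children) {
            assignIds(child, byId);
        }
    }

    // Returns the type names ordered by assigned id (list index == id).
    static List<String> idsFor(Node root) {
        List<String> byId = new ArrayList<>();
        assignIds(root, byId);
        return byId;
    }

    // Flat schema: struct<id, name>.
    static List<String> flatExample() {
        return idsFor(new Node("<root>", new Node("id"), new Node("name")));
    }

    // Nested schema: struct<id, addr:struct<city, zip>, name>.
    static List<String> nestedExample() {
        Node addr = new Node("addr", new Node("city"), new Node("zip"));
        return idsFor(new Node("<root>", new Node("id"), addr, new Node("name")));
    }

    public static void main(String[] args) {
        System.out.println(flatExample());
        System.out.println(nestedExample());
    }
}
```

With the flat schema the ids are root = 0, id = 1, name = 2, so subtracting one from an id gives the top-level position. In the nested example the fields of addr take ids 3 and 4, which pushes name to id 5. That case is outside what this thread discusses.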

@kevincai kevincai merged commit ca4bd59 into StarRocks:main Apr 7, 2026
90 checks passed
@kevincai
Contributor

kevincai commented Apr 7, 2026

@Mergifyio backport branch-4.1 branch-4.0

@mergify
Contributor

mergify Bot commented Apr 7, 2026

backport branch-4.1 branch-4.0

✅ Backports have been created

Details

Cherry-pick of ca4bd59 has failed:

On branch mergify/bp/branch-4.1/pr-71220
Your branch is up to date with 'origin/branch-4.1'.

You are currently cherry-picking commit ca4bd593ee.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   fe/fe-core/src/main/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedure.java

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   fe/fe-core/src/test/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedureTest.java

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

Cherry-pick of ca4bd59 has failed:

On branch mergify/bp/branch-4.0/pr-71220
Your branch is up to date with 'origin/branch-4.0'.

You are currently cherry-picking commit ca4bd593ee.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
	deleted by us:   fe/fe-core/src/main/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedure.java
	deleted by us:   fe/fe-core/src/test/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedureTest.java

no changes added to commit (use "git add" and/or "git commit -a")

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

mergify Bot pushed a commit that referenced this pull request Apr 7, 2026
Signed-off-by: chrisyguo <511955993@qq.com>
(cherry picked from commit ca4bd59)

# Conflicts:
#	fe/fe-core/src/test/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedureTest.java
mergify Bot pushed a commit that referenced this pull request Apr 7, 2026
Signed-off-by: chrisyguo <511955993@qq.com>
(cherry picked from commit ca4bd59)

# Conflicts:
#	fe/fe-core/src/main/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedure.java
#	fe/fe-core/src/test/java/com/starrocks/connector/iceberg/procedure/AddFilesProcedureTest.java
@Guosmilesmile
Contributor Author

Guosmilesmile commented Apr 7, 2026

@kevincai @stephen-shelby Thanks for the review!

Do I need to backport it manually?

@kevincai
Contributor

kevincai commented Apr 7, 2026

> @kevincai @stephen-shelby Thanks for the review!
>
> Do I need to backport it manually?

I will take care of them.

@github-actions github-actions Bot removed the 4.0 label Apr 7, 2026
@kevincai
Contributor

kevincai commented Apr 7, 2026

Ignore the branch-4.0 backport; this code was only added in 4.1.

@Guosmilesmile Guosmilesmile deleted the fix_orc_metric branch April 7, 2026 13:16
wanpengfei-git pushed a commit that referenced this pull request Apr 7, 2026
…ort #71220) (#71379)

Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Co-authored-by: GuoYu <511955993@qq.com>
Co-authored-by: Kevin Cai <kevin.cai@celerdata.com>

Development

Successfully merging this pull request may close these issues.

AddFilesProcedure incorrectly retrieves metrics for ORC files

5 participants