Skip to content

[regression](load) add some p0 case for load strict mode#62478

Open
jacktengg wants to merge 2 commits intoapache:masterfrom
jacktengg:260414-add-load-case
Open

[regression](load) add some p0 case for load strict mode#62478
jacktengg wants to merge 2 commits intoapache:masterfrom
jacktengg:260414-add-load-case

Conversation

@jacktengg
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@jacktengg
Copy link
Copy Markdown
Contributor Author

run buildall

yiguolei
yiguolei previously approved these changes Apr 15, 2026
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 15, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@yiguolei
Copy link
Copy Markdown
Contributor

review

@yiguolei
Copy link
Copy Markdown
Contributor

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding:

  1. regression-test/suites/load_p0/insert/test_insert_strict_mode_and_filter_ratio.groovy: the new strict-mode insert into select case for NULL -> NOT NULL is asserting the wrong failure contract. In FE, successful insert into select jobs call AbstractInsertExecutor.checkStrictModeAndFilterRatio() after execution, and that path raises Insert has filtered data in strict mode whenever enable_insert_strict=true and filteredRows > 0 (fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/AbstractInsertExecutor.java:220-225). The paired non-strict case in this same test (insert_max_filter_ratio=0.2) already proves this scenario is counted as filtered rows rather than failing earlier with an executor error, so this strict assertion would lock in behavior that contradicts the current implementation.

Critical checkpoint conclusions:

  • Goal of the task: add regression coverage for strict mode and filter-ratio behavior in stream load and insert into select. The PR mostly does that, but the strict NULL -> NOT NULL insert into select assertion does not match the current code path, so the goal is not fully met yet.
  • Modification size and focus: yes, the PR is focused on regression coverage only.
  • Concurrency: not applicable; these changes are test-only and do not introduce concurrent code.
  • Lifecycle, static init, config, compatibility: not applicable; no new lifecycle, config, protocol, or storage-format changes.
  • Parallel code paths: checked against existing VALUES, TVF, and FE insert-executor behavior. One insert into select branch is inconsistent with the FE path that governs filtered-row handling.
  • Special conditions: the strict-mode and error-message distinction depends on whether the failure happens during execution or during post-exec filtered-row validation. This case is in the latter category.
  • Test coverage: coverage is expanded well overall, but this blocking assertion should be corrected before relying on the suite.
  • Test result files: the new .out updates are consistent with the new cases and appear auto-generated.
  • Observability, transaction, persistence, data write safety, performance: not applicable for this regression-only PR.
  • Other issues: none beyond the blocking assertion above.

sql """
insert into test_insert_select_strict_mode_and_filter_ratio_dst select * from test_insert_select_strict_mode_and_filter_ratio_src;
"""
exception """Insert has too many filtered data"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion looks inconsistent with the current insert into select implementation. For this branch, the non-strict sibling just above already shows the job completes and reports filtered rows, so the strict-mode failure should come from AbstractInsertExecutor.checkStrictModeAndFilterRatio() in FE. That code path raises Insert has filtered data in strict mode, not Insert has too many filtered data, and it does not append a url check either.

Suggested change
exception """Insert has too many filtered data"""
exception """Insert has filtered data in strict mode"""

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Apr 15, 2026
@jacktengg
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking findings from code review.

Critical checkpoint conclusions:

  • Goal / correctness: The PR expands regression coverage for load strict-mode and filter-ratio behavior, especially INSERT INTO SELECT cases in insert, tinyint overflow in stream_load, and expected error-message alignment in tvf. The added/updated suites and .out files are consistent with that goal. I did not see a contradictory expectation in the changed files.
  • Scope / focus: The modification is small and focused to regression assets only (regression-test/suites, .out, and one CSV input). No unrelated production-code churn is mixed in.
  • Concurrency: Not applicable; these are regression scripts and golden outputs only.
  • Lifecycle / static init: Not applicable.
  • Configuration: No new config items are added. The changes only exercise existing session variables and load properties.
  • Compatibility: No incompatible API, storage-format, or protocol changes are introduced.
  • Parallel code paths: The PR touches the relevant load-path regression suites in scope (insert, stream_load, tvf). Within this scope I did not find an obvious missing companion assertion that would make coverage inconsistent.
  • Special conditions: The tests intentionally cover strict-mode and filter-ratio thresholds with explicit expected failures/successes; the conditions are understandable from the case names and comments.
  • Test coverage: Coverage is materially improved for strict-mode/filter-ratio combinations. For this PR type, regression tests are the right level. I could not verify execution in this runner.
  • Modified test results: The updated .out files match the new query tags added in the suites. I did not find a mismatched or missing golden-result section in the modified files.
  • Observability: No observability changes are needed for this test-only PR.
  • Transaction / persistence / data-write correctness: No production logic changed, so there is no direct risk in FE/BE transaction or persistence behavior from this patch itself.
  • Performance: No production hot path changed.
  • Other issues: None identified during review.

Residual risk: since I did not execute the regression suite here, the remaining risk is limited to environment-dependent result drift rather than a clear logic problem in the patch itself.

@jacktengg
Copy link
Copy Markdown
Contributor Author

run buildall

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants