[regression](load) add some p0 case for load strict mode by jacktengg · Pull Request #62478 · apache/doris

jacktengg · 2026-04-14T05:20:19Z

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-04-14T05:20:25Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

jacktengg · 2026-04-14T05:20:28Z

run buildall

github-actions · 2026-04-15T01:34:13Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-04-15T01:34:15Z

PR approved by anyone and no changes requested.

yiguolei · 2026-04-15T01:38:19Z

review

yiguolei · 2026-04-15T01:58:28Z

/review

github-actions

Finding:

regression-test/suites/load_p0/insert/test_insert_strict_mode_and_filter_ratio.groovy: the new strict-mode insert into select case for NULL -> NOT NULL is asserting the wrong failure contract. In FE, successful insert into select jobs call AbstractInsertExecutor.checkStrictModeAndFilterRatio() after execution, and that path raises Insert has filtered data in strict mode whenever enable_insert_strict=true and filteredRows > 0 (fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/AbstractInsertExecutor.java:220-225). The paired non-strict case in this same test (insert_max_filter_ratio=0.2) already proves this scenario is counted as filtered rows rather than failing earlier with an executor error, so this strict assertion would lock in behavior that contradicts the current implementation.

Critical checkpoint conclusions:

Goal of the task: add regression coverage for strict mode and filter-ratio behavior in stream load and insert into select. The PR mostly does that, but the strict NULL -> NOT NULL insert into select assertion does not match the current code path, so the goal is not fully met yet.
Modification size and focus: yes, the PR is focused on regression coverage only.
Concurrency: not applicable; these changes are test-only and do not introduce concurrent code.
Lifecycle, static init, config, compatibility: not applicable; no new lifecycle, config, protocol, or storage-format changes.
Parallel code paths: checked against existing VALUES, TVF, and FE insert-executor behavior. One insert into select branch is inconsistent with the FE path that governs filtered-row handling.
Special conditions: the strict-mode and error-message distinction depends on whether the failure happens during execution or during post-exec filtered-row validation. This case is in the latter category.
Test coverage: coverage is expanded well overall, but this blocking assertion should be corrected before relying on the suite.
Test result files: the new .out updates are consistent with the new cases and appear auto-generated.
Observability, transaction, persistence, data write safety, performance: not applicable for this regression-only PR.
Other issues: none beyond the blocking assertion above.

github-actions · 2026-04-15T02:04:59Z

+        sql """
+        insert into test_insert_select_strict_mode_and_filter_ratio_dst select * from test_insert_select_strict_mode_and_filter_ratio_src;
+        """
+        exception """Insert has too many filtered data"""


This assertion looks inconsistent with the current insert into select implementation. For this branch, the non-strict sibling just above already shows the job completes and reports filtered rows, so the strict-mode failure should come from AbstractInsertExecutor.checkStrictModeAndFilterRatio() in FE. That code path raises Insert has filtered data in strict mode, not Insert has too many filtered data, and it does not append a url check either.

Suggested change

exception """Insert has too many filtered data"""

exception """Insert has filtered data in strict mode"""

jacktengg · 2026-04-15T02:55:47Z

/review

github-actions

No blocking findings from code review.

Critical checkpoint conclusions:

Goal / correctness: The PR expands regression coverage for load strict-mode and filter-ratio behavior, especially INSERT INTO SELECT cases in insert, tinyint overflow in stream_load, and expected error-message alignment in tvf. The added/updated suites and .out files are consistent with that goal. I did not see a contradictory expectation in the changed files.
Scope / focus: The modification is small and focused to regression assets only (regression-test/suites, .out, and one CSV input). No unrelated production-code churn is mixed in.
Concurrency: Not applicable; these are regression scripts and golden outputs only.
Lifecycle / static init: Not applicable.
Configuration: No new config items are added. The changes only exercise existing session variables and load properties.
Compatibility: No incompatible API, storage-format, or protocol changes are introduced.
Parallel code paths: The PR touches the relevant load-path regression suites in scope (insert, stream_load, tvf). Within this scope I did not find an obvious missing companion assertion that would make coverage inconsistent.
Special conditions: The tests intentionally cover strict-mode and filter-ratio thresholds with explicit expected failures/successes; the conditions are understandable from the case names and comments.
Test coverage: Coverage is materially improved for strict-mode/filter-ratio combinations. For this PR type, regression tests are the right level. I could not verify execution in this runner.
Modified test results: The updated .out files match the new query tags added in the suites. I did not find a mismatched or missing golden-result section in the modified files.
Observability: No observability changes are needed for this test-only PR.
Transaction / persistence / data-write correctness: No production logic changed, so there is no direct risk in FE/BE transaction or persistence behavior from this patch itself.
Performance: No production hot path changed.
Other issues: None identified during review.

Residual risk: since I did not execute the regression suite here, the remaining risk is limited to environment-dependent result drift rather than a clear logic problem in the patch itself.

jacktengg · 2026-04-15T03:12:18Z

run buildall

jacktengg · 2026-04-15T08:31:01Z

run buildall

github-actions · 2026-04-16T01:41:52Z

PR approved by at least one committer and no changes requested.

fixed

yiguolei · 2026-04-16T01:42:42Z

/review

github-actions

No blocking findings from code review.

Critical checkpoint conclusions:

Goal / correctness: The PR goal is to expand strict-mode and filter-ratio regression coverage for load-related paths, and to make the exception assertions match the cloud-mode truncated error contract. The changed suites and golden files are consistent with that goal, including the new INSERT INTO SELECT cases, the tinyint-overflow stream_load cases, and the updated tvf/insert exception checks.
Scope / focus: The modification is small and focused to regression assets only (regression-test/suites, .out, and one CSV input file). No unrelated production-code changes are mixed in.
Concurrency: Not applicable; these changes are test-only and do not introduce concurrent code.
Lifecycle / static initialization: Not applicable.
Configuration: No new config items are added. The tests only exercise existing session variables and load properties.
Compatibility: No incompatible API, protocol, symbol, or storage-format change is introduced.
Parallel code paths: The PR updates the relevant sibling suites in scope (insert, stream_load, tvf) and the expectations are consistent with the paths being exercised.
Special conditions: The cloud-mode error truncation is a non-intuitive case, and the added comments explain why the assertions must match first_error_msg fragments instead of relying on the older generic load-failure text.
Test coverage: Coverage is materially improved for strict-mode / max_filter_ratio combinations, especially for INSERT INTO SELECT and tinyint overflow. I did not execute the suites locally in this runner, but the PR checks for P0 Regression, cloud_p0, and NonConcurrent Regression are passing on the head SHA.
Modified test results: The updated .out files match the new query tags and expected result sets in the changed suites.
Observability / transaction / persistence / data-write correctness / FE-BE variable passing / performance: Not applicable for this regression-only PR.
Other issues: None identified during review.

Residual risk: limited to environment-dependent regression drift rather than an identified logic problem in this patch.

[regression](load) add some p0 case for load strict mode

8722a4e

yiguolei previously approved these changes Apr 15, 2026

View reviewed changes

github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 15, 2026

github-actions Bot added the reviewed label Apr 15, 2026

github-actions Bot previously requested changes Apr 15, 2026

View reviewed changes

fix

6bc0c3d

jacktengg dismissed yiguolei’s stale review via 6bc0c3d April 15, 2026 02:53

github-actions Bot removed the approved Indicates a PR has been approved by one committer. label Apr 15, 2026

github-actions Bot reviewed Apr 15, 2026

View reviewed changes

fix case failure in cloud mode

48bbf0a

yiguolei approved these changes Apr 16, 2026

View reviewed changes

github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 16, 2026

github-actions Bot reviewed Apr 16, 2026

View reviewed changes

jacktengg merged commit 565ff39 into apache:master Apr 16, 2026
30 checks passed

mymeiyi mentioned this pull request Apr 21, 2026

[fix](wal) Fix WAL replay loading 0 rows due to columns being marked as missing #62658

Merged

	exception """Insert has too many filtered data"""
	exception """Insert has filtered data in strict mode"""

Conversation

jacktengg commented Apr 14, 2026

What problem does this PR solve?

Release note

Check List (For Author)

Check List (For Reviewer who merge this PR)

Uh oh!

hello-stephen commented Apr 14, 2026

Uh oh!

jacktengg commented Apr 14, 2026

Uh oh!

github-actions Bot commented Apr 15, 2026

Uh oh!

github-actions Bot commented Apr 15, 2026

Uh oh!

yiguolei commented Apr 15, 2026

Uh oh!

yiguolei commented Apr 15, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

jacktengg commented Apr 15, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

jacktengg commented Apr 15, 2026

Uh oh!

jacktengg commented Apr 15, 2026

Uh oh!

github-actions Bot commented Apr 16, 2026

Uh oh!

yiguolei commented Apr 16, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants