pkg/dxf/importinto, pkg/executor: show conflict progress in SHOW IMPORT JOB#67551
Conversation
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughReplaced per-byte Changes
Sequence Diagram(s)sequenceDiagram
participant Exec as Step Executor
participant Del as ConflictedDeleter
participant Hdl as BaseHandler
participant Col as execute.Collector
Exec->>Del: create deleter (pass Col = Exec)
Del->>Hdl: init handler (pass Col)
loop per KV pair
Hdl->>Hdl: Handle(kvPair)
Hdl->>Col: Processed(1, 0)
Col-->>Exec: Exec.summary.Processed += 1
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @D3Hunter. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Review Complete Findings: 0 issues ℹ️ Learn more details on Pantheon AI. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #67551 +/- ##
================================================
+ Coverage 77.7488% 79.8656% +2.1167%
================================================
Files 1959 1992 +33
Lines 543393 556288 +12895
================================================
+ Hits 422482 444283 +21801
+ Misses 120070 110535 -9535
- Partials 841 1470 +629
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/ddl/backfilling_read_index.go (1)
492-494: Document thatProcessedis index-KV bytes in this step.
SubtaskSummary.Processedis step-dependent now, but the cleanup/metering path later consumes this field asindex_kv_bytes. A short comment here would make that contract obvious and reduce future misuse.As per coding guidelines, comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/ddl/backfilling_read_index.go` around lines 492 - 494, Add a short comment in the distTaskRowCntCollector.Processed method (or immediately above the SubtaskSummary.Processed update) documenting that the bytes argument and SubtaskSummary.Processed represent index-KV bytes for this step; explicitly state the contract that later cleanup/metering will consume this field as index_kv_bytes so future readers won't misuse it. Ensure the comment names the involved symbols: distTaskRowCntCollector.Processed and SubtaskSummary.Processed.pkg/dxf/framework/taskexecutor/execute/interface.go (1)
205-211: Document theProcessedinvariants on the collector API.
GetSpeedInTimeRange()assumes this counter is cumulative and uses one stable unit for the whole subtask. The new comment only says the meaning “may vary by scenario”, which leaves future executors room to mix units or reset semantics mid-subtask and silently corrupt speed/ETA. Please lock that constraint down here, and clarify whetherrowsstill means the physical row count during conflict handling.As per coding guidelines, comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs.Possible doc tweak
// Processed is used collects metrics. -// `processed` is the number of processed units, and `rows` is the number of rows processed. -// The meaning of `processed` may vary by scenario, for example: +// `processed` is the cumulative display unit for the current subtask. It must +// remain monotonic and use the same unit for the whole subtask because runtime +// speed/ETA are derived from deltas in this field. +// `rows` is the physical row count processed by the executor. +// The meaning of `processed` may vary by scenario, for example:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/dxf/framework/taskexecutor/execute/interface.go` around lines 205 - 211, Update the Processed comment on the collector API (the Processed(processed, rows int64) method) to state explicitly that `processed` is a monotonically increasing cumulative counter for the whole subtask and must use a single stable unit for the entire subtask lifetime (never reset or switch units mid-subtask), and that GetSpeedInTimeRange depends on this invariant; also clarify that `rows` is the physical row count during conflict handling (and otherwise represents the same cumulative-per-subtask semantic), and mention any concurrency expectations (e.g., callers may update this from different goroutines but updates must be monotonic).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/dxf/framework/taskexecutor/execute/interface.go`:
- Around line 89-91: The persisted `bytes` JSON key is being reused for
non-bytes counts; add an explicit unit field and keep the legacy `bytes` tag to
avoid silent misinterpretation: add a new string field (e.g. ProcessedUnit
`json:"processed_unit,omitempty"`) alongside the existing Processed int64 (leave
its json:"bytes,omitempty" tag for backward compatibility) and update all
writers (e.g. conflict step code that sets Processed) to also set ProcessedUnit
to "bytes" or "count" as appropriate; update readers to consult ProcessedUnit
before treating Processed as a byte size. Ensure the same change is applied to
the other analogous fields mentioned in the review (lines ~100-105).
---
Nitpick comments:
In `@pkg/ddl/backfilling_read_index.go`:
- Around line 492-494: Add a short comment in the
distTaskRowCntCollector.Processed method (or immediately above the
SubtaskSummary.Processed update) documenting that the bytes argument and
SubtaskSummary.Processed represent index-KV bytes for this step; explicitly
state the contract that later cleanup/metering will consume this field as
index_kv_bytes so future readers won't misuse it. Ensure the comment names the
involved symbols: distTaskRowCntCollector.Processed and
SubtaskSummary.Processed.
In `@pkg/dxf/framework/taskexecutor/execute/interface.go`:
- Around line 205-211: Update the Processed comment on the collector API (the
Processed(processed, rows int64) method) to state explicitly that `processed` is
a monotonically increasing cumulative counter for the whole subtask and must use
a single stable unit for the entire subtask lifetime (never reset or switch
units mid-subtask), and that GetSpeedInTimeRange depends on this invariant; also
clarify that `rows` is the physical row count during conflict handling (and
otherwise represents the same cumulative-per-subtask semantic), and mention any
concurrency expectations (e.g., callers may update this from different
goroutines but updates must be monotonic).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3055b4f7-91a0-4d1a-8435-d453493bcf6d
📒 Files selected for processing (25)
pkg/ddl/backfilling_clean_s3.gopkg/ddl/backfilling_read_index.gopkg/dxf/framework/storage/table_test.gopkg/dxf/framework/taskexecutor/execute/interface.gopkg/dxf/framework/taskexecutor/execute/interface_test.gopkg/dxf/importinto/collect_conflicts.gopkg/dxf/importinto/conflict_resolution.gopkg/dxf/importinto/conflictedkv/collector.gopkg/dxf/importinto/conflictedkv/collector_test.gopkg/dxf/importinto/conflictedkv/deleter.gopkg/dxf/importinto/conflictedkv/deleter_test.gopkg/dxf/importinto/conflictedkv/handler.gopkg/dxf/importinto/conflictedkv/handler_test.gopkg/dxf/importinto/job.gopkg/dxf/importinto/job_testkit_test.gopkg/dxf/importinto/planner.gopkg/dxf/importinto/scheduler.gopkg/dxf/importinto/task_executor.gopkg/executor/importer/chunk_process_testkit_test.gopkg/executor/importer/import.gopkg/executor/show.gopkg/executor/show_test.gopkg/lightning/backend/external/merge.gopkg/lightning/backend/external/onefile_writer_test.gopkg/lightning/backend/external/sort_test.go
| // Processed stores generic progress units, such as bytes processed or rows processed. | ||
| // Keep the JSON tag as "bytes" for backward compatibility with persisted summaries. | ||
| Processed int64 `json:"bytes,omitempty"` |
There was a problem hiding this comment.
Don't overload persisted bytes with a different unit.
Keeping the legacy JSON tag preserves decoding, but it also changes the meaning of persisted state. Conflict steps will now write counts under bytes, so any step-agnostic or mixed-version reader that still formats this field as a size will silently report bogus progress/speed. This should be versioned explicitly, or paired with an explicit unit, instead of reusing the bytes key.
Also applies to: 100-105
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/dxf/framework/taskexecutor/execute/interface.go` around lines 89 - 91,
The persisted `bytes` JSON key is being reused for non-bytes counts; add an
explicit unit field and keep the legacy `bytes` tag to avoid silent
misinterpretation: add a new string field (e.g. ProcessedUnit
`json:"processed_unit,omitempty"`) alongside the existing Processed int64 (leave
its json:"bytes,omitempty" tag for backward compatibility) and update all
writers (e.g. conflict step code that sets Processed) to also set ProcessedUnit
to "bytes" or "count" as appropriate; update readers to consult ProcessedUnit
before treating Processed as a byte size. Ensure the same change is applied to
the other analogous fields mentioned in the review (lines ~100-105).
There was a problem hiding this comment.
it serves due as a progressed data unit, either bytes or rows
There was a problem hiding this comment.
@D3Hunter, understood — the Processed / bytes field is intentionally a generic progress-unit counter (bytes during encode/merge, conflict-row count during conflict steps), and the step-aware formatting in RuntimeInfo already dispatches on step type to present the correct unit to consumers. I'll note this for future reviews.
✏️ Learnings added
Learnt from: D3Hunter
Repo: pingcap/tidb PR: 67551
File: pkg/dxf/framework/taskexecutor/execute/interface.go:89-91
Timestamp: 2026-04-08T08:19:36.935Z
Learning: In `pkg/dxf/framework/taskexecutor/execute/interface.go`, `SubtaskSummary.Processed` (JSON tag `"bytes"`) and `Progress.Processed` (JSON tag `"bytes"`) are intentionally generic "progressed data unit" fields: they store bytes during encode/merge steps and conflict-row counts during `ImportStepCollectConflicts` / `ImportStepConflictResolution` steps. The `bytes` JSON tag is kept for backward compatibility only. Correct unit interpretation is handled by the step-aware `RuntimeInfo` rendering layer in `pkg/dxf/importinto/job.go`. Do not flag this dual-use as a bug or request a separate unit field.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: joechenrh
Repo: pingcap/tidb PR: 67054
File: pkg/lightning/mydump/parquet_parser.go:542-545
Timestamp: 2026-03-17T06:01:45.974Z
Learning: In `pkg/lightning/mydump/parquet_parser.go`, `ParquetParser.lastRow.SkipCast` intentionally aliases the mutable `pp.skipCast` buffer (same as `Row.Row` aliasing the pool datum slice). Both are only valid until the next `ReadRow()` call. This is by design: callers (e.g., the encode path in `pkg/executor/importer/chunk_process.go`) consume `LastRow().SkipCast` immediately before the next read. Do not flag the direct assignment `pp.lastRow.SkipCast = pp.skipCast` as an aliasing bug.
Learnt from: joechenrh
Repo: pingcap/tidb PR: 66878
File: tests/realtikvtest/importintotest/import_into_test.go:253-267
Timestamp: 2026-03-11T06:26:14.065Z
Learning: In pingcap/tidb, PR `#66878` (cherry-pick of `#58401` to release-7.5) also bundles the follow-up nil-check fix from PR `#63146`. `TestOnUpdateColumn` in `tests/realtikvtest/importintotest/import_into_test.go` was cherry-picked from PR `#63146` and tests that `IMPORT INTO` on a table with an `ON UPDATE CURRENT_TIMESTAMP` column does NOT panic due to a nil `onDup` map in `resolveGeneratedColumns`. It is NOT intended to cover the transitive generated-column / index-inconsistency fix from `#58401/`#58400.
Learnt from: CR
Repo: pingcap/tidb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T12:48:57.065Z
Learning: Applies to pkg/executor/** : For SQL behavior changes in executor, perform targeted unit test plus relevant integration test.
Learnt from: joechenrh
Repo: pingcap/tidb PR: 66878
File: pkg/planner/core/logical_plan_builder.go:6455-6472
Timestamp: 2026-03-11T06:29:00.122Z
Learning: Ensure code reviews verify the VirtualAssignmentsOffset semantics: the planner sets Update.VirtualAssignmentsOffset = len(update.List). The executor should only apply OrderedList[:VirtualAssignmentsOffset] when composing new rows, and only after the 'changed' check should it group/evaluate OrderedList[VirtualAssignmentsOffset:] per table. This pattern applies to files under pkg/planner/core and pkg/executor (e.g., common_plans.go and update.go). Reviewers should check that updates respect slicing behavior, that the offset is consistently derived from the planner, and that downstream code does not bypass the offset when creating new rows. Add tests validating both branches: the slice before the offset for new rows, and the per-table handling of the slice after the offset.
|
🔍 Starting code review for this PR... |
ingress-bot
left a comment
There was a problem hiding this comment.
This review was generated by AI and should be verified by a human reviewer.
Manual follow-up is recommended before merge.
Summary
- Total findings: 7
- Inline comments: 7
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
⚠️ [Major] (4)
- Conflict progress contract says rows, but the implementation records conflict KV pairs (pkg/dxf/framework/taskexecutor/execute/interface.go:210, pkg/dxf/importinto/conflictedkv/handler.go:128, pkg/dxf/importinto/conflictedkv/handler_test.go:235, pkg/ingestor/engineapi/engine.go:76)
conflictedkvnow leaks taskexecutor collector contract across package boundaries (pkg/dxf/importinto/conflictedkv/handler.go:88, pkg/dxf/importinto/conflictedkv/collector.go:117, pkg/dxf/importinto/conflictedkv/deleter.go:76, pkg/dxf/importinto/collect_conflicts.go:71, pkg/dxf/importinto/conflict_resolution.go:51)- Per-record shared progress atomic creates a conflict-handling hotspot (pkg/dxf/importinto/conflictedkv/handler.go:128, pkg/dxf/importinto/collect_conflicts.go:204, pkg/dxf/importinto/conflict_resolution.go:143)
- SHOW IMPORT JOB changes existing size/speed field units during rolling upgrade (pkg/dxf/importinto/job.go:234, pkg/dxf/importinto/job.go:249, pkg/executor/show.go:2569)
🟡 [Minor] (3)
- Retry/replay parity for new conflict progress accounting is not covered (pkg/dxf/importinto/conflictedkv/handler.go:128, pkg/dxf/importinto/conflictedkv/handler_test.go:142, pkg/dxf/importinto/conflictedkv/handler_test.go:235)
- Conflict-step totals have no fallback when reading pre-upgrade task metadata (pkg/dxf/importinto/job.go:344, pkg/executor/importer/import.go:379)
- Summary comment now overstates unit symmetry after adding conflict-step summaries (pkg/executor/importer/import.go:371, pkg/executor/importer/import.go:379, pkg/dxf/importinto/planner.go:717)
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: GMHDBJD, joechenrh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What problem does this PR solve?
Issue Number: ref #66019
Problem Summary:
SHOW IMPORT JOBdid not expose meaningful progress for global-sort conflict handling steps (collect-conflictsandconflict-resolution). The progress columns were effectively byte-oriented and did not reflect conflict processing work units.What changed and how does it work?
This PR wires conflict-step progress into the existing runtime reporting path without changing
SHOW IMPORT JOBschema:CollectConflictsSummaryResolveConflictsSummaryRuntimeInfo:Processed/Totalas<n> conflicts<n> conflicts/sSHOW IMPORT JOBrendering to consume runtime speed formatter (SpeedStr).SHOW IMPORT JOBoutput for conflict steps.Check List
Tests
this is how it displays in SHOW
Unit test commands run:
./tools/check/failpoint-go-test.sh pkg/dxf/importinto -run 'TestShowImportProgress|TestGetTaskImportedRows' -count=1./tools/check/failpoint-go-test.sh pkg/executor -run TestFillOneImportJobInfo -count=1make lintSide effects
Documentation
Release note
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests