fix: concurrent INSERT with NULL unique key causes false duplicate errors#24054
Conversation
Review Summary by QodoFix concurrent INSERT with NULL unique key false duplicate errors
WalkthroughsDescription• Fix concurrent INSERT with NULL unique key causing false duplicate errors • Skip NULL values in duplicate checking (fuzzyCheck, checkPKDup) per SQL semantics • Replace vector aliasing with deep copy to prevent concurrent bitmap corruption • Serialize non-Connector PreScopes in TP queries to prevent data races • Add prepared flag to ValueScan to prevent double evaluation on nested MergeRun Diagramflowchart LR
A["NULL Handling Bug"] -->|Skip NULLs in fuzzyCheck| B["fuzzyCheck.go"]
A -->|Skip NULLs in checkPKDup| C["txn.go"]
A -->|Clone before sort in PKPersistedBetween| D["txn_table.go"]
E["Concurrent Vector Sharing"] -->|Use Dup instead of SetVector| F["preinsert.go"]
E -->|Serialize non-Connector PreScopes| G["scope.go"]
E -->|Add prepared flag| H["value_scan.go"]
B --> I["Fixed Duplicate Checking"]
C --> I
D --> I
F --> J["Fixed Data Races"]
G --> J
H --> J
File Changes1. pkg/sql/colexec/preinsert/preinsert.go
|
Code Review by Qodo
1. Fuzzy check empty keys
|
The race detector's type checker rejects uint16 on T_enum vectors. Use the proper types.Enum alias in both vectorToString and tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gouhongshen
left a comment
There was a problem hiding this comment.
The fix direction looks right, but I can't approve this yet because the regression proof is still too local to the helpers.
What's still missing before approval:
- A SQL-level regression that reproduces the historical false
Duplicate entryand proves the real user-visible path is fixed. - A concurrent
INSERT/UPDATEcase. The PR title is about concurrent insert with NULL unique keys, but the new tests stay single-threaded. - Coverage for the full
primaryKeysMayBeChanged -> PKExistInMemBetween -> PKPersistedBetweenpath, including a flushed/persisted-object case. - Coverage for the hidden unique-index/backfill path if that path is part of the intended fix surface.
Right now the tests prove the helper-level NULL filtering, but they do not yet prove that the full production path no longer misreports duplicates while still rejecting real non-NULL duplicates. That's the only blocking issue from this review.
XuPeng-SH
left a comment
There was a problem hiding this comment.
Response to Qodo Review
Both issues raised are false positives after careful investigation:
Issue 1: Fuzzy check empty keys — ✅ Already handled
fill() at lines 144-153 already guards against this:
if len(collision) > 0 {
f.cnt = len(collision[0])
} else {
f.cnt = 0
}
if f.cnt == 0 {
return nil // ← early return, backgroundSQLCheck never called
}When all rows are NULL, genCollsionKeys() produces empty collision slices → f.cnt = 0 → backgroundSQLCheck is skipped by the guard in compile.go line 568 (f.cnt > 0).
Issue 2: Early prescope operator free — ✅ Ownership transfer protects files
File ownership chain prevents premature deletion:
HashBuildtransfers spill fds to JoinMap at SendSucceed:ctr.spilledFds = nil(build.go:102)HashJointakes ownership viaTakeSpillBuildFds()(joinMapMsg.go:239-246)
Sequential TP execution ensures HashBuild completes and sends JoinMap before HashJoin starts reading.
- TestFillOnlyInsertHidden: hidden unique index (onlyInsertHidden) path - TestFillEndToEndConditionGeneration: full fill→condition pipeline - TestConcurrentFirstlyCheck: multi-goroutine concurrent firstlyCheck - TestConcurrentCheckPKDup: concurrent checkPKDup with NULLs - TestConcurrentCheckPKDup_RealDupWithNulls: dup detection under concurrency - TestDupVectorWithoutNulls_ConcurrentSafety: concurrent InplaceSort safety Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Integration tests added (commit 87b2c0d)Addressing all review feedback: 1. Full production path coverage (fill → condition → SQL generation)
2. Concurrent INSERT cases
3. Hidden unique index path
4. primaryKeysMayBeChanged path
All 16 new sub-tests pass. |
Merge Queue Status
This pull request spent 10 seconds in the queue, including 1 second running CI. Required conditions to merge
|
What type of PR is this?
Which issue(s) this PR fixes:
issue #24030
What this PR does / why we need it:
Two root causes:
1. NULL handling bug (logic error)
SQL defines NULL != NULL, but the duplicate checking code treated NULL values as comparable:
fuzzyCheck.firstlyCheck()/genCollsionKeys(): NULL Varlena serialized as empty string → multiple NULLs counted as duplicatescheckPKDupGeneric()/checkPKDup(): NULL entries not skipped for any typePKPersistedBetween():InplaceSort()reorders data but not the null bitmap → null positions become misaligned after sort2. Concurrent vector sharing (data race)
PreInsert.constructColBuf()usedSetVector()(pointer alias) instead ofDup()(deep copy)MergeRun()scopes cause reads/writes on shared bitmap structuresChanges:
fuzzyCheck.gofirstlyCheck/genCollsionKeys; fixvectorToStringper-row null checktxn.gocheckPKDupGeneric(all types) andcheckPKDup(string/array types)txn_table.gocloneNonNullKeys()to filter NULLs and break aliasing; clone beforeInplaceSortinPKPersistedBetweenWhat this PR does NOT include (compared to #24046):
bitmap.RecalculateCount()— unnecessary once concurrency is fixedEmptyByFlagremoval fromNulls.Contains()— masks deeper issueRange()Clone — performance hit on everyWindow()callGetUnionAllFunction— performance regressionappendOneFixedTryExpand — hot-path overheadTest coverage
Tests in
fuzzyCheck_test.go(27 sub-tests) andtxn_test.go(32 sub-tests):TestVectorToStringNullHandling— 13 types including per-row null checkTestFirstlyCheckNullValues— NULL-only, mixed, no-null, all-null batchesTestGenCollisionKeysNullHandling— NULL skip in collision key generationTestCheckPKDupGenericNullHandling— all fixed-width types + workspace/rawrowsTestCheckPKDupNullHandling— string/array types null skip