fix: 4 CN memory leak/growth issues (#24058 #24059 #24060 #24061)#24062
fix: 4 CN memory leak/growth issues (#24058 #24059 #24060 #24061)#24062mergify[bot] merged 2 commits intomatrixorigin:mainfrom
Conversation
Review Summary by QodoFix 4 CN memory leak and goroutine leak issues
WalkthroughsDescription• Fix aggState argCnt memory leak on allocation failure • Defer cleanup of duplicated vectors in constant-folding expressions • Add RemoveTid() method to prevent statsInfoMap unbounded growth • Add context cancellation check to select loop preventing goroutine leaks Diagramflowchart LR
A["Memory Leaks Identified"] --> B["aggState argCnt leak"]
A --> C["evalExpression Dup'd vectors"]
A --> D["statsInfoMap unbounded growth"]
A --> E["scope.go goroutine leak"]
B --> B1["Free argCnt on Alloc failure"]
C --> C1["Defer cleanup of ivecs"]
D --> D1["Add RemoveTid method"]
E --> E1["Add ctx.Done case to select"]
B1 --> F["Memory leaks fixed"]
C1 --> F
D1 --> F
E1 --> F
File Changes1. pkg/sql/colexec/aggexec/aggState.go
|
Code Review by Qodo
1. Engine/stats lock deadlock
|
9ec4e87 to
d1c46ea
Compare
1. aggState.go: Free argCnt on Alloc() failure path (matrixorigin#24061) 2. evalExpression.go: Defer cleanup for constant-fold ivecs (matrixorigin#24060) 3. engine.go: Move RemoveTid outside Engine lock to prevent deadlock (matrixorigin#24058) 4. stats.go: Add RemoveTid() to clean up statsInfoMap entries (matrixorigin#24059) Note: scope.go ctx.Done() fix was reverted — it caused ISCP executor test failures (TestISCPExecutor1/4/5, TestUpdateJobSpec, TestInitSql). The goroutine leak concern remains valid but needs a more nuanced fix. Fixes matrixorigin#24058 matrixorigin#24059 matrixorigin#24060 matrixorigin#24061 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Merge Queue Status
This pull request spent 58 minutes 9 seconds in the queue, including 57 minutes 53 seconds running CI. Required conditions to merge
|
What type of PR is this?
Which issue(s) this PR fixes:
Fixes #24058 #24059 #24060 #24061
What this PR does / why we need it:
Fixes 4 CN memory leak/growth issues:
argCntonAlloc()failure path — prevents mpool leak when arena allocation fails afterMakeSlicesucceeds (scope.go: missing ctx.Done() in select loop causes goroutine leak on query cancellation #24061)ivecsinGetExprZoneMap— prevents vector leak in constant folding (stats.go: statsInfoMap entries set to nil but never deleted — unbounded map growth #24060)RemoveTidcall outside Engine lock to prevent deadlock betweenEngine.muandGlobalStats.mu, and uncomment the previously disabledRemoveTidcall (aggState: mpool.MakeSlice leak when subsequent Alloc fails #24058)RemoveTid()method — cleans upstatsInfoMapentries when tables are unsubscribed, preventing unbounded map growth (evalExpression: partial vector leak in constant-folding Dup() loop #24059)Tests added:
aggexec_test.go: 3 sub-tests — normal init/free, real Alloc failure path (limited mpool), non-savearg pathstats_test.go: 3 sub-tests — remove existing entries, nonexistent table, broadcast wakeupstats_test.go: 2 sub-tests — TestCleanMemoryTableWithTable (partition+stats removal, missing partition)evalExpression_test.go: TestGetExprZoneMapConstantFold — exercises the constant-fold defer cleanup viaabs(-42)Not included (reverted):
ctx.Done()check inmergeRunSelectLoopwas reverted: caused ISCP executor test failures because pre-scope goroutines already handle context cancellation internally. Issue aggState: mpool.MakeSlice leak when subsequent Alloc fails #24058 remains open for the goroutine leak concern.