
fix(base): avoid self-deadlock during runtime drop #20103

Open
dqhl76 wants to merge 1 commit into databendlabs:main from dqhl76:fix-runtime-leak-session

dqhl76 (Collaborator) commented Jul 3, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Fix a runtime self-deadlock that can happen when the last Arc<Runtime> is dropped from one of that runtime's own worker tasks.

Root Cause

The existing runtime design moves the real Tokio runtime into a wait-to-drop-* OS thread. Dropping Databend's outer Runtime sends a stop signal and then joins that thread.

That is safe when the drop happens from outside the runtime. It deadlocks when the last reference is dropped by one of the runtime's own worker tasks:

runtime worker
  -> Dropper::drop()
  -> send Stop
  -> join wait-to-drop thread

wait-to-drop thread
  -> shutdown/drop Tokio runtime
  -> wait for runtime worker to exit

The worker waits for wait-to-drop, while wait-to-drop waits for the worker.
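
The cycle above can be modelled with plain `std` threads. The sketch below is one possible guard, not necessarily the change made in this PR: the dropper checks whether it is running on one of its own runtime's threads and, if so, detaches the wait-to-drop thread instead of joining it. `CURRENT_RUNTIME`, `DropAction`, `simulate_drop`, and this simplified `Dropper` are hypothetical names for illustration, and a single OS thread stands in for the Tokio worker pool.

```rust
use std::cell::Cell;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

thread_local! {
    // Hypothetical marker: id of the runtime that owns the current thread, if any.
    static CURRENT_RUNTIME: Cell<Option<usize>> = Cell::new(None);
}

/// What the dropper decided to do, reported so callers can observe it.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DropAction {
    Joined,
    Detached,
}

/// Simplified stand-in for the outer runtime handle (not Databend's real type).
struct Dropper {
    runtime_id: usize,
    stop: Option<Sender<()>>,
    wait_to_drop: Option<JoinHandle<()>>,
    report: Sender<DropAction>,
}

impl Drop for Dropper {
    fn drop(&mut self) {
        // Send Stop to the wait-to-drop thread.
        if let Some(stop) = self.stop.take() {
            let _ = stop.send(());
        }
        let on_own_worker = CURRENT_RUNTIME.with(|id| id.get()) == Some(self.runtime_id);
        let handle = self.wait_to_drop.take();
        let action = if on_own_worker {
            // Joining here would wait on a thread that is waiting for us.
            // Dropping the JoinHandle detaches that thread instead.
            drop(handle);
            DropAction::Detached
        } else {
            if let Some(handle) = handle {
                let _ = handle.join();
            }
            DropAction::Joined
        };
        let _ = self.report.send(action);
    }
}

/// Builds the toy runtime and drops its handle either on the worker or outside it.
fn simulate_drop(drop_on_worker: bool) -> DropAction {
    let (stop_tx, stop_rx) = channel::<()>();
    let (report_tx, report_rx) = channel::<DropAction>();
    let (task_tx, task_rx) = channel::<Dropper>();

    // Worker thread: marks itself as part of runtime 1, then drops whatever it receives.
    let worker = thread::spawn(move || {
        CURRENT_RUNTIME.with(|id| id.set(Some(1)));
        for last_handle in task_rx {
            drop(last_handle);
        }
    });

    // wait-to-drop thread: waits for Stop, then waits for the worker to exit.
    let wait_to_drop = thread::spawn(move || {
        let _ = stop_rx.recv();
        let _ = worker.join();
    });

    let runtime = Dropper {
        runtime_id: 1,
        stop: Some(stop_tx),
        wait_to_drop: Some(wait_to_drop),
        report: report_tx,
    };

    if drop_on_worker {
        // The handle is moved into a task, so the drop runs on the worker.
        task_tx.send(runtime).expect("worker is alive");
        drop(task_tx);
    } else {
        drop(task_tx); // lets the worker exit
        drop(runtime); // dropped from outside the runtime: joining is safe
    }
    report_rx.recv().expect("dropper reports its action")
}
```

In this model `simulate_drop(true)` takes the detach path and `simulate_drop(false)` takes the join path. The trade-off is that a detached wait-to-drop thread still completes the shutdown, but the dropping task no longer waits for it to finish.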

In the observed query case, a failed SQL statement can abort the outer pipeline early. That changes the order in which owners release their references, so a pruning task may end up holding the last Arc<PruningContext>. Dropping that context drops its pruning_runtime from inside a pruning-worker task, which triggers the self-deadlock.
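
The ownership-order effect can be shown in isolation. In the minimal sketch below, `Context` and `last_drop_on_worker` are hypothetical stand-ins (not Databend types), and a named OS thread plays the pruning-worker task. The `Drop` impl runs on whichever thread releases the last `Arc`, so releasing the pipeline's reference early moves the drop onto the worker.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::channel;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for PruningContext: records where it was dropped.
struct Context {
    dropped_on_worker: Arc<AtomicBool>,
}

impl Drop for Context {
    fn drop(&mut self) {
        // Drop runs on the thread that releases the last Arc.
        let on_worker = thread::current().name() == Some("pruning-worker");
        self.dropped_on_worker.store(on_worker, Ordering::SeqCst);
    }
}

/// Returns true if the last reference was released on the worker thread.
fn last_drop_on_worker(pipeline_aborts_early: bool) -> bool {
    let flag = Arc::new(AtomicBool::new(false));
    let ctx = Arc::new(Context {
        dropped_on_worker: Arc::clone(&flag),
    });
    let task_ctx = Arc::clone(&ctx);
    let (go_tx, go_rx) = channel::<()>();

    let worker = thread::Builder::new()
        .name("pruning-worker".to_string())
        .spawn(move || {
            // The task keeps its clone until it is told to finish.
            let _ = go_rx.recv();
            drop(task_ctx);
        })
        .expect("spawn worker");

    if pipeline_aborts_early {
        // The pipeline releases its reference first; the task now holds the last one.
        drop(ctx);
        go_tx.send(()).expect("worker is waiting");
        worker.join().expect("worker exits");
    } else {
        // The task finishes first; the caller releases the last reference.
        go_tx.send(()).expect("worker is waiting");
        worker.join().expect("worker exits");
        drop(ctx);
    }
    flag.load(Ordering::SeqCst)
}
```

Here `last_drop_on_worker(true)` reports that the drop ran on the worker, while `last_drop_on_worker(false)` reports that it ran on the caller. Only the first ordering can reach a join from inside the runtime.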


Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


github-actions bot added the pr-bugfix label (this PR patches a bug in codebase) Jul 3, 2026
dqhl76 marked this pull request as ready for review July 3, 2026 09:04
dqhl76 requested a review from zhang2014 July 3, 2026 09:20
dqhl76 changed the title from "fix(runtime): avoid self-deadlock during runtime drop" to "fix(base): avoid self-deadlock during runtime drop" Jul 3, 2026
