Keep in-flight async future results alive on drop #100

Draft
fallintoplace wants to merge 1 commit into NVlabs:main from fallintoplace:fix/devicefuture-cancel-safety

fallintoplace commented May 29, 2026

Partially addresses #99.

This PR stops the owned async launch path from releasing a stored result that is still in flight when the future is dropped after its first poll.

What changed:

  • add Drop cleanup for DeviceFuture that synchronizes the assigned stream before releasing a stored in-flight result
  • if that cleanup cannot prove the stream is idle, log the cleanup failure and leak the stored result instead of risking an early drop
  • add small unit tests around cleanup ordering and the sync-failure behavior
  • document the blocking drop behavior in the cuda-async README
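
The cleanup ordering described above can be sketched without any CUDA dependency. This is a minimal illustration, not the code in this PR: `Stream`, `MockStream`, `Resource`, `SketchFuture`, the `submitted` flag, and `drop_in_flight` are all hypothetical names, and `synchronize` only stands in for whatever stream synchronization call the real crate uses.

```rust
use std::cell::RefCell;
use std::mem;
use std::rc::Rc;

/// Shared event log so the order of "sync" and "release" can be observed.
type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical stand-in for a device stream.
trait Stream {
    fn synchronize(&self) -> Result<(), String>;
}

/// Mock stream that records each synchronize call and can be told to fail.
struct MockStream {
    log: Log,
    fail: bool,
}

impl Stream for MockStream {
    fn synchronize(&self) -> Result<(), String> {
        self.log.borrow_mut().push("sync");
        if self.fail {
            Err("stream sync failed".to_string())
        } else {
            Ok(())
        }
    }
}

/// Hypothetical owned resource; its destructor records the release.
struct Resource {
    log: Log,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.log.borrow_mut().push("release");
    }
}

/// Hypothetical future that stores owned resources while work is pending.
struct SketchFuture<S: Stream, T> {
    stream: S,
    result: Option<T>,
    submitted: bool,
}

impl<S: Stream, T> Drop for SketchFuture<S, T> {
    fn drop(&mut self) {
        let Some(result) = self.result.take() else {
            return; // nothing stored, nothing to protect
        };
        if !self.submitted {
            return; // no work was enqueued, so `result` is dropped normally here
        }
        match self.stream.synchronize() {
            // Stream is idle: releasing the resources is safe.
            Ok(()) => drop(result),
            // Idleness could not be proven: log and leak rather than free early.
            Err(err) => {
                eprintln!("cleanup failed, leaking stored result: {err}");
                mem::forget(result);
            }
        }
    }
}

/// Drops an in-flight future and returns the recorded event order.
fn drop_in_flight(sync_fails: bool) -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let future = SketchFuture {
        stream: MockStream { log: Rc::clone(&log), fail: sync_fails },
        result: Some(Resource { log: Rc::clone(&log) }),
        submitted: true,
    };
    drop(future);
    let events = log.borrow().clone();
    events
}
```

With a successful sync the log reads sync then release; with a failing sync only sync is recorded, because the resource is deliberately leaked and its destructor never runs.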

Why:

  • OwnedAsyncKernelLaunch::execute moves owned resources into DeviceFuture::result while the future is still pending
  • dropping that future after submission could then drop those resources while stream work was still in flight
  • for DeviceBox, that is especially risky because drop enqueues cuMemFreeAsync on a separate deallocator stream
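
The hazard in the bullets above can be modeled with plain Rust types. This is a sketch under stated assumptions: `FakeStream`, `FakeBuffer`, `UnguardedFuture`, and both helper functions are hypothetical, and a simple pending-work counter stands in for real stream state.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stream model: counts work submitted but not yet finished.
#[derive(Default)]
struct FakeStream {
    pending: Cell<usize>,
}

/// Hypothetical owned device buffer. Its destructor records whether the
/// stream still had work in flight at the moment of release.
struct FakeBuffer {
    stream: Rc<FakeStream>,
    freed_while_busy: Rc<Cell<bool>>,
}

impl Drop for FakeBuffer {
    fn drop(&mut self) {
        if self.stream.pending.get() > 0 {
            self.freed_while_busy.set(true);
        }
    }
}

/// Future with no Drop guard: owned resources simply live in `result`.
struct UnguardedFuture {
    result: Option<FakeBuffer>,
}

/// Builds a future holding a buffer, with `pending` units of work in flight,
/// drops it, and reports whether the buffer was released while busy.
fn release_was_early(pending: usize) -> bool {
    let stream = Rc::new(FakeStream::default());
    let flag = Rc::new(Cell::new(false));
    let buffer = FakeBuffer {
        stream: Rc::clone(&stream),
        freed_while_busy: Rc::clone(&flag),
    };
    stream.pending.set(pending); // models work already enqueued on the stream
    let future = UnguardedFuture { result: Some(buffer) };
    drop(future); // models cancellation: the buffer's destructor runs immediately
    flag.get()
}

/// Dropping right after submission releases the buffer under running work.
fn drop_after_submit() -> bool {
    release_was_early(1)
}

/// Dropping once the stream is idle is harmless.
fn drop_after_completion() -> bool {
    release_was_early(0)
}
```

In the real code path the early release is worse than a flag flip, since the description notes that the free is enqueued on a separate deallocator stream and is therefore not ordered after the in-flight work.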

Tradeoff:

  • dropping an in-flight future can now block the dropping thread until the assigned stream has been synchronized
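
To make the cost of this tradeoff concrete, here is a small sketch in which a worker thread stands in for stream work and `join` stands in for stream synchronization. `BlockingGuard` and `drop_waits_for_work` are hypothetical names, and the 50 ms sleep is an arbitrary placeholder for kernel runtime.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical guard whose destructor waits for outstanding work,
/// the way a synchronizing drop waits for its stream.
struct BlockingGuard {
    worker: Option<JoinHandle<()>>,
}

impl Drop for BlockingGuard {
    fn drop(&mut self) {
        if let Some(worker) = self.worker.take() {
            // Blocks the dropping thread until the worker has finished.
            let _ = worker.join();
        }
    }
}

/// Returns true if the work had completed by the time `drop` returned.
fn drop_waits_for_work() -> bool {
    let done = Arc::new(AtomicBool::new(false));
    let done_in_worker = Arc::clone(&done);
    let worker = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50)); // placeholder for device work
        done_in_worker.store(true, Ordering::SeqCst);
    });
    let guard = BlockingGuard { worker: Some(worker) };
    drop(guard); // does not return until the worker is done
    done.load(Ordering::SeqCst)
}
```

If such a drop runs on an async executor thread, other tasks scheduled on that thread cannot make progress while it waits.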

Scope note:

  • this PR is intentionally narrow and only covers the normal drop/cancellation path after first poll
  • it does not try to solve the separate "launch succeeded, callback registration failed" path yet
