Skip to content

fix(actor): drain Actor::tasks JoinSet in run_async#96

Merged
dignifiedquire merged 1 commit into
n0-computer:mainfrom
cbenhagen:fix/actor-tasks-joinset-leak
May 8, 2026
Merged

fix(actor): drain Actor::tasks JoinSet in run_async#96
dignifiedquire merged 1 commit into
n0-computer:mainfrom
cbenhagen:fix/actor-tasks-joinset-leak

Conversation

@cbenhagen
Copy link
Copy Markdown
Contributor

Description

Actor::tasks is a JoinSet<()> that the three streaming-reply actions (Action::ListAuthors, Action::ListReplicas, ReplicaAction::GetMany) push tasks into via spawn_local, but run_async's tokio::select! had no join_next arm. Completed task headers accumulate for the lifetime of the actor and only get released when abort_all() runs at shutdown.

The fix adds the missing arm:

Some(res) = self.tasks.join_next(), if !self.tasks.is_empty() => {
    if let Err(err) = res {
        if !err.is_cancelled() {
            warn!(?err, "actor reply-streamer task panicked");
        }
    }
    continue;
}

This matches the canonical drain pattern used everywhere else in the crate:

  • LiveActor::run_inner drains three JoinSets in the same select!: running_sync_connect, running_sync_accept, and download_tasks (src/engine/live.rs).
  • GossipActor::progress drains active_tasks with the same is_cancelled() skip on JoinError (src/engine/gossip.rs).

Actor::tasks was the only JoinSet in the crate not following the pattern. There is also a pre-existing TODO in src/engine/live.rs that calls this out:

// TODO: abort_all and join_next all JoinSets to catch panics

The three spawn_local sites are kept as-is. iter_to_irpc is a sync-iter → async-mpsc bridge that can stall on a slow consumer, so inlining it would block the action queue (notably the MAX_COMMIT_DELAY timeout flush). The fix lives entirely in the select!.

A regression test is added next to the existing open_close test in src/actor.rs::tests. It fires 1000 calls of each streaming-reply shape (3000 streamer tasks total), drains every reply stream to completion, then asserts that Actor::tasks shrinks back to a small bound. Without the new arm the residual is ~3000; with it the count drops to near zero within tens of milliseconds. A #[cfg(test)]-gated Action::DebugTasksLen and SyncHandle::debug_tasks_len provide the introspection.

Breaking Changes

None.

Notes & open questions

The test introspection is #[cfg(test)]-only. Happy to instead expose a public wait_idle()-style helper if you'd prefer that shape.

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.

The three streaming-reply actions push tasks into `Actor::tasks`
via `spawn_local`, but `run_async` had no `join_next` arm, so
completed task headers accumulated for the actor's lifetime and
only got released by `abort_all()` at shutdown. Add the missing
arm, matching the drain pattern already used by `LiveActor` and
`GossipActor`.
@n0bot n0bot Bot added this to iroh Apr 29, 2026
@github-project-automation github-project-automation Bot moved this to 🚑 Needs Triage in iroh Apr 29, 2026
Copy link
Copy Markdown
Contributor

@dignifiedquire dignifiedquire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

@dignifiedquire dignifiedquire merged commit b4f985d into n0-computer:main May 8, 2026
24 of 25 checks passed
@github-project-automation github-project-automation Bot moved this from 🚑 Needs Triage to ✅ Done in iroh May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

2 participants