Skip to content

fix(transport): remove nursery param from IListener.listen (adopts #1055)#1308

Open
yashksaini-coder wants to merge 11 commits intolibp2p:mainfrom
yashksaini-coder:transport-listener-nursery
Open

fix(transport): remove nursery param from IListener.listen (adopts #1055)#1308
yashksaini-coder wants to merge 11 commits intolibp2p:mainfrom
yashksaini-coder:transport-listener-nursery

Conversation

@yashksaini-coder
Copy link
Copy Markdown
Contributor

@yashksaini-coder yashksaini-coder commented Apr 13, 2026

Summary

Fixes #1314 — removes the caller-supplied trio.Nursery from IListener.listen(). Listeners now own their background task lifetime internally and are cancelled via close(). See Discussion #726 (Issue 1) for context.

Supersedes #1055. The original work by @codemaestro64 has been adopted, rebased on current main, reconciled with upstream drift, and extended. Every refactor commit carries Co-authored-by: Michael Eze <code.maestro64@gmail.com> so authorship history is preserved.

Why a new PR

PR #1055 has been sitting with merge conflicts since November 2025 and the author has not been active on it. This branch:

  • Is clean against current main
  • Reconciles the return-type drift (PR Fix transport layer #1055 still returned bool; main now returns None and raises OpenConnectionError)
  • Captures startup exceptions properly instead of silently swallowing them
  • Adds the missing newsfragment (newsfragments/1314.breaking.rst)
  • Awaits the background system task on close() so callers observe a fully quiesced listener on return
  • Updates the Swarm shutdown path to explicitly close every listener

Approach

Listeners spawn their internal nursery as a trio.lowlevel.spawn_system_task and track readiness via a trio.Event. close() cancels the scope and awaits a matching _stopped event so the background task has actually wound down before the call returns.

TCPListener owns a single long-lived nursery across repeated listen() calls — each additional listen() schedules another nursery.start(serve_tcp, ...) into the same nursery, so close() cancels all binds atomically. A _closed flag rejects post-close listen() calls with OpenConnectionError, and a _lifecycle_lock serializes concurrent listen() / close() callers. This fixes the multi-bind lifecycle ambiguity @acul71 flagged in review.

async def listen(self, maddr):
    # ... parse port/host ...
    async with self._lifecycle_lock:
        if self._closed:
            raise OpenConnectionError("listener is closed")
        if self._nursery is None:
            await self._spawn_background_task()   # first call only
        started = await self._nursery.start(serve_tcp, ...)
        self.listeners.extend(started)

QUIC is a special case: when a transport-level background nursery is already configured (set_background_nursery, the common path under Swarm), the listener piggybacks on it. Otherwise it uses the same system-task pattern.

Files changed

  • libp2p/abc.py — drop nursery from IListener.listen
  • libp2p/transport/tcp/tcp.py — single long-lived internal nursery + system task, multi-bind lifecycle fixed
  • libp2p/transport/websocket/listener.py — same pattern
  • libp2p/transport/quic/listener.py — prefer transport background nursery, fall back to self-spawned
  • libp2p/relay/circuit_v2/transport.py — drop unused param
  • libp2p/network/swarm.py — rename listener_nursery to background_nursery (it was never only for listeners), call listener.close() on shutdown
  • Test updates across TCP / WebSocket / QUIC / factories / integration — including explicit await listener.close() teardown in direct-listener tests and in raw_conn_factory
  • newsfragments/1314.breaking.rst

Testing

  • pytest tests/core/transport/test_tcp.py tests/core/io/ tests/utils/ tests/core/network/ tests/core/security/373 passed, 2 skipped
  • New test test_tcp_listener_close_cancels_all_binds covers multi-bind teardown, idempotent close(), and post-close listen() rejection
  • make lint / make typecheck clean

References

yashksaini-coder and others added 8 commits April 13, 2026 20:57
The caller-supplied nursery was a leaky abstraction that forced Swarm
to manage a listener_nursery on behalf of every transport. Listeners
should own their own background task lifetimes internally.

This commit updates the interface only; concrete implementations follow
in subsequent commits. Part of libp2p#726.

Co-authored-by: Michael Eze <code.maestro64@gmail.com>
TCPListener now spawns its own internal nursery as a trio system task
in listen(). The nursery hosts trio.serve_tcp and stays open until
close() cancels it.

listen() drops the nursery parameter and returns once the socket is
bound (signalled via an internal trio.Event). Startup errors are
captured and re-raised as OpenConnectionError to preserve the existing
error contract.

close() cancels the internal nursery, closes all SocketListeners, and
waits for the background system task to finish so callers observe a
fully-quiesced listener on return. Safe to call multiple times.

Part of libp2p#726.

Co-authored-by: Michael Eze <code.maestro64@gmail.com>
…tListener

Same pattern as TCPListener: listen() drops the nursery parameter and
spawns a trio system task that hosts an internal nursery. serve_websocket
runs under that nursery until close() cancels it. Startup errors are
captured and re-raised as OpenConnectionError.

close() now also awaits the background task's completion after cancelling
the nursery, so callers observe a fully-quiesced listener on return.

Part of libp2p#726.

Co-authored-by: Michael Eze <code.maestro64@gmail.com>
QUIC is a special case: the transport already carries a background
nursery via set_background_nursery(). Prefer that nursery when
available (the common path when used through a Swarm), and fall back
to a self-spawned trio system task with its own internal nursery when
it isn't. Either way, the listen() API no longer requires a caller-
supplied nursery.

close() cancels the self-spawned nursery when it was used and waits
for the background task to finish.

Part of libp2p#726.

Co-authored-by: Michael Eze <code.maestro64@gmail.com>
…istener.listen

The CircuitV2 listener never used the nursery parameter — it just
records the multiaddr to advertise. Drop the parameter to align with
the updated IListener interface.

Part of libp2p#726.

Co-authored-by: Michael Eze <code.maestro64@gmail.com>
…series

Listeners now own their internal nursery, so Swarm no longer manages
listener_nursery / event_listener_nursery_created.  The outer nursery
in run() is still needed for transport-level background tasks (QUIC /
WebSocket set_background_nursery) and for the auto-connector, so it
is renamed to background_nursery / event_background_nursery_created to
reflect its actual purpose.

run() now also calls listener.close() on every listener during
shutdown, so their internal system-task nurseries are cancelled and
fully awaited.

listen() drops the nursery argument from its listener.listen() call.

Part of libp2p#726.

Co-authored-by: Michael Eze <code.maestro64@gmail.com>
Updates the full test suite to match the new IListener.listen(maddr)
signature: TCP, WebSocket, QUIC listener/integration tests, shared
trio I/O fixtures, and the tests/utils/factories factory.

Part of libp2p#726.

Co-authored-by: Michael Eze <code.maestro64@gmail.com>
Co-authored-by: Michael Eze <code.maestro64@gmail.com>
Copilot AI review requested due to automatic review settings April 13, 2026 15:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the caller-supplied trio.Nursery from IListener.listen() and updates transports/tests so listeners manage their own background task lifetime and are shut down via close().

Changes:

  • Update IListener.listen(maddr, nursery) to listen(maddr) and adjust all implementations/callers.
  • Move listener background task management into transport listeners (TCP/WebSocket/QUIC) and ensure Swarm shutdown explicitly closes listeners.
  • Update tests and add a breaking-change newsfragment.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
libp2p/abc.py Update IListener.listen abstract method signature/docs to remove nursery.
libp2p/transport/tcp/tcp.py TCP listener now spawns its own background system task/nursery and cancels via close().
libp2p/transport/websocket/listener.py WebSocket listener now runs serve_websocket in an internally owned background system task/nursery.
libp2p/transport/quic/listener.py QUIC listener no longer takes a nursery; prefers transport background nursery with owned-nursery fallback.
libp2p/relay/circuit_v2/transport.py Remove nursery parameter from Circuit v2 listener listen().
libp2p/network/swarm.py Rename listener nursery concepts to a general background_nursery, wait on new event, and close listeners on shutdown.
tests/utils/factories.py Update factory usage to call listen(maddr) without a nursery.
tests/core/transport/test_tcp.py Update TCP tests to call listen(maddr) without a nursery.
tests/core/transport/websocket/test_websocket.py Update WebSocket tests to call listen(maddr) without a nursery.
tests/core/transport/websocket/test_websocket_integration.py Update Swarm event name used to await background nursery readiness.
tests/core/transport/quic/test_listener.py Update QUIC listener tests to call listen(maddr) without a nursery.
tests/core/transport/quic/test_integration.py Update QUIC integration tests to call listen(maddr) without a nursery.
tests/core/io/test_trio_real_connection.py Update real TCP connection tests to call listen(maddr) without a nursery.
newsfragments/726.breaking.rst Document the breaking API change and required caller updates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread libp2p/transport/tcp/tcp.py Outdated
Comment thread libp2p/transport/quic/listener.py
Comment thread tests/utils/factories.py Outdated
Comment thread tests/core/io/test_trio_real_connection.py
Comment thread libp2p/transport/websocket/listener.py
Comment thread tests/core/transport/test_tcp.py Outdated
…ormat)

- test_listener.py, test_websocket.py: remove now-unused `nursery`
  variable from `async with trio.open_nursery() as nursery:` blocks
  (ruff F841) — listen() no longer takes a nursery parameter
- quic/listener.py: avoid re-declaring _owned_start_error with a type
  annotation in listen() (mypy no-redef); capture Event objects in
  local vars before entering the system-task closure so the `set()`
  calls resolve against non-Optional types (mypy union-attr)
- tcp/tcp.py: ruff format reformatting

Co-authored-by: Michael Eze <code.maestro64@gmail.com>
@acul71
Copy link
Copy Markdown
Contributor

acul71 commented Apr 21, 2026

Hi @yashksaini-coder

Thanks for the PR and the migration work here — overall this looks solid and checks pass locally.

One note from review: the TCP repeated listen() lifecycle semantics (multi-bind vs single-listen contract) still look ambiguous after the nursery ownership change. I think this is best handled in a focused follow-up issue/PR unless maintainers want to settle semantics in this PR.

Blocking items from my side are mainly process:

  1. please link a GitHub issue in the PR body (not only Discussion # TODO Analysis: Transport Issues #726),
  2. ensure newsfragment naming aligns with the linked issue number per project policy.

Some questions:

  • Do you want to support repeated listen() calls on the same TCPListener instance as a contract, or should this now be rejected explicitly?
  • Can you include explicit listener.close() cleanup in test/factory paths that create listeners directly, to validate the new ownership model?
  • Can you link a formal issue for this PR (not only Discussion # TODO Analysis: Transport Issues #726) per project policy?

Full review here:

PR Review: #1308

1. Summary of Changes

  • This PR removes the caller-provided nursery from IListener.listen() and moves listener lifecycle ownership into listeners themselves (TCP, WebSocket, QUIC fallback path).
  • It updates related call sites/tests and Swarm shutdown behavior to call listener.close() explicitly.
  • It references Discussion # TODO Analysis: Transport Issues #726 and supersedes Fix transport layer #1055, but does not reference a GitHub issue in the PR body.
  • Affected architecture areas: transport listener lifecycle (libp2p/transport/*), network orchestration (libp2p/network/swarm.py), and listener-invoking tests/factories.
  • Breaking change is documented with newsfragments/726.breaking.rst.

2. Branch Sync Status and Merge Conflicts

Branch Sync Status

  • Status: ℹ️ Ahead of origin/main
  • Details: 0 9 in branch_sync_status.txt (0 behind, 9 ahead)

Merge Conflict Analysis

  • No conflicts detected.
  • merge_conflict_check.log indicates clean merge against origin/main.

3. Strengths

  • Good architectural direction: listener internals now own long-lived background work instead of pushing nursery ownership to callers.
  • Startup error capture in listener listen() paths is explicit and wrapped in transport-specific exceptions.
  • Swarm run loop now proactively closes listeners on shutdown, matching new listener ownership model.
  • Validation quality is strong: local checks passed (make lint, make typecheck, make test, make linux-docs).
  • Breaking change is documented via a newsfragment.

4. Issues Found

Critical

  • File: PR metadata / process requirement
  • Line(s): N/A
  • Issue: PR does not reference a GitHub issue (Fixes #... / Closes #...), only Discussion # TODO Analysis: Transport Issues #726. Per project review policy in the provided prompt, this is a merge blocker.
  • Suggestion: Open/link a tracking issue and reference it in the PR body, then align newsfragment naming with the issue number.

Major

  • File: libp2p/transport/tcp/tcp.py
  • Line(s): 116-149
  • Issue: TCPListener.listen() resets shared lifecycle fields (_nursery, _started, _stopped, _start_error) on each call while tests still call listen() twice on the same listener. This can orphan prior background server tasks and make close/start coordination ambiguous.
  • Suggestion: Either (a) disallow repeated listen() on the same listener with a clear error, or (b) maintain a single owned nursery for all binds in that listener instance and start additional servers inside it.

libp2p/transport/tcp/tcp.py — lines 116-149

        # Reset state in case of a re-listen.
        self._started = trio.Event()
        self._stopped = trio.Event()
        self._start_error = None

        async def _run_server() -> None:
            try:
                async with trio.open_nursery() as nursery:
                    self._nursery = nursery
                    try:
                        started_listeners = await nursery.start(
                            serve_tcp,
                            handler,
                            tcp_port,
                            host_str,
                        )
                        self.listeners.extend(started_listeners)
                    except BaseException as error:
                        self._start_error = error
                    finally:
                        self._started.set()
            finally:
                self._stopped.set()
                self._nursery = None
  • File: tests/core/transport/test_tcp.py, tests/core/io/test_trio_real_connection.py, tests/utils/factories.py
  • Line(s): multiple
  • Issue: Tests/factory paths now start listeners that outlive caller nurseries (system task pattern) but do not consistently await listener.close() in cleanup/finally blocks.
  • Suggestion: Add explicit teardown (await listener.close()) for listeners created directly in tests/factories to avoid leakage across long test sessions and to validate the new close contract.

tests/core/transport/test_tcp.py — lines 41-48

    listener = transport.create_listener(handler)
    assert len(listener.get_addrs()) == 0
    result = await listener.listen(LISTEN_MADDR)
    assert result is None
    assert len(listener.get_addrs()) == 1
    result = await listener.listen(LISTEN_MADDR)
    assert result is None
    assert len(listener.get_addrs()) == 2

tests/core/io/test_trio_real_connection.py — lines 68-73

        transport = TCP()
        listener = transport.create_listener(on_accept)
        await listener.listen(LISTEN_MADDR)
        local_conn = await transport.dial(listener.get_addrs()[0])
        await ready.wait()

tests/utils/factories.py — lines 250-257

    tcp_transport = TCP()
    listener = tcp_transport.create_listener(tcp_stream_handler)
    await listener.listen(LISTEN_MADDR)
    listening_maddr = listener.get_addrs()[0]
    conn_0 = await tcp_transport.dial(listening_maddr)
    await event.wait()
    assert conn_0 is not None and conn_1 is not None

Minor

  • No additional minor issues identified beyond the above functional/lifecycle concerns.

5. Security Review

  • No direct cryptographic misuse, key handling regression, or unsafe subprocess/tempfile behavior introduced by this PR.
  • Main risk here is reliability/resource lifecycle rather than security exploitability.
  • Security impact: Low.

6. Documentation and Examples

  • Breaking change is documented via newsfragments/726.breaking.rst.
  • No further docs blockers detected specific to listener API migration in this diff.

7. Newsfragment Requirement

  • ✅ Newsfragment exists: newsfragments/726.breaking.rst.
  • Policy blocker: PR does not reference a GitHub issue number; only discussion # TODO Analysis: Transport Issues #726 is referenced.
  • Under the provided review policy, merge requires:
    1. linked issue in PR body (e.g., Fixes #...)
    2. newsfragment name aligned to issue number.

8. Tests and Validation

  • make lint: passed (no errors/warnings).
  • make typecheck: passed (mypy + pyrefly).
  • make test: passed — 2659 passed, 15 skipped.
  • make linux-docs: passed with -W (build succeeded).
  • No failing checks in local validation logs captured for this review.

9. Recommendations for Improvement

  • Enforce single-listen semantics or implement true multi-listen lifecycle ownership in TCPListener.
  • Add explicit listener teardown in direct listener tests/factories (finally: await listener.close()).
  • Add/link a formal GitHub issue and update PR body + newsfragment naming to satisfy merge policy.

10. Questions for the Author

  • Do you want to support repeated listen() calls on the same TCPListener instance as a contract, or should this now be rejected explicitly?
  • Can you include explicit listener.close() cleanup in test/factory paths that create listeners directly, to validate the new ownership model?
  • Can you link a formal issue for this PR (not only Discussion # TODO Analysis: Transport Issues #726) per project policy?

11. Overall Assessment

  • Quality Rating: Needs Work
  • Security Impact: Low
  • Merge Readiness: Needs fixes
  • Confidence: High

TCPListener used to reset per-call lifecycle state and open a fresh
nursery on every listen(), orphaning earlier server tasks so close()
could only cancel the last bind. Now the background system task and
nursery are created once on first listen() and reused for subsequent
binds. close() cancels them together, is idempotent, and listen()
after close() raises OpenConnectionError.

A trio.Lock serializes concurrent listen()/close() so the lazy spawn
can't race. Adds test_tcp_listener_close_cancels_all_binds covering
multi-bind teardown, idempotent close, and post-close rejection.

Fixes the multi-bind regression acul71 flagged in the PR libp2p#1308 review.
Renames the newsfragment to match issue libp2p#1314.
Tests that create TCPListener directly (test_trio_real_connection and
raw_conn_factory) now call await listener.close() before letting the
enclosing nursery cancel. This exercises the new listener-owns-its-
lifetime contract and stops leaking listener tasks between tests.

Addresses acul71's review on PR libp2p#1308.
@yashksaini-coder
Copy link
Copy Markdown
Contributor Author

Thanks @acul71 — all three blockers and the three questions are addressed. Pushed as two commits on top of the branch:

  • c24ca50fix(tcp): own a single long-lived nursery across repeated listen() calls
  • 61bc430test: close listener explicitly in direct-listener tests and factory

Major items

1. TCPListener.listen() lifecycle ambiguity (lines 116-149 of your review). Fixed in c24ca50. The listener now spawns its background system task + nursery once on the first listen() call and reuses the same nursery for every subsequent bind:

async with self._lifecycle_lock:
    if self._closed:
        raise OpenConnectionError(f"Cannot listen on {maddr}: listener is closed")
    if self._nursery is None:
        await self._spawn_background_task()   # first call only
    ...
    started = await self._nursery.start(serve_tcp, handler, tcp_port, host_str)
    self.listeners.extend(started)
  • _nursery, _nursery_ready, _stopped are initialised once in __init__ and never reset mid-life.
  • A trio.Lock (_lifecycle_lock) serialises concurrent listen()/close() callers so the lazy spawn can't race.
  • close() sets _closed = True under the lock, cancels the nursery, and is fully idempotent; a second call returns immediately without re-doing teardown.
  • listen() after close() now raises OpenConnectionError("listener is closed") rather than silently reopening.

2. Explicit listener.close() in direct-listener tests and factories. Fixed in 61bc430. Added try/finally: await listener.close() (or a moral equivalent before nursery.cancel_scope.cancel()) to:

  • tests/core/transport/test_tcp.py — all three existing listener tests
  • tests/core/io/test_trio_real_connection.py — all four tests that construct a listener directly
  • tests/utils/factories.pyraw_conn_factory now closes the listener in finally around the yield

3. Linked issue + newsfragment naming. Opened #1314 to track the breaking change, updated the PR body with Fixes #1314, and renamed newsfragments/726.breaking.rstnewsfragments/1314.breaking.rst.

Your questions

Do you want to support repeated listen() calls on the same TCPListener instance as a contract, or should this now be rejected explicitly?

Supported as contract — option (b) from your Major item. TCPListener already stored listeners: list[...] and existing tests asserted multi-bind works; option (a) would have churned the API on top of the already-breaking change. The new test below locks this in.

Can you include explicit listener.close() cleanup in test/factory paths that create listeners directly, to validate the new ownership model?

Done — see commit 61bc430. New test test_tcp_listener_close_cancels_all_binds validates the close contract end-to-end:

await listener.listen(LISTEN_MADDR)
await listener.listen(LISTEN_MADDR)
assert len(listener.get_addrs()) == 2

await listener.close()
assert listener.get_addrs() == ()

await listener.close()                                   # safe no-op
with pytest.raises(OpenConnectionError, match="listener is closed"):
    await listener.listen(LISTEN_MADDR)                  # post-close rejected

Can you link a formal issue for this PR (not only Discussion #726) per project policy?

Done — #1314. PR body now opens with Fixes #1314 and keeps the Discussion link for context.

Validation

  • make lint clean
  • make typecheck clean (mypy + pyrefly)
  • pytest tests/core/transport/test_tcp.py tests/core/io/ tests/utils/ tests/core/network/ tests/core/security/373 passed, 2 skipped

Ready for another look when you have a moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove nursery parameter from IListener.listen() (breaking)

3 participants