Skip to content

fix(proto): Send CONNECTION_CLOSE also during the handshake#718

Closed
matheus23 wants to merge 2 commits into
mainfrom
matheus23/send-close-in-handshake
Closed

fix(proto): Send CONNECTION_CLOSE also during the handshake#718
matheus23 wants to merge 2 commits into
mainfrom
matheus23/send-close-in-handshake

Conversation

@matheus23

Copy link
Copy Markdown
Member

Description

There are places handling CONNECTION_CLOSE frames in noq-proto. One is in process_early_packet, which processes frames during the handshake, and process_packet, handling frames any other time.
When we received a CONNECTION_CLOSE frame over the wire, the latter would move the state to draining and set connection_close_pending = true. The former would do almost the same, except it wouldn't set the close frame to pending.
This meant that connections that were closed by the server during the handshake would close on the client side without it sending a CONNECTION_CLOSE frame itself. The server side was thus left hanging and wouldn't close "gracefully".

This PR just aligns the behavior between these two places.

Breaking Changes

None

Notes & open questions

Disclosure: Heavily assisted by GLM 5.2.

Change checklist

  • Self-review.
  • Tests if relevant.
  • All breaking changes documented.

@matheus23 matheus23 self-assigned this Jun 19, 2026
@matheus23 matheus23 added this to iroh Jun 19, 2026
@github-project-automation github-project-automation Bot moved this to 🚑 Needs Triage in iroh Jun 19, 2026
@github-actions

Copy link
Copy Markdown

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/noq/pr/718/docs/noq/

Last updated: 2026-06-19T13:20:38Z

@github-actions

Copy link
Copy Markdown

Performance Comparison Report

709f2cfef98fb4b581632c2c486220f946700d03 - artifacts

Raw Benchmarks (localhost)

Scenario noq upstream Delta CPU (avg/max)
large-single 5362.2 Mbps 7881.5 Mbps -32.0% 95.4% / 147.0%
medium-concurrent 5422.9 Mbps 7908.0 Mbps -31.4% 96.6% / 148.0%
medium-single 3822.1 Mbps 4592.2 Mbps -16.8% 97.6% / 149.0%
small-concurrent 3734.0 Mbps 5314.2 Mbps -29.7% 95.3% / 103.0%
small-single 3452.0 Mbps 4760.8 Mbps -27.5% 93.4% / 102.0%

Netsim Benchmarks (network simulation)

Condition noq upstream Delta
ideal 3082.0 Mbps 4106.9 Mbps -25.0%
lan 782.4 Mbps 797.8 Mbps -1.9%
lossy 69.8 Mbps 69.8 Mbps ~0%
wan 83.8 Mbps 83.8 Mbps ~0%

Summary

noq is 27.3% slower on average

@matheus23

Copy link
Copy Markdown
Member Author

This makes the validate_then_reject_manually test flaky. I'm still looking into it. I suspect this test used to depend on the buggy behavior.

@matheus23 matheus23 moved this from 🚑 Needs Triage to 🏗 In progress in iroh Jun 19, 2026

@flub flub left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not test this in proto-only? The fix is only in proto. It is a bit strange to start testing protocol behaviour in noq itself, we don't really do that genereally.

@matheus23

Copy link
Copy Markdown
Member Author

No particular reason. Will change, if I actually want to do this.

@matheus23 matheus23 marked this pull request as draft June 22, 2026 14:13
@matheus23

Copy link
Copy Markdown
Member Author

I investigated the validate_then_reject_manually flakyness, and this PR is not quite the right way to handle this.

E.g. it might mean we only reciprocate the CONNECTION_CLOSE only in the Initial space, even though the server might've sent it to us in the Initial, Handshake and Data space.

The current code architecture isn't quite set up to start processing more frames properly after it received a CONNECTION_CLOSE in the Initial space.

This means there's this weird "limbo" state, where if both ends are noq, then connections that are closed fully in the Initial space (i.e. if the server side uses noq_proto::Endpoint::refuse) work, and if connections are full established and the server uses noq_proto::Connection::close work, but when you use noq::Connecting::drop(), you'll not close gracefully.

That said, this isn't terribly important so I might have to re-visit this later.

@flub

flub commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

just making sure you're aware this section exists: https://www.rfc-editor.org/rfc/rfc9000.html#name-immediate-close-during-the-

@matheus23

Copy link
Copy Markdown
Member Author

Prior to confirming the handshake, a peer might be unable to process 1-RTT packets, so an endpoint SHOULD send a CONNECTION_CLOSE frame in both Handshake and 1-RTT packets. A server SHOULD also send a CONNECTION_CLOSE frame in an Initial packet.

Why is this written so weirdly.

Anyways, yeah I've read this. It's still a bit unclear what to do to fix the main issue (drop(noq::Connecting) on the server side causes the client to leave the server hanging waiting for a CONNECTION_CLOSE response).

This PR would fix this particular case, but degrade other cases (when the server uses noq_proto::Endpoint::refuse, then it would get confused about a CONNECTION_CLOSE in the Initial space from the client).

I'll need to revisit this later.

@matheus23 matheus23 closed this Jun 29, 2026
@github-project-automation github-project-automation Bot moved this from 👍 Ready to ✅ Done in iroh Jun 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

2 participants