Skip to content

Synchronously write HTTP responses#713

Merged
Kaliumhexacyanoferrat merged 6 commits intomainfrom
feature/sync-responses
Oct 30, 2025
Merged

Synchronously write HTTP responses#713
Kaliumhexacyanoferrat merged 6 commits intomainfrom
feature/sync-responses

Conversation

@Kaliumhexacyanoferrat
Copy link
Copy Markdown
Owner

No description provided.

@Kaliumhexacyanoferrat
Copy link
Copy Markdown
Owner Author

@MDA2AV would love to have your review on this. I know that ref structs would be faster but that is too much of a change for me right now.

@Kaliumhexacyanoferrat Kaliumhexacyanoferrat marked this pull request as ready for review October 28, 2025 15:36
@MDA2AV
Copy link
Copy Markdown
Collaborator

MDA2AV commented Oct 28, 2025

@MDA2AV would love to have your review on this. I know that ref structs would be faster but that is too much of a change for me right now.

Hmm interesting, so the idea is to go fully sync on the response which is nice, I see that you're using u8 literals for less allocations, that is the best performant solution for all the const data.

What is really missing is to use PipeWriter, while Stream has an overload to accept ReadOnlySpan<>:

public virtual void Write(ReadOnlySpan<byte> buffer)
{
    byte[] sharedBuffer = ArrayPool<byte>.Shared.Rent(buffer.Length);
    try
    {
        buffer.CopyTo(sharedBuffer);
        Write(sharedBuffer, 0, buffer.Length);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(sharedBuffer);
    }
}

My guess would be that this aggressive array pooling will hurt performance, which wouldn't happen with PipeWriter.Write().
Edit: I've seen that for small allocations, pooling with ArrayPool can be actually slower than just allocating if the allocation is a gen0

My sugestion for performance test here would be to create a "short-circuit" response like a full u8 header and call Stream.Write() only once, and then compare to the current implementation, also testing with pipelined requests can amplify the results difference.

Output.Write(DateHeader.GetValue()); -> This can be improved to avoid calculating the date every call probably

As we discussed also before, some objects like ResponseHandler could be pooled but that can be a difficult adjustment that could require supporting parameterless constructors.

You can cache the entire Status line indexed by status code number, quite fast but would be a small gain.

Edit: Also try not to change a lot of things at once, it can hide perf loss/wins when benchmarking, would be interesting to see if the current changes have any performance changes, which environment are you using to benchmark?

Edit: ref structs are difficult to implement and aren't really going to improve much if you're fully using spans already. If you need a non GC'able memory you can try using pinned arrays, they are basically as fast as stack memory

@MDA2AV
Copy link
Copy Markdown
Collaborator

MDA2AV commented Oct 28, 2025

For the DateHelper check this impl

](https://github.com/MDA2AV/Wired.IO/blob/main/Wired.IO/Utilities/DateHelper.cs)

I adapted it from the asp net platform benchmark, cached values updated every 1 second

@Kaliumhexacyanoferrat
Copy link
Copy Markdown
Owner Author

Kaliumhexacyanoferrat commented Oct 29, 2025

Thanks, I have some caching for the date header, but it is a good idea to cache the whole header line at once. The lock is kind of suspicious to me here, as it is quite expensive, even if only executed once per second. Not for pipelining, but if you have thousands of requests at the same time running. I prefer to execute the calculation twice or more instead of synchronizing threads.

Regarding the Stream implementation: I agree that I need to investigate PipeWriter, but currently I have a PoolBufferedStream that caches all the small writes into larger buffer and writes them to the NetworkStream (or SslStream) as soon as this buffer overflows (or a flush is requested). Nevertheless, I had no overwrite for the ReadOnlySpan<> yet, so this should optimize:

    public override void Write(ReadOnlySpan<byte> buffer)
    {
       var count = buffer.Length;

       if (count < (Buffer.Length - Current))
       {
           buffer.CopyTo(Buffer.AsSpan(Current));
           Current += count;

           if (Current == Buffer.Length)
           {
               WriteBuffer();
           }
       }
       else
       {
           WriteBuffer();
           Stream.Write(buffer);
       }
    }

This buffer holds 65 kb by default, so this will cause most responses to be written synchronously to the buffer and then flushed once to the underlying stream.

- Add a specific overload for the newly used span based write methods
- Use specific stream type to avoid virtual calls
@MDA2AV
Copy link
Copy Markdown
Collaborator

MDA2AV commented Oct 29, 2025

Thanks, I have some caching for the date header, but it is a good idea to cache the whole header line at once. The lock is kind of suspicious to me here, as it is quite expensive, even if only executed once per second. Not for pipelining, but if you have thousands of requests at the same time running. I prefer to execute the calculation twice or more instead of synchronizing threads.

Regarding the Stream implementation: I agree that I need to investigate PipeWriter, but currently I have a PoolBufferedStream that caches all the small writes into larger buffer and writes them to the NetworkStream (or SslStream) as soon as this buffer overflows (or a flush is requested). Nevertheless, I had no overwrite for the ReadOnlySpan<> yet, so this should optimize:

    public override void Write(ReadOnlySpan<byte> buffer)
    {
       var count = buffer.Length;

       if (count < (Buffer.Length - Current))
       {
           buffer.CopyTo(Buffer.AsSpan(Current));
           Current += count;

           if (Current == Buffer.Length)
           {
               WriteBuffer();
           }
       }
       else
       {
           WriteBuffer();
           Stream.Write(buffer);
       }
    }

This buffer holds 65 kb by default, so this will cause most responses to be written synchronously to the buffer and then flushed once to the underlying stream.

I'll try a benchmark with the date, wonder how much overhead the lock adds.

I need to look deeper into this PoolBufferedStream, I'm still inclined to the PipeWriter because it is perfect for sync writing and you can easily swap between stream and pipe, these are just "views" to the memory, I've always thought as use Stream for async and heap allocated buffer manipulation and PipeWriter for sync, but let's not be fooled here, the memory is in the heap for both cases and it is GC managed.. If we want an unmanaged slab of memory in the heap we would need to deal with something like:
https://github.com/MDA2AV/Unhinged/blob/main/Unhinged/Utilities/UnmanagedMemoryManager.cs
which deals with pinned arrays that can't be moved by GC (which is my next target for Wired.IO), this came to me after doing some research on why these java frameworks are achieving so high results and found out java has a direct api to deal with pinned arrays.

I'd love to see actual numbers though, my guess would be that using Stream with sync writes or PipeWriter is likely to yield similar performance but sometimes we can be surprised

@Kaliumhexacyanoferrat
Copy link
Copy Markdown
Owner Author

Yes, this is basically what pipe writer does when writing to a stream. Instead of array pool they use a memory pool which also uses pinned memory. So there surely will be performance benefits. I will create an issue to switch to a pipe writer (either directly connected to the socket for HTTP or to the sslstream for HTTPS). With sockets this would be as efficient as it can be (in .NET SDK terms), with SSL it manages a pinned buffer which is written to the underlying stream when flushed.

@Kaliumhexacyanoferrat Kaliumhexacyanoferrat merged commit 27fac58 into main Oct 30, 2025
9 checks passed
@Kaliumhexacyanoferrat Kaliumhexacyanoferrat deleted the feature/sync-responses branch October 30, 2025 16:26
@Kaliumhexacyanoferrat Kaliumhexacyanoferrat removed this from the Version 10 milestone Nov 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Write response headers synchronously Switch to u8 literals where possible

2 participants