# Buffer Sequence Theory

This document explains Asio's buffer sequence abstraction - what it is, what rules govern it, and how users extend it with their own types.
## The Buffer Primitive

A buffer is a pointer and a size. It describes a contiguous region of memory without owning it.

Asio defines two buffer types:

- `mutable_buffer` - writable memory (`void*` + `size_t`)
- `const_buffer` - read-only memory (`const void*` + `size_t`)

Both expose two member functions: `data()` returns the pointer, `size()` returns the byte count.

`mutable_buffer` is implicitly convertible to `const_buffer` (writable memory can always be read). The reverse conversion is disallowed - you cannot write to read-only memory.

The pointer type is `void*` (`const void*` for `const_buffer`), not `std::byte*`. This is deliberate. POSIX uses `void*` in its scatter-gather I/O structure (`iovec::iov_base`) for semantic neutrality - raw I/O should not opine on what the memory contains. The buffer types preserve this neutrality.

These types are non-owning descriptors. They reference memory but do not manage its lifetime. Creating a buffer from a pointer does not allocate, copy, or extend the life of anything. The caller is responsible for ensuring the memory remains valid while the buffer is in use.
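A minimal sketch of the two descriptor types makes the conversion rule concrete. The `sketch` namespace and both classes are hypothetical stand-ins written for this illustration, not Asio's definitions; they reproduce only the pointer-plus-size layout and the one-way implicit conversion described above.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for a writable, non-owning buffer descriptor.
class mutable_buffer {
public:
    mutable_buffer() noexcept = default;
    mutable_buffer(void* data, std::size_t size) noexcept
        : data_(data), size_(size) {}

    void* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }

private:
    void* data_ = nullptr;
    std::size_t size_ = 0;
};

// Hypothetical stand-in for a read-only, non-owning buffer descriptor.
class const_buffer {
public:
    const_buffer() noexcept = default;
    const_buffer(const void* data, std::size_t size) noexcept
        : data_(data), size_(size) {}

    // Implicit widening: writable memory can always be viewed as readable.
    // Only the descriptor is copied; no bytes move.
    const_buffer(const mutable_buffer& b) noexcept
        : data_(b.data()), size_(b.size()) {}

    const void* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }

private:
    const void* data_ = nullptr;
    std::size_t size_ = 0;
};

// The conversion only goes one way.
static_assert(std::is_convertible_v<mutable_buffer, const_buffer>);
static_assert(!std::is_convertible_v<const_buffer, mutable_buffer>);

} // namespace sketch
```

Converting a `mutable_buffer` to a `const_buffer` copies two values (pointer and size). The bytes themselves are never touched, which is what makes buffers cheap to pass around by value.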
## Why Sequences

Operating systems support scatter-gather I/O. A gather-write (`writev` on POSIX, `WSASend` with an array of `WSABUF` structures on Windows) transmits multiple buffers in a single system call. A scatter-read (`readv` on POSIX, `WSARecv` on Windows) receives data into multiple buffers at once.

This matters for performance. Consider sending an HTTP response: the status line is in one buffer, each header in another, the body in yet another. Without scatter-gather, you must copy everything into a single contiguous allocation before writing. With scatter-gather, you pass all the buffers to one system call and the kernel handles the rest.

A buffer sequence is the abstraction that represents this collection of buffers. It is the C++ type that maps to the array of `iovec` structures (or `WSABUF` structures on Windows) that the OS expects.
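On POSIX the mapping is direct: each buffer becomes one `iovec`, and one `writev` call sends them all. The sketch below is a minimal illustration assuming a POSIX system. The helper `gather_write` is hypothetical, it uses `std::string_view` as its buffer type for brevity, and it ignores partial writes and the `IOV_MAX` limit on the number of buffers per call.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>
#include <sys/types.h>
#include <sys/uio.h>   // writev, struct iovec
#include <unistd.h>    // ssize_t, pipe, read, close

// Hypothetical helper: transmit several separate byte ranges with one
// writev call. Returns the number of bytes written, or -1 on error.
// Partial writes and the IOV_MAX limit are not handled in this sketch.
inline ssize_t gather_write(int fd, const std::vector<std::string_view>& parts)
{
    std::vector<iovec> iov;
    iov.reserve(parts.size());
    for (std::string_view part : parts) {
        iovec v{};
        // iov_base is a non-const void*; writev only reads through it,
        // so casting away const is safe here.
        v.iov_base = const_cast<char*>(part.data());
        v.iov_len = part.size();
        iov.push_back(v);
    }
    // One system call for all buffers, with no intermediate contiguous copy.
    return ::writev(fd, iov.data(), static_cast<int>(iov.size()));
}
```

Passing the status line, the header block, and the body as three separate views produces a single `writev` call with three `iovec` entries; no combined copy of the response is ever built.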
## The Abstraction

A buffer sequence is any type that can be traversed, in both directions, as a series of buffers.

More precisely: a type `T` is a buffer sequence if the free functions `buffer_sequence_begin(t)` and `buffer_sequence_end(t)` return bidirectional iterators whose value type is convertible to `const_buffer` (for memory the operation only reads, such as the data handed to a write) or `mutable_buffer` (for memory the operation writes into, such as the destination of a read).

### Customization Points

`buffer_sequence_begin` and `buffer_sequence_end` are free functions that serve as customization points. For standard containers, Asio provides default overloads that call `begin()` and `end()`. For user-defined types, the user provides overloads found via ADL (argument-dependent lookup).

This is the same customization pattern used throughout Asio. The type's namespace determines which overload is found. Wrapping a buffer sequence in a type-erasing container (like stuffing it into a lambda or a `std::function`) destroys the type information that ADL needs, breaking the mechanism.
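The lookup mechanics can be sketched in isolation. Here the namespace `lib` plays the role of the library and `app` the role of user code; both namespaces, the `total_size` algorithm, and the `frame` type are hypothetical. The library's generic algorithm calls `buffer_sequence_begin` unqualified, so ordinary lookup finds the library's default and argument-dependent lookup adds any overloads declared in the argument type's namespace.

```cpp
#include <cstddef>
#include <vector>

namespace lib {

// Minimal stand-in for a read-only buffer descriptor.
struct const_buffer {
    const void* p;
    std::size_t n;
    std::size_t size() const { return n; }
};

// Library default: delegate to begin()/end() members when they exist.
template <class C>
auto buffer_sequence_begin(const C& c) -> decltype(c.begin()) { return c.begin(); }

template <class C>
auto buffer_sequence_end(const C& c) -> decltype(c.end()) { return c.end(); }

// Generic algorithm. The calls are unqualified, so argument-dependent
// lookup can add overloads from the argument type's namespace.
template <class Buffers>
std::size_t total_size(const Buffers& bufs)
{
    std::size_t total = 0;
    for (auto it = buffer_sequence_begin(bufs); it != buffer_sequence_end(bufs); ++it)
        total += const_buffer(*it).size();
    return total;
}

} // namespace lib

namespace app {

// User type with no begin()/end() members.
struct frame {
    lib::const_buffer parts[2];
};

// Found by ADL because frame is declared in namespace app.
inline const lib::const_buffer* buffer_sequence_begin(const frame& f) { return f.parts; }
inline const lib::const_buffer* buffer_sequence_end(const frame& f) { return f.parts + 2; }

} // namespace app
```

Calling `lib::total_size` on a `std::vector<lib::const_buffer>` uses the library default, while calling it on an `app::frame` picks up the overloads in `app`. If the `frame` were hidden behind a type-erased wrapper, ADL would see only the wrapper's type and the `app` overloads would no longer be found.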
### Why Bidirectional

The iterators must be at least bidirectional - not merely forward. Two reasons:

1. Algorithms that consume buffer sequences sometimes need to traverse backwards. When removing a prefix from a buffer sequence (consuming bytes from the front after a partial read), the implementation may need to adjust the first unconsumed buffer.

2. A read or write operation fills or drains buffers in order, front to back. If the operation is interrupted partway through a buffer, the implementation needs to locate that buffer and adjust its starting position for the next call. Bidirectional iteration simplifies this bookkeeping.

Forward-only ranges do not satisfy the buffer sequence requirements.
### The Single-Buffer Case

A lone `const_buffer` or `mutable_buffer` is itself a valid buffer sequence - a sequence of exactly one element. Asio provides overloads of `buffer_sequence_begin` and `buffer_sequence_end` that return a pointer to the buffer and a pointer one past it, respectively. This makes a single buffer act like a one-element array.

This unification matters: any function that accepts a buffer sequence also accepts a single buffer. There is no need for separate overloads.

```cpp
template<ConstBufferSequence Buffers>
void send(const Buffers& buffers);

const_buffer single = ...;
send(single); // one buffer

std::array<const_buffer, 3> multiple = ...;
send(multiple); // three buffers
```

Both calls use the same function template. The concept is satisfied in both cases.
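A self-contained version of the example shows the one-element-array trick at work. Everything here is a hypothetical stand-in: `sketch::const_buffer`, the two overload sets, and a `send` that only counts buffers and bytes instead of performing I/O. The concept constraint is omitted to keep the sketch short.

```cpp
#include <cstddef>
#include <memory>

namespace sketch {

// Minimal stand-in for a read-only buffer descriptor.
struct const_buffer {
    const void* p = nullptr;
    std::size_t n = 0;
    std::size_t size() const { return n; }
};

// A single buffer is a one-element array: [&b, &b + 1).
inline const const_buffer* buffer_sequence_begin(const const_buffer& b) { return std::addressof(b); }
inline const const_buffer* buffer_sequence_end(const const_buffer& b) { return std::addressof(b) + 1; }

// Containers delegate to their own begin()/end().
template <class C>
auto buffer_sequence_begin(const C& c) -> decltype(c.begin()) { return c.begin(); }
template <class C>
auto buffer_sequence_end(const C& c) -> decltype(c.end()) { return c.end(); }

struct send_stats {
    std::size_t buffers = 0;
    std::size_t bytes = 0;
};

// One template serves both cases; the constraint is omitted for brevity.
template <class Buffers>
send_stats send(const Buffers& buffers)
{
    send_stats s;
    for (auto it = buffer_sequence_begin(buffers); it != buffer_sequence_end(buffers); ++it) {
        ++s.buffers;
        s.bytes += const_buffer(*it).size();
    }
    return s;
}

} // namespace sketch
```

Calling `send` with a single buffer reports one buffer; calling it with a `std::array` of three reports three. The loop inside `send` is identical in both cases because the single-buffer overloads return a pointer range of length one.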
## The Formal Rules

A type `X` satisfies `ConstBufferSequence` if, for a (possibly const) object `x` of type `X`:

- `X` is `Destructible` and `CopyConstructible`
- `buffer_sequence_begin(x)` and `buffer_sequence_end(x)` return bidirectional iterators whose value type is convertible to `const_buffer`
- After copy construction `X u(x)`, the sequence of buffers in `u` is identical to the sequence in `x` - each corresponding buffer has the same `data()` pointer and the same `size()`

A type `X` satisfies `MutableBufferSequence` if the same rules hold with `mutable_buffer` in place of `const_buffer`.

Every `MutableBufferSequence` is automatically a `ConstBufferSequence`, because `mutable_buffer` converts to `const_buffer`. A function that accepts `ConstBufferSequence` will accept mutable buffer sequences without any additional work.
### The Copy Postcondition

The third rule deserves emphasis. After copying a buffer sequence, the copy must describe the exact same memory regions as the original. Same pointers. Same sizes. The copy is shallow - it duplicates the descriptors, not the bytes they point at.

This means a buffer sequence cannot own the memory it describes. If a type held an internal `std::string` and yielded a `const_buffer` pointing at that string's data, copying the type would copy the string to a new address. The copy's `data()` pointers would differ from the original's, violating the postcondition. Buffer sequences must reference externally-owned memory.
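The postcondition can be tested directly by copying a sequence and comparing descriptors. In this hypothetical sketch, `borrowed_body` references memory owned elsewhere, while `owning_body` is the counterexample from the paragraph above: it holds a `std::string` and describes that string's bytes. Its copy constructor re-points the buffer at the copy's own string, which a self-owning type must do to avoid referring into another object, and that is exactly what breaks the rule.

```cpp
#include <cstddef>
#include <string>
#include <utility>

namespace sketch {

struct const_buffer {
    const void* p = nullptr;
    std::size_t n = 0;
};

// Valid: describes memory owned by someone else.
struct borrowed_body {
    const_buffer buf;
    const const_buffer* begin() const { return &buf; }
    const const_buffer* end() const { return &buf + 1; }
};

// Counterexample: owns its bytes, so each copy describes a different address.
struct owning_body {
    std::string text;
    const_buffer buf;

    explicit owning_body(std::string s)
        : text(std::move(s)), buf{text.data(), text.size()} {}

    // A copy must point at its own string to avoid referencing another object.
    owning_body(const owning_body& other)
        : text(other.text), buf{text.data(), text.size()} {}

    owning_body& operator=(const owning_body&) = delete;

    const const_buffer* begin() const { return &buf; }
    const const_buffer* end() const { return &buf + 1; }
};

// True if both sequences describe exactly the same regions (same pointers, same sizes).
template <class Seq>
bool same_buffers(const Seq& a, const Seq& b)
{
    auto ia = a.begin();
    auto ib = b.begin();
    for (; ia != a.end() && ib != b.end(); ++ia, ++ib)
        if (ia->p != ib->p || ia->n != ib->n)
            return false;
    return ia == a.end() && ib == b.end();
}

// Checks rule 3 for one object: copy it, then compare descriptors.
template <class Seq>
bool satisfies_copy_postcondition(const Seq& x)
{
    Seq u(x);
    return same_buffers(x, u);
}

} // namespace sketch
```

`satisfies_copy_postcondition` returns `true` for a `borrowed_body` and `false` for an `owning_body`, because copying a `std::string` always gives the copy its own storage.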
## What Already Satisfies the Requirements

Any standard container of buffers with bidirectional iterators works:

```cpp
std::array<const_buffer, 4> fixed_bufs;    // fixed-size, stack-allocated
std::vector<mutable_buffer> dynamic_bufs;  // dynamic
std::list<const_buffer> linked_bufs;       // linked, bidirectional
```

These types are `CopyConstructible`, their `begin()`/`end()` return bidirectional iterators, and their value types convert to the appropriate buffer type. Asio's default overloads of `buffer_sequence_begin`/`buffer_sequence_end` delegate to the container's own iterators.

A single `const_buffer` or `mutable_buffer` also satisfies the requirements, as described above.

A `std::forward_list<const_buffer>` does not qualify - its iterators are forward-only, not bidirectional.
## Writing Your Own Buffer Sequence

There are two ways to make a user-defined type satisfy the buffer sequence requirements.

### Provide begin() and end() Members

If your type behaves like a container - it has `begin()` and `end()` member functions returning bidirectional iterators over buffers - then Asio's default `buffer_sequence_begin`/`buffer_sequence_end` overloads will find them automatically:

```cpp
class header_buffers
{
    const_buffer bufs_[3];

public:
    header_buffers(
        const_buffer status_line,
        const_buffer headers,
        const_buffer separator)
        : bufs_{status_line, headers, separator}
    {
    }

    const const_buffer* begin() const { return bufs_; }
    const const_buffer* end() const { return bufs_ + 3; }
};
```

This type is `CopyConstructible` (the default copy copies the array of descriptors, preserving `data()` pointers and sizes). Its `begin()`/`end()` return pointers, which are random-access iterators (and therefore bidirectional). It satisfies `ConstBufferSequence`.
### Provide ADL Overloads

For types where `begin()`/`end()` members are not appropriate, provide free function overloads of `buffer_sequence_begin` and `buffer_sequence_end` in the same namespace as the type:

```cpp
namespace app {

class composite_buffers
{
    const_buffer bufs_[2];

public:
    composite_buffers(const_buffer head, const_buffer body)
        : bufs_{head, body}
    {
    }

    friend const const_buffer*
    buffer_sequence_begin(const composite_buffers& b)
    {
        return b.bufs_;
    }

    friend const const_buffer*
    buffer_sequence_end(const composite_buffers& b)
    {
        return b.bufs_ + 2;
    }
};

} // namespace app
```

ADL finds the friend functions when Asio calls `buffer_sequence_begin(x)` with an `app::composite_buffers` argument. Because they are defined inside the class, these friends are invisible to ordinary lookup; ADL is the only way to reach them.
### A More Interesting Example

The real power of user-defined buffer sequences is lazy composition. Consider a type that concatenates two buffer sequences without allocating:

```cpp
template<class BS1, class BS2>
class buffers_cat
{
    BS1 bs1_;
    BS2 bs2_;

public:
    class const_iterator
    {
        // Bidirectional iterator that walks bs1_ first, then bs2_.
        // When it reaches the end of bs1_, it transitions to
        // the beginning of bs2_. Decrementing from the beginning
        // of bs2_ moves back to the last buffer of bs1_.
        // ...
    };

    buffers_cat(BS1 bs1, BS2 bs2)
        : bs1_(std::move(bs1))
        , bs2_(std::move(bs2))
    {
    }

    const_iterator begin() const;
    const_iterator end() const;
};
```

Iterating this type yields all buffers from the first sequence followed by all buffers from the second. No allocation occurs - the composed sequence is a view over the two sub-sequences. The resulting type satisfies `ConstBufferSequence` (assuming both sub-sequences do), and it can be passed directly to `async_write`.

This is the kind of composition that a concrete type like `span<span<byte>>` cannot provide without allocation: the outer `span` must point at one contiguous array of inner spans, so joining two such sequences means allocating a new array and copying the inner spans into it.
## Ownership and Lifetime

Buffer sequences have a two-layer ownership model. The buffer sequence object (the descriptor) and the underlying memory (the bytes it points at) follow separate rules.

### The Implementation Copies the Sequence

When an asynchronous read or write operation is initiated, the implementation stores a copy of the buffer sequence inside its composed operation state. The Asio specification states:

> If a read or write operation is also an asynchronous operation, the operation shall maintain one or more copies of the buffer sequence until such time as the operation no longer requires access to the memory specified by the buffers in the sequence.

This is why `CopyConstructible` is a requirement. It is not an abstract nicety - the implementation literally copies the buffer sequence object into its internal state so it can re-use it across the multiple `async_read_some` or `async_write_some` calls that compose the full operation.
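A synchronous analogue of a composed write shows why the operation keeps its own copy and how it tracks progress. In this hypothetical sketch, `limited_sink::write_some` accepts at most four bytes per call (standing in for a socket that performs short writes), and `write_all_op` stores a copy of the buffer sequence plus a count of bytes already transferred. For simplicity it passes one buffer per call, where a real gather write would pass all remaining buffers. In an asynchronous version, `run` would be resumed from completion handlers after the initiating function had returned, so the caller's sequence object could no longer be relied upon; the stored copy is what remains. None of these names are Asio APIs.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

struct const_buffer {
    const void* p = nullptr;
    std::size_t n = 0;
};

// Hypothetical stream that accepts at most `limit` bytes per call.
struct limited_sink {
    std::string received;
    std::size_t limit = 4;
    std::size_t calls = 0;

    std::size_t write_some(const const_buffer& b)
    {
        ++calls;
        std::size_t k = std::min(b.n, limit);
        received.append(static_cast<const char*>(b.p), k);
        return k;
    }
};

// The operation owns a copy of the sequence (descriptors only, not bytes)
// and a running count of how many bytes have been transferred so far.
template <class Buffers>
class write_all_op {
    Buffers bufs_;
    std::size_t consumed_ = 0;

public:
    explicit write_all_op(const Buffers& b) : bufs_(b) {}

    std::size_t run(limited_sink& s)
    {
        for (;;) {
            // Locate the first buffer that still has unwritten bytes.
            std::size_t skip = consumed_;
            auto it = bufs_.begin();
            while (it != bufs_.end() && skip >= it->n) {
                skip -= it->n;
                ++it;
            }
            if (it == bufs_.end())
                return consumed_;  // every byte has been written
            // Adjust that buffer's start past the bytes already sent.
            const_buffer rest{static_cast<const char*>(it->p) + skip, it->n - skip};
            consumed_ += s.write_some(rest);
        }
    }
};

template <class Buffers>
std::size_t write_all(limited_sink& s, const Buffers& b)
{
    write_all_op<Buffers> op(b);  // the operation keeps its own copy
    return op.run(s);
}

} // namespace sketch
```

Writing a 16-byte request line followed by a 2-byte terminator through this sink takes five `write_some` calls: four for the first buffer and one for the second.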
### The Caller Owns the Memory

The implementation copies the buffer sequence object, but it never copies the underlying bytes. The Asio documentation for `async_read` and `async_write` states:

> Although the buffers object may be copied as necessary, ownership of the underlying memory blocks is retained by the caller, which must guarantee that they remain valid until the completion handler is called.

More precisely, the memory must remain valid until:

- the last copy of the buffer sequence is destroyed, or
- the completion handler is invoked,

whichever comes first.
### What This Means in Practice

The buffer sequence is a view. It describes memory it does not own. The implementation copies the view. The caller owns the memory the view points at.

A common mistake: passing a buffer that references a local variable to an asynchronous operation, then returning from the function before the operation completes. The local variable is destroyed, the buffer's `data()` pointer dangles, and the operation reads from or writes into memory that no longer belongs to it, which is undefined behavior.

```cpp
void bad_example(tcp::socket& sock)
{
    char buf[1024];
    // buf is on the stack - it will be destroyed when
    // this function returns, but the async operation
    // has not completed yet
    async_read(sock, mutable_buffer(buf, sizeof(buf)),
        [](error_code ec, std::size_t n) { /* ... */ });
}
```

The buffer sequence (a single `mutable_buffer`) is copied into the async operation's state - that copy is fine. But the memory at `buf` ceases to exist when `bad_example` returns. The operation proceeds to write into a destroyed stack frame.

The fix is to ensure the memory outlives the operation - allocate on the heap, use a member variable of an object that itself outlives the operation, or tie the buffer's lifetime to the completion handler via a shared pointer or similar mechanism.
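The shared-pointer fix can be sketched without a real socket. In this hypothetical sketch, `fake_io` stands in for an I/O context: its `async_read` stores a copy of the buffer descriptor and returns immediately, and `run` later writes into the caller's memory and invokes the handler. The fixed initiating function allocates its storage on the heap and captures the `std::shared_ptr` in the handler, so the memory stays valid until the handler has run even though the initiating function has already returned.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstring>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

struct mutable_buffer {
    void* p = nullptr;
    std::size_t n = 0;
};

// Hypothetical stand-in for an I/O context: reads complete later, in run().
struct fake_io {
    struct pending {
        mutable_buffer buf;                        // copy of the descriptor only
        std::function<void(std::size_t)> handler;
    };
    std::vector<pending> ops;
    std::string incoming = "payload";              // bytes the peer will "send"

    // Initiation: remember the descriptor and handler, then return at once.
    void async_read(mutable_buffer b, std::function<void(std::size_t)> h)
    {
        ops.push_back({b, std::move(h)});
    }

    // Completion: write into the caller's memory, then invoke the handler.
    void run()
    {
        for (auto& op : ops) {
            std::size_t k = std::min(op.buf.n, incoming.size());
            std::memcpy(op.buf.p, incoming.data(), k);
            op.handler(k);
        }
        ops.clear();  // destroying the handlers releases what they captured
    }
};

// Fixed version: the storage is owned by a shared_ptr captured in the
// handler, so it stays alive until the handler has run.
inline void good_example(fake_io& io, std::string& out)
{
    auto storage = std::make_shared<std::array<char, 1024>>();
    io.async_read(mutable_buffer{storage->data(), storage->size()},
                  [storage, &out](std::size_t n) { out.assign(storage->data(), n); });
}  // the local shared_ptr goes away here; the handler's copy keeps the bytes alive

} // namespace sketch
```

Compared with `bad_example`, only the home of the bytes changes: the descriptor passed to the read is still a plain non-owning buffer, but the memory it points at is now owned by the completion handler.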
