Skip to content

Commit 29a8eb2

Browse files
etrclaude
andcommitted
Merge TASK-072: arena-allocated unescape on the warm path
Routes GET-argument unescape into the per-connection arena so the warm-path zero-allocation criterion holds even when a user unescaper is registered. Adds shared unescape_helpers.hpp, an arena-backed sink in build_request_args, a per-thread scratch buffer for the ABI-locked callback path, unit tests pinning zero global operator new calls, and bench_warm_path variants (5)/(6). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2 parents dc24b7c + 142a74f commit 29a8eb2

11 files changed

Lines changed: 837 additions & 78 deletions

File tree

specs/architecture/04-components/http-request.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@
1919

2020
**Key design notes:**
2121
- The arena allocator is plumbed through `webserver_impl` → connection state → `http_request` constructor. The user does not see it; it is an internal optimization.
22+
- GET-argument population (`http_request_impl::build_request_args`) writes the unescaped value directly into the per-connection arena (TASK-072). The standard `%HH` / `+`→space transformation runs in-place on the arena-backed `pmr::string`; when a user-registered unescaper is set, a per-thread reusable `std::string` scratch buffer adapts the ABI-locked `void(std::string&)` callback signature without a per-request global-heap allocation. Pinned by `test/unit/http_request_unescape_arena_test.cpp` (zero global `operator new` calls during the warm cycle) and exercised by `bench_warm_path` variants (5) and (6).
2223
- Containers returned by `get_*()` reference impl-owned storage; the request must outlive any view derived from it. Documented as a lifetime contract.
2324
- `gnutls_session_t` (raw GnuTLS handle) is not exposed publicly. Users wanting custom TLS introspection use the high-level `get_client_cert_*` accessors. The handle remains accessible via friend access from internal code.
2425

specs/tasks/M7-v2-cleanup/TASK-072.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -8,10 +8,10 @@
88
`src/detail/http_request_impl.cpp:202-206` is an acknowledged TASK-018 deferral: the warm-path "0-alloc" criterion explicitly does not apply when a user unescaper is registered, because the unescape call still allocates a `std::string`. Route the unescape output into the per-connection arena (TASK-016) so the criterion holds unconditionally.
99

1010
**Action Items:**
11-
- [ ] Extend the per-connection arena (from TASK-016) with an `unescape_into(string_view in, void* sink) -> string_view` helper, or have the arena expose a scratch-buffer API the unescape can write into.
12-
- [ ] Refactor the unescape call site at `http_request_impl.cpp:202-206` to allocate into the arena. The returned `string_view` is valid for the lifetime of the request, matching the TASK-018 lifetime documentation.
13-
- [ ] Update the inline TASK-018 deferral comment to reflect that arena unescape is now wired.
14-
- [ ] Extend `test/bench_warm_path.cpp` (TASK-058) with an unescape variant: warm GET with `%2F` in the path. Pin "no allocations under heap profiler" as in TASK-058.
11+
- [x] Extend the per-connection arena (from TASK-016) with an `unescape_into(string_view in, void* sink) -> string_view` helper, or have the arena expose a scratch-buffer API the unescape can write into.
12+
- [x] Refactor the unescape call site at `http_request_impl.cpp:202-206` to allocate into the arena. The returned `string_view` is valid for the lifetime of the request, matching the TASK-018 lifetime documentation.
13+
- [x] Update the inline TASK-018 deferral comment to reflect that arena unescape is now wired.
14+
- [x] Extend `test/bench_warm_path.cpp` (TASK-058) with an unescape variant: warm GET with `%2F` in the path. Pin "no allocations under heap profiler" as in TASK-058.
1515

1616
**Dependencies:**
1717
- Blocked by: TASK-016 (arena, Done), TASK-018 (string_view getters, Done), TASK-058 (warm-path bench infra, Done)
@@ -27,4 +27,4 @@
2727
**Related Requirements:** PRD-REQ-REQ-001 (const-correctness + zero-alloc)
2828
**Related Decisions:** DR-003b (arena)
2929

30-
**Status:** Backlog
30+
**Status:** Done

specs/tasks/M7-v2-cleanup/_index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,7 @@ TASK-093).
3636
| TASK-069 | Remove transitional two-arg `http_request_impl` constructor | MED | S | Done |
3737
| TASK-070 | Migrate `hook_table_` to `std::atomic<std::shared_ptr<T>>` | MED | M | Backlog |
3838
| TASK-071 | Wire `install_not_found_alias_` stub and remove dead `lambda_handler` arm | MED | S | Done |
39-
| TASK-072 | Arena-allocated unescape on the warm path | MED | M | Backlog |
39+
| TASK-072 | Arena-allocated unescape on the warm path | MED | M | Done |
4040
| TASK-073 | Revisit libmicrohttpd v0.99 unescape workaround | LOW | S | Backlog |
4141
| TASK-074 | Gate DEBUG raw-body printing behind explicit opt-in | LOW (sec) | S | Backlog |
4242
| TASK-075 | Propagate `wait_for_server_ready` to sibling hooks integ tests | HIGH | M | Backlog |

specs/unworked_review_issues/2026-06-08_112046_task-072.md

Lines changed: 152 additions & 0 deletions
Large diffs are not rendered by default.

src/Makefile.am

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,7 @@ libhttpserver_la_SOURCES = string_utilities.cpp webserver.cpp http_utils.cpp fil
2929
# noinst_HEADERS: shipped in the tarball but NEVER installed under $prefix/include.
3030
# Detail headers (httpserver/detail/*.hpp) live here so they cannot leak to
3131
# downstream consumers — the public surface comes in through <httpserver.hpp>.
32-
noinst_HEADERS = httpserver/string_utilities.hpp httpserver/detail/modded_request.hpp httpserver/detail/http_endpoint.hpp httpserver/detail/body.hpp httpserver/detail/webserver_impl.hpp httpserver/detail/webserver_impl_dispatch.hpp httpserver/detail/connection_state.hpp httpserver/detail/secure_zero.hpp httpserver/detail/http_request_impl.hpp httpserver/detail/resource_hook_table.hpp httpserver/detail/route_entry.hpp httpserver/detail/lambda_resource.hpp httpserver/detail/radix_tree.hpp httpserver/detail/route_cache.hpp httpserver/detail/route_tier.hpp gettext.h
32+
noinst_HEADERS = httpserver/string_utilities.hpp httpserver/detail/modded_request.hpp httpserver/detail/http_endpoint.hpp httpserver/detail/body.hpp httpserver/detail/webserver_impl.hpp httpserver/detail/webserver_impl_dispatch.hpp httpserver/detail/connection_state.hpp httpserver/detail/secure_zero.hpp httpserver/detail/http_request_impl.hpp httpserver/detail/resource_hook_table.hpp httpserver/detail/route_entry.hpp httpserver/detail/lambda_resource.hpp httpserver/detail/radix_tree.hpp httpserver/detail/route_cache.hpp httpserver/detail/route_tier.hpp httpserver/detail/unescape_helpers.hpp gettext.h
3333
nobase_include_HEADERS = httpserver.hpp httpserver/body_kind.hpp httpserver/cookie.hpp httpserver/constants.hpp httpserver/create_webserver.hpp httpserver/create_test_request.hpp httpserver/webserver.hpp httpserver/webserver_routes.hpp httpserver/webserver_websocket.hpp httpserver/webserver_hooks.hpp httpserver/websocket_handler.hpp httpserver/http_utils.hpp httpserver/ip_representation.hpp httpserver/file_info.hpp httpserver/http_request.hpp httpserver/http_request_auth.hpp httpserver/http_response.hpp httpserver/http_resource.hpp httpserver/feature_unavailable.hpp httpserver/iovec_entry.hpp httpserver/http_arg_value.hpp httpserver/http_method.hpp httpserver/hook_phase.hpp httpserver/hook_action.hpp httpserver/hook_handle.hpp httpserver/hook_context.hpp
3434

3535
AM_CXXFLAGS += -fPIC -Wall

src/detail/http_request_impl.cpp

Lines changed: 64 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,7 @@
3939

4040
#include "httpserver/detail/connection_state.hpp"
4141
#include "httpserver/detail/http_request_impl.hpp"
42+
#include "httpserver/detail/unescape_helpers.hpp"
4243
#include "httpserver/http_utils.hpp"
4344

4445
namespace httpserver {
@@ -155,6 +156,59 @@ const http::header_view_map& http_request_impl::ensure_headerlike_cache(MHD_Valu
155156
return *cache;
156157
}
157158

159+
namespace {
160+
161+
// TASK-072: arena-routed unescape. The caller passes an arena-backed
162+
// pmr::string already holding the raw bytes; we run the unescape
163+
// transformation directly on the pmr::string's storage so no
164+
// global-heap allocation occurs on the warm path.
165+
//
166+
// Default path: calls httpserver::detail::unescape_buf_raw (shared
167+
// helper from unescape_helpers.hpp -- same algorithm as the
168+
// http_unescape wrapper in http_utils.cpp, one truth-source).
169+
//
170+
// User-callback path: the public callback signature is
171+
// `void(std::string&)` (ABI-locked), so we route through a
172+
// thread_local std::string (thread_value) whose capacity amortises
173+
// across all requests on the same worker thread.
174+
//
175+
// Amortisation contract: zero global-heap allocations on the warm
176+
// path per thread, *after* thread_value has grown to the peak value
177+
// length seen on that thread. The first call on a new thread, or the
178+
// first call that sees a value longer than any previous one, triggers
179+
// exactly one global-heap allocation to grow thread_value's buffer;
180+
// all subsequent calls at or below that length are allocation-free.
181+
// (performance-reviewer-iter1-2)
182+
//
183+
// Arena-waste note: when the user callback grows the value
184+
// (thread_value.size() > original value.size()), value's original
185+
// arena allocation is abandoned in the monotonic arena until
186+
// reset_arena(). This is an accepted trade-off of the ABI-locked
187+
// void(std::string&) signature; the arena is reset per request so
188+
// the waste is bounded by the request's total value bytes.
189+
// (performance-reviewer-iter1-3)
190+
inline void unescape_in_arena(std::pmr::string& value,
191+
unescaper_ptr user_fn) {
192+
if (value.empty()) return;
193+
if (user_fn == nullptr) {
194+
// Default %HH / '+' decode: run in-place on the arena
195+
// buffer, then truncate to the new size.
196+
const std::size_t new_size = httpserver::detail::unescape_buf_raw(
197+
value.data(), value.size());
198+
value.resize(new_size);
199+
return;
200+
}
201+
// User-callback path: route through a per-thread reusable
202+
// std::string scratch buffer so the warm-path cost is zero
203+
// global-heap allocations after the first call on this thread.
204+
thread_local std::string thread_value;
205+
thread_value.assign(value.data(), value.size());
206+
user_fn(thread_value);
207+
value.assign(thread_value.data(), thread_value.size());
208+
}
209+
210+
} // namespace
211+
158212
MHD_Result http_request_impl::build_request_args(void* cls, MHD_ValueKind kind,
159213
const char* key, const char* arg_value) {
160214
// Parameters needed to respect MHD interface, but not used in the implementation.
@@ -190,28 +244,10 @@ MHD_Result http_request_impl::build_request_args(void* cls, MHD_ValueKind kind,
190244
}
191245
aa->accumulated_bytes += this_pair_bytes;
192246

193-
// Unescape into a temporary std::string (the C-style unescaper is
194-
// string-typed). This causes one global-heap allocation (the temp) even
195-
// on the warm arena path, followed by a second copy from the temp into
196-
// the arena-backed pmr::string via emplace_back below. The warm-path
197-
// zero-upstream-allocs guarantee therefore does NOT hold for requests
198-
// with GET arguments when an unescaper is active.
199-
//
200-
// Tracked as TASK-018: route the unescape output directly into a
201-
// pmr::string using the arena allocator, eliminating both the temp and
202-
// the second copy. Until then, the acceptance criterion '0 bytes
203-
// global-heap allocation from impl construction on warm connection'
204-
// applies to construction and arena-backed containers only -- not to
205-
// argument population when an unescaper is registered.
206-
// (performance-reviewer-iter1-1: acknowledged deferral)
207-
std::string value(val_sv);
208-
http::base_unescaper(&value, aa->unescaper);
209-
210-
// Reuse the iterator from the hoisted find above: insert if missing,
211-
// then append the (possibly unescaped) value. The key is stored as
212-
// pmr::string in the map's allocator domain; the value vector is
213-
// allocator-constructed in place via the same allocator (scoped
214-
// propagation gives nested pmr::strings the right allocator too).
247+
// Arena-routed unescape: see unescape_in_arena() above for path details.
248+
// The returned pmr::string's data is owned by the arena and lives until
249+
// connection_state::reset_arena() runs (request completion), matching
250+
// the TASK-018 string_view lifetime contract.
215251
auto pmr_alloc = args.get_allocator();
216252
auto it = existing_it;
217253
if (it == args.end()) {
@@ -221,11 +257,12 @@ MHD_Result http_request_impl::build_request_args(void* cls, MHD_ValueKind kind,
221257
std::move(empty));
222258
it = inserted.first;
223259
}
224-
// emplace_back into a pmr::vector<pmr::string>: use (ptr, size); the
225-
// outer vector's allocator-propagating construct wires the inner
226-
// pmr::string's allocator automatically. Passing the allocator
227-
// ourselves leads to double-injection via uses-allocator construction.
228-
it->second.emplace_back(value.data(), value.size());
260+
261+
// Allocate the destination pmr::string in the arena domain and
262+
// copy the raw input bytes into it. The unescape transformation
263+
// runs in-place on the arena-backed buffer.
264+
auto& arg_value_ref = it->second.emplace_back(val_sv.data(), val_sv.size());
265+
unescape_in_arena(arg_value_ref, aa->unescaper);
229266
return MHD_YES;
230267
}
231268

src/http_utils.cpp

Lines changed: 11 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -57,6 +57,7 @@
5757
#include <vector>
5858

5959
#include "httpserver/constants.hpp"
60+
#include "httpserver/detail/unescape_helpers.hpp"
6061
#include "httpserver/string_utilities.hpp"
6162

6263
#if defined (__CYGWIN__)
@@ -302,49 +303,18 @@ std::string http_utils::sanitize_upload_filename(const std::string& filename) {
302303
// src/detail/ip_representation.cpp to keep this TU under the project
303304
// per-file LOC ceiling. See FILE_LOC_MAX in scripts/check-file-size.sh.
304305

305-
static inline int hex_digit_value(char c) {
306-
if (c >= '0' && c <= '9') return c - '0';
307-
if (c >= 'a' && c <= 'f') return c - 'a' + 10;
308-
if (c >= 'A' && c <= 'F') return c - 'A' + 10;
309-
return -1;
310-
}
311-
306+
// TASK-072 (code-simplifier-iter1-1/2): hex_digit_value and the core
307+
// unescape loop have been promoted to the shared internal header
308+
// src/httpserver/detail/unescape_helpers.hpp so that this TU and
309+
// src/detail/http_request_impl.cpp share one truth-source.
310+
// http_unescape is now a thin wrapper around unescape_buf_raw.
312311
size_t http_unescape(std::string* val) {
313312
if (val->empty()) return 0;
314-
315-
unsigned int rpos = 0;
316-
unsigned int wpos = 0;
317-
318-
unsigned int size = val->size();
319-
320-
while (rpos < size && (*val)[rpos] != '\0') {
321-
switch ((*val)[rpos]) {
322-
case '+':
323-
(*val)[wpos] = ' ';
324-
wpos++;
325-
rpos++;
326-
break;
327-
case '%':
328-
if (size > rpos + 2) {
329-
int hi = hex_digit_value((*val)[rpos + 1]);
330-
int lo = hex_digit_value((*val)[rpos + 2]);
331-
if (hi >= 0 && lo >= 0) {
332-
(*val)[wpos] = static_cast<unsigned char>((hi << 4) | lo);
333-
wpos++;
334-
rpos += 3;
335-
break;
336-
}
337-
}
338-
// intentional fall through!
339-
default:
340-
(*val)[wpos] = (*val)[rpos];
341-
wpos++;
342-
rpos++;
343-
}
344-
}
345-
(*val)[wpos] = '\0'; // add 0-terminator
346-
val->resize(wpos);
347-
return wpos; // = strlen(val)
313+
const std::size_t new_size =
314+
httpserver::detail::unescape_buf_raw(val->data(), val->size());
315+
(*val)[new_size] = '\0'; // add 0-terminator (std::string owns the byte)
316+
val->resize(new_size);
317+
return new_size; // = strlen(val)
348318
}
349319

350320
const std::string load_file(const std::string& filename) {
Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
/*
2+
This file is part of libhttpserver
3+
Copyright (C) 2011-2026 Sebastiano Merlino
4+
5+
This library is free software; you can redistribute it and/or
6+
modify it under the terms of the GNU Lesser General Public
7+
License as published by the Free Software Foundation; either
8+
version 2.1 of the License, or (at your option) any later version.
9+
10+
This library is distributed in the hope that it will be useful,
11+
but WITHOUT ANY WARRANTY; without even the implied warranty of
12+
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
13+
Lesser General Public License for more details.
14+
15+
You should have received a copy of the GNU Lesser General Public
16+
License along with this library; if not, write to the Free Software
17+
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
18+
USA
19+
*/
20+
21+
// Internal-only helpers for URL percent-decoding.
22+
//
23+
// Shared between src/http_utils.cpp (http_unescape) and
24+
// src/detail/http_request_impl.cpp (arena-routed unescape).
25+
// Not part of the public API; not installed.
26+
//
27+
// TASK-072: extracted from the two translation units that previously
28+
// each maintained a private copy. Having one source of truth ensures
29+
// that bug-fixes and RFC-3986 edge-case corrections propagate uniformly.
30+
// (code-simplifier-iter1-1, code-simplifier-iter1-2)
31+
32+
#pragma once
33+
34+
#include <cstddef>
35+
36+
namespace httpserver {
37+
namespace detail {
38+
39+
// Map a single hex character to its integer value [0, 15].
40+
// Returns -1 for any character that is not a valid hex digit.
41+
//
42+
// constexpr so callers in constant-expression contexts can fold the
43+
// table at compile time; trivially inlined at -O1 and above.
44+
[[nodiscard]] constexpr int hex_digit_value(char c) noexcept {
45+
if (c >= '0' && c <= '9') return c - '0';
46+
if (c >= 'a' && c <= 'f') return c - 'a' + 10;
47+
if (c >= 'A' && c <= 'F') return c - 'A' + 10;
48+
return -1;
49+
}
50+
51+
// Percent-decode and '+'-to-space unescape a raw byte buffer in-place.
52+
//
53+
// Reads from [data, data+size), writes the decoded bytes back into the
54+
// same buffer starting at data[0], and returns the new (decoded) length.
55+
// The output length is always <= the input length because %HH sequences
56+
// (3 bytes) decode to 1 byte and '+' stays 1 byte.
57+
//
58+
// The loop stops at a '\0' byte or when rpos reaches size, whichever
59+
// comes first, so embedded null bytes in the input are treated as
60+
// terminators (matching the behaviour of the original http_unescape).
61+
//
62+
// The caller is responsible for adding any terminating '\0' after the
63+
// returned length if the buffer must be C-string-compatible.
64+
inline std::size_t unescape_buf_raw(char* data, std::size_t size) noexcept {
65+
if (size == 0) return 0;
66+
std::size_t rpos = 0;
67+
std::size_t wpos = 0;
68+
while (rpos < size && data[rpos] != '\0') {
69+
switch (data[rpos]) {
70+
case '+':
71+
data[wpos++] = ' ';
72+
++rpos;
73+
break;
74+
case '%':
75+
if (size > rpos + 2) {
76+
const int hi = hex_digit_value(data[rpos + 1]);
77+
const int lo = hex_digit_value(data[rpos + 2]);
78+
if (hi >= 0 && lo >= 0) {
79+
data[wpos++] =
80+
static_cast<char>((hi << 4) | lo);
81+
rpos += 3;
82+
break;
83+
}
84+
}
85+
// intentional fall through!
86+
default:
87+
data[wpos++] = data[rpos++];
88+
}
89+
}
90+
return wpos;
91+
}
92+
93+
} // namespace detail
94+
} // namespace httpserver

0 commit comments

Comments
 (0)