Replace snprintf with a branch-free writer for \uXXXX escapes #5235

Open
nlohmann wants to merge 2 commits into develop from claude/recursing-goodall-e5e9eb

nlohmann (Owner) commented Jul 4, 2026

Summary

  • serializer::dump_escaped called std::snprintf(..., "\\u%04x", ...) once per escaped code point, in both the single BMP code point case and the surrogate-pair case. snprintf re-parses the format string and drags in locale/printf machinery on every call, which is far heavier than the fixed 6- or 12-byte output requires. This path is hot for any string with control characters, and for all non-ASCII text when ensure_ascii is set (e.g. serializing CJK- or emoji-heavy data).
  • Replaced it with write_u_escape, a small branch-free helper that writes \uXXXX directly into string_buffer via a nibble_to_hex lookup table. It operates on a buffer position so that it can assert sufficient headroom internally. This mirrors the already hand-rolled dump_integer fast path in the same file.
  • Added a JSON_ASSERT inside write_u_escape to guard the precondition that at least 6 bytes of headroom remain; the surrounding code already tracks and guarantees this.
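
For readers who want a concrete picture of the approach described above, here is a minimal standalone sketch. It is not the code from this PR: the signature of write_u_escape, the std::array buffer type, and the escape_code_point wrapper are assumptions made for illustration, and plain assert stands in for JSON_ASSERT.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical lookup helper: maps the low 4 bits of its argument to a
// lowercase hex digit, matching the output of "%04x".
inline char nibble_to_hex(std::uint32_t nibble)
{
    static const char* digits = "0123456789abcdef";
    return digits[nibble & 0xFu];
}

// Hypothetical signature: writes the six bytes of \uXXXX starting at
// buffer[pos] and returns the position just past them.
template<std::size_t N>
std::size_t write_u_escape(std::array<char, N>& buffer, std::size_t pos, std::uint16_t code_unit)
{
    assert(pos + 6 <= N);  // headroom precondition (JSON_ASSERT in the library)
    const std::uint32_t u = code_unit;
    buffer[pos + 0] = '\\';
    buffer[pos + 1] = 'u';
    buffer[pos + 2] = nibble_to_hex(u >> 12u);
    buffer[pos + 3] = nibble_to_hex(u >> 8u);
    buffer[pos + 4] = nibble_to_hex(u >> 4u);
    buffer[pos + 5] = nibble_to_hex(u);
    return pos + 6;
}

// Illustration-only wrapper: one escape for a BMP code point, a UTF-16
// surrogate pair (two escapes, 12 bytes) for an astral code point.
inline std::string escape_code_point(std::uint32_t cp)
{
    assert(cp <= 0x10FFFFu);
    std::array<char, 12> buffer{};
    std::size_t pos = 0;
    if (cp <= 0xFFFFu)
    {
        pos = write_u_escape(buffer, pos, static_cast<std::uint16_t>(cp));
    }
    else
    {
        // 0xD7C0 + (cp >> 10) equals 0xD800 + ((cp - 0x10000) >> 10)
        pos = write_u_escape(buffer, pos, static_cast<std::uint16_t>(0xD7C0u + (cp >> 10u)));
        pos = write_u_escape(buffer, pos, static_cast<std::uint16_t>(0xDC00u + (cp & 0x3FFu)));
    }
    return std::string(buffer.data(), pos);
}
```

Under these assumptions, escape_code_point(0x4F60) yields the six characters \u4f60, and escape_code_point(0x1F600) yields the twelve characters \ud83d\ude00.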

Test plan

  • Added a new "string escape with ensure_ascii" section in tests/src/unit-convenience.cpp covering: BMP escapes across all nibble positions (U+0080, U+00FF, U+07FF, U+4F60, U+ABCD, U+FFFD), the surrogate-pair path for astral code points (U+10000, U+1F600, U+10FFFF), and verbatim pass-through when ensure_ascii is disabled.
  • Verified output is byte-identical to the old snprintf-based implementation across all escapable BMP code points plus astral cases.
  • Re-amalgamated single_include/nlohmann/json.hpp with the pinned astyle version.
  • Ran the updated unit test standalone against both include/ and single_include/; all assertions pass.
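
The byte-identity check mentioned in the test plan could look roughly like the sketch below. This is an assumed reconstruction, not the script that was actually run: escape_with_snprintf, escape_with_table, and count_mismatches are hypothetical names, and the loop covers every 16-bit code unit (surrogates included) rather than only the code points dump_escaped would escape.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Reference formatting: the snprintf call style that the PR replaces.
inline void escape_with_snprintf(char (&out)[7], std::uint16_t code_unit)
{
    std::snprintf(out, sizeof(out), "\\u%04x", static_cast<unsigned>(code_unit));
}

// Table-based formatting: four lookups, no format-string parsing.
inline void escape_with_table(char (&out)[7], std::uint16_t code_unit)
{
    static const char* digits = "0123456789abcdef";
    out[0] = '\\';
    out[1] = 'u';
    out[2] = digits[(code_unit >> 12) & 0xF];
    out[3] = digits[(code_unit >> 8) & 0xF];
    out[4] = digits[(code_unit >> 4) & 0xF];
    out[5] = digits[code_unit & 0xF];
    out[6] = '\0';  // terminator, so both buffers compare over all 7 bytes
}

// Counts the 16-bit values for which the two writers disagree.
inline std::size_t count_mismatches()
{
    std::size_t mismatches = 0;
    for (std::uint32_t value = 0; value <= 0xFFFFu; ++value)
    {
        char expected[7];
        char actual[7];
        escape_with_snprintf(expected, static_cast<std::uint16_t>(value));
        escape_with_table(actual, static_cast<std::uint16_t>(value));
        if (std::memcmp(expected, actual, sizeof(expected)) != 0)
        {
            ++mismatches;
        }
    }
    return mismatches;
}
```

An exhaustive loop is cheap here because the input space is only 65,536 values, so it can stand in for a property test; astral code points reduce to two such 16-bit writes.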

🤖 Generated with Claude Code

dump_escaped called std::snprintf(..., "\\u%04x", ...) once per escaped
code point in the string serialization hot path. snprintf re-parses
the format string and pulls in locale/printf machinery on every call,
which is far heavier than the fixed 6-/12-byte output warrants. This
is hot for any string containing control characters, and for all
non-ASCII text when ensure_ascii is set.

Replace it with write_u_escape, a small helper that writes the escape
directly into string_buffer via a nibble-to-hex lookup table, mirroring
the existing hand-rolled dump_integer fast path in the same file.

Signed-off-by: Niels Lohmann <mail@nlohmann.me>

Use a const char* rather than a char[] lookup table, matching the
existing hex_bytes helper in the same file.

Signed-off-by: Niels Lohmann <mail@nlohmann.me>
nlohmann added the "review needed" label Jul 4, 2026
Labels: L, review needed, tests
