Replace snprintf with a branch-free writer for \uXXXX escapes #5235
Open
nlohmann wants to merge 2 commits into
dump_escaped called std::snprintf(..., "\u%04x", ...) once per escaped code point in the string serialization hot path. snprintf re-parses the format string and pulls in locale/printf machinery on every call, which is far heavier than the fixed 6-/12-byte output warrants. This path is hot for any string containing control characters, and for all non-ASCII text when ensure_ascii is set.

Replace it with write_u_escape, a small helper that writes the escape directly into string_buffer via a nibble-to-hex lookup table, mirroring the existing hand-rolled dump_integer fast path in the same file.

Signed-off-by: Niels Lohmann <mail@nlohmann.me>
Use a const char* rather than a char[] lookup table, matching the existing hex_bytes helper in the same file.

Signed-off-by: Niels Lohmann <mail@nlohmann.me>
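For readers without the diff open, the idea in the two commit messages can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `buffer_t` alias, the exact signature of `write_u_escape`, and the use of plain `assert` in place of `JSON_ASSERT` are all assumptions made so the snippet compiles on its own.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the serializer's fixed-size string_buffer.
using buffer_t = std::array<char, 512>;

// Writes the six bytes "\uXXXX" for one 16-bit code unit starting at
// buf[pos] and returns the position just past the escape.
// No format string is parsed and there are no data-dependent branches:
// each hex digit is a single table lookup on a 4-bit nibble.
inline std::size_t write_u_escape(buffer_t& buf, std::size_t pos, std::uint16_t unit)
{
    // const char* table (rather than char[]), as in the second commit.
    static const char* nibble_to_hex = "0123456789abcdef";

    // Precondition: at least 6 bytes of headroom (JSON_ASSERT in the library).
    assert(pos + 6 <= buf.size());

    buf[pos++] = '\\';
    buf[pos++] = 'u';
    buf[pos++] = nibble_to_hex[(unit >> 12) & 0x0F];
    buf[pos++] = nibble_to_hex[(unit >> 8) & 0x0F];
    buf[pos++] = nibble_to_hex[(unit >> 4) & 0x0F];
    buf[pos++] = nibble_to_hex[unit & 0x0F];
    return pos;
}
```

The lowercase table matches what `"%04x"` produces, so the output should be byte-identical to the `snprintf` path for any 16-bit value.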
Summary
- `serializer::dump_escaped` called `std::snprintf(..., "\u%04x", ...)` once per escaped code point (both for the single-BMP-codepoint case and the surrogate-pair case). `snprintf` re-parses the format string and drags in locale/`printf` machinery on every call, which is much heavier than the fixed 6-/12-byte output requires. This is hot for any string with control characters, and for all non-ASCII text when `ensure_ascii` is set (e.g. serializing CJK/emoji-heavy data).
- Replace it with `write_u_escape`, a small branch-free helper that writes `\uXXXX` directly into `string_buffer` via a `nibble_to_hex` lookup table, operating on a buffer position so it can assert sufficient headroom internally. Mirrors the already-hand-rolled `dump_integer` fast path in the same file.
- Add a `JSON_ASSERT` inside `write_u_escape` to guard the "must have at least 6 bytes of headroom" precondition, since the surrounding code already tracks/guarantees this.

Test plan
- `"string escape with ensure_ascii"` section in `tests/src/unit-convenience.cpp` covering: BMP escapes across all nibble positions (U+0080, U+00FF, U+07FF, U+4F60, U+ABCD, U+FFFD), the surrogate-pair path for astral code points (U+10000, U+1F600, U+10FFFF), and verbatim pass-through when `ensure_ascii` is disabled.
- `snprintf`-based implementation across all escapable BMP code points plus astral cases.
- `single_include/nlohmann/json.hpp` with the pinned astyle version.
- `include/` and `single_include/`; all assertions pass.

🤖 Generated with Claude Code
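The kind of cross-check the test plan mentions, comparing a table-based writer against `snprintf` over the BMP plus the listed astral code points, could look something like the sketch below. It is a standalone illustration rather than the test added in this PR; all function names here (`unit_via_snprintf`, `unit_via_table`, `escape_code_point`, `writers_agree`) are hypothetical, and the surrogate split uses the standard UTF-16 arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Reference path: format one 16-bit unit with snprintf.
inline std::string unit_via_snprintf(std::uint16_t unit)
{
    char tmp[7];  // six characters plus the terminating NUL
    std::snprintf(tmp, sizeof(tmp), "\\u%04x", static_cast<unsigned>(unit));
    return std::string(tmp, 6);
}

// Candidate path: nibble-to-hex lookup table, no format parsing.
inline std::string unit_via_table(std::uint16_t unit)
{
    static const char* hex = "0123456789abcdef";
    std::string out = "\\u";
    out += hex[(unit >> 12) & 0x0F];
    out += hex[(unit >> 8) & 0x0F];
    out += hex[(unit >> 4) & 0x0F];
    out += hex[unit & 0x0F];
    return out;
}

// Escapes one code point: 6 bytes for the BMP, 12 bytes (a surrogate
// pair) for astral code points at or above U+10000.
template <typename UnitWriter>
std::string escape_code_point(std::uint32_t cp, UnitWriter write_unit)
{
    if (cp <= 0xFFFFu)
    {
        return write_unit(static_cast<std::uint16_t>(cp));
    }
    // 0xD7C0 + (cp >> 10) equals 0xD800 + ((cp - 0x10000) >> 10).
    const auto high = static_cast<std::uint16_t>(0xD7C0u + (cp >> 10));
    const auto low = static_cast<std::uint16_t>(0xDC00u + (cp & 0x3FFu));
    return write_unit(high) + write_unit(low);
}

// Returns true if both writers agree on every non-surrogate BMP code
// point and on the three astral cases listed in the test plan.
inline bool writers_agree()
{
    for (std::uint32_t cp = 0; cp <= 0xFFFFu; ++cp)
    {
        if (cp >= 0xD800u && cp <= 0xDFFFu)
        {
            continue;  // lone surrogates are not valid code points
        }
        if (escape_code_point(cp, unit_via_snprintf) != escape_code_point(cp, unit_via_table))
        {
            return false;
        }
    }
    for (std::uint32_t cp : {0x10000u, 0x1F600u, 0x10FFFFu})
    {
        if (escape_code_point(cp, unit_via_snprintf) != escape_code_point(cp, unit_via_table))
        {
            return false;
        }
    }
    return true;
}
```

With this arithmetic, U+1F600 splits into the pair `\ud83d\ude00`, which is the 12-byte case the summary refers to.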