From 4f13fed6371938a476fe1048c043ce12254c4a5f Mon Sep 17 00:00:00 2001
From: wuyangfan <1102042793@qq.com>
Date: Mon, 18 May 2026 00:49:20 +0800
Subject: [PATCH] docs: clarify error_handler_t::ignore UTF-8 behavior

Document that error_handler_t::ignore skips invalid UTF-8 sequences
during dump() rather than copying every byte unchanged (#4552).
---
 README.md                                          | 4 ++--
 docs/mkdocs/docs/api/basic_json/dump.md            | 5 +++--
 docs/mkdocs/docs/api/basic_json/error_handler_t.md | 4 +++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index eab5fe8476..14bd5ab2e6 100644
--- a/README.md
+++ b/README.md
@@ -363,7 +363,7 @@ std::cout << j_string << " == " << serialized_string << std::endl;
 
 [`.dump()`](https://json.nlohmann.me/api/basic_json/dump/) returns the originally stored string value.
 
-Note the library only supports UTF-8. When you store strings with different encodings in the library, calling [`dump()`](https://json.nlohmann.me/api/basic_json/dump/) may throw an exception unless `json::error_handler_t::replace` or `json::error_handler_t::ignore` are used as error handlers.
+Note the library only supports UTF-8. When you store strings with different encodings in the library, calling [`dump()`](https://json.nlohmann.me/api/basic_json/dump/) may throw an exception unless `json::error_handler_t::replace` or `json::error_handler_t::ignore` are used as error handlers. With `ignore`, invalid UTF-8 sequences are skipped during serialization (not copied byte-for-byte to the output).
 
 #### To/from streams (e.g., files, string streams)
 
@@ -1831,7 +1831,7 @@ The library supports **Unicode input** as follows:
 - [Unicode noncharacters](https://www.unicode.org/faq/private_use.html#nonchar1) will not be replaced by the library.
 - Invalid surrogates (e.g., incomplete pairs such as `\uDEAD`) will yield parse errors.
 - The strings stored in the library are UTF-8 encoded. When using the default string type (`std::string`), note that its length/size functions return the number of stored bytes rather than the number of characters or glyphs.
-- When you store strings with different encodings in the library, calling [`dump()`](https://json.nlohmann.me/api/basic_json/dump/) may throw an exception unless `json::error_handler_t::replace` or `json::error_handler_t::ignore` are used as error handlers.
+- When you store strings with different encodings in the library, calling [`dump()`](https://json.nlohmann.me/api/basic_json/dump/) may throw an exception unless `json::error_handler_t::replace` or `json::error_handler_t::ignore` are used as error handlers. With `ignore`, invalid UTF-8 sequences are skipped during serialization (not copied byte-for-byte to the output).
 - To store wide strings (e.g., `std::wstring`), you need to convert them to a UTF-8 encoded `std::string` before, see [an example](https://json.nlohmann.me/home/faq/#wide-string-handling).
 
 ### Comments in JSON
diff --git a/docs/mkdocs/docs/api/basic_json/dump.md b/docs/mkdocs/docs/api/basic_json/dump.md
index 21e22a48ce..7ff2beb35a 100644
--- a/docs/mkdocs/docs/api/basic_json/dump.md
+++ b/docs/mkdocs/docs/api/basic_json/dump.md
@@ -27,8 +27,9 @@ and `ensure_ascii` parameters.
 `error_handler` (in)
 :   how to react on decoding errors; there are three possible values (see [`error_handler_t`](error_handler_t.md):
     `strict` (throws and exception in case a decoding error occurs; default), `replace` (replace invalid UTF-8 sequences
-    with U+FFFD), and `ignore` (ignore invalid UTF-8 sequences during serialization; all bytes are copied to the output
-    unchanged)).
+    with U+FFFD), and `ignore` (skip invalid UTF-8 sequences during serialization rather than throwing; invalid bytes are
+    not written to the output; see [`error_handler_t`](error_handler_t.md) and
+    [#4552](https://github.com/nlohmann/json/issues/4552)).
 
 ## Return value
 
diff --git a/docs/mkdocs/docs/api/basic_json/error_handler_t.md b/docs/mkdocs/docs/api/basic_json/error_handler_t.md
index dc32ced9b9..e94fd30628 100644
--- a/docs/mkdocs/docs/api/basic_json/error_handler_t.md
+++ b/docs/mkdocs/docs/api/basic_json/error_handler_t.md
@@ -18,7 +18,9 @@ replace
 :   replace invalid UTF-8 sequences with U+FFFD (� REPLACEMENT CHARACTER)
 
 ignore
-:   ignore invalid UTF-8 sequences; all bytes are copied to the output unchanged
+:   skip invalid UTF-8 sequences during serialization (they do not appear in the output). This
+    differs from copying every stored byte unchanged; see [#4552](https://github.com/nlohmann/json/issues/4552).
+    A byte-preserving mode is discussed in [#4555](https://github.com/nlohmann/json/pull/4555).
 
 ## Examples
 
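To make the difference between "skip invalid sequences" and the old wording "all bytes are copied to the output unchanged" concrete, here is a minimal, self-contained C++ sketch. It does not use the library: `skip_invalid_utf8`, `scan_sequence`, and `SequenceScan` are hypothetical names, and the code only models what `ignore` does to a string's bytes, assuming the serializer keeps well-formed sequences per the Unicode well-formed byte ranges, drops the rest, and re-examines a byte that interrupted a multibyte sequence. JSON quoting and escaping are left out. With the library itself, the corresponding call is `j.dump(-1, ' ', false, json::error_handler_t::ignore)`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: how many bytes from a start index belong to the
// sequence being examined, and whether they form a complete, valid sequence.
struct SequenceScan {
    std::size_t length;  // bytes consumed (0 = this byte can never start a sequence)
    bool complete;       // true if those bytes are a well-formed UTF-8 sequence
};

// Examines the sequence starting at s[i] (requires i < s.size()) using the
// well-formed byte ranges of Unicode Table 3-7. On failure, `length` covers the
// lead byte plus any valid continuation bytes, but not the offending byte.
SequenceScan scan_sequence(const std::string& s, std::size_t i) {
    const auto lead = static_cast<unsigned char>(s[i]);
    if (lead < 0x80) {
        return {1, true};  // ASCII
    }
    std::size_t needed = 0;               // continuation bytes still required
    unsigned char lo = 0x80, hi = 0xBF;   // allowed range of the second byte
    if (lead >= 0xC2 && lead <= 0xDF) {
        needed = 1;
    } else if (lead == 0xE0) {
        needed = 2; lo = 0xA0;            // reject overlong 3-byte forms
    } else if ((lead >= 0xE1 && lead <= 0xEC) || lead == 0xEE || lead == 0xEF) {
        needed = 2;
    } else if (lead == 0xED) {
        needed = 2; hi = 0x9F;            // reject UTF-16 surrogates U+D800..U+DFFF
    } else if (lead == 0xF0) {
        needed = 3; lo = 0x90;            // reject overlong 4-byte forms
    } else if (lead >= 0xF1 && lead <= 0xF3) {
        needed = 3;
    } else if (lead == 0xF4) {
        needed = 3; hi = 0x8F;            // reject code points above U+10FFFF
    } else {
        return {0, false};                // 0x80..0xC1 and 0xF5..0xFF never start a sequence
    }
    std::size_t length = 1;
    for (std::size_t k = 0; k < needed; ++k) {
        if (i + length >= s.size()) {
            return {length, false};       // sequence truncated by the end of the string
        }
        const auto byte = static_cast<unsigned char>(s[i + length]);
        if (byte < lo || byte > hi) {
            return {length, false};       // offending byte is not counted
        }
        lo = 0x80;                        // later bytes use the plain continuation range
        hi = 0xBF;
        ++length;
    }
    return {length, true};
}

// Hypothetical model of error_handler_t::ignore: keeps well-formed sequences
// and drops everything else instead of copying it through unchanged.
std::string skip_invalid_utf8(const std::string& s) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const SequenceScan scan = scan_sequence(s, i);
        if (scan.complete) {
            out.append(s, i, scan.length);
        }
        // Skip the consumed bytes. An invalid lead byte is dropped on its own;
        // a byte that interrupted a sequence is re-examined on the next pass.
        i += (scan.length == 0) ? 1 : scan.length;
    }
    return out;
}
```

For the input bytes `C3 A4 A9 C3 BC` ("ä", a stray `0xA9`, "ü"), the sketch returns `C3 A4 C3 BC`, i.e. "äü"; a byte-preserving mode as described by the old wording would have emitted all five bytes.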