Bug report
wcsxfrm() produces a sequence of wchar_t that can be compared using wcscmp(). There is no promise that the resulting string can be interpreted as text in any way; all you can do with it is compare it with another wcsxfrm() result, wchar_t by wchar_t.
For example, if wchar_t is 32-bit, the result can contain values larger than 0x10FFFF, but Python strings can only contain Unicode code points in the range 0 to 0x10FFFF. If wchar_t is 16-bit, a surrogate pair should not be interpreted as a single code point with a value larger than 0xFFFF, because that changes the order obtained when the results are compared wchar_t by wchar_t. PyUnicode_FromWideChar() will fail in the former case and produce a wrong result in the latter case.
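Both failure modes can be illustrated with a small pure-Python sketch. It does not call wcsxfrm() or PyUnicode_FromWideChar(); the two unit sequences are made-up stand-ins for wcsxfrm() output, and decode_utf16_units() and fits_in_str() are hypothetical helpers written only for this illustration.

```python
def decode_utf16_units(units):
    """Combine surrogate pairs into single code points, as a UTF-16 decoder would."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out


# Made-up 16-bit sequences standing in for two wcsxfrm() results.
a = [0xD800, 0xDC00]
b = [0xFFFF]

# Order seen by wcscmp(): unit by unit, 0xD800 < 0xFFFF, so a sorts before b.
raw_order = a < b

# Order after merging the surrogate pair: [0x10000] vs [0xFFFF], so a sorts after b.
decoded_order = decode_utf16_units(a) < decode_utf16_units(b)


def fits_in_str(value):
    """Return True if value is a code point a Python str can hold."""
    try:
        chr(value)
    except ValueError:
        return False
    return True


print(raw_order, decoded_order)
print(fits_in_str(0x10FFFF), fits_in_str(0x110000))
```

The first pair of results shows the order flipping once the surrogate pair is merged (the 16-bit case); the second shows that a value above 0x10FFFF has no str representation at all (the 32-bit case).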
#138242 tries to solve this issue. We need to test on exotic platforms (AIX, Solaris) to check if it helps.
Linked PRs