Fix problem with SetConsoleOutputCP CP_UTF8 on Windows #896

madah81pnz1 wants to merge 3 commits into xiph:master from
Conversation
Also splits up the WriteConsoleW() calls if the buffer is too big, as this was at least a problem in Windows 7 and earlier.
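The splitting could be sketched roughly like this (illustrative only, not the PR's actual code; `MAX_CONSOLE_CHUNK`, `write_fn` and `stub_writer` are made-up names, and on Windows the callback would wrap WriteConsoleW() and report wide characters actually written):

```c
#include <stddef.h>
#include <wchar.h>

/* Illustrative chunk size; not a documented Windows limit. */
#define MAX_CONSOLE_CHUNK 4096

/* Hypothetical writer callback; on Windows this would wrap WriteConsoleW()
 * and return the number of wide characters actually written. */
typedef size_t (*write_fn)(const wchar_t *buf, size_t len);

/* Write buf[0..len) in chunks of at most MAX_CONSOLE_CHUNK characters,
 * so a single oversized WriteConsoleW() call is never issued. */
static size_t write_in_chunks(const wchar_t *buf, size_t len, write_fn writer)
{
    size_t written = 0;
    while (written < len) {
        size_t chunk = len - written;
        if (chunk > MAX_CONSOLE_CHUNK)
            chunk = MAX_CONSOLE_CHUNK;
        size_t n = writer(buf + written, chunk);
        if (n == 0)
            break;  /* no progress: treat as an error and stop */
        written += n;
    }
    return written;
}

/* Stub writer for trying the splitter out without a console:
 * counts calls and pretends everything was written. */
static size_t stub_calls;
static size_t stub_writer(const wchar_t *buf, size_t len)
{
    (void)buf;
    stub_calls++;
    return len;
}
```

With a 10000-character buffer the stub sees three calls (4096 + 4096 + 1808), which is the behavior the PR description implies for oversized buffers.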
Note this needs more testing, especially piping binary data from stdin. It might also need another test binary to properly test Windows console output, e.g. one that calls ReadConsoleOutputW() to check what was actually written.
Reading from stdin turns into mojibake, e.g. with --import-tags-from-file=-
This seems to work now. I will clean up the code and push more commits, but it might take some more time. The behavior is now changed for both stdout and stdin:

* Console (FILE_TYPE_CHAR) uses ReadConsoleW()/WriteConsoleW() with UTF-16 and does not use the codepage.
* Redirects to/from a pipe (FILE_TYPE_PIPE) use the codepage from GetConsoleCP()/GetConsoleOutputCP().
* Redirects to/from disk (FILE_TYPE_DISK) use UTF-8 only and ignore the codepage.
* Command-line arguments use UTF-16, so no codepage conversion is needed there either.

All calls to utf8_encode() and utf8_decode() have been removed under #ifdef _WIN32 because of this, since the conversion needs to happen as early as possible, not when handling the tags. --no-utf8-convert only affects pipe redirects from stdin, not stdout, because there is no way to pass down a "raw" flag to vfprintf_utf8 and wprint_console. It could be added with some additional refactoring, but I'm not sure it is worth it.

fgets() has also been replaced with new functions that handle buffering and any newline combination of \r, \n and \r\n. The reasoning is that this lets flac import files that were created on other platforms; otherwise the \r would become part of the tag on non-Windows platforms, and only Windows would treat \r and \r\n the same.

The pipe redirect is a bit troublesome, though. It seems like an unfixable problem, unless chcp 65001 (CP_UTF8) is used by default for the console. Consider also running flac/metaflac from Python, e.g. with subprocess.run() and capture_output=True: the output is then a redirect, so everything will be converted to the console output codepage. It is not straightforward to set the console codepage to 65001/CP_UTF8 from Python, but this is true for any console-mode application on Windows.

One way to handle this could be to print a warning on stderr if the current text can't be losslessly decoded/encoded via the console codepage. This could also be done for corrupted UTF-8 when it is converted into UTF-16 for the console.

Testing all this is not so straightforward, though. I've used a Python script for the pipe and disk parts, and a C++ program that calls ReadConsoleOutputW() directly for the console output.
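For the corrupted-UTF-8 side of that warning, a plain validity scan before converting to UTF-16 would be enough to decide whether to warn. A minimal sketch (not code from this PR; `utf8_is_valid` is a made-up helper name):

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if buf[0..len) is well-formed UTF-8, rejecting overlong
 * encodings, UTF-16 surrogate code points and values above U+10FFFF. */
static bool utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* number of continuation bytes */
        unsigned long cp;  /* decoded code point */
        if (c < 0x80) { i++; continue; }           /* ASCII fast path */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return false;            /* stray continuation or bad lead byte */
        if (i + n >= len)
            return false;             /* sequence truncated at end of buffer */
        for (size_t j = 1; j <= n; j++) {
            if ((buf[i + j] & 0xC0) != 0x80)
                return false;         /* missing continuation byte */
            cp = (cp << 6) | (buf[i + j] & 0x3F);
        }
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000))
            return false;             /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;             /* surrogate, invalid in UTF-8 */
        if (cp > 0x10FFFF)
            return false;
        i += n + 1;
    }
    return true;
}
```

On failure the caller would emit the warning on stderr and then convert with replacement characters, rather than silently producing garbage in the console.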
The behavior is now changed for both stdout and stdin:

* Console (FILE_TYPE_CHAR) uses ReadConsoleW()/WriteConsoleW() with UTF-16 and does not use the codepage.
* Redirects to/from a pipe (FILE_TYPE_PIPE) use the codepage from GetConsoleCP()/GetConsoleOutputCP().
* Redirects to/from disk (FILE_TYPE_DISK) use UTF-8 only and ignore the codepage.
* Command-line arguments use UTF-16, so there is also no need for codepage conversion.

All calls to utf8_encode() and utf8_decode() have been removed under #ifdef _WIN32 because of this, since the conversion needs to happen as early as possible, not when handling the tags. --no-utf8-convert only affects pipe redirects. fgets() has been replaced with new functions that handle buffering and any newline combination of \r, \n and \r\n.
No description provided.